Project Design Overview

Fig 1: Project Design Overview

Our project is divided into three major phases: Preparation, Model Building, and Quality Measurement, as shown in Fig. 1. The purpose of these three phases is to group the machine learning process into high-level stages. They correspond to the major phases of a typical machine learning workflow: data, model building, and evaluation.

Preparation

Fig 2: Project Preparation Design

In machine learning, the data phase is the most crucial phase because machine learning models use the data to form predictions. If the data is unusable or preprocessed incorrectly, the results will be inaccurate, which invalidates the project. In this project, the Preparation phase handles all data-related tasks. The high-level tasks are data exploration and data preprocessing.

Model Building

Fig 3: Project Model Building Design: Neural Network
Fig 4: Project Model Building Design: Support Vector Machine

The modeling phase in machine learning is when machine learning algorithms are applied to the preprocessed data. In most cases, the algorithms are provided by various APIs, which allow users to import the packages, plug in the data, and use the algorithms with little effort. The modeling phase takes three sub-datasets from the data preprocessing stage: train, validation, and test. The models are trained on the train dataset, validated with the validation dataset, and tested on the test dataset. The datasets used in the modeling phase are constructed from historical data so that the models can derive or recognize patterns within them. In our project, the modeling process follows this general approach: we train, validate, and test the models (a neural network and a support vector machine, Figs. 3 and 4) using the three preprocessed datasets.
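
The three-way split described above can be sketched in a few lines of standard-library Python. This is only an illustration, not our project code: the function name split_dataset, the 70/15/15 proportions, and the random shuffle are assumptions, since the overview does not state how the sub-datasets are cut. For time-ordered historical data, a chronological split without shuffling may be more appropriate.

```python
import random


def split_dataset(records, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle records and split them into train, validation, and test lists.

    The fractions are illustrative assumptions; whatever remains after the
    train and validation shares goes to the test set.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1:
        raise ValueError("fractions must lie between 0 and 1")
    if train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a share for the test set")

    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Usage: 100 placeholder records standing in for preprocessed rows.
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Each record lands in exactly one of the three lists, so the test set stays unseen during training and validation.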

Quality Measurement

Fig 5: Project Quality Measurement Design

The evaluation phase in machine learning is the stage when the predictions produced by the models during the modeling phase are compared to the actual data and evaluated for accuracy. This is the phase when we explore the usability of the models. There is a wide range of evaluation metrics, and the appropriate choice depends on the type of problem being tackled (classification or regression). In our project, we will use classification metrics, cross-validation, and possibly regularization to validate and evaluate our models.
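
To make the comparison of predictions against actual labels concrete, the following standard-library Python sketch computes three common classification metrics and generates k-fold cross-validation index splits. It is an illustration under assumptions: the names classification_scores and k_fold_indices, the choice of accuracy, precision, and recall, the binary labels, and the sample data are all hypothetical, since the overview does not name the specific metrics or fold count we will use.

```python
def classification_scores(actual, predicted, positive=1):
    """Return accuracy, precision, and recall for one positive class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)

    return {
        "accuracy": correct / len(pairs),
        # Fall back to 0.0 when a denominator is empty.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    start = 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n_samples))
        yield training, validation
        start += size


# Usage with made-up labels (not project results).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
scores = classification_scores(actual, predicted)
folds = list(k_fold_indices(len(actual), k=4))
```

In cross-validation, a model would be retrained once per fold on the training indices and scored on the held-out validation indices, and the per-fold scores would then be averaged.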