The seven emotions for this classification problem are happy, sad, surprised, scared, angry, disgusted, and neutral. Using ensemble methods, the machine learning model classified images of faces based on these expressions.
The training set consisted of 2,925 labeled and 98,058 unlabeled grayscale images. The model was implemented using AdaBoost in conjunction with Support Vector Machines (SVMs) with a polynomial kernel, followed by 3-fold cross-validation. It also incorporated preprocessing techniques (homography and Gabor filters) to standardize the images and improve feature extraction, as well as feature scaling to reduce bias.
Homography was used to align the faces to a common frame of reference. Gabor filters, which emphasize edges, were applied to better capture key features. After these preprocessing steps, feature selection was conducted on the Gabor filter banks before the features were input into the machine learning model.
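The report does not specify the parameters of the Gabor filter bank, so the following is a minimal sketch of the idea: a bank of real-valued Gabor kernels at several orientations, each a Gaussian envelope modulated by a cosine carrier, convolved with a grayscale image. The kernel size, wavelength, and orientation count here are illustrative assumptions, not the values used in the actual model.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=8.0, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lam + psi)
    return envelope * carrier

def gabor_bank(image, n_orientations=4):
    """Filter one grayscale image with kernels at evenly spaced orientations."""
    responses = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        responses.append(convolve(image, gabor_kernel(theta=theta), mode='nearest'))
    return np.stack(responses)  # shape: (n_orientations, H, W)

# Example: a 4-orientation bank applied to a random stand-in "face" image.
img = np.random.default_rng(0).random((48, 48))
bank = gabor_bank(img)
print(bank.shape)  # (4, 48, 48)
```

The stacked responses from a bank like this are what feature selection would then operate on, picking out the most discriminative filter outputs before classification.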
SVMs are useful for pattern recognition and classification problems. Since SVMs are typically used for binary classification, several binary classifiers can be combined using the 1-vs-1 approach (happy vs sad, happy vs surprised, happy vs neutral, etc.) or the 1-vs-all approach (happy or not happy, sad or not sad, etc.). After running multiple iterations, 1-vs-1 was selected as it resulted in higher accuracy. 1-vs-all produced high accuracy rates (90%+) for each class, but the total accuracy was much lower (~71%). This is because each 1-vs-all classifier sees a much larger set of negative cases (e.g. not happy) than positive cases (e.g. happy), which makes separating the two classes with a hyperplane easier per classifier but inflates the per-class accuracy without improving overall multiclass accuracy. Although 1-vs-all is less computationally expensive (1-vs-1 requires n(n-1)/2 decision functions, where n is the number of classes), the 5% increase in accuracy from the 1-vs-1 SVM is a good trade-off. This method was combined with 3-fold cross-validation to find an optimal setting of the parameters C and gamma, where C is the penalty parameter and gamma is the kernel coefficient. Finally, ensemble learning was applied to the 1-vs-1 SVM through bagging.
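The comparison above can be sketched with scikit-learn (which may differ from whatever library the original implementation used). The data here is synthetic stand-in features for the seven expression classes, not the actual Gabor features, so the accuracy numbers will not match the report; the point is the structure: a degree-3 polynomial SVM wrapped in explicit 1-vs-1 and 1-vs-all schemes, plus bagging on the 1-vs-1 model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the 7-class expression features.
X, y = make_classification(n_samples=700, n_features=20, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = SVC(kernel='poly', degree=3, C=1.0, gamma='scale')

ovo = OneVsOneClassifier(base).fit(X_train, y_train)   # 7*6/2 = 21 classifiers
ovr = OneVsRestClassifier(base).fit(X_train, y_train)  # 7 classifiers

print('1-vs-1 accuracy:  ', ovo.score(X_test, y_test))
print('1-vs-all accuracy:', ovr.score(X_test, y_test))

# Bagging over the polynomial SVM, as in the ensemble step described above.
bag = BaggingClassifier(SVC(kernel='poly', degree=3), n_estimators=10,
                        random_state=0).fit(X_train, y_train)
print('bagged accuracy:  ', bag.score(X_test, y_test))
```

Note that scikit-learn's `SVC` already uses a 1-vs-1 scheme internally for multiclass problems; the explicit wrappers here just make the two strategies directly comparable.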
Feature scaling was applied by first reshaping the data from an (m x m x n)-matrix to a (d x n)-matrix, where d = m x m. Feature scaling seemed appropriate because the raw pixel values were large compared to a more reasonable range such as [-1, 1], and scaling helps the kernel and makes the model less biased. One way to scale would be to subtract the minimum and then divide by the range; however, this resulted in division-by-zero errors, so instead everything was scaled down by a factor of 100. This seemed reasonable given that the images were grayscale, so the largest possible value was 255. The test data was scaled by the same factor, since it is important to apply the same scaling to the test data as to the training data.
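The reshape-and-scale step is simple enough to show directly. This is a minimal numpy sketch, assuming 48x48 images for illustration (the report does not state the image dimensions); the constant divisor mirrors the factor-of-100 choice described above.

```python
import numpy as np

def flatten_images(images):
    """Reshape an (m, m, n) stack of n grayscale images into a (d, n) matrix, d = m*m."""
    m, _, n = images.shape
    return images.reshape(m * m, n)

def scale_features(X, factor=100.0):
    """Divide all pixel values by a constant factor. Unlike min/range scaling,
    this cannot divide by zero when a feature is constant across the data."""
    return X / factor

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(48, 48, 10)).astype(float)  # 10 fake 48x48 images
X = scale_features(flatten_images(imgs))
print(X.shape)  # (2304, 10)
```

With grayscale values capped at 255, dividing by 100 bounds every feature at 2.55, which is close to the [-1, 1]-style range the kernel prefers. The same `factor` must be reused on the test data.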
The training data was separated randomly into two sets: 2/3 for the training set and 1/3 for the validation set. The kernel chosen was a degree-3 polynomial; other settings (other polynomial degrees, and linear, sigmoid, and radial basis function kernels) did not perform as well. The best (C, gamma)-pair was selected semi-manually through 3-fold cross-validation. The final model performed above the 75% baseline. Even though the model incorporated feature scaling (to better condition the kernel and reduce bias) and 3-fold cross-validation (to find the best (C, gamma) values), it still overfit the training data. Incorporating early stopping during training to prevent overfitting, and finding the (C, gamma)-pair automatically, would likely improve results.
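The automatic (C, gamma) search suggested above can be sketched with scikit-learn's grid search over a 3-fold cross-validation, matching the folds and the 2/3 - 1/3 split described in this section. The data and the grid values are illustrative assumptions; the report does not state which (C, gamma) ranges were tried.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; 2/3 training, 1/3 validation as in the split above.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=1/3,
                                                  random_state=0)

# Exhaustive search over an assumed (C, gamma) grid with 3-fold CV.
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(SVC(kernel='poly', degree=3), param_grid, cv=3)
search.fit(X_train, y_train)

print('best (C, gamma):', search.best_params_)
print('validation accuracy:', search.score(X_val, y_val))
```

Replacing the semi-manual parameter sweep with a search like this removes one source of human bias, though it does not by itself address the overfitting; early stopping or a held-out test set would still be needed for that.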