Image Recognition Using a Convolutional Neural Network

With a huge dataset comes a huge responsibility! - Anonymous

Phase 1 - Data collection

A couple of weeks ago, I started collecting photos of some of my friends. Although skeptical at first, three of my close friends agreed to give me their photos after I told them that I was building a face recognition system. A quick search in Google Photos helped us find the photos easily. Thanks, Google!!

Each photo contained only one person.

So, with more than 100 examples for each class, we had a dataset of about 500 examples (500 MB). I made a folder for each class, and each folder held that class's examples. A data augmentation technique was also used to increase the number of instances during training.
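To make the folder-per-class layout concrete, here is a minimal sketch of how such a dataset could be indexed into (image path, label) pairs. The function name `index_dataset`, the accepted file extensions, and the alphabetical label order are assumptions for illustration, not the code used for this project.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual photos.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Index a folder-per-class dataset.

    Expects root/<class_name>/<image files> and returns
    (class_names, samples), where samples is a list of (path, label)
    and label is the index of the class in the sorted class_names list.
    """
    root = Path(root)
    # Every sub-folder of root is treated as one class.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for path in sorted((root / name).iterdir()):
            # Skip anything that does not look like an image file.
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return class_names, samples
```

Calling `index_dataset("photos")` on a hypothetical `photos/` directory with one sub-folder per person would give one integer label per person, which is the form most training loops expect.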

Phase 2 - Model Selection

For training, I chose Google's pretrained Inception V3 model. However, I dropped its last output layer and added an output layer with 4 neurons, one per class.
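The head swap described above could look roughly like this in Keras. This is a sketch under assumptions: the framework (TensorFlow/Keras), the 299x299 input size, freezing the pretrained layers, and the helper name `build_model` are all illustrative choices and are not confirmed by the original code.

```python
import tensorflow as tf


def build_model(num_classes=4, weights="imagenet"):
    """Inception V3 with its ImageNet classifier replaced by a small softmax head."""
    # include_top=False drops the original 1000-class output layer;
    # pooling="avg" keeps the global average pooling that preceded it.
    base = tf.keras.applications.InceptionV3(
        include_top=False,
        weights=weights,  # "imagenet" downloads pretrained weights; None skips the download
        input_shape=(299, 299, 3),
        pooling="avg",
    )
    # Assumption: keep the pretrained layers frozen and train only the new head.
    base.trainable = False
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    return tf.keras.Model(inputs=base.input, outputs=outputs)
```

After `model = build_model()`, the model can be compiled with a categorical cross-entropy loss and trained on the four classes. Unfreezing some of the top Inception blocks later is a common way to fine-tune further.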

Phase 3 - Training

Since it was taking eons to train the model on my local machine, I decided to train it on a GPU. However, I did not have access to a GPU at home or at school, so I decided to train it in the cloud.

AWS EC2 is Amazon's industry-standard computing platform. I decided to give it a go with a g4.xlarge instance running the Deep Learning AMI (Ubuntu). Although I had used AWS S3 buckets in the past to host my website, I had never used other AWS services. After hours of toiling to figure out the proper configuration, I was finally able to copy the dataset to the remote machine and train the model there.

Phase 4 - Evaluation

Training Accuracy: ~68%
Test Accuracy: ~67%

Although the accuracy was poor, there are improvements that could be implemented to increase it. Some of them are listed below:

  1. Increase the number of training examples (the most effective and most costly part of any ML problem. Bummer!)
  2. Use a regularization technique such as dropout.
  3. The current number of epochs is 10. For better accuracy, it could be increased to about 50, combined with early stopping.
  4. The current data augmentation only uses rotations, zooms, and flips. We could also augment the data by changing hue and brightness.
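Items 3 and 4 from the list above can be sketched in plain Python to show the underlying logic. None of this is from the project code: the function names, the `patience` and `min_delta` parameters, and the representation of a pixel as an `(r, g, b)` tuple of floats in [0, 1] are assumptions for illustration.

```python
import colorsys


def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never happens,
    the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


def jitter_pixel(rgb, hue_shift=0.0, brightness_scale=1.0):
    """Shift the hue and scale the brightness of one (r, g, b) pixel in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift) % 1.0  # hue is circular, so wrap around
    v = min(1.0, max(0.0, v * brightness_scale))  # clamp to the valid range
    return colorsys.hsv_to_rgb(h, s, v)


def jitter_image(pixels, hue_shift=0.0, brightness_scale=1.0):
    """Apply the same hue/brightness jitter to every pixel of a 2D image."""
    return [[jitter_pixel(p, hue_shift, brightness_scale) for p in row]
            for row in pixels]
```

In a real training run, the per-epoch validation losses would come from the model, and the hue shift and brightness scale would be drawn at random for each image. Keras also ships an `EarlyStopping` callback that implements the same patience idea, so the loop above is only meant to show what it does.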

Code: click here

If time permits, I will implement these techniques too. However, for now, I have to read about RNNs. So, see you guys!!
