Face Recognition with HaarCascade

HaarCascade

 

 

Haar Cascade is a machine learning object detection algorithm used to identify objects in an image or video, based on the concept of Haar features.

 

It is a machine-learning-based approach in which a cascade function is trained on a large number of positive and negative images and is then used to detect objects in other images.

 

The algorithm has four stages:

1. Haar Feature Selection

2. Creating Integral Images

3. AdaBoost Training

4. Cascading Classifiers

It is well known for being able to detect faces and body parts in an image, but can be trained to identify almost any object.

 

1.   Haar Feature Selection – The first step is to compute the Haar features. A Haar feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region, and calculates the difference between these sums.
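As a rough illustration, the value of a simple two-rectangle (edge) feature is just the difference between two adjacent rectangle sums; the patch below is a made-up example, not real training data.

```python
import numpy as np

# Toy 6x6 grayscale patch; in practice this is a region of the detection window.
patch = np.arange(36).reshape(6, 6)

# Two-rectangle "edge" feature: left half vs. right half of the patch.
left_sum = patch[:, :3].sum()    # "white" rectangle
right_sum = patch[:, 3:].sum()   # "black" rectangle

feature_value = left_sum - right_sum
print(feature_value)
```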


2.   Creating Integral Images – Integral images are created so that the rectangle sums needed for Haar features can be computed extremely quickly. However, most of the computed features are irrelevant, so we need a way to choose the right ones.
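A minimal sketch of the integral-image trick (the image here is random, only to check the identity): once the integral image is built, the sum of any rectangle costs just four look-ups, regardless of its size.

```python
import numpy as np

img = np.random.randint(0, 256, size=(100, 100))

# Integral image with a zero row/column prepended, so ii[y, x] = img[:y, :x].sum().
ii = np.zeros((101, 101), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(y1, x1, y2, x2):
    # Sum of img[y1:y2, x1:x2] from four look-ups, independent of rectangle size.
    return ii[y2, x2] - ii[y1, x2] - ii[y2, x1] + ii[y1, x1]

assert rect_sum(10, 20, 50, 60) == img[10:50, 20:60].sum()
```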
                                   
           
3.   AdaBoost Training – AdaBoost selects the best features from the image. The algorithm constructs a “strong” classifier as a linear combination of weighted simple “weak” classifiers.
 

4.   Cascading Classifiers – The cascade classifier consists of several stages, where each stage is an ensemble of weak learners. The weak learners are simple classifiers called decision stumps. Each stage is trained using a technique called boosting, which produces a highly accurate classifier by taking a weighted average of the decisions made by the weak learners.

 

The stages are designed to reject negative samples as fast as possible. The assumption is that the vast majority of windows do not contain the object of interest. Conversely, true positives are rare and worth taking the time to verify.

·        A true positive occurs when a positive sample is correctly classified.

·        A false positive occurs when a negative sample is mistakenly classified as positive.

·        A false negative occurs when a positive sample is mistakenly classified as negative.
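In practice one rarely trains a cascade from scratch: OpenCV ships with pretrained Haar cascades. A minimal face-detection sketch (the input file name is hypothetical):

```python
import cv2

# Load the pretrained frontal-face cascade that ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("photo.jpg")                      # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale slides the cascade over the image at several scales.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", img)
```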


 

 

VGG16

 

 

VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. It was one of the famous models submitted to ILSVRC-2014. It improves over AlexNet by replacing large kernel-sized filters (11×11 and 5×5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after another. VGG16 was trained for weeks using NVIDIA Titan Black GPUs.
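As a quick sanity check on why stacked 3×3 filters are attractive: two 3×3 convolutions cover the same 5×5 receptive field as one 5×5 convolution but use fewer weights (the channel count below is just an example).

```python
# Parameters (ignoring biases) for C input channels and C output channels.
C = 64
one_5x5 = 5 * 5 * C * C          # a single 5x5 convolution
two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 convolutions (same 5x5 receptive field)
print(one_5x5, two_3x3)          # 102400 vs 73728, plus an extra non-linearity in between
```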


 

The Architecture Of VGG

 

The input to the conv1 layer is a fixed-size 224 × 224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters have a very small receptive field: 3×3 (the smallest size that captures the notion of left/right, up/down, center). In one of the configurations it also utilizes 1×1 convolution filters, which can be seen as a linear transformation of the input channels (followed by a non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of the conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over a 2×2 pixel window, with stride 2.

Three Fully-Connected (FC) layers follow a stack of convolutional layers (which has a different depth in different architectures): the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is the same in all networks.

All hidden layers are equipped with the rectification (ReLU) non-linearity. It is also noted that none of the networks (except for one) contain Local Response Normalisation (LRN); such normalization does not improve performance on the ILSVRC dataset, but leads to increased memory consumption and computation time.
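The configuration described above can be inspected directly from Keras; a small sketch (the pretrained weights download on first use):

```python
from tensorflow.keras.applications import VGG16

# Full ImageNet VGG16: 13 conv layers, 5 max-pooling layers,
# two 4096-unit FC layers and a 1000-way softmax, as described above.
model = VGG16(weights="imagenet", include_top=True)
model.summary()   # prints each layer with its output shape and parameter count
```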



 

FACE RECOGNITION WITH VGG


Tools Used – Jupyter Notebook, ImageNet dataset, Haar Cascade frontal-face classifier, TensorFlow, Keras, OpenCV



·               COLLECTION OF DATA – We run a loop that clicks 100 pictures through the camera and collects the dataset for us. The Haar Cascade model is used to detect the face.
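A minimal sketch of this collection loop, assuming the default webcam and a hypothetical dataset/train/person1/ folder for the face crops:

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)        # default webcam
count = 0

while count < 100:               # collect 100 face crops
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
        count += 1
        cv2.imwrite(f"dataset/train/person1/{count}.jpg", face)  # hypothetical folder layout
        break                    # keep at most one face per frame

cap.release()
```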

 

·               LOADING VGG16 MODEL – The VGG16 model is imported from the keras.applications library. We can now use the model in our program.
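A sketch of the import, using the tf.keras packaging of keras.applications; include_top=False drops the original 1000-class head so we can attach our own:

```python
from tensorflow.keras.applications import VGG16

# Convolutional base only; the 224x224x3 input size matches the collected face crops.
vgg_base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
```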


 

·               FREEZING PRETRAINED LAYERS – We freeze the pretrained layers of VGG so that they keep their ImageNet weights during our training. The layers can be frozen using a for loop.
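The loop is a single line; a sketch, assuming the vgg_base object loaded above:

```python
# Keep the pretrained ImageNet weights fixed during our training.
for layer in vgg_base.layers:
    layer.trainable = False
```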


 

·               ADDING DENSE LAYERS – We add 3 Dense layers to our model: the ReLU activation function for the hidden layers and the softmax activation function for the output layer.
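A sketch of such a head; num_classes and the layer widths are placeholders, not the exact values used here:

```python
from tensorflow.keras.layers import Dense, Flatten

num_classes = 2   # hypothetical: number of people in the dataset

head = [
    Flatten(),                                 # flatten the VGG feature maps
    Dense(512, activation="relu"),             # hidden layers use ReLU
    Dense(256, activation="relu"),
    Dense(num_classes, activation="softmax"),  # output layer uses softmax
]
```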


 

·               MODEL AND LAYERS – We import the Sequential model and the required layers to assemble our final model.
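A sketch of the assembly, stacking the frozen base and the new head with Sequential (names as assumed in the sketches above):

```python
from tensorflow.keras.models import Sequential

# Final model: frozen VGG16 base followed by the new dense head.
model = Sequential([vgg_base, *head])
model.summary()
```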

 


·               LOADING THE DATASET – We make two partitions of the data – TRAIN and VALIDATION (TEST) – and then load the TRAIN data for model training.
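A sketch using the Keras ImageDataGenerator, assuming the face crops were saved under dataset/train/ and dataset/validation/ with one sub-folder per person:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "dataset/train", target_size=(224, 224), batch_size=32, class_mode="categorical")

val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "dataset/validation", target_size=(224, 224), batch_size=32, class_mode="categorical")
```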

 

·               TRAINING AND TESTING THE MODEL – The model is trained on the TRAIN dataset, and after training we test it on the VALIDATION data. Here the model accuracy is found to be more than 90%.
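A training sketch; the optimizer and epoch count are assumptions, not the exact settings used here:

```python
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Train on the TRAIN split and check accuracy on the VALIDATION (TEST) split.
model.fit(train_gen, epochs=5, validation_data=val_gen)

loss, accuracy = model.evaluate(val_gen)
print(f"validation accuracy: {accuracy:.2%}")
```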

MODEL SUCCESS.



