Face and eyes detection with Viola-Jones along with Python code

Sanpreet Singh
10 min read · Mar 10, 2020
Face and eyes detection with Viola-Jones

In this tutorial, face as well as eye detection with Viola-Jones will be implemented in Python. Light will be thrown on haar features, the integral image, AdaBoost training, and cascading. OpenCV, together with the haar cascade XML files for the face and eyes, will be used to locate the coordinates of both and to draw rectangles around them. There will be some discussion of how Viola and Jones trained the algorithm. Limitations of the algorithm will also be discussed, and other options will be presented to overcome them; in short, light will be thrown on deep learning object detection models. I hope readers will gain some useful knowledge while reading this blog and that their concepts in computer vision will get a supporting hand.

Who discovered this algorithm?

This algorithm was developed by two people, Paul Viola and Michael Jones, in 2001. Even though it was published 19 years ago and deep learning has since made huge progress in object detection, it still holds a special place in detecting faces, eyes, and other haar features. If one has to start learning about detecting objects, I recommend learning Viola-Jones first before diving into powerful deep learning object detection models. The algorithm is composed of two parts: training and detection. It not only detects faces in images but is applicable to videos as well. Its limitation is that it works only with frontal faces, which is overcome by deep learning models such as SSD, YOLO, and Faster R-CNN. Please refer to this link to see these deep learning models. I hope readers have gained some crucial points from this part. Let us summarize it using the points below.

Key points from this heading

  1. Developed by Paul Viola and Michael Jones.
  2. Developed in 2001 and still popular today.
  3. Works for both images and videos.
  4. The algorithm is composed of two parts: training and detection.
  5. It has been succeeded by powerful deep learning object detection models.

I also suggest readers read the following hot topics in Artificial Intelligence before proceeding further

Counting Number of parameters in feed-forward deep neural network | keras

Simple way to save and load models in PyTorch

Understanding NumPy arrays in simple way

Save and Restore Tensorflow Model

Haar features and Integral Image

The word "haar" comes from Alfréd Haar, a Hungarian mathematician who developed the Haar wavelet. Let us see the haar features in the diagram below and how they are related to the Viola-Jones algorithm.

haar-like features

These haar-like features were used by Viola and Jones with respect to faces in an image. Let us understand them in a general sense first and then apply them to an image.

General understanding of edge and line haar like features

Suppose one is looking at the floor in a room. Comparing the middle of the floor with its edges, there will be a difference in color: the floor is brighter in the middle and darker toward the corners. The same is true while one is looking at a table; its edges have a different color from the rest of the table. The same holds for an image. Now let us continue this concept with respect to faces in images.

Applying haar like features to face in the images

If one looks at the lips, there is a change of color between the upper and lower lip, so a line feature can be identified there. If one looks carefully at an eyebrow, there is a contrast between the forehead and the eyebrow, and hence an edge feature can be identified there. Similarly, one can locate line features in the eyes and edge features along the nose.

How these features help viola jones in detecting different parts of the face

From the above discussion, I hope haar-like features are clear to readers. Now the question is how these features helped Viola and Jones in detecting different parts of the face. Let us start with the nose. Looking at a nose in an image, one side will be brighter while the other side will be darker, which implies an edge feature is present there. In computer vision terms, the left and right sides of the nose are represented by pixel values. The difference between the average pixel values on the left and right sides is compared against a threshold for the detection of the nose in the face and in the whole image. This helps in the training phase, where the nose is detected by the algorithm using that threshold. The same holds for eyes, lips, jaws, and other facial features. We will see further that during training, Viola and Jones took both face and non-face images. The threshold helps to differentiate between face and non-face during training, and herein lies the significance of haar-like features.

What is the relationship between the integral image and the calculations involved in the threshold?

I hope readers are with me and understand everything with ease. If you are facing difficulty with the concepts, take a deep breath and start reading again; you may also step away for a while and come back fresh-minded. As the previous heading described, to identify any facial part, a haar-like feature is first chosen and calculations on pixels are done to identify it as well as its position in the image. To decrease the number of computations involved in calculating these features and thresholds, an integral image is used. I am not going deep into the concept because a lot is to be covered in this tutorial, but interested readers may take the inexpensive course Deep Learning and Computer Vision A-Z™: OpenCV, SSD, and GANs by Hadelin and Kirill. Great teachers, and huge appreciation from my side as well.
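The core trick can still be sketched briefly: an integral image stores, for each position, the sum of all pixels above and to its left, so any rectangle sum takes only four lookups. The 6 × 6 image below is invented purely for illustration:

```python
import numpy as np

# Toy 6x6 "image" of pixel intensities (values 0..35, made up for illustration).
img = np.arange(36, dtype=np.int64).reshape(6, 6)

# Integral image: each cell holds the sum of all pixels above and to the
# left of it (inclusive). Padding with a zero row/column keeps the
# rectangle-sum formula uniform at the borders.
ii = np.zeros((7, 7), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(y, x, h, w):
    """Sum of the h x w rectangle with top-left corner (y, x),
    computed from only four lookups in the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# A two-rectangle edge feature: left half minus right half of the window.
feature = rect_sum(0, 0, 6, 3) - rect_sum(0, 3, 6, 3)
print(feature)  # -54, the same as img[:, :3].sum() - img[:, 3:].sum()
```

However large the rectangles become, the cost of each feature stays constant, which is why the integral image makes evaluating thousands of haar-like features per window affordable.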

Training Viola-Jones Algorithm

The algorithm is trained on both face and non-face images. Viola and Jones fed it 4,960 manually labeled face images along with 9,544 non-face images so that it could distinguish between the two. During training, images are scaled down to 24 × 24 pixels, while during detection the features are scaled up instead. There is a reason images are scaled to 24 × 24: a single 24 × 24 window already contains 162,336 possible haar-like features, so evaluating every feature would make training very expensive. To overcome this, AdaBoost comes into play. It selects a small subset of the 162,336 features and multiplies each by a weight. The weights decide how important the features are, and the most important features come before the others. Each selected feature forms a weak classifier, and when these are added together they form a strong classifier; this process is called an ensemble. In this way, training is not expensive and is done within a limited time. Let us look at the equation below to understand this.

F(x) = a1·f1(x) + a2·f2(x) + a3·f3(x) + …

f1(x) is weak classifier 1, and it comes before the others because its weight a1 is larger than theirs.

F(x) is the strong classifier, and combining weak classifiers this way is called an ensemble.
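The equation above can be sketched in a few lines of Python. The feature values, thresholds, and weights here are invented for illustration; each weak classifier votes +1 (face-like) or −1 on a single haar-feature value, and the weighted sum gives the strong decision:

```python
def make_weak(threshold):
    # A weak classifier: +1 if the feature value exceeds its threshold, else -1.
    return lambda feature_value: 1 if feature_value > threshold else -1

weak_classifiers = [make_weak(t) for t in (0.2, 0.5, 0.8)]
alphas = [0.9, 0.6, 0.3]  # weights: the most reliable weak learner counts most

def strong_classify(feature_values):
    # F(x) = a1*f1(x) + a2*f2(x) + a3*f3(x); sign gives the final decision.
    score = sum(a * f(v) for a, f, v in zip(alphas, weak_classifiers, feature_values))
    return 1 if score > 0 else -1  # +1: face, -1: non-face

print(strong_classify([0.9, 0.9, 0.9]))  # all weak learners vote face -> 1
print(strong_classify([0.1, 0.1, 0.1]))  # all vote non-face -> -1
```

No single weak learner is accurate on its own, but the weighted combination is; that is the essence of the AdaBoost ensemble.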

Cascading

Cascading speeds up detection. In the previous heading, we studied that a strong classifier is made from weak classifiers arranged according to their importance, which implies that the most important features are placed ahead of the others. In cascading, if the most important feature is not present in a sub-window, the sub-window is rejected immediately. If that feature is present, the second most important feature is checked; if it is present too, the third is checked, and so on, otherwise the sub-window is rejected. This process is called cascading.
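The early-exit logic above can be sketched as follows. The stage functions here are hypothetical stand-ins for real haar-feature tests; the point is only that a sub-window stops being processed at the first failing stage:

```python
def run_cascade(window, stages):
    # Check stages in order of importance; reject at the first failure.
    for i, stage in enumerate(stages):
        if not stage(window):
            return False, i  # rejected at stage i; later stages never run
    return True, len(stages)  # survived every stage: candidate face

# Hypothetical stages, e.g. "dark eye band", "bright nose bridge", "mouth line".
stages = [
    lambda w: w.get("eye_band", False),
    lambda w: w.get("nose_bridge", False),
    lambda w: w.get("mouth_line", False),
]

print(run_cascade({"eye_band": False}, stages))   # (False, 0): one cheap check
print(run_cascade({"eye_band": True,
                   "nose_bridge": True,
                   "mouth_line": True}, stages))  # (True, 3): all stages ran
```

Because the vast majority of sub-windows in a real image contain no face, most of them fail the very first stage, which is where the speed-up comes from.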

Implementation using python and OpenCV

I recommend readers set up a virtual environment for this session. All packages will be installed inside it, which helps keep the system safe: if anything goes wrong, it stays contained within the virtual environment.

To create a virtual environment on Ubuntu, first install virtualenv using the below command

sudo apt-get install virtualenv

Create a new virtual environment

virtualenv --python python3 viola_jones_env

Activate the virtualenv using the below command

source viola_jones_env/bin/activate

Now install OpenCV (imported in Python as cv2), as the whole implementation is done with its help. To install it, please use the below command

pip install opencv-python

To install cv2 on Windows, please follow the below link

Installing OpenCV in python 2.7 in windows with proper documentation

Let us dive into the coding part to detect the face and eyes using the Viola-Jones algorithm

Note: Haar cascade XML files can be downloaded from this link. I have used the XML files for the face and eyes only; one can choose others as per requirement.

Link for haarcascade_frontalface_default.xml is face.xml

Link for haarcascade_eye.xml is eye.xml

importing cv2 and haar XML files

Explanation of above code snippet

Line 1: cv2 is imported.
Line 2: Loading the cascade for the face using the cv2 function CascadeClassifier.
Line 3: Loading the cascade for the eyes using the cv2 function CascadeClassifier.

detect function

Explanation of above code snippet

Line 4: A function is defined to detect the face from an image or video. It has two arguments: a grayscale image and a colored image (frame). Remember, Viola-Jones works on grayscale images: the coordinates of the rectangle are calculated on the grayscale image and drawn on the colored one. In this way, detection is shown on the colored image. The same procedure applies to videos, as there detection is done frame by frame; a video is composed of frames of images.

Line 5: If you remember, we made the variable face_cascade at line 2. It uses the function detectMultiScale to detect the upper left corner (x, y) of a rectangle on the face, as well as the rectangle's width and height. detectMultiScale has parameters such as the scale factor and the minimum number of neighbors.

Lines 6 and 7: The previous line gives the coordinates of the rectangle. Now it is time to draw the rectangle with these coordinates on the colored image (frame). For this, we use cv2.rectangle, where we can specify the color of the rectangle as well as the width of the bounding box. This is executed in a loop, once per detected face.

Line 8: After the rectangle is drawn on the image, the frame is returned by the function.

detecting face and eyes function

Explanation of above code snippet

Line 9: Up to line 8, the coordinates of the rectangle for the face are calculated. In this line, that area is sliced from the grayscale image.

Line 10: Face area for the colored image is sliced.

Line 11: Using the eye_cascade.detectMultiScale function, the coordinates of the eyes are calculated.

Lines 12 and 13: A loop is applied and a rectangle is drawn over each eye inside the face.

With this, the function is complete; now let us provide a frame and grayscale image to it to detect the face and eyes.

creating object to read video

Explanation of above code snippet

Line 14: A VideoCapture object is created to read video either from the camera (0) or from a video file. The argument of cv2.VideoCapture is either the device index or the name of a video file.

Reading video in a loop

Explanation of above code snippet

Line 15: A while loop is applied which runs until the break condition is met.

Line 16: Frames are read one by one from the video_capture object. Remember, a video is composed of frames of images.

Line 17: As mentioned above, the Viola-Jones algorithm is applied to grayscale images and the results are shown on colored images. Hence, frames (colored images) from the webcam are converted to grayscale at this line. Now the detect function (refer to line 4) will get both the frame and the grayscale image.

breaking the while loop

Explanation of above code snippet

Line 18: Both the grayscale image and the frame are passed to the detect function one by one, and the detection is returned and shown using the cv2.imshow function. Since we are looping under a while condition, an if condition is applied to stop the prediction: if the q key is pressed, the video window closes. Without this condition, the webcam prediction would never stop, which we do not want.

Final Note

Take a deep breath. Congratulations, you have read the whole article: give yourself a round of applause. To sum up, in this short article readers have learned about the Viola-Jones algorithm and how it is trained. Strong classifiers, weak classifiers, the ensemble method, and cascading were also discussed. Then came the best part of this tutorial, implementing the Viola-Jones face detection algorithm using Python and cv2, with each line explained. I hope this has made your day. I also urge readers to build a strong hand in probability and statistics, as the whole of machine learning and computer vision stands on probability, which is its backbone. To make things simpler, I have made a small video reflecting practical applications of random variables, probability, and statistics, whose link I am giving below. Spare some time to watch it.

Originally published at http://ersanpreet.wordpress.com on March 10, 2020.
