
Multi-scale face detector using HOG features and a support vector machine

In this part, I trained an SVM on images labeled “face” or “not face” (36 × 36 pixels), using HOG features. I used the VLFeat library for both the HOG features and the SVM.

Example of face images:

Example of nonface images:

I divided the dataset into a training set and a validation set (80% and 20%, respectively) and computed the HOG features for all training and validation images. To improve performance, the images in the training set can also be flipped, which doubles the number of training items.

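Then I trained an SVM on the features from the training set. After finding the weights and bias, I evaluated the classifier on the training and validation sets. These are the results (the four rates are reported as fractions of all samples, so they sum to 1 and the accuracy equals the true positive rate plus the true negative rate):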

Classifier performance on train data:

accuracy: 0.990
true positive rate: 0.493
false positive rate: 0.003
true negative rate: 0.497
false negative rate: 0.007

Classifier performance on validation data:

accuracy: 0.987
true positive rate: 0.490
false positive rate: 0.003
true negative rate: 0.497
false negative rate: 0.010
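For reference, here is a minimal sketch of how numbers of this kind could be computed from the learned weights and bias. It assumes labels of +1 for “face” and -1 for “not face”, a decision threshold of zero, and rates expressed as fractions of all samples; the function names are hypothetical and this is not the original evaluation script.

```python
def linear_svm_score(weights, bias, features):
    """Decision value of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def confusion_fractions(scores, labels):
    """Return accuracy and the four outcomes as fractions of all samples.

    labels are +1 (face) or -1 (not face); a positive score predicts "face".
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted_face = score > 0
        if predicted_face:
            counts["tp" if label > 0 else "fp"] += 1
        else:
            counts["fn" if label > 0 else "tn"] += 1
    n = len(labels)
    result = {name: count / n for name, count in counts.items()}
    result["accuracy"] = result["tp"] + result["tn"]
    return result
```

With this convention the four fractions always add up to 1, which matches the tables above.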



Adding non-maximum suppression

“Non-maximum suppression” means that if two bounding boxes overlap and the overlap is higher than a given threshold, we keep only the one with the more confident score.
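A greedy version of this idea can be sketched as follows. I am assuming here that overlap is measured as intersection over union (IoU) and that boxes are given as (x1, y1, x2, y2); the 0.3 threshold is only an example value, and the code is an illustration rather than the script used for the results in this post.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, overlap_threshold=0.3):
    """Return the indices of the boxes to keep, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # more confident box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= overlap_threshold for j in kept):
            kept.append(i)
    return kept
```

For example, two nearly identical 36×36 boxes around the same face collapse to the higher-scoring one, while a box elsewhere in the image is left alone.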

The results before and after this step can be seen in the following samples:

Before non-maximum suppression.

After non-maximum suppression.

Before non-maximum suppression.

After non-maximum suppression.

Multi-scale face detection

Our face detector was trained on faces of size 36×36, so it can give poor results when a face in the image is much larger than 36×36. To handle this, we resize the image to several scales and apply the face detector at each one.
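The bookkeeping behind this can be sketched as below: pick a series of shrinking scale factors, run the 36×36 detector on each resized image, and map every detection back to the coordinates of the original image. The scale step of 0.8 and the function names are assumptions for illustration, and the actual image resizing is left out.

```python
def pyramid_scales(image_width, image_height, window=36, scale_step=0.8):
    """Scale factors, from 1.0 downward, at which a window still fits."""
    scales = []
    scale = 1.0
    while min(image_width, image_height) * scale >= window:
        scales.append(scale)
        scale *= scale_step
    return scales


def box_in_original(x, y, scale, window=36):
    """Map a window found at (x, y) in the resized image back to
    (x1, y1, x2, y2) in the original image."""
    return (x / scale, y / scale, (x + window) / scale, (y + window) / scale)
```

A window found at scale 0.5 therefore covers a 72×72 region of the original image, which is how faces larger than 36×36 get detected.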

With multi-scale detection, the results improved:


References for materials and scripts:

[1] http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/

[2] http://www.cc.gatech.edu/~hays/compvision/proj5/


Human detection on a mobile camera using HOG, and tracking with a Kalman filter

This is part I of the work I did for my master's thesis (part II). First, I computed HOG (histogram of oriented gradients) features on my images and then passed the resulting feature vector to a linear SVM (support vector machine). The SVM was trained on human and non-human images. The output of the classifier was a bounding box whenever a human was present in the image.

Feature extraction and object detection with HOG: tiling the detection window with an overlapping grid of HOG descriptors and then using an SVM-based window classifier gives the human detection chain. Image acquired from [1].
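The window classifier is applied at many positions in the image. A minimal sketch of that scan is shown below; it assumes the 64×128 detection window used by Dalal and Triggs and a stride of 8 pixels, and `score_window` is a hypothetical callback standing in for "compute HOG for this window and evaluate the linear SVM". The window size and stride in my own experiments may have differed.

```python
def sliding_windows(image_width, image_height,
                    window_w=64, window_h=128, stride=8):
    """Yield the top-left corner of every window that fits in the image."""
    for y in range(0, image_height - window_h + 1, stride):
        for x in range(0, image_width - window_w + 1, stride):
            yield x, y


def detect(image_width, image_height, score_window,
           window_w=64, window_h=128, stride=8, threshold=0.0):
    """Return (x1, y1, x2, y2, score) for windows scoring above threshold.

    score_window(x, y) is a placeholder for the HOG + SVM evaluation.
    """
    detections = []
    for x, y in sliding_windows(image_width, image_height,
                                window_w, window_h, stride):
        score = score_window(x, y)
        if score > threshold:
            detections.append((x, y, x + window_w, y + window_h, score))
    return detections
```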

Overview of HOG: the detector window is tiled with a grid of overlapping blocks, and each block contains a grid of spatial cells. For each cell, the weighted votes of the image gradients are accumulated into an orientation histogram. These histograms are locally normalized and collected into one big feature vector. Images acquired from [2].
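To make the per-cell step concrete, here is a deliberately simplified sketch of one cell's histogram. It assumes 9 unsigned orientation bins over 0 to 180 degrees, centered differences for the gradient, hard binning without interpolation, and a plain L2 normalization; real implementations (including the one in [1]) add vote interpolation and normalize over overlapping blocks, so treat this as an illustration only.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    cell is a list of pixel rows; border pixels are skipped because the
    centered difference needs a neighbor on both sides.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def l2_normalize(vector, eps=1e-6):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vector) + eps * eps)
    return [v / norm for v in vector]
```

A cell whose brightness increases from left to right puts all of its votes into the first bin, while a top-to-bottom ramp votes for the bin around 90 degrees.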

Next, I used a Kalman filter to track the detected human. To check the accuracy of my work, I created a ground truth using a color tracker. You can read about and download a similar one on my website here.
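As an illustration of the tracking step, the sketch below implements a small constant-velocity Kalman filter for one coordinate; two such filters (one for x, one for y) could follow the center of a detected bounding box. The constant-velocity motion model, the noise values, and the class name are my assumptions for this example and are not necessarily the settings used in the thesis.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position measurements."""

    def __init__(self, position, velocity=0.0, dt=1.0,
                 process_var=1e-2, measurement_var=1.0):
        self.p, self.v = position, velocity
        self.dt, self.q, self.r = dt, process_var, measurement_var
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance

    def predict(self):
        """Propagate state and covariance one time step: P = F P F^T + Q."""
        dt = self.dt
        (a, b), (c, d) = self.P
        self.p += dt * self.v
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def update(self, measured_position):
        """Correct the prediction with a new detection."""
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain for position, velocity
        residual = measured_position - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # P = (I - K H) P with H = [1, 0]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.p
```

In each frame one would call `predict()` first and then `update()` with the detector's position. When the detector misses the person in a frame, the prediction alone can be used as the estimate.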


The bounding box shows the Kalman filter prediction, the labels 1 and 2 mark the humans detected by HOG, and the letters R and Y mark the player locations found by the color tracker.

All text and images in this article are taken from my master's thesis or the respective publications. The full document can be downloaded here.

[1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 886-893, June 2005. doi: 10.1109/CVPR.2005.177.

[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection, 2005.