Tag Archives: machine learning

Finding optimal number of Clusters by using Cluster validation

This module finds the optimal number of components (number of clusters) for a given dataset.
To find the optimal number of components, we first ran the k-means algorithm with different numbers of clusters, from 1 up to a fixed maximum. We then checked cluster validity with the \( C \)-index and selected the number of clusters with the lowest \( C \)-index. This index is defined as follows:

\begin{equation} \label{C-index}
C = \frac{S-S_{min}}{S_{max}-S_{min}}
\end{equation}

where \( S \) is the sum of distances over all pairs of patterns from the same cluster. Let \( l \)
be the number of such pairs. Then \( S_{min} \) is the sum of the \( l \) smallest distances if all
pairs of patterns are considered (i.e. if the patterns can belong to different clusters).
Similarly, \( S_{max} \) is the sum of the \( l \) largest distances over all pairs. Hence a small value
of \( C \) indicates a good clustering. In the following code, I generated 4 clusters, but since two of them are very close, they merge into one and the optimal number of clusters is 3.
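The selection loop can be sketched as follows. This is a minimal NumPy/scikit-learn sketch, not the module's actual code; the blob dataset and the range of cluster counts are illustrative. Note that \( k=1 \) is degenerate for the \( C \)-index (every pair is within-cluster, so the denominator is zero), so the sketch starts at \( k=2 \).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def c_index(X, labels):
    # all pairwise distances and a same-cluster mask
    i, j = np.triu_indices(len(X), k=1)
    d = np.linalg.norm(X[i] - X[j], axis=1)
    same = labels[i] == labels[j]
    S = d[same].sum()            # sum of within-cluster pair distances
    l = same.sum()               # number of within-cluster pairs
    d_sorted = np.sort(d)
    S_min = d_sorted[:l].sum()   # sum of the l smallest distances overall
    S_max = d_sorted[-l:].sum()  # sum of the l largest distances overall
    return (S - S_min) / (S_max - S_min)

# three well-separated synthetic clusters
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# run k-means for each candidate k and keep the k with the lowest C-index
scores = {k: c_index(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
          for k in range(2, 7)}
best_k = min(scores, key=scores.get)
```

For clean data the within-cluster pairs at the true \( k \) are exactly the smallest pairwise distances, so \( S \approx S_{min} \) and the \( C \)-index approaches zero there.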


Car Detection Using Single Shot MultiBox Detector (SSD Convolutional Neural Network) in ROS Using Caffe

This work is similar to the previous work here, but this time I used the Single Shot MultiBox Detector (SSD) for car detection. Installation is similar; clone the SSD Caffe:

Add the following lines to your Makefile.config:

and build it:

Use video_stream_opencv to stream your video:

Download the trained model from here and put it in the model directory.

In my ssd.launch, I have changed my trained network into:

Now run the following to open RViz:

In RViz, go to add a panel and add integrated viewer > ImageViewrPlugin.

Now correct the topic in the added panel and you should see detected cars:

Car Detection Using Fast Region-based Convolutional Networks (R-CNN) in ROS with Caffe

To run this, you need to install Fast-RCNN and Autoware. In case you get an error regarding hdf5 when making Fast-RCNN, add the following lines to your Makefile.config:

Now run the following command to start:

If you get an error like:

That means your graphics card is not ready or accessible. In my case, every time I suspend my notebook I get this error and need to restart. :/

Now you should publish your video stream on the topic “image_raw”; for that purpose I used video_stream_opencv. Here is my launch file:

Now run the following to open RViz:

In RViz, go to add a panel and add integrated viewer > ImageViewrPlugin.

Now correct the topic in the added panel and you should see detected cars:

Octomap Explained

In this tutorial, I explain the concept, the probabilistic sensor fusion model, and the sensor model used in the OctoMap library.

Related publication: OctoMap: An Efficient Probabilistic 3D Mapping Framework Based on Octrees

1) OctoMap Volumetric Model

An octree storing free (shaded white) and occupied (black) cells. Image taken from Ref [1].

2) Probabilistic Sensor Fusion Model

3) Sensor Model for Laser Range Data

Image is taken from Ref [1].


Gaussian Mixture Regression

Gaussian Mixture Regression is basically a multivariate normal distribution combined with a conditional distribution. More about the theory can be found in [1], [2], [3], [4]. For this work, I added Gaussian Mixture Regression functionality to this project on GitHub by forking the main project; my fork can be downloaded here: Github

The main changes are:


References [1], [2], [3], [4]

Expectation Maximization algorithm to obtain Gaussian mixture models for ROS

I found really good code on GitHub for fitting a Gaussian Mixture Model (GMM) with Expectation Maximization (EM) in ROS. There are many parameters you can change. Some of the most important ones are:

To find the optimal number of components, it uses the Bayesian information criterion (BIC). There are other methods for finding the optimal number of components: minimum description length (MDL), Akaike information criterion (AIC), and minimum message length (MML).
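As a sketch of how BIC-based model selection works, here is a minimal example using scikit-learn's GaussianMixture rather than the ROS node; the two-Gaussian dataset and the range of component counts are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# two well-separated 2-D Gaussians
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(6.0, 1.0, size=(200, 2))])

# fit GMMs with 1..5 components and keep the one with the lowest BIC
bic = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 6)]
best_k = int(np.argmin(bic)) + 1
```

BIC penalizes extra components, so it favors the smallest model that still explains the data well.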

Here is my code for generating two Gaussians and sending them to this node:


and you need to pack them into a message to send them to the node:


and the results are what we expect:

It also makes it possible to visualize the data in RViz, but first you have to publish your tf data and set the frame name and topic names correctly in gmm_rviz_converter.h

and add a MarkerArray in RViz and set the topic to “gmm_rviz_converter_output”.


References: [1], [2]


Multi-scale face detector using HOG features and a support vector machine

In this part, I trained an SVM over images of “face” or “not face” (36 × 36 pixels), using HOG features. I used the VLFeat library for both HOG and the SVM.

Example of face images:

Example of nonface images:

I divided the dataset into a training and a test set (80% and 20% respectively) and computed HOG features for all training and validation images. To improve performance, the images in the training set can be flipped horizontally, doubling the number of items in the training set.
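The horizontal-flip augmentation can be sketched like this in NumPy, with random patches standing in for the real 36 × 36 training images:

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((100, 36, 36))   # stand-in for the 36x36 training patches

# mirror each patch left-right and append it to the training set
flipped = faces[:, :, ::-1]
augmented = np.concatenate([faces, flipped])
# the training set is now twice as large; HOG features would be
# computed on all 200 patches before training the SVM
```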

Then I trained an SVM on the features from the training set. After finding the weights and bias, I evaluated the validation set; these are the results:

Classifier performance on train data:

accuracy: 0.990
true positive rate: 0.493
false positive rate: 0.003
true negative rate: 0.497
false negative rate: 0.007

Classifier performance on validation data:

accuracy: 0.987
true positive rate: 0.490
false positive rate: 0.003
true negative rate: 0.497
false negative rate: 0.010



Adding non-maximum suppression

“Non-maximum suppression” means that if two bounding boxes overlap and the overlap is higher than a given threshold, we keep the one with the higher confidence score.
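Greedy non-maximum suppression can be sketched as follows. This is a standard IoU-based version in NumPy, not the exact code used here; the 0.3 threshold and the example boxes are illustrative:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = np.argsort(scores)[::-1]          # indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with all remaining boxes [x1, y1, x2, y2]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # suppress strong overlaps
    return keep

# two heavily overlapping detections plus one separate detection
kept = nms(np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float),
           np.array([0.9, 0.8, 0.7]))
```

The two overlapping boxes collapse to the higher-scoring one, while the separate box survives.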

The before and after results of this step can be seen in the following samples:

Before non-maximum suppression.

After non-maximum suppression.

Before non-maximum suppression.

After non-maximum suppression.

Multiscaling face detection

Our face detector was trained on faces of size 36×36, so we might get poor results if the person’s face in the image is much larger than 36×36. We therefore resize the image to several scales and apply the face detector at each scale.
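The multiscale search can be sketched as an image pyramid: repeatedly shrink the image by a fixed factor and run the 36 × 36 detector at each scale. The 0.8 step and the 480 × 640 image size below are illustrative:

```python
import numpy as np

def pyramid_scales(img_h, img_w, win=36, step=0.8):
    """Scale factors at which the fixed-size detector window still fits the image."""
    scales = []
    s = 1.0
    while min(img_h, img_w) * s >= win:
        scales.append(s)
        s *= step
    return scales

# a detection found at scale s maps back to the original image
# by dividing its coordinates by s
scales = pyramid_scales(480, 640)
```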

And the results improved:


Reference for materials and scripts:

[1] http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/

[2] http://www.cc.gatech.edu/~hays/compvision/proj5/


2D pose estimation of human body using CNS and PCA

This work is the second part of my master thesis (part I). In this part, I developed an algorithm for 2D pose estimation of the human body. To do this, I created software with Qt that can generate 2D contours representing the human body. I then sent these contours for evaluation to CNS (Contrast Normalized Sobel) [1] and finally picked the contour with the highest response.

A 2D contour is a set of points connected to each other. By changing the relative positions of these points, we can generate contours that describe a human, car, tree, or any arbitrary shape in 2D space. In this work, we first need a system that can generate a 2D contour of a human in different poses. After this step, we need a system for evaluating the generated contour on an image to see how close the contour is to the pose of the player in the image.

We considered 44 points on the human body to create a contour. These points mark prominent parts of the human body such as the top of the head, ears, neck, shoulders, elbows, wrists, knees, ankles, and feet. There are several ways to connect these points to each other, e.g. straight lines, polynomials, or different kinds of curve fitting. To make the contour smooth enough to describe the curves and turns of the human body while keeping it computationally inexpensive, we used cubic splines to connect these points.

Now, by changing the location of any of these 44 points, the interpolated points also change and a new contour is formed. But creating contours this way is complicated and computationally expensive: we don’t know how to move these points to generate the contour of a human, and searching through 44 dimensions is computationally expensive.
To generate human contours automatically, we created a training set of several images of a player in different poses. We then manually registered these 44 key points on the body of the player in each individual image. Based on the items in the training set, we can generate contours of a human in different poses in the detection phase.
We also tried to reduce the dimensionality of our problem from 44 to some smaller, meaningful number. For that purpose, we used dimensionality reduction by PCA (principal component analysis).

We put the x and y positions of the points in these contours, plus the interpolated points, as one raw entry in a data matrix. In the next step, we found the principal components of these data by computing the eigenvalues and eigenvectors. After calculating the eigenvalues, we sort them from largest to smallest and pick those that contribute 90% of the variance. In other words, we sum up all the eigenvalues, and then, from the eigenvalues sorted in descending order (largest first), we accumulate eigenvalues until their sum reaches 90% of the total.
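The 90% rule can be sketched directly with NumPy. The synthetic data below (three strong directions, seven weak ones) are illustrative; the thesis applies the same rule to the 352-dimensional contour matrix via OpenCV:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: 3 strong directions, 7 weak ones
X = rng.normal(size=(500, 10)) * np.array([5, 4, 3, .5, .5, .5, .5, .5, .5, .5])

Xc = X - X.mean(axis=0)                   # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)           # covariance matrix
eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first

# smallest k whose cumulative eigenvalue sum reaches 90% of the total
ratio = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(ratio, 0.90)) + 1
```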

To create a training set for the PCA (principal component analysis), we recorded the activity of a player in different poses and manually labeled these 44 key points.

By selecting 44 points on each image, generating 3 interpolated points for each pair of points, and considering the (x, y) coordinates of each point, our data have 44 × 2 × 4 = 352 dimensions. To calculate the PCA we used OpenCV. Taking the 90% threshold into account, we found that the optimal number of dimensions is 4, so we reduce our problem to 4 DOF.


To visualize the contour generated from these 4 parameters, software was designed with four sliders, one for each principal component. By changing the position of a slider, a new value is set for the respective principal component, and by back-projecting with PCA, a new contour is generated and displayed on the image.


In figure (a), the principal component values are set to \( p_\Phi = [143, 0, 0, 0] \), and in figure (b) they are set to \( p_\Phi = [-149, 0, 0, 0] \).

CNS response from optimal contour on image with clutter noise

In this experiment, for each image several contours were generated by brute-forcing all four principal components in PCA space, and the CNS response for each was calculated. The contour with the highest CNS response was selected. It should be remembered that the variance of each component is equal to the corresponding eigenvalue; in other words, the standard deviation of a component is equal to the square root of its eigenvalue. To determine the range of values for each principal component, we first calculate the square root of each eigenvalue; the range of the search space for each principal component is then set to two times the standard deviation. The maximum and minimum values observed for each principal component in the labeled contours also endorse this range and fall within this interval.
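Building the ±2σ search grid from the eigenvalues can be sketched as follows; the eigenvalues and the grid resolution here are hypothetical, not those from the thesis:

```python
import numpy as np

eigvals = np.array([2500.0, 900.0, 400.0, 100.0])  # hypothetical PCA eigenvalues
std = np.sqrt(eigvals)                             # per-component standard deviations

# 9 evenly spaced values per component within [-2*std, +2*std]
grids = [np.linspace(-2.0 * s, 2.0 * s, 9) for s in std]

# brute-force every combination of the four principal components;
# each row is a candidate parameter vector to back-project into a
# contour and score with the CNS response
combos = np.stack(np.meshgrid(*grids, indexing='ij'), axis=-1).reshape(-1, 4)
```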

CNS response for optimal contour on images with clutter background.

CNS response from optimal contour on image with a clear background

CNS response from optimal contour on the image with the clear background.

The CNS images of the experiment with a clear background. Highlighted parts of the image indicate the magnitude of the gradient (the more highlighted, the greater the gradient magnitude).

An estimated contour from the brute force was labeled rejected if at least one of the limbs of the player (arms, hands, legs, or feet) was wrongly estimated; otherwise it was labeled as accepted. Overall, 75% of the estimated contours were labeled as accepted.


All text and images in this article are taken from my master thesis, the full document can be downloaded here.

[1] Judith Müller, Udo Frese, and Thomas Röfer. Grab a mug – object detection and grasp motion planning with the NAO robot. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots (HUMANOIDS 2012), Osaka, Japan, 2012. URL http://www.informatik.uni-bremen.de/agebv2/downloads/published/muellerhumanoids12.pdf

Human detection on mobile camera using HOG and tracking them using Kalman filter

This is part I of the work I did for my master thesis (part II). In this work, I first computed HOG (histogram of oriented gradients) features on my images and then sent the computed histograms to a linear SVM (support vector machine). The SVM was trained on human and non-human images. The output of the classifier was a bounding box if there was any human in the image.

Feature extraction and object detection with HOG: tiling the detection window in an overlapping grid of HOG descriptors and then using an SVM-based window classifier gives the human detection chain. Image acquired from [1].

Overview of HOG: the detector window is tiled with a grid of overlapping blocks, and each block contains a grid of spatial cells. For each cell, the weighted vote of image gradients in an orientation histogram is accumulated. These histograms are locally normalized and collected into one big feature vector. Images acquired from [2].

Next, I used a Kalman filter to track the detected humans. To check the accuracy of my work, I created ground truth based on a color tracker. You can read about and download a similar one on my website here.
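A minimal constant-velocity Kalman filter over a detection's centre can be sketched as follows; all matrices and noise values here are illustrative, not the ones used in the thesis:

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],       # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], float)
H = np.array([[1, 0, 0, 0],        # HOG gives a position measurement only
              [0, 1, 0, 0]], float)
Q = np.eye(4) * 0.01               # process noise
R = np.eye(2) * 1.0                # measurement noise

def kf_step(x, P, z):
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the HOG detection z = [x, y]
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

# track a detection moving at constant velocity (1, 2) per frame
x, P = np.zeros(4), np.eye(4) * 100.0
for t in range(1, 21):
    x, P = kf_step(x, P, np.array([t, 2.0 * t], float))
```

Between detections, the predict step alone keeps the bounding box moving, which is what bridges the frames where HOG misses the person.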


The bounding box shows the Kalman filter prediction, while the digits 1 and 2 indicate human detections by HOG, and the letters R and Y are the locations of the players detected by the color tracker.

All text and images in this article are taken from my master thesis or respective publications, the full document can be downloaded here.

[1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886 –893 vol. 1, June 2005. doi: 10.1109/CVPR.2005.177.

[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection, 2005.