# Kernel Density Estimation (KDE) for estimating a probability density function



There are several approaches for estimating the probability density function of a given data set:

1) Parametric
2) Semi-parametric
3) Non-parametric

A parametric example is a Gaussian mixture model (GMM), fitted with an algorithm such as expectation maximization (EM). Here is my other post on expectation maximization.

An example of a non-parametric method is the histogram: each data point is assigned to exactly one bin, and the height of each bar is determined by the number of data points that fall within that bin's interval.
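To make the binning concrete, here is a minimal sketch of a histogram density estimate in plain Python. The function name `histogram_density`, the equal-width bins, and the toy data are my own assumptions for illustration. The sketch also assumes the data values are not all identical, so the bin width is non-zero.

```python
def histogram_density(data, num_bins):
    """Equal-width histogram, normalized so the bars integrate to 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for x in data:
        # Each point goes to exactly one bin; the maximum falls in the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(data)
    # Dividing by n * width turns raw counts into density values.
    heights = [c / (n * width) for c in counts]
    return lo, width, heights


# Toy data, chosen only for illustration.
data = [1.0, 1.2, 1.9, 2.1, 2.4, 2.5, 3.7, 4.0]
lo, width, heights = histogram_density(data, 3)
print(heights)
```

Because every point contributes to one bin only, the estimate jumps at the bin edges and depends on where those edges are placed.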

Kernel Density Estimation (KDE) is another non-parametric method for estimating the probability density function. It is very similar to a histogram, but instead of assigning each data point to a single bin, KDE uses a kernel function that weights each data point $$x_i$$ according to how far it is from the query point $$x$$:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^n k\bigg(\frac{x-x_i}{h}\bigg)$$

where $$n$$ is the number of data points, $$h$$ is the bandwidth parameter and $$k$$ is the kernel function. One common choice of kernel is the Gaussian (normal density), but other kernels (uniform, triangular, biweight, triweight, Epanechnikov) can be used as well. The bandwidth controls the smoothness of the estimate: a value that is too small overfits the data (a spiky estimate), while a value that is too big underfits it (an oversmoothed estimate). A rule of thumb for choosing the bandwidth is Silverman's rule.
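The estimator can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, the sample is synthetic, and the bandwidth uses one common form of Silverman's rule of thumb, $$h = 0.9 \min(\hat{\sigma}, \mathrm{IQR}/1.34)\, n^{-1/5}$$, which is my assumption since the post does not say which variant is meant.

```python
import math
import random
import statistics


def gaussian_kernel(u):
    """Standard normal density, used as the kernel k."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sigma = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sigma, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def kde(x, data, h):
    """f_hat(x) = 1 / (n * h) * sum_i k((x - x_i) / h)."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)


# Synthetic sample from a standard normal, for illustration only.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]

h = silverman_bandwidth(sample)
density_at_zero = kde(0.0, sample, h)
print(h, density_at_zero)
```

Rerunning `kde` with `h` divided or multiplied by, say, 10 is a quick way to see the spiky and oversmoothed extremes.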

# Expectation Maximization algorithm to obtain Gaussian mixture models for ROS

I found some really good code on GitHub for fitting a Gaussian Mixture Model (GMM) with Expectation Maximization (EM) for ROS. There are many parameters that you can change. Some of the most important ones are:

To find the optimal number of components, it uses the Bayesian information criterion (BIC). Other criteria for choosing the number of components include minimum description length (MDL), the Akaike information criterion (AIC) and minimum message length (MML).
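As a rough illustration of BIC-based selection, the sketch below fits one-dimensional mixtures with 1 to 4 components using a small hand-written EM loop and scores each with $$\mathrm{BIC} = p \ln n - 2 \ln \hat{L}$$, where $$p$$ is the number of free parameters. This is not the ROS node's implementation. The function names, the quantile-based initialization, the variance floor, and the synthetic data are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_1d(data, k, iterations=100):
    """Fit a k-component 1D GMM with EM and return the final log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    # Initialize the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(p), 1e-300)  # guard against underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid collapsed components
    return sum(
        math.log(max(mixture_pdf(x, weights, means, variances), 1e-300)) for x in data
    )


def bic(log_likelihood, num_params, n):
    """Lower is better: the penalty grows with the number of free parameters."""
    return num_params * math.log(n) - 2.0 * log_likelihood


# Synthetic data from two well-separated Gaussians, for illustration only.
random.seed(1)
data = [random.gauss(-3.0, 1.0) for _ in range(150)]
data += [random.gauss(3.0, 1.0) for _ in range(150)]

bic_by_k = {}
for k in range(1, 5):
    log_lik = fit_gmm_1d(data, k)
    # A 1D mixture has (k - 1) weights + k means + k variances = 3k - 1 parameters.
    bic_by_k[k] = bic(log_lik, 3 * k - 1, len(data))

best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k, best_k)
```

AIC follows the same pattern with the penalty $$2p$$ in place of $$p \ln n$$, so it tends to punish extra components less as the data set grows.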

Here is my code for generating two Gaussians and sending them to this node:

and you need to put them in to send them to the node:

and the results are what we expect:

It also makes it possible to visualize the data in RVIZ, but first, you have to publish your tf data and set the frame name and topic names correctly in gmm_rviz_converter.h

and add a MarkerArray display in RVIZ with its topic set to "gmm_rviz_converter_output".

References: [1], [2]