# Kernel Density Estimation (KDE) for Estimating a Probability Density Function



There are several approaches to estimating the probability density function of a given data set:

1) Parametric
2) Semi-parametric
3) Non-parametric

An example of a parametric approach is a Gaussian mixture model (GMM), fitted with an algorithm such as expectation maximization. Here is my other post on expectation maximization.

An example of a non-parametric method is the histogram: each data point is assigned to exactly one bin, and the height of each bar is determined by the number of data points that fall within that bin's interval.
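To make the binning idea concrete, here is a minimal sketch of a histogram used as a density estimate. The function name `histogram_density`, the equal-width bins, and the toy data are assumptions made for illustration. It also assumes every data point lies inside `[low, high]`, so that the bar heights integrate to one.

```python
def histogram_density(data, low, high, n_bins):
    """Return (counts, densities) for equal-width bins on [low, high].

    Assumes every value in `data` lies inside [low, high].
    """
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Each point goes to exactly one bin; the right edge joins the last bin.
        idx = min(int((x - low) / width), n_bins - 1)
        counts[idx] += 1
    # Dividing by n * width turns counts into heights that integrate to 1.
    densities = [c / (len(data) * width) for c in counts]
    return counts, densities


data = [0.1, 0.2, 0.25, 0.7, 0.9]  # toy data, chosen only for illustration
counts, densities = histogram_density(data, low=0.0, high=1.0, n_bins=4)
print(counts)     # number of points per bin
print(densities)  # bar heights of the density estimate
```

Note that a point contributes only to its own bin, however close it sits to a bin edge. The estimate therefore depends on where the bin edges are placed as well as on the bin width.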

Kernel Density Estimation (KDE) is another non-parametric method for estimating the probability density function. It is very similar to a histogram, but instead of assigning each data point to a single bin, KDE uses a kernel function that weights each data point $$x_i$$ according to how far it is from the evaluation point $$x$$:

$$
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^n k\bigg(\frac{x - x_i}{h}\bigg)
$$

where $$n$$ is the number of data points, $$h > 0$$ is the bandwidth parameter, and $$k$$ is the kernel function. One common choice of kernel is the Gaussian (the standard normal density), but other kernels (uniform, triangular, biweight, triweight, Epanechnikov) can be used as well. The bandwidth controls the smoothness of the estimate: too small a value overfits the data and gives a spiky estimate, while too large a value underfits and gives an oversmoothed one. A common rule of thumb for choosing the bandwidth is Silverman's rule.
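The estimator can be sketched in a few lines of plain Python. The function names (`gaussian_kernel`, `silverman_bandwidth`, `kde`) and the synthetic samples are hypothetical, chosen only for illustration. The bandwidth uses one widely quoted form of Silverman's rule, $$h = 0.9 \min(\hat{\sigma}, \mathrm{IQR}/1.34)\, n^{-1/5}$$. That form is an assumption here, since other variants of the rule exist.

```python
import math
import random
import statistics


def gaussian_kernel(u):
    """Standard normal density, used as the kernel k."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sigma, IQR / 1.34) * n^(-1/5).

    Assumed variant of Silverman's rule; needs at least two data points.
    """
    n = len(data)
    sigma = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Fall back to sigma alone if the interquartile range is zero.
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return 0.9 * spread * n ** (-1 / 5)


def kde(x, data, h, kernel=gaussian_kernel):
    """Evaluate f_hat(x) = 1/(n h) * sum_i k((x - x_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(200)]  # synthetic data
h = silverman_bandwidth(samples)

for x in (-2.0, 0.0, 2.0):
    print(f"f_hat({x:+.1f}) = {kde(x, samples, h):.3f}")
```

Passing a much smaller or much larger `h` to `kde` than the rule-of-thumb value is a quick way to see the overfitting and underfitting behaviour described above. For real work, a library implementation such as `scipy.stats.gaussian_kde` is likely a better choice than this sketch.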