Kernel Density Estimation (KDE) for estimating probability distribution function

\(\)

There are several approaches for estimating the probability distribution function of a given data:

1)Parametric
2)Semi-parametric
3)Non-parametric

A parametric one is GMM via algorithm such as expectation maximization. Here is my other post for expectation maximization.

Example of Non-parametric is the histogram, where data are assigned to only one bin and depending on the number bins that fall within an interval the height of histogram will be determined.

Kernel Density Estimation (KDE) is an example of a non-parametric method for estimating the probability distribution function. It is very similar to histogram but we don’t assign each data to only to a bin. In KDE we use a kernel function which weights data point, depending on how far are they from the point \( x \).

\begin{equation}
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^n k\bigg(\frac{ x-x_i }{h}\bigg)
\end{equation}
where \( h \) is a bandwidth parameter and \( k \) is the kernel function. One choice for kernel function is the Gaussian (normal distribution) but there are other kernel functions (uniform, triangular, biweight, triweight, Epanechnikov) that can be used as well. Choosing too small or too bog values for bandwidth might overfit or under fit our estimation. A rule of thumb for choosing bandwidth is Silverman rule.

import numpy as np
from scipy import stats
import matplotlib.pylab as plt


n_basesample = 1000
np.random.seed(234234)
xn = np.random.randn(n_basesample)
gkde=stats.gaussian_kde(xn)

ind = np.linspace(-7,7,101)
kdepdf = gkde.evaluate(ind)

plt.figure()
# plot histgram of sample
plt.hist(xn, bins=20, normed=1)
# plot data generating density
plt.plot(ind, stats.norm.pdf(ind), color="r", label='Original Function')
# plot estimated density
plt.plot(ind, kdepdf, label='kde estimation', color="g")
plt.title('Kernel Density Estimation')
plt.legend()
plt.show()

import numpy as np

from scipy import stats

import matplotlib.pylab as plt

n_basesample = 1000

np.random.seed(234234)

xn = np.random.randn(n_basesample)

gkde=stats.gaussian_kde(xn)

ind = np.linspace(-7,7,101)

kdepdf = gkde.evaluate(ind)

plt.figure()

# plot histgram of sample

plt.hist(xn, bins=20, normed=1)

# plot data generating density

plt.plot(ind, stats.norm.pdf(ind), color="r", label='Original Function')

# plot estimated density

plt.plot(ind, kdepdf, label='kde estimation', color="g")

plt.title('Kernel Density Estimation')

plt.legend()

plt.show()

0 0 votes

Article Rating

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.