machine learning

Hierarchical Clustering in Python

Hierarchical clustering is a clustering method that builds a hierarchy of clusters. It can be agglomerative or divisive. Agglomerative: at the first step, every item is its own cluster; the closest clusters are then merged, step by step, into bigger clusters until all the data is in one cluster (bottom-up). The complexity is \( O(n^2 \log n) \). Divisive: At the beginning, […]
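As a rough illustration of the bottom-up procedure described above, here is a minimal sketch in plain Python. It is not the code from the post: the function names `single_linkage` and `agglomerative` are hypothetical, single linkage is an assumed choice of cluster distance, and the naive pair search is slower than the \( O(n^2 \log n) \) variant.

```python
import math
from itertools import combinations

def single_linkage(cluster_a, cluster_b):
    """Cluster distance = smallest pairwise point distance (assumed linkage)."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(points, num_clusters=1):
    """Merge the two closest clusters until num_clusters remain.

    Returns the final clusters and the list of merges in order.
    """
    clusters = [[tuple(p)] for p in points]  # every item starts as its own cluster
    history = []
    while len(clusters) > num_clusters:
        # Naive search over all cluster pairs for the closest one.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2)]
clusters, history = agglomerative(points, num_clusters=2)
print(clusters)
```

Stopping at `num_clusters=1` reproduces the full hierarchy; the `history` list then plays the role of a flat dendrogram.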

Naive Bayes Classifier Example with Python Code

In the example below, I implement a “Naive Bayes classifier” in Python and then solve the same problem again with the “sklearn” package.
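The excerpt does not include the code itself, so the following is only a minimal sketch of what a from-scratch Naive Bayes classifier can look like. It assumes the Gaussian variant (the post may use a different one), and the names `fit_gaussian_nb` and `predict` as well as the toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        # Naive assumption: features are independent given the class.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)

# Toy data (made up for illustration): two features, two classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.0]))
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied together.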

Density-Based Spatial Clustering (DBSCAN) with Python Code

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a data clustering algorithm. It is density-based because it finds a number of clusters starting from the estimated density distribution of the corresponding nodes. It starts with an arbitrary point that has not been visited. This point’s epsilon-neighborhood is retrieved, and if it […]
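The visiting and neighborhood-expansion steps described above can be sketched in plain Python as follows. This is a minimal illustration rather than the post's code; the names `region_query`, `dbscan`, `eps`, and `min_pts` are assumptions, and the brute-force neighborhood search is only suitable for small datasets.

```python
import math

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points grow the cluster
                queue.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike k-means, the number of clusters is not an input here; it follows from `eps` and `min_pts`, and isolated points are reported as noise.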

Kernel Density Estimation (KDE) for estimating probability distribution function

There are several approaches for estimating the probability distribution function of a given dataset: 1) parametric, 2) semi-parametric, 3) non-parametric. A parametric example is a GMM fitted via an algorithm such as expectation maximization. See my other post on expectation maximization. An example of a non-parametric approach is the histogram, where each data point is assigned to exactly one bin and depending on the number bins that fall within […]
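For comparison with the histogram, a kernel density estimate places a smooth kernel on every sample: \( \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \). The sketch below assumes a Gaussian kernel and a hand-picked bandwidth \( h \); the function name `gaussian_kde` and the sample values are hypothetical and not taken from the post.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates the Gaussian KDE of 1-D samples at x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centered on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Made-up 1-D data with two groups of values.
samples = [1.0, 1.5, 2.0, 4.0, 4.2]
density = gaussian_kde(samples, bandwidth=0.5)
for x in (1.5, 3.0, 4.1):
    print(x, round(density(x), 4))
```

A smaller bandwidth follows the samples more closely and gives a spikier estimate, while a larger one smooths the two groups into a single bump.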

Silhouette coefficient for finding optimal number of clusters

The silhouette coefficient is another method for determining the optimal number of clusters; I introduced the C-index in an earlier post. The silhouette coefficient of a data point measures how well it is assigned to its own cluster and how far it is from other clusters. A silhouette close to 1 means the data points are in an appropriate cluster and a silhouette […]
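To make the definition concrete: for a point with mean intra-cluster distance \( a \) and smallest mean distance \( b \) to any other cluster, the silhouette is \( s = (b - a) / \max(a, b) \). The following is a minimal sketch, not the post's code; the name `silhouette_score` is chosen here for illustration, and scoring singleton clusters as 0 is an assumed convention.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette_score(points, [0, 0, 0, 1, 1, 1]))  # well-separated labeling
print(silhouette_score(points, [0, 1, 0, 1, 0, 1]))  # mixed-up labeling
```

To pick the number of clusters, one would compute this score for each candidate clustering and keep the one with the highest mean silhouette.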

Finding optimal number of Clusters by using Cluster validation

This module finds the optimal number of components (number of clusters) for a given dataset. To find it, we first ran the k-means algorithm with different numbers of clusters, from 1 up to a fixed maximum. Then we checked the cluster validity using the C-index algorithm and […]
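The scan over cluster counts can be sketched as below. This is an illustration under stated assumptions, not the module from the post: `kmeans` and `c_index` are hypothetical names, the k-means initialization is a deterministic farthest-point heuristic chosen to keep the example reproducible, and the C-index is taken as \( (S - S_{min}) / (S_{max} - S_{min}) \), where \( S \) is the sum of within-cluster pairwise distances and lower values are better. The scan starts at 2 because this formula is undefined when all points share one cluster.

```python
import math
from itertools import combinations

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def c_index(points, labels):
    """C-index of a clustering; lower is better."""
    pairs = list(combinations(range(len(points)), 2))
    dists = sorted(math.dist(points[i], points[j]) for i, j in pairs)
    within = [
        math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]
    ]
    n_w = len(within)
    if n_w == 0 or n_w == len(dists):
        raise ValueError("C-index is undefined for this clustering")
    s_min = sum(dists[:n_w])   # the n_w smallest pairwise distances
    s_max = sum(dists[-n_w:])  # the n_w largest pairwise distances
    return (sum(within) - s_min) / (s_max - s_min)

# Made-up data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]
scores = {k: c_index(points, kmeans(points, k)) for k in range(2, 5)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

On real data, k-means is usually restarted from several random initializations for each `k`, and the best run is the one passed to the validity index.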
