Tag Archives: clustering

Density-Based Spatial Clustering (DBSCAN) with Python Code

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a data clustering algorithm It is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of corresponding nodes.

It starts with an arbitrary starting point that has not been visited.

This point’s epsilon-neighborhood is retrieved, and if it contains sufficiently many points, a cluster is started. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise. DBSCAN requires two parameters: epsilon (eps) and the minimum number of points required to form a cluster (minPts). If a point is found to be part of a cluster, its epsilon-neighborhood is also part of that cluster.

I implemented the pseudo code from DBSCAN wiki page:

Continue reading

Silhouette coefficient for finding optimal number of clusters

Silhouette coefficient is another method to determine the optimal number of clusters. Here I introduced c-index earlier. The silhouette coefficient of a data measures how well data are assigned to its own cluster and how far they are from other clusters. A silhouette close to 1 means the data points are in an appropriate cluster and a silhouette coefficient close to −1 implies out data is in the wrong cluster. The following is python code for computing the coefficient and plotting number fo clusters vs Silhouette coefficient.