Finding the Optimal Number of Clusters Using Cluster Validation

This module finds the optimal number of components (number of clusters) for a given dataset.
To find it, we first ran the k-means algorithm with different numbers of clusters, starting from 1 up to a fixed maximum. We then checked the validity of each clustering with the C-index and selected the number of clusters with the lowest C-index. The index is defined as follows:

\begin{equation} \label{C-index}
C = \frac{S-S_{min}}{S_{max}-S_{min}}
\end{equation}

where \( S \) is the sum of distances over all pairs of patterns from the same cluster. Let \( l \)
be the number of those pairs. Then \( S_{min} \) is the sum of the \( l \) smallest distances when all
pairs of patterns are considered (i.e. as if the patterns could belong to different clusters).
Similarly, \( S_{max} \) is the sum of the \( l \) largest distances out of all pairs. Since \( S_{min} \le S \le S_{max} \), \( C \) lies between 0 and 1, and it equals 0 when the within-cluster pairs are exactly the \( l \) closest pairs. Hence a small value
of \( C \) indicates a good clustering. In my example code, I generated 4 clusters, but since two of them are very close, they merge into one and the optimal number of clusters is 3.
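The procedure above can be sketched in Python as follows. This is a minimal, dependency-free sketch under stated assumptions, not the code from the original example: `c_index`, `kmeans`, and `optimal_k` are hypothetical names; distances are Euclidean; the k-means is plain Lloyd's algorithm with a deterministic farthest-point initialisation (chosen here to keep the sketch short and reproducible; scikit-learn's `KMeans`, for example, defaults to k-means++ instead); and the sweep starts at k = 2, because with a single cluster every pair is within-cluster, so \( S = S_{min} = S_{max} \) and \( C \) would be 0/0.

```python
import itertools
import math
import random


def c_index(points, labels):
    """C-index of a partition, C = (S - S_min) / (S_max - S_min); lower is better."""
    all_d, within = [], []
    for i, j in itertools.combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # Euclidean distance (an assumption)
        all_d.append(d)
        if labels[i] == labels[j]:
            within.append(d)  # pair of patterns from the same cluster
    n_pairs = len(within)  # this is l in the formula
    all_d.sort()
    s = sum(within)
    s_min = sum(all_d[:n_pairs])                # sum of the l smallest distances
    s_max = sum(all_d[len(all_d) - n_pairs:])   # sum of the l largest distances
    if s_max == s_min:
        raise ValueError("C-index is undefined when S_max == S_min (e.g. k = 1)")
    return (s - s_min) / (s_max - s_min)


def kmeans(points, k, n_iter=100):
    """Hypothetical helper: Lloyd's k-means, returns one integer label per point."""
    # Farthest-point initialisation (an assumption): start from the first point,
    # then repeatedly add the point farthest from all centres chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centre (ties go to the lowest index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so k-means has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def optimal_k(points, k_max):
    """Run k-means for k = 2..k_max; return (k with the lowest C-index, all scores)."""
    scores = {k: c_index(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy data: four 2-D Gaussian blobs, the first two very close.
    rng = random.Random(0)
    centres = [(0.0, 0.0), (0.5, 0.0), (6.0, 0.0), (0.0, 6.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centres for _ in range(25)]
    best, scores = optimal_k(data, k_max=6)
    print("C-index per k:", {k: round(c, 3) for k, c in scores.items()})
    print("optimal number of clusters:", best)
```

As a check of the formula: for the 1-D points 0, 1, 10, 11, the partition {0, 1}, {10, 11} gives \( S = S_{min} = 2 \), so \( C = 0 \), while {0, 10}, {1, 11} gives \( S = 20 \), \( S_{min} = 2 \), \( S_{max} = 21 \) and \( C = 18/19 \approx 0.95 \). A single k-means run can stop in a local optimum, so a fuller version would typically try several initialisations per k.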

Another method for finding the optimal number of clusters is the silhouette coefficient.
