Silhouette coefficient for finding optimal number of clusters

Silhouette coefficient is another method to determine the optimal number of clusters. Here I introduced c-index earlier. The silhouette coefficient of a data measures how well data are assigned to its own cluster and how far they are from other clusters. A silhouette close to 1 means the data points are in an appropriate cluster and a silhouette coefficient close to −1 implies out data is in the wrong cluster. The following is python code for computing the coefficient and plotting number fo clusters vs Silhouette coefficient.

from sklearn import cluster
import numpy as numpy
import sklearn
import matplotlib.pyplot as plt


obs = numpy.concatenate( (numpy.random.randn(100, 2) , 20 + numpy.random.randn(300, 2) , -15+numpy.random.randn(200, 2)))
silhouette_score_values=list()

NumberOfClusters=range(2,30)

for i in NumberOfClusters:
    
    classifier=cluster.KMeans(i,init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True)
    classifier.fit(obs)
    labels= classifier.predict(obs)
    print "Number Of Clusters:"
    print i
    print "Silhouette score value"
    print sklearn.metrics.silhouette_score(obs,labels ,metric='euclidean', sample_size=None, random_state=None)
    silhouette_score_values.append(sklearn.metrics.silhouette_score(obs,labels ,metric='euclidean', sample_size=None, random_state=None))

plt.plot(NumberOfClusters, silhouette_score_values)
plt.title("Silhouette score values vs Numbers of Clusters ")
plt.show()

Optimal_NumberOf_Components=NumberOfClusters[silhouette_score_values.index(max(silhouette_score_values))]
print "Optimal number of components is:"
print Optimal_NumberOf_Components

from sklearn import cluster

import numpy as numpy

import sklearn

import matplotlib.pyplot as plt

obs = numpy.concatenate( (numpy.random.randn(100, 2) , 20 + numpy.random.randn(300, 2) , -15+numpy.random.randn(200, 2)))

silhouette_score_values=list()

NumberOfClusters=range(2,30)

for i in NumberOfClusters:

classifier=cluster.KMeans(i,init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True)

classifier.fit(obs)

labels= classifier.predict(obs)

print "Number Of Clusters:"

print i

print "Silhouette score value"

print sklearn.metrics.silhouette_score(obs,labels ,metric='euclidean', sample_size=None, random_state=None)

silhouette_score_values.append(sklearn.metrics.silhouette_score(obs,labels ,metric='euclidean', sample_size=None, random_state=None))

plt.plot(NumberOfClusters, silhouette_score_values)

plt.title("Silhouette score values vs Numbers of Clusters ")

plt.show()

Optimal_NumberOf_Components=NumberOfClusters[silhouette_score_values.index(max(silhouette_score_values))]

print "Optimal number of components is:"

print Optimal_NumberOf_Components

0 0 votes

Article Rating

This site uses Akismet to reduce spam. Learn how your comment data is processed.

1 Comment

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Finding optimal number of Clusters by using Cluster validation - Tutorials on Robotics for ROS developers

7 years ago

[…] Here there is also another method called “Silhouette coefficient” for finding the optimal number of components for clustering. […]

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.