Tag Archives: Dendrogram

Hierarchical Clustring in python

\(\)

Hierarchical Clustering is a method of clustering which build a hierarchy of clusters. It could be Agglomerative or Divisive.

  1. Agglomerative: At the first step, every item is a cluster, then clusters based on their distances are merged and form bigger clusters till all data is in one cluster (Bottom Up). The complexity is \( O (n^2log(n) ) \).
  2. Divisive: At the beginning, all items are in one big cluster. Then iteratively we break this cluster into smaller clusters (Top Down). The complexity is  \( O (2^n) \).

To merge or divide the clusters we need to know the shortest distance between clusters. The common metrics for the distance between clusters are:

  • Single Link: smallest distance between points.
  • Complete Link: largest distance between points.
  • Average Link: average distance between points
  • Centroid: distance between centroids.

Depending on the definition of ‘shortest distance’ (single/ complete/ average/ centroid link   ) we have different hierarchical clustering method.

 

Hierarchical Algorithms:

  1. Single Link: at each iteration, two clusters that have the closest pair of elements will be merged into a bigger cluster.
  2. Average Link: distance between clusters is the average distance between all points in between clusters. Clusters with the minimum of these distances merge into a bigger cluster.
  3. Complete Link: distance between clusters is the distance between those two points that are farthest away from each other. Two clusters with the minimum of these distances merge into a bigger cluster.
  4. Minimum spanning tree (MST): In a connected graph without any cycle, a spanning tree is a subset tree in which all vertex are still connected. If edges have weight, MST is a span tree in which the edges have the minimum weight. MST may not be unique.

to visualize the outcome of the hierarchical clustering we often use “Dendrogram”.

The following graph represents the following matrix :

 

Minimum spanning tree of the graph.