machine learning

Hierarchical Clustering in Python

Hierarchical clustering is a clustering method that builds a hierarchy of clusters. It can be agglomerative or divisive. Agglomerative: at the first step, every item is its own cluster; clusters are then merged based on their distances to form bigger clusters until all the data ends up in one cluster (bottom up). The complexity is \( O(n^2 \log n) \). Divisive: at the beginning, …
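As a quick illustration of the agglomerative (bottom-up) approach, here is a minimal sketch using SciPy's `linkage` and `fcluster` on toy data; it is not the code from the full post.

```python
# Agglomerative clustering sketch with SciPy (toy data, illustrative only).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [5.2, 4.9], [9.0, 0.1]])

Z = linkage(X, method='ward')                     # repeatedly merge the closest clusters
labels = fcluster(Z, t=3, criterion='maxclust')   # cut the resulting tree into 3 clusters
print(labels)
```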


Naive Bayes Classifier Example with Python Code

In the example below I implemented a “Naive Bayes classifier” in Python, and afterwards I used the “sklearn” package to solve the same problem and show its output.
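The full code and output are in the post itself; as a rough sketch of the scikit-learn part, something like the following works on a toy dataset (the iris data here is only a placeholder, not the post's example):

```python
# Gaussian Naive Bayes with scikit-learn on a placeholder dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GaussianNB().fit(X_train, y_train)   # fits class priors and per-feature Gaussians
print(clf.score(X_test, y_test))           # accuracy on the held-out split
```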

Density-Based Spatial Clustering (DBSCAN) with Python Code

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a data clustering algorithm. It is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of the corresponding nodes. It starts with an arbitrary starting point that has not been visited. This point’s epsilon-neighborhood is retrieved, and if it …
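A minimal sketch of running DBSCAN with scikit-learn is shown below; the `eps` and `min_samples` values are illustrative, not the settings used in the post:

```python
# DBSCAN sketch with scikit-learn on synthetic two-moons data.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)   # eps = radius of the epsilon-neighborhood
labels = db.labels_                          # label -1 marks points treated as noise
print(set(labels))
```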


Kernel Density Estimation (KDE) for estimating probability distribution function

There are several approaches for estimating the probability distribution function of given data: 1) parametric, 2) semi-parametric, 3) non-parametric. A parametric example is a GMM fitted via an algorithm such as expectation maximization. Here is my other post on expectation maximization. An example of a non-parametric approach is the histogram, where each data point is assigned to only one bin and, depending on the number of data points that fall within …
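For a quick feel of the non-parametric side, here is a small sketch that builds a histogram and a Gaussian KDE on synthetic data (not the post's dataset):

```python
# Histogram vs. Gaussian KDE on a synthetic bimodal sample.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

hist, edges = np.histogram(data, bins=30, density=True)  # each sample falls into exactly one bin

kde = gaussian_kde(data)            # places a Gaussian kernel on every sample
xs = np.linspace(-4, 4, 200)
density = kde(xs)                   # smooth density estimate evaluated on a grid
print(density.max())
```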


Silhouette coefficient for finding optimal number of clusters

The silhouette coefficient is another method to determine the optimal number of clusters. Earlier, I introduced the C-index here. The silhouette coefficient of a data point measures how well it is assigned to its own cluster and how far it is from the other clusters. A silhouette close to 1 means the data point is in an appropriate cluster, and a silhouette …
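A minimal sketch of choosing the number of clusters with scikit-learn's `silhouette_score` could look like this (toy data and an illustrative k range):

```python
# Pick k by maximizing the mean silhouette coefficient over a range of k.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 9):                                # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))  # values close to 1 = well separated
```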


Finding optimal number of Clusters by using Cluster validation

This module finds the optimal number of components (number of clusters) for a given dataset. In order to find the optimal number of components, we first run the k-means algorithm with different numbers of clusters, starting from 1 up to a fixed maximum. Then we check the cluster validity by deploying the C-index algorithm and …
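As a rough sketch of that loop, the snippet below runs k-means for a range of k values and scores each clustering with a straightforward C-index (lower is better). The dataset and k range are placeholders, and the `c_index` helper is my own sketch of the standard formula, not necessarily the module's exact implementation:

```python
# Cluster-validation loop: k-means for several k, scored with the C-index
# C = (S - S_min) / (S_max - S_min), where S is the sum of within-cluster
# pairwise distances and S_min / S_max are the sums of the same number of
# smallest / largest pairwise distances overall.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def c_index(X, labels):
    d = squareform(pdist(X))                      # full pairwise distance matrix
    same = labels[:, None] == labels[None, :]     # within-cluster pair mask
    iu = np.triu_indices_from(d, k=1)             # count each pair once
    within = d[iu][same[iu]]
    n_w = within.size
    if n_w == 0:
        return np.nan                             # all clusters are singletons
    all_d = np.sort(d[iu])
    s, s_min, s_max = within.sum(), all_d[:n_w].sum(), all_d[-n_w:].sum()
    return (s - s_min) / (s_max - s_min)

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
for k in range(2, 9):                             # k = 1 is skipped: S_min = S_max there
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(c_index(X, labels), 3))        # smallest C-index suggests the best k
```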
