machine learning

Installing NVIDIA DIGITS on Ubuntu 16.04

Prerequisite

Protobuf 3

Caffe

Install Caffe as explained in my other post here.

DIGITS

Visit https://github.com/NVIDIA/DIGITS/

Dependencies

# Install repo packages

Building DIGITS

Open in the browser: http://localhost:5000/

Hierarchical Clustering in Python

Hierarchical clustering is a clustering method that builds a hierarchy of clusters. It can be agglomerative or divisive. Agglomerative (bottom-up): at the first step, every item is its own cluster; the closest clusters are then merged into bigger clusters until all the data is in one cluster. The complexity is \( O(n^2 \log n) \). Divisive: At the beginning, …
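The bottom-up merging described above can be sketched in a few lines of plain Python. This is only an illustration: the single-linkage distance, the function name `agglomerative`, and the toy points are my assumptions, and this naive version rescans all cluster pairs on every merge, so it is slower than the \( O(n^2 \log n) \) bound mentioned above.

```python
import math

def agglomerative(points, n_clusters):
    """Naive bottom-up clustering; single linkage is an assumption here."""
    # Every item starts as its own cluster (stored as a list of indices).
    clusters = [[i] for i in range(len(points))]

    def single_link(a, b):
        # Distance between two clusters = distance of their closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, i, j = min(
            (single_link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Merge cluster j into cluster i and drop j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy data (made up): two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
result = agglomerative(points, 3)
print(result)
```

Stopping at `n_clusters` instead of merging down to a single cluster is the same as cutting the hierarchy at a chosen level.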


Naive Bayes Classifier Example with Python Code

In the example below I implement a “Naive Bayes classifier” in Python, then solve the same problem again with the “sklearn” package and show the output.
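The original code is not included in this excerpt, so here is a minimal stand-in sketch of a hand-written classifier. It assumes Gaussian-distributed features; the helper names `fit_gaussian_nb` and `predict` and the toy data are hypothetical and are not taken from the post.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small constant keeps variances away from zero (an assumption).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest unnormalized log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        # "Naive" step: features are treated as independent given the class.
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)

# Toy data (made up): two features, two classes.
X = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (4.0, 5.1), (4.2, 4.9), (3.8, 5.0)]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
prediction_a = predict(model, (1.1, 2.0))
prediction_b = predict(model, (4.1, 5.0))
print(prediction_a, prediction_b)
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied together.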

Density-Based Spatial Clustering (DBSCAN) with Python Code

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a data clustering algorithm. It is density-based because it finds a number of clusters starting from the estimated density distribution of the corresponding nodes. It starts with an arbitrary point that has not been visited. This point’s epsilon-neighborhood is retrieved, and if it …
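As a rough sketch of the procedure described above (pick an unvisited point, retrieve its epsilon-neighborhood, grow a cluster from it if it is dense enough), here is a small plain-Python version. The parameter names `eps` and `min_pts`, the label `-1` for noise, and the toy points are my assumptions for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Epsilon-neighborhood of point i (includes i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joined but not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels

# Toy data (made up): two dense squares and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike k-means, the number of clusters is not passed in; it follows from the density parameters.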


Kernel Density Estimation (KDE) for estimating probability distribution function

There are several approaches to estimating the probability distribution function of given data: 1) parametric, 2) semi-parametric, 3) non-parametric. A parametric example is a GMM fitted with an algorithm such as expectation maximization. Here is my other post on expectation maximization. An example of a non-parametric approach is the histogram, where each data point is assigned to only one bin and depending on the number bins that fall within …
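To make the contrast with a histogram concrete, here is a minimal kernel density estimate in plain Python. The Gaussian kernel, the bandwidth of 0.3, and the sample values are assumptions chosen for illustration. Each sample contributes a smooth bump rather than a count in a single bin.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x (Gaussian kernel)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of one Gaussian bump per sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )
    return density

# Toy 1-D data (made up): two groups of values.
samples = [1.0, 1.2, 0.9, 1.1, 4.0, 4.2, 3.9]
pdf = gaussian_kde(samples, bandwidth=0.3)

# Riemann sum on [-5, 10) as a rough check that the estimate integrates to ~1.
step = 0.01
area = sum(pdf(-5 + k * step) * step for k in range(1500))
print(pdf(1.0), pdf(2.5), area)
```

The bandwidth plays a role similar to the bin width of a histogram: smaller values give a spikier estimate, larger values a smoother one.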


Silhouette coefficient for finding optimal number of clusters

The silhouette coefficient is another method for determining the optimal number of clusters. I introduced the c-index earlier here. The silhouette coefficient of a data point measures how well it fits its own cluster and how far it is from other clusters. A silhouette close to 1 means the data points are in an appropriate cluster and a silhouette …
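A small plain-Python sketch of the idea: for each point, compare the mean distance to its own cluster (`a`) with the mean distance to the nearest other cluster (`b`), and average \( (b - a) / \max(a, b) \) over all points. The function name `mean_silhouette`, the Euclidean distance, and the toy data are my assumptions. It expects at least two clusters.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (made up): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = mean_silhouette(points, [0, 0, 0, 1, 1, 1])  # matches the groups
bad = mean_silhouette(points, [0, 1, 0, 1, 0, 1])   # mixes the groups
print(good, bad)
```

To pick the number of clusters, one would run a clustering algorithm for several candidate values and keep the one with the highest mean silhouette.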
