Tag Archives: python

How to develop a GUI application with PyQt (Python Qt)

There are two main methods for developing a GUI application with Qt:
1) Adding all widgets in your code (your C++ or Python code)
2) Creating Qt UI files, adding widgets there, and loading everything into your application.

1) Adding all widgets in your code

Here is the snippet for adding all widgets and their slots in code:
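A minimal sketch of this approach, assuming PyQt5 (not necessarily the post's original snippet): a QMainWindow with a single QPushButton whose clicked signal is connected to a slot defined in code.

import sys
from PyQt5.QtWidgets import QApplication, QMainWindow, QPushButton

class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("PyQt Example")
        # create the widget in code and make it the central widget
        self.button = QPushButton("Click me", self)
        self.setCentralWidget(self.button)
        # connect the clicked signal to a slot defined below
        self.button.clicked.connect(self.on_click)

    def on_click(self):
        print("button clicked")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec_())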

2) Creating Qt UI files, adding widgets there, and loading everything into your application

Now let's do what we did in the first method in a UI file and load it. First, create a text file, put the following in it, and save it as "mainwindow.ui":
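As a sketch (not the post's original file), a Designer-format .ui file describing the same window, a QMainWindow holding a single QPushButton named "button", could look like this:

<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <class>MainWindow</class>
 <widget class="QMainWindow" name="MainWindow">
  <property name="windowTitle">
   <string>PyQt Example</string>
  </property>
  <widget class="QWidget" name="centralwidget">
   <widget class="QPushButton" name="button">
    <property name="geometry">
     <rect>
      <x>10</x>
      <y>10</y>
      <width>120</width>
      <height>30</height>
     </rect>
    </property>
    <property name="text">
     <string>Click me</string>
    </property>
   </widget>
  </widget>
 </widget>
 <resources/>
 <connections/>
</ui>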


Now load it in your Python file like this:
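A minimal sketch, assuming PyQt5 and the illustrative "button" object name from the .ui sketch above; uic.loadUi populates the window with the widgets declared in the file:

import sys
from PyQt5 import uic
from PyQt5.QtWidgets import QApplication, QMainWindow

class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        # load the widgets declared in the .ui file onto this window
        uic.loadUi("mainwindow.ui", self)
        # widgets are available under their object names from the file
        self.button.clicked.connect(self.on_click)

    def on_click(self):
        print("button clicked")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec_())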

The results should be the same as what you got in the first method.

Drawing graphs in Python with networkx
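As a small illustrative sketch (not necessarily the post's own code), a weighted graph can be built and drawn with networkx and matplotlib like this:

import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 4.0)])

pos = nx.spring_layout(G, seed=42)  # deterministic layout
nx.draw(G, pos, with_labels=True, node_color="lightblue")
labels = nx.get_edge_attributes(G, "weight")
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()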

Hierarchical Clustering in Python


Hierarchical clustering is a method of clustering which builds a hierarchy of clusters. It can be agglomerative or divisive.

  1. Agglomerative: at the first step, every item is its own cluster; then clusters are merged based on their distances, forming bigger clusters until all the data is in one cluster (bottom-up). The complexity is \( O(n^2 \log n) \).
  2. Divisive: at the beginning, all items are in one big cluster. Then we iteratively break this cluster into smaller clusters (top-down). The complexity is \( O(2^n) \).

To merge or divide the clusters we need to know the shortest distance between clusters. The common metrics for the distance between clusters are:

  • Single link: smallest distance between points.
  • Complete link: largest distance between points.
  • Average link: average distance between points.
  • Centroid: distance between centroids.

Depending on the definition of 'shortest distance' (single/complete/average/centroid link) we get a different hierarchical clustering method.

Hierarchical Algorithms:

  1. Single link: at each iteration, the two clusters that have the closest pair of elements are merged into a bigger cluster.
  2. Average link: the distance between clusters is the average distance between all pairs of points across the two clusters. The two clusters with the minimum of these distances are merged into a bigger cluster.
  3. Complete link: the distance between clusters is the distance between the two points that are farthest away from each other. The two clusters with the minimum of these distances are merged into a bigger cluster.
  4. Minimum spanning tree (MST): in a connected graph without any cycle, a spanning tree is a subgraph in which all vertices are still connected. If the edges have weights, the MST is the spanning tree whose total edge weight is minimal. The MST may not be unique.

To visualize the outcome of hierarchical clustering we often use a dendrogram.
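A minimal sketch (an illustrative example, not the post's code) of single-link agglomerative clustering with SciPy, visualized as a dendrogram:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# two loose 2-D blobs to cluster
data = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(4, 0.5, (10, 2))])

# method can be "single", "complete", "average", or "centroid"
Z = linkage(data, method="single")
dendrogram(Z)
plt.show()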

Figure: graph representation of the distance matrix.

Figure: minimum spanning tree of the graph.

Naive Bayes Classifier Example with Python Code

In the example below, I implemented a naive Bayes classifier in Python, and then I used the "sklearn" package to solve it again:
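As a sketch of the sklearn half (a Gaussian naive Bayes on a toy dataset, not the post's exact code):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# toy 2-D training data with two classes
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X, y)
print(clf.predict([[1.2, 1.9], [5.5, 8.5]]))  # one test point near each class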

and the output is:


Continue reading

Density-Based Spatial Clustering (DBSCAN) with Python Code

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a data clustering algorithm. It is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of the corresponding nodes.

It starts with an arbitrary starting point that has not been visited.

This point’s epsilon-neighborhood is retrieved, and if it contains sufficiently many points, a cluster is started. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise. DBSCAN requires two parameters: epsilon (eps) and the minimum number of points required to form a cluster (minPts). If a point is found to be part of a cluster, its epsilon-neighborhood is also part of that cluster.

I implemented the pseudocode from the DBSCAN Wikipedia page:
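As a minimal sketch of that pseudocode (the function name region_query and the label conventions are my own illustrative choices):

import numpy as np

def region_query(X, i, eps):
    # indices of all points within eps of point i (point i included)
    return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)

def dbscan(X, eps, min_pts):
    UNVISITED, NOISE = -2, -1
    labels = np.full(len(X), UNVISITED)
    cluster = -1
    for i in range(len(X)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(X, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        cluster += 1                       # start a new cluster at a core point
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:         # noise reached from a core point
                labels[j] = cluster        # becomes a border point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(X, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# toy data: two dense blobs; points get labels 0, 1, or -1 for noise
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
print(dbscan(X, eps=0.5, min_pts=4))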


Continue reading

Kernel Density Estimation (KDE) for estimating the probability distribution function


There are several approaches for estimating the probability distribution function of given data:

1) Parametric
2) Semi-parametric
3) Non-parametric

A parametric example is a GMM fitted via an algorithm such as expectation maximization. Here is my other post on expectation maximization.

An example of a non-parametric method is the histogram, where each data point is assigned to exactly one bin, and the height of a bar is determined by the number of points that fall within its interval.

Kernel Density Estimation (KDE) is an example of a non-parametric method for estimating the probability distribution function. It is very similar to the histogram, but we don't assign each data point to a single bin. In KDE we use a kernel function which weights the data points depending on how far they are from the point \( x \):

\begin{equation}
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^n k\bigg(\frac{ x-x_i  }{h}\bigg)
\end{equation}
where \( h \) is a bandwidth parameter and \( k \) is the kernel function. One choice for the kernel function is the Gaussian (normal distribution), but there are other kernel functions (uniform, triangular, biweight, triweight, Epanechnikov) that can be used as well. Choosing too small or too big a value for the bandwidth might overfit or underfit our estimation. A rule of thumb for choosing the bandwidth is Silverman's rule.
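A minimal sketch of the estimator above with a Gaussian kernel, using the simple normal-reference form of Silverman's rule for \( h \) (an illustrative choice):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.8, 200), rng.normal(2, 1.0, 300)])

n = len(data)
h = 1.06 * data.std() * n ** (-1 / 5)  # normal-reference form of Silverman's rule

def kde(x, data, h):
    # f_hat(x) = 1/(n h) * sum_i k((x - x_i) / h) with a Gaussian kernel k
    u = (x[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (n * h)

grid = np.linspace(-6, 6, 400)
plt.hist(data, bins=40, density=True, alpha=0.4)  # histogram for comparison
plt.plot(grid, kde(grid, data, h))
plt.show()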


Finding the optimal number of clusters using cluster validation

This module finds the optimal number of components (the number of clusters) for a given dataset. In order to find the optimal number of components, we first ran the k-means algorithm with different numbers of clusters, from 1 up to a fixed maximum. Then we checked cluster validity with the \( C \)-index and selected the number of clusters with the lowest \( C \)-index. This index is defined as follows:

\begin{equation} \label{C-index}
C = \frac{S-S_{min}}{S_{max}-S_{min}}
\end{equation}

where \( S \) is the sum of distances over all pairs of patterns from the same cluster. Let \( l \) be the number of those pairs. Then \( S_{min} \) is the sum of the \( l \) smallest distances if all pairs of patterns are considered (i.e. if the patterns can belong to different clusters). Similarly, \( S_{max} \) is the sum of the \( l \) largest distances out of all pairs. Hence a small value of \( C \) indicates a good clustering. In the following code, I generated 4 clusters, but since two of them are very close, they are packed into one and the optimal number of clusters is 3.
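A minimal sketch of this procedure (not the post's module; the toy data and the k range are illustrative):

import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans

def c_index(X, labels):
    dist = pdist(X)                          # condensed pairwise distances
    i, j = np.triu_indices(len(X), k=1)      # pair indices in pdist order
    same = labels[i] == labels[j]            # pairs within the same cluster
    l = int(same.sum())
    S = dist[same].sum()                     # sum of within-cluster distances
    d_sorted = np.sort(dist)
    S_min, S_max = d_sorted[:l].sum(), d_sorted[-l:].sum()
    return (S - S_min) / (S_max - S_min)

# 4 generated blobs, two of them overlapping
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, (30, 2))
               for c in [(0, 0), (0.8, 0.8), (5, 5), (9, 0)]])

# the C-index is undefined for a single cluster, so scan k from 2 upwards
scores = {k: c_index(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
          for k in range(2, 8)}
print(min(scores, key=scores.get))           # k with the lowest C-index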

Continue reading

Sampling-based, optimisation-based planner with a signed distance field cost map

Rapidly-exploring random trees (RRT) and their variants are a very powerful solution for solving the motion planning problem in robotics, but they struggle to find an optimal solution, and the generated path is usually jerky, with redundant movements. Sampling-based, optimisation-based planners combine the robustness of RRT with the possibility of imposing a cost function. Here in this work, I set a cost function which is a combination of the length and the clearance of the trajectory. To start, I define the environment as a black and white image (black obstacles, white collision-free area). From this image I obtain a signed distance field in which every pixel shows the distance to the closest obstacle (the closer, the darker).

Image 1: environment representation, white free area, black obstacles. Image 2: cost map obtained from the signed distance field.
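A minimal sketch of building such a cost map (the toy environment and the clearance-cost form are illustrative assumptions, not the post's code):

import numpy as np
from scipy.ndimage import distance_transform_edt

# toy environment: 1 = free space (white), 0 = obstacle (black)
env = np.ones((100, 100))
env[40:60, 40:60] = 0

# signed distance field: positive in free space (distance to the nearest
# obstacle), negative inside obstacles (distance to the nearest free pixel)
sdf = distance_transform_edt(env) - distance_transform_edt(1 - env)

# clearance cost: large near obstacles, small far away from them
cost_map = 1.0 / (1.0 + np.maximum(sdf, 0.0))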

After this, I used RRT-Connect to find a collision-free path between the start and the goal; as can be seen, the trajectory has unusual turns.

RRT-Connect path solution.

In the following, I defined a cost function which is a combination of clearance and length, but with more weight on the clearance, so the planner avoids the obstacles as much as it can while still trying to shorten the trajectory.

Path with high clearance while shortening the length (lengthy but safe).

Finally, I changed the cost function to put more weight on the length, so the planner still avoids the obstacles but minimizes the trajectory length as much as possible.

Shortest valid path (short but risky)

Solving an optimisation problem with an evolutionary algorithm (LM-CMA-ES)


Limited-Memory CMA-ES (LM-CMA-ES) is a variation of CMA-ES which has been tailored for large-scale optimisation with limited memory. As a warm-up, I have used this algorithm to find the minimum point of the following function:

$$ z= -( 4 \times  e^{- ( (x-4)^2 +(y-4)^2 ) }+ 2 \times e^{- ( (x-2)^2 +(y-2)^2 ) } )$$

The function has the following shape (all images are SVG, so you have unlimited zoom):


and the result was exactly what you would expect; here the green line shows the path that the algorithm took to reach the minimum point:

Path traversed by the Limited-Memory CMA-ES (LM-CMA-ES) algorithm to reach the minimum of the function.
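As a sketch of the setup (using plain CMA-ES from the pycma package as a stand-in, since the post itself uses LM-CMA-ES):

import numpy as np
import cma

def f(v):
    x, y = v
    # the two-Gaussian objective from above; the global minimum is near (4, 4)
    return -(4 * np.exp(-((x - 4) ** 2 + (y - 4) ** 2))
             + 2 * np.exp(-((x - 2) ** 2 + (y - 2) ** 2)))

es = cma.CMAEvolutionStrategy([0.0, 0.0], 1.5)  # initial mean and step size
es.optimize(f)
print(es.result.xbest)  # best point found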

The code is available on my GitHub.

References: 1,2.