Mastering git, Part 12, git rebase

Combining Git commits with squash

Imagine you have done lots of commits i.e. several commits for fixing a bug but you don’t need all of them and somehow you want to meld them and squash them into a single commit. You can use rebase. Git always squashes a newer commit into an older commit or “upward”. Let create some files:

Your tree would be something like this:

But now you would like to squash b-1, b2 and b-3 into one commit. So copy the hash ID of b-3 and execute rebase:

This will open your editor and the first few lines are:

Git will melt new commits into older one, and we want to squash b-3 and b-2 into b-1 so change it into:

Close all instance of your editor and save it. This will open another editor to write a new commit message.

Write your new commit message and save it. Your new tree should look like this:

Rebasing branches on to the base

When you create a feature branch and start working on that, your master might grow and also your branch. Let’s create a repository and add “a” and b:

Then lets create a feature branch for “c”

Now lets get back to master and “d” and “e”:

At a certain point you will decide to merge to the master branch your tree looks like this:

By doing this you will the following tree, which in a big project might become very complicated and difficult to track.

As you can see the branch feature-c is 2 commits behind of the master branch.  You can rebase your feature branch onto the master branch so you can make it like that your feature branch has just branched from your master.

By doing so, you finally end up with a “linear history”. You can also give tags to your commits so you make your tree more readable:

Rebase: rewriting Git history reword, delete, reorder  and Split

Refs 1,2, 3, 4

Mastering git, Part 11, View git history (commit log) of specific lines of code in a file

git blame

The git blame command is used to know who/which commit is responsible for the latest changes made to a file. The author/commit of each line can also been seen. git blame does not show the per-line modifications history in the chronological sense. It only shows who was the last person to have changed a line in a document up to the last commit in HEAD.

Line between 2 and 4:

2 lines after 2:

Make the output shorter:

Only display email of the auther:

get the blame for a specifi commit:

When the commit ID is 00000000000 it means I have changed that line locally.

git blame does not show the per-line modifications history in the chronological sense. It only shows who was the last person to have changed a line in a document up to the last commit in HEAD.
In order to see the full history/log of a document line, you would need to run a git blame path/to/file for each commit in your git log. Since Git 1.8.4, you can use git log has -L to view the evolution of a range of lines.

Refs: 1, 2

Mastering git, Part 10, Setting up your home Git server

In this tutorial I will show you how to set up your own Git server. Here I have used an Ubuntu 20.04 server on a virtual machine which is called homeserver and my clinent machine is called client. The convention for the bash scripts is user@host:~$

The Protocols

Git can use four distinct protocols to transfer data: Local, HTTP, SSH and Git

Local

The URLs will be in the following form:

or

HTTP Protocols

Git can communicate over HTTP using  Smart HTTP or Dumb HTTP.

Smart HTTP

Intelligently negotiate data transfer in a manner similar to how it does over SSH. Very similarly to the SSH or Git protocols but runs over standard HTTPS ports and can use various HTTP authentication mechanisms, meaning it’s often easier on the user than something like SSH, since you can use things like username/password authentication rather than having to set up SSH keys. It can be set up to both serve anonymously like the git:// protocol, and can also be pushed over with authentication and encryption like the SSH protocol.

Dumb HTTP

It is very simple and generally read-only. If a server does not respond with a Git HTTP smart service, the Git client will try to fall back to the simpler Dumb HTTP protocol. The Dumb protocol expects the bare Git repository to be served like normal files from the web server.To set up all you have to do is put a bare Git repository under your HTTP document root and set up a specific post-update hook.

The post-update hook that comes with Git by default runs the git update-server-info to make HTTP fetching and cloning work properly. This command is run when you push to this repository (over SSH for example). You can clone via:

SSH Protocol

To clone a Git repository over SSH, you can specify an ssh:// URL like this:

Or you can use the shorter scp-like syntax for the SSH protocol:

The main drawback is that you need SSH authentication even for read only repositories.

Git Protocol

It is a daemon (that comes packaged with Git) and  listens on a dedicated port (9418) that provides a service similar to the SSH protocol, but with absolutely no authentication. In order for a repository to be served over the Git protocol, you must create a git-daemon-export-ok file. Either the Git repository is available for everyone to clone, or it isn’t. You can enable push access but, given the lack of authentication, anyone on could push to your repository. Generally, you’ll pair it with SSH or HTTPS access for the few developers who have push (write) access and have everyone else use git:// for read-only access. This protocol requires xinetd or systemd configuration or the like and also requires firewall access to port 9418,

Now let’s set up our server:

1) First on the server side make sure that the essential packages are installed:

2) Create a group for Git users:

3) Create users and add them to the gituser group:

BTW If you want to:
List all groups:

List groups of a user:

List all users:

And if you want to complelty delete a user and remove the user home directory:

Hint: the user should do a login and logout to make the group assignment effective.

4) Create a git repository and tell it to share based on the group the user belongs to

BTW if you want to create a non-bare repository that you can push into that:

5) Give gituser group permission to access the repository directory

Hint:
All files in Linux belong to an owner and a group. chgrp command changes the group ownership of a file or directory. You can set the owner by using chown command, and the group by the chgrp command. -R means do it recursively. You can see the owner and permision by

This command will give the gituser group the permission of read and write.

chmod g+s sets the setgid mode bit on the current directory which will cause any new file or directory that placed in this directory inherit the group owner (gituser), but the current files and directories will not be effected. To aply changes on them we directly call the command on them by using find /home/git/repos1 -type d

6) Create and config the clients
Create two users on the client machine:

SSH key pairs can be used to authenticate a client to an SSH server. That’s why we used ssh-copy-id to add them into ~/.ssh/authorized_keys so we can log in without password later.

Now copy the public key to the server and append it to the authorized_keys file as above

Append the client’s public key for ceach client to the authorized keys on server

Now you should be able to clone the repository:

And you should be able to push:

and pull the changes with other user:

If you want to see the URL that your local Git repository was originally cloned from:

Refs: 1, 2, 3, 4, 5, 6

Metrics for Evaluating Machine Learning Models – Classification

Confusion Matrix

Let’s say we have a binary classifier cats and non- cats, we have 1100 test images, 1000 non cats, 100 cats. The output of the classifier is either Positive  which means “cat” or Negative which  means non-cat. The following is called confusion matrix:

How to interpret these term is as follows: Correctness of labeling, Predicted Class

True Positive:

Observation is positive, and is predicted to be positive.90 cats correctly labeled.

True Negative:

Observation is negative, and is predicted to be negative. 940 images labeled as non-cats, and they are non-cats.

False Positive:

Observation is negative, but is predicted positive. 60 non-cat images labeled cats, but they are cats.

False Negative:

Observation is positive, but is predicted negative.10 images labels as cat, but they are truly non-cats.

Accuracy:

Accuracy is defined as the number of correct predictions divided by the total number of predictions. Classification accuracy= (90+940)/(1000+100)= 1030/1100= 93.6%

Precision:

Precision is not always a good indicator for the performance of our classifier. If one class has more frequency in our set, and we predict it correctly while the classifier wrongly label the  smaller class, accuracy could be very high but the performance o the classifier is bad so:

Precision= True Positive/ (True Positive+ False Positive)

Precision cat=  90/(90+60) = 60%

Precision non-cat= 940/950= 98.9%

Recall

Recall is the ratio of the total number of correctly classified positive examples divide to the total number of positive examples, kind of optimistic classifier.

Recall= True Positive/ (True Positive+ False Negative)

Recall cat= 90/100= 90%
Recall non-cat= 940/1000= 94%

High recall, low precision:  This means that our classifier finds almost all positive examples in our test set but also recognizes a lot of negative examples as positive examples.

Low recall, high precision: This means our classifier is very certain about positive examples (if it has labeled as positive, with high confident it is positive) meanwhile our classifier has missed a lot of positive example, kind of conservative classifier.

F1-Score

Depending on application, you might be interested in a conservative or optimistic classifier. But sometimes you are not biased toward any of the classes in your set, so you need to combine precision and recall together. F1-score is the harmonic mean of precision and recall:

F1-score= 2*Precision*Recall/(Precision+Recall)

F1-score cat= 2*0.6*0.9/(0.6+0.9)= 72%

Sensitivity and Specificity

Sensitivity and specificity are two other popular metrics mostly used in medical and biology. Basically computing recall for both positive and negative classes.
Sensitivity= Recall= TP/(TP+FN)
Specificity= True Negative Rate= TN/(TN+FP)

Receiver Operating Characteristic (ROC) Curve

The output of a classifier is usually a probabilistic number, and we based on a cut off value decided to accept or reject the value. ROC curve is plotting TPR against FPR for various threshold values. ROC curve is a popular curve to look at overall model performance and pick a good cut-off threshold for the model.

False Positive Rate

High values means: False Positive > True Negative which means  our classifier labels many examples as Positive while they are Negative and this ratio is bigger than the examples that are actually Negative and correctly labeled as Negative.

Small value means True Negative > False Positive which means our classifier truly labels examples that are negative and the ratio is bigger than the examples that are Negative and classifier labels them as Positive.
False Positive Rate=1-Specificity=False Positive/ (False Positive + True Negative)

True Positive Rate

True Positive Rate = Sensitivity= Recall= True Positive /(True Positive +False Negative)

Area Under the Curve (AUC)

Sørensen–Dice Coefficient

Confident Interval

Refs: [1], [2], [3]