Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
You may find that there is already a ton of information on the internet about Git, so why another blog post? Most of these resources target developers. This 101 should be a good starting point for system engineers. We don't need to understand all of Git's features, but we should have a basic understanding of them. Most importantly, we should not be afraid to start using it and learn it better little by little.
Git is a version control system (VCS), but it does some things differently from other versioning systems and has become the de facto standard VCS for software projects. It can handle very big projects like the Linux kernel and works perfectly fine for small projects as well. In Git, we call a project a repository, and all the Git-relevant data is stored in the .git directory. To demonstrate, we will create an empty directory and check the Git status.
A Git repository can be used for any deployment project with text files or configuration, and even as a network configuration backup system. While going through the following examples, think about what kind of data could be in the repository. A simple example would be a deployment repository with a Docker Compose file and settings files.
mkdir test1 && cd test1 && ls -al
git status
The command git status is definitely worth remembering. Of course, Git GUIs are available and many editors and IDEs have a Git integration, but using the commands helps a lot in understanding the concepts. It will also make you much faster.
The error message tells us that Git cannot find the .git directory. To get a Git repository on our local machine, we can either initialize a new one or clone one from a remote server. Let's first see how to create a new one.
git init
ls -al .git
git status
Git successfully created a new repository, and we can now see the .git directory. We should never need to change anything in this directory by hand. When we check the Git status, we see that the repository is clean. We are working on the master branch (branches are one of the big advantages of Git; we will take a closer look later), no commit exists yet and there is nothing to commit. As the output above suggests, we will add some content to our test project.
cat > README.md <<EOF
# Test Project 1
It is always good to have a README file as an entry point when reading a project.
The file doesn't have to be huge and perfect. As the project develops this file should too.
## ToDo
- create some content
- seize world domination
- save the planet
EOF
cat README.md
git status
After creating a new file with some content, Git indicates that we have an untracked file. This means the file exists in our working directory but not in our Git repository. Now we can add the file to the staging index with the add command. Everything in the staging index is ready to be committed.
git add README.md
git status
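In scripts, the transition from untracked to staged can be checked with git status --porcelain, whose short output is meant to stay stable across Git versions. The following is a minimal sketch in a throwaway repository created with mktemp; the file name and content are only placeholders.

```shell
# Throwaway repository, so nothing real is touched
repo=$(mktemp -d) && cd "$repo" && git init -q
echo "# Test Project 1" > README.md

before=$(git status --porcelain)   # "?? README.md": untracked
git add README.md
after=$(git status --porcelain)    # "A  README.md": staged as a new file
echo "before: $before"
echo "after:  $after"
```

In the short format, ?? marks an untracked file and A in the first column marks a file that was added to the staging index.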
Now we are ready for the first commit. But what is a commit? A commit is a snapshot of all tracked files at a specific point in time. Working with snapshots rather than deltas, as some other solutions do, has some benefits. Imagine a timeline with many commits: we can easily go to any point on that timeline and look at the snapshot taken at that time. In Git, we call such a timeline a branch and, as you can imagine, we can have multiple timelines. As we will see, we can create a new branch, add multiple commits to it and merge it into another branch later. This is just one simple use case, and as you get to know Git better, you will notice these benefits more and more.
A commit also has a commit message describing the snapshot, and it is highly recommended to agree on a commit message convention. For small commits, I prefer a short subject starting with a verb that describes what the commit does, such as "Fix typo in README.md". For more complex commits, you can add more text. In addition, the commit records the date as well as the author's name and email address. We first have to set the name and email address before we can create a commit. With --global, we can apply the settings to all repositories of our user. More information about the Git configuration and the location of the configuration files is available here: https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration
git config user.name "ubaumann" && git config user.email demo@infrastructureascode.ch
git commit -m 'Init my awesome project'
git status
Now we have a clean working directory again and one commit. We can check the result with the log command, which shows the history of our branch. At this point, everything is only local.
git log
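As a compact recap of configuring the identity and creating the first commit, here is a minimal sketch in a throwaway repository created with mktemp. The identity Demo User <demo@example.com> is a placeholder, and it is set without --global so the global configuration of your user is not touched.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
# Repository-local identity (no --global); placeholder values
git config user.name "Demo User"
git config user.email demo@example.com

echo "# Test Project 1" > README.md
git add README.md
git commit -q -m "Init my awesome project"

# Short hash, author name, author email and subject of each commit
git log --format='%h %an <%ae> %s'
```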
Git is great for working in teams since the repositories are distributed. The normal Git binary can be used to set up a server, but in most cases you will use a dedicated server product or a cloud service. Products like GitLab, Bitbucket or GitHub are available as cloud services or can be installed on-premises. These are only a few examples, far from a complete list. If a team in your company already uses a solution, it probably makes sense to use the same one.
Sebastian created a good illustration that explains the Git principles of the working directory, the staging index and the location of the repository.
When working on our files, we are in the working directory. As soon as we have made some progress and want to commit it, we have to add the files to the staging index with git add. When all the files are ready, we can create the commit with git commit. This adds the new commit to the branch (timeline) we are working on. All of this happens only locally, and in the meantime someone else may have pushed a new commit to the server. When you are not the only one working on a branch, you should pull down new commits to the local repository before creating your commit.
With the command git pull, we fetch the changes from the remote repository and integrate them into the local one. The command git push uploads the local commits to the remote repository. A repository can have multiple remotes, and adding one is easy. Whether you have read and write permissions is defined by the remote server and the credentials you use. The default remote is called origin and is added automatically when you clone a repository.
git remote add origin git@github.com:ubaumann/test1.git
git remote
git push -u origin master
git status
On the server (in this case GitHub), the repository must exist and the user needs write permission. Here we use SSH for the communication, so the public SSH key must be added to your GitHub profile. For the first push, we need the option -u (short for --set-upstream) to set up the mapping to the branch in the remote repository. After the upstream branch is set, git status shows whether the branch is up to date with it. From now on, we can work, make commits, push them to the remote and pull changes made on the server or by other contributors.
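The push with -u can be tried without a GitHub account by using a local bare repository as a stand-in for the server. This is an assumption for demonstration only: the paths come from mktemp and the identity is a placeholder. git rev-parse --abbrev-ref '@{u}' prints the upstream branch that -u recorded.

```shell
base=$(mktemp -d)
# A local bare repository stands in for the GitHub repository
git init -q --bare "$base/test1.git"

git init -q "$base/test1" && cd "$base/test1"
git config user.name "Demo User" && git config user.email demo@example.com
echo "# Test Project 1" > README.md && git add README.md && git commit -q -m "Init"

git remote add origin "$base/test1.git"
# Push the current branch and record origin/<branch> as its upstream
git push -q -u origin HEAD
git rev-parse --abbrev-ref '@{u}'   # e.g. origin/master
git status -sb                      # first line shows <branch>...origin/<branch>
```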
As mentioned above, we could also create the repository on the server first and clone it to the local machine. This works the same way as cloning an existing project, so we can use the same repository to show it.
cd .. && pwd
git clone git@github.com:ubaumann/test1.git test2
cd test2 && ls -al
git status
After cloning the repository, we get a clean copy in the specified folder test2. If no folder is specified, Git creates a folder named after the repository, test1, but in this case that folder already exists. The remote origin is already set, and the local branch master is set up to track the upstream branch origin/master, as git status shows. This works really fast, and many people choose to create the repository on the server first and then clone it. Most servers display both options, with the corresponding commands, when a new repository is created.
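Cloning can be sketched the same way against a local bare repository, which stands in for the server (paths from mktemp, placeholder identity). The seed repository only exists to put a first commit on the stand-in server. The branch name is read from Git instead of assuming master, because newer Git versions can be configured to use a different default branch name.

```shell
base=$(mktemp -d)
git init -q --bare "$base/test1.git"        # stand-in for the server repository

# Seed the stand-in server with one commit
git init -q "$base/seed" && cd "$base/seed"
git config user.name "Demo User" && git config user.email demo@example.com
echo "# Test Project 1" > README.md && git add README.md && git commit -q -m "Init"
branch=$(git rev-parse --abbrev-ref HEAD)   # master, or whatever the default is
git push -q "$base/test1.git" "$branch"

# Clone into test2; without the last argument, Git would create test1
cd "$base" && git clone -q --branch "$branch" "$base/test1.git" test2
cd test2
git remote -v                        # origin is set automatically
git rev-parse --abbrev-ref '@{u}'    # origin/<branch>: tracking is set up
```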
This section covers more ways to work with the working directory, such as adding files to the staging index and discarding changes, and should help you get started working with Git.
sed -i 's/some content/some awesome content/' README.md
echo "local temp file" > tmp.txt
git status
The output of git status shows one modified and one new file. Before adding files to the staging index, we can see what we actually changed with git diff. This command can also show the differences between commits and branches, but here we will only cover the basics.
git diff
The diff shows the parts of the files that changed. This is handy for checking the changes before creating a commit. To see only the changes in one file, the path to the file can be specified, like git diff README.md.
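git diff without arguments compares the working directory with the staging index, so a change disappears from its output once it is staged; git diff --staged (a synonym of --cached) then shows it. A minimal sketch in a throwaway repository, with placeholder file content and identity:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
echo "- create some content" > README.md
git add README.md && git commit -q -m "Init"

echo "- create some awesome content" > README.md
git diff --stat                      # change is visible: not staged yet
git add README.md
unstaged=$(git diff)                 # empty: index and working directory now match
staged=$(git diff --staged --name-only)
echo "staged: $staged"
```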
When we are satisfied with the change we can add the file to the staging index and create a new commit. This works well with one or two files, but how can we handle it when we have many files?
mkdir files && touch files/{a..g}.txt && ls files
git status --untracked-files=all
After creating some files, we can check whether Git recognized them. The option --untracked-files=all lists untracked files in subdirectories as well. Without this option, the output would only show the subdirectory name, and we could use git status files/ to see more details. The git add command supports file globs like files/*.txt. So how could we add the files a.txt, b.txt and c.txt to the staging index with one command?
git add files/[a-c].txt && git status
git pull
git commit -m "Add files/[a-c].txt"
git push
git status --untracked-files=all
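To double-check which files a glob actually staged before committing, git diff --cached --name-only lists the staged paths. In this sketch the pattern is quoted, so Git rather than the shell matches it against the files; the unquoted form used above works as well in this situation. The repository is a throwaway one created with mktemp.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir files && touch files/a.txt files/b.txt files/c.txt files/d.txt

# Quoted: Git, not the shell, matches the pattern against the files
git add 'files/[a-c].txt'
# Paths that would go into the next commit
git diff --cached --name-only
```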
Sooner or later, we may want to delete a file in our working directory. Let's use the committed file files/a.txt to show the difference between simply deleting the file on the file system with rm and using the Git command git rm.
git rm files/a.txt
ls files
git status
With the command git rm files/a.txt, the file is deleted from the file system and the deletion is already staged, so it will be included in the next commit. As the status output indicates, we can unstage the change.
git reset HEAD files/a.txt
git status
ls files
The file is still deleted on the file system, but the change is no longer staged and will not be included in the commit. Because Git still has the file in the last commit, we can discard the change and bring our file back.
git checkout -- files/a.txt && ls files
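The whole cycle of deleting with git rm, unstaging with git reset HEAD and restoring with git checkout -- can be condensed into a short script. This is a minimal sketch in a throwaway repository with a placeholder identity. The porcelain status codes show where the deletion lives at each step: D in the first column means staged, D in the second column means it exists only in the working directory.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
mkdir files && echo "a" > files/a.txt
git add files && git commit -q -m "Add files/a.txt"

git rm -q files/a.txt                # delete the file and stage the deletion
staged=$(git status --porcelain)     # "D  files/a.txt"
git reset -q HEAD files/a.txt        # unstage: the file is still gone on disk
unstaged=$(git status --porcelain)   # " D files/a.txt"
git checkout -- files/a.txt          # restore the version from the last commit
ls files
```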
When a file is deleted directly on the file system, the deletion shows up as an unstaged change.
rm files/a.txt && git status
To stage this change, we have to add it to the staging index. Both add and rm would work here, but since the file is already deleted, we will use add in this example.
git add files/a.txt && git status
As seen above, the reset command can remove files from the staging index. This is handy when we accidentally add a file. Without specifying a file, we can remove all files from the staging index. If a file has been added and more changes are made afterwards, the file can simply be added again to include the new changes in the commit. There are still some essential things to know about git add.
git reset HEAD
git add --dry-run .
The option --dry-run shows what the add command would add to the staging index without actually doing it. This makes it easy to see how the different options work. In the example above, the root directory is specified (with the .), so all changed, deleted and untracked files would be added.
git add --dry-run files
git add --dry-run --no-all .
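The effect of --no-all can be compared directly by running both dry runs on the same working directory. A minimal sketch, assuming a throwaway repository from mktemp with placeholder files and identity; LC_ALL=C keeps Git's messages in English so that the printed lines can be compared.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
echo "a" > a.txt && echo "b" > b.txt && git add . && git commit -q -m "Init"

rm a.txt                 # tracked file deleted on disk
echo "more" >> b.txt     # tracked file modified
echo "tmp" > tmp.txt     # new, untracked file

# --dry-run only prints what would be staged; the index stays unchanged
all=$(LC_ALL=C git add --dry-run .)
noall=$(LC_ALL=C git add --dry-run --no-all .)
printf '%s\n--- with --no-all ---\n%s\n' "$all" "$noall"
```

With --no-all, the line announcing the removal of a.txt is missing, while the modified and the untracked file are still listed.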
To ignore removed files, --no-all or --ignore-removal can be used. In all these examples, the file tmp.txt is always included, but there are files, such as temporary files or local settings, that we do not want to commit. For this we can use gitignore. The simplest way is to create a .gitignore file in the root directory of the repository and add patterns matching the files that should be excluded.
echo "tmp.txt" >> .gitignore && git add --dry-run .
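To find out why a file is ignored, or to verify that a pattern matches at all, git check-ignore -v prints the ignore file, the line number and the matching pattern. A minimal sketch in a throwaway repository created with mktemp:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
echo "local temp file" > tmp.txt
echo "tmp.txt" >> .gitignore

# Source file, line number and pattern that cause the file to be ignored
git check-ignore -v tmp.txt
git status --porcelain              # tmp.txt no longer shows up
```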
It is recommended to always keep the git status output clean; it can save a lot of time. After adding changes to the staging index, a commit has to be created. The commit command also has an option -a (--all) that automatically stages all modified and deleted files, but it does not add untracked files.
git commit -a -m "Update README and delete files/a.txt"
git show --oneline
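The difference between git add and git commit -a can be verified quickly: -a picks up the modified tracked file but leaves the new file untracked. A minimal sketch in a throwaway repository with placeholder files and identity:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
echo "v1" > README.md && git add README.md && git commit -q -m "Init"

echo "v2" >> README.md               # tracked and modified
echo "new" > new.txt                 # untracked
git commit -q -a -m "Update README"
git status --porcelain               # new.txt is still untracked
```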
Every commit needs a commit message. Whenever the option -m (--message) is not specified, the editor opens so that the message can be entered there. An empty message aborts the commit. The message option can be used multiple times, with each value becoming a separate paragraph, which is handy when you want to add more text without using the editor.
git add .gitignore
git commit -m "Add .gitignore" -m "The file 'tmp.txt' should be excluded"
git log -1 --format=%B
Git and all the Git commands are really powerful, and little by little you will discover more options. Above, the format option is used to show only the raw commit message. git <command> --help is always a good place to start looking for help.
git push
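When -m is given more than once, the first value becomes the subject and every further value a separate paragraph of the body. A minimal sketch that checks this with the %s (subject) and %b (body) placeholders of git log --format, in a throwaway repository with a placeholder identity:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
echo "tmp.txt" > .gitignore && git add .gitignore

git commit -q -m "Add .gitignore" -m "The file 'tmp.txt' should be excluded"
git log -1 --format=%s   # subject: the first -m
git log -1 --format=%b   # body: the second -m
```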
Branching is nothing new and is not only available in Git, but because Git works with snapshots, it is a really lightweight operation. This really changes the way we work with branches. There are many good resources available online about Git Flow and how to work with branches. It always depends on the project, but for most projects it is good practice to have a master branch with the stable release and a develop branch with the current development version. From develop, we create a separate branch for every new feature, do all the work there, test it and merge it back into develop.
For example, take a repository with a docker-compose file and all the necessary files. The master branch would be the version running in production, while the develop branch would be running in the testing environment. For every change, a new branch would be created from the develop branch, tested and merged back into the develop branch.
When the develop branch is stable, a new version can be created and merged into master so it can be deployed in production.
Multiple new features can be developed at the same time by different engineers.
git branch
git branch develop
git branch
The command git branch develop creates a new branch develop at the current position, but the working directory is still on the branch master. Furthermore, the remote does not know anything about the new branch yet.
git checkout develop
git push -u origin develop
git status
Switching between branches is done with the command checkout, and with the push option -u (--set-upstream) we defined the upstream branch. Now the remote repository also has the branch develop. Creating new branches and checking them out is a frequent operation, and luckily there is a shorter way to do it: the command checkout offers the option -b to create the new branch directly. To demonstrate, let's create two feature branches and add a new file to each. It is important to start from the develop branch.
git pull
git checkout -b feature1
echo "Feature 1" > f1.txt && git add f1.txt && git commit -m "Add f1"
git push -u origin feature1
git checkout develop
git checkout -b feature2
echo "Feature 2" > f2.txt && git add f2.txt && git commit -m "Add f2"
git push -u origin feature2
git checkout develop
git log --oneline
After creating two new branches, adding files and pushing them to the remote, the develop branch is still in the same state. A branch can of course have multiple commits, but we keep it simple here. git log has many options, and we can use it to see the new commits on all branches as well.
git log --all --oneline --graph
Many GUI tools display this more nicely, and the --format option can be used to tweak the output. The develop and master branches still point to the commit where we added the .gitignore, and the two new branches each have one more commit. With git merge, we can merge the new features into our current branch.
git merge feature1
git status
git log --all --oneline --graph
Merging the first feature went smoothly, and a closer look shows Fast-forward in the output of the merge command. Fast-forward means that the branch pointer could simply be moved forward, because develop had no commits that feature1 did not already contain. For feature2, a fast-forward (ff) is not possible because moving the pointer would mean losing the feature1 commit, so a merge commit is needed.
git merge --ff-only feature2
git merge feature2 -m "Merge branch 'feature2' into develop"
git log --all --oneline --graph
git push
Because it was already clear that a merge commit would be needed, the message could be provided directly with the option -m. Otherwise, the editor would open automatically to enter a commit message.
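Fast-forward and merge commit can also be told apart in a script. This minimal sketch rebuilds the develop, feature1 and feature2 situation in a throwaway repository (placeholder identity, no remote): the first merge only moves the pointer, --ff-only refuses the second one, and a regular merge then creates a commit with two parents. HEAD^1 and HEAD^2 name the first and second parent of that merge commit.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
echo "base" > base.txt && git add base.txt && git commit -q -m "Add base"

git checkout -q -b develop
git checkout -q -b feature1
echo "Feature 1" > f1.txt && git add f1.txt && git commit -q -m "Add f1"
git checkout -q develop
git checkout -q -b feature2
echo "Feature 2" > f2.txt && git add f2.txt && git commit -q -m "Add f2"
git checkout -q develop

git merge -q feature1                          # fast-forward: develop moves to feature1
git merge -q --ff-only feature2 || echo "no fast-forward possible"
git merge -q --no-edit feature2                # merge commit with two parents
git log --oneline --graph
```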
git branch -d feature1
git branch -d feature2
git push
git push origin --delete feature1 feature2
Branches can be deleted to clean up the repository; this does not affect the commits that were already merged. To delete branches on the remote, the command git push <remote> --delete <branch> has to be used.
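git branch -d only deletes a branch whose commits are already merged; for a branch with unmerged commits it refuses, and -D is needed to force the deletion. A minimal sketch in a throwaway repository with a placeholder identity, where the branch name experiment is hypothetical:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
echo "base" > base.txt && git add base.txt && git commit -q -m "Add base"

git checkout -q -b experiment
echo "x" > x.txt && git add x.txt && git commit -q -m "Experiment"
git checkout -q -                    # back to the previous branch

# -d refuses: the "Experiment" commit is not merged into the current branch
if git branch -d experiment; then refused=no; else refused=yes; fi
# -D forces the deletion; the commit is then no longer on any branch
git branch -D experiment
```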
Many server solutions make collaboration easier and offer great features. Common ones are pull requests (PR), also called merge requests, review and approval functions, and issue tracking with the option to create new branches directly. In your team, you should define a working strategy. Applying the four-eyes principle to merging branches makes a lot of sense and, in addition to testing, can greatly increase quality.
Git tries to merge changes automatically, but in some cases it cannot, and the merge stops in a conflicted state where the conflict has to be resolved by hand. This happens, for example, when two engineers change the same line.
git checkout -b todo_moon
echo "- walk the moon" >> README.md && cat README.md
git commit -am "Add walk the moon"
git checkout develop
In the new branch, we added a new ToDo to README.md. To simulate other changes happening before we merge this branch, we will also add a ToDo in the develop branch and commit it.
echo "- fully automated IT" >> README.md && cat README.md
git commit -am "Add fully automated IT"
git merge todo_moon
git status
As we can see, the automatic merge failed for the file README.md because Git does not know how to resolve it automatically. git status shows that we have unmerged paths and also lists the files with conflicts. If we do not want to deal with the conflict at that moment, the merge can be aborted with git merge --abort. Otherwise, the conflict has to be fixed manually and committed.
cat README.md
In the file, the markers <<<<<<<, ======= and >>>>>>> show the conflict and which part comes from which branch. Git detected changes at the same place in the file on both branches and cannot know in which order the lines should appear.
vim README.md -s <(printf 'G dd k dd 2k dd :m+\n :wq!\n') > /dev/null 2>&1 && cat README.md
The above command starts Vim and runs normal mode commands. In short, it goes to the last line and deletes (cuts) it, moves one line up and deletes that line, then moves two lines up and deletes this line as well. Finally, it moves the active line one line down, saves the file and closes it. In practice, you can simply use your usual editor to fix the merge conflict.
git add README.md && git status
git commit -m "Fix merge conflict"
git status
git log -4 --all --oneline --graph
After fixing the conflict, all changed files need to be added to the staging index and a commit has to be created. Now the merge is done. In bigger pull requests, it makes sense to first merge the develop branch into the feature branch, so that merge conflicts are already resolved and tested there. Moreover, keeping features small reduces complexity.
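The conflict workflow can also be reproduced end to end in a script. This minimal sketch in a throwaway repository (placeholder identity) creates the same kind of conflict, lists the unmerged paths with git diff --name-only --diff-filter=U and concludes the merge with git commit --no-edit, which reuses the prepared merge message. Writing both ToDo lines back is just one possible resolution, chosen for illustration.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
printf '## ToDo\n- save the planet\n' > README.md
git add README.md && git commit -q -m "Init"

git checkout -q -b todo_moon
echo "- walk the moon" >> README.md && git commit -q -am "Add walk the moon"
git checkout -q -                    # back to the previous branch
echo "- fully automated IT" >> README.md && git commit -q -am "Add fully automated IT"

git merge todo_moon || echo "merge stopped with a conflict"
conflicts=$(git diff --name-only --diff-filter=U)   # unmerged paths
echo "conflicts: $conflicts"

# One possible resolution: keep both new lines
printf '## ToDo\n- save the planet\n- fully automated IT\n- walk the moon\n' > README.md
git add README.md
git commit -q --no-edit              # uses the prepared merge message
```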
Sometimes we need to undo a change. There are multiple ways to do this, and in this post we will take a look at git revert and git reset. We have already used the reset command, but it can do much more.
Some commits ago, we changed the ToDo "create some content" to "create some awesome content". With git revert, we can undo this change.
First we have to find the commit. We could indeed use the log to find the commit message, but let's explore another approach.
git blame README.md
With the blame command, we can see who last changed each line of a file and in which commit; the commit id is shown directly. The term blame may sound negative, but the command is actually very handy. Perhaps it would be better to call it praise.
git show 911acd38 --oneline
After locating the commit, we want to undo it. In this simple example, it would be easy to fix by hand and create a new commit, but Git has a neater solution: with git revert, we can undo this change automatically.
git revert 911acd38 --no-edit
cat README.md
A new commit is created with the reverted change. The word awesome is removed and the file files/a.txt is back, because the reverted commit had also deleted it. With the option --no-edit, Git does not ask for a commit message. Because the old commit is still in the timeline, the history remains complete.
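The blame-then-revert steps can be combined in a short script. This minimal sketch in a throwaway repository (placeholder identity) takes the commit id from git blame --porcelain, whose first output line starts with the full id of the commit that last touched the requested line, and reverts that commit:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
echo "- create some content" > README.md && git add README.md && git commit -q -m "Init"
echo "- create some awesome content" > README.md && git commit -q -am "Update README"

# Full id of the commit that last changed line 1
commit=$(git blame --porcelain -L 1,1 README.md | head -n 1 | cut -d ' ' -f 1)
git revert --no-edit "$commit"
cat README.md          # the word awesome is gone again
git log --oneline      # the reverted commit is still part of the history
```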
We have already used reset to clean the staging index. Let's take a closer look at what this command does. The command is powerful, so always be aware of what you are doing, because it can result in data loss.
The reset command moves the head of the current branch to the specified commit. This means we can reset a branch to an older commit and drop the newer commits from it. The command has different modes, and the three main ones are --soft, --mixed and --hard.
Soft mode only moves the branch pointer in the local repository. Mixed mode does the same and also resets the staging index to match that commit. Hard mode does both and additionally resets the working directory.
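The three modes can be compared side by side by looking at git status --porcelain after each step. A minimal sketch in a throwaway repository with a placeholder identity; M in the first column means the change is staged, M in the second column means it only exists in the working directory.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
echo "line 1" > f.txt && git add f.txt && git commit -q -m "Add line 1"
echo "line 2" >> f.txt && git commit -q -am "Add line 2"

git reset -q --soft HEAD~        # branch points to "Add line 1", change stays staged
soft=$(git status --porcelain)   # "M  f.txt"
git reset -q --mixed HEAD        # staging index reset too: change only on disk
mixed=$(git status --porcelain)  # " M f.txt"
git reset -q --hard HEAD         # working directory reset too: line 2 is gone
hard=$(git status --porcelain)   # empty
cat f.txt
```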
echo "Extend feature 1" >> f1.txt && echo "Extend feature 2" >> f2.txt
git add f1.txt && git commit -m "Extend feature 1"
git log -2 --oneline
git add f2.txt && git status
After extending both features and committing only the change to feature 1, we reset the branch head to the second-to-last commit. Because we want to keep the staging index as it is, the mode --soft is all we need.
git reset --soft HEAD~
git status
git log -2 --oneline
As the two commands above show, the last commit is "undone", and a new one can be created. To edit the last commit, the command git commit has the option --amend for exactly this use case. It is handy when, for example, you forgot to add a file.
HEAD~ is one commit older than HEAD, and HEAD~2 would be two commits older. To be sure, the commit id can be used instead.
git commit -m "Extend feature 1 and 2 (second commit)"
git reset --mixed HEAD~ && git status
After resetting with --mixed, the working directory is still the same, but the files are no longer staged. This is the default mode, so the command is equivalent to git reset HEAD~.
git add f1.txt && git commit -m "Extend feature 1 (third time)"
cat f2.txt
git reset --hard HEAD~ && git status
cat f2.txt
The mode --hard resets the staging index and all tracked files in the working directory, so uncommitted changes to tracked files are lost. This command can result in data loss!
Resetting the pointer of a branch should only be done when the commits have not been pushed to a remote, or when no one else is working on this branch.
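The HEAD~ notation and git commit --amend can be tried in a throwaway repository as well. In this minimal sketch (placeholder identity), HEAD~2 and HEAD~1~1 name the same commit, and --amend --no-edit adds a forgotten file to the last commit without creating a new one. Like reset, amending rewrites the last commit, so it should only be used on commits that have not been pushed yet.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email demo@example.com
for n in 1 2 3; do
  echo "$n" >> n.txt && git add n.txt && git commit -q -m "Commit $n"
done

git rev-parse --short HEAD~2 HEAD~1~1   # the same id twice
git log -1 --format=%s HEAD~2           # Commit 1

echo "forgotten" > forgotten.txt
git add forgotten.txt
git commit -q --amend --no-edit         # rewrites "Commit 3" instead of adding a commit
git log --oneline
```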
Git is a huge topic, so hopefully you were able to get a good basic understanding. As mentioned several times, there are a ton of resources available. The following list contains recommendations for further reading about Git.