💾 Archived View for envs.net › ~nerosnm › blog › git-crash-course-part-1.gmi captured on 2022-04-28 at 18:05:57. Gemini links have been rewritten to link to archived content


Git Crash Course, Part 1

I have few frustrations with the way my computer science degree is taught, but my biggest by far has been the conspicuous absence of practical skills such as Git from the curriculum.

I understand this is partly by design, because it's a computer science degree and not a software engineering degree, but only a month or two in, you start being assigned group projects that pretty much require version control. I've also heard this complaint about computer science degrees before, so with that in mind, here's a crash course.

Goals & Assumptions

My goal with this crash course is to get you to the point where you can contribute to a pair or group project effectively. I'm not aiming for you to understand the inner workings of Git, but it's important that I explain enough for you to build a basic mental model.

At some point, I'll write a tutorial on this too and link to it here, but for now you'll have to fend for yourself (sorry). I'm very open to emails that correct this paragraph if needed.

Email Me

Version Control

What

Version control has two main purposes:

Git is the version control software that's most commonly used by software developers, and with good reason; when Git came around, it was able to replace a lot of other tools that are now widely regarded as having been much harder to use and more error-prone.

Lots of other software that you might have used has version control features to a greater or lesser extent (e.g. Google Docs and Microsoft Word). Google Docs is a particularly visible example: multiple people can edit the same document at the same time, all of their changes are kept, and it has a version history.

Why

At this point, you might be thinking that another piece of software to do this is unnecessary, or an inconvenience. You might be thinking that you've been getting by just fine until now. In that case, here are just a few of the reasons why I think learning Git is worth the effort:

Personally, if I applied for a job and they didn't consider Git essential and require me to know how to use it, I would consider that a red flag.

Getting Started

With that out of the way, let's start by installing `git`:

Installing Git

Once you've done that, make sure you configure your authorship information:

$ git config --global user.name "Your Name"
$ git config --global user.email youremail@example.com
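
If you'd like to check what Git has recorded, you can read the values back with `git config --get`. The sketch below is only an illustration: it works in a throwaway folder created with `mktemp` and leaves out `--global`, so the settings land in that one scratch repository rather than in your real configuration.

```shell
# Throwaway repository, so nothing here touches your real settings.
cd "$(mktemp -d)"
git init --quiet .

# Without --global, the values are stored only in this repository's own config.
git config user.name "Your Name"
git config user.email youremail@example.com

# Read back the values Git will use for commits made here.
git config --get user.name
git config --get user.email
```

Running the same two `--get` commands in any of your own repositories shows the values that apply there, including the ones you set with `--global`.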

You might also want to configure which editor Git should use. If you haven't set one up (for example with `$EDITOR`), Git usually falls back to `vi`, which on many systems is `vim` and could be frustrating if you don't know how to use it. For example, you could set the editor to Visual Studio Code:

$ git config --global core.editor "code --wait"

You can find more information about first-time setup and setting your text editor here:

First Time Setup

Setting an Editor

Repositories

Git works by designating a particular folder as a "repository", which means that it will track changes to any files in that folder or its subfolders.

I'll put commands throughout this article that you should run to follow along. To start, create a folder, move inside, and designate it as a Git repository:

$ mkdir git-crash-course
$ cd git-crash-course
$ git init .
Initialized empty Git repository in /Users/soren/src/localhost/git-crash-course/.git/

`git` has created a folder called `.git/` inside `git-crash-course/`. This is where it will store information about the history of changes you've made to the contents of this repository.
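
If you're curious, you can peek at that folder without changing anything. This is a small sketch in a scratch folder made with `mktemp` (an assumption of the example, not part of the walkthrough); the exact entries inside `.git/` can vary between Git versions.

```shell
# Throwaway repository for looking around.
cd "$(mktemp -d)"
git init --quiet .

# List Git's bookkeeping folder; typical entries include HEAD, config, objects and refs.
ls .git

# Ask Git whether the current folder is inside a repository's working tree.
git rev-parse --is-inside-work-tree
```

You'll rarely need to touch anything in `.git/` by hand; the `git` commands manage it for you.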

Changes

In order to understand how Git tracks changes, we need to actually make some changes first. Create a file at the root of the repository with some text in it:

$ echo "Hello, world!" > foo.txt
$ ls -a
./       ../      .git/    foo.txt

When you make changes to a file in a repository and then run `git`, it compares the state of the files with their state at a specific point in the past. You can ask about what changes it sees:

$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        foo.txt

nothing added to commit but untracked files present (use "git add" to track)

There's a lot of information here, but what we care about is the section starting with "Untracked files". Git is telling us that it sees a file (`foo.txt`) that hasn't yet had any changes recorded. Because it hasn't seen this file before, it's labelled as untracked. To start tracking it and record its contents, we need to create a **commit**.
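
If the full output feels noisy, `git status` also has a compact form. The sketch below recreates the same situation in a scratch folder (created with `mktemp`, purely for illustration) and asks for the short format, where `??` marks an untracked file.

```shell
# Throwaway repository containing one brand-new file.
cd "$(mktemp -d)"
git init --quiet .
echo "Hello, world!" > foo.txt

# --short prints one line per file; "??" means the file is untracked.
git status --short
```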

Commits

A commit is simply a bundle of changes, packaged up together and given a message to describe what those changes represent. This is the basic unit of organisation in Git's model of changes.

Let's create a commit for the change we made. The first thing to do is designate which changes to include in the commit:

$ git add foo.txt
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   foo.txt

Now, the changes we made to `foo.txt` have been designated as "to be committed", more commonly referred to as "staged". Only changes that are staged will be included when we create a commit.

To be precise, the change that's been staged is the creation of the file `foo.txt` with its current contents.
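
Staging is what lets you choose which changes go into a commit. As a rough sketch (in a scratch folder, with a hypothetical second file called `draft.txt` that isn't part of this walkthrough), staging one file leaves the other out:

```shell
# Throwaway repository with two new files.
cd "$(mktemp -d)"
git init --quiet .
echo "Hello, world!" > foo.txt
echo "Not ready yet" > draft.txt

# Stage only foo.txt; draft.txt stays untracked.
git add foo.txt

# List the files whose changes are staged for the next commit.
git diff --cached --name-only
```

Only `foo.txt` is listed, so a commit made at this point would not include `draft.txt`.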

Now, let's create a commit to package up those changes and describe them:

$ git commit -m "Add foo.txt"
[master (root-commit) ffbbc5a] Add foo.txt
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt

We've used the `-m` flag to provide a message describing the commit. Convention dictates that commit messages are phrased in the _imperative_ mood. In other words, say "add foo.txt", not "added foo.txt" or "adds foo.txt".

This makes more sense if you think about the history of commits in a repository as describing the steps required to go from an empty folder to one containing all of the files in their current state. From that perspective, each commit is an _instruction_ to Git to perform one of those steps in the change history: so we tell Git _to add_ foo.txt, rather than telling it _that we added_ foo.txt.

If we run `git status` again, we see that, now that all the changes have been committed, we have what's called a "clean working tree": no file contains changes that haven't been committed.

$ git status
On branch master
nothing to commit, working tree clean

Let's make a couple more commits. First, add a second file:

$ echo "See foo.txt for a message!" > bar.txt
$ git add bar.txt
$ git commit -m "Add some instructions"
[master 18e31b4] Add some instructions
 1 file changed, 1 insertion(+)
 create mode 100644 bar.txt

Now, make some changes to an existing file, and notice the different output of `git status` after you make the change:

$ echo "Here's another line." >> foo.txt
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   foo.txt

no changes added to commit (use "git add" and/or "git commit -a")

You'll notice this time that the changes are marked as "not staged for commit", rather than having files listed as "untracked". That's because Git is comparing the current state of our files to the last commit, and since `foo.txt` already existed in the last commit, it recognises that all we're doing is changing some lines inside the file.

If you want to see exactly what those changes are, run `git diff`, and you'll see something like this:

diff --git a/foo.txt b/foo.txt
index af5626b..bdfc7b8 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1,2 @@
 Hello, world!
+Here's another line.
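
One thing that can be surprising: plain `git diff` only shows changes that are _not yet staged_. The sketch below, run in a scratch folder created with `mktemp`, shows the change moving from one view to the other; `--staged` asks for the staged changes instead.

```shell
# Throwaway repository with one commit and one further edit.
cd "$(mktemp -d)"
git init --quiet .
git config user.name "Your Name"
git config user.email youremail@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit --quiet -m "Add foo.txt"
echo "Here's another line." >> foo.txt

git diff            # unstaged change: shows the added line
git add foo.txt
git diff            # prints nothing: the change is now staged
git diff --staged   # shows the staged change instead
```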

Now stage and commit this.

$ git add .
$ git commit -m "Update foo.txt"
[master ed1fa47] Update foo.txt
 1 file changed, 1 insertion(+)

You can pass `git add` a folder path instead of individual filenames to stage all changes inside that folder. In this case, I've used `.`, the current directory, which (since we're at the root of the repository) stages _all_ changes in the repository.
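
To see the effect of passing a folder, here's a small sketch in a scratch repository. The `docs/` folder and the file names are made up for the example.

```shell
# Throwaway repository with files inside and outside a subfolder.
cd "$(mktemp -d)"
git init --quiet .
mkdir docs
echo "Chapter 1" > docs/a.txt
echo "Chapter 2" > docs/b.txt
echo "Unrelated" > notes.txt

# Stage everything under docs/, but nothing outside it.
git add docs

# List what is staged: the two files in docs/, and not notes.txt.
git diff --cached --name-only
```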

Branches

Now that we have a few commits, we can look at the history of this repository:

$ git log
commit ed1fa470d5d6707161d88672cd6230f2faabac92 (HEAD -> master)
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:05:21 2020 +0100

    Update foo.txt

commit 18e31b4ced4f7f9ec055748b7a45f562000d2217
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:03:23 2020 +0100

    Add some instructions

commit ffbbc5a3d42476a706895c974a4144406410bbfa
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:02:18 2020 +0100

    Add foo.txt

Each commit is given a SHA-1 hash, computed from its contents, that uniquely identifies it.

Almost all the time, though, the first 7 characters are more than enough to identify a specific commit within the context of one repository, which is why both `git` and its users often just use that portion of the hash to refer to commits.
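
You can ask Git for the shortened form directly. The sketch below rebuilds the three commits in a scratch folder (so the hashes it prints will differ from the ones in this article) and shows the abbreviated hashes next to the full one.

```shell
# Throwaway repository with the same three commits as above.
cd "$(mktemp -d)"
git init --quiet .
git config user.name "Your Name"
git config user.email youremail@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit --quiet -m "Add foo.txt"
echo "See foo.txt for a message!" > bar.txt
git add bar.txt
git commit --quiet -m "Add some instructions"
echo "Here's another line." >> foo.txt
git add foo.txt
git commit --quiet -m "Update foo.txt"

# One line per commit: abbreviated hash, then the message.
git log --oneline

# The abbreviated and the full hash of the newest commit.
git rev-parse --short HEAD
git rev-parse HEAD
```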

Unfortunately, this output isn't a particularly helpful representation of how the commits relate to each other. Try picturing something more like this:

+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+

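I've chosen to illustrate the commits this way because Git represents a repository's commit history internally much like a _linked list_: each commit is a node that points to its parent (the commit before it). Strictly speaking, a commit can have more than one parent (as merge commits do), which makes the history a graph, but for a history like ours a linked list is the right picture.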
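
You can follow those parent pointers yourself. This is a minimal sketch in a scratch repository with two commits; `HEAD^` is Git's notation for "the parent of the current commit", and the `first` variable is just a name chosen for the example.

```shell
# Throwaway repository with two commits.
cd "$(mktemp -d)"
git init --quiet .
git config user.name "Your Name"
git config user.email youremail@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit --quiet -m "Add foo.txt"
first="$(git rev-parse HEAD)"   # remember the first commit's hash
echo "See foo.txt for a message!" > bar.txt
git add bar.txt
git commit --quiet -m "Add some instructions"

# The parent of the newest commit is the first commit.
git rev-parse HEAD^
echo "$first"

# For each commit, print its parent's short hash, an arrow, then its own short hash.
git log --format="%p <--- %h"
```

The very first commit has no parent, so its line starts with an empty space before the arrow.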

In fact, Git goes one step further, with the concept of a **branch**. Once you know that the commit history is stored as a linked list, branches are much easier to understand: a branch is just a particular portion of that linked list, stored as a pointer to the most recent child node.

                                 master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+

By default, Git calls the initial branch `master` (newer versions let you change this with the `init.defaultBranch` setting, and many hosting services use `main` instead), and by convention it's treated as the authoritative version of the codebase. But when new code is being written, it's important to be able to make changes without worrying about messing up the nice, clean version on `master`, so Git allows us to create other branches.

If we run the following, we create a new branch, pointing to exactly the same commit (called its head commit, in exactly the same sense as the head of a linked list):

$ git checkout -b new-feature
Switched to a new branch 'new-feature'

This is a bit of a shortcut: that command both creates and switches to the new branch, which is why the base command is `git checkout` (the command used for switching branches). If you wanted to do those steps separately, you could run `git branch new-feature` to create it, and then `git checkout new-feature` to switch to it.
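
Here's a sketch of both forms side by side, in a scratch repository. The second branch name, `another-feature`, is made up for the example. Either way, the new branch starts out pointing at the commit you were on.

```shell
# Throwaway repository with a single commit.
cd "$(mktemp -d)"
git init --quiet .
git config user.name "Your Name"
git config user.email youremail@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit --quiet -m "Add foo.txt"

# Two-step form: create the branch, then switch to it.
git branch new-feature
git checkout --quiet new-feature

# One-step form: create and switch in a single command.
git checkout --quiet -b another-feature

# Both branches point at the same commit as HEAD.
git rev-parse new-feature another-feature HEAD
```

Since Git 2.23 there is also `git switch -c <name>`, which does the same job as `git checkout -b <name>`.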

Now our commit history looks like this:

                                 master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^     
                                   |     
                              new-feature

We can also add into the mix the concept of `HEAD`, which refers to whatever branch the repository is currently on. This is what determines the state of the files on disk; if you switch branches, Git actually changes the contents of the files to reflect that.

Because the command we just ran both created _and_ switched to the new branch, our repository is in this state:

                                 master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^
                                   |
                              new-feature <--- HEAD
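
If you ever lose track of where `HEAD` is, you can ask Git. A minimal sketch, again in a scratch repository; the `start` variable is only there to show the name of the branch the repository began on, which depends on your configuration.

```shell
# Throwaway repository with a single commit.
cd "$(mktemp -d)"
git init --quiet .
git config user.name "Your Name"
git config user.email youremail@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit --quiet -m "Add foo.txt"

# The branch HEAD refers to right after the first commit.
start="$(git symbolic-ref --short HEAD)"
echo "$start"

# Create and switch to a new branch, then ask again.
git checkout --quiet -b new-feature
git symbolic-ref --short HEAD   # prints: new-feature
```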

Now, let's add some commits to this new branch! Create a new file or modify an existing one and commit it, as we did above.

$ echo "New feature 1" > new-feature.txt
$ git add .
$ git commit -m "Add a new feature"
[new-feature 1b8de27] Add a new feature
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature.txt

The repository will now look something more like this:

                                 master
                                   |
                                   v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                                  ^
                                                  |
                                             new-feature <--- HEAD

If you switch back to `master`, you'll notice that your changes from `new-feature` aren't there.

$ git checkout master
Switched to branch 'master'
$ cat new-feature.txt
cat: new-feature.txt: No such file or directory

Merging

Of course, at some point, we will want to integrate those changes back in! First, though, let's add some more changes on another branch. Switch to `master`, then create a new branch called `new-feature-2` and commit something to it.

$ git checkout master
Already on 'master'
$ git checkout -b new-feature-2
Switched to a new branch 'new-feature-2'
$ echo "New feature 2" > new-feature-2.txt
$ git add .
$ git commit -m "Add another new feature"
[new-feature-2 8d5f335] Add another new feature
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature-2.txt

The repository will now be in this state:

                                 master       new-feature
                                   |              |
                                   v              v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2 <--- HEAD
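
Git can confirm this picture for you. The sketch below rebuilds a similar forked history in a scratch repository (with one starting commit rather than three, to keep it short) and asks for the most recent commit that both feature branches share. The `start` and `fork` variables are names chosen for the example.

```shell
# Throwaway repository with one commit on the starting branch.
cd "$(mktemp -d)"
git init --quiet .
git config user.name "Your Name"
git config user.email youremail@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit --quiet -m "Add foo.txt"
start="$(git symbolic-ref --short HEAD)"   # name of the starting branch
fork="$(git rev-parse HEAD)"               # commit both branches will start from

# First feature branch, with one commit of its own.
git checkout --quiet -b new-feature
echo "New feature 1" > new-feature.txt
git add .
git commit --quiet -m "Add a new feature"

# Second feature branch, started from the same place.
git checkout --quiet "$start"
git checkout --quiet -b new-feature-2
echo "New feature 2" > new-feature-2.txt
git add .
git commit --quiet -m "Add another new feature"

# The most recent commit shared by both branches: the one they forked from.
git merge-base new-feature new-feature-2
echo "$fork"

# Draw every branch as a text graph.
git log --oneline --graph --all
```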

Now we're ready to integrate the changes from `new-feature` back into `master`. Switch to the branch we're going to merge _into_ (in this case `master`), and run `git merge new-feature` to merge `new-feature` into it.

$ git checkout master
Switched to branch 'master'
$ git merge new-feature
Upda