💾 Archived View for envs.net › ~nerosnm › blog › git-crash-course-part-1.gmi captured on 2021-12-06 at 14:29:53. Gemini links have been rewritten to link to archived content
I have few frustrations with the way my computer science degree is taught, but my biggest by far has been the conspicuous lack of practical skills such as Git.
I understand this is partly by design, because it's a computer science degree and not a software engineering degree, but only 1 or 2 months in, you start being assigned group projects that pretty much require version control. I've also heard this complaint about computer science degrees before, so with that in mind, here's a crash course.
My goal with this crash course is to get you to the point where you can contribute to a pair or group project effectively. I'm not aiming for you to understand the inner workings of Git, but it's important that I explain enough for you to build a basic mental model.
At some point, I'll write a tutorial on this too and link to it here, but for now you'll have to fend for yourself (sorry).
I'm very open to emails that correct this paragraph if needed.
Version control has two main purposes:
Git is the version control software that's most commonly used by software developers, and with good reason; when Git came around, it was able to replace a lot of other tools that are now widely regarded as having been much harder to use and more error-prone.
Lots of other software that you might have used has version control features, to a greater or lesser extent (e.g. Google Docs and Microsoft Word). Google Docs is a particularly visible example: multiple people can edit the same document at the same time and all of their changes are kept, and it does have a version history.
At this point, you might be thinking that another piece of software to do this is unnecessary, or an inconvenience. You might be thinking that you've been getting by just fine until now. In that case, here are just a few of the reasons why I think learning Git is worth the effort:
Personally, if I applied for a job and they didn't consider Git essential and require me to know how to use it, I would consider that a red flag.
With that out of the way, let's start by installing `git`. Once you've done that, make sure you configure your authorship information:
$ git config --global user.name "Your Name"
$ git config --global user.email youremail@example.com
You might also want to configure which editor Git should use. The default (if you haven't set up a default editor with `$EDITOR`) is probably `vim`, which could be frustrating if you don't know how to use it. For example, you could set the editor to Visual Studio Code:
$ git config --global core.editor "code --wait"
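If you'd like to check what Git actually stored, you can read the values back. This is only a sketch: it points `HOME` at a throwaway directory (my own assumption, so that experimenting can't touch your real `~/.gitconfig`), and the name and email are the same placeholders as above.

```shell
set -eu

# Assumption for this sketch: use a throwaway HOME so the real global
# configuration is left alone. Drop these two lines to work on your own.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "Your Name"
git config --global user.email youremail@example.com

# Passing only a key (no value) reads the setting back.
name=$(git config --global user.name)
email=$(git config --global user.email)
echo "$name <$email>"

# List everything stored in the global configuration file.
git config --global --list
```

Without the sandboxing lines, the last command lists whatever is in your own global configuration.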
You can find more information about first-time setup and setting your text editor here:
Git works by designating a particular folder as a "repository", which means that it will track changes to any files in that folder or its children.
I'll put commands throughout this article that you should run to follow along. To start, create a folder, move inside, and designate it as a Git repository:
$ mkdir git-crash-course
$ cd git-crash-course
$ git init .
Initialized empty Git repository in /Users/soren/src/localhost/git-crash-course/.git/
`git` has created a folder called `.git/` inside `git-crash-course/`. This is where it will store information about the history of changes you've made to the contents of this repository.
In order to understand how Git tracks changes, we need to actually make some changes first. Create a file at the root of the repository with some text in it:
$ echo "Hello, world!" > foo.txt
$ ls -a
./ ../ .git/ foo.txt
When you make changes to a file in a repository and then run `git`, it compares the state of the files with their state at a specific point in the past. You can ask about what changes it sees:
$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        foo.txt

nothing added to commit but untracked files present (use "git add" to track)
There's a lot of information here, but what we care about is the section starting with "Untracked files". Git is telling us that it sees a file (foo.txt) that hasn't yet had any changes recorded. Because it hasn't seen this file before, it's labelled as untracked. To start tracking it, we need to create a **commit**.
A commit is simply a bundle of changes, packaged up together and given a message to describe what those changes represent. This is the basic unit of organisation in Git's model of changes.
Let's create a commit for the change we made. The first thing to do is designate which changes to include in the commit:
$ git add foo.txt
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   foo.txt
Now, the changes we made to `foo.txt` have been designated as "to be committed", more commonly referred to as "staged". Only changes that are staged will be included when we create a commit.
To be precise, the change that's been staged is the creation of the file `foo.txt` with its current contents.
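To see that staging really is selective, here's a small sketch you can run in a scratch directory. The file names `a.txt` and `b.txt` and the temporary repository are made up for the illustration; they aren't part of the walkthrough.

```shell
set -eu

# Work in a throwaway repository so nothing else is affected.
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "one" > a.txt
echo "two" > b.txt

# Stage only one of the two new files.
git add a.txt

# Files whose changes are staged (will be part of the next commit).
staged=$(git diff --cached --name-only)
# Files Git can see but is not tracking yet.
untracked=$(git ls-files --others --exclude-standard)

echo "staged:    $staged"
echo "untracked: $untracked"
```

Only `a.txt` would end up in a commit created at this point; `b.txt` stays untracked until you `git add` it as well.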
Now, let's create a commit to package up those changes and describe them:
$ git commit -m "Add foo.txt"
[master (root-commit) ffbbc5a] Add foo.txt
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt
We've used the `-m` flag to provide a message describing the commit. Convention dictates that commit messages are phrased in the _imperative_ mood. In other words, say "add foo.txt", not "added foo.txt" or "adds foo.txt".
This makes more sense if you think about the history of commits in a repository as describing the steps required to go from an empty folder to one containing all of the files in their current state. From that perspective, each commit is an _instruction_ to Git to perform one of those steps in the change history: so we tell Git _to add_ foo.txt, rather than telling it _that we added_ foo.txt.
If we run `git status` again, we see that, now that all the changes have been committed, we have what's called a "clean working tree": there aren't any changes in any files that haven't been committed.
$ git status
On branch master
nothing to commit, working tree clean
Let's make a couple more commits. First, add a second file:
$ echo "See foo.txt for a message!" > bar.txt
$ git add bar.txt
$ git commit -m "Add some instructions"
[master 18e31b4] Add some instructions
 1 file changed, 1 insertion(+)
 create mode 100644 bar.txt
Now, make some changes to an existing file, and notice the different output of `git status` after you make the change:
$ echo "Here's another line." >> foo.txt
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   foo.txt

no changes added to commit (use "git add" and/or "git commit -a")
You'll notice this time that the changes are marked as "not staged for commit", rather than having files listed as "untracked". That's because Git is comparing the current state of our files to the last commit, and since `foo.txt` already existed in the last commit, it recognises that all we're doing is changing some lines inside the file.
If you want to see exactly what those changes are, run `git diff`, and you'll see something like this:
diff --git a/foo.txt b/foo.txt
index af5626b..bdfc7b8 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1,2 @@
 Hello, world!
+Here's another line.
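One thing that can be surprising: plain `git diff` only shows changes that are _not yet staged_. Once a change is staged, it shows up under `git diff --staged` instead. Here's a sketch in a throwaway repository (the temporary directory and the "Example User" identity are assumptions for the example):

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
# Local identity so the commit works even on an unconfigured machine.
git config user.name "Example User"
git config user.email user@example.com

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

echo "Here's another line." >> foo.txt

# Before staging: the change shows up in plain `git diff`.
unstaged_before=$(git diff --name-only)

git add foo.txt

# After staging: plain `git diff` is empty, `git diff --staged` has it.
unstaged_after=$(git diff --name-only)
staged_after=$(git diff --staged --name-only)

echo "unstaged before add: $unstaged_before"
echo "unstaged after add:  $unstaged_after"
echo "staged after add:    $staged_after"
```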
Now stage and commit this.
$ git add .
$ git commit -m "Update foo.txt"
[master ed1fa47] Update foo.txt
 1 file changed, 1 insertion(+)
You can pass `git add` a path instead of individual filenames to add all changes in the folder at that path. In this case, I've used `.`, which is the current directory, to just add _all_ changes in the repository.
Now that we have a few commits, we can look at the history of this repository:
$ git log
commit ed1fa470d5d6707161d88672cd6230f2faabac92 (HEAD -> master)
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:05:21 2020 +0100

    Update foo.txt

commit 18e31b4ced4f7f9ec055748b7a45f562000d2217
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:03:23 2020 +0100

    Add some instructions

commit ffbbc5a3d42476a706895c974a4144406410bbfa
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:02:18 2020 +0100

    Add foo.txt
Each commit is given a SHA-1 hash, computed from its contents, that uniquely identifies it.
The full hash is 40 hexadecimal characters long. Almost all the time, though, the first 7 characters are more than enough to identify a specific commit within the context of one repository, which is why both `git` and its users often just use that portion of the hash to refer to commits.
Unfortunately, the raw `git log` output isn't a particularly helpful representation of how these commits relate to each other. Try picturing something more like this:
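You can ask Git for the abbreviated form of a hash, and hand an abbreviated hash back to any command that expects a commit. A quick sketch in a scratch repository (the repository and identity are placeholders for the example, and your hashes will differ from mine):

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email user@example.com

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# The full hash of the current commit, and Git's abbreviation of it.
full=$(git rev-parse HEAD)
short=$(git rev-parse --short HEAD)

# The abbreviation resolves back to the same commit.
resolved=$(git rev-parse "$short")

echo "full:     $full"
echo "short:    $short"
echo "resolved: $resolved"

# A compact history view that uses abbreviated hashes.
git log --oneline
```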
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
I've chosen to illustrate the commits this way because Git represents a repository's commit history internally as a _linked list_: each commit is a node that points to its parent (the commit before it).
In fact, Git goes one step further, with the concept of a **branch**. But once you know that the commit history is stored as a linked list, branches are much easier to understand: a branch is just a particular portion of that linked list, stored as a pointer to the most recent child node.
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
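You can poke at both of these ideas directly. The sketch below makes two commits in a scratch repository, then checks that the second commit records the first as its parent and that the branch is stored as nothing more than a commit hash. It assumes Git's default on-disk layout, where a freshly updated branch lives in a small file under `.git/refs/heads/`; the repository, file names and identity are made up for the example.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email user@example.com

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
first=$(git rev-parse HEAD)

echo "See foo.txt for a message!" > bar.txt
git add bar.txt
git commit -q -m "Add some instructions"

# The default branch may be called master or main depending on your setup.
branch=$(git symbolic-ref --short HEAD)

# Each commit points at its parent: HEAD^ means "the parent of HEAD".
parent=$(git rev-parse HEAD^)

# A branch is just a pointer: the ref file holds the newest commit's hash.
tip=$(git rev-parse "$branch")
stored=$(cat ".git/refs/heads/$branch")

echo "parent of newest commit: $parent"
echo "branch tip:              $tip"
echo "contents of ref file:    $stored"

# The raw commit object, including its "parent" line.
git cat-file -p HEAD
```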
Git calls the default branch `master`, and it's treated as the authoritative version of the codebase. But when new code is being written, it's important to be able to make changes and not have to worry about messing up the nice, clean version on `master`, so Git allows us to create other branches.
If we run the following, we create a new branch, pointing to exactly the same commit (called its head commit, in exactly the same sense as the head of a linked list):
$ git checkout -b new-feature
Switched to a new branch 'new-feature'
This is a bit of a shortcut: that command both creates and switches to the new branch, which is why the base command is `git checkout` (the command used for switching branches). If you wanted to do those steps separately, you could run `git branch new-feature` to create it, and then `git checkout new-feature` to switch to it.
Now our commit history looks like this:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^
                                   |
                              new-feature
We can also add into the mix the concept of `HEAD`, which refers to whatever branch the repository is currently on. This is what determines the state of the files on disk; if you switch branches, Git actually changes the contents of the files to reflect that.
Because the command we just ran both created _and_ switched to the new branch, our repository is in this state:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^
                                   |
                              new-feature <--- HEAD
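If you're curious where `HEAD` lives, it's a small text file inside `.git/`. While you're on a branch, it contains the name of that branch rather than a commit hash. A sketch, again in a scratch repository with a placeholder identity:

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email user@example.com

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Create the branch and switch to it in one step.
git checkout -q -b new-feature

# HEAD is stored as a reference to the current branch.
head_file=$(cat .git/HEAD)
# The same information, as Git itself reports it.
current=$(git symbolic-ref --short HEAD)

echo "contents of .git/HEAD: $head_file"
echo "current branch:        $current"
```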
Now, let's add some commits to this new branch! Create a new file or modify an existing one and commit it, as we did above.
$ echo "New feature 1" > new-feature.txt
$ git add .
$ git commit -m "Add a new feature"
[new-feature 1b8de27] Add a new feature
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature.txt
The repository will now look something more like this:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                                  ^
                                                  |
                                             new-feature <--- HEAD
If you switch back to `master`, you'll notice that your changes from `new-feature` aren't there.
$ git checkout master
Switched to branch 'master'
$ cat new-feature.txt
cat: new-feature.txt: No such file or directory
Of course, at some point, we will want to integrate those changes back in! First, though, let's add some more changes on another branch. Switch to `master`, then create a new branch called `new-feature-2` and commit something to it.
$ git checkout master
Already on 'master'
$ git checkout -b new-feature-2
Switched to a new branch 'new-feature-2'
$ echo "New feature 2" > new-feature-2.txt
$ git add .
$ git commit -m "Add another new feature"
[new-feature-2 8d5f335] Add another new feature
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature-2.txt
The repository will now be in this state:
                                master       new-feature
                                   |              |
                                   v              v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2 <--- HEAD
Now we're ready to integrate the changes from `new-feature` back into `master`. Switch to the branch we're going to merge _into_ (in this case `master`), and run `git merge new-feature` to merge `new-feature` into it.
$ git checkout master
Switched to branch 'master'
$ git merge new-feature
Updating ed1fa47..1b8de27
Fast-forward
 new-feature.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature.txt
Git has performed what's called a "fast-forward merge", which means that the only action needed to merge the changes from `new-feature` into `master` was to move the `master` pointer forward through the commit history so it points at the same commit as `new-feature`.
                                             new-feature, master <--- HEAD
                                                  |
                                                  v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2
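A fast-forward is only possible when the branch you're on is an ancestor of the branch you're merging in, and you can check that before merging. The sketch below does so in a scratch repository, then merges with `--ff-only`, which makes Git refuse rather than fall back to another kind of merge. The repository, file names and identity are placeholders for the example.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email user@example.com

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# The default branch may be called master or main depending on your setup.
main=$(git symbolic-ref --short HEAD)

git checkout -q -b new-feature
echo "New feature 1" > new-feature.txt
git add new-feature.txt
git commit -q -m "Add a new feature"

git checkout -q "$main"

# Exit status 0 means "$main is an ancestor of new-feature",
# i.e. merging new-feature only needs to move the pointer forward.
if git merge-base --is-ancestor "$main" new-feature; then
    echo "fast-forward is possible"
fi

# --ff-only refuses to merge if a fast-forward is not possible.
git merge -q --ff-only new-feature

main_tip=$(git rev-parse "$main")
feature_tip=$(git rev-parse new-feature)
echo "$main:       $main_tip"
echo "new-feature: $feature_tip"
```

After the merge, both branches point at the same commit and no new commit has been created.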
We can now delete the `new-feature` branch, since we're finished with it:
$ git branch --delete new-feature
Deleted branch new-feature (was 1b8de27).
Now let's merge `new-feature-2` into `master`.
$ git merge new-feature-2
At this point, a text editor will open. If you told Git which editor you'd like it to use earlier, then it should have opened a text file in that editor. This is the same behaviour that you'll see if you run `git commit` without the `-m` flag and a message; it is asking you to enter a commit message for a commit it is creating.
Because Git is responsible for the creation of this new commit, it has helpfully autogenerated a message for you, like this:
Merge branch 'new-feature-2' into master
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
That message should be fine, so all you need to do is save and close the file.
If you didn't set an editor, the editor that opened is probably `vim`. `vi` and its descendants are a whole other can of worms, so because this is a crash course, I'll tell you only what you need to know: type a colon (`:`), enter `wq` into the line at the bottom of the screen that opens, and press enter. This will save the file and quit.
If you pressed any other keys before trying this, you might have to press the escape key a few times before `:wq`.
After that, Git will spit out a message that looks something like this:
Merge made by the 'recursive' strategy.
 new-feature-2.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature-2.txt
So why did that merge play out in such a different way than the last one? Well, take a look at the state of the repository after the merge:
                                                       HEAD ---> master
                                                                    |
                                                                    v
+---------+    +---------+    +---------+    +---------+       +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |<--+---| 7d90c55 |
+---------+    +---------+    +---------+    +---------+   |   +---------+
                                   ^                       |
                                   |         +---------+   |
                                   +---------| 8d5f335 |<--+
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2
Because we merged in a branch whose history had previously diverged from that of the current branch, Git has been forced to create a new commit, called a "merge commit", to integrate together the two sets of changes. The merge commit, unlike the other commits we've seen so far, has _two_ parents (the heads of the two diverged branches we merged).
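You can confirm the two parents for yourself. This sketch recreates a diverged history in a scratch repository, merges it, and prints the parents of the resulting merge commit. I've passed `--no-edit` so that Git accepts the autogenerated message without opening an editor; the repository, file names and identity are placeholders for the example.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email user@example.com

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
main=$(git symbolic-ref --short HEAD)

# One commit on a side branch...
git checkout -q -b new-feature-2
echo "New feature 2" > new-feature-2.txt
git add new-feature-2.txt
git commit -q -m "Add another new feature"

# ...and one on the original branch, so the two histories diverge.
git checkout -q "$main"
echo "New feature 1" > new-feature.txt
git add new-feature.txt
git commit -q -m "Add a new feature"

main_before=$(git rev-parse HEAD)
feature_tip=$(git rev-parse new-feature-2)

# --no-edit keeps the autogenerated merge message without opening an editor.
git merge -q --no-edit new-feature-2

# %P prints the hashes of the commit's parents, separated by spaces.
parents=$(git show -s --format=%P HEAD)
echo "parents of the merge commit: $parents"
```

The first parent is the commit the current branch was on before the merge, and the second is the head of the branch that was merged in.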
There are ways to avoid merge commits (look into `git rebase`), but you should become more familiar with Git before attempting to mess around with them, and the reasons for doing this are mostly aesthetic.
Now that we're finished with `new-feature-2`, delete it too.
$ git branch --delete new-feature-2
Deleted branch new-feature-2 (was 8d5f335).
When you don't have someone like me around to draw ASCII art of your branches all day, it's nice to have a way of visualising your commit history that reflects this structure a bit better. Feel free to add this line to the `[alias]` section of your `.gitconfig` file (found in your home directory), which will allow you to run `git hist` in place of `git log`. Commit histories will then include a visual representation of branches:
Remember what I said at the beginning about the two main purposes of version control?
Well, congratulations! If you've gotten this far, you've learned how to use Git to do the first of those two things. Give yourself a pat on the back.
I'm going to cover the second point in part 2 of this post, **coming soon**.