Archived View for danieljanus.pl › blog › en › 2021 › 07 › 01 › commit-groups › index.gmi captured on 2023-01-29 at 15:44:54. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Everyone 1 [1] and their dog 2 [2] loves Git. I know I do. It works, it’s efficient, it has a brilliant data model, and it sports every feature under the sun [3]. In 13 years of using it, I’ve never found myself needing a feature it didn’t have. Until recently.
But before I tell you about it, let’s talk about GitHub.
There are three groups of GitHub users, distinguished by how they prefer to merge pull requests:
Merge commit, squash, or rebase? There’s no single best answer to that question. A number of factors are at play in choosing the merge strategy: the type of the project, the size, workflow and preferences of the team, business considerations, and so on. You probably have your own preference if you’ve used GitHub to collaborate with a team.
I’ll talk for a while about the pros and cons of each approach. But first, let’s establish a setting. Imagine that your project has a “main” branch, from which a “feature” branch was created at one point. Since then, both branches have seen development, and now, after “feature” has undergone reviews and testing, it’s ready to be merged back to “main”:
Merge commits are the original answer that Git has to combining changes. A merge commit has two or more parents and brings in all the changes from them and their ancestors:
In this example, Git has created a new commit, number 9, that merges commits 6 and 8. The branch “main” now points to that new commit, and so contains all changes in the range 1–8.
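To make the “contains all changes” claim concrete, here is a minimal Python sketch of the commit graph. It is a toy model, not how Git is implemented, and the exact shape of the graph (with “feature” branching off “main” at commit 3) is my assumption, since only the commit numbers are given.

```python
# Toy commit graph: each commit number maps to the list of its parents.
# Assumed shape: "main" is 1-2-3-7-8, "feature" is 4-5-6 branching off at 3.
parents = {
    1: [], 2: [1], 3: [2],
    4: [3], 5: [4], 6: [5],   # "feature"
    7: [3], 8: [7],           # "main" before the merge
    9: [8, 6],                # the merge commit, with two parents
}

def ancestors(commit, parents):
    """Return every commit reachable from `commit` via parent links, excluding itself."""
    seen = set()
    stack = list(parents[commit])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

main = 9  # a branch is just a pointer to a commit
print(sorted(ancestors(main, parents)))
```

Walking the parent links from commit 9 visits every commit from 1 to 8, which is all that “main contains those changes” means in this model.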
Merge commits are extremely versatile and scale well, especially for complicated workflows with multiple maintainers, each responsible for a different part of the code; for example, they’re pervasively used by the Linux kernel developers. However, for small, agile teams (especially in the business context), they can be overkill and pose potential problems.
In such a team, you typically have one eternal branch, from which production releases are made, and to which people merge changes from short-lived feature branches. In such a setting, it’s hard to tell how the history of a project has progressed. GitFlow [4], a popular way of working with Git, advocates merge commits everywhere, and people are struggling with it [5].
I’ll refer you to the visual argument from that last post:
Setting aside the fact that this history is littered with merge commits, the author makes a point that with this kind of an entangled graph, it’s practically impossible to find anything in it. Whether that’s true or not I’ll leave for you to decide, but there’s definitely a case for linear history there.
There’s another, oft-overlooked quirk here. Quick: look again at the second image above, the one with merge commit number 9. Can you tell, from the image alone, which commit was the tip of “main” before the merge happened? Surely it must be 8, because it’s on the gray line, right?
Yeah: on the image. But when you look at the merge commit itself, it’s not that obvious. Under the hood, all the commit really says is:
Merge: 8 6
So it tells you that these two parents have been merged together, but it doesn’t tell you which one used to be main. You might guess 8, because it’s the leftmost one, but you don’t know for sure. (Remember, branches in Git are just pointers to commits.) The only way (that I know of) to be sure is to use the reflog [6], but that is ephemeral: Git occasionally prunes old entries from reflogs.
So this prevents you from being able to confidently answer questions such as: “which features were released over the given time period?”, or “what was the state of ‘main’ as of a given date?”.
That’s also why you can’t “git revert” a merge commit, unless you tell Git (with the “-m” option) which of the parent commits you want to keep and which to discard.
In the merge commit-based approach, we don’t rewrite history: once a commit is made, it stays; the repository only grows by accretion. In contrast, the other two approaches use Git’s facilities for rewriting history. As we’ll see, the fundamentals are the same: where they differ is commit granularity.
Coming back to our example: when squashing, we mash together the changes introduced by commits 4, 5, and 6 into a single commit (“S”), and then replay that commit on top of “main”.
The “feature” branch is still there, but I didn’t include it in this picture because it’s no longer relevant: it typically gets deleted upon merge (which, as we will see, might not actually be a good idea).
There’s a lot to like about this approach, and some teams [7] advocate for it [8]. The biggest and most obvious benefit is likely that the history becomes very legible. It’s linear and there’s a one-to-one correspondence between commits on “main” and pull requests (and, mostly, either features or bugfixes). Such a history can be of great help in project management: it becomes very easy to answer the questions which were nigh impossible to answer in the merge-commit approach.
This situation is similar to the previous one, except that we don’t squash commits 4–6 together. Instead, we directly replay them on top of “main”.
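The difference in granularity between the two history-rewriting approaches can be sketched in a few lines of Python. This is a toy model under loud assumptions: a commit is just a name plus a mapping from file name to new content (a crude stand-in for a real diff), and all file names and contents are hypothetical.

```python
# Each commit is (name, changes); `changes` maps a file name to its new
# content. File names and contents are made up for illustration.
main = [("1", {"a.txt": "v1"}), ("2", {"b.txt": "v1"}), ("3", {"a.txt": "v2"}),
        ("7", {"c.txt": "v1"}), ("8", {"b.txt": "v2"})]
feature = [("4", {"d.txt": "v1"}), ("5", {"d.txt": "v2"}), ("6", {"e.txt": "v1"})]

def squash_and_merge(main, feature, name="S"):
    """Fold all feature commits into one commit and append it to main."""
    combined = {}
    for _, changes in feature:
        combined.update(changes)  # later commits override earlier ones
    return main + [(name, combined)]

def rebase_and_merge(main, feature):
    """Replay each feature commit on top of main, one by one."""
    # Replayed commits get new identities (marked with a prime here),
    # because their parent has changed.
    return main + [(name + "'", dict(changes)) for name, changes in feature]

def snapshot(history):
    """Apply every commit in order and return the resulting file contents."""
    files = {}
    for _, changes in history:
        files.update(changes)
    return files

squashed = squash_and_merge(main, feature)
rebased = rebase_and_merge(main, feature)
print([name for name, _ in squashed])
print([name for name, _ in rebased])
print(snapshot(squashed) == snapshot(rebased))
```

Both histories are linear and end in the same file contents; they differ only in whether the feature is recorded as one commit or as three.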
Let me start with a long digression. You might guess, from the GitHub screenshot at the top of this post, that I’m in this camp, and you’d be right. In fact, I used to squash and merge feature branches, but I switched to the rebase-and-merge approach after introducing probably the single biggest improvement to the quality of my work over recent years:
I started writing meaningful commit messages [9].
In the not-too-distant past, my commit messages used to be one-liners, as evidenced, for example, in the history of Skyscraper [10]. These first lines haven’t changed much, but now I strive to augment them with an explanation of why the change is being made. When it fixes a bug, I explain what was causing it and how the change makes the bug go away; when it implements a feature, I highlight the specifics of the implementation. I might not write more code these days, but I certainly write more prose: it’s not uncommon for me to write two or three paragraphs about a +1/−1 change.
So my commit messages now look like this (I’m taking a recent random example from the Fy! [11] app’s repo):
app/tests: allow to mock config

Tests expected the code-push events to fire, but now that I’ve disabled CP in dev, and the tests are built with the dev aero profile, they’d fail.

This could have been fixed by building them with AERO_PROFILE=staging in CI, but it doesn’t feel right: I think tests shouldn’t depend on varying configuration. If a test requires a given bit of configuration to be present, it’s better to configure it that way explicitly. Hence this commit.

It adds a wrap-config mock and a corresponding :extra-config fixture, which, when present (and it is by default), will merge the value onto generated-config.
I’m very conscious about having a clean history. I’m aiming for each commit to be small (with the threshold at approximately +20/−20 LOC) and to introduce a coherent, logical change.
That’s not to say I always develop that way, of course. If you looked at a “git log” of my work-in-progress branch, chances are you’d see something like this:
5d64b71 wip
392b1e0 wip
0a3ad89 more wip
3db02d3 wip
But before declaring the PR ready to review, I’ll throw this history away (by “git reset --mixed $(git merge-base feature main)”) and re-commit the changes, dividing them into logical units and writing the rationales, bit by bit.
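What that command does can be sketched on a toy graph. “git merge-base” finds a best common ancestor of the two branches, and “git reset --mixed” moves the branch pointer there while leaving the working tree alone, so the work from the WIP commits shows up as uncommitted changes. The Python below is only an illustration of the idea, not Git’s algorithm, and the graph shape is an assumption.

```python
# Toy graph (hypothetical): "main" is 1-2-3-7-8, "feature" is 3-4-5-6.
parents = {1: [], 2: [1], 3: [2], 4: [3], 5: [4], 6: [5], 7: [3], 8: [7]}

def reachable(commit):
    """Return `commit` plus all of its ancestors."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(a, b):
    """Return a common ancestor of a and b that is not an ancestor of another common ancestor."""
    common = reachable(a) & reachable(b)
    best = [c for c in common
            if not any(c in reachable(d) - {d} for d in common)]
    return best[0] if best else None

base = merge_base(6, 8)  # feature tip and main tip
# Commits that the reset "un-commits": on feature, but not reachable from the base.
wip = sorted(reachable(6) - reachable(base))
print(base, wip)
```

In this model the base is commit 3, and commits 4, 5, and 6 are the ones whose changes get re-committed in logical units.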
The net result of rigorously applying this practice is that
you can do git annotate anywhere, and learn about why any line of code in the codebase is the way it is.
I can’t emphasize enough what a huge, huge impact this has on the developer’s wellbeing. These commit messages, when I read them back weeks or months later, working on something different but related, almost read as little love letters from me-in-the-past to me-now. They reduce the all-important WTFs/minute metric to zero.
They’re also an aid in reviewing code. My PR notes usually say “please read each commit in isolation.” I’ve found it easier to follow a PR when it tries to tell a story, and each commit is a milestone down that road.
Ending the digression: can you see why I prefer rebase-and-merge over squash-and-merge? Because, all the benefits notwithstanding, squashing irrevocably loses context.
Now, instead of each line being a result of a small, +20/−20 change, you can only tell that it’s part of a set of such changes: maybe ten of them, maybe fifty. You don’t know. Sure, you can go look in the original branch, but it’s an overhead, and what if it’s been deleted?
So yeah. Having those love letters all in place, each carefully placed and not glued to others, is just too much of a boon to let go. But it’s not to say that rebasing-and-merging is without downsides.
For example, it’s again hard to tell how many features were deployed over a given period of time. More troublesomely, it’s harder to revert changes: typically you want to operate on a feature level there. With squash-and-merge, it takes one “git revert” to revert a buggy feature. With rebase-and-merge, you need to know the range.
Worse yet: it’s more likely for a squashed-and-merged commit to be cleanly undone (or cherry-picked) than for a series of small commits. (I sometimes deliberately commit wrong or half-baked approaches that are changed in subsequent commits, just to tell the story more convincingly, and it’s possible that each of these changes individually causes trouble but that they cancel each other in squash.)
So I’m not completely happy with any of the three approaches. Which finally brings me to my preferred fourth approach, one that Git (yet?) doesn’t allow for:
You know the “group” facility of vector graphics programs? You draw a couple of shapes, you group them together, and then you can apply transformations to the entire group at once, operating on it as if it were an atomic thing. But when the need arises, you can “ungroup” it and look deeper.
That’s because sometimes there’s a need to have a “high-level” view of things, and sometimes you need to delve deeper. Each of these needs is valid. Each is prompted by different circumstances that we all encounter.
I’d love to see that same idea applied to Git commits. In Git, a commit group might just be a named and annotated range of commits: “feature-a” might be the same as “5d64b71..3db02d3”. Every Git command that currently accepts commit ranges could accept group names. I envision groups to have descriptions, so that “git log”, “git blame”, etc. could take “--grouped” or “--ungrouped” options and act appropriately.
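To show what I have in mind, here is a purely hypothetical Python sketch. Nothing like this exists in Git; the Commit and Group types, the log function, and all the hashes and messages are invented for illustration, and the sketch assumes a linear history with non-overlapping groups.

```python
from dataclasses import dataclass

@dataclass
class Commit:
    sha: str
    subject: str

@dataclass
class Group:
    name: str
    description: str
    start: str  # exclusive, like the left side of "start..end" in Git
    end: str    # inclusive

def members(history, group):
    """Return the commits in start..end, assuming a linear, oldest-first history."""
    shas = [c.sha for c in history]
    lo, hi = shas.index(group.start), shas.index(group.end)
    return history[lo + 1 : hi + 1]

def log(history, groups, grouped=True):
    """Return log lines, newest first; a grouped log collapses each group to one line."""
    owner = {c.sha: g for g in groups for c in members(history, g)}
    lines, emitted = [], set()
    for commit in history:
        group = owner.get(commit.sha)
        if grouped and group is not None:
            if group.name not in emitted:
                emitted.add(group.name)
                lines.append(f"[{group.name}] {group.description}")
        else:
            lines.append(f"{commit.sha} {commit.subject}")
    return lines[::-1]

# Hypothetical history, oldest first.
history = [
    Commit("a1", "init"),
    Commit("b2", "add parser"),
    Commit("c3", "parser: handle empty input"),
    Commit("d4", "parser: report line numbers"),
    Commit("e5", "bump version"),
]
feature_a = Group("feature-a", "Parser with error reporting", start="a1", end="d4")

print("\n".join(log(history, [feature_a], grouped=True)))
print("\n".join(log(history, [feature_a], grouped=False)))
```

The grouped log shows one line per feature, as squash-and-merge would, while the ungrouped log keeps every small commit and its message, as rebase-and-merge does.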
Obviously, details would need to be fleshed out (can groups overlap? can groups be part of other groups?), and I’m not familiar enough with Git innards to say with confidence that it’s doable. But the more I think about it, the more sound the idea seems to me.
I think creating a group when doing a rebase-and-merge could bring together the best of all three worlds, so that we can have all our cakes and eat them too.
1. Well, [12] almost [13] everyone [14].
2. It’s Dog Day here in Poland as I write these words. Happy Dog Day!