Archived view of danieljanus.pl › blog › en › 2021 › 07 › 01 › commit-groups › index.gmi, captured on 2023-01-29 at 15:44:54.

-=-=-=-=-=-=-

Things I wish Git had: Commit groups

Intro

Everyone 1 [1] and their dog 2 [2] loves Git. I know I do. It works, it’s efficient, it has a brilliant data model, and it sports every feature under the sun [3]. In 13 years of using it, I’ve never found myself needing a feature it didn’t have. Until recently.

But before I tell you about it, let’s talk about GitHub.

There are three groups of GitHub users, distinguished by how they prefer to merge pull requests:

Image

Merge commit, squash, or rebase? There’s no single best answer to that question. A number of factors are at play in choosing the merge strategy: the type of the project, the size, workflow and preferences of the team, business considerations, and so on. You probably have your own preference if you’ve used GitHub to collaborate with a team.

I’ll talk for a while about the pros and cons of each approach. But first, let’s establish a setting. Imagine that your project has a “main” branch, from which a “feature” branch was created at some point. Since then, both branches have seen developments, and now, after “feature” has undergone reviews and testing, it’s ready to be merged back to “main”:

Image

Create a merge commit

Merge commits are Git’s original answer to combining changes. A merge commit has two or more parents and brings in all the changes from them and their ancestors:

Image

In this example, Git has created a new commit, number 9, that merges commits 6 and 8. The branch “main” now points to that new commit, and so contains all changes in the range 1–8.
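To make “contains all changes in the range 1–8” concrete, here is a minimal sketch of the example as a toy commit graph. It is not how Git stores objects, and it assumes that “feature” branched off at commit 3 (the text only says that 4, 5 and 6 are on “feature” and that 6 and 8 are the merged tips).

```python
# Toy model of the commit graph from the example.
# Assumption: "feature" branched off at commit 3.
PARENTS = {
    1: [],
    2: [1],
    3: [2],
    4: [3], 5: [4], 6: [5],   # commits made on "feature"
    7: [3], 8: [7],           # commits made on "main"
    9: [8, 6],                # the merge commit: two parents
}

# Branches are just names pointing at commits.
branches = {"main": 9, "feature": 6}


def reachable(commit):
    """Return the commit together with all of its ancestors."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


# Everything "main" contains besides the merge commit itself.
print(sorted(reachable(branches["main"]) - {9}))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Following both parents of commit 9 is what pulls the whole “feature” line into “main”.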

Merge commits are extremely versatile and scale well, especially for complicated workflows with multiple maintainers, each responsible for a different part of the code; for example, they’re pervasively used by the Linux kernel developers. However, for small, agile teams (especially in a business context), they can be overkill and pose potential problems.

In such a team, you typically have one eternal branch, from which production releases are made, and to which people merge changes from short-lived feature branches. In such a setting, it’s hard to tell how the history of a project has progressed. GitFlow [4], a popular way of working with Git, advocates merge commits everywhere, and people are struggling with it [5].

I’ll refer you to the visual argument from that last post:

Image

Setting aside the fact that this history is littered with merge commits, the author makes the point that with this kind of entangled graph, it’s practically impossible to find anything in it. Whether that’s true or not I’ll leave for you to decide, but there’s definitely a case for linear history there.

There’s another, oft-overlooked quirk here. Quick: look again at the second image above, the one with merge commit number 9. Can you tell, from the image alone, which commit was the tip of “main” before the merge happened? Surely it must be 8, because it’s on the gray line, right?

Yeah: on the image. But when you look at the merge commit itself, it’s not that obvious. Under the hood, all the commit really says is:

Merge: 8 6

So it tells you that these two parents have been merged together, but it doesn’t tell you which one used to be main. You might guess 8, because it’s the leftmost one, but you don’t know for sure. (Remember, branches in Git are just pointers to commits.) The only way (that I know of) to be sure is to use the reflog [6], but that is ephemeral: Git occasionally prunes old entries from reflogs.

So this prevents you from being able to confidently answer questions such as: “which features were released over the given time period?”, or “what was the state of “main” as of a given date?”.

That’s also why you can’t “git revert” a merge commit unless you tell Git which of the parent commits you want to keep and which to discard.
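A small sketch shows why the choice of parent matters. What a merge “brought in” is only defined relative to one of its parents, which is the choice that “git revert -m <parent-number>” asks you to make. The model below is a toy: each commit stands for one change, and “feature” is assumed to have branched off at commit 3.

```python
# Toy model: every commit introduces exactly one change, named by its number.
# Assumption: "feature" branched off at commit 3, as in the running example.
PARENTS = {1: [], 2: [1], 3: [2], 4: [3], 5: [4], 6: [5],
           7: [3], 8: [7], 9: [8, 6]}


def reachable(commit):
    """Return the commit together with all of its ancestors."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def brought_in_by_merge(merge, mainline):
    """Changes the merge added relative to its N-th parent (1-based)."""
    base = PARENTS[merge][mainline - 1]
    return sorted(reachable(merge) - reachable(base) - {merge})


print(brought_in_by_merge(9, mainline=1))  # [4, 5, 6]: the feature's work
print(brought_in_by_merge(9, mainline=2))  # [7, 8]: main's own work
```

Reverting against parent 1 would undo the feature; reverting against parent 2 would undo what happened on “main” in the meantime. The merge commit alone does not say which of the two you mean.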

Squash and merge

In the merge commit-based approach, we don’t rewrite history: once a commit is made, it stays; the repository only grows by accretion. In contrast, the other two approaches use Git’s facilities for rewriting history. As we’ll see, the fundamentals are the same: where they differ is commit granularity.

Coming back to our example: when squashing, we mash together the changes introduced by commits 4, 5, and 6 into a single commit (“S”), and then replay that commit on top of “main”.

Image

The “feature” branch is still there, but I didn’t include it in this picture because it’s no longer relevant: it typically gets deleted upon merge (which, as we will see, might not actually be a good idea).
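As a rough illustration of what squashing does to the history, here is a toy model in which each commit carries a tuple of change labels. The names (Commit, squash, the “change-N” labels) and the branch point at commit 3 are assumptions made for this sketch, not Git internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    name: str
    parent: Optional["Commit"]
    changes: tuple  # labels of the changes this commit introduces


def chain(names, parent=None):
    """Build a linear run of commits on top of `parent`; return the tip."""
    for n in names:
        parent = Commit(n, parent, (f"change-{n}",))
    return parent


def history(tip):
    """Commit names from the root to `tip`."""
    out = []
    while tip is not None:
        out.append(tip.name)
        tip = tip.parent
    return out[::-1]


def squash(feature_tip, base, onto):
    """Fold every feature commit after `base` into one commit on `onto`."""
    changes = []
    c = feature_tip
    while c is not base:
        changes = list(c.changes) + changes
        c = c.parent
    return Commit("S", onto, tuple(changes))


c3 = chain(["1", "2", "3"])                 # assumed branch point
main = chain(["7", "8"], parent=c3)
feature = chain(["4", "5", "6"], parent=c3)

main = squash(feature, c3, main)
print(history(main))   # ['1', '2', '3', '7', '8', 'S']
print(main.changes)    # ('change-4', 'change-5', 'change-6')
```

The history of “main” stays linear, and S carries the combined changes, but the boundaries between commits 4, 5 and 6 are no longer visible from “main”.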

There’s a lot to like about this approach, and some teams [7] advocate for it [8]. The biggest and most obvious benefit is likely that the history becomes very legible. It’s linear and there’s a one-to-one correspondence between commits on “main” and pull requests (and, mostly, either features or bugfixes). Such a history can be of great help in project management: it becomes very easy to answer the questions which were nigh impossible to answer in the merge-commit approach.

Rebase and merge

This situation is similar to the previous one, except that we don’t squash commits 4–6 together. Instead, we directly replay them on top of “main”.

Image
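In the same toy terms, replaying can be sketched as creating one new commit per original commit, each on top of the previous one. The primed names (4', 5', 6') and the branch point at commit 3 are conventions assumed for this sketch; real Git identifies the replayed commits by new hashes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    name: str
    parent: Optional["Commit"]
    changes: tuple  # labels of the changes this commit introduces


def chain(names, parent=None):
    """Build a linear run of commits on top of `parent`; return the tip."""
    for n in names:
        parent = Commit(n, parent, (f"change-{n}",))
    return parent


def history(tip):
    """Commit names from the root to `tip`."""
    out = []
    while tip is not None:
        out.append(tip.name)
        tip = tip.parent
    return out[::-1]


def rebase(feature_tip, base, onto):
    """Replay each feature commit after `base`, oldest first, onto `onto`."""
    todo = []
    c = feature_tip
    while c is not base:
        todo.append(c)
        c = c.parent
    for c in reversed(todo):
        # A replayed commit keeps its changes but gets a new parent.
        onto = Commit(c.name + "'", onto, c.changes)
    return onto


c3 = chain(["1", "2", "3"])                 # assumed branch point
main = chain(["7", "8"], parent=c3)
feature = chain(["4", "5", "6"], parent=c3)

main = rebase(feature, c3, main)
print(history(main))  # ['1', '2', '3', '7', '8', "4'", "5'", "6'"]
```

Compared with squashing, the history is just as linear, but every original commit survives as its own entry.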

Let me start with a long digression. You might guess, from the GitHub screenshot at the top of this post, that I’m in this camp, and you’d be right. In fact, I used to squash and merge feature branches, but I switched to the rebase-and-merge approach after introducing probably the single biggest improvement to the quality of my work over recent years:

I started writing meaningful commit messages [9].

In the not-too-distant past, my commit messages used to be one-liners, as evidenced, for example, in the history of Skyscraper [10]. These first lines haven’t changed much, but now I strive to augment them with an explanation of why the change is being made. When it fixes a bug, I explain what was causing it and how the change makes the bug go away; when it implements a feature, I highlight the specifics of the implementation. I might not write more code these days, but I certainly write more prose: it’s not uncommon for me to write two or three paragraphs about a +1/−1 change.

So my commit messages now look like this (I’m taking a recent random example from the Fy! [11] app’s repo):

app/tests: allow to mock config

Tests expected the code-push events to fire, but now that I’ve
disabled CP in dev, and the tests are built with the dev aero profile,
they’d fail.

This could have been fixed by building them with AERO_PROFILE=staging
in CI, but it doesn’t feel right: I think tests shouldn’t depend on
varying configuration. If a test requires a given bit of configuration
to be present, it’s better to configure it that way explicitly.

Hence this commit. It adds a wrap-config mock and a corresponding
:extra-config fixture, which, when present (and it is by default),
will merge the value onto generated-config.

I’m very conscious about having a clean history. I aim for each commit to be small (with the threshold at approximately +20/−20 LOC) and to introduce a coherent, logical change.

That’s not to say I always develop that way, of course. If you looked at a “git log” of my work-in-progress branch, chances are you’d see something like this:

5d64b71 wip
392b1e0 wip
0a3ad89 more wip
3db02d3 wip

But before declaring the PR ready for review, I’ll throw this history away (with “git reset --mixed $(git merge-base feature main)”) and re-commit the changes, dividing them into logical units and writing the rationales, bit by bit.
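For readers unfamiliar with it, “git merge-base feature main” finds the best common ancestor of the two branches; resetting to it with “--mixed” moves the branch pointer back there while leaving the files in the working tree as they are, ready to be re-committed. A minimal sketch of the merge-base idea on a toy graph follows; the graph shape (branching at commit 3) is an assumption carried over from the earlier example.

```python
# Toy graph before any merge. Assumption: "feature" branched off at commit 3.
PARENTS = {1: [], 2: [1], 3: [2], 4: [3], 5: [4], 6: [5], 7: [3], 8: [7]}


def ancestors(commit):
    """Return the commit together with all of its ancestors."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def merge_bases(a, b):
    """Common ancestors that are not themselves ancestors of another one."""
    common = ancestors(a) & ancestors(b)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(d) for d in common)
    )


feature_tip, main_tip = 6, 8
print(merge_bases(feature_tip, main_tip))  # [3]
```

Everything on “feature” after that commit is what gets thrown away as history and re-committed in logical units.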

The net result of rigorously applying this practice is that

you can do git annotate anywhere, and learn about why any line of code in the codebase is the way it is.

I can’t emphasize enough how huge an impact this has on the developer’s wellbeing. These commit messages, when I read them back weeks or months later, working on something different but related, almost read as little love letters from me-in-the-past to me-now. They reduce the all-important WTFs/minute metric to zero.

Image

They’re also an aid in reviewing code. My PR notes usually say “please read each commit in isolation.” I’ve found it easier to follow a PR when it tries to tell a story, and each commit is a milestone down that road.

Ending the digression: can you see why I prefer rebase-and-merge over squash-and-merge? Because, all the benefits notwithstanding, squashing irrevocably loses context.

Now, instead of each line being a result of a small, +20/−20 change, you can only tell that it’s part of a set of such changes: maybe ten of them, maybe fifty. You don’t know. Sure, you can go look in the original branch, but that’s overhead, and what if it’s been deleted?

So yeah. Having those love letters all in place, each carefully placed and not glued to others, is just too much of a boon to let go. But that’s not to say that rebasing-and-merging is without downsides.

For example, it’s again hard to tell how many features were deployed over a given period of time. More troublesomely, it’s harder to revert changes: typically you want to operate on a feature level there. With squash-and-merge, it takes one “git revert” to revert a buggy feature. With rebase-and-merge, you need to know the range.

Worse yet: it’s more likely for a squashed-and-merged commit to be cleanly undone (or cherry-picked) than for a series of small commits. (I sometimes deliberately commit wrong or half-baked approaches that are changed in subsequent commits, just to tell the story more convincingly, and it’s possible that each of these changes individually causes trouble but that they cancel each other in squash.)

So I’m not completely happy with any of the three approaches. Which finally brings me to my preferred fourth approach, one that Git doesn’t (yet?) allow for:

Rebase, group and merge

You know the “group” facility of vector graphics programs? You draw a couple of shapes, you group them together, and then you can apply transformations to the entire group at once, operating on it as if it were an atomic thing. But when the need arises, you can “ungroup” it and look deeper.

That’s because sometimes there’s a need to have a “high-level” view of things, and sometimes you need to delve deeper. Each of these needs is valid. Each is prompted by different circumstances that we all encounter.

I’d love to see that same idea applied to Git commits. In Git, a commit group might just be a named and annotated range of commits: “feature-a” might be the same as “5d64b71..3db02d3”. Every Git command that currently accepts commit ranges could accept group names. I envision groups to have descriptions, so that “git log”, “git blame”, etc. could take “--grouped” or “--ungrouped” options and act appropriately.

Obviously, details would need to be fleshed out (can groups overlap? can groups be part of other groups?), and I’m not familiar enough with Git’s innards to say with confidence that it’s doable. But the more I think about it, the more sound the idea seems to me.
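To make the proposal a little more tangible, here is a minimal sketch of what a grouped log could look like. Nothing here exists in Git: the Group record, the log function, the commit ids and the subjects are all hypothetical, and the sketch takes the simplest answers to the open questions (linear history, groups that neither overlap nor nest).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str       # hypothetical short id
    subject: str


@dataclass(frozen=True)
class Group:
    name: str
    description: str
    first: str     # id of the oldest commit in the group
    last: str      # id of the newest commit in the group


def log(history, groups, grouped):
    """Render a linear history (oldest first), optionally collapsing groups."""
    if not grouped:
        return [f"{c.sha} {c.subject}" for c in history]
    index = {c.sha: i for i, c in enumerate(history)}
    spans = {index[g.first]: (index[g.last], g) for g in groups}
    lines, i = [], 0
    while i < len(history):
        if i in spans:
            end, g = spans[i]
            lines.append(f"[{g.name}] {g.description} ({end - i + 1} commits)")
            i = end + 1
        else:
            lines.append(f"{history[i].sha} {history[i].subject}")
            i += 1
    return lines


history = [
    Commit("a1", "bump deps"),
    Commit("b2", "add config mock"),
    Commit("c3", "use mock in tests"),
    Commit("d4", "fix typo"),
]
groups = [Group("feature-a", "Allow tests to mock config", "b2", "c3")]

for line in log(history, groups, grouped=True):
    print(line)
```

With grouped=False the same call lists all four commits individually, which is the “ungroup and look deeper” half of the idea.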

I think creating a group when doing a rebase-and-merge could bring together the best of all three worlds, so that we can have all our cakes and eat them too.

1. Well, [12] almost [13] everyone [14]. ↩ [15]

2. It’s Dog Day here in Poland as I write these words. Happy Dog Day! ↩ [16]

Links

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]