💾 Archived View for d.moonfire.us › blog › 2014 › 10 › 05 › reorganization-git-story-repo captured on 2023-07-10 at 15:04:34. Gemini links have been rewritten to link to archived content
As some of you may know, I use Git[1] to organize my writing. After years of accidentally overwriting a good chapter with an old one or trying to coordinate changes from two separate machines, I got into source control for writing; it worked for programming, why not my novels?
Well, I've had a couple of iterations of trying to get the “perfect” Git setup for my writing. Earlier this year, I broke the novels out into submodules but left the bulk of my writing in the main repository (called `stories`). This meant I had `sand-and-blood`, `sand-and-ash`, and `sand-and-bone` repositories as submodules in the appropriate location of the `stories` repo (dmoonfire/fedran, if you are curious).
My reason came up while I was working on the Sand and Blood[2] covers. Since I checked in as I went, the repository quickly became too large for my website to handle. I could download a repo of up to 50 MiB without too much trouble, but when it got into the 900 MiB range, I couldn't clone the repository anymore.
I had already worked with submodules[3] before, so I thought they would be a perfect fit for the novels. I spent a pair of nights pulling the five current WIP novels out into submodules, mainly by cloning the repo and using various commands to carve them out. It also took a while because I have a lot of project branches (41 beyond `master`), which represent every work-in-progress or semi-completed work I've done. Pulling out binaries from every branch was a painful process, to say the least.
3: http://git-scm.com/book/en/Git-Tools-Submodules
The submodule approach worked out fairly well, but I quickly found some of its limitations. Because of how Git implements submodules, a submodule inevitably shows up in other branches. It also creates additional work.
To give an example, assume I'm on my `sand-and-ash` branch and I'm happily working in the `dmoonfire/fedran/sand-and-ash` directory making changes. When I'm done, I commit them and push them up.
When I go up a level, to `dmoonfire/fedran`, I have to do a second commit to record the submodule's position in the `stories` repository. It was a little extra work, but it kept the two isolated.
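The two-commit routine can be sketched in a throwaway sandbox. This is only an illustration: the `novel-origin` repository, the file name, and the commit messages are made up, and the push is left out because the stand-in origin is not a bare repository.

```shell
#!/bin/sh
set -eu

# Work in a temporary directory so no real repository is touched.
demo=$(mktemp -d)
cd "$demo"

# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical stand-in for the novel's own repository.
git init -q novel-origin
git -C novel-origin commit -q --allow-empty -m "Start novel"

# The parent repository that holds everything else.
git init -q stories
cd stories
git commit -q --allow-empty -m "Start stories"

# Newer Git versions refuse local-path submodule clones unless allowed.
git -c protocol.file.allow=always submodule --quiet add \
    "$demo/novel-origin" dmoonfire/fedran/sand-and-ash
git commit -q -m "Add sand-and-ash submodule"

# Commit 1: inside the submodule.
echo "Chapter one" > dmoonfire/fedran/sand-and-ash/chapter-01.txt
git -C dmoonfire/fedran/sand-and-ash add chapter-01.txt
git -C dmoonfire/fedran/sand-and-ash commit -q -m "Write chapter one"

# The parent now lists the submodule path as modified.
git status --short

# Commit 2: record the submodule's new position in the parent.
git add dmoonfire/fedran/sand-and-ash
git commit -q -m "Update sand-and-ash position"
```

After the second commit, the parent's recorded pointer matches the submodule's `HEAD`, which is what keeps the two histories isolated but in step.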
The real problem came when I switched to the `sand-and-blood` branch. The directory `dmoonfire/fedran/sand-and-ash` is still there and pointing to a repository (the `sand-and-ash` one), but I have to tell the `sand-and-blood` branch about it; otherwise it will show as an untracked file.
I had two choices. The first was to add the `dmoonfire/fedran/sand-and-ash` directory to the `.gitignore` file of the `sand-and-blood` branch. (Okay, there are a lot of filenames in this post, sorry about that.)
The other was to add the submodule to the other branches so it didn't show as a change. That worked until I made another change to the submodule, and then I had to update it on every other branch to reflect the changes.
Last night, I got tired of jumping through the hoops of submodules. I realized the entire reason I wanted to isolate the novels was to handle the covers. So, I decided to make a `covers` repository instead, put it into the root of the `stories` working directory and then add it to the `.gitignore`. This means that the `stories` repository doesn't officially know about the `covers` repository, but I can still reference it via soft links into `covers`.
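A minimal sketch of that layout, run in a temporary directory. The cover file name, the link name `cover.png`, and the choice to commit the link itself are assumptions for illustration, not necessarily the exact setup described above.

```shell
#!/bin/sh
set -eu

# Work in a temporary directory so no real repository is touched.
demo=$(mktemp -d)
cd "$demo"

# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q stories
cd stories

# An independent repository nested in the root of the working directory.
git init -q covers
echo "placeholder image data" > covers/sand-and-blood.png
git -C covers add sand-and-blood.png
git -C covers commit -q -m "Add cover"

# Hide the nested repository from the outer one.
echo "/covers/" > .gitignore

# A relative soft link lets the novel directory reach its cover.
mkdir -p dmoonfire/fedran/sand-and-blood
ln -s ../../../covers/sand-and-blood.png \
    dmoonfire/fedran/sand-and-blood/cover.png

# Git stores the link itself, not the image it points at.
git add .gitignore dmoonfire
git commit -q -m "Ignore covers and link to the cover"
```

Because `stories` only tracks the link, the large binaries stay out of its history; the link simply dangles on a machine where `covers` has not been cloned.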
The advantage of this approach is that all the writing (actual words) is still managed in the same repository. This means when I switch branches, the stuff in the `sand-and-ash` branch (not repo now) goes away until I go back. There isn't any cruft dragging along between branches that has nothing to do with the current branch.
It isn't very elegant to have covers separated, but I only need covers when I'm formatting ebooks.
One of the side effects of breaking apart the repository and pulling the pieces back together is that I'm losing history data. I kept most of the commit histories intact, but now I can't really do a graph of total words written over a month or over time. Since I can't tell if anyone actually read my posts when I documented them, I decided to accept that loss.
I mentioned that splitting apart the repositories was a lot of work. When I combined them back together, I was preparing myself for a lot of work. Then, I found BFG Repo Cleaner[4]. This is a Scala (a language I don't know) tool that works better than `git filter-branch` in a lot of ways.
4: http://rtyley.github.io/bfg-repo-cleaner/
I ended up using BFG to remove most of the cover images from the repository along with the large files. This let me trim the final `stories` repository from 1.9 GiB to 20 MiB. The `covers` repository is at a nice 419 MiB, but that is also acceptable since I use it so infrequently.
If you have to remove files, directories, or large objects from your repository, it looks like BFG is something to seriously consider.
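For reference, a hedged sketch of a typical BFG run, following the steps in the tool's own documentation. The post does not say which options were used, so the function name `slim_repo`, the `1M` threshold, and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
set -eu

# slim_repo MIRROR_DIR BFG_JAR SIZE_LIMIT
# Hypothetical helper: strips large blobs from a mirror clone, then
# repacks so the space is actually reclaimed.
slim_repo() {
    mirror=$1
    bfg_jar=$2
    limit=$3

    # Rewrite history, dropping blobs larger than the limit.
    # BFG also offers --delete-files <glob> to match on file name.
    java -jar "$bfg_jar" --strip-blobs-bigger-than "$limit" "$mirror"

    # BFG only rewrites commits and refs; the old objects remain until
    # the reflog is expired and the repository is garbage collected.
    git -C "$mirror" reflog expire --expire=now --all
    git -C "$mirror" gc --quiet --prune=now --aggressive
}

# Runs only when three arguments are supplied, for example on a
# mirror made with `git clone --mirror`:
#   sh slim.sh stories.git bfg.jar 1M
if [ "$#" -eq 3 ]; then
    slim_repo "$1" "$2" "$3"
fi
```

By default BFG leaves the files in the latest commit alone, so anything that should disappear entirely has to be deleted and committed before the run.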
https://d.moonfire.us/blog/2014/10/05/reorganization-git-story-repo/