2024-08-11 Serving bare git on the web

In the old days, I used `cgit` to render my git repositories on the web. It's simple to set up since it's a CGI script. This is ideal for URLs that get very few hits. When nobody is requesting the URL, the CGI script isn't running and no resources are being used. When a URL is requested, however, the CGI script loads, the interpreter loads, the libraries load, the script executes… It's an expensive end-point! And you know how it is. The web is full of leeches and bad bots, crawlers and idiots. Having an expensive end-point means it needs protection.

For a while I thought that `legit` was the answer. It was nice and fast and all that. But recently, `git clone` no longer worked. It calls `git upload-pack` as an intermediate workaround for #1062. This was failing for some reason, however. I tinkered with it for a while but didn't get anywhere.

Then I started thinking about @Sandra@idiomdrottning.org's post on hosting git repos. I made some changes to my Apache config and now `git clone` works again.

The key is that you need a `post-update` hook that calls `git update-server-info`. Each git repository already comes with a `post-update.sample` hook containing the necessary code, so I needed to loop over all the bare repositories I had and rename the example hook.

Using the Fish shell:

for d in *.git
    sudo -u git mv $d/hooks/post-update.sample $d/hooks/post-update
end
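If some repositories already have an active hook, or lack the sample, the rename can be made repeatable. This is a minimal sketch in POSIX sh rather than Fish, so that it can be pasted into any shell. The function name `enable_post_update` is made up for this example, and it assumes it runs as a user allowed to write to the hooks directories:

```shell
# Hypothetical helper: activate the sample post-update hook in every
# bare repository below the given directory, skipping repositories
# where the hook is already active or the sample is missing.
enable_post_update() {
    for d in "$1"/*.git; do
        [ -d "$d/hooks" ] || continue
        [ -e "$d/hooks/post-update" ] && continue
        [ -e "$d/hooks/post-update.sample" ] || continue
        mv "$d/hooks/post-update.sample" "$d/hooks/post-update"
        echo "enabled: $d"
    done
}

# Usage (as the git user): enable_post_update /home/git
```

Running it a second time changes nothing, since every repository it touched now has a `post-update` file.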

Sadly, this is not good enough.

In order to generate an `index.html` file for every repository, I need a hook that regenerates it. If you know how to determine whether regeneration can be skipped, I'd love to hear how to do that.
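One possible answer, as a sketch: git already knows the object ID of the README blob, so the hook could store that ID after each run and skip the rendering when it hasn't changed. The helper name `readme_changed` and the state file `.readme-id` are made up for this example, and it's written in POSIX sh rather than Fish. It assumes the README is the only input; a changed template or description would still need a forced run.

```shell
# Hypothetical helper: succeed (exit 0) if the given object ID differs
# from the one recorded in the state file, and record the new ID.
# Fail (exit 1) if nothing changed, meaning regeneration can be skipped.
readme_changed() {
    new_id=$1
    state_file=$2
    old_id=
    [ -f "$state_file" ] && old_id=$(cat "$state_file")
    if [ "$new_id" = "$old_id" ]; then
        return 1
    fi
    printf '%s\n' "$new_id" > "$state_file"
    return 0
}

# In the hook, after git update-server-info, the ID could come from git
# (empty if the repository has no README.md):
#   id=$(git rev-parse --quiet --verify "HEAD:README.md")
#   readme_changed "$id" .readme-id || exit 0
```

The blob ID only changes when the file content changes, so pushes that don't touch the README would skip the `cmark` run.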

This hook also updates or adds the `AddDescription` lines I need.

I prepared a hook that I wanted to install in every repository and saved it as `~/post-update`.

This is what it looks like, using the Fish shell:

#!/usr/bin/fish

git update-server-info

# create index.html
set branch (git branch --show-current)
set template (cat /home/git/.readme.html | string collect)
set title (basename (pwd))
set body (git show $branch:README.md | cmark --to html | string collect)
printf "$template" "$title" "$body" "$title" > index.html

# update description
set description (cat description)
sed --in-place=~ --expression "/ $title/d" /home/git/.htaccess
printf "AddDescription \"$description\" $title\n" >> /home/git/.htaccess

(I need the title twice, once for the title and once for the reminder on how to clone.)

I turn Markdown into HTML using `cmark`. CommonMark is the closest we have to a standard, I guess.

The template `/home/git/.readme.html` looks like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="format-detection" content="telephone=no">
    <meta name="viewport" content="width=device-width">
    <title>%s</title>
    <style>
html { max-width: 70ch; padding: 1ch; margin: auto; }
body { hyphens: auto; }
    </style>
  </head>
  <body>
    <nav>
      <a href="https://src.alexschroeder.ch/">Source code repositories</a>
    </nav>
    <main>
%s</main>
    <footer>
      <h2>Clone</h2>
      <pre>
<mark>git clone https://src.alexschroeder.ch/%s</mark>
      </pre>
    </footer>
  </body>
</html>
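One caveat with this approach is worth noting: the hook hands the template to `printf` as the format string, so the three `%s` are filled in order, and any other percent sign in the template (a `width: 100%` in the CSS, say) would have to be doubled. A tiny illustration in POSIX sh with a made-up one-line template; as far as I know, Fish's `printf` treats `%s` and `%%` the same way:

```shell
# Made-up miniature template: two %s placeholders and one literal
# percent sign, written as %% so printf doesn't treat it as a directive.
template='<title>%s</title><p style="width: 100%%">%s</p>\n'

# The template is the format string; the arguments fill the %s in order.
render() {
    printf "$template" "$1" "$2"
}

render "oddmu.git" "Hello"
# prints: <title>oddmu.git</title><p style="width: 100%">Hello</p>
```

Percent signs in the arguments (the title or the rendered README) are not a problem, since only the format string is scanned for directives.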

So now I needed to distribute the `post-update` hook to every repository and run it once. I wrote yet another Fish script, `~/recreate-index`:

#!/usr/bin/fish
for d in /home/git/*.git
    echo $d
    cd $d
    cp ~/post-update hooks/
    chown git:git hooks/post-update
    chmod 775 hooks/post-update
    sudo -u git git hook run post-update
end

I'm currently hosting 95 repositories according to `ls -d /home/git/*.git | wc -l`. Some of these don't have a `README.md` file. Should I ever touch them again, I'll have to investigate.

Now, for the Apache web server – I changed my site to the following:

<VirtualHost *:443>
    ServerAdmin alex@alexschroeder.ch
    ServerName src.alexschroeder.ch
    Include conf-enabled/blocklist.conf
    SSLEngine on
    DocumentRoot /home/git
    <Directory /home/git>
        Options Indexes
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
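With nothing but static files behind this configuration, git falls back to what its documentation calls the "dumb" HTTP protocol: the client begins by requesting `info/refs` below the repository URL, which is the file `git update-server-info` maintains. A quick way to check a repository from the outside might look like this sketch in POSIX sh (not Fish). The function names are made up, and `curl` is assumed to be installed:

```shell
# Hypothetical helper: print the URL a dumb-HTTP clone requests first.
info_refs_url() {
    printf '%s/info/refs\n' "${1%/}"
}

# Hypothetical helper: succeed if the server returns that file.
# --fail makes curl exit non-zero on HTTP errors such as 404.
check_dumb_http() {
    curl --fail --silent --show-error --output /dev/null "$(info_refs_url "$1")"
}

# Usage: check_dumb_http https://src.alexschroeder.ch/oddmu.git && echo ok
```

If the check fails for a repository, the likely suspect is a `post-update` hook that hasn't run there yet.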

The `/home/git` directory has an `.htaccess` file that starts out containing the following:

HeaderName .top.html
IndexOptions SuppressIcon SuppressSize FancyIndexing HTMLTable IgnoreCase
IndexOrderDefault Descending Date
IndexHeadInsert "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
IndexOptions Charset=UTF-8

The `.top.html` file contains a fragment to add to the top of the index:

<style>
body { max-width: 80ch }
table { overflow-x: auto }
td { padding: 0.5ex 1em 0 0; white-space: nowrap }
td:nth-child(3) { white-space: wrap }
</style>
<h1>Source code repositories</h1>
<p>
  Hello!
</p>
<p>
  I'm Alex Schroeder.
  These are my source code repositories. You can find out more about me on
  <a href="https://alexschroeder.ch/">my blog</a>. There, you'll also find a page
  listing ways to <a href="https://alexschroeder.ch/view/Contact">contact me</a>.
</p>
<p>
  As for the git repositories, you should be able to clone them as they are.
  For example:
</p>
<pre>
  git clone https://src.alexschroeder.ch/oddmu.git
</pre>
<p>
  For more about this setup, see
  <a href="https://idiomdrottning.org/hosting-git-repos">How to host git repos</a>
  by <a href="https://idiomdrottning.org/users/Sandra">@Sandra</a> and my post,
  <a href="https://alexschroeder.ch/edit/2024-08-11-bare-git">2024-08-11 Serving bare git on the web</a>.
</p>

Take a look!

​#Butlerian Jihad ​#Git ​#Administration

If the repository is for the end users, however, things are harder. The `post-update` hook should extract all the local files linked to from the README. Something like the following, perhaps:

for file in (printf "%s\n" $body | /home/oddmu/oddmu links - | grep -E -v '^(https?:|mailto:|/)')
    set dir (dirname $file)
    if test ! -d $dir
        mkdir -p $dir
    end
    echo $file; sudo -u git git show $branch:$file > $file
end

This uses oddmu to extract the links from a Markdown file, creates the necessary directories and checks out the files.

But if the files are no longer linked from the README, they are not deleted. If a directory is linked from the README (I have done this! 🤦), the checkout won't work.

I think the better way forward is to move this information elsewhere. The README is not the documentation.

And with that, I think I did it! Serving git repositories from static files. A single directory per project containing the bare git data and a single `index.html` file. No more gazillion end-points for crawlers to lose themselves in.