2024-08-11 Serving bare git on the web

In the old days, I used `cgit` to render my git repositories on the web. It's simple to set up since it's a CGI script. This is ideal for URLs that get very few hits. When nobody is requesting the URL, the CGI script isn't running and no resources are being used. When a URL is requested, however, the CGI script loads, the interpreter loads, the libraries load, the script executes… It's an expensive end-point! And you know how it is. The web is full of leeches and bad bots, crawlers and idiots. Having an expensive end-point means it needs protection.

For a while I thought that `legit` was the answer. It was nice and fast and all that. But recently, `git clone` stopped working. As an interim workaround for #1062, `legit` calls `git upload-pack`, and for some reason this was failing. I tinkered with it for a while but didn't get anywhere.

#1062

Then I started thinking about @Sandra@idiomdrottning.org's post on hosting git repos. Time to fiddle with the Apache config.

hosting git repos

I changed my site configuration to the following, which simply serves `/home/git` from a subdomain:

<VirtualHost *:443>
    ServerAdmin alex@alexschroeder.ch
    ServerName src.alexschroeder.ch
    Include conf-enabled/blocklist.conf
    SSLEngine on
    DocumentRoot /home/git
    <Directory /home/git>
        Options Indexes
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>

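For `git clone` to work over plain HTTP (git's "dumb" HTTP protocol), every repository needs up-to-date `info/refs` and `objects/info/packs` files. So you need a `post-update` hook that calls `git update-server-info`, which regenerates them after every push.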
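A client cloning over plain HTTP starts by downloading `info/refs` and only then fetches the objects listed there. Without the hook, that file goes stale after a push, or is missing altogether in a fresh repository. The following Perl sketch parses `info/refs` to show what such a client gets to see; `parse_info_refs` is a hypothetical helper for illustration, not part of git.

```perl
use strict;
use warnings;

# Parse the contents of info/refs as written by `git update-server-info`.
# Each line is "<object id>\t<ref name>"; an annotated tag gets an extra
# line whose ref name ends in "^{}" and which holds the peeled object id.
sub parse_info_refs {
    my ($text) = @_;
    my %refs;
    for my $line (split /\n/, $text) {
        my ($oid, $name) = split /\t/, $line, 2;
        # skip anything that doesn't look like "<hex id>\t<name>"
        next unless defined $name && $oid =~ /^[0-9a-f]{40,64}$/;
        $refs{$name} = $oid;
    }
    return \%refs;
}

# Usage, from inside a bare repository such as /home/git/oddmu.git:
# print the branches a dumb-HTTP client would see.
if (open(my $fh, '<', 'info/refs')) {
    my $refs = parse_info_refs(do { local $/; <$fh> });
    print "$refs->{$_} $_\n" for grep { m{^refs/heads/} } sort keys %$refs;
}
```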

Since I also want every bare repository to have a page people can read in the browser, the hook also needs to generate an `index.html` file.

Furthermore, given that the repositories have a `description` file, the hook adds a matching `AddDescription` directive to `/home/git/.htaccess` so that Apache's directory listing shows it.

I prepared the hook that I want to install in every repository and saved it as `/home/git/.post-update`.

This is what it looks like, written in Perl:

#!/usr/bin/perl
use Modern::Perl;
use File::Slurper qw(read_text write_text);
use File::Temp qw(tempfile);
use Encode qw(decode_utf8);
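use Cwd;
use File::Basename qw(basename);

# regenerate info/refs and objects/info/packs for dumb HTTP clients
qx(/usr/bin/git update-server-info);

# create index.html
my $branch = qx(git branch --show-current);
chomp $branch;
my $template = read_text("/home/git/.readme.html");
# the hook runs inside the bare repository, e.g. /home/git/oddmu.git,
# so the title is the directory name without the .git suffix
my $title = basename(getcwd);
$title =~ s/\.git$//;
# render the README of the current branch via a temporary Markdown file
my $body = decode_utf8(qx(/usr/bin/git show $branch:README.md));
my ($fh, $filename) = tempfile(SUFFIX => '.md');
write_text($filename, $body);
my $pagename = substr($filename, 0, -3); # page name = file name without .md
my $html = decode_utf8(qx(/home/oddmu/oddmu html $pagename));
unlink($filename);
# template slots: title, README as HTML, clone URL name, mail subject name
write_text("index.html", sprintf($template, $title, $html, $title, $title));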

# update description
if (-r "description") {
    my $description = read_text("description");
    chomp $description;
    my $htaccess = read_text("/home/git/.htaccess");
    write_text("/home/git/.htaccess~", $htaccess);
    my @lines = grep { !/ \Q$title\E\.git$/ } split(/\n/, $htaccess);
    push(@lines, "AddDescription \"$description\" $title.git");
    my (@new, @descriptions);
    for my $line (@lines) {
	if ($line =~ /^AddDescription .* (\S+\.git)$/) {
	    push(@descriptions, [length($1), $line]);
	} else {
	    push(@new, $line);
	}
    }
    for my $description (sort { $b->[0] <=> $a->[0] } @descriptions) {
	push(@new, $description->[1]);
    }
    write_text("/home/git/.htaccess", join("\n", @new));
}

I turn Markdown into HTML using oddmu but feel free to use some other command-line tool like `cmark`.

oddmu
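The hook sorts the `AddDescription` lines by repository name length, longest first. As far as I can tell from the mod_autoindex documentation, the file argument may be a partial filename and the first matching directive wins, so a description for a hypothetical `mu.git` could otherwise also show up next to `oddmu.git`. Here is that ordering step pulled out into a standalone Perl sub so it can be checked in isolation; `order_descriptions` and the sample lines are made up for this sketch.

```perl
use strict;
use warnings;

# Rebuild the lines of .htaccess: keep every other directive in its
# original order, then append the AddDescription lines sorted by the
# length of the repository name, longest first, so that a longer name
# comes before any shorter name it happens to contain.
sub order_descriptions {
    my (@lines) = @_;
    my (@other, @descriptions);
    for my $line (@lines) {
        if ($line =~ /^AddDescription .* (\S+\.git)$/) {
            push @descriptions, [length($1), $line];
        } else {
            push @other, $line;
        }
    }
    return (@other,
            map { $_->[1] } sort { $b->[0] <=> $a->[0] } @descriptions);
}

# hypothetical example input
my @sorted = order_descriptions(
    'HeaderName .top.html',
    'AddDescription "Short name" mu.git',
    'AddDescription "Longer name" oddmu.git',
);
print "$_\n" for @sorted;
```

This prints the `HeaderName` line first, then the `oddmu.git` description, then the `mu.git` one.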

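Next, I created a symlink in every git repository's `hooks` directory. Git only runs hooks that are executable, so `/home/git/.post-update` needs its executable bit set.

Using the Fish shell, assuming that `/home/git` is where all the repositories are, owned by the user `git`, and that you're running this as root from within `/home/git`: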

for d in *.git; sudo -u git ln -sf /home/git/.post-update $d/hooks/post-update; end
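Repositories created later won't have the symlink. A small Perl check like the following could list the repositories whose `hooks/post-update` does not point at the shared hook; `missing_hooks` is a hypothetical helper, and the paths are the ones used in this post.

```perl
use strict;
use warnings;

# Return the *.git directories under $root whose hooks/post-update is not
# a symlink pointing at $target.
sub missing_hooks {
    my ($root, $target) = @_;
    my @repos = grep { -d } glob("$root/*.git");
    my @missing;
    for my $repo (sort @repos) {
        my $link = readlink("$repo/hooks/post-update");
        push @missing, $repo unless defined $link && $link eq $target;
    }
    return @missing;
}

# Usage on the server, with the paths from this post:
print "$_\n" for missing_hooks('/home/git', '/home/git/.post-update');
```

Any repository it prints can then get the same `ln -sf` treatment as above.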

The hook uses `/home/git/.readme.html` as a template for the `index.html` file.

This is what it looks like:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="format-detection" content="telephone=no">
    <meta name="viewport" content="width=device-width">
    <title>%s</title>
    <style>
html { max-width: 70ch; padding: 1ch; margin: auto; }
body { hyphens: auto; }
    </style>
  </head>
  <body>
    <nav>
      <a href="https://src.alexschroeder.ch/">Source code repositories</a>
    </nav>
    <main>
%s</main>
    <footer>
      <h2>Clone</h2>
      <pre>
<mark>git clone https://src.alexschroeder.ch/%s.git</mark>
      </pre>
      <h2>Contact</h2>
      <p>If you like it, send an email to Alex Schroeder &lt;<a href="mailto:alex@gnu.org?subject=I+like+%s">alex@gnu.org</a>&gt; ❤️
    </footer>
  </body>
</html>
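The hook fills this template with Perl's `sprintf`, passing the title, the README rendered as HTML, and then the title twice more, for the clone URL and for the mail subject. Two `sprintf` details are worth knowing here, shown with a made-up, shortened template: explicit argument indexes such as `%1$s` let a template reuse one argument several times, and any literal percent sign in the template (in CSS, for example) has to be written as `%%`.

```perl
use strict;
use warnings;

# A shortened, hypothetical template: %1$s is the repository name and is
# used twice, %2$s is the rendered README, and %% is a literal percent sign.
# The here-doc is single-quoted so Perl doesn't interpolate "$s".
my $template = <<'HTML';
<title>%1$s</title>
<style>main { width: 100%%; }</style>
<main>%2$s</main>
<pre>git clone https://src.alexschroeder.ch/%1$s.git</pre>
HTML

my $page = sprintf($template, 'oddmu', '<p>Hello!</p>');
print $page;
```

With explicit indexes, the call in the hook would only need two arguments instead of four.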

I'm currently hosting 95 repositories according to `ls -d /home/git/*.git | wc -l`. Some of these don't have a `README.md` file. Should I ever touch them again, I'll have to investigate.

The `/home/git` directory has an `.htaccess` file that starts out containing the following:

HeaderName .top.html
IndexOptions SuppressIcon SuppressSize FancyIndexing HTMLTable IgnoreCase
IndexOrderDefault Descending Date
IndexIgnore *~ .* Makefile
IndexHeadInsert "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
IndexOptions Charset=UTF-8

The rest of the file consists of the `AddDescription` directives added by the `post-update` hook.

The `.top.html` file contains a fragment to add to the top of the index:

<style>
body { max-width: 80ch }
table { overflow-x: auto }
td { padding: 0.5ex 1em 0 0; white-space: nowrap }
td:nth-child(3) { white-space: wrap }
</style>
<h1>Source code repositories</h1>
<p>
  Hello!
</p>
<p>
  I'm Alex Schroeder.
  These are my source code repositories. You can find out more about me on
  <a href="https://alexschroeder.ch/">my blog</a>. There, you'll also find a page
  listing ways to <a href="https://alexschroeder.ch/view/Contact">contact me</a>.
</p>
<p>
  As for the git repositories, you should be able to clone them as they are.
  For example:
</p>
<pre>
  git clone https://src.alexschroeder.ch/oddmu.git
</pre>
<p>
  For more about this setup, see
  <a href="https://idiomdrottning.org/hosting-git-repos">How to host git repos</a>
  by <a href="https://idiomdrottning.org/users/Sandra">@Sandra</a> and my post,
  <a href="https://alexschroeder.ch/edit/2024-08-11-bare-git">2024-08-11 Serving bare git on the web</a>.
</p>

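The only strange thing is that the listing sorts the repositories by the modification date of the `index.html` file inside each of them, not by the date of their last commit. That's not good: whenever the `index.html` files are regenerated in bulk, every repository looks as if it had just been changed.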

I ended up looping through all the directories a few times as I kept finding bugs in my `post-update` hook, so I wrote a `Makefile`. That's why `Makefile` is listed in the `IndexIgnore` directive for Apache, above.

This is the `Makefile`:

SHELL=/usr/bin/fish

.PHONY: update-indexes

# Regenerate the index.html files. Set their modification time because
# it looks like FancyIndex uses the index.html modification date.
# Note the $$f: make itself would expand a single $f to nothing.
update-indexes:
	for f in *.git; \
	  cd "$$f"; \
	  sudo -u git hooks/post-update; \
	  sudo -u git git log -1 --format='%at' \
	   | xargs -I{} date -d @{} '+%Y-%m-%d %H:%M:%S' \
	   | xargs -I{} touch index.html --date {}; \
	  cd ..; \
	end

So now this will regenerate all the `index.html` files:

make
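Instead of fixing the dates afterwards, the hook itself could set the modification time of `index.html` to the date of the last commit right after writing it, using Perl's built-in `utime`. This is a sketch, assuming it runs inside the bare repository like the hook does; the helper names are hypothetical, and it relies on the observation above that the listing goes by the `index.html` date rather than checking it.

```perl
use strict;
use warnings;

# Author date (%at, as in the Makefile) of the last commit on HEAD, in
# seconds since the epoch, or undef. Must run inside the repository.
sub last_commit_time {
    my $at = qx(git log -1 --format=%at 2>/dev/null);
    return undef unless defined $at;
    chomp $at;
    return $at =~ /^\d+$/ ? $at : undef;
}

# Set the access and modification time of $file to $epoch.
sub set_mtime {
    my ($file, $epoch) = @_;
    utime($epoch, $epoch, $file) or die "utime $file: $!";
}

# In the hook, right after writing index.html, one could add:
#   my $at = last_commit_time();
#   set_mtime("index.html", $at) if defined $at;
```

That would keep the dates right even when all the hooks are rerun at once, without the `date` and `touch` pipeline.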

In any case, now we're done.

Take a look!

Take a look

.readme.html

.post-update

.top.html

#Butlerian Jihad #Git #Administration

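If the repository is for end users, however, things are harder. The README often links to other files in the repository, and those don't exist as plain files in a bare repository. So the `post-update` hook should extract all the local files linked to from the README. Something like the following, perhaps: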

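# run as root from inside the bare repository, next to index.html
set branch (sudo -u git git branch --show-current)
set body (sudo -u git git show $branch:README.md)
for file in (printf "%s\n" $body | /home/oddmu/oddmu links - | grep -Ev '^(https?:|mailto:|/)')
    set dir (dirname $file)
    if test ! -d $dir
        mkdir -p $dir
    end
    echo $file; sudo -u git git show $branch:$file > $file
end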

This uses oddmu to extract the links from a Markdown file, creates the necessary directories and checks out the files.

oddmu

But if the files are no longer linked from the README, they are not deleted. And if a directory is linked from the README (I have done this! 🤦), the checkout won't work: `git show` on a directory prints a listing of its contents instead of a file.
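For the directory problem, one option might be to ask git for the type of each linked path first and only check out regular files: `git cat-file -t` prints `blob` for a file and `tree` for a directory. Here is a Perl sketch along those lines. `is_local_link` applies the same filter as the loop above and, as an extra precaution, rejects paths containing `..`; all helper names are hypothetical. It does not solve the other problem, deleting files that are no longer linked.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Same filter as the loop above (no web or mail URLs, no absolute paths),
# plus: refuse ".." so nothing is written outside the repository directory.
sub is_local_link {
    my ($link) = @_;
    return 0 if $link =~ m{^(?:https?:|mailto:|/)};
    return 0 if grep { $_ eq '..' } split m{/}, $link;
    return 1;
}

# Object type of $path on $branch ("blob" for a file, "tree" for a
# directory), or undef if it doesn't exist. Runs inside the repository.
sub object_type {
    my ($branch, $path) = @_;
    open(my $fh, '-|', 'git', 'cat-file', '-t', "$branch:$path")
        or die "git cat-file: $!";
    my $type = <$fh>;
    close($fh);
    return undef unless defined $type;
    chomp $type;
    return $type;
}

# Check out $path from $branch into the current directory, but only if it
# is a file; linked directories are skipped. Returns true if written.
sub checkout_blob {
    my ($branch, $path) = @_;
    return 0 unless is_local_link($path);
    return 0 unless (object_type($branch, $path) // '') eq 'blob';
    my $dir = dirname($path);
    make_path($dir) unless -d $dir;
    open(my $in, '-|', 'git', 'cat-file', 'blob', "$branch:$path")
        or die "git cat-file: $!";
    binmode($in);
    my $content = do { local $/; <$in> };
    close($in) or return 0;
    open(my $out, '>:raw', $path) or die "$path: $!";
    print {$out} $content;
    close($out) or die "$path: $!";
    return 1;
}
```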

I think the better way forward is to move this information elsewhere. The README is not the documentation.

And with that, I think I did it! Serving git repositories from static files. A single directory per project containing the bare git data and a single `index.html` file. No more gazillion end points for crawlers to lose themselves in.