2024-08-11 Serving bare git on the web

This page describes the setup of src.alexschroeder.ch. That's where I host public source repositories.

src.alexschroeder.ch

In the old days, I used `cgit` to render my git repositories on the web. It's simple to set up since it's a CGI program. This is ideal for URLs that get very few hits. When nobody is requesting the URL, the program isn't running and no resources are being used. When a URL is requested, however, a new process starts, the libraries load, the program reads the repository… It's an expensive end-point! And you know how it is. The web is full of leeches and bad bots, crawlers and idiots. Having an expensive end-point means it needs protection.

Then I started thinking about @Sandra@idiomdrottning.org's post on hosting git repos. Time to fiddle with the Apache config.

hosting git repos

I changed my site to the following in order to just serve `/home/git` from a subdomain:

<VirtualHost *:443>
    ServerAdmin alex@alexschroeder.ch
    ServerName src.alexschroeder.ch
    Include conf-enabled/blocklist.conf
    SSLEngine on
    DocumentRoot /home/git
    <Directory /home/git>
        Options Indexes
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>

For this to work, you need a `post-update` hook that calls `git update-server-info`.
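
If all you want is cloning over HTTP, the hook can be tiny. With plain static files, git falls back to its "dumb" HTTP protocol, and `git update-server-info` writes the `info/refs` and `objects/info/packs` files that such clients read. The `post-update.sample` file that ships with git does nothing more than that. Here's a minimal sketch in POSIX shell that installs such a hook. The function name `install_minimal_hook` and the repository path in the usage line are made up for illustration.

```shell
#!/bin/sh
# Write a minimal post-update hook into a bare repository.
# install_minimal_hook is a hypothetical helper name, not part of git.
install_minimal_hook() {
    repo=$1
    mkdir -p "$repo/hooks"
    cat > "$repo/hooks/post-update" <<'EOF'
#!/bin/sh
# Refresh info/refs and objects/info/packs so that clients using the
# "dumb" HTTP protocol can find refs and packs in static files.
exec git update-server-info
EOF
    chmod +x "$repo/hooks/post-update"
}
```

Usage would be something like `install_minimal_hook /home/git/example.git`, run as the user that owns the repository.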

Knowing that I'm also going to serve the bare git repositories via the web, the hook also needs to generate an `index.html` file.

Furthermore, given that the repositories have a `description` file, I update `/home/git/.htaccess` accordingly.

I prepared the hook that I want to install in every repository and saved it as /home/git/.post-update.

/home/git/.post-update

This is what it looks like, written in Perl.

#!/usr/bin/perl
use Modern::Perl;
use File::Slurper qw(read_text write_text);
use File::Temp qw(tempfile);
use Encode qw(decode_utf8);
use Cwd;

qx(/usr/bin/git update-server-info);

# create index.html
my $branch = qx(git branch --show-current);
chomp $branch;
my $template = read_text("/home/git/.readme.html");
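my $dir = getcwd;
# getcwd returns an absolute path such as /home/git/oddmu.git;
# keep only the repository name for the title and the clone URL
my $title = $dir;
$title =~ s/\.git$//;
$title =~ s{^.*/}{};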
my $body = decode_utf8(qx(/usr/bin/git show $branch:README.md));
my ($fh, $filename) = tempfile(SUFFIX => '.md');
write_text($filename, $body);
my $pagename = substr($filename, 0, -3);
my $html = decode_utf8(qx(/home/oddmu/oddmu html $pagename));
unlink($filename);
write_text("index.html", sprintf($template, $title, $html, $title, $title));

# update description
if (-r "description") {
    my $description = read_text("description");
    chomp $description;
    my $htaccess = read_text("/home/git/.htaccess");
    write_text("/home/git/.htaccess~", $htaccess);
    my @lines = grep { !/ \Q$title\E\.git$/ } split(/\n/, $htaccess);
    push(@lines, "AddDescription \"$description\" $title.git");
    my (@new, @descriptions);
    for my $line (@lines) {
	if ($line =~ /^AddDescription .* (\S+\.git)$/) {
	    push(@descriptions, [length($1), $line]);
	} else {
	    push(@new, $line);
	}
    }
    for my $description (sort { $b->[0] <=> $a->[0] } @descriptions) {
	push(@new, $description->[1]);
    }
    write_text("/home/git/.htaccess", join("\n", @new));
}

I turn Markdown into HTML using oddmu but feel free to use some other command-line tool like `cmark`.

oddmu

Next, I created a symlink in every git repository's `hooks` directory.

Assuming that `/home/git` is where all the repositories are, owned by the user `git`, and that you're using the root account, run the following in the Fish shell from within `/home/git`:

for d in *.git; sudo -u git ln -sf /home/git/.post-update $d/hooks/post-update; end

The hook uses /home/git/.readme.html as a template for the `index.html` file.

/home/git/.readme.html

This is what it looks like:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="format-detection" content="telephone=no">
    <meta name="viewport" content="width=device-width">
    <title>%s</title>
    <style>
html { max-width: 70ch; padding: 1ch; margin: auto; }
body { hyphens: auto; }
    </style>
  </head>
  <body>
    <nav>
      <a href="https://src.alexschroeder.ch/">Source code repositories</a>
    </nav>
    <main>
%s</main>
    <footer>
      <h2>Clone</h2>
      <pre>
<mark>git clone https://src.alexschroeder.ch/%s.git</mark>
      </pre>
      <h2>Contact</h2>
      <p>If you like it, send an email to Alex Schroeder &lt;<a href="mailto:alex@gnu.org?subject=I+like+%s">alex@gnu.org</a>&gt; ❤️
    </footer>
  </body>
</html>

I'm currently hosting 95 repositories according to `ls -d /home/git/*.git | wc -l`. Some of these don't have a `README.md` file. Should I ever touch them again, I'll have to investigate.

The `/home/git` directory has an `.htaccess` file that starts out containing the following:

HeaderName .top.html
IndexOptions SuppressIcon SuppressSize FancyIndexing HTMLTable IgnoreCase
IndexOrderDefault Descending Date
IndexIgnore *~ .* Makefile
IndexHeadInsert "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
IndexOptions Charset=UTF-8

The rest of the file is all the AddDescription directives added by the `post-update` hook.
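
The hook sorts these `AddDescription` lines so that the longest repository names come first. As far as I understand mod_autoindex, the file argument may be a partial filename, so a short name could also match a longer one that ends the same way; putting the longer name first avoids picking up the wrong description. Here's a rough sketch of the same ordering in POSIX shell. The function name `sort_descriptions` and the repository names in the usage example are made up.

```shell
#!/bin/sh
# Read AddDescription lines on stdin and print them with the longest
# repository name first. sort_descriptions is a hypothetical helper.
sort_descriptions() {
    # Prefix every line with the length of its last field (the name),
    # sort on that number in descending order, then drop the prefix.
    awk '{ print length($NF) "\t" $0 }' \
        | sort -t "$(printf '\t')" -k1,1nr \
        | cut -f2-
}
```

For example, feeding it a line for `wiki.git` followed by a line for `oddmuse-wiki.git` prints the `oddmuse-wiki.git` line first.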

The /home/git/.top.html file contains a fragment to add to the top of the index:

/home/git/.top.html

<style>
body { max-width: 80ch }
table { overflow-x: auto }
td { padding: 0.5ex 1em 0 0; white-space: nowrap }
td:nth-child(3) { white-space: normal }
</style>
<h1>Source code repositories</h1>
<p>
  Hello!
</p>
<p>
  I'm Alex Schroeder.
  These are my source code repositories. You can find out more about me on
  <a href="https://alexschroeder.ch/">my blog</a>. There, you'll also find a page
  listing ways to <a href="https://alexschroeder.ch/view/Contact">contact me</a>.
</p>
<p>
  As for the git repositories, you should be able to clone them as they are.
  For example:
</p>
<pre>
  git clone https://src.alexschroeder.ch/oddmu.git
</pre>
<p>
  For more about this setup, see
  <a href="https://idiomdrottning.org/hosting-git-repos">How to host git repos</a>
  by <a href="https://idiomdrottning.org/users/Sandra">@Sandra</a> and my post,
  <a href="https://alexschroeder.ch/view/2024-08-11-bare-git">2024-08-11 Serving bare git on the web</a>.
</p>

The only thing that's strange is that the index sorts the repositories by the last modification date of the `index.html` file contained within, not by the date of the last commit. That's not good.

I ended up looping through all the directories a few times as I kept finding bugs in my `post-update` hook, so I wrote a `Makefile`. That's the reason `Makefile` is listed in the `IndexIgnore` directive for Apache, above.

This is the `Makefile`:

SHELL=/usr/bin/fish

# Regenerate the index.html files. Set their modification time because
# it looks like FancyIndex uses the index.html modification date.
# Note the $$f: a single $ would be expanded by make, not by the shell.
.PHONY: update-indexes
update-indexes:
	for f in *.git; \
	  cd "$$f"; \
	  sudo -u git hooks/post-update; \
	  sudo -u git git log -1 --format='%at' \
	   | xargs -I{} date -d @{} '+%Y-%m-%d %H:%M:%S' \
	   | xargs -I{} touch index.html --date {}; \
	  cd ..; \
	end

So now this will regenerate all the `index.html` files:

make
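
The date juggling in the recipe can probably be shortened, since GNU `touch` accepts `@` followed by the seconds since the epoch as a date. A small sketch in POSIX shell, assuming GNU coreutils; the function name `set_mtime` is made up.

```shell
#!/bin/sh
# Set a file's modification time from a Unix timestamp.
# set_mtime is a hypothetical helper; "-d @SECONDS" needs GNU touch.
set_mtime() {
    touch -d "@$2" "$1"
}

# Inside a repository, the timestamp would come from the last commit:
#   set_mtime index.html "$(git log -1 --format=%at)"
```

This skips the round trip through `date` and a formatted string.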

In any case, now we're done.

#Butlerian Jihad #Git #Administration

**2024-08-12**. I wondered about links from the README to local files. Right now, linking to images and files hosted in the same repository doesn't work since they don't exist in the raw repository. The question then becomes, as far as I am concerned, whether this README is supposed to speak to developers or end users? If it is for developers, then pictures, screenshots, PDF files and all of that don't need to be linked from the repository. If you are interested in these things, do a `git clone --depth 1` and investigate locally.

If the repository is for the end users, however, things are harder. The `post-update` hook should extract all the local files linked to from the README. Something like the following, perhaps:

# $body is the README text and $branch the current branch, as in the hook
for file in (printf "%s\n" $body | /home/oddmu/oddmu links - | grep -E -v '^(https?:|mailto:|/)')
    set dir (dirname $file)
    mkdir -p $dir
    echo $file; sudo -u git git show $branch:$file > $file
end

This uses oddmu to extract the links from a Markdown file, creates the necessary directories and checks out the files.

oddmu

But if the files are no longer linked from the README, they are not deleted. If a directory is linked from the README (I have done this! 🤦), the checkout won't work.

I think the better way forward is to move this information elsewhere. The README is not the documentation.

And with that, I think I did it! Serving git repositories from static files. A single directory per project containing the bare git data and a single `index.html` file. No more gazillion end points for crawlers to lose themselves.