In the old days, I used `cgit` to render my git repositories on the web. It's simple to set up since it's a CGI script. This is ideal for URLs that get very few hits. When nobody is requesting the URL, the CGI script isn't running and no resources are being used. When a URL is requested, however, the CGI script loads, the interpreter loads, the libraries load, the script executes… It's an expensive end-point! And you know how it is. The web is full of leeches and bad bots, crawlers and idiots. Having an expensive end-point means it needs protection.
For a while I thought that `legit` was the answer. It was nice and fast and all that. But recently, `git clone` stopped working. `legit` calls `git upload-pack` as an interim workaround for #1062, and that was failing for some reason. I tinkered with it for a while but didn't get anywhere.
Then I started thinking about @Sandra@idiomdrottning.org's post on hosting git repos. Time to fiddle with the Apache config.
I changed my site configuration to the following, which simply serves `/home/git` from a subdomain:
```apache
<VirtualHost *:443>
    ServerAdmin alex@alexschroeder.ch
    ServerName src.alexschroeder.ch
    Include conf-enabled/blocklist.conf
    SSLEngine on
    DocumentRoot /home/git
    <Directory /home/git>
        Options Indexes
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
```
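For this to work, you need a `post-update` hook that calls `git update-server-info`. A plain web server only supports Git's "dumb" HTTP protocol, and that command regenerates `info/refs` and `objects/info/packs`, the files a dumb HTTP client reads to find the refs and packs.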
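Git ships a sample hook, `hooks/post-update.sample`, that does nothing but run `git update-server-info`. The sketch below wraps the same command in a hypothetical `refresh_dumb_http` function (my name, not part of Git) and prints the regenerated `info/refs`, which is what a client cloning over plain HTTP reads first.

```shell
#!/bin/sh
# refresh_dumb_http REPO (hypothetical helper): do what a minimal post-update
# hook does for cloning, then show the result.
refresh_dumb_http() {
    repo=$1
    # writes REPO/info/refs and REPO/objects/info/packs
    git -C "$repo" update-server-info || return 1
    # each line is "<object id><TAB><ref name>"
    cat "$repo/info/refs"
}
```

For example, `refresh_dumb_http /home/git/oddmu.git` prints one line per branch and tag.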
Since I'm also going to serve the bare git repositories as web pages, the hook also needs to generate an `index.html` file.
Furthermore, since each repository has a `description` file, the hook adds a matching `AddDescription` line to `/home/git/.htaccess` so that the directory listing can show it.
I prepared the hook that I want to install in every repository and saved it as `/home/git/.post-update`.
This is what it looks like, written in Perl:
```perl
#!/usr/bin/perl
use Modern::Perl;
use File::Slurper qw(read_text write_text);
use File::Temp qw(tempfile);
use File::Basename qw(basename);
use Encode qw(decode_utf8);
use Cwd;

# Git runs this hook inside the bare repository, e.g. /home/git/oddmu.git
qx(/usr/bin/git update-server-info);

# create index.html
my $branch = qx(/usr/bin/git branch --show-current);
chomp $branch;
my $template = read_text("/home/git/.readme.html");
# the repository name without its directory and the .git suffix, e.g. "oddmu"
my $title = basename(getcwd(), ".git");
my $body = decode_utf8(qx(/usr/bin/git show $branch:README.md));
my ($fh, $filename) = tempfile(SUFFIX => '.md');
write_text($filename, $body);
my $pagename = substr($filename, 0, -3); # oddmu wants the name without ".md"
my $html = decode_utf8(qx(/home/oddmu/oddmu html $pagename));
unlink($filename);
write_text("index.html", sprintf($template, $title, $html, $title, $title));

# update description
if (-r "description") {
  my $description = read_text("description");
  chomp $description;
  my $htaccess = read_text("/home/git/.htaccess");
  write_text("/home/git/.htaccess~", $htaccess); # backup
  # drop the old description of this repository and add the new one
  my @lines = grep { !/ \Q$title\E\.git$/ } split(/\n/, $htaccess);
  push(@lines, "AddDescription \"$description\" $title.git");
  # other directives first, then the descriptions, longest name first
  my (@new, @descriptions);
  for my $line (@lines) {
    if ($line =~ /^AddDescription .* (\S+\.git)$/) {
      push(@descriptions, [length($1), $line]);
    } else {
      push(@new, $line);
    }
  }
  for my $entry (sort { $b->[0] <=> $a->[0] } @descriptions) {
    push(@new, $entry->[1]);
  }
  write_text("/home/git/.htaccess", join("\n", @new) . "\n");
}
```
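The least obvious part of the hook is the `.htaccess` shuffle at the end. As far as I can tell from the mod_autoindex documentation, `AddDescription` also matches partial file names and the first matching directive wins, so a line for `wiki.git` could end up describing `oddwiki.git` as well. Listing longer names first avoids that. Here is the same logic as a POSIX shell sketch; `set_description` is a hypothetical helper, and it assumes repository names contain no regular-expression metacharacters.

```shell
#!/bin/sh
# set_description FILE NAME TEXT (hypothetical helper)
# Replace the AddDescription line for NAME.git in FILE, keep all other
# directives first, and list the descriptions longest repository name first.
set_description() {
    file=$1 name=$2 text=$3
    tab=$(printf '\t')
    tmp=$(mktemp) || return 1
    {
        # every other directive, in its original order
        grep -v '^AddDescription ' "$file"
        # the remaining descriptions plus the new one, sorted by the length
        # of their last field (the repository directory name), descending
        {
            grep '^AddDescription ' "$file" | grep -v " $name\.git\$"
            printf 'AddDescription "%s" %s.git\n' "$text" "$name"
        } | awk -v OFS="$tab" '{ print length($NF), $0 }' \
          | sort -t "$tab" -k1,1nr \
          | cut -f2-
    } > "$tmp" || return 1
    # copy instead of mv so FILE keeps its permissions (mktemp uses 0600)
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Called from a repository directory, this would be something like `set_description /home/git/.htaccess oddmu "$(cat description)"`.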
I turn Markdown into HTML using oddmu, but feel free to use some other command-line tool such as `cmark`.
Next, I created a symlink in every git repository's `hooks` directory.
Using the Fish shell, assuming that `/home/git` is where all the repositories are, owned by the user `git`, that you run this from `/home/git`, and that you're using the root account:
```fish
for d in *.git
    sudo -u git ln -sf /home/git/.post-update $d/hooks/post-update
end
```
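If you don't use Fish, the same loop might look like this in POSIX shell, wrapped in a hypothetical `install_hooks` function that skips anything without a `hooks` directory. Git only runs hooks that are executable, so the shared script needs a one-time `chmod +x /home/git/.post-update`; the symlinks pick that up from their target.

```shell
#!/bin/sh
# install_hooks ROOT (hypothetical helper): link ROOT/.post-update into the
# hooks directory of every bare repository ROOT/*.git.
install_hooks() {
    root=$1
    for d in "$root"/*.git; do
        # also skips the unexpanded pattern when nothing matches
        [ -d "$d/hooks" ] || continue
        ln -sf "$root/.post-update" "$d/hooks/post-update"
    done
}
```

Run it as the `git` user, e.g. from a `sudo -u git sh` session after sourcing the function: `install_hooks /home/git`.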
The hook uses `/home/git/.readme.html` as a template for the `index.html` file.
This is what it looks like:
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="format-detection" content="telephone=no">
<meta name="viewport" content="width=device-width">
<title>%s</title>
<style>
html { max-width: 70ch; padding: 1ch; margin: auto; }
body { hyphens: auto; }
</style>
</head>
<body>
<nav>
<a href="https://src.alexschroeder.ch/">Source code repositories</a>
</nav>
<main>
%s</main>
<footer>
<h2>Clone</h2>
<pre>
<mark>git clone https://src.alexschroeder.ch/%s.git</mark>
</pre>
<h2>Contact</h2>
<p>If you like it, send an email to Alex Schroeder &lt;<a href="mailto:alex@gnu.org?subject=I+like+%s">alex@gnu.org</a>&gt; ❤️
</footer>
</body>
</html>
```
I'm currently hosting 95 repositories according to `ls -d /home/git/*.git | wc -l`. Some of these don't have a `README.md` file. Should I ever touch them again, I'll have to investigate.
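To find those repositories, a sketch like the following might do. `missing_readme` is a hypothetical helper; it checks `HEAD`, which in a bare repository points at the same branch that `git branch --show-current` reports, and uses `git cat-file -e`, which only sets the exit status.

```shell
#!/bin/sh
# missing_readme ROOT (hypothetical helper): print every ROOT/*.git whose
# HEAD has no README.md, including repositories without any commit.
missing_readme() {
    for d in "$1"/*.git; do
        [ -d "$d" ] || continue
        git -C "$d" cat-file -e HEAD:README.md 2>/dev/null || printf '%s\n' "$d"
    done
}
```

For example: `missing_readme /home/git`.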
The `/home/git` directory has an `.htaccess` file that starts out containing the following:
```apache
HeaderName .top.html
IndexOptions SuppressIcon SuppressSize FancyIndexing HTMLTable IgnoreCase
IndexOrderDefault Descending Date
IndexIgnore *~ .* Makefile
IndexHeadInsert "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
IndexOptions Charset=UTF-8
```
The rest of the file consists of the `AddDescription` directives added by the `post-update` hook.
The `.top.html` file contains a fragment to add to the top of the index:
```html
<style>
body { max-width: 80ch }
table { overflow-x: auto }
td { padding: 0.5ex 1em 0 0; white-space: nowrap }
td:nth-child(3) { white-space: wrap }
</style>
<h1>Source code repositories</h1>
<p>
Hello!
</p>
<p>
I'm Alex Schroeder. These are my source code repositories.
You can find out more about me on <a href="https://alexschroeder.ch/">my blog</a>.
There, you'll also find a page listing ways to <a href="https://alexschroeder.ch/view/Contact">contact me</a>.
</p>
<p>
As for the git repositories, you should be able to clone them as they are.
For example:
</p>
<pre>
git clone https://src.alexschroeder.ch/oddmu.git
</pre>
<p>
For more about this setup, see <a href="https://idiomdrottning.org/hosting-git-repos">How to host git repos</a>
by <a href="https://idiomdrottning.org/users/Sandra">@Sandra</a>
and my post, <a href="https://alexschroeder.ch/edit/2024-08-11-bare-git">2024-08-11 Serving bare git on the web</a>.
</p>
```
The only strange thing is that the listing orders the repositories by the last modification date of the `index.html` file inside each one. That's not good: I'd rather see them ordered by their last commit.
I ended up looping through all the directories several times as I kept finding bugs in my `post-update` hook, so I wrote a `Makefile`. That's why `Makefile` is listed in the `IndexIgnore` directive for Apache, above.
This is the `Makefile`:
```make
SHELL=/usr/bin/fish

# Regenerate the index.html files. Set their modification time because
# it looks like FancyIndex uses the index.html modification date.
# Note that $$f passes a literal $f to the shell; a plain $f would be
# expanded by make itself (to nothing).
.PHONY: update-indexes
update-indexes:
	for f in *.git; \
	  cd "$$f"; \
	  sudo -u git hooks/post-update; \
	  sudo -u git git log -1 --format='%at' \
	  | xargs -I{} date -d @{} '+%Y-%m-%d %H:%M:%S' \
	  | xargs -I{} touch index.html --date {}; \
	  cd ..; \
	end
```
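The two `xargs` steps turn the commit's Unix timestamp into a date string only for `touch` to parse it again. GNU `touch -d` accepts the `@SECONDS` form directly, just like `date -d`, so the per-repository step could be shorter. This is a POSIX shell sketch with a hypothetical `sync_index_mtime` helper; the `@` syntax is a GNU coreutils extension.

```shell
#!/bin/sh
# sync_index_mtime REPO (hypothetical helper): set REPO/index.html's
# modification time to the author date of the last commit on HEAD.
# Assumes GNU touch, which understands "-d @SECONDS".
sync_index_mtime() {
    repo=$1
    ts=$(git -C "$repo" log -1 --format=%at) || return 1
    [ -n "$ts" ] || return 1
    touch -d "@$ts" "$repo/index.html"
}
```

Run after the hook has written `index.html`, e.g. `sync_index_mtime /home/git/oddmu.git`.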
So now this will regenerate all the `index.html` files:

```shell
make
```
In any case, now we're done.
Take a look!
#Butlerian Jihad #Git #Administration
If the repository is meant for end users, things are harder: a bare repository has no working tree, so any local files the README links to, such as images, are missing from the web server. The `post-update` hook would have to extract all of them. Something like the following, perhaps:
```fish
for file in (printf "%s\n" $body | /home/oddmu/oddmu links - | grep -Ev '^(https?:|mailto:|/)')
    set dir (dirname $file)
    if test ! -d $dir
        mkdir -p $dir
    end
    echo $file
    sudo -u git git show $branch:$file > $file
end
```
This uses oddmu to extract the links from the Markdown text, skips external and absolute links, creates the necessary directories, and checks out the files.
But files that are no longer linked from the README are never deleted. And if the README links to a directory (I have done this! 🤦), the checkout doesn't work: `git show` prints a listing of the directory's tree instead of its files.
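One way around the directory problem, sketched in POSIX shell: ask `git cat-file -t` whether a link points at a file (`blob`) or a directory (`tree`), and only extract files. `extract_linked_file` is a hypothetical helper; it uses `git cat-file blob` instead of `git show` to get the exact bytes, and it also refuses links that climb out of the repository with `..`. It does nothing about stale files.

```shell
#!/bin/sh
# extract_linked_file BRANCH PATH (hypothetical helper): run inside a bare
# repository; write PATH from BRANCH into the current directory if it is a
# file in Git, and silently skip everything else.
extract_linked_file() {
    branch=$1 path=$2
    case $path in
        http:*|https:*|mailto:*|/*) return 0 ;;   # not a local file
        ..|../*|*/..|*/../*) return 0 ;;          # would leave the repository
    esac
    # "blob" for files, "tree" for directories, nothing if the path is missing
    [ "$(git cat-file -t "$branch:$path" 2>/dev/null)" = blob ] || return 0
    mkdir -p "$(dirname "$path")"
    git cat-file blob "$branch:$path" > "$path"
}
```

The hook's loop would then call `extract_linked_file "$branch" "$file"` for each link instead of running `git show` directly.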
I think the better way forward is to move this information elsewhere. The README is not the documentation.
And with that, I think I did it! Serving git repositories from static files. A single directory per project containing the bare git data and a single `index.html` file. No more gazillion end points for crawlers to lose themselves.