In the old days, I used `cgit` to render my git repositories on the web. It's simple to set up since it's a CGI script. This is ideal for URLs that get very few hits. When nobody is requesting the URL, the CGI script isn't running and no resources are being used. When a URL is requested, however, the CGI script loads, the interpreter loads, the libraries load, the script executes… It's an expensive end-point! And you know how it is. The web is full of leeches and bad bots, crawlers and idiots. Having an expensive end-point means it needs protection.
For a while I thought that `legit` was the answer. It was nice and fast and all that. But recently, `git clone` no longer worked. It calls `git upload-pack` as an intermediate workaround for #1062. This was failing for some reason, however. I tinkered with it for a while but didn't get anywhere.
Then I started thinking about @Sandra@idiomdrottning.org's post on hosting git repos. I made some changes to my Apache config and now `git clone` works again.
The key is that you need a `post-update` hook that calls `git update-server-info`. Each git repository already comes with a `post-update.sample` hook containing the necessary code, so I needed to loop over all the bare repositories I had and rename the example hook.
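Why the hook matters: a plain web server can't answer "which branches and tags exist?" on its own, so `git update-server-info` writes the answer into static files (`info/refs` and `objects/info/packs`) that `git clone` fetches over HTTP. Here's a rough Python sketch of what a client reads from `info/refs`, assuming the usual layout of one object id, a tab, and one ref name per line. The function name and the sample ids are made up for illustration.

```python
def parse_info_refs(text):
    """Map ref names to object ids, given the contents of info/refs."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        object_id, ref_name = line.split("\t", 1)
        refs[ref_name] = object_id
    return refs


# Made-up sample; a real file lists the repository's actual ids.
sample = (
    "1111111111111111111111111111111111111111\trefs/heads/main\n"
    "2222222222222222222222222222222222222222\trefs/tags/v1.0\n"
)

for name, object_id in parse_info_refs(sample).items():
    print(name, object_id[:7])
```

If the hook doesn't run after a push, this file goes stale and clients keep seeing the old refs.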
Using the Fish shell:
```
for d in *.git
    sudo -u git mv $d/hooks/post-update.sample $d/hooks/post-update
end
```
Sadly, this is not good enough.
In order to generate an `index.html` file for every repository, I need a hook that regenerates it. If you know how to determine whether regeneration can be skipped, I'd love to hear how to do that.
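One possible way to skip regeneration, offered as a sketch and not a tested solution: remember the object id of the `README.md` blob the page was last built from, and only rebuild when `git rev-parse` reports a different id. The function names and the idea of a stamp file are assumptions of this sketch.

```python
import subprocess
from pathlib import Path


def readme_blob_id(branch, repo="."):
    """Return the object id of README.md on branch, or "" if there is none."""
    result = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", f"{branch}:README.md"],
        cwd=repo,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else ""


def needs_regeneration(current_id, stamp_file):
    """True unless stamp_file already records current_id."""
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != current_id


def record_regeneration(current_id, stamp_file):
    """Remember which README blob index.html was last built from."""
    Path(stamp_file).write_text(current_id + "\n")
```

A hook would call `readme_blob_id`, return early if `needs_regeneration` is false, and call `record_regeneration` after writing `index.html`. The catch: this only notices README changes. A changed template or description would go undetected unless those are tracked as well.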
This hook also updates or adds the `AddDescription` lines I need.
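The update boils down to "drop the old line for this repository, append a fresh one". Here is that logic as a small Python sketch working on a list of lines. The function name, the sample descriptions and the `oddmu.git.old` entry are invented for the example. Unlike a substring match, it compares the last word of each line exactly, so a repository whose name merely starts with the same text is left alone.

```python
def update_description(lines, title, description):
    """Return .htaccess lines with exactly one AddDescription for title."""
    kept = []
    for line in lines:
        parts = line.split()
        is_old_entry = (
            len(parts) >= 2
            and parts[0] == "AddDescription"
            and parts[-1] == title  # exact match on the file name
        )
        if not is_old_entry:
            kept.append(line)
    # Assumption: quotes inside the description are backslash-escaped.
    escaped = description.replace('"', '\\"')
    kept.append(f'AddDescription "{escaped}" {title}')
    return kept


# Hypothetical starting point for the demonstration.
htaccess = [
    "HeaderName .top.html",
    'AddDescription "Old text" oddmu.git',
    'AddDescription "Other project" oddmu.git.old',
]

for line in update_description(htaccess, "oddmu.git", "A wiki engine"):
    print(line)
```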
I prepared a hook that I wanted to install in every repository and saved it as `~/post-update`.
This is what it looks like, using the Fish shell:
```
#!/usr/bin/fish
git update-server-info

# create index.html
set branch (git branch --show-current)
set template (cat /home/git/.readme.html | string collect)
set title (basename (pwd))
set body (git show $branch:README.md | cmark --to html | string collect)
printf "$template" "$title" "$body" "$title" > index.html

# update description
set description (cat description)
sed --in-place=~ --expression "/ $title/d" /home/git/.htaccess
printf "AddDescription \"$description\" $title\n" >> /home/git/.htaccess
```
(I need the title twice, once for the title and once for the reminder on how to clone.)
I turn Markdown into HTML using `cmark`. Common Mark is the closest we have to a standard, I guess.
The template `/home/git/.readme.html` looks like this:
```
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="format-detection" content="telephone=no">
<meta name="viewport" content="width=device-width">
<title>%s</title>
<style>
html { max-width: 70ch; padding: 1ch; margin: auto; }
body { hyphens: auto; }
</style>
</head>
<body>
<nav>
<a href="https://src.alexschroeder.ch/">Source code repositories</a>
</nav>
<main>
%s</main>
<footer>
<h2>Clone</h2>
<pre>
<mark>git clone https://src.alexschroeder.ch/%s</mark>
</pre>
</footer>
</body>
</html>
```
So now I needed to distribute the `post-update` hook to every repository and run it once. I wrote yet another Fish script, `~/recreate-index`:
```
#!/usr/bin/fish
for d in /home/git/*.git
    echo $d
    cd $d
    cp ~/post-update hooks/
    chown git:git hooks/post-update
    chmod 775 hooks/post-update
    sudo -u git git hook run post-update
end
```
I'm currently hosting 95 repositories according to `ls -d /home/git/*.git | wc -l`. Some of these don't have a `README.md` file. Should I ever touch them again, I'll have to investigate.
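To find out which repositories lack a README without waiting for the next push, something like the following Python sketch might do. It assumes the bare repositories sit directly below one directory and that `HEAD` points at the branch the hook would use. The function names are invented.

```python
import subprocess
from pathlib import Path


def bare_repositories(root):
    """Return the *.git directories directly below root, sorted by name."""
    return sorted(p for p in Path(root).glob("*.git") if p.is_dir())


def has_readme(repo):
    """True if README.md exists on the branch HEAD points to."""
    result = subprocess.run(
        ["git", "cat-file", "-e", "HEAD:README.md"],
        cwd=repo,
        capture_output=True,
    )
    return result.returncode == 0


def repositories_without_readme(root):
    """Names of the repositories where git finds no README.md."""
    return [repo.name for repo in bare_repositories(root) if not has_readme(repo)]
```

Calling `repositories_without_readme("/home/git")` as the `git` user should list the candidates. Any other git failure (wrong owner, empty repository) also counts as "no README" here, so the result is a list of things to look at, not a verdict.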
Now, for the Apache web server – I changed my site to the following:
```
<VirtualHost *:443>
    ServerAdmin alex@alexschroeder.ch
    ServerName src.alexschroeder.ch
    Include conf-enabled/blocklist.conf
    SSLEngine on
    DocumentRoot /home/git
    <Directory /home/git>
        Options Indexes
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
```
The `/home/git` directory has an `.htaccess` file that starts out containing the following:
```
HeaderName .top.html
IndexOptions SuppressIcon SuppressSize FancyIndexing HTMLTable IgnoreCase
IndexOrderDefault Descending Date
IndexHeadInsert "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
IndexOptions Charset=UTF-8
```
The `.top.html` file contains a fragment to add to the top of the index:
```
<style>
body { max-width: 80ch }
table { overflow-x: auto }
td { padding: 0.5ex 1em 0 0; white-space: nowrap }
td:nth-child(3) { white-space: wrap }
</style>
<h1>Source code repositories</h1>
<p>
Hello!
</p>
<p>
I'm Alex Schroeder. These are my source code repositories. You can find out more about me on <a href="https://alexschroeder.ch/">my blog</a>. There, you'll also find a page listing ways to <a href="https://alexschroeder.ch/view/Contact">contact me</a>.
</p>
<p>
As for the git repositories, you should be able to clone them as they are. For example:
</p>
<pre>
git clone https://src.alexschroeder.ch/oddmu.git
</pre>
<p>
For more about this setup, see <a href="https://idiomdrottning.org/hosting-git-repos">How to host git repos</a> by <a href="https://idiomdrottning.org/users/Sandra">@Sandra</a> and my post, <a href="https://alexschroeder.ch/edit/2024-08-11-bare-git">2024-08-11 Serving bare git on the web</a>.
</p>
```
Take a look!
#Butlerian Jihad #Git #Administration
If the repository is for the end users, however, things are harder. The `post-update` hook should extract all the local files linked to from the README. Something like the following, perhaps:
```
for file in (printf "%s\n" $body | /home/oddmu/oddmu links - | egrep -v '^(https?:|mailto:|/)')
    set dir (dirname $file)
    if test ! -d $dir
        mkdir -p $dir
    end
    echo $file; sudo -u git git show $branch:$file > $file
end
```
This uses oddmu to extract the links from a Markdown file, creates the necessary directories and checks out the files.
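The filtering step in that pipeline keeps only relative links, dropping anything that starts with `http:`, `https:`, `mailto:` or a slash. The same filter as a Python sketch, with invented sample links and a made-up function name:

```python
import re

# Same pattern as in the pipeline: remote URLs, mail links, absolute paths.
REMOTE_OR_ABSOLUTE = re.compile(r"^(https?:|mailto:|/)")


def local_links(links):
    """Keep only links that point to files inside the repository."""
    return [link for link in links if not REMOTE_OR_ABSOLUTE.match(link)]


# Hypothetical links as they might come out of a README.
links = [
    "https://example.org/",
    "img/logo.png",
    "mailto:someone@example.org",
    "/absolute/path.html",
    "docs/manual.md",
]

for link in local_links(links):
    print(link)
```

Note that other URL schemes and fragment-only links such as `#section` would still pass this filter, so a real hook might need a stricter rule.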
But if the files are no longer linked from the README, they are not deleted. If a directory is linked from the README (I have done this! 🤦), the checkout won't work.
I think the better way forward is to move this information elsewhere. The README is not the documentation.
And with that, I think I did it! Serving git repositories from static files. A single directory per project containing the bare git data and a single `index.html` file. No more gazillion end points for crawlers to lose themselves in.