This Gemini Server (part one)

I've been at least a little bit interested in Gemini since I first found out about it. Originally I was going to write a Gemini client, and I started doing that years ago, but never finished.

This time I wrote a Gemini server instead. It's really trivial. It's written in Rust and uses various crates for things:

Rustls for the TLS implementation

TLS / x.509 self-signed certificate generation

Async runtime

Tokio support for Rustls

URL parsing (I might drop this)

Argument parsing

URL percent decoding

And various others.

It just serves files straight from the filesystem. I haven't done anything fancy with splice(2) to serve more efficiently (thinking about it I guess that would need kTLS anyway, which I also haven't touched).
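
For the curious, the serving path is roughly this shape (a sketch, not the actual code; serve_file is a made-up name, and a real server would pick the MIME type per file):

```rust
use tokio::fs::File;
use tokio::io::{copy, AsyncWriteExt};

// Rough sketch: write the Gemini success header, then stream the file.
// `stream` would be something like a tokio-rustls server-side TLS stream.
async fn serve_file<S>(mut stream: S, path: &std::path::Path) -> std::io::Result<()>
where
    S: tokio::io::AsyncRead + tokio::io::AsyncWrite + Unpin,
{
    let mut file = File::open(path).await?;
    // Status 20 = success; the meta field carries the MIME type.
    // Real code would choose the MIME type based on the file extension.
    stream.write_all(b"20 text/gemini\r\n").await?;
    copy(&mut file, &mut stream).await?;
    stream.shutdown().await?;
    Ok(())
}
```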

It applies a request and response timeout.

I haven't put a concurrency limit in place yet, but maybe I will at some point.
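
For illustration, this is the kind of shape both of those take with tokio (a sketch, not the actual code; the names, the numbers, and the handle_connection stub are made up):

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::net::{TcpListener, TcpStream};
use tokio::sync::Semaphore;
use tokio::time::timeout;

// Hypothetical limits.
const REQUEST_TIMEOUT: Duration = Duration::from_secs(10);
const MAX_CONCURRENT: usize = 64;

async fn handle_connection(_conn: TcpStream) {
    // TLS handshake, request parsing and response writing would go here.
}

async fn accept_loop(listener: TcpListener) -> std::io::Result<()> {
    let permits = Arc::new(Semaphore::new(MAX_CONCURRENT));
    loop {
        // Wait for a free slot before accepting the next connection.
        let permit = permits.clone().acquire_owned().await.expect("semaphore closed");
        let (conn, _addr) = listener.accept().await?;
        tokio::spawn(async move {
            // Give up if the whole exchange takes too long.
            let _ = timeout(REQUEST_TIMEOUT, handle_connection(conn)).await;
            drop(permit); // frees the slot when this task finishes
        });
    }
}
```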

Lessons from this implementation

Protocol

Firstly, Gemini really really is a very simple protocol to implement. My first implementation didn't use any async, it just spawned a thread for each connection, and frankly that would have been totally fine to stick with.

Since each request is on its own connection, with its own TLS session, and there's no keep-alive to receive multiple requests on a connection, there is basically no connection management to do. And since the request is a single URL on a single CR-LF terminated line, the request parsing is pretty easy.
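
As a sketch of that (read_request_line is an illustrative name, not the actual server code), reading the request is just "read one line, bounded in length" - the spec caps the URL at 1024 bytes:

```rust
use tokio::io::{AsyncBufReadExt, AsyncReadExt, BufReader};

// Sketch: read the single CR-LF terminated request line. The spec limits the
// URL to 1024 bytes, so anything longer gets rejected. Since a Gemini request
// is only ever this one line, it doesn't matter that the BufReader may read
// ahead and then be dropped.
async fn read_request_line<R>(stream: &mut R) -> std::io::Result<Option<String>>
where
    R: tokio::io::AsyncRead + Unpin,
{
    let mut reader = BufReader::new(stream).take(1026); // 1024 + CR + LF
    let mut line = String::new();
    reader.read_line(&mut line).await?;
    if !line.ends_with("\r\n") {
        return Ok(None); // missing terminator, over-long, or connection closed early
    }
    line.truncate(line.len() - 2);
    Ok(Some(line))
}
```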

Oh... but request parsing is actually not *quite* so easy. It's easy except that you still need to do percent-decoding and perhaps normalise the request. And for a "file server" type server it's important that it won't try to read outside its designated serving path so you have to carefully map from the requested path to a file-system path. I'm not sure I've done that fully safely yet.
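
The shape of that mapping ends up being something like this sketch (map_to_fs_path is an illustrative name, not the actual code). The important things are to percent-decode *before* checking, and to refuse anything that isn't a plain path component:

```rust
use std::path::{Component, Path, PathBuf};

// Sketch: map an already-percent-decoded request path onto the serving root.
// Anything that isn't a plain component ("..", a Windows prefix, ...) is refused.
fn map_to_fs_path(root: &Path, request_path: &str) -> Option<PathBuf> {
    let mut fs_path = root.to_path_buf();
    for component in Path::new(request_path).components() {
        match component {
            // The leading "/" of the URL path just anchors it at the root; "." is harmless.
            Component::RootDir | Component::CurDir => {}
            Component::Normal(part) => fs_path.push(part),
            // ".." or anything else exotic: refuse rather than risk escaping the root.
            _ => return None,
        }
    }
    Some(fs_path)
}
```

Even with that, a symlink inside the root can still point outside it, so canonicalising the result and checking it still starts with the root is probably also wanted.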

Building and deploying the server

My little VPS host runs Debian 11 (Bullseye), which has glibc... I don't know exactly which version, but an older one than my personal machines run. Rust statically links crate stuff, but it does dynamically link to glibc (which is correct as far as I know - statically linking glibc seems like a bad idea?). So just making a release build on my local machine and uploading the binary didn't work: the glibc versions weren't compatible.

To build it, I use the Debian 11 docker image and build inside a container using podman. This is fine, but I haven't automated it yet. I basically just vendor the dependencies with cargo vendor, putting its output into .cargo/config.toml so the vendored copies of stuff will be used, and then run the release build inside the container.

I would like the container stuff to all be done on tmpfs but I haven't looked up all the correct incantations for that yet.

Anyway, the result is a binary built with the latest stable Rust, but on a Debian 11 system with Debian 11's GCC and glibc. So that works great and I can just copy the binary up to my little VPS and run it.

Then, server setup on the VPS:

Async in Rust

Rust async remains a mixture: really great and nice to use, but also kind of annoying in the number of dependencies you need to pull in and in the types involved.

The best way I've found of dealing with it continues to be to follow the old pure functional programming guidance: Write a pure functional core (which doesn't need to care about async at all but doesn't do any I/O itself), and then wrap that with an async I/O 'driving' layer in the code that does all the messing around with async, futures, timeouts, read/write stuff.
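
Roughly, the split looks like this (a sketch, not the actual code; respond and drive are made-up names, and it reuses the read_request_line sketch from the protocol section above). The core is a plain function from request to response, and the async layer does nothing but shuttle bytes and apply timeouts:

```rust
use std::time::Duration;
use tokio::io::{AsyncRead, AsyncWrite, AsyncWriteExt};
use tokio::time::timeout;

// Pure core: no I/O, no async, trivial to unit test. A real version would
// also return the file path to serve, not just the response header.
fn respond(request_line: &str) -> String {
    if request_line.starts_with("gemini://") {
        "20 text/gemini\r\n".to_string()
    } else {
        "59 Bad request\r\n".to_string()
    }
}

// Async shell: reads, writes, and times out, and nothing else.
async fn drive<S>(mut stream: S) -> std::io::Result<()>
where
    S: AsyncRead + AsyncWrite + Unpin,
{
    // read_request_line is the sketch from the protocol section above.
    let request = timeout(Duration::from_secs(10), read_request_line(&mut stream))
        .await
        .map_err(|_| std::io::Error::new(std::io::ErrorKind::TimedOut, "request timed out"))??;
    let header = respond(request.as_deref().unwrap_or(""));
    stream.write_all(header.as_bytes()).await?;
    stream.shutdown().await?;
    Ok(())
}
```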

Possible future stuff

It might be nice for the server to cache file content? Except... maybe not, since the operating system already caches file content, so it would only matter if the request rate were high enough that caching in server memory lets you skip some system calls.

It might be nice to make the server a systemd "socket activated unit" which would mean it could run in quite an isolated environment.
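
Just to illustrate what that would involve (a sketch of the idea, assuming the listenfd crate; not something the server does today): systemd hands over an already-listening socket, and the server picks it up instead of binding its own.

```rust
use listenfd::ListenFd;
use tokio::net::TcpListener;

// Sketch: take the listening socket systemd passed in via socket activation,
// falling back to binding normally when run by hand. 1965 is the Gemini port.
async fn get_listener() -> std::io::Result<TcpListener> {
    let mut fds = ListenFd::from_env();
    if let Some(std_listener) = fds.take_tcp_listener(0)? {
        std_listener.set_nonblocking(true)?;
        TcpListener::from_std(std_listener)
    } else {
        TcpListener::bind("0.0.0.0:1965").await
    }
}
```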

It might be nice to find a way for the server to load its certificate without needing to retain access to that part of the filesytem. That way the server could be run in an environment in which the only files it can see at all (the only files in its mount namespace) are the files it's allowed to serve.

It might be nice for the server to include some more dynamic stuff, e.g., I'll probably extend the existing /info endpoint to show some server stats.

It would be good to clean up the logging a lot; it's a total mess right now.

Part Two