💾 Archived View for lupine.agency › gemlog › 2023-04-17.gmi captured on 2023-05-24 at 17:43:24. Gemini links have been rewritten to link to archived content

Adding header anchor links to Galileo

April 17, 2023

Today I added anchor links for headers to Galileo, the Gemini to WWW proxy I use for this website.

The implementation is probably not the best, since it depends on the standard library's handling of multibyte[a] and wide strings (and therefore on the operating system's locale), but it works well enough that I am not scared of running it on my server.

[a] Null-terminated multibyte strings - cppreference.com

At first I used a separate UTF-8 implementation: Björn Höhrmann's "Flexible and Economical" UTF-8 Decoder[b] for... well, UTF-8 decoding, plus my own functions for case folding and for classifying characters as blank or punctuation.

[b] Flexible and Economical UTF-8 Decoder

This worked well... until it didn't. The program kept segfaulting whenever it stumbled upon actual UTF-8 text, and I couldn't figure out for the love of God why, so... I stopped debugging.

I thought:

Hm, the standard C library *does* have support for wide character strings AND multibyte character strings... As long as the proxy is run with a UTF-8 locale, it should just work™!
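To make the idea concrete, here is a minimal sketch of how a header could be turned into an anchor using only the standard library's multibyte and wide character functions. This is not Galileo's actual code: the function name header_to_anchor and the exact rules (fold to lowercase, drop punctuation, collapse whitespace into a single dash) are my own assumptions for illustration. It also assumes the program has already selected a UTF-8 locale with setlocale(3); in the "C" locale it can only be relied on for ASCII.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not taken from Galileo): build an anchor id from
 * a header line. Character classification and case folding follow the
 * current LC_CTYPE locale. Returns a malloc'd string the caller must
 * free, or NULL on an invalid multibyte sequence or allocation failure.
 */
char *
header_to_anchor(const char *header)
{
	size_t n = strlen(header);	/* upper bound on the wide length */
	size_t wlen, outmax, i, j = 0;
	int pending_dash = 0;
	wchar_t *wide;
	char *out;

	wide = calloc(n + 1, sizeof(*wide));
	if (wide == NULL)
		return NULL;

	/* Decode the multibyte string; (size_t)-1 means invalid input. */
	wlen = mbstowcs(wide, header, n + 1);
	if (wlen == (size_t)-1) {
		free(wide);
		return NULL;
	}

	/* Rewrite in place; the write index j never passes the read index i. */
	for (i = 0; i < wlen; i++) {
		wchar_t c = wide[i];

		if (iswalnum((wint_t)c)) {
			/* Emit one dash for any whitespace run between words. */
			if (pending_dash && j > 0)
				wide[j++] = L'-';
			pending_dash = 0;
			wide[j++] = (wchar_t)towlower((wint_t)c);
		} else if (iswspace((wint_t)c)) {
			pending_dash = 1;
		}
		/* Everything else (punctuation, symbols) is dropped. */
	}
	wide[j] = L'\0';

	/* Encode back to multibyte; MB_CUR_MAX bytes per character is enough. */
	outmax = j * MB_CUR_MAX + 1;
	out = malloc(outmax);
	if (out != NULL && wcstombs(out, wide, outmax) == (size_t)-1) {
		free(out);
		out = NULL;
	}
	free(wide);
	return out;
}
```

With this sketch, header_to_anchor("Hello, World!") gives "hello-world". How non-ASCII letters are classified and folded is entirely up to the locale, which is exactly the dependency described above.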

First, I had to check whether the proxy actually set the locale or not. Luckily, it does, right when it starts up. Phew!

Quoting the setlocale(3) manpage: "By default, C programs start in the 'C' locale."

Technically, you shouldn't depend on the operating system having the right locale set or whatever, blah blah, but I am using OpenBSD (the best OS ever made) which does not have non-UTF-8 locales (aside from the "C" locale, I guess, which you shouldn't be using anyways!) so it's not a big deal.
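For anyone less trusting of their operating system, the locale can be checked at startup. Below is a minimal sketch, assuming a POSIX system that provides nl_langinfo(3) with CODESET. The function names codeset_is_utf8 and use_environment_locale are made up for this example, and I am not claiming this is what Galileo does.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: does this codeset name mean UTF-8? */
int
codeset_is_utf8(const char *codeset)
{
	if (codeset == NULL)
		return 0;
	/* Accept both common spellings, ignoring case. */
	return strcasecmp(codeset, "UTF-8") == 0 ||
	    strcasecmp(codeset, "UTF8") == 0;
}

/*
 * Hypothetical helper: switch LC_CTYPE from the default "C" locale to
 * whatever the environment asks for, then report whether the resulting
 * character encoding is UTF-8. Returns 1 if it is, 0 otherwise.
 */
int
use_environment_locale(void)
{
	if (setlocale(LC_CTYPE, "") == NULL)
		return 0;
	return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program could call use_environment_locale() once at the top of main and, if it returns 0, either refuse to start or fall back to ASCII-only handling.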

Here comes the kicker:

Should I have spent that much time trying to make it Unicode-aware?

Probably not. I might not use it, and the code might not be suitable to contribute upstream because it depends on the user's setup being a certain way, or whatever.

But I did it anyways.