💾 Archived View for lupine.agency › gemlog › 2023-04-17.gmi captured on 2023-05-24 at 17:43:24. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
April 17, 2023
Today I added anchor links for headers to Galileo, the Gemini-to-WWW proxy I use for this website.
The implementation is probably not the best: it depends on the standard library's handling of multibyte[a] and wide strings (and therefore on the operating system's locale), but it works well enough that I am not scared of running it on my server.
[a] Null-terminated multibyte strings - cppreference.com
At first I used a separate UTF-8 implementation: Björn Höhrmann's "Flexible and Economical" UTF-8 Decoder[b] for... well, UTF-8 decoding, plus my own functions for classifying characters as blank or punctuation and for case folding.
[b] Flexible and Economical UTF-8 Decoder
This worked well... until it didn't. The program kept segfaulting whenever it stumbled upon actual (non-ASCII) UTF-8 text, and I couldn't figure out for the love of God why it was happening, so... I stopped debugging.
I thought:
Hm, the standard C library *does* have support for wide character strings AND multibyte character strings... As long as the proxy is run with a UTF-8 locale, it should just work™!
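Roughly, that idea looks like this: convert the header to a wide string with mbstowcs(), classify and lowercase each character with the <wctype.h> functions, and convert back with wcstombs(). This is a minimal sketch, not Galileo's actual code. `header_anchor` is a hypothetical name, and the slug rules (whitespace runs become a single `-`, punctuation other than `-` and `_` is dropped, everything else is lowercased) are assumptions. towlower() is simple per-character lowercasing, not full Unicode case folding. Passing NULL as the destination to measure the length is a POSIX extension to mbstowcs()/wcstombs(), supported by OpenBSD and glibc.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: turn a header line into an anchor id.
 * Assumed rules: whitespace runs -> one '-', punctuation (except '-'
 * and '_') dropped, everything else lowercased with towlower().
 * Depends on LC_CTYPE: setlocale(LC_ALL, "") must already have
 * selected a UTF-8 locale for non-ASCII headers to work.
 * Returns a malloc'd string (caller frees), or NULL on invalid
 * input or allocation failure.
 */
char *header_anchor(const char *header)
{
    /* POSIX extension: NULL destination returns the required length. */
    size_t wlen = mbstowcs(NULL, header, 0);
    if (wlen == (size_t)-1)
        return NULL; /* not valid in the current locale's encoding */

    wchar_t *in = malloc((wlen + 1) * sizeof *in);
    wchar_t *out = malloc((wlen + 1) * sizeof *out);
    if (in == NULL || out == NULL) {
        free(in);
        free(out);
        return NULL;
    }
    mbstowcs(in, header, wlen + 1);

    size_t o = 0;
    int pending_dash = 0; /* a whitespace run was seen after some output */
    for (size_t i = 0; i < wlen; i++) {
        wint_t c = (wint_t)in[i];
        if (iswspace(c)) {
            pending_dash = (o > 0); /* never emit a leading dash */
        } else if (iswpunct(c) && c != L'-' && c != L'_') {
            continue; /* drop punctuation */
        } else {
            if (pending_dash) {
                out[o++] = L'-';
                pending_dash = 0;
            }
            out[o++] = (wchar_t)towlower(c);
        }
    }
    out[o] = L'\0'; /* output never exceeds input length, so this fits */
    free(in);

    /* Convert back to a multibyte (UTF-8) string. */
    size_t blen = wcstombs(NULL, out, 0);
    char *res = NULL;
    if (blen != (size_t)-1 && (res = malloc(blen + 1)) != NULL)
        wcstombs(res, out, blen + 1);
    free(out);
    return res;
}
```

Trailing whitespace simply leaves a pending dash that is never written. In a UTF-8 locale, `header_anchor("Über uns!")` should yield `über-uns`.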
First, I had to check whether the proxy actually set the locale. Luckily, it does, right when it starts up. Phew!
Quoting the setlocale(3) manpage: "By default, C programs start in the 'C' locale."
Technically, you shouldn't depend on the operating system having the right locale set or whatever, blah blah, but I am using OpenBSD (the best OS ever made), which has no non-UTF-8 locales (aside from the "C" locale, I guess, which you shouldn't be using anyways!), so it's not a big deal.
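On systems that can't promise a UTF-8 locale the way OpenBSD does, a start-up check is cheap. This sketch is illustrative, not Galileo's code: `init_utf8_locale` and `locale_is_utf8` are hypothetical names. It calls setlocale(LC_ALL, "") to leave the default "C" locale for the one named by the environment, then asks the POSIX nl_langinfo(CODESET) whether that locale's encoding is UTF-8.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Returns 1 if the current LC_CTYPE codeset is UTF-8, 0 otherwise.
 * Assumption: the codeset is spelled exactly "UTF-8", which is what
 * glibc and OpenBSD report for UTF-8 locales.
 */
int locale_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}

/*
 * Hypothetical start-up helper. C programs start in the "C" locale,
 * so select the locale from the environment (LC_ALL, LC_CTYPE, LANG)
 * and report whether it is UTF-8. Returns -1 if setlocale() fails.
 */
int init_utf8_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return -1;
    return locale_is_utf8();
}
```

A proxy could call init_utf8_locale() at the top of main(), before any mbstowcs() call, and log a warning or skip anchor generation unless it returns 1. Other C libraries may spell the codeset differently, so the string comparison is the part most likely to need adjusting.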
Here comes the breaker:
Probably not. I might not use it, and the code might not be suitable to contribute upstream because it depends on the user setup being in a certain way, or whatever.
But I did it anyways.