IDNs in Lagrange v0.13

Those who follow the Gemini mailing list may have noticed a message or two about IDNs and IRIs. This is the first time I'm taking a deeper look at this stuff, so here is what I've learned.

i18n

When it comes to Internationalized Domain Names (IDNs), I had been blissfully unaware that they rely on a kludge: a complicated, special-purpose encoding (Punycode, RFC 3492) that converts Unicode domain names into a compact ASCII representation. Well, RFC 3492 is 17 years old, so surely this is something that happens under the hood, a minor implementation detail in the OS? Alas, internationalization has been left for the application layer to worry about, so it needs to be handled manually.

Since Gemini allows UTF-8 encoded URLs, implementing RFC 3492 is virtually a requirement. Otherwise, one cannot make DNS lookups if the domain name contains non-ASCII characters.
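To make the encoding step concrete, here is a minimal Python sketch of turning a Unicode host name into the ASCII form that a DNS lookup expects. It is not how Lagrange does it (Lagrange is written in C); the function name `to_ascii_hostname` is made up for illustration, and the sketch only lowercases each label instead of applying the full IDNA preparation rules that a real client would need.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Encode each non-ASCII label with Punycode (RFC 3492) and an 'xn--' prefix.

    Simplification (assumption): labels are only lowercased, not run through
    the complete IDNA mapping and validation steps.
    """
    labels = []
    for label in hostname.split("."):
        label = label.lower()
        if label.isascii():
            # Plain ASCII labels are passed through unchanged.
            labels.append(label)
        else:
            # Python's built-in "punycode" codec implements RFC 3492.
            encoded = label.encode("punycode").decode("ascii")
            labels.append("xn--" + encoded)
    return ".".join(labels)


print(to_ascii_hostname("bücher.example"))
```

The resulting ASCII name is what gets passed to the resolver, while the user keeps seeing the Unicode form.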

As to the rest of the URL, the story is a bit simpler: normalization and escaping reserved characters. Normalization is needed because Unicode has multiple ways to represent the same character. Applications that deal with UTF-8 already need some sort of Unicode library to actually conform to the standard, and such a library should have routines for normalization, so that is one problem that is easy to deal with. (Lagrange uses GNU libunistring.) Escaping is handled by percent-encoding reserved characters, which is also straightforward.
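As a rough illustration of those two steps, the following Python sketch normalizes a URL path and then percent-encodes it. The helper name `encode_path` is hypothetical, and picking NFC as the normalization form is an assumption of this sketch; the post does not say which form Lagrange applies.

```python
import unicodedata
from urllib.parse import quote


def encode_path(path: str) -> str:
    """Normalize a URL path, then percent-encode it as UTF-8.

    Assumption: NFC is used as the normalization form.
    """
    # Step 1: collapse equivalent code point sequences into one form,
    # e.g. "e" + combining acute accent becomes the single character "é".
    normalized = unicodedata.normalize("NFC", path)
    # Step 2: percent-encode everything except unreserved characters
    # and the "/" path separator. quote() encodes as UTF-8 by default.
    return quote(normalized, safe="/")


# Two different spellings of the same visible text give the same result.
composed = "/caf\u00e9"      # precomposed é
decomposed = "/cafe\u0301"   # e followed by a combining acute accent
print(encode_path(composed) == encode_path(decomposed))
```

Without the normalization step, the two spellings would be escaped into different byte sequences and a server could treat them as different resources.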

All these encodings and translations should happen automatically and transparently.

Have some URLs with ❤️

Lagrange v0.13 embraces Unicode in both domain names and URL paths:

blekksprut.net with CJK characters (screenshot)

Text rendering

Speaking of Unicode, actually rendering it on screen is not straightforward at all. Lagrange uses custom text rendering routines that currently only support left-to-right text. A small number of special Unicode codepoints (such as soft hyphens) are recognized and handled, but many others, for example variation selectors, are simply ignored.

Version 0.13 has a bunch of improvements for text rendering:

Lagrange: features, downloads, what's new

skyjake

📅 2020-12-13

🏷 Lagrange

CC-BY-SA 4.0

skyjake's Gemlog