Currently (January 2021), the specification seems silent about IRIs (Internationalized Resource Identifiers, RFC 3987). It just says "<URL> is a UTF-8 encoded absolute URL", which is absurd (a URI must be in US-ASCII). Handling IRIs would require more than that, as well as practical advice for software authors.
Issue #1 in the specification work
RFC 5890 on IDN (domain names in Unicode)
It is not clear what servers and clients should do (send an IRI, accept an IRI but convert it to a URI, or something else). Tests with some clients seem to indicate that it does not always work.
Testing server, with an IDN in the name (e with accent).
Testing server, an IRI with an IDN and a non-ASCII character in the path
The server at the other end is (January 2021) a Gemserv. The domain name was configured in Punycode ('hostname = "xn--gmeaux-bva.bortzmeyer.org"' in config.toml).
Currently (January 2021):
This is more natural for a new protocol, free of HTTP legacy. Punycode would be limited to the minimum: the current state of the domain name tree still requires it for DNS lookups.
Many software libraries already do so automatically.
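A minimal sketch of what this could look like on the client side, in Python. The function name build_request is hypothetical, and the Unicode form of the test host is assumed to be what the Punycode name quoted above decodes to. The IRI goes on the wire as UTF-8, and Punycode appears only in the name handed to the resolver.

```python
from typing import Tuple
from urllib.parse import urlsplit


def build_request(iri: str) -> Tuple[bytes, bytes]:
    """Return (name for the DNS lookup, bytes of the Gemini request line).

    Hypothetical helper: the IRI is sent unchanged, encoded in UTF-8,
    and only the host name is converted to Punycode, for the resolver.
    """
    host = urlsplit(iri).hostname
    if host is None:
        raise ValueError("no host in %r" % iri)
    dns_name = host.encode("idna")            # Punycode, for DNS only
    request = iri.encode("utf-8") + b"\r\n"   # the IRI itself, as UTF-8
    return dns_name, request


dns_name, request = build_request("gemini://gémeaux.bortzmeyer.org/café")
print(dns_name)   # b'xn--gmeaux-bva.bortzmeyer.org'
print(request)
```

Python's built-in "idna" codec implements the older IDNA 2003 rules (RFC 3490), not RFC 5890, so names that the two versions treat differently would need a third-party library. Python's socket functions apply this same codec themselves when given a Unicode host name, which is one instance of a library doing the conversion automatically.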
Remaining issues:
RFC 5198 on a canonical Internet form of Unicode
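RFC 5198 asks for NFC normalization. A small sketch, using Python's standard unicodedata module, of why this matters for IRIs: the same visible path can be two different sequences of code points, hence two different percent-encoded paths, unless it is normalized first. The example word is only an illustration.

```python
import unicodedata
from urllib.parse import quote

composed = "caf\u00e9"      # é as a single code point (U+00E9)
decomposed = "cafe\u0301"   # e followed by a combining acute accent (U+0301)

# Both display as "café" but they are different strings...
print(composed == decomposed)    # False
# ...and therefore different paths once percent-encoded.
print(quote(composed))           # caf%C3%A9
print(quote(decomposed))         # cafe%CC%81

# After NFC normalization, the two forms are identical.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed)    # True
```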
Another proposal is to convert all IDNs to Punycode before putting them on the wire, whether in DNS traffic or in Gemini traffic. In that case, the server is configured with the Punycode form of its name. The same goes for the path in the URI: use percent-encoding (café → caf%C3%A9). This is how the test server above is configured and it works with Lagrange and Agunua.
This page, with its IRI, would become illegal. In this proposal, gemtext (text/gemini files) would have to use US-ASCII URIs only.
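The conversion that this proposal requires can be sketched with the Python standard library. The function name iri_to_uri is hypothetical, and the Unicode host name is assumed to be what the Punycode name of the test server decodes to. This is an illustration, not a complete implementation of the RFC 3987 mapping: it ignores userinfo and uses the IDNA 2003 rules of Python's built-in codec.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an all-ASCII URI (hypothetical helper).

    Host name: Punycode. Path, query, fragment: percent-encoded UTF-8.
    """
    parts = urlsplit(iri)
    if parts.hostname is None:
        raise ValueError("no host in %r" % iri)
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    # "%" is kept safe so that existing percent-escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("gemini://gémeaux.bortzmeyer.org/café"))
# gemini://xn--gmeaux-bva.bortzmeyer.org/caf%C3%A9
```

An IRI that is already pure ASCII goes through unchanged, so a client could apply the function to every link before sending the request.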
Solderpunk's summary of the three proposals
His #1 solution is my "Do nothing", his #2 is "Use Punycode and percent-encoding for everything" and his #3 is "Accept IRI as first-class citizens".