💾 Archived View for gemini.bortzmeyer.org › gemini › iri.gmi captured on 2024-02-05 at 09:34:12. Gemini links have been rewritten to link to archived content

IRI in Gemini

Currently (January 2021), the specification seems silent about IRIs (Internationalized Resource Identifiers, RFC 3987). It just says "<URL> is a UTF-8 encoded absolute URL", which is absurd (a URI must be entirely in US-ASCII). Handling IRIs would require more than that, as well as practical advice for software authors.

Issue #1 in the specification work

Gemini current specification

RFC 3986 on URI syntax

RFC 3987 on IRI syntax

RFC 5890 on IDN (domain names in Unicode)

What should programs do?

It is not clear what servers and clients should do: send an IRI, accept an IRI but convert it to a URI, or something else. Tests with some clients seem to indicate that it does not always work.

Testing server, with an IDN in the name (e with accent).

Testing server, an IRI with an IDN and a non-ASCII character in the path

The server at the other end is (January 2021) Gemserv. The domain name was configured in Punycode ('hostname = "xn--gmeaux-bva.bortzmeyer.org"' in config.toml).
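To see where such a Punycode name comes from, here is a minimal Python sketch. It assumes the Unicode form of the test server's name is "gémeaux.bortzmeyer.org" (reconstructed from the Punycode above), and it relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules rather than RFC 5890; the two agree on a simple name like this one.

```python
# Unicode form of the name (an assumption, reconstructed from the Punycode).
unicode_name = "gémeaux.bortzmeyer.org"

# Each label with non-ASCII characters becomes an "xn--" label.
config_name = unicode_name.encode("idna").decode("ascii")
print(config_name)  # xn--gmeaux-bva.bortzmeyer.org

# Going the other way recovers the Unicode form.
print(config_name.encode("ascii").decode("idna"))
```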

The Gemserv server

Currently (January 2021):

Proposals

Accept IRI as first-class citizens

This is more natural for a new protocol, free of HTTP legacy. Punycode would be limited to the minimum: DNS lookups, which still require it given the current state of the domain name tree.

Many software libraries already do so automatically.
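As a rough illustration of this proposal, a client could send the IRI unchanged (in UTF-8) in the request and use Punycode only for the name it resolves. The sketch below is an assumption about how that might look, not a description of any existing client; `prepare_request` is a hypothetical helper, the "/café" path is only illustrative, and the "idna" codec is Python's IDNA 2003 implementation.

```python
from urllib.parse import urlsplit

GEMINI_DEFAULT_PORT = 1965


def prepare_request(iri: str) -> tuple[str, int, bytes]:
    """Return (name to resolve, port, request bytes) for a Gemini IRI."""
    parts = urlsplit(iri)
    # Punycode appears only here, for the DNS lookup.
    lookup_name = (parts.hostname or "").encode("idna").decode("ascii")
    port = parts.port or GEMINI_DEFAULT_PORT
    # The request line carries the IRI as written, encoded in UTF-8.
    request = iri.encode("utf-8") + b"\r\n"
    return lookup_name, port, request


name, port, request = prepare_request("gemini://gémeaux.bortzmeyer.org/café")
print(name, port)
print(request)
```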

Remaining issues:

RFC 5198 on a canonical Internet form of Unicode
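RFC 5198 recommends the NFC normalization form for text on the network. A small Python sketch of why this matters for IRIs: the same visible string can be two different byte sequences, so a server comparing raw bytes may not find the resource unless both sides normalize. The "café" path is only an illustration.

```python
import unicodedata

composed = "caf\u00e9"     # "é" as one code point (NFC)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (NFD)

# Both render as "café" but differ as strings and as UTF-8 bytes.
print(composed == decomposed)
print(composed.encode("utf-8"), decomposed.encode("utf-8"))

# Normalizing to NFC before comparing or sending makes them identical.
print(unicodedata.normalize("NFC", decomposed) == composed)
```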

Use Punycode and percent-encoding for everything

Another proposal is to convert all IDNs to Punycode before putting them on the wire, whether in DNS traffic or in Gemini traffic. In that case, the server is configured with a Punycode name. The same goes for the path in the URI: use percent-encoding (café → caf%C3%A9). This is how the test server above is configured and it works with Lagrange and Agunua.
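A minimal Python sketch of this conversion, using only the standard library. `iri_to_uri` is a hypothetical helper, the "/café" path is illustrative, and the "idna" codec implements IDNA 2003 rather than RFC 5890, so treat it as an approximation for unusual names. Userinfo is dropped, since the sketch only rebuilds host and port.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an all-ASCII URI (Punycode host, percent-encoded rest)."""
    parts = urlsplit(iri)
    host = (parts.hostname or "").encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # quote() percent-encodes the UTF-8 bytes of non-ASCII characters;
    # "%" stays safe so already-encoded input is not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_uri("gemini://gémeaux.bortzmeyer.org/café"))
# gemini://xn--gmeaux-bva.bortzmeyer.org/caf%C3%A9
```

An URI that is already pure ASCII passes through unchanged, so the function can be applied to every link without checking first.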

Lagrange

Agunua

Do nothing

In this proposal, gemtext (text/gemini files) would have to use US-ASCII URIs only. This page, with its IRI links, would become illegal.
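Under this rule, an author could check a page before publishing it. The following Python sketch lists link targets that are not pure US-ASCII; `non_ascii_links` is a hypothetical helper, and as a simplification it does not skip preformatted blocks, which a real gemtext parser would.

```python
def non_ascii_links(gemtext: str) -> list[str]:
    """Return the link targets in a gemtext document that are not US-ASCII."""
    offenders = []
    for line in gemtext.splitlines():
        if not line.startswith("=>"):
            continue
        # A link line is "=>", the target, then an optional label.
        fields = line[2:].split(maxsplit=1)
        if fields and not fields[0].isascii():
            offenders.append(fields[0])
    return offenders


page = (
    "=> gemini://gémeaux.bortzmeyer.org/ Test server\n"
    "=> gemini://gemini.bortzmeyer.org/ A label with café in it\n"
)
print(non_ascii_links(page))
```

Only the target is checked: non-ASCII text in the label is ordinary gemtext content and stays legal.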

Links

Solderpunk's summary of the three proposals

His #1 solution is my "Do nothing", his #2 is "Use Punycode and percent-encoding for everything" and his #3 is "Accept IRI as first-class citizens".

RFC 3492 on Punycode

RFC 8399, IDN in certificates

The Gemini specification

[Web] The issue in the Go-gemini library