Three possible uses for IRIs

On Tue, Dec 8, 2020 at 2:09 PM <colecmac at protonmail.com> wrote:


> I think some people really were calling for a breaking change to the
> protocol.
> But I'm glad you're not, and I hope we can move on and stop talking about
> it.
> What you propose here is allowing IRIs in link lines only?


Yes.

> Or do you mean allowing
> only IRIs for relative references?
>

No.

> I'm unsure whether that would require an IRI parser or not,


It will not, because conversion can be done before parsing, other than the
trivial parsing required to find the hostname and punycode it.  Once that
is done, converting an IRI reference to a URI reference is as
straightforward as transcoding from one character set to another, and
totally indifferent to the IRI format.  So my two steps for IRI->URI
conversion become three:

1)  NFC normalization.

2) Punycode conversion of the hostname.

3) Percent-encoding: find non-ASCII characters and convert each one's UTF-8
bytes to %nn%nn, %nn%nn%nn, or %nn%nn%nn%nn sequences (two, three, or four
bytes), where nn is two hex digits.
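The three steps above can be sketched in a few lines of Python.  This is a
minimal sketch, not a full RFC 3987 implementation: the name iri_to_uri and
the regular expression that finds the hostname are my own (it ignores
userinfo and IPv6 literals), and Python's built-in "idna" codec follows the
older IDNA 2003 rules, which can differ from IDNA 2008 for some names.

```python
import re
import unicodedata

# Hypothetical helper pattern: optional "scheme:", then "//", then the host up
# to the first '/', '?', '#' or ':'.  Userinfo ("user@host") and IPv6 literals
# ("[::1]") are NOT handled in this sketch.
_AUTHORITY = re.compile(r"^((?:[A-Za-z][A-Za-z0-9+.-]*:)?//)([^/?#:]*)")


def iri_to_uri(iri: str) -> str:
    """Convert an IRI reference (absolute or relative) to a URI reference."""
    # Step 1: NFC normalization.
    iri = unicodedata.normalize("NFC", iri)

    # Step 2: Punycode the hostname, if the reference has one.  Python's
    # built-in "idna" codec follows IDNA 2003; ASCII hosts pass through as-is.
    m = _AUTHORITY.match(iri)
    if m and m.group(2):
        host = m.group(2).encode("idna").decode("ascii")
        iri = m.group(1) + host + iri[m.end():]

    # Step 3: replace each remaining non-ASCII character with its UTF-8 bytes
    # written as %XX (two, three or four escapes per character).  ASCII,
    # including '?', '&', '+' and '%', is left untouched, so queries survive.
    return "".join(
        c if ord(c) < 0x80 else "".join(f"%{b:02X}" for b in c.encode("utf-8"))
        for c in iri
    )


if __name__ == "__main__":
    print(iri_to_uri("gemini://b\u00fccher.example/caf\u00e9.gmi"))
    # gemini://xn--bcher-kva.example/caf%C3%A9.gmi
    print(iri_to_uri("../caf\u00e9.gmi"))
    # ../caf%C3%A9.gmi
```

Nothing past the hostname is parsed: the path, query and fragment are just
transcoded character by character, which is why relative references and
query strings come through the same way.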

It turns out that all of this is spelled out in more detail at
<https://tools.ietf.org/html/rfc3987#section-3.1>.  That section says to
normalize only when the IRI is not in digital form or is in a non-Unicode
encoding (and not when it is already in UTF-8, UTF-16, etc.), but since the
world is not full of editors that normalize, I think Gemini clients need to
do it themselves.  That said, most keyboard drivers (even for hard cases
like Vietnamese, which has way too many vowels to dedicate a key to each[*])
now deliver normalized text to applications.
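Why normalization has to come before percent-encoding is easy to see with a
single Vietnamese letter.  In this small illustration (the word "Việt" is just
an example), the same visible text in NFC and NFD form produces two different
URIs unless it is normalized first; Python's urllib.parse.quote percent-encodes
the UTF-8 bytes of non-ASCII characters.

```python
import unicodedata
from urllib.parse import quote

# "Việt" typed with the precomposed letter U+1EC7 (the NFC form).
composed = "Vi\u1ec7t"
# The same word as e + U+0323 (dot below) + U+0302 (circumflex), the NFD form.
decomposed = unicodedata.normalize("NFD", composed)

# quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
print(quote(composed))    # Vi%E1%BB%87t
print(quote(decomposed))  # Vie%CC%A3%CC%82t

# After NFC normalization both spellings produce the same URI text.
print(quote(unicodedata.normalize("NFC", decomposed)) == quote(composed))  # True
```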

It's good to know that some existing URI libraries support IRIs, but that
section should be convincing evidence that you can change an IRI to a URI
without parsing it (always excepting the domain name, which is trivial to
find).

> But this is still a somewhat-breaking change, as once authors start using
> these, other non-Go clients will likely begin to fail. And the correction
> that Go does is not even complete, because it will not work on query
> strings.
> And even if it did, it would not work in the Gemini way that doesn't allow
> pluses, etc etc.
>

The above transformation will work, however.  Sometimes DIY is the Right
Thing.

> We're almost there with this one, but I still think it's a mistake, and
> it'll make Gemini more complex. :/
>

It will.  But in the end, if Gemini succeeds even modestly, there will be
more authors than programmers.

[*] 72 lower-case vowel letters: 6 vowels without diacritics plus 6 with
vowel-quality diacritics (as in French), times 6 tone marks (one of which
is "no mark", as in Chinese), so (6 + 6) × 6 = 72.  And the same number in
upper case.
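The footnote's arithmetic can be checked mechanically, and the check also
shows NFC at work.  In this sketch the list of base vowels (a e i o u y and
ă â ê ô ơ ư) and the assignment of the six tones to combining characters are
my own encoding of standard Vietnamese orthography, not something taken from
the thread.

```python
import unicodedata

# 6 plain vowels plus 6 with vowel-quality diacritics: ă â ê ô ơ ư.
BASES = ["a", "e", "i", "o", "u", "y",
         "\u0103", "\u00e2", "\u00ea", "\u00f4", "\u01a1", "\u01b0"]

# 6 tones as combining marks: no mark, grave, acute, hook above, tilde,
# dot below.
TONES = ["", "\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]

# NFC composes each base + tone pair into one precomposed character.
lower = {unicodedata.normalize("NFC", b + t) for b in BASES for t in TONES}
upper = {c.upper() for c in lower}

print(len(lower), len(upper))                   # 72 72
print(all(len(c) == 1 for c in lower | upper))  # True
```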



John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
It's like if you meet a really old, really rich guy covered in liver
spots and breathing with an oxygen tank, and you say, "I want to be
rich, too, so I'm going to start walking with a cane and I'm going to
act crotchety and I'm going to get liver disease." --Wil Shipley