[spec] IRIs, IDNs, and all that international jazz

🗣️ From: spinner (gemini (a) stillspinning.cc)
📅 Sent: 2020-12-25 09:02
📧 Message 57 of 109
> I briefly experimented with percent-encoded Japanese and Norwegian
> addresses on some of my capsules, but quickly gave up and went back to pure
> ASCII.
> *Not* because typing in percent-encoded names was annoying, but because I
> realized how hard it was to verbally convey my Japanese addresses to my
> Norwegian friends and vice versa. The de facto universality of ASCII might
> be something to embrace, not something to run away from, if we want to be
> serious about being inclusive.

Verbally conveying addresses doesn't seem like a situation worth optimizing
for; it doesn't seem to happen very often, at least in my life as a
Japanese-speaking internet user. Even when it does happen among future
gemininauts, I'd conjecture that most of the time both parties will speak
Japanese and the address can be quickly spelled out in Japanese.

For end users, reading, following, and writing links will probably be the
most common ways of interacting with URLs.

1. Reading/following links that have a user-friendly name/title: if the URL
is non-ASCII, its encoding may not matter much, since it will be hidden. If
the client can show the URL on focus or similar, showing it in Unicode is
far more accessible than percent-encoding.
2. Reading/following links shown as bare URLs: if the URL is non-ASCII, it is
more accessible to be able to read it in its non-ASCII form.
3. Writing links to URLs that I control: it's more inclusive and convenient
to be able to use and write URLs in the script I'm used to.
4. Writing links to URLs that I don't control: it's more accessible and
convenient to be able to write the URL in non-ASCII characters. Copying a
non-ASCII URL from a web browser's address bar will probably percent-encode
it (I just tried it on desktop Chrome), but I shouldn't have to rely on such
tools.
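To make items 1 and 2 concrete, here is a minimal Python sketch of turning a percent-encoded URL back into readable Unicode for display. The function name `display_form` is hypothetical, not taken from any existing client. It decodes escapes in the path, query and fragment as UTF-8 and leaves the URL unchanged if they aren't valid UTF-8. It is simpler than the URI-to-IRI conversion in RFC 3987 section 3.2: it also decodes escaped reserved characters such as %2F, so the result is only meant for showing to people, never for sending requests.

```python
from urllib.parse import urlsplit, urlunsplit, unquote


def display_form(url: str) -> str:
    """Return a human-readable version of a URL, for display only.

    Hypothetical helper: decodes percent-escapes in path, query and
    fragment as UTF-8. If any component is not valid UTF-8 once decoded,
    the URL is returned unchanged. Escaped reserved characters (e.g. %2F)
    are decoded too, so never send the result as a request.
    """
    parts = urlsplit(url)
    try:
        path = unquote(parts.path, errors="strict")
        query = unquote(parts.query, errors="strict")
        fragment = unquote(parts.fragment, errors="strict")
    except UnicodeDecodeError:
        # Not UTF-8 underneath: showing the raw escapes is the safer choice.
        return url
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(display_form("gemini://example.org/%E6%97%A5%E6%9C%AC%E8%AA%9E"))
```

A client would keep the original percent-encoded string for the actual request and use this form only in a status bar or tooltip.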

While embracing ASCII may work when we control the URLs we read and write,
it falls short on accessibility when linking to, say, Wikipedia, which uses
non-ASCII page names.

If the aim is to support i18n/inclusivity as a principle or ideal, fully and
without compromise, adopting standards such as IRI/IDN(/ASCII) may make
sense; if the motivation is practical (whether people will find themselves
reading and writing non-ASCII URLs a lot, and we want to make their lives
easier in that case), having clients percent-encode path components before
sending requests may be enough for now?

From my standpoint, here is how likely each component of a URL is to contain
non-ASCII characters:

- protocol: none
- domain: about 2% of the time (8.3 million IDNs [1] out of 370.7 million
domain names in total [2]), though for me it's nearly never in practice; I
suppose it depends on the person
- path/query/fragment: fairly often, since I use (Japanese) Wikipedia a lot
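For the domain case, IDNs travel through DNS as ASCII "xn--" labels (Punycode), so a client needs to convert between the two forms: ASCII for the connection, Unicode for display. Below is a minimal Python sketch using the standard library's built-in `idna` codec. Note that this codec implements the older IDNA 2003 rules (RFC 3490), not IDNA 2008, and the helper names are hypothetical.

```python
def host_to_ascii(host: str) -> str:
    """Convert a possibly non-ASCII hostname to its ASCII (xn--) form.

    Uses Python's built-in "idna" codec (IDNA 2003 semantics).
    """
    if host.isascii():
        return host  # nothing to convert
    return host.encode("idna").decode("ascii")


def host_to_unicode(host: str) -> str:
    """Convert an ASCII hostname with xn-- labels back to Unicode for display."""
    return host.encode("ascii").decode("idna")


ascii_host = host_to_ascii("日本語.jp")
print(ascii_host)                   # an "xn--..." label followed by ".jp"
print(host_to_unicode(ascii_host))  # 日本語.jp
```

Each label is converted separately, so an ASCII top-level domain like "jp" passes through unchanged.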

[1] https://idnworldreport.eu/ (2020 Q1)
[2] https://www.verisign.com/en_US/domain-names/dnib/index.xhtml (2020 Q3)