> I briefly experimented with percent-encoded Japanese and Norwegian addresses on some of my capsules, but quickly gave up and went back to pure ASCII. *Not* because typing in percent-encoded names was annoying, but because I realized how hard it was to verbally convey my Japanese addresses to my Norwegian friends and vice versa. The de facto universality of ASCII might be something to embrace, not something to run away from, if we want to be serious about being inclusive.

Verbally conveying addresses doesn't seem like a situation worth optimizing for; it doesn't happen very often, at least in my life as a Japanese-speaking internet user. And even on the occasions when future gemininauts do need to, I conjecture that, most of the time, both parties will speak Japanese and the address can be quickly spelled out in Japanese.

For end-users, reading, following, and writing links will probably be the most common ways of interacting with URLs:

1. Reading/following links with a user-friendly name/title: if the URL is non-ASCII, its encoding may not matter much, since it will be hidden. If the client is capable of showing the URL on focus or similar, showing it in Unicode is far more accessible than percent-encoding.

2. Reading/following links shown as a bare URL: if the URL is non-ASCII, it is more accessible to be able to read it in its non-ASCII form.

3. Writing links to URLs that I control: it is more inclusive and convenient to be able to use and write URLs in the script I'm used to.

4. Writing links to URLs that I don't control: it is more accessible and convenient to be able to write the URL in non-ASCII characters. Copying a non-ASCII URL off a web browser's address bar will probably percent-encode it (I just tried it on desktop Chrome), but I shouldn't have to rely on such tools.

While embracing ASCII may work when we have control over the URLs we read and write, it falls short in terms of accessibility when linking to, say, Wikipedia, which uses non-ASCII page names.
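To illustrate the display side of point 1 above: a client that shows the URL on focus could decode percent-encoding back to Unicode before presenting it. A minimal sketch in Python, using a hypothetical Japanese Wikipedia URL as the example:

```python
from urllib.parse import unquote

# A percent-encoded Japanese Wikipedia URL, as it might look when
# copied out of a browser's address bar (hypothetical example page).
encoded = "https://ja.wikipedia.org/wiki/%E6%9D%B1%E4%BA%AC"

# Decoding for display makes it readable for a Japanese speaker.
readable = unquote(encoded)
print(readable)  # → https://ja.wikipedia.org/wiki/東京
```

The wire form can stay percent-encoded ASCII; only what the user sees needs to be Unicode.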
If the aim is to support i18n/inclusivity as a principle/ideal, a 100% thing, adopting standards such as IRI/IDN(A) may make sense; if the motivation is practical (people will find themselves reading and writing non-ASCII URLs a lot, and we want to make their lives easier), having clients percent-encode path components before sending requests may suffice for now.

From my standpoint, the chances of a particular component of a URL containing non-ASCII characters:

- protocol: none
- domain: 2% of the time (8.3 million IDNs [1] out of 370.7 million total domain names [2]) - but, for me, nearly none in practice. I suppose it depends on the person
- path/query/fragment: fairly often, since I use (Japanese) Wikipedia a lot

[1] https://idnworldreport.eu/ (2020 Q1)
[2] https://www.verisign.com/en_US/domain-names/dnib/index.xhtml (2020 Q3)
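The practical option above (clients percent-encoding path components before sending a request) could be sketched roughly like this in Python. This is only an illustration under my assumptions, not a spec-complete implementation; in particular, a non-ASCII domain would additionally need IDNA/punycode handling, which is not shown, and the host/path names are made up:

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_for_request(url: str) -> str:
    """Percent-encode the path, query, and fragment of a URL so the
    request goes out as pure ASCII, while the user types and reads
    the URL in their own script. Keeping "%" in the safe set is a
    crude way to avoid double-encoding already-encoded input."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # NOTE: an IDN netloc would need IDNA encoding too
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))

print(encode_for_request("gemini://example.jp/東京"))
# → gemini://example.jp/%E6%9D%B1%E4%BA%AC
```

So the user-facing form stays in the script the author is used to, and only the client worries about the ASCII wire form.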
---
Previous in thread (56 of 109): 🗣️ John Cowan (cowan (a) ccil.org)