Philip Linde writes:
> On Mon, 07 Dec 2020 20:06:27 -0800
> "Emma Humphries" <ech at emmah.net> wrote:
>
>> I'm perplexed that "ease of programming" is considered more important than "ease of adoption."
>
> Consider "ease of programming" and in particular stability a subset of
> "ease of adoption". There are numerous client and server
> implementations because it is easy to implement, and easy to maintain
> because the protocol is relatively stable even in these early stages.
> The different software allows people with different goals to adopt the
> protocol, and helps in weeding out shortcomings of clarity in the
> specification by analysis of the subtle differences between
> implementations.
>
>> You mention that not every language supports the libraries needed for internationalized URLs.
>>
>> What does that lose the project vs. accessibility and broader adoption by non-English-speaking users for who Gemini would be a boon with limited bandwidth and hardware?
>
> It seems more likely that a change to this end would hurt adoption.
> Numerous pieces of existing Gemini software would immediately be
> invalidated. Not all of them will be updated to accommodate the change.
> I could perhaps see a more pressing need for the change if internet
> users worldwide weren't already used to transliteration. It's such a
> small part as well. UTF-8 is acceptable (and default) in text/gemini
> documents, and the text content of a capsule can indeed be written in
> any of the scripts supported by Unicode.

Hey, I'm new to this list, and a new Gemini user, but this topic is fairly important to me. It's discouraging to see a lot of fear-mongering around this topic already.

Some points that have come up a few times already in this thread as well as the IDN thread that I think are worth addressing:

1. Homograph attacks

Stephane has already mentioned in a different response that homograph attacks are fairly rare.
I don't have the knowledge to say whether or not that's accurate, but I can speak to how they're mitigated. In general, browsers will display a domain in Unicode form in the URI bar only if all of the characters in each label belong to the same script; otherwise they fall back to showing the Punycode (xn--) form. As an example, https://?pple.com (where the first letter is a lookalike from another script) will not render as Unicode in Firefox's URI bar, but https://?????.com/ (written entirely in one script) will render correctly (neither domain exists, if you want to check).

The other half of this comes down to domain registrars not allowing registration of homograph domains (depending on the TLD, of course).

What this comes down to is that Gemini clients that want to mitigate this type of attack should apply the same algorithm as web browsers. In any case, given the preference for client certs for authenticating sessions, it doesn't seem like this attack would have dire consequences anyway.

I also think I saw someone mention that they're worried about homographs on the IRI side (i.e. in the path) as well? That attack doesn't seem like a realistic case: if an IRI directs you to a different page on the same server, you're, well, still on the same server. It only becomes problematic in the case of shared hosting of untrusted tenants.

2. Normalization

There's been a bit of fear-mongering about normalization, which I can totally understand, since the Unicode technical reports and the four normalization forms look intimidating at first glance. However, as pointed out in a few RFCs, NFC is more or less the only normalization form that you need to worry about in *most* circumstances. Typed URIs should be normalized to NFC, on both the server side and the client side. When resolving paths against the filesystem, the filename should also be normalized to NFC (this all assumes that your filesystem supports Unicode paths).

NFKC becomes more relevant if you want to implement something like search or find-in-page.
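To make points 1 and 2 a bit more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not how any particular Gemini client or server works: `script_of` approximates a character's script from the first word of its Unicode name (real browsers use the Unicode Script property plus extra rules, e.g. for mixed Japanese text), `resolve_path` and the `/srv/gemini` root are hypothetical, and `search_key` only approximates the NFKC_Casefold mapping.

```python
import unicodedata
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# --- Point 1: homograph mitigation (crude sketch) ---
# Assumption: a character's script is approximated by the first word of its
# Unicode name ("LATIN", "CYRILLIC", "GREEK", ...). Not the real Script property.
NEUTRAL = set("0123456789-")  # digits and hyphen may appear in any label


def script_of(ch: str) -> str:
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def is_single_script(label: str) -> bool:
    scripts = {script_of(ch) for ch in label if ch not in NEUTRAL}
    return len(scripts) <= 1


def display_host(host: str) -> str:
    """Show the host in Unicode only if every label uses a single script;
    otherwise fall back to the Punycode (xn--) form."""
    if all(is_single_script(label) for label in host.split(".")):
        return host
    # Python's built-in "idna" codec implements IDNA 2003, not IDNA 2008.
    return host.encode("idna").decode("ascii")


# --- Point 2: NFC normalization of request paths ---
# Assumption: a hypothetical server serving files from ROOT, with filenames
# stored on disk in NFC.
ROOT = "/srv/gemini"


def resolve_path(url: str, root: str = ROOT) -> str:
    # Percent-decode the path as UTF-8, then normalize it to NFC so that
    # precomposed and decomposed spellings map to the same file.
    raw = unquote(urlsplit(url).path, encoding="utf-8", errors="strict")
    path = unicodedata.normalize("NFC", raw)
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    if ".." in parts:
        raise ValueError("path traversal rejected")
    return str(PurePosixPath(root, *parts))


# --- NFKC for search ---
def search_key(text: str) -> str:
    # NFKC followed by casefold(): close to, but not identical to,
    # Unicode's NFKC_Casefold mapping.
    return unicodedata.normalize("NFKC", text).casefold()
```

With this sketch, `display_host("apple.com")` returns the name unchanged, while a mixed-script label such as `"\u0430pple.com"` (a Cyrillic а followed by Latin letters) comes back in its xn-- form. On the server side, `caf%C3%A9.gmi` (precomposed é) and `cafe%CC%81.gmi` (e plus a combining accent) resolve to the same NFC filename, and `search_key` maps a fullwidth "Ｅ" to a plain "e".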
For example, you may want a user to be able to type something like 'e' and have their search match everything whose NFKC_Casefold form is 'e' (see the full set here: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3ANFKC_Casefold%3De%3A%5D&g=&i=).

3. Language support

Normalization is generally supported pretty easily across different languages.

Python has it in its stdlib: https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
Go has support: https://pkg.go.dev/golang.org/x/text/unicode/norm
Rust: https://unicode-rs.github.io/unicode-normalization/unicode_normalization/index.html
C gets its support through the venerable libicu library (you're already using libs for TLS): https://unicode-org.github.io/icu/userguide/transforms/normalization/

I will say that I don't know of any dedicated IRI-handling libraries, nor do I know the state of IRI support in different URI-handling libraries, but it's something I'll play with as I work on Gemini projects. I'm happy to share my experiences when I have more of them. :)

-

To address some non-technical points: I don't think that starting a new protocol and then deciding to ignore internationalization is necessarily the right way to go. In a lot of cases, internationalization sucks because of legacy support, and Gemini doesn't *have* legacy software to preserve compatibility with. As I understand it, that's why TLS is mandatory, even though it arguably locks out some retro systems from being able to use it.

Personally, I'd like to see the spec say something about how this is handled before any type of freeze takes place.

--
worr