On Mon, 28 Dec 2020 12:59:12 +0100
Solene Rapenne <solene at perso.pw> wrote:

> I don't understand what you mean by normalizing the request.
> For the hostname, I see no reason to write "écrire.hostname" as
> "e'crire.hostname" if it what you mean.

There are sometimes multiple ways to represent the same character in
Unicode. For example, "é" in composed form is the single code point
U+00E9, while in decomposed form it is the sequence U+0065, U+0301.
The two are visually indistinguishable and have the same meaning in
Unicode, but once encoded as UTF-8 or UTF-32, a byte-by-byte or even
code point-by-code point comparison will not show that they are
equivalent.

The Unicode Consortium therefore defines a process called
normalization, which either "fully composes" characters (turning
sequences like U+0065, U+0301 into their composed form, U+00E9; this
is the NFC form) or "fully decomposes" them (the other way around;
the NFD form). To support this you need a database that the Unicode
Consortium distributes. If you are using GLib (or are ready to use
GLib), it has a function for this, g_utf8_normalize().

> What I see as an issue would be people using puny code if we go
> using IRI. That would mean the server will have to check the puny
> code of the hostname to check to a request using the punycode.
>
> A library will certainly be required for that.

Again, if you're willing to use GLib, it has functions for this:
g_hostname_to_ascii() and g_hostname_to_unicode(), declared in
glib/ghostutils.h (included by glib.h) rather than glib/gi18n.h.

-- 
Philip
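
The comparison problem described above can be sketched with Python's
standard library, used here only as a stand-in for the GLib functions.
The helper names same_text() and to_ascii_hostname() and the host
"écrire.example" are made up for illustration, and Python's built-in
"idna" codec implements the older IDNA 2003 rules, so treat this as a
rough sketch rather than what a Gemini server should do.

```python
import unicodedata

# Two spellings of the same hostname (hypothetical example host).
composed = "\u00e9crire.example"     # "é" as the single code point U+00E9
decomposed = "e\u0301crire.example"  # "e" (U+0065) followed by U+0301


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def to_ascii_hostname(host):
    """Normalize to NFC, then convert each label to its punycode form.

    Uses Python's built-in "idna" codec (IDNA 2003), which is an
    assumption of this sketch, not something the protocol mandates.
    """
    return unicodedata.normalize("NFC", host).encode("idna").decode("ascii")


if __name__ == "__main__":
    print(composed == decomposed)           # raw code point comparison
    print(same_text(composed, decomposed))  # comparison after normalization
    print(to_ascii_hostname(decomposed))    # ASCII form for matching requests
```

Normalizing both sides before comparing, and comparing hostnames in
their ASCII form, means a server does not need to care which spelling
the client sent.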