What is required to be IRI compliant?

🗣️ From: Philip Linde (linde.philip (a) gmail.com)
📅 Sent: 2020-12-28 12:24
📧 Message 9 of 16

On Mon, 28 Dec 2020 12:59:12 +0100
Solene Rapenne <solene at perso.pw> wrote:

> I don't understand what you mean by normalizing the request.
> For the hostname, I see no reason to write "écrire.hostname" as
> "e'crire.hostname" if it what you mean.

There are sometimes multiple ways to represent the same characters in
Unicode. For example, "é" in the composed form is just U+00E9, while in
the decomposed form it's U+0065, U+0301. The two are visually
indistinguishable and carry the same meaning in Unicode, but when they
are encoded as UTF-8 or UTF-32, a byte-by-byte or even code
point-by-code point comparison will not show them to be equal.
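To make the comparison problem concrete, here is a small Python sketch (Python is only an illustration choice; the thread itself talks about GLib). It builds both spellings of "é" and compares them as code points and as UTF-8 bytes:

```python
# Two spellings of "é" that render identically.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# Code point comparison: different lengths, so they can never be equal.
print(len(composed), len(decomposed))   # 1 2
print(composed == decomposed)           # False

# Comparing the UTF-8 encodings byte by byte fails as well.
print(composed.encode("utf-8"))         # b'\xc3\xa9'
print(decomposed.encode("utf-8"))       # b'e\xcc\x81'
```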

Therefore the Unicode Consortium defines a process called normalization
to either "fully compose" characters (turn sequences like U+0065,
U+0301 into their composed form, U+00E9) or "fully decompose" them (the
reverse). To support this you'd need the character data that the
Unicode Consortium distributes (the Unicode Character Database).
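A minimal sketch of normalization in Python, whose standard unicodedata module bundles the Unicode Character Database, so no separate download is needed there. The same_text() helper is hypothetical, written only for illustration; it compares two strings after bringing both to NFC (fully composed) form:

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "\u00e9"
decomposed = "e\u0301"

# NFC composes U+0065 U+0301 into U+00E9; NFD splits it back apart.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
print(same_text("\u00e9crire", "e\u0301crire"))               # True
```

Which form you pick matters less than applying the same one to both sides before comparing.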

If you are using GLib (or are willing to use it), it has functions for
this, such as g_utf8_normalize().

> What I see as an issue would be people using puny code if we go
> using IRI. That would mean the server will have to check the puny
> code of the hostname to check to a request using the punycode.
> 
> A library will certainly be required for that.

Again, if you're willing to use GLib, it has functions for this
(g_hostname_to_ascii() and g_hostname_to_unicode()) in
glib/ghostutils.h, which glib.h includes.
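For the hostname side, here is a sketch using Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490) rather than the newer IDNA 2008 rules. The host_to_ascii() helper is hypothetical. The codec's nameprep step applies NFKC normalization, so the composed and decomposed spellings of "écrire.hostname" map to the same ASCII form, which a server could then compare against the Punycode host in a request:

```python
def host_to_ascii(host: str) -> str:
    """Hypothetical helper: return the ASCII ("xn--" Punycode) form of a host.

    Relies on Python's built-in "idna" codec (IDNA 2003), whose nameprep
    step NFKC-normalizes each label before Punycode encoding.
    """
    return host.encode("idna").decode("ascii")

composed_host = "\u00e9crire.hostname"
decomposed_host = "e\u0301crire.hostname"

ascii_host = host_to_ascii(composed_host)
print(ascii_host)                                    # xn--crire-9ra.hostname
print(ascii_host == host_to_ascii(decomposed_host))  # True: both normalized first
print(ascii_host.encode("ascii").decode("idna"))     # écrire.hostname
```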

-- 
Philip
