On Sat Dec 26, 2020 at 1:32 AM CET, Omar Polo wrote:

> In the last two days I took the time to write first a proper URL
> parser[0], and then extending it to support IRIs[1]. Turns out, once
> you have a URL parser (not hard to do at all), you almost have a
> complete IRI parser. As Sean wrote, you basically have to replace the
> unreserved rule to allow other utf8 characters and you're done. And
> even if you're uncomfortable doing this, the RFC lists the valid ranges,
> so adding a couple of checks isn't the end of the world (if you want to
> be 100% compliant, whatever that means).
>
> (And all of this comes from one that has never, ever, implemented an
> IRI/URI parser before, that has read for the first time the rfc3986
> while writing the code and has successfully -- I believe -- implemented
> a full IRI parser in less than 500 lines of C, with comments and
> everything, without using anything other than the standard library.
> Heck, the parser doesn't even allocate memory.)

This is, more and more, how I'm conceptualising things. Parsing/validating IRIs is not remotely difficult. Algorithmically it's an extremely minor change to parsing/validating URIs. The apparent pain exists only because the world has been very slow about packaging code for this into major libraries/languages, probably because HTTP's ASCII-only nature reduces demand.

If we adopt IRIs, I would actually encourage Gemini software authors who find their language lacking tools for this not to write custom code for it that lives only in their software, but to actually try to get the functionality accepted upstream into standard libraries, or widely used third-party libraries. This is generally useful functionality that's in no way Gemini-specific, and having easy support for it everywhere makes the world a better place regardless of whether Gemini thrives or declines.
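To make the "extremely minor change" concrete, here is a minimal Python sketch of the one rule that differs between the two grammars. It is not taken from Omar's C parser; the names `is_unreserved`, `is_iunreserved` and `UCSCHAR_RANGES` are hypothetical. The ASCII set is the `unreserved` rule of RFC 3986 and the code point ranges are the `ucschar` rule of RFC 3987.

```python
import string

# "unreserved" from RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

# "ucschar" from RFC 3987, section 2.2, as inclusive code point ranges.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 to 13 each contribute U+x0000 through U+xFFFD.
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
# Plane 14 is only allowed from U+E1000 upward.
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))


def is_unreserved(ch):
    """URI rule: ASCII letters, digits and four punctuation marks."""
    return ch in UNRESERVED


def is_iunreserved(ch):
    """IRI rule: the URI rule plus the ucschar ranges."""
    if ch in UNRESERVED:
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)
```

A parser written against `is_unreserved` would, under this sketch, become an IRI parser by calling `is_iunreserved` in the same places. The `iprivate` ranges that RFC 3987 allows only in the query component are left out here.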
I don't really think the alleged difficulty of handling IRIs is a good argument against accepting them. I'm now more interested in learning/thinking about normalisation issues, which have been relatively under-discussed so far. It's possible this is where the real trouble lies. Breaking a UTF-8 IRI up into (scheme, authority, path) is not a substantial hurdle.

Cheers,
Solderpunk
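As an illustration of the contrast drawn in the message above, the following Python sketch splits an IRI with the generic regular expression from RFC 3986, Appendix B, and then shows one normalisation pitfall. The helper name `split_iri` and the `example.org` URLs are hypothetical, and NFC is used only as an assumed candidate normalisation form, not as something the thread has settled on. The regex splits components; it does not validate them.

```python
import re
import unicodedata

# Component-splitting regex from RFC 3986, Appendix B. It never fails to
# match, and it works on Unicode strings without any change.
_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_iri(iri):
    """Return (scheme, authority, path); absent parts are None."""
    m = _SPLIT.match(iri)
    return m.group(2), m.group(4), m.group(5)


# Two spellings that render identically but differ in code points.
a = "gemini://example.org/caf\u00e9"    # precomposed U+00E9
b = "gemini://example.org/cafe\u0301"   # "e" followed by combining U+0301

print(split_iri(a))
print(a == b)
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))
```

Splitting succeeds for both spellings, yet the two strings compare unequal until they are normalised. A client and a server that disagree on whether, when, or how to normalise could treat what looks like the same address as two different resources.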