[spec] IRIs, IDNs, and all that international jazz



>What I'd be most interested in hearing, at this point, is client 
>authors letting me know whether the standard library in the language 
>their client is implemented in can straightforwardly:
>
>1. Parse and relativise URLs with non-ASCII characters (so, yes, okay,
>   technically not URLs at all, you know what I mean) in paths and/or
>   domains?
>2. Transform back and forth between URIs and IRIs?
>3. Do DNS lookups of IDNs without them being punycoded first?  You can
>   test this with räksmörgås.josefsson.org.

I've looked into the situation in Haskell. It isn't nearly as good as 
I'd expected. The standard URI library, 'network-uri', is strictly 
RFC 3986. There is an 'iri' library, but it isn't widely used and 
doesn't seem to be actively maintained: I can't even get it to install 
with a recent ghc (ghc-8.8.4). It only deals with parsing and rendering; 
afaict there's no normalisation or "absolutising", nor anything for 
transforming between URIs and IRIs.

As for question 3, the answer appears to be no. In ghci:
> :set -package network
package flags have changed, resetting and loading new packages...
> import Network.Socket
> getAddrInfo (Just $ defaultHints {addrSocketType = Stream}) (Just "räksmörgås.josefsson.org") (Just "1965")

type/protocol: AddrInfo {addrFlags = [], addrFamily = AF_UNSPEC, 
addrSocketType = Stream, addrProtocol = 0, addrAddress = 0.0.0.0:0, 
addrCanonName = Nothing}, host name: Just 
"r\228ksm\246rg\229s.josefsson.org", service name: Just "1965"): does not 
exist (Name or service not known)

So library support isn't perfect. However, converting between 
UTF-8-encoded IRIs and URIs seems pretty trivial to implement by hand 
(Step 2 in section 3.1 of the RFC, and its inverse), and there are 
punycode implementations in standard Haskell libraries (e.g. in the 
'encoding' package), so I am not at all scared by option 3. I'd just 
convert IRIs to URIs for internal use and manipulation, convert back 
when displaying, and punycode the hostname when making requests. I may 
well be being naive here; if so, someone please explain the subtleties 
(or tell me to read the existing threads on this more carefully)!
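
To make the "trivial by hand" claim a bit more concrete, here is a minimal 
sketch of the IRI-to-URI direction (Step 2 of section 3.1, which I take to 
be RFC 3987), using only base. The names iriToUri, utf8Bytes and 
percentByte are made up for this example. It simplifies by percent-encoding 
every non-ASCII character rather than checking the ucschar/iprivate ranges, 
and it treats the host like any other part of the string, whereas for an 
actual request the host would be punycoded instead. The inverse direction 
is not shown.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (intToDigit, isAscii, ord, toUpper)

-- UTF-8 encode a single character into its bytes (as Ints in 0..255).
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18)
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Render one byte as %HH with uppercase hex digits.
percentByte :: Int -> String
percentByte b = ['%', hex (b `shiftR` 4), hex (b .&. 0xF)]
  where
    hex = toUpper . intToDigit

-- Hypothetical helper (an assumption of this sketch, not a library
-- function): percent-encode every non-ASCII character as UTF-8 and
-- leave ASCII characters, including existing %HH escapes, untouched.
iriToUri :: String -> String
iriToUri = concatMap enc
  where
    enc c
      | isAscii c = [c]
      | otherwise = concatMap percentByte (utf8Bytes c)
```

For example, iriToUri "gemini://example.org/räksmörgås" gives 
"gemini://example.org/r%C3%A4ksm%C3%B6rg%C3%A5s".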