[spec] IRIs, IDNs, and all that international jazz
- 🗣️ From: mbays (a) sdf.org (mbays (a) sdf.org)
- 📅 Sent: 2020-12-23 14:00
- 📧 Message 35 of 109
- Tuesday, 2020-12-22 at 16:13 +0100 - Solderpunk <solderpunk at posteo.net>:
>What I'd be most interested in hearing, at this point, is client
>authors letting me know whether the standard library in the language
>their client is implemented in can straightforwardly:
>
>1. Parse and relativise URLs with non-ASCII characters (so, yes, okay,
> technically not URLs at all, you know what I mean) in paths and/or
> domains?
>2. Transform back and forth between URIs and IRIs?
>3. Do DNS lookups of IDNs without them being punycoded first? You can
> test this with räksmörgås.josefsson.org.
I've looked into the situation in Haskell. It isn't nearly as good as
I'd expected. The standard URI library 'network-uri' is strictly RFC 3986.
There is an 'iri' library, but it isn't widely used and doesn't seem to
be very actively maintained: I can't even get it to install with a recent
ghc (ghc-8.8.4). It only deals with parsing and rendering; afaict
there's no normalisation or "absolutising", nor anything for transforming
between URIs and IRIs.
As for question 3, the answer appears to be no. In ghci:
> :set -package network
package flags have changed, resetting and loading new packages...
> import Network.Socket
> getAddrInfo (Just $ defaultHints {addrSocketType = Stream}) (Just "räksmörgås.josefsson.org") (Just "1965")
*** Exception: Network.Socket.getAddrInfo (called with preferred socket
type/protocol: AddrInfo {addrFlags = [], addrFamily = AF_UNSPEC,
addrSocketType = Stream, addrProtocol = 0, addrAddress = 0.0.0.0:0,
addrCanonName = Nothing}, host name: Just
"r\228ksm\246rg\229s.josefsson.org", service name: Just "1965"): does not
exist (Name or service not known)
So library support isn't perfect. However, converting between
UTF-8-encoded IRIs and URIs seems pretty trivial to implement by hand
(Step 2 in section 3.1 of RFC 3987, and its inverse), and there are
punycode implementations in standard Haskell libraries (e.g. in the
'encoding' package), so I am not at all scared by option 3. I'd just
convert IRIs to URIs for internal use and manipulation, then convert
back when displaying, and punycode when making requests. I'm not sure
I'm not being naive here -- someone please explain the subtleties (or
tell me to read the existing threads on this more carefully) if so!
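
To make the "by hand" conversion concrete, here is a minimal sketch of the
IRI-to-URI direction (percent-encoding the UTF-8 bytes of every non-ASCII
character), using only 'base'. The names utf8Bytes, percentByte and iriToUri
are made up for illustration and don't come from any existing library. It
assumes the input is a valid String of Unicode code points, and it treats
the whole IRI uniformly: it does not parse out the host, so punycoding the
domain for the actual request would still be a separate step.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (intToDigit, isAscii, ord, toUpper)

-- | UTF-8 encode one code point into its bytes (each an Int in 0..255).
-- Hypothetical helper; assumes the Char is a valid Unicode scalar value.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18)
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- | Render one byte as "%XX" with uppercase hex digits.
percentByte :: Int -> String
percentByte b = ['%', hex (b `shiftR` 4), hex (b .&. 0xF)]
  where
    hex = toUpper . intToDigit

-- | Percent-encode every non-ASCII character; ASCII passes through
-- untouched, so existing percent-escapes and delimiters are preserved.
iriToUri :: String -> String
iriToUri = concatMap enc
  where
    enc c
      | isAscii c = [c]
      | otherwise = concatMap percentByte (utf8Bytes c)
```

With this loaded in ghci, iriToUri "gemini://example.org/r\228ksm\246rg\229s"
gives "gemini://example.org/r%C3%A4ksm%C3%B6rg%C3%A5s". The inverse direction
is not shown; it needs more care, since only escapes that decode to valid
UTF-8 for non-ASCII characters should be turned back.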