Some reading on IRIs and IDNs

It was thus said that the Great Michael Lazar once stated:
> I've been following along with my own software in the background.

  Thank you.  Without an implementation it is difficult to see where the
landmines are.  So, with that said ...

> First of all, my domain registrar won't even let me put unicode characters
> in an A record without automatically converting them to punycode for me.
> 
> café.mozz.us -> xn--caf-dma.mozz.us

  Okay.
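
  For reference, the conversion the registrar performs can be sketched in
Python with the standard library's "idna" codec.  This is only an
illustration, assuming the codec's IDNA 2003 rules are acceptable; a
registrar or another client may follow newer rules.

```python
# Convert a Unicode hostname to its ASCII (punycode) form and back,
# using the "idna" codec that ships with Python (IDNA 2003 rules).
unicode_host = "caf\u00e9.mozz.us"   # "caf" + e-acute, written as an escape

ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)                    # xn--caf-dma.mozz.us

round_trip = ascii_host.encode("ascii").decode("idna")
print(round_trip == unicode_host)    # True
```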

> Next, my naive python test client just kind of works as-is [0][1]. It will
> convert unicode DNS names to punycode under the hood before doing the lookup.
> Any unicode in the URL (IRI?) is left alone because.. why would a client
> ever muck around with the URL that the user gives them? That sounds like a
> bad idea to me.

  That's debatable.  Percent-encoding doesn't change the meaning, just the
"envelope," so to speak.
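
  A small sketch of that point, using Python's urllib.parse.  The path
here is a made-up example, not one from the thread; the only claim is that
encoding and then decoding gives back the same characters.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing one non-ASCII character (e-acute).
path = "/files/caf\u00e9.txt"

encoded = quote(path)        # UTF-8 bytes become %XX escapes; "/" is kept
decoded = unquote(encoded)   # the escapes turn back into the same text

print(encoded)               # /files/caf%C3%A9.txt
print(decoded == path)       # True
```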

> My server (running jetforce) also works as-is. All I had to do was add an entry
> for "café.mozz.us" as a recognized hostname, and there you go.

  Okay, about that.  I modified my own stupid-simple client to support IRIs
and to convert the hostname via punycode (finally!).  The code changes in
the client weren't that large (once I got the punycode module written, it
was one line to switch from URI parsing to IRI parsing, one line to add the
punycode module, and one line modified to punycode the host when making a
connection) but I'm encountering an issue.  If I use:

	gemini://café.mozz.us/files/𝒻𝒶𝓃𝒸𝓎.txt

(and send that as the request) it works, and I get the file.  But when I go
to:

	gemini://xn--caf-dma.mozz.us/files/%F0%9D%92%BB%F0%9D%92%B6%F0%9D%93%83%F0%9D%92%B8%F0%9D%93%8E.txt

(and send that as the request) I get an error 53 (no proxy allowed).  When I
go to:

	gemini://café.mozz.us/files/%F0%9D%92%BB%F0%9D%92%B6%F0%9D%93%83%F0%9D%92%B8%F0%9D%93%8E.txt

(and send that as the request) it works as well.  I would expect the second
example to work just like the first and third, since all three reference
the same resource on the same server.
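
  One way to see that the three requests are equivalent is to reduce each
to a single canonical form: host in punycode, path percent-encoded.  The
sketch below is not taken from any of the software mentioned; canonical()
is a hypothetical helper, it ignores ports and userinfo, and it decodes
and re-encodes the whole path, which would be wrong for a path containing
an encoded "/" (%2F).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def canonical(url):
    """Hypothetical helper: punycode host plus percent-encoded path."""
    parts = urlsplit(url)
    host = parts.hostname.encode("idna").decode("ascii")
    path = quote(unquote(parts.path), safe="/")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))

# The file name, rebuilt from the percent-encoded form quoted in the thread.
encoded_name = "%F0%9D%92%BB%F0%9D%92%B6%F0%9D%93%83%F0%9D%92%B8%F0%9D%93%8E"
unicode_name = unquote(encoded_name)

first = "gemini://caf\u00e9.mozz.us/files/" + unicode_name + ".txt"
second = "gemini://xn--caf-dma.mozz.us/files/" + encoded_name + ".txt"
third = "gemini://caf\u00e9.mozz.us/files/" + encoded_name + ".txt"

# All three collapse to the second (all-ASCII) form.
print(canonical(first) == canonical(second) == canonical(third))   # True
```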

  Another issue that I've thought of is the length of each request: the first
is 53 bytes, the second is 99 bytes, and the third is 93 bytes.  This *could*
be an issue with respect to the overall limit of 1024 bytes for a
request.
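
  Those sizes can be checked by counting the UTF-8 bytes of each request.
The sketch assumes the request is sent as UTF-8 and leaves out the
trailing CRLF; the Unicode file name is rebuilt from its percent-encoded
form.

```python
from urllib.parse import unquote

encoded_name = "%F0%9D%92%BB%F0%9D%92%B6%F0%9D%93%83%F0%9D%92%B8%F0%9D%93%8E"
unicode_name = unquote(encoded_name)   # five 4-byte characters

requests = [
    "gemini://caf\u00e9.mozz.us/files/" + unicode_name + ".txt",
    "gemini://xn--caf-dma.mozz.us/files/" + encoded_name + ".txt",
    "gemini://caf\u00e9.mozz.us/files/" + encoded_name + ".txt",
]

# Byte length on the wire, not the number of characters.
sizes = [len(r.encode("utf-8")) for r in requests]
print(sizes)   # [53, 99, 93]
```

  Percent-encoding triples the size of every encoded byte, so a path full of
4-byte characters grows from 4 to 12 bytes per character.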

  As far as servers go, GLV-1.12556 still uses the URL parser, and would
choke on an IRI being given as a request (since it expects non-ASCII
characters to be encoded per RFC-3986).  That would be an easy fix for me
(just switch to the IRI parser) but allowing IRIs would be an actual change
to the protocol.  I'm just saying.

> Does this mean my server is already compliant? What else should I try?

  Perhaps allow "xn--caf-dma.mozz.us" as a hostname?
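
  A server could accept both spellings by comparing hostnames in their
ASCII form only.  This is a hypothetical sketch, not how jetforce is
implemented; ALLOWED_HOSTS, to_ascii() and is_local() are made-up names.

```python
# Assumed configuration: hostnames stored in punycode form only.
ALLOWED_HOSTS = {"xn--caf-dma.mozz.us"}

def to_ascii(host):
    """Lower-case the host and convert any Unicode labels to punycode."""
    return host.lower().encode("idna").decode("ascii")

def is_local(host):
    """True if the requested host, in either spelling, is served here."""
    return to_ascii(host) in ALLOWED_HOSTS

print(is_local("caf\u00e9.mozz.us"))     # True
print(is_local("xn--caf-dma.mozz.us"))   # True
print(is_local("example.org"))           # False
```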

  -spc

> [0] https://github.com/michael-lazar/jetforce/blob/master/jetforce_client.py
> [1] It's nice to finally get a win for python after fighting with TLS
> for so long
