Some reading on IRIs and IDNs
- 🗣️ From: Stephane Bortzmeyer (stephane (a) sources.org)
- 📅 Sent: 2020-12-09 08:38
- 📧 Message 4 of 62
On Wed, Dec 09, 2020 at 12:26:51AM -0500,
Sean Conner <sean at conman.org> wrote
a message of 73 lines which said:
> DNS *can* support UTF-8, but such support isn't wide, nor is it a
> standard.
Wrong. RFC 2181, which clarifies that "any binary string whatever can
be used as the label of any resource record", is part of the Standards
Track. The reasons why few people use UTF-8 in domain names are:
- Ignorance and lack of care for internationalization,
- Non-DNS reasons such as the fact that existing libraries and applications
may react badly to non-ASCII domain names,
- As often, the problem of normalization. DNS has normalization of
names (the case-insensitivity rule) but it does not extend to
Unicode.
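
To illustrate the normalization point, here is a minimal sketch in Python. The helper names wire_label and ascii_casefold are made up for this example, and "café" is just a sample label. It encodes a label the way it would appear on the wire (a length byte followed by the raw bytes) and shows that ASCII case folding makes "CAFE" and "cafe" equal, while two Unicode spellings of the same visible string stay different:

```python
import unicodedata

def wire_label(label: str) -> bytes:
    """Encode one label as a length byte followed by its raw UTF-8 bytes."""
    raw = label.encode("utf-8")
    if not 1 <= len(raw) <= 63:
        raise ValueError("a DNS label must be 1 to 63 bytes long")
    return bytes([len(raw)]) + raw

def ascii_casefold(data: bytes) -> bytes:
    """Lowercase only the ASCII letters A-Z, leaving every other byte alone."""
    return bytes(b + 32 if 65 <= b <= 90 else b for b in data)

composed = unicodedata.normalize("NFC", "caf\u00e9")    # e-acute as one code point
decomposed = unicodedata.normalize("NFD", "caf\u00e9")  # "e" + combining accent

# ASCII case differences disappear under the DNS comparison rule...
print(ascii_casefold(wire_label("CAFE")) == ascii_casefold(wire_label("cafe")))

# ...but the two Unicode forms remain distinct byte strings.
print(wire_label(composed) == wire_label(decomposed))
print(wire_label(composed), wire_label(decomposed))
```

Both forms are legal labels under RFC 2181, but nothing in the DNS itself would treat them as the same name.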
> It does have an RFC (RFC-3492) and said RFC does contain code for
> encoding and decoding punycode (but it's in C, and the API is
> ... not what I would define but it can be worked with).
There is an implementation of Punycode in every standard library,
whatever your language.
> so a domain name like "??.english.s?d?r.???" is converted thusly:
In Python (but it is as simple in any other language):
>>> import codecs
>>> print(codecs.encode("??.English.s?d?r.???", encoding="idna"))
b'xn--99zt52a.English.xn--sdr-rlad.xn--wgbh1c'
(Note that the encodings.idna module of the Python standard library is
limited to IDN v1, that is, IDNA 2003.)
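
For a self-contained round trip, here is a small sketch using the same standard-library codecs. The name "bücher.example" is a made-up sample, not one from this thread:

```python
import codecs

name = "b\u00fccher.example"  # "bücher.example", a made-up sample name

# The "idna" codec handles a whole name: it splits on dots, Punycode-encodes
# each non-ASCII label and adds the "xn--" prefix.
encoded = codecs.encode(name, encoding="idna")
print(encoded)

# Decoding gives back the original Unicode name.
decoded = codecs.decode(encoded, encoding="idna")
print(decoded == name)

# The bare "punycode" codec works on a single label and adds no prefix.
print("b\u00fccher".encode("punycode"))
```

The first line printed is b'xn--bcher-kva.example', and the last one is b'bcher-kva'.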
So, almost nothing to do for the programmer. I don't agree with your
assessment that IDN is simpler than IRI.
> I'm not even sure what name should be in a certificate for an
> IDN---the full UTF-8 version, or the punycode version, or both?
> What's currently done in HTTP land about this? (answering this will
> at least point in a direction, even if we don't want to go that
> direction).
gemini://gemini.bortzmeyer.org/rfc-mirror/rfc8399.txt
But it depends on the CA. It seems Let's Encrypt does not want to
handle UTF-8 and requires Punycode.
> [3] The domain name "gemini.conman.org" has three labels, "gemini",
> "conman" and "org". The term "label" is DNS lingo.
Let's be picky: there are four, since there is also the root :-)
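
A small sketch of that count in Python, assuming the usual presentation format where labels are separated by dots and the root is the empty label at the end (the function name labels is made up for this example):

```python
def labels(name):
    """Split a domain name into its labels, with the root as a final ''."""
    stripped = name.rstrip(".")  # accept "example.org" and "example.org."
    parts = stripped.split(".") if stripped else []
    return parts + [""]          # the root label is always the empty string

result = labels("gemini.conman.org")
print(result)
print(len(result))
```

This sketch ignores escaped dots inside labels, which the presentation format also allows.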