C program to resolve UTF-8 IDNs

1. Scot (gmi1 (a) scotdoyle.com)

Hi Sean,

I think you were wondering whether it's possible to resolve UTF-8
IDNs with a C or Lua program. This Linux C program [7] will do so
via glibc's getaddrinfo(). For platforms without glibc such as the
BSDs, ports of the libidn library [1] are available.

Microsoft provides IdnToAscii() [2] for use with Windows 7 and it
looks as if they support automatic conversion to punycode [3]
in their implementation of getaddrinfo() as of Windows 8.

Slides 30 through 48 of this presentation [4] describe how Apple
approaches network name internationalization. iOS 10 and
macOS Sierra will try to query a given UTF-8 name via DNS. If
the server doesn't have an entry for the UTF-8 name, the system
converts it to punycode and tries again.

Android 2.3 has the java.net.IDN class [5] for converting
IDNs and Android 7.0 has android.icu.text.IDNA [6].

Thoughts?
Scot


[1] https://www.gnu.org/software/libidn/
[2] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-idntoascii
[3] https://docs.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfo
[4] https://devstreaming-cdn.apple.com/videos/wwdc/2016/714urluxe140lardrb7/714/714_networking_for_the_modern_internet.pdf?dl=1
[5] https://developer.android.com/reference/kotlin/java/net/IDN
[6] https://developer.android.com/reference/android/icu/text/IDNA


[7] Public domain example C program to resolve UTF-8 IDNs

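#define _GNU_SOURCE /* glibc only defines AI_IDN when this is set */
#include <locale.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

#define ADDR_LEN INET6_ADDRSTRLEN /* fits any IPv4 or IPv6 text form */

/* Resolve a (possibly UTF-8) host name and print its addresses. */
void show_ip(const char *name) {
  struct addrinfo filter, *info, *p;
  char address[ADDR_LEN];
  void *addr;
  int rc;

  memset (&filter, 0, sizeof(filter));
  filter.ai_family = AF_UNSPEC;     /* both IPv4 and IPv6 */
  filter.ai_socktype = SOCK_STREAM; /* one result per address */
  filter.ai_flags = AI_IDN;         /* convert UTF-8 labels to punycode */
  rc = getaddrinfo (name, NULL, &filter, &info);
  if (rc != 0) {
    /* getaddrinfo() reports failures through its return value, so
       gai_strerror() is needed here rather than perror() */
    fprintf (stderr, "%s: %s\n", name, gai_strerror (rc));
    return; /* info is not set on failure, so there is nothing to free */
  }
  printf ("\nHost: %s\n", name);
  /* walk the list with p so that info still points at its head */
  for (p = info; p != NULL; p = p->ai_next) {
    if (p->ai_family == AF_INET) {
      addr = &((struct sockaddr_in *) p->ai_addr)->sin_addr;
      inet_ntop (AF_INET, addr, address, ADDR_LEN);
      printf ("  IPv4 address: %s\n", address);
    } else if (p->ai_family == AF_INET6) {
      addr = &((struct sockaddr_in6 *) p->ai_addr)->sin6_addr;
      inet_ntop (AF_INET6, addr, address, ADDR_LEN);
      printf ("  IPv6 address: %s\n", address);
    }
  }
  freeaddrinfo(info);
}

int main(void) {
  setlocale(LC_ALL, ""); /* AI_IDN decodes names using the locale's charset */
  show_ip("蛸.jp");
  show_ip("xn--td2a.jp");
  show_ip("gémeaux.bortzmeyer.org");
  show_ip("xn--gmeaux-bva.bortzmeyer.org");
  show_ip("café.mozz.us");
  show_ip("xn--caf-dma.mozz.us");
  return 0;
}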


2. Sean Conner (sean (a) conman.org)

It was thus said that the Great Scot once stated:
> Hi Sean,

  Hi Scot!

> I think you were wondering whether it's possible to resolve UTF-8
> IDNs with a C or Lua program. This Linux C program [7] will do so
> via glibc's getaddrinfo(). For platforms without glibc such as the
> BSDs, ports of the libidn library [1] are available.
> 
> Microsoft provides IdnToAscii() [2] for use with Windows 7 and it
> looks as if they support automatic conversion to punycode [3]
> in their implementation of getaddrinfo() as of Windows 8.
> 
> Slides 30 through 48 of this presentation [4] describe how Apple
> approaches network name internationalization. iOS 10 and
> macOS Sierra will try to query a given UTF-8 name via DNS. If
> the server doesn't have an entry for the UTF-8 name, it
> converts to punycode and tries again.
> 
> Android 2.3 has the java.net.IDN class [5] for converting
> IDNs and Android 7.0 has android.icu.text.IDNA [6].
> 
> Thoughts?

  Thank you *so* much for the sample code, and for providing much-needed
background information about this.  This will make the final decision on IRI
support a bit easier to make.  I was not aware of the GNU-specific AI_IDN
flag to getaddrinfo() (and I wish it was standard actually), and yes, it
does work on systems that have it (Linux).

  Just one thing though---you might want to be careful when pasting code
into email---the code below contained non-breaking spaces that my C compiler
(gcc) did not like.  It's an easy fix, so this is mostly an FYI for anyone
getting weird errors when compiling the code below.

  Again, thank you for your message.

  -spc

> [1] https://www.gnu.org/software/libidn/
> [2] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-idntoascii
> [3] https://docs.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfo
> [4] https://devstreaming-cdn.apple.com/videos/wwdc/2016/714urluxe140lardrb7/714/714_networking_for_the_modern_internet.pdf?dl=1
> [5] https://developer.android.com/reference/kotlin/java/net/IDN
> [6] https://developer.android.com/reference/android/icu/text/IDNA
> [7] Public domain example C program to resolve UTF-8 IDNs

P.S.  Here's the code with the non-breaking spaces removed.

#include <locale.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

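#define ADDR_LEN INET6_ADDRSTRLEN /* fits any IPv4 or IPv6 text form */

/* Resolve a (possibly UTF-8) host name and print its addresses. */
void show_ip(const char *name) {
  struct addrinfo filter, *info, *p;
  char address[ADDR_LEN];
  void *addr;
  int rc;

  memset (&filter, 0, sizeof(filter));
  filter.ai_family = AF_UNSPEC;     /* both IPv4 and IPv6 */
  filter.ai_socktype = SOCK_STREAM; /* one result per address */
  filter.ai_flags = AI_IDN;         /* convert UTF-8 labels to punycode */
  rc = getaddrinfo (name, NULL, &filter, &info);
  if (rc != 0) {
    /* getaddrinfo() reports failures through its return value, so
       gai_strerror() is needed here rather than perror() */
    fprintf (stderr, "%s: %s\n", name, gai_strerror (rc));
    return; /* info is not set on failure, so there is nothing to free */
  }
  printf ("\nHost: %s\n", name);
  /* walk the list with p so that info still points at its head */
  for (p = info; p != NULL; p = p->ai_next) {
    if (p->ai_family == AF_INET) {
      addr = &((struct sockaddr_in *) p->ai_addr)->sin_addr;
      inet_ntop (AF_INET, addr, address, ADDR_LEN);
      printf ("  IPv4 address: %s\n", address);
    } else if (p->ai_family == AF_INET6) {
      addr = &((struct sockaddr_in6 *) p->ai_addr)->sin6_addr;
      inet_ntop (AF_INET6, addr, address, ADDR_LEN);
      printf ("  IPv6 address: %s\n", address);
    }
  }
  freeaddrinfo(info);
}

int main(void) {
  setlocale(LC_ALL, ""); /* AI_IDN decodes names using the locale's charset */
  show_ip("蛸.jp");
  show_ip("xn--td2a.jp");
  show_ip("gémeaux.bortzmeyer.org");
  show_ip("xn--gmeaux-bva.bortzmeyer.org");
  show_ip("café.mozz.us");
  show_ip("xn--caf-dma.mozz.us");
  return 0;
}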


3. Scot (gmi1 (a) scotdoyle.com)

On 3/17/21 9:13 PM, Sean Conner wrote:
> It was thus said that the Great Scot once stated:
>> Hi Sean,
>    Hi Scot!
> 
>> I think you were wondering whether it's possible to resolve UTF-8
>> IDNs with a C or Lua program. This Linux C program [7] will do so
>> via glibc's getaddrinfo(). For platforms without glibc such as the
>> BSDs, ports of the libidn library [1] are available.
>> 
>> ...
>> 
>> Thoughts?
>    Thank you *so* much for the sample code, and for providing much-needed
> background information about this.  This will make the final decision on IRI
> support a bit easier to make.  I was not aware of the GNU-specific AI_IDN
> flag to getaddrinfo() (and I wish it was standard actually), and yes, it
> does work on systems that have it (Linux).
> 
>    Just one thing though---you might want to be careful when pasting code
> into email---the code below contained non-breaking spaces that my C compiler
> (gcc) did not like.  It's an easy fix, so this is mostly an FYI for anyone
> getting weird errors when compiling the code below.
> 
>    Again, thank you for your message.
I appreciate the work you are doing and am glad that this might
have helped.

And thanks for posting the corrected code. I think it's missing the
first line, which is necessary on my system:
#define _GNU_SOURCE

>    -spc
> 
>> [1] https://www.gnu.org/software/libidn/
>> [2] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-idntoascii
>> [3] https://docs.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfo
>> [4] https://devstreaming-cdn.apple.com/videos/wwdc/2016/714urluxe140lardrb7/714/714_networking_for_the_modern_internet.pdf?dl=1
>> [5] https://developer.android.com/reference/kotlin/java/net/IDN
>> [6] https://developer.android.com/reference/android/icu/text/IDNA
>> [7] Public domain example C program to resolve UTF-8 IDNs
> P.S.  Here's the code with the non-breaking spaces removed.
> 
> #include <locale.h>
> #include <netdb.h>
> #include <stdio.h>
> #include <string.h>
> ...
>
