Hi Sean,

I think you were wondering whether it's possible to resolve UTF-8
IDNs with a C or Lua program. This Linux C program [7] will do so
via glibc's getaddrinfo(). For platforms without glibc such as the
BSDs, ports of the libidn library [1] are available.

Microsoft provides IdnToAscii() [2] for use with Windows 7 and it
looks as if they support automatic conversion to punycode [3]
in their implementation of getaddrinfo() as of Windows 8.

Slides 30 through 48 of this presentation [4] describe how Apple
approaches network name internationalization. iOS 10 and
macOS Sierra will try to query a given UTF-8 name via DNS. If
the server doesn't have an entry for the UTF-8 name, it
converts to punycode and tries again.

Android 2.3 has the java.net.IDN package [5] for converting
IDNs and Android 7.0 has android.icu.text.IDNA [6].

Thoughts?

Scot

[1] https://www.gnu.org/software/libidn/
[2] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-idntoascii
[3] https://docs.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfo
[4] https://devstreaming-cdn.apple.com/videos/wwdc/2016/714urluxe140lardrb7/714/714_networking_for_the_modern_internet.pdf?dl=1
[5] https://developer.android.com/reference/kotlin/java/net/IDN
[6] https://developer.android.com/reference/android/icu/text/IDNA
[7] Public domain example C program to resolve UTF-8 IDNs

#define _GNU_SOURCE

#include <locale.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

#define ADDR_LEN 1024

/* Resolve name (UTF-8 or punycode) and print every address found. */
void show_ip(const char *name)
{
  struct addrinfo filter, *res, *info;
  char address[ADDR_LEN];
  void *addr;
  int rc;

  memset (&filter, 0, sizeof(filter));
  filter.ai_family = PF_UNSPEC;
  filter.ai_socktype = SOCK_STREAM;
  filter.ai_flags = AI_IDN;  /* GNU extension: encode IDNs before lookup */

  rc = getaddrinfo (name, NULL, &filter, &res);
  if (rc != 0) {
    /* getaddrinfo() returns an EAI_* code; gai_strerror() decodes it */
    fprintf (stderr, "getaddrinfo(%s): %s\n", name, gai_strerror (rc));
    return;
  }

  printf ("\nHost: %s\n", name);
  for (info = res; info != NULL; info = info->ai_next) {
    if (info->ai_family == AF_INET) {
      addr = &((struct sockaddr_in *) info->ai_addr)->sin_addr;
      inet_ntop (AF_INET, addr, address, ADDR_LEN);
      printf (" IPv4 address: %s\n", address);
    } else if (info->ai_family == AF_INET6) {
      addr = &((struct sockaddr_in6 *) info->ai_addr)->sin6_addr;
      inet_ntop (AF_INET6, addr, address, ADDR_LEN);
      printf (" IPv6 address: %s\n", address);
    }
  }
  freeaddrinfo (res);  /* free from the head of the list, not the walked pointer */
}

int main(void)
{
  setlocale(LC_ALL, "");
  show_ip("蛸.jp");
  show_ip("xn--td2a.jp");
  show_ip("gémeaux.bortzmeyer.org");
  show_ip("xn--gmeaux-bva.bortzmeyer.org");
  show_ip("café.mozz.us");
  show_ip("xn--caf-dma.mozz.us");
  return 0;
}
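
The test names above come in pairs, one UTF-8 form and one "xn--" punycode form. On a platform without AI_IDN, a client first has to tell which form it was handed before deciding whether a conversion step (for example via libidn) is needed. A minimal sketch of that check, assuming the name arrives as UTF-8 bytes; name_has_non_ascii() and name_has_ace_label() are hypothetical helper names, not part of the program above or of any library:

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* True if any byte has the high bit set, i.e. the name is not pure
 * ASCII and would need IDNA conversion before a classic DNS query. */
bool name_has_non_ascii(const char *name)
{
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        if (*p & 0x80)
            return true;
    return false;
}

/* True if any dot-separated label starts with the ACE prefix "xn--"
 * (case-insensitive), i.e. the name already carries a punycode label. */
bool name_has_ace_label(const char *name)
{
    const char *label = name;

    while (label != NULL && *label != '\0') {
        if (strncasecmp(label, "xn--", 4) == 0)
            return true;
        label = strchr(label, '.');
        if (label != NULL)
            label++;            /* step past the dot to the next label */
    }
    return false;
}
```

A caller might pass the name straight to getaddrinfo() when name_has_non_ascii() is false and only reach for a conversion routine otherwise. This only classifies the input; it does not validate or convert it.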
It was thus said that the Great Scot once stated:
> Hi Sean,

  Hi Scot!

> I think you were wondering whether it's possible to resolve UTF-8
> IDNs with a C or Lua program. This Linux C program [7] will do so
> via glibc's getaddrinfo(). For platforms without glibc such as the
> BSDs, ports of the libidn library [1] are available.
>
> Microsoft provides IdnToAscii() [2] for use with Windows 7 and it
> looks as if they support automatic conversion to punycode [3]
> in their implementation of getaddrinfo() as of Windows 8.
>
> Slides 30 through 48 of this presentation [4] describe how Apple
> approaches network name internationalization. iOS 10 and
> macOS Sierra will try to query a given UTF-8 name via DNS. If
> the server doesn't have an entry for the UTF-8 name, it
> converts to punycode and tries again.
>
> Android 2.3 has the java.net.IDN package [5] for converting
> IDNs and Android 7.0 has android.icu.text.IDNA [6].
>
> Thoughts?

  Thank you *so* much for the sample code, and providing much needed
background information about this. This will make the final decision on IRI
support a bit easier to make. I was not aware of the GNU-specific AI_IDN
flag to getaddrinfo() (and I wish it was standard actually), and yes, it
does work on systems that have it (Linux).

  Just one thing though---you might want to be careful when pasting code
into email---the code below contained non-breaking spaces that my C compiler
(gcc) did not like. It's an easy fix, so this is mostly an FYI for anyone
getting weird errors when compiling the code below.

  Again, thank you for your message.

  -spc

> [1] https://www.gnu.org/software/libidn/
> [2] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-idntoascii
> [3] https://docs.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfo
> [4] https://devstreaming-cdn.apple.com/videos/wwdc/2016/714urluxe140lardrb7/714/714_networking_for_the_modern_internet.pdf?dl=1
> [5] https://developer.android.com/reference/kotlin/java/net/IDN
> [6] https://developer.android.com/reference/android/icu/text/IDNA
> [7] Public domain example C program to resolve UTF-8 IDNs

P.S. Here's the code with the non-breaking spaces removed.

#include <locale.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

#define ADDR_LEN 1024

/* Resolve name (UTF-8 or punycode) and print every address found. */
void show_ip(const char *name)
{
  struct addrinfo filter, *res, *info;
  char address[ADDR_LEN];
  void *addr;
  int rc;

  memset (&filter, 0, sizeof(filter));
  filter.ai_family = PF_UNSPEC;
  filter.ai_socktype = SOCK_STREAM;
  filter.ai_flags = AI_IDN;  /* GNU extension: encode IDNs before lookup */

  rc = getaddrinfo (name, NULL, &filter, &res);
  if (rc != 0) {
    /* getaddrinfo() returns an EAI_* code; gai_strerror() decodes it */
    fprintf (stderr, "getaddrinfo(%s): %s\n", name, gai_strerror (rc));
    return;
  }

  printf ("\nHost: %s\n", name);
  for (info = res; info != NULL; info = info->ai_next) {
    if (info->ai_family == AF_INET) {
      addr = &((struct sockaddr_in *) info->ai_addr)->sin_addr;
      inet_ntop (AF_INET, addr, address, ADDR_LEN);
      printf (" IPv4 address: %s\n", address);
    } else if (info->ai_family == AF_INET6) {
      addr = &((struct sockaddr_in6 *) info->ai_addr)->sin6_addr;
      inet_ntop (AF_INET6, addr, address, ADDR_LEN);
      printf (" IPv6 address: %s\n", address);
    }
  }
  freeaddrinfo (res);  /* free from the head of the list, not the walked pointer */
}

int main(void)
{
  setlocale(LC_ALL, "");
  show_ip("蛸.jp");
  show_ip("xn--td2a.jp");
  show_ip("gémeaux.bortzmeyer.org");
  show_ip("xn--gmeaux-bva.bortzmeyer.org");
  show_ip("café.mozz.us");
  show_ip("xn--caf-dma.mozz.us");
  return 0;
}
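
Stray non-breaking spaces of the kind mentioned above can be cleaned up mechanically before compiling. A minimal sketch, assuming the pasted source is UTF-8 so that each non-breaking space (U+00A0) arrives as the two bytes 0xC2 0xA0; strip_nbsp() is a hypothetical helper name and not part of the posted program:

```c
#include <stddef.h>

/* Replace each UTF-8 non-breaking space (U+00A0, bytes 0xC2 0xA0) in buf
 * with one ASCII space, in place. Returns the number of replacements.
 * Other multi-byte sequences (e.g. the IDN test names) pass through. */
size_t strip_nbsp(char *buf)
{
    unsigned char *src = (unsigned char *)buf;
    unsigned char *dst = src;
    size_t count = 0;

    while (*src != '\0') {
        if (src[0] == 0xC2 && src[1] == 0xA0) {
            *dst++ = ' ';       /* two input bytes become one space */
            src += 2;
            count++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return count;
}
```

Because the output is never longer than the input, the rewrite can happen in the same buffer. In valid UTF-8 the byte 0xC2 only ever starts a two-byte sequence, so the match cannot land in the middle of another character.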
On 3/17/21 9:13 PM, Sean Conner wrote:
> It was thus said that the Great Scot once stated:
>> Hi Sean,
> Hi Scot!
>
>> I think you were wondering whether it's possible to resolve UTF-8
>> IDNs with a C or Lua program. This Linux C program [7] will do so
>> via glibc's getaddrinfo(). For platforms without glibc such as the
>> BSDs, ports of the libidn library [1] are available.
>>
>> ...
>>
>> Thoughts?
> Thank you *so* much for the sample code, and providing much needed
> background information about this. This will make the final decision on IRI
> support a bit easier to make. I was not aware of the GNU-specific AI_IDN
> flag to getaddrinfo() (and I wish it was standard actually), and yes, it
> does work on systems that have it (Linux).
>
> Just one thing though---you might want to be careful when pasting code
> into email---the code below contained non-breaking spaces that my C compiler
> (gcc) did not like. It's an easy fix, so this is mostly an FYI for anyone
> getting weird errors when compiling the code below.
>
> Again, thank you for your message.

I appreciate the work you are doing and am glad that this might have
helped. And thanks for posting the corrected code. I think it's missing
the first line, which is necessary on my system:

#define _GNU_SOURCE

> -spc
>
>> [1] https://www.gnu.org/software/libidn/
>> [2] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-idntoascii
>> [3] https://docs.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfo
>> [4] https://devstreaming-cdn.apple.com/videos/wwdc/2016/714urluxe140lardrb7/714/714_networking_for_the_modern_internet.pdf?dl=1
>> [5] https://developer.android.com/reference/kotlin/java/net/IDN
>> [6] https://developer.android.com/reference/android/icu/text/IDNA
>> [7] Public domain example C program to resolve UTF-8 IDNs
> P.S. Here's the code with the non-breaking spaces removed.
>
> #include <locale.h>
> #include <netdb.h>
> #include <stdio.h>
> #include <string.h>
> ...
>
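
The need for #define _GNU_SOURCE comes from glibc only exposing AI_IDN when that feature-test macro is defined before the first #include. One way to keep a single source file building on systems with and without the flag is a compile-time guard. A rough sketch, assuming AI_IDN is a preprocessor macro where it exists (as it is in glibc); IDN_FLAG and init_idn_hints() are hypothetical names introduced here for illustration:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* must come before any #include to expose AI_IDN */
#endif

#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Use AI_IDN where the C library offers it; otherwise fall back to no
 * flag, in which case the caller must convert the name to punycode. */
#ifdef AI_IDN
#define IDN_FLAG AI_IDN
#else
#define IDN_FLAG 0
#endif

/* Prepare getaddrinfo() hints the same way the posted show_ip() does. */
void init_idn_hints(struct addrinfo *hints)
{
    memset(hints, 0, sizeof *hints);
    hints->ai_family = AF_UNSPEC;
    hints->ai_socktype = SOCK_STREAM;
    hints->ai_flags = IDN_FLAG;
}
```

With this in place, show_ip() could call init_idn_hints(&filter) instead of setting the fields by hand. When IDN_FLAG ends up as 0, UTF-8 names are passed to the resolver unconverted, so a separate conversion step such as libidn would still be required on those platforms.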
---