Petite Abeille petite.abeille at gmail.com
Sun Jan 3 16:02:54 GMT 2021
- - - - - - - - - - - - - - - - - - -
On Jan 3, 2021, at 14:46, Stephane Bortzmeyer <stephane at sources.org> wrote:
> UTF-8 has a quasi-monopoly.
Not quite.
For text/gemini, your stats read:
• Unspecified: 42,322
• utf-8: 6,513
• us-ascii: 3
Unspecified rules. By far. Most likely plain ASCII in practice.
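For reference, here is a rough sketch of how the declared charset could be pulled out of a response header to produce that kind of tally. It assumes the usual Gemini header shape ("20 text/gemini; charset=..." followed by CRLF) arriving on stdin; header_charset is a made-up name, not part of any existing tool.

```shell
#!/bin/sh
# header_charset (hypothetical helper): read a Gemini response on stdin and
# print the charset declared in its header line, or "unspecified".
header_charset() {
    # Only the first line is the header; keep it even if it lacks a newline.
    IFS= read -r header || [ -n "$header" ] || return 1
    # Drop the CR left over from the CRLF line ending.
    header=$(printf '%s' "$header" | tr -d '\r')
    case $header in
        *[Cc][Hh][Aa][Rr][Ss][Ee][Tt]=*)
            # Keep what follows "charset=", up to the next parameter.
            value=${header#*[Cc][Hh][Aa][Rr][Ss][Ee][Tt]=}
            value=${value%%;*}
            # Strip quotes and spaces, fold to lower case.
            printf '%s\n' "$value" | tr -d '" ' | tr '[:upper:]' '[:lower:]'
            ;;
        *)
            echo unspecified
            ;;
    esac
}
```

Piping each saved response through header_charset and then through sort | uniq -c should give counts comparable to the ones above.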
Could you run file --mime-type --mime-encoding on all of these text/gemini responses?
$ openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | file --brief --mime-type --mime-encoding -
text/plain; charset=utf-8
Validating the encoding would be informative as well:
$ openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | iconv -f utf-8 -t utf-8 > /dev/null; echo $?
0
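To tell "plain ASCII" apart from "real UTF-8" apart from "broken", the same iconv round trip can be wrapped in a small function. This is only a sketch: classify_encoding is a hypothetical name, and it assumes an iconv that exits non-zero on invalid input (glibc and macOS both appear to).

```shell
#!/bin/sh
# classify_encoding (hypothetical helper): read bytes on stdin and print
#   us-ascii  no byte above 0x7F
#   utf-8     valid UTF-8 containing at least one non-ASCII byte
#   invalid   not valid UTF-8
classify_encoding() {
    tmp=$(mktemp) || return 1
    # Buffer stdin so it can be read twice.
    cat > "$tmp"
    # Delete every 7-bit byte; anything left over is non-ASCII.
    if [ -z "$(LC_ALL=C tr -d '\000-\177' < "$tmp")" ]; then
        echo us-ascii
    # Round-trip through iconv; a failure means the bytes are not valid UTF-8.
    elif iconv -f utf-8 -t utf-8 < "$tmp" > /dev/null 2>&1; then
        echo utf-8
    else
        echo invalid
    fi
    rm -f "$tmp"
}
```

Run over a directory of saved responses (the directory name is whatever you used), for f in responses/*; do classify_encoding < "$f"; done | sort | uniq -c would show how much of the "Unspecified" pile is pure ASCII.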
Ditto for guessing the actual language:
https://polyglot.readthedocs.io/en/latest/Detection.html
℀ ±𝟤¢