
[spec] Limit valid encodings of text/gemini to UTF-8

Petite Abeille petite.abeille at gmail.com

Sun Jan 3 16:02:54 GMT 2021

- - - - - - - - - - - - - - - - - - - 
On Jan 3, 2021, at 14:46, Stephane Bortzmeyer <stephane at sources.org> wrote:
> UTF-8 has a quasi-monopoly.

Not quite.

For text/gemini, your stats read:

• Unspecified: 42,322
• utf-8: 6,513
• us-ascii: 3

Unspecified rules. By far. Most likely plain ASCII in practice.

Could you run file --mime-type --mime-encoding on all these text/gemini responses?

$ openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | file --brief --mime-type --mime-encoding -
text/plain; charset=utf-8

Validating the encoding would be informative as well:

$ openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | iconv -f utf-8 -t utf-8 > /dev/null; echo $?
0
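
To run that check over many responses rather than one, the iconv round-trip could be wrapped in a small function. This is only a rough sketch: the names is_valid_utf8 and fetch_gemini, and the urls.txt file in the usage comment, are made up for illustration and are not part of any existing tool. Note that plain ASCII also passes, since ASCII is a subset of UTF-8.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit status 0) only when stdin is
# well-formed UTF-8. iconv exits non-zero on an invalid byte sequence.
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 > /dev/null 2>&1
}

# Hypothetical helper: fetch the raw response (header line and body)
# for a Gemini URL, using the same openssl invocation as above.
fetch_gemini() {
    local host=$1 url=$2
    openssl s_client -quiet -crlf -connect "$host:1965" <<< "$url" 2>/dev/null
}

# Usage sketch, assuming urls.txt holds "host url" pairs, one per line:
#
#   while read -r host url; do
#       if fetch_gemini "$host" "$url" | is_valid_utf8; then
#           echo "valid   $url"
#       else
#           echo "invalid $url"
#       fi
#   done < urls.txt
```

Counting the "valid" and "invalid" lines would then show how many of the unspecified responses really are UTF-8 (or plain ASCII) on the wire.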

Ditto for guessing the actual language:

$ echo $(openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null ) | polyglot detect | cut -d' ' -f1 | uniq
English

https://polyglot.readthedocs.io/en/latest/Detection.html

℀ ±𝟤¢