💾 Archived View for rawtext.club › ~sloum › geminilist › 004682.gmi captured on 2023-11-14 at 10:39:55. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

[spec] Limit valid encodings of text/gemini to UTF-8

Côme Chilliet come at chilliet.eu

Sun Jan 3 16:11:31 GMT 2021

- - - - - - - - - - - - - - - - - - - 

On Sunday, 3 January 2021, 17:02:54 CET, Petite Abeille wrote:

> On Jan 3, 2021, at 14:46, Stephane Bortzmeyer <stephane at sources.org> wrote:
> > UTF-8 has a quasi-monopoly.
> Not quite.
> For text/gemini, your stats read:
> • Unspecified: 42,322
> • utf-8: 6,513
> • us-ascii: 3
> Unspecified rules. By far. Most likely plain ASCII in practice.

No, the specification says that the default is UTF-8, so unspecified means UTF-8. I do not set the charset in my server headers because it would be redundant: I always send UTF-8.
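
To illustrate the point, here is a minimal sketch of how a client might resolve the charset from a response's MIME string, treating a missing charset parameter as UTF-8. The function names parse_meta and effective_charset are hypothetical and not taken from any existing client; the parsing is deliberately simple and ignores quoting edge cases.

```python
def parse_meta(meta):
    """Split a MIME string such as 'text/gemini; charset=utf-8'
    into a lowercase media type and a dict of its parameters."""
    media_type, *raw_params = meta.split(";")
    params = {}
    for raw in raw_params:
        name, sep, value = raw.partition("=")
        if sep:  # skip fragments that have no '=' at all
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type.strip().lower(), params


def effective_charset(meta):
    """Return the declared charset, or 'utf-8' when none is declared
    (assumption: the spec default for text/* responses, as stated above)."""
    _, params = parse_meta(meta)
    return params.get("charset", "utf-8").lower()


print(effective_charset("text/gemini"))                     # no charset given
print(effective_charset("text/gemini; charset=us-ascii"))   # explicit charset
```

Under this reading, the "Unspecified" responses in the statistics would be counted together with the explicit utf-8 ones.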

> Ditto for guessing the actual language:
> # echo $(openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null ) | polyglot detect | cut -d' ' -f1 | uniq
> English
> https://polyglot.readthedocs.io/en/latest/Detection.html

Language is a different matter, because the specification explicitly says that there is no default, so my server always sends the lang= parameter in the header for text/gemini content.
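
As a rough sketch of that server-side behaviour, the following builds a success header that always carries lang= and omits charset unless one is given explicitly. The helper names build_meta and success_header are hypothetical, and the "20 <meta>" followed by CRLF layout is my understanding of the Gemini success response header, not code from any particular server.

```python
def build_meta(lang, charset=None):
    """Build the MIME string for a text/gemini response.
    lang is always included because there is no default to fall back on;
    charset is omitted unless given, since UTF-8 is the default."""
    meta = "text/gemini"
    if charset is not None:
        meta += f"; charset={charset}"
    meta += f"; lang={lang}"
    return meta


def success_header(lang, charset=None):
    """Return a Gemini success header line (status 20) ending in CRLF."""
    return f"20 {build_meta(lang, charset)}\r\n"


print(repr(success_header("fr")))
```

A client receiving such a header does not need to guess the language with a detector, which is the asymmetry with charset described above.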

Côme