💾 Archived View for rawtext.club › ~sloum › geminilist › 004670.gmi captured on 2024-02-05 at 11:22:51. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

[spec] Limit valid encodings of text/gemini to UTF-8

Stephane Bortzmeyer stephane at sources.org

Sun Jan 3 13:49:43 GMT 2021

- - - - - - - - - - - - - - - - - - - 

On Mon, Dec 28, 2020 at 02:16:27PM +0100, Philip Linde <linde.philip at gmail.com> wrote a message of 69 lines which said:

While it is the case that impact is minimal, I suggest that the
specification reflects the much simpler situation these statistics
indicate rather than keep itself open to the general problem of
representing text/gemini in encodings that might not even have the
meta information characters encoded in the same way, and—if IRIs are
introduced—creates the problem of how IRIs should be represented in
e.g. ISO-8859-1.

Note also that saying "gemtexts MUST be in UTF-8" is not everything. We may (or may not) also want to mandate a line-ending convention (line ends can be represented with CR, LF, CR-LF, LS or PS, the last two being purely Unicode, not present in ASCII) and a normalization form.
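
To make the point concrete, here is a small Python sketch (an illustration only, not taken from the specification) showing that valid UTF-8 still leaves room for variation. The sample string "café" and the variable names are assumptions chosen for the example.

```python
import unicodedata

# The same visible word in two Unicode normalization forms.
composed = "caf\u00e9"      # e-acute as one code point (NFC)
decomposed = "cafe\u0301"   # "e" followed by COMBINING ACUTE ACCENT (NFD)

# Both are valid UTF-8, yet the bytes on the wire differ.
composed_bytes = composed.encode("utf-8")      # b"caf\xc3\xa9"
decomposed_bytes = decomposed.encode("utf-8")  # b"cafe\xcc\x81"

# Normalizing to NFC makes the two spellings identical again.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# The five line terminators listed above, with their UTF-8 encoded length.
terminators = {
    "CR": "\r",
    "LF": "\n",
    "CR-LF": "\r\n",
    "LS": "\u2028",  # LINE SEPARATOR, no ASCII equivalent
    "PS": "\u2029",  # PARAGRAPH SEPARATOR, no ASCII equivalent
}
utf8_lengths = {name: len(t.encode("utf-8")) for name, t in terminators.items()}
```

A client that only splits on LF would treat text using LS or PS as a single line, which is one reason to settle the convention in the specification.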

If we go that way, there is an existing standard for Unicode text, RFC 5198 <gemini://gemini.bortzmeyer.org/rfc-mirror/rfc5198.txt>. It mandates CR-LF line endings and NFC normalization.
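
As a rough sketch of what a server or authoring tool might do under such a rule, the following Python function applies only the two requirements named here (NFC and CR-LF) before encoding to UTF-8. The function name `to_rfc5198_like` is hypothetical, and the sketch does not claim to cover everything RFC 5198 says.

```python
import re
import unicodedata

# CR-LF is listed first so that it is matched as a single terminator
# rather than as a CR followed by a separate LF.
_LINE_END = re.compile("\r\n|\r|\n|\u2028|\u2029")


def to_rfc5198_like(text: str) -> bytes:
    """Return text as UTF-8 bytes in NFC with CR-LF line endings.

    Hypothetical helper: it handles only normalization form and
    line endings, not any other rule of RFC 5198.
    """
    normalized = unicodedata.normalize("NFC", text)
    with_crlf = _LINE_END.sub("\r\n", normalized)
    return with_crlf.encode("utf-8")
```

For instance, `to_rfc5198_like("a\nb")` yields `b"a\r\nb"`, and text that is already in this form passes through unchanged.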