Ambiguity in spec regarding line endings

Ryan Kavanagh rak at rak.ac

Thu Jun 4 17:08:44 BST 2020

- - - - - - - - - - - - - - - - - - - 

I'm reading the current version of the spec, and have come across the following ambiguous paragraph in §3.3:

When in canonical form, media subtypes of the "text" type use CRLF as the text line break. Gemini relaxes this requirement and allows the transport of text media with plain LF alone (but NOT a plain CR alone) representing a line break when it is done consistently for an entire response body. Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via Gemini.

How do the second and third sentences interact? In particular, how does

[...] when it is done consistently for an entire response body.

interact with

Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via Gemini.

How should Gemini clients behave when both CRLF and LF appear in the same text/gemini transmission? Are both to be equivalently treated as line breaks?
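To make the question concrete, here is a minimal Python sketch that reports which line-break conventions occur in a response body. The function name `line_ending_styles` and the sample body are made up for illustration; nothing like this appears in the spec.

```python
import re

# Match CRLF first so that the CR of a CRLF pair is not counted as a bare CR.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}


def line_ending_styles(body: bytes) -> set:
    """Return the set of line-break conventions found in body."""
    return {_NAMES[match] for match in _LINE_BREAK.findall(body)}


# A hypothetical text/gemini body that mixes CRLF and bare LF.
mixed = b"# Title\r\nfirst line\nsecond line\r\n"
print(sorted(line_ending_styles(mixed)))  # ['CRLF', 'LF']
```

A body like `mixed` is exactly the case the paragraph leaves open: the "consistently" clause suggests it is not permitted, yet the MUST clause says clients accept both sequences.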

I've looked through the archives to see what has been said in the past about line breaks, and the two following messages appear most relevant:

On Sat, Sep 07, 2019 at 04:30:14PM -0400, Jason McBrayer wrote:

IMO, it makes sense to require CRLF in the plain text parts of the
protocol (after requests, after the status line of a response), but I
don't think that the text/gemini file format needs to have CR/LF; IMO
clients should be prepared to accept either LF or CR/LF just as they
would with text/plain. And maybe if we're serious about supporting old
devices, clients should be prepared for bare CR, too (Classic MacOS).
But it's a pain in the arse to authors to have to save text documents
with non-native line endings, and I don't feel like servers need to be
in the business of reformatting the content they serve.

On Sun, Sep 08, 2019 at 02:42:08PM +0000, solderpunk wrote:

I will admit that the current liberal use of CRLF throughout the
Gemini spec is the result of me blindly copying from Gopher and other
RFCs (as Sean mentioned, it's ubiquitous).

Here's [0,1] some of the history of requiring CRLF in network protocols and in requiring CRLF for text/ subtypes [2] during transmission.

TL;DR: every system has a different native line ending sequence (LF vs CR vs CRLF). To ensure all can communicate with each other (and to simplify parsing of communications), transmissions are required to represent all line endings in text formats by CRLF. Line endings used in the local storage of text files have *nothing to do* with the line endings used in transmission, and clients are expected to convert from CRLF to whatever local format is preferred. So indeed, servers are in the business of reformatting text/* content that they serve, and they do so to ensure interoperability between systems with different line ending conventions.
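As a rough illustration of that division of labour, the following Python sketch converts locally stored text to the CRLF wire form on the sending side and back to a local convention on the receiving side. The names `to_wire` and `from_wire` are hypothetical, and UTF-8 is assumed as the charset.

```python
import os


def to_wire(text: str) -> bytes:
    """Sender side: represent every line break as CRLF before transmission."""
    # Collapse CRLF and bare CR to LF first, so that existing CRLF pairs
    # are not doubled when LF is expanded to CRLF afterwards.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "\r\n").encode("utf-8")


def from_wire(payload: bytes, local_newline: str = os.linesep) -> str:
    """Receiver side: convert CRLF to whatever the local system prefers."""
    return payload.decode("utf-8").replace("\r\n", local_newline)


stored = "# Title\nfirst line\nsecond line\n"  # e.g. a file saved on Unix
wire = to_wire(stored)
print(wire)  # b'# Title\r\nfirst line\r\nsecond line\r\n'
print(from_wire(wire, "\n") == stored)  # True
```

The point of the sketch is that the stored bytes and the transmitted bytes are allowed to differ, while the text they represent stays the same.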

I think there's a conceptual point to be made here: text/gemini files are not binary data, but rather, *text files*. This means that their transmission should not attempt to provide byte-for-byte identical copies of the local data, but should instead follow well-defined and agreed-upon representations. If your goal is to transmit a byte-for-byte identical copy of your file, there are other mime types you can use to accomplish this (e.g., application/octet-stream).

The FTP protocol makes a similar conceptual distinction. It allows for text transmission (ASCII and EBCDIC types), where end-of-lines are defined to be CRLF (ASCII type) and NL (EBCDIC type). It also allows for a stream / binary transfer mode for transmitting text (and other data) without any conversion. Quoting from the RFC [4, §3.4]:

For the purpose of standardized transfer, the sending host will translate its internal end of line or end of record denotation into the representation prescribed by the transfer mode and file structure, and the receiving host will perform the inverse translation to its internal denotation. [...] Since these transformations imply extra work for some systems, identical systems transferring non-record structured text files might wish to use a binary representation and stream mode for the transfer.

However, in keeping with Postel's law, I suggest allowing clients to accept LF as a line ending, as is done by RFC 7230 §3.5 [3]:

Although the line terminator for the start-line and header fields is the sequence CRLF, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR.

Conclusion:

To eliminate ambiguity and to make the gemini protocol consistent with every other text transmission protocol I know of, I propose amending the ambiguous paragraph in the spec as follows:

As specified in RFC 2046 §4.1.1, the canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence. For robustness, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR in text media.
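For what it's worth, a recipient following the proposed wording could be sketched in Python roughly as follows. The function name `split_lines_lenient` is hypothetical; the sketch treats LF as the terminator, ignores a single CR immediately before it, and does not treat a bare CR as a line break.

```python
def split_lines_lenient(body: bytes) -> list:
    """Split a text body into lines, accepting both CRLF and bare LF."""
    lines = body.split(b"\n")
    # A body that ends with a line terminator leaves an empty final chunk.
    if lines and lines[-1] == b"":
        lines.pop()
    # Ignore one CR directly preceding the LF; any other CR is kept as data.
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]


body = b"# Title\r\nfirst line\nsecond line\r\n"
print(split_lines_lenient(body))
# [b'# Title', b'first line', b'second line']
```

With this reading, a transmission that mixes CRLF and LF parses the same way as one that uses either sequence throughout, so the client never needs to decide whether the body was "consistent".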

Best,
Ryan

[0] https://www.rfc-editor.org/old/EOLstory.txt
[1] https://tools.ietf.org/html/rfc318 [ page 8, "End of Line Convention" ]
[2] https://tools.ietf.org/html/rfc2046#section-4.1.1
[3] https://tools.ietf.org/html/rfc7230#section-3.5
[4] https://tools.ietf.org/html/rfc959

--
|)|/  Ryan Kavanagh      | GPG: 4E46 9519 ED67 7734 268F
|\|\  https://rak.ac     |      BD95 8F7B F8FC 4A11 C97A