💾 Archived View for gemi.dev › gemini-mailing-list › 000179.gmi captured on 2023-11-04 at 12:30:57. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I'm reading the current version of the spec, and have come across the following ambiguous paragraph in §3.3:

> When in canonical form, media subtypes of the "text" type use CRLF as the text line break. Gemini relaxes this requirement and allows the transport of text media with plain LF alone (but NOT a plain CR alone) representing a line break when it is done consistently for an entire response body. Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via HTTP.

How do the second and third sentences interact? In particular, how does

> [...] when it is done consistently for an entire response body.

interact with

> Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via HTTP.

How should Gemini clients behave when both CRLF and LF appear in the same text/gemini transmission? Are both to be equivalently treated as line breaks?

I've looked through the archives to see what has been said in the past about line breaks, and the two following messages appear most relevant:

On Sat, Sep 07, 2019 at 04:30:14PM -0400, Jason McBrayer wrote:
> IMO, it makes sense to require CRLF in the plain text parts of the protocol (after requests, after the status line of a response), but I don't think that the text/gemini file format needs to have CR/LF; IMO clients should be prepared to accept either LF or CR/LF just as they would with text/plain. And maybe if we're serious about supporting old devices, clients should be prepared for bare CR, too (Classic MacOS). But it's a pain in the arse to authors to have to save text documents with non-native line endings, and I don't feel like servers need to be in the business of reformatting the content they serve.
On Sun, Sep 08, 2019 at 02:42:08PM +0000, solderpunk wrote:
> I will admit that the current liberal use of CRLF throughout the Gemini spec is the result of me blindly copying from Gopher and other RFCs (as Sean mentioned, it's ubiquitous).

Here's [0,1] some of the history of requiring CRLF in network protocols and in requiring CRLF for text/ subtypes [2] during transmission.

TL;DR: systems differ in their native line ending sequence (LF vs CR vs CRLF). To ensure all of them can communicate with each other (and to simplify parsing of communications), transmissions are required to represent all line endings in text formats by CRLF. Line endings used in the local storage of text files have *nothing to do* with the line endings used in transmission, and clients are expected to convert from CRLF to whatever local format is preferred. So indeed, servers are in the business of reformatting the text/* content that they serve, and they do so to ensure interoperability between systems with different line ending conventions.

I think there's a conceptual point to be made here: text/gemini files are not binary data, but rather *text files*. This means that their transmission should not attempt to provide byte-for-byte identical copies of the local data, but should instead follow well-defined and agreed-upon representations. If your goal is to transmit a byte-for-byte identical copy of your file, there are other MIME types you can use to accomplish this (e.g., application/octet-stream).

The FTP protocol makes a similar conceptual distinction. It allows for text transmission (ASCII and EBCDIC types), where end-of-lines are defined to be CRLF (ASCII type) and NL (EBCDIC type). It also allows for a stream / binary transfer mode for transmitting text (and other data) without any conversion.
Quoting from the RFC [4, §3.4]:

> For the purpose of standardized transfer, the sending host will translate its internal end of line or end of record denotation into the representation prescribed by the transfer mode and file structure, and the receiving host will perform the inverse translation to its internal denotation. [...] Since these transformations imply extra work for some systems, identical systems transferring non-record structured text files might wish to use a binary representation and stream mode for the transfer.

However, in keeping with Postel's law, I suggest allowing clients to accept LF as a line ending, as is done by RFC 7230 §3.5 [3]:

> Although the line terminator for the start-line and header fields is the sequence CRLF, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR.

Conclusion:

To eliminate ambiguity and to make the Gemini protocol consistent with every other text transmission protocol I know of, I propose amending the ambiguous paragraph in the spec as follows:

> As specified in RFC 2046 §4.1.1, the canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence. For robustness, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR in text media.

Best,
Ryan

[0] https://www.rfc-editor.org/old/EOLstory.txt
[1] https://tools.ietf.org/html/rfc318 (page 8, "End of Line Convention")
[2] https://tools.ietf.org/html/rfc2046#section-4.1.1
[3] https://tools.ietf.org/html/rfc7230#section-3.5
[4] https://tools.ietf.org/html/rfc959

--
|)|/ Ryan Kavanagh | GPG: 4E46 9519 ED67 7734 268F
|\|\ https://rak.ac | BD95 8F7B F8FC 4A11 C97A
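The amendment's second sentence borrows RFC 7230's rule: accept a single LF as a line terminator and ignore any CR just before it. As an illustration only, here is a minimal Python sketch of a client-side splitter following that rule. It assumes the body has already been decoded to a `str`, and `split_text_lines` is a hypothetical helper, not part of any Gemini library. It shows one way a client could treat a body that mixes CRLF and bare LF: both end a line, while a lone CR does not.

```python
def split_text_lines(body: str) -> list[str]:
    """Split a text/* body into lines (sketch, hypothetical helper).

    Both CRLF and bare LF end a line; at most one CR directly before
    an LF is discarded. A lone CR that is not followed by LF is kept
    as ordinary data, since it is not treated as a line break.
    """
    parts = body.split("\n")
    # Every element except the last was followed by an LF in the input.
    terminated, tail = parts[:-1], parts[-1]
    lines = [p[:-1] if p.endswith("\r") else p for p in terminated]
    # The last element is any text after the final LF; it is empty when
    # the body ends with a line break, in which case there is no extra line.
    if tail:
        lines.append(tail)
    return lines


# A body that mixes CRLF and bare LF yields the same lines either way.
print(split_text_lines("# Title\r\nSome text\n=> gemini://example.org/ Link\r\n"))
# ['# Title', 'Some text', '=> gemini://example.org/ Link']
```

Python's built-in `str.splitlines()` is avoided on purpose: it also breaks on a lone CR (contradicting the spec's "NOT a plain CR alone") and on other separators such as `\x0b`, `\x0c` and `\u2028`.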
> On Jun 4, 2020, at 18:08, Ryan Kavanagh <rak at rak.ac> wrote:
>
> As specified in RFC 2046 §4.1.1, the canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence. For robustness, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR in text media.

$ delcr | gemini | addcr

https://cr.yp.to/ucspi-tcp/addcr.html
I disagree with this idea, as it adds a significant burden to both server implementations and client implementations running on Unix systems.
Hi,

On Thu, Jun 04, 2020 at 12:23:26PM -0400, prisonpotato at tilde.team wrote:
> I disagree with this idea, as it adds a significant burden to both server implementations and client implementations running on Unix systems.

The proposed modification *reduces* the burden for clients on all systems. Currently, clients are required to accept *both* CRLF and bare LF as being representative of a line break. The proposed change requires them only to accept CRLF, while giving them the option to also accept LF if they so desire. This means that clients satisfying the spec now will still satisfy it after the change.

You are correct that it increases the burden for servers (regardless of the host system): they must convert bare LF endings to CRLF before transmitting text. Whether this imposes a "significant" burden is subjective. For what it's worth, the Gopher protocol specifies CRLF line endings, and gophernicus manages to do this conversion in ~5 lines of code [0, 1].

The question boils down to a cost-benefit analysis of preserving spec compliance for existing servers and not having servers worry about line endings, versus respecting network protocol conventions that have been established for decades and not violating a MUST requirement of RFC 2046 §4.1.1.

Best,
Ryan

[0] https://github.com/gophernicus/gophernicus/blob/master/src/file.c#L68
[1] https://github.com/gophernicus/gophernicus/blob/master/src/string.c#L122

--
|)|/ Ryan Kavanagh | GPG: 4E46 9519 ED67 7734 268F
|\|\ https://rak.ac | BD95 8F7B F8FC 4A11 C97A
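To make the server-side cost concrete, here is a hedged Python sketch of the LF-to-CRLF conversion. `to_crlf` is a hypothetical helper, not a port of the gophernicus code linked above. It works on raw bytes, assuming an ASCII-compatible encoding such as UTF-8, in which the bytes 0x0D and 0x0A never occur inside a multi-byte character.

```python
def to_crlf(data: bytes) -> bytes:
    """Encode every line break in a text body as CRLF (sketch).

    A bare LF becomes CRLF and an existing CRLF stays CRLF, so the
    result is the same whether the file was saved with Unix or
    DOS/Windows line endings. A lone CR not followed by LF passes
    through unchanged.
    """
    # Collapse CRLF to LF first so the second step cannot produce CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Hypothetical usage: normalize a text/gemini file before sending it.
print(to_crlf(b"# Hello\nworld\n"))
# b'# Hello\r\nworld\r\n'
```

Because applying the function twice gives the same result as applying it once, a server could run it on every text/* response without first checking which convention a given file uses.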
> The proposed change requires them only to accept CRLF

This is a problem, I think, because any clients that do this will fail to properly display the vast majority of Gemini content. I don't see the problem with mandating handling LF as well. Just split lines on LF and you're done for both types.

makeworld
I agree. This seems like a more sensible solution.
After some thought and discussion, I'd like to retract my proposed amendment. I misinterpreted that paragraph of the spec as implying something that it doesn't.

Best,
Ryan

--
|)|/ Ryan Kavanagh | GPG: 4E46 9519 ED67 7734 268F
|\|\ https://rak.ac | BD95 8F7B F8FC 4A11 C97A
It was thus said that the Great prisonpotato at tilde.team once stated:
> I disagree with this idea, as it adds a significant burden to both server implementations and client implementations running on Unix systems.

And I disagree with this disagreement. Requiring only LF produces an undue hardship on Windows systems which use both CR and LF. And Windows is *still* the most popular operating system out there.
On Thu, 4 Jun 2020 17:53:14 -0400
Sean Conner <sean at conman.org> wrote:
> And I disagree with this disagreement. Requiring only LF produces an undue hardship on Windows systems which use both CR and LF. And Windows is *still* the most popular operating system out there.

What if somebody decides to write a Gemini client for FreeDOS? IIRC, that OS still uses CR and LF for line endings, just like MS-DOS did. Wouldn't an LF-only requirement hamper such an effort?

--
Matthew Graybosch https://www.matthewgraybosch.com
All opinions are my own. Harrisburg, PA USA
"Out of order?! Even in the future nothing works!"
It was thus said that the Great Ryan Kavanagh once stated:
>
> Here's [0,1] some of the history of requiring CRLF in network protocols and in requiring CRLF for text/ subtypes [2] during transmission.
>
> [0] https://www.rfc-editor.org/old/EOLstory.txt
> [1] https://tools.ietf.org/html/rfc318
> [ page 8, "End of Line Convention" ]

Thank you for this. I'm saving the references for future discussions on this topic.

-spc
Isn't it only a text editor issue? WordPad handles Unix-style text files...