Archived View for gemi.dev › gemini-mailing-list › 001045.gmi captured on 2024-08-31 at 19:24:48. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Hi all,

Since I don't have (and am unable to create) a gitlab account, I wrote a Gemlog post detailing my responses to a bunch of the issues on the gitlab repos for Sean Conner's spec revisions.

Posting here to increase the likelihood that other relevant people will be able to see it.

=> //nytpu.com/gemlog/2021-10-10.gmi Available over Gemini and HTTP

~nytpu

--
Alex // nytpu
alex@nytpu.com
gpg --locate-external-key alex@nytpu.com
https://useplaintext.email/
Alex // nytpu <alex@nytpu.com> writes:

> [[PGP Signed Part:Undecided]]
> Hi all,
>
> Since I don't have (and am unable to create) a gitlab account, I wrote a Gemlog post detailing my responses to a bunch of the issues on the gitlab repos for Sean Conner's spec revisions.

I can wholeheartedly agree with the gitlab rant. I've never used it before and was quite shocked at how bad it is. Even github is "decent" in this regard, on a technical level at least: I can at least *read* a README, the code or the issues with w3m. But that ship has sailed. We can only try to prevent similar moves in the future.

> Posting here to increase the likelihood that other relevant people will be able to see it.
>
> => //nytpu.com/gemlog/2021-10-10.gmi Available over Gemini and HTTP

I'm not sure if this is the best place to reply to your post, but the alternative would be to open gitlab, post your link under the mentioned issues and reply there, which... I don't really want to do. If I can avoid opening gitlab, all the better ;-)

1. whitespace after gemtext elements

I don't have a strong opinion on this, but on the other hand I don't see a real motivation to require a space, either in your post or in the gitlab discussion. Whitespace should not be mandatory unless it is strictly required to separate fields (like in a link line), in my opinion at least. But yes, I do always write '# hello there' and not '#hello there'.

2. BOM

> If you're making something for non-tech people to use and they use bad editors that include a BOM, it should be your responsibility to remove it before publishing the document.

I'm not sure this would be viable. If you look at the original report from Gnuserland you'd see a confused user who doesn't know what a BOM is or how to deal with it. He simply typed something in his preferred text editor (which is mis-configured btw; why someone on unix would force CRLF line endings is beyond my understanding), published it, and it was slightly broken.
Declaring it out-of-scope for the protocol, but reminding client authors in the best practices document that bad documents may have a BOM, seems the most sensible solution to me.

I even thought about adding some kind of "feedback" to the user on how the page is structured. Say, some kind of linter for things like hard wrapping, BOM, etc. It may become annoying though.

3. close_notify

Is it still a problem? :D

(Sometimes I've left dangling questions like this hoping for Bortzmeyer to chime in and share some stats. In the past it worked, hope he shares some this time too ;-)

4. dumb new feature proposals

I just love reading them ;)

Taking this in a slightly OT direction: in what manner should client authors experiment with extensions in their clients? I know there isn't an answer: if the project is mine I can do whatever the hell I want with it, and since most (all?) clients are free software I can take an existing one and modify the hell out of it, and I'm grateful for this.

I know also the "don't extend gemini" mantra, and I repeat it myself too. But does improving how a document is rendered count as extending the protocol? If I, say, replace the "---" lines with a nice separator, does it count as extending gemini or is it just a rendering nicety of the client? (multi-level lists gravitate too much toward the extension side I guess, but who cares)

> ~nytpu
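On the BOM and CRLF point in the message above, a client-side approach might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the function names `normalise_gemtext` and `lint` are hypothetical, the body is assumed to be UTF-8, and the "linter" notes are just one way the feedback idea could be surfaced.

```python
import codecs


def normalise_gemtext(raw: bytes) -> str:
    """Decode a text/gemini body, tolerating a leading BOM and CRLF line endings."""
    # Drop a UTF-8 byte order mark if the author's editor added one;
    # otherwise the first line would start with U+FEFF instead of "#", "=>", etc.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode("utf-8")
    # Treat CRLF like LF so a stray "\r" never ends up in a rendered line.
    return text.replace("\r\n", "\n")


def lint(raw: bytes) -> list[str]:
    """Return human-readable notes about quirks found in the raw document."""
    notes = []
    if raw.startswith(codecs.BOM_UTF8):
        notes.append("document starts with a UTF-8 BOM")
    if b"\r\n" in raw:
        notes.append("document uses CRLF line endings")
    return notes
```

A client could call `normalise_gemtext` unconditionally and show the `lint` notes only on request, which keeps the tolerance out of the protocol while still telling a curious author what was wrong.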
On Mon, Oct 11, 2021 at 08:57:59AM +0200, Omar Polo <op@omarpolo.com> wrote a message of 87 lines which said:

> 3. close_notify
>
> Is it still a problem? :D

Yes :-(

> (Sometimes I've left dangling questions like this hoping for Bortzmeyer to chime in and share some stats. In the past it worked, hope he share some this time too ;-)

"50.4 % of URLs do NOT send a proper TLS shutdown (application close). Even 36.8 % of those who return status 20 are in that case."

The future RFC on HTTP (completely rewritten and reorganised) has a nice explanation:

9.8. TLS Connection Closure

TLS uses an exchange of closure alerts prior to (non-error) connection closure to provide secure connection closure; see Section 6.1 of [TLS13]. When a valid closure alert is received, an implementation can be assured that no further data will be received on that connection.

When an implementation knows that it has sent or received all the message data that it cares about, typically by detecting HTTP message boundaries, it might generate an "incomplete close" by sending a closure alert and then closing the connection without waiting to receive the corresponding closure alert from its peer.

An incomplete close does not call into question the security of the data already received, but it could indicate that subsequent data might have been truncated. As TLS is not directly aware of HTTP message framing, it is necessary to examine the HTTP data itself to determine whether messages were complete. Handling of incomplete messages is defined in Section 8.

When encountering an incomplete close, a client SHOULD treat as completed all requests for which it has received as much data as specified in the Content-Length header or, when a Transfer-Encoding of chunked is used, for which the terminal zero-length chunk has been received. A response that has neither chunked transfer coding nor Content-Length is complete only if a valid closure alert has been received.
Treating an incomplete message as complete could expose implementations to attack.

A client detecting an incomplete close SHOULD recover gracefully.

Clients MUST send a closure alert before closing the connection. Clients that do not expect to receive any more data MAY choose not to wait for the server's closure alert and simply close the connection, thus generating an incomplete close on the server side.

Servers SHOULD be prepared to receive an incomplete close from the client, since the client can often determine when the end of server data is.

Servers MUST attempt to initiate an exchange of closure alerts with the client before closing the connection. Servers MAY close the connection after sending the closure alert, thus generating an incomplete close on the client side.

And also:

11.3. Message Integrity

...

Care is needed however to ensure that connection closure cannot be used to truncate messages (see Section 9.8). User agents might refuse to accept incomplete messages or treat them specially. For example, a browser being used to view medical history or drug interaction information needs to indicate to the user when such information is detected by the protocol to be incomplete, expired, or corrupted during transfer. Such mechanisms might be selectively enabled via user agent extensions or the presence of message integrity metadata in a response.
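Since a Gemini response carries no length field, the quoted rule ("complete only if a valid closure alert has been received") is the only completeness signal a client has. The following Python sketch shows one way a client might track it. The function name `read_response` is hypothetical, and the sketch assumes `conn` is an `ssl.SSLSocket` created with `suppress_ragged_eofs=False`, so that a close without close_notify surfaces as `ssl.SSLEOFError` rather than as a normal end of stream.

```python
import ssl


def read_response(conn, chunk_size: int = 4096):
    """Read until the peer closes; return (data, clean_close).

    Assumption: `conn` behaves like an ssl.SSLSocket wrapped with
    suppress_ragged_eofs=False, so a TCP close without a prior
    close_notify raises ssl.SSLEOFError instead of returning b"".
    """
    chunks = []
    try:
        while True:
            chunk = conn.recv(chunk_size)
            if not chunk:
                # Empty read: the peer sent close_notify before closing.
                return b"".join(chunks), True
            chunks.append(chunk)
    except ssl.SSLEOFError:
        # The connection ended without close_notify; data may be truncated.
        return b"".join(chunks), False
```

What to do with `clean_close == False` is a policy choice: given the statistics above, refusing such responses outright would break roughly half of the surveyed URLs, so a warning indicator may be the more practical option.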
Hello Alex,

I find using GitLab horrifically expedient; it would be nice not to be dependent on it.

I am currently working on creating a GemText-based issue tracker, leveraging git repos and a simplified directory structure. Hopefully, one day we can federate issue repos, using tools like grokmirror and gitolite, and not be dependent on one git forge in particular.

Things I intend to work on include proxies for http issue pages and kanban boards.

I'm a big fan of elinks (though I stopped when it languished, and I need to package the recent fork, felinks, which is developing Gemini compatibility). Should it get packaged on Guix (which I'd like to get around to) I will try that for a parsing environment. Perhaps people can federate GemText equivalents as part of an eLinks (et al) hook.

Jonathan McHugh
indieterminacy@libre.brussels

Alex // nytpu <alex@nytpu.com> writes:

> [[PGP Signed Part:Undecided]]
> Hi all,
>
> Since I don't have (and am unable to create) a gitlab account, I wrote a Gemlog post detailing my responses to a bunch of the issues on the gitlab repos for Sean Conner's spec revisions.
>
> Posting here to increase the likelihood that other relevant people will be able to see it.
>
> => //nytpu.com/gemlog/2021-10-10.gmi Available over Gemini and HTTP
>
> ~nytpu
On Mon, 11 Oct 2021 at 09:12, Omar Polo <op@omarpolo.com> wrote:

> 1. whitespace after gemtext elements
>
> I don't have strong opinion on this, but on the other hand I don't see a real motivation to require a space in your post nor in the gitlab discussion. Whitespaces should not be mandatory if not strictly required to separate fields (like in a link line) in my opinion at least. But yes, I do always write '# hello there' and not '#hello there'.

As someone who's making a basic gemini client, having the whitespace makes it a lot simpler: you can just split the line on the space and do a `switch` on the first part. Not having a space means you'd have to test if the line starts with different things, which would be very annoying and slower in most cases.

Having the whitespace is easier for clients, and also looks better. I see no downside to enforcing it in the spec (a SHOULD or MUST).

> Taking this in slightly OT direction: in what manner should client authors experiment with extensions in their clients? I know there isn't a reply, if the project is mine I can do the hell I want with it, and since most (all?) clients are free software I can take an existing one and modify the hell out of it, and I'm grateful for this.
>
> I know also the "don't extend gemini" mantra, and I repeat myself too.

Clients can do what the hell they like IMO, as long as the things they transmit over the net obey the spec. So gemtext rendering is pretty unlimited, but making protocol requests is strictly limited. Something like replacing `---` is entirely a client-side thing and affects no one but the reader.

The spec is a baseline for a minimum working thing; there's a reason a lot of it is "SHOULD"/"MAY" rather than "MUST".

-Oliver Simmons (GoodClover)
Oliver Simmons <oliversimmo@gmail.com> writes:

> On Mon, 11 Oct 2021 at 09:12, Omar Polo <op@omarpolo.com> wrote:
>> 1. whitespace after gemtext elements
>>
>> I don't have strong opinion on this, but on the other hand I don't see a real motivation to require a space in your post nor in the gitlab discussion. Whitespaces should not be mandatory if not strictly required to separate fields (like in a link line) in my opinion at least. But yes, I do always write '# hello there' and not '#hello there'.
>
> As someone who's making a basic gemini client, having the whitespace makes it a lot simpler, you can just split the line on the space and do a `switch` on the first part.
> Not having a space means you'd have to test if the line starts with different things, which would be very annoying and slower in most cases.
> Having the whitespace is easier for clients,

I've seen this argument in the gitlab issue too, but sorry, I don't believe it. In what language(s) is splitting a string faster than checking for a prefix? Splitting requires the allocation of multiple objects, while the prefix check only requires a scan of the first few bytes.

To be more precise: splitting on a space will always be slower than checking for a prefix, even if we ignore the cost of allocating the strings, because you first have to scan the string for the first space (which can be far into the line) and then pay the cost of comparing strings (i.e. another scan), while checking for a prefix only ever requires comparing the first few bytes.

Even if we eventually decide to mandate a whitespace, checking for a prefix would still lead to better and faster code.

> and also looks better.

I totally agree! It *absolutely* looks better, but I think we shouldn't account for aesthetics too much in the spec, as they tend to change from time to time and from one person to another.

> I see no downside to enforcing it in the spec (a SHOULD or MUST).
My argument is kind of the opposite: if there isn't a (strong) reason for requiring something, then that something MUST be optional. Whitespace is required in the link line to unambiguously separate the link from the label; the other whitespace in the "special" lines doesn't serve this purpose, so it needs to be completely optional.
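To make the two approaches in this exchange concrete, here is a small Python sketch of both. It is only an illustration: the function names and the returned type labels are hypothetical, and the prefix version deliberately treats whitespace after every marker as optional, which is the position argued above rather than something the spec is known to require.

```python
# Markers ordered longest-first so "###" is not mistaken for "#".
_PREFIXES = (
    ("```", "preformat-toggle"),
    ("###", "heading-3"),
    ("##", "heading-2"),
    ("#", "heading-1"),
    ("=>", "link"),
    ("*", "list-item"),
    (">", "quote"),
)


def line_type_by_prefix(line: str) -> str:
    """Classify a line by its prefix; no space after the marker is needed."""
    for prefix, kind in _PREFIXES:
        if line.startswith(prefix):
            return kind
    return "text"


def line_type_by_split(line: str) -> str:
    """Classify by the first space-separated token; relies on the space being there."""
    token = line.split(" ", 1)[0]
    return dict(_PREFIXES).get(token, "text")
```

The two functions agree on `# hello` but disagree on `#hello`: the split version sees the token `#hello`, finds no match, and falls back to a text line. That difference, not speed, is what a mandatory-whitespace rule would actually decide.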
>As someone who's making a basic gemini client, having the whitespace >makes it alot simpler, you can just split the line on the space and do >a `switch` on the first part. >Not having a space means you'd have to test if the line starts with >different things, which would be very annoying and slower in most >cases. Doesn't the spec say that line type indicators are only three characters maximum? It also implies that line type indicators should be the first thing on the line and that nothing should come before them (i.e. no whitespace before the indicator). That should mean that simply taking a three character substring of the line should be enough to determine whether it has a line type indicator and, if so, which type. That should be relatively easy and quick to parse as there's only about 5-6 different cases to handle.
Stephane Bortzmeyer <stephane@sources.org> writes:

> On Mon, Oct 11, 2021 at 08:57:59AM +0200,
> Omar Polo <op@omarpolo.com> wrote
> a message of 87 lines which said:
>
>> 3. close_notify
>>
>> Is it still a problem? :D
>
> Yes :-(
>
>> (Sometimes I've left dangling questions like this hoping for Bortzmeyer to chime in and share some stats. In the past it worked, hope he share some this time too ;-)
>
> "50.4 % of URLs do NOT send a proper TLS shutdown (application close). Even 36.8 % of those who return status 20 are in that case."

It's worse than I thought! Do we know what software these servers are using?

Thanks for chiming in and also for sharing the excerpt about close_notify :)

> The future RFC on HTTP (completely rewritten and reorganised) has a nice explanation:
>
> 9.8. TLS Connection Closure
>
> TLS uses an exchange of closure alerts prior to (non-error) connection closure to provide secure connection closure; see Section 6.1 of [TLS13]. When a valid closure alert is received, an implementation can be assured that no further data will be received on that connection.
>
> When an implementation knows that it has sent or received all the message data that it cares about, typically by detecting HTTP message boundaries, it might generate an "incomplete close" by sending a closure alert and then closing the connection without waiting to receive the corresponding closure alert from its peer.
>
> An incomplete close does not call into question the security of the data already received, but it could indicate that subsequent data might have been truncated. As TLS is not directly aware of HTTP message framing, it is necessary to examine the HTTP data itself to determine whether messages were complete. Handling of incomplete messages is defined in Section 8.
> > When encountering an incomplete close, a client SHOULD treat as > completed all requests for which it has received as much data as > specified in the Content-Length header or, when a Transfer-Encoding > of chunked is used, for which the terminal zero-length chunk has been > received. A response that has neither chunked transfer coding nor > Content-Length is complete only if a valid closure alert has been > received. Treating an incomplete message as complete could expose > implementations to attack. > > A client detecting an incomplete close SHOULD recover gracefully. > > Clients MUST send a closure alert before closing the connection. > Clients that do not expect to receive any more data MAY choose not to > wait for the server's closure alert and simply close the connection, > thus generating an incomplete close on the server side. > > Servers SHOULD be prepared to receive an incomplete close from the > client, since the client can often determine when the end of server > data is. > > Servers MUST attempt to initiate an exchange of closure alerts with > the client before closing the connection. Servers MAY close the > connection after sending the closure alert, thus generating an > incomplete close on the client side. > > And also: > > 11.3. Message Integrity > ... > Care is needed however to ensure that connection closure > cannot be used to truncate messages (see Section 9.8). User agents > might refuse to accept incomplete messages or treat them specially. > For example, a browser being used to view medical history or drug > interaction information needs to indicate to the user when such > information is detected by the protocol to be incomplete, expired, or > corrupted during transfer. Such mechanisms might be selectively > enabled via user agent extensions or the presence of message > integrity metadata in a response. >
On 11/10/2021 13:51, Oliver Simmons wrote:

> Clients can do what the hell they like IMO, as long as things that transmit over the net obey the spec.
> So gemtext is pretty unlimited, but making protocol requests is strictly limited.
> Something like replacing `---` is entirely a client-side thing and affects no one but the reader.

The current spec states:

Text lines should be presented to the user, after being wrapped to the appropriate width for the client's viewport (see below). Text lines may be presented to the user in a visually pleasing manner for general reading, the precise meaning of which is at the client's discretion. For example, variable width fonts may be used, spacing may be normalised, with spaces between sentences being made wider than spacing between words, and other such typographical niceties may be applied. Clients may permit users to customise the appearance of text lines by altering the font, font size, text and background colour, etc. Authors should not expect to exercise any control over the precise rendering of their text lines, only of their actual textual content.

This gives clients a broad discretion as to what visual modifications they make to text lines by altering font size, colours, spacing, etc. It doesn't appear to go as far as permitting clients to amend or replace the actual text that appears on a text line, and appears to suggest that authors should expect to exercise control over the precise rendering of their "actual textual content". (At least, my interpretation of the second last sentence is that clients may allow users to customise appearance of text lines by altering text colour, not text itself, though I appreciate it's slightly ambiguous.)

The problem I have with separators and similar visual niceties is that they involve deleting or replacing text that was put there by the author. What if an author didn't want to put a separator there, but really wanted to put "---"?
Unless the spec provides that "---" means a separator it is not reasonable to expect authors to know that. In truth I'm not sure in what circumstances a "---" text line would be intended as something other than a separator, but I'm sure other authors are more imaginative than I am. To take another example, I have regularly encountered situations where a single * in a markdown document is incorrectly interpreted as marking the beginning of italicised text, so the rest of the document is italicised inappropriately. I'd like for that not to become commonplace in Geminispace. Separately, on the whitespace issue, I do think it would be helpful to clarify in the spec whether whitespace is mandatory, particularly for headers. For example, should the line "#### Hello" be interpreted as (i) a level 3 header whose text is "# Hello", or (ii) a text line whose text is "#### Hello"? AFAIK that is ambiguous unless there is a clear stance on mandatory whitespace in the spec.
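One way a client could reconcile the two views on "---" is to make the substitution an opt-in display setting that never touches the source text and never applies inside preformatted blocks. The Python sketch below is purely illustrative: `render_lines`, the `fancy_rules` switch, and the choice of the box-drawing character are all assumptions, not anything the spec describes.

```python
def render_lines(lines, fancy_rules=False, width=40):
    """Yield display strings for gemtext source lines.

    With fancy_rules=False every visible line is shown exactly as written.
    """
    preformatted = False
    for line in lines:
        if line.startswith("```"):
            # Toggle lines switch preformatted mode and are not displayed.
            preformatted = not preformatted
            continue
        if fancy_rules and not preformatted and line == "---":
            # Display-only substitution; the source line itself is unchanged.
            yield "\u2500" * width
        else:
            yield line
```

With the switch off, an author who really meant three hyphens sees exactly three hyphens, which addresses the concern raised above while still letting a reader choose the nicety.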
On Mon, 11 Oct 2021 15:44:33 +0100, Alan Bunbury wrote:

> For example, should the line "#### Hello" be interpreted as (i) a level 3 header whose text is "# Hello", or (ii) a text line whose text is "#### Hello"? AFAIK that is ambiguous unless there is a clear stance on mandatory whitespace in the spec.

Considering 5.3 of the current spec, "#### Hello" is to be interpreted as (i), i.e. as a level-three header with content "# Hello", I guess:

> It is possible to unambiguously determine a line's type purely by inspecting its first three characters.

https://gemini.circumlunar.space/docs/specification.gmi
October 11, 2021 10:44 AM, "Alan Bunbury" <gemini@bunburya.eu> wrote: > In truth I'm not sure in what circumstances a "---" text line would be intended as something other > than a separator, but I'm sure other authors are more imaginative than I am. To take another > example, I have regularly encountered situations where a single * in a markdown document is > incorrectly interpreted as marking the beginning of italicised text, so the rest of the document is > italicised inappropriately. I'd like for that not to become commonplace in Geminispace. I fail to see how replacing a line that has only `---` on it with a graphical separator is anything like the runaway italics thing you mentioned. Still, I can kind of see where you're going with that. > Separately, on the whitespace issue, I do think it would be helpful to clarify in the spec whether > whitespace is mandatory, particularly for headers. For example, should the line "#### Hello" be > interpreted as (i) a level 3 header whose text is "# Hello", or (ii) a text line whose text is > "#### Hello"? AFAIK that is ambiguous unless there is a clear stance on mandatory whitespace in the > spec. That is not ambiguous, with or without mandatory whitespace. As Plain Text pointed out, the max amount of characters used to determine the linetype is the first 3, per 5.3 in the gemtext spec (awkwardly numbered because it was originally part of the protocol spec): > It is possible to unambiguously determine a line's type purely by inspecting its first three characters. Therefore, any (good) client will see that the first 3 characters of the line are "###" and correctly call it what it is: a level 3 header with the text "# Hello". I fail to see how that would be ambiguous (I guess the spec doesn't do *that* good of a job explaining it, but I would think you could catch on by the fact that the section on header lines only gives examples of #, ##, and ###). Just my two cents, Robert "khuxkm" Miles
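The reading given in the two replies above (at most three "#" characters count as the marker, anything after is heading text) can be written down in a few lines of Python. This is a sketch of that interpretation only: `parse_heading` is a hypothetical name, and stripping surrounding whitespace from the heading text is an assumption on top of what the thread states.

```python
def parse_heading(line: str):
    """Return (level, text) for a heading line, or None for any other line."""
    if not line.startswith("#"):
        return None
    # Count leading "#" characters, but never more than three.
    level = 0
    while level < 3 and level < len(line) and line[level] == "#":
        level += 1
    # Everything after the marker is heading text, including any further "#".
    return level, line[level:].strip()
```

Under this reading `parse_heading("#### Hello")` yields level 3 with the text `# Hello`, with or without a mandatory space after the marker.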
On Mon, 11 Oct 2021 at 15:12, Omar Polo <op@omarpolo.com> wrote:

> I've seen this argument in the gitlab issue too […]

I haven't checked the issue yet, will do after sending this.

> In what language(s) splitting a string is faster than checking for a prefix? Splitting requires the allocation of multiple objects, while the prefix only requires a scan of the first few bytes.

I said simpler, not faster. What you said is true in some cases, but not everyone is striving for optimisation speed-wise.

It'll depend on the language used, but splitting allows you to use a simple equality switch statement, which isn't possible by checking with a prefix. The way I understand your message, I would have to use an else-if list, which is hardly ideal.

e.g. in C#:

```
// If it's <3 chars then just treat it as a text line (the default).
switch ((line.Length < 3) ? "" : line.Substring(0, 3).Split(" ", 2)[0])
{
    case "=>": …
    // After splitting on the space, the list marker is just "*".
    case "*": …
    … and so on …
    default: …
}
```

vs

```
if (line.StartsWith("=> ")) { … }
else if (line.StartsWith("* ")) { … }
… and so on …
else { … }
```

At the least, it should be required for link (as you said) and list lines ("* "). I've seen where people have tried to use *emphasis* at the start of a line and got a bullet point by mistake.

> > I see no downside to enforcing it in the spec (a SHOULD or MUST).
>
> My argument is kind the opposite: if there isn't a (strong) reason for requiring something, then that something MUST be optional. Whitespaces are required in the link line to separate unambiguously the link from the label, the other whitespaces in the "special" lines don't serve this purpose so they need to be completely optional.

At the very least it should be recommended by the spec IMO.

-Oliver Simmons (GoodClover)
On Mon, 11 Oct 2021 at 15:07, Chris McGowan <cmcgowan9990@gmail.com> wrote: > Doesn't the spec say that line type indicators are only three characters maximum? > It also implies that line type indicators should be the first thing on the line and that nothing should come before them (i.e. no whitespace before the indicator). Yes and yup, we're talking about the space after the indicator. > That should mean that simply taking a three character substring of the line should be enough to determine whether it has a line type indicator and, if so, which type. That should be relatively easy and quick to parse as there's only about 5-6 different cases to handle. Unfortunately, no. For example, take this line: `# Foo bar I'm a level-1 title` A 3-char substring of that would yield "# F", which isn't useful. It would work if the spec required (MUST) you to add whitespace padding the indicator to three characters, but that's not how it is. To determine the line-type you have to do a starts-with check or split on the space like me and Omar are saying. -Oliver Simmons (GoodClover)
> Unfortunately, no. For example, take this line: `# Foo bar I'm a level-1 title`
> A 3-char substring of that would yield "# F", which isn't useful.

In what way isn't it useful? It tells you literally everything you need to know. An example (in Perl):

```
my $first3 = substr $line, 0, 3;
# every pattern is anchored with ^ so that a marker only counts
# at the very start of the line
if ( $first3 =~ m/^(#+)/ )
{
    # the captured run of '#' (at most 3 here) gives the level
    my $level = length $1;
    return "Level $level header";
}
elsif ( $first3 =~ m/^=>/ )
{
    return "Link";
}
elsif ( $first3 =~ m/^```/ )
{
    return "preformatted";
}
elsif ( $first3 =~ m/^\*/ )
{
    return "list item";
}
elsif ( $first3 =~ m/^>/ )
{
    return 'Blockquote';
}
else
{
    return "Text";
}
```

That's a simplified, very naive gemtext parser I wrote in my email client in about 3 minutes. It took longer to remember all of the line types than it did to write the code for them. In fact, the substring isn't even necessary in this code, as I could anchor the regex at the start of the line like so:

```
if ( $line =~ m/^\*/ )
{
    return "list item";
}
```

but that's largely only true for languages which have decent regex support. If you weren't using one of those (i.e. C) or are for some reason allergic to regexes, you could simply index the string to determine the line type (note: this would likely improve speed, but probably only by an imperceptibly small amount, and likely wouldn't be worth it.)

Just to really drive home the point that this isn't a difficult task, here's the version I wouldn't write unless I was using C (still in Perl though):

```
# Note: split here is because perl doesn't allow direct subscripting of
# strings. In languages that do allow that, this other array is
# unnecessary and you could use $line directly.
my @first3 = split( "", substr( $line, 0, 3));
if ( $first3[0] eq '#' )
{
    if ( $first3[1] eq '#' )
    {
        if ( $first3[2] eq '#' )
        {
            return "Level 3 header";
        }
        return "Level 2 header";
    }
    return "Level 1 header";
}
elsif ( $first3[0] eq '=' && $first3[1] eq '>' )
{
    return "link";
}
elsif ( $first3[0] eq '*' )
{
    return 'List Item';
}
elsif ($first3[0] eq '>' )
{
    return "Blockquote";
}
elsif( $first3[0] eq '`' && $first3[1] eq '`' && $first3[2] eq '`' )
{
    return "preformatted";
}
else
{
    return "Text";
}
```

It's a bit more annoying to write, sure, but it's still really simple. That's ~33 lines of code (mostly because of the Allman style braces, honestly.) It only took me 5 minutes to write.

In summary, I hardly think it's impossible or even difficult to unambiguously parse gemtext without having a mandatory space.
Oliver Simmons <oliversimmo@gmail.com> writes:

> On Mon, 11 Oct 2021 at 15:12, Omar Polo <op@omarpolo.com> wrote:
>> I've seen this argument in the gitlab issue too […]
>
> I haven't checked the issue yet, will do after sending this.
>
>> In what language(s) splitting a string is faster than checking for a prefix? Splitting requires the allocation of multiple objects, while the prefix only requires a scan of the first few bytes.
>
> I said simpler, not faster. What you said is true in some cases, but not everyone is striving for optimisation speed-wise.

You didn't say "faster", true, but you said that (emphasis mine)

> Not having a space means you'd have to test if the line starts with different things, which would be very annoying and **slower** in most cases.

I was contradicting that.

> It'll depend on the language used, but splitting allows you to use a simple equality switch statement, which isn't possible by checking with a prefix.

(btw, a switch on strings isn't available everywhere: C can't switch on a string at all, and Java only allows it since version 7. And where you fall back to a plain == on strings, as in C, it compiles, but it's not the equality you mean ;-)

> The way I understand your message, I would have to use an else-if list, which is hardly ideal.

This depends on the language design. Some languages allow expressions inside switches, like Go IIRC, so you could write

    switch {
    case strings.HasPrefix(line, "*"):
        // ...
    case strings.HasPrefix(line, "###"):
        // ...
    ...
    }

others allow you to do more elaborate things (clojure for example)

    (defn has-prefix? [prefix str]
      (str/starts-with? str prefix))

    (condp has-prefix? line
      "*"   :item
      "=>"  :link
      "###" :header-3
      ,,,)

Even when we take into account an ancient language like C, you can take advantage of the fact that the first byte of a line is enough to get an idea of its type and greatly reduce the number of chained ifs: (this is more or less what I have in telescope)

    switch (*line) {
    case '*':
        return LINE_ITEM;
    case '>':
        return LINE_QUOTE;
    case '=':
        if (line[1] == '>')
            return LINE_LINK;
        break;
    case '#':
        /* some ifs to check whether is a level 1, 2 or 3 */
        ...
    case '`':
        /* check for a ``` marker */
        ...
    }
    return LINE_TEXT;

I don't think taking into account the particularities of one specific programming language is a wise choice for a markup language meant to be written by humans for humans.

The question should thus become: is it intuitive for a random user that

    #hello world

and

    # hello world

are effectively the same line? Let's forget the code when tackling these issues; we think better when we're not in front of a keyboard.

> [...]
>
> At the least, it should be required for link (as you said) and list lines ("* ").

Probably I was too ambiguous. My point was that in a link line a space is necessary between the link and the label, not after the marker. So, outside of the mandatory space to separate a link and its label, whitespace is irrelevant.

> I've seen where people have tried to use *emphasis* at the start of a line and got a bullet point by mistake.

I've seen people writing like that, and a conforming client (IMHO) should consider those lines list items. It's like using

    => something like this <=

to highlight text and then complaining that a client mis-renders a line because the author tried to "highlight" the first words and now it's a link. Who cares? Gemini doesn't have inline formatting, so why bother trying to support it?

(I've used some *emphasis* on some pages too, but the more I write, the more I think I shouldn't: it's easier to read without too much noise. That's my opinion, at least.)
Anyway, whatever the final decision will be, I hope we could at least ensure that all the clients are consistent in their rendering. >> > I see no downside to enforcing it in the spec (a SHOULD or MUST). >> >> My argument is kind the opposite: if there isn't a (strong) reason for >> requiring something, then that something MUST be optional. Whitespaces >> are required in the link line to separate unambiguously the link from >> the label, the other whitespaces in the "special" lines don't serve this >> purpose so they need to be completely optional. > > At the very least it should be recommended by the spec IMO. > > -Oliver Simmons (GoodClover)
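The distinction drawn above, that the only space a link line really needs is the one between the URL and the label, can be sketched in Python as follows. `parse_link` is a hypothetical helper written for illustration; it accepts any run of whitespace as the separator and treats whitespace after "=>" as optional.

```python
def parse_link(line: str):
    """Split a link line into (url, label); label is None when absent."""
    if not line.startswith("=>"):
        return None
    # Whitespace right after the marker is optional, so just strip it.
    rest = line[2:].strip()
    if not rest:
        return None  # "=>" with no URL is not a usable link
    # The first run of whitespace separates the URL from the label.
    parts = rest.split(None, 1)
    url = parts[0]
    label = parts[1] if len(parts) > 1 else None
    return url, label
```

Both `=> gemini://example.org/ An example` and `=>gemini://example.org/ An example` parse to the same URL and label here, while removing the space before the label would change the URL itself, which is why that one space cannot be optional.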
On Mon, 11 Oct 2021 15:15:44 +0000, Plain Text wrote:

> https://gemini.circumlunar.space/docs/specification.gmi

My attempt at identifying line types using Python re named groups, which became one quite unreadable line; it is also missing ```, sorry.

line.py

import re, sys
for line in sys.stdin:
    m = re.match(r'((?P<heads>(?P<h3>###)|(?P<h2>##)|(?P<h1>#))|(?P<list>\* )|(?P<link>=> (?P<url>[^\s]+))|(?P<quote>>))\s*(?P<content>.*)