[spec] comments on the proposed gemini spec revisions
- Messages: 50
- Authors: 24
- First Message: 2021-10-10 23:13
- Last Message: 2021-10-28 18:20
1. Alex // nytpu (alex (a) nytpu.com)
- Sent: 2021-10-10 23:13
- Message 1 of 50
Hi all,
Since I don't have (and am unable to create) a gitlab account, I wrote a
Gemlog post detailing my responses to a bunch of the issues on the
gitlab repos for Sean Conner's spec revisions.
Posting here to increase the likelihood that other relevant people will
be able to see it.
=> //nytpu.com/gemlog/2021-10-10.gmi Available over Gemini and HTTP
~nytpu
--
Alex // nytpu
alex@nytpu.com
gpg --locate-external-key alex@nytpu.com
https://useplaintext.email/
2. Omar Polo (op (a) omarpolo.com)
- Subject Changed! New Subject: Re: [spec] comments on the proposed gemini spec revisions
- Sent: 2021-10-11 06:57
- Message 2 of 50
Alex // nytpu <alex@nytpu.com> writes:
> [[PGP Signed Part:Undecided]]
> Hi all,
>
> Since I don't have (and am unable to create) a gitlab account, I wrote a
> Gemlog post detailing my responses to a bunch of the issues on the
> gitlab repos for Sean Conner's spec revisions.
I can wholeheartedly agree with the gitlab rant. I've never used it
before and was quite shocked at how bad it is. Even github is "decent"
in this regard, on a technical level at least. I can at least *read* a
README, the code or the issues with w3m.
But that ship has sailed. We can only try to prevent similar moves in the
future.
> Posting here to increase the likelihood that other relevant people will
> be able to see it.
>
> => //nytpu.com/gemlog/2021-10-10.gmi Available over Gemini and HTTP
I'm not sure if this is the best place to reply to your post, but the
alternative would be to open gitlab, post your link under the mentioned
issues and reply there, which... I don't really want to do. If I can
avoid opening gitlab, all the better ;-)
1. whitespace after gemtext elements
I don't have a strong opinion on this, but on the other hand I don't see a
real motivation to require a space in your post nor in the gitlab
discussion. Whitespaces should not be mandatory if not strictly
required to separate fields (like in a link line) in my opinion at
least. But yes, I do always write '# hello there' and not '#hello
there'.
2. BOM
> If you're making something for non-tech people to use and they use bad
> editors that include a BOM, it should be your responsibility to remove
> it before publishing the document.
I'm not sure this would be viable. If you look at the original report
from Gnuserland you'd see a confused user that doesn't know what a BOM
is or how to deal with it. He simply typed something in his preferred
text editor (which is mis-configured btw; why someone on unix would
force CRLF line endings is beyond my understanding), published and it
was slightly broken.
Declaring it out-of-scope for the protocol but reminding client authors
in the best practice document that bad documents may have a BOM seems the
most sensible solution to me.
I even thought about adding some kind of "feedback" to the user on how
the page is structured. Say some kind of linter for things like hard
wrapping, BOM, etc. It may become annoying though.
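A minimal sketch of what such tolerant handling and feedback could look like on the client or publishing side. The function names `strip_bom` and `lint_gemtext` are hypothetical, and the checks cover only the BOM and CRLF cases mentioned above; this is one possible approach, not anything the spec prescribes.

```python
import codecs


def strip_bom(raw):
    """Drop a leading UTF-8 byte order mark from raw bytes, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def lint_gemtext(raw):
    """Return a list of human-readable warnings about a gemtext document."""
    warnings = []
    if raw.startswith(codecs.BOM_UTF8):
        warnings.append("document starts with a UTF-8 BOM")
    if b"\r\n" in raw:
        warnings.append("document uses CRLF line endings")
    return warnings
```

A client could call `strip_bom` before parsing and show the `lint_gemtext` warnings only on request, so that the feedback does not become annoying.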
3. close_notify
Is it still a problem? :D
(Sometimes I've left dangling questions like this hoping for Bortzmeyer
to chime in and share some stats. In the past it worked, I hope he shares
some this time too ;-)
4. dumb new feature proposals
I just love reading them ;)
Taking this in a slightly OT direction: in what manner should client
authors experiment with extensions in their clients? I know there isn't
a single answer: if the project is mine I can do whatever the hell I want
with it, and since most (all?) clients are free software I can take an
existing one and modify the hell out of it, and I'm grateful for this.
I know also the "don't extend gemini" mantra, and I repeat it myself too.
But does improving how a document is rendered count as extending the
protocol? If I, say, replace the "---" lines with a nice separator,
does it count as extending gemini or just a rendering nicety of the
client?
(multi-level lists gravitate too much toward the extension side I
guess, but who cares)
> ~nytpu
3. Stephane Bortzmeyer (stephane (a) sources.org)
- Sent: 2021-10-11 08:56
- Message 3 of 50
On Mon, Oct 11, 2021 at 08:57:59AM +0200,
Omar Polo <op@omarpolo.com> wrote
a message of 87 lines which said:
> 3. close_notify
>
> Is it still a problem? :D
Yes :-(
> (Sometimes I've left dangling questions like this hoping for Bortzmeyer
> to chime in and share some stats. In the past it worked, hope he share
> some this time too ;-)
"50.4 % of URLs do NOT send a proper TLS shutdown (application
close). Even 36.8 % of those who return status 20 are in that case."
The future RFC on HTTP (completely rewritten and reorganised) has a
nice explanation:
9.8. TLS Connection Closure
TLS uses an exchange of closure alerts prior to (non-error)
connection closure to provide secure connection closure; see
Section 6.1 of [TLS13]. When a valid closure alert is received, an
implementation can be assured that no further data will be received
on that connection.
When an implementation knows that it has sent or received all the
message data that it cares about, typically by detecting HTTP message
boundaries, it might generate an "incomplete close" by sending a
closure alert and then closing the connection without waiting to
receive the corresponding closure alert from its peer.
An incomplete close does not call into question the security of the
data already received, but it could indicate that subsequent data
might have been truncated. As TLS is not directly aware of HTTP
message framing, it is necessary to examine the HTTP data itself to
determine whether messages were complete. Handling of incomplete
messages is defined in Section 8.
When encountering an incomplete close, a client SHOULD treat as
completed all requests for which it has received as much data as
specified in the Content-Length header or, when a Transfer-Encoding
of chunked is used, for which the terminal zero-length chunk has been
received. A response that has neither chunked transfer coding nor
Content-Length is complete only if a valid closure alert has been
received. Treating an incomplete message as complete could expose
implementations to attack.
A client detecting an incomplete close SHOULD recover gracefully.
Clients MUST send a closure alert before closing the connection.
Clients that do not expect to receive any more data MAY choose not to
wait for the server's closure alert and simply close the connection,
thus generating an incomplete close on the server side.
Servers SHOULD be prepared to receive an incomplete close from the
client, since the client can often determine when the end of server
data is.
Servers MUST attempt to initiate an exchange of closure alerts with
the client before closing the connection. Servers MAY close the
connection after sending the closure alert, thus generating an
incomplete close on the client side.
And also:
11.3. Message Integrity
...
Care is needed however to ensure that connection closure
cannot be used to truncate messages (see Section 9.8). User agents
might refuse to accept incomplete messages or treat them specially.
For example, a browser being used to view medical history or drug
interaction information needs to indicate to the user when such
information is detected by the protocol to be incomplete, expired, or
corrupted during transfer. Such mechanisms might be selectively
enabled via user agent extensions or the presence of message
integrity metadata in a response.
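As an illustration of how a client might notice a missing close_notify, here is a rough Python sketch using the standard `ssl` module. The function name `fetch_gemini` is hypothetical, certificate validation is switched off purely to keep the example short (an assumption, not a recommendation), and the request format is the usual URL followed by CRLF.

```python
import socket
import ssl


def fetch_gemini(host, path="/", port=1965, timeout=10.0):
    """Fetch a Gemini URL; return (raw_response, closed_cleanly)."""
    # Assumption: certificate checks are disabled to keep the sketch short.
    # A real client would do TOFU or CA validation here.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    chunks = []
    closed_cleanly = True
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # suppress_ragged_eofs=False makes an EOF without close_notify
        # raise ssl.SSLEOFError instead of looking like a normal end of data.
        with ctx.wrap_socket(raw, server_hostname=host,
                             suppress_ragged_eofs=False) as tls:
            tls.sendall(f"gemini://{host}{path}\r\n".encode("utf-8"))
            try:
                while True:
                    data = tls.recv(4096)
                    if not data:  # orderly shutdown: close_notify received
                        break
                    chunks.append(data)
            except ssl.SSLEOFError:
                closed_cleanly = False
    return b"".join(chunks), closed_cleanly
```

Since a Gemini response carries no length information, a client along these lines could flag a response with `closed_cleanly == False` as possibly truncated rather than silently accepting it.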
4. (indieterminacy (a) libre.brussels)
- Sent: 2021-10-11 09:51
- Message 4 of 50
Hello Alex,
I find using GitLab horrifically expedient; it would be nice to not be
dependent on it.
I am currently working on creating a GemText based issue tracker,
leveraging git repos and a simplified directory structure.
Hopefully, one day we can federate issue repos, using tools like
grokmirror and gitolite. And not be dependent on one gitforge in
particular.
Things I intend to work on include proxies for http issue pages and
kanban boards.
I'm a big fan of elinks (though I stopped when it languished and need to
package the recent fork, felinks, which is developing Gemini
compatibility). Should it get packaged on Guix (which I'd like to get
around to) I will try that for a parsing environment. Perhaps people can
federate GemText equivalents as part of an eLinks (et al) hook.
Jonathan McHugh
indieterminacy@libre.brussels
Alex // nytpu <alex@nytpu.com> writes:
> [[PGP Signed Part:Undecided]]
> Hi all,
>
> Since I don't have (and am unable to create) a gitlab account, I wrote a
> Gemlog post detailing my responses to a bunch of the issues on the
> gitlab repos for Sean Conner's spec revisions.
>
> Posting here to increase the likelihood that other relevant people will
> be able to see it.
>
> => //nytpu.com/gemlog/2021-10-10.gmi Available over Gemini and HTTP
>
> ~nytpu
5. Oliver Simmons (oliversimmo (a) gmail.com)
- Sent: 2021-10-11 12:51
- Message 5 of 50
On Mon, 11 Oct 2021 at 09:12, Omar Polo <op@omarpolo.com> wrote:
> 1. whitespace after gemtext elements
>
> I don't have strong opinion on this, but on the other hand I don't see a
> real motivation to require a space in your post nor in the gitlab
> discussion. Whitespaces should not be mandatory if not strictly
> required to separate fields (like in a link line) in my opinion at
> least. But yes, I do always write '# hello there' and not '#hello
> there'.
As someone who's making a basic gemini client, having the whitespace
makes it a lot simpler, you can just split the line on the space and do
a `switch` on the first part.
Not having a space means you'd have to test if the line starts with
different things, which would be very annoying and slower in most
cases.
Having the whitespace is easier for clients, and also looks better.
I see no downside to enforcing it in the spec (a SHOULD or MUST).
> Taking this in slightly OT direction: in what manner should client
> authors experiment with extensions in their clients? I know there isn't
> a reply, if the project is mine I can do the hell I want with it, and
> since most (all?) clients are free software I can take an existing one
> and modify the hell out of it, and I'm grateful for this.
>
> I know also the "don't extend gemini" mantra, and I repeat myself too.
Clients can do what the hell they like IMO, as long as things that
transmit over the net obey the spec.
So gemtext is pretty unlimited, but making protocol requests is
strictly limited.
Something like replacing `---` is entirely a client-side thing and
affects no one but the reader.
The spec is a baseline for a minimum working thing, there's a reason
a lot of it is "SHOULD"/"MAY" rather than "MUST".
-Oliver Simmons (GoodClover)
6. Omar Polo (op (a) omarpolo.com)
- Sent: 2021-10-11 13:29
- Message 6 of 50
Oliver Simmons <oliversimmo@gmail.com> writes:
> On Mon, 11 Oct 2021 at 09:12, Omar Polo <op@omarpolo.com> wrote:
>> 1. whitespace after gemtext elements
>>
>> I don't have strong opinion on this, but on the other hand I don't see a
>> real motivation to require a space in your post nor in the gitlab
>> discussion. Whitespaces should not be mandatory if not strictly
>> required to separate fields (like in a link line) in my opinion at
>> least. But yes, I do always write '# hello there' and not '#hello
>> there'.
>
> As someone who's making a basic gemini client, having the whitespace
> makes it a lot simpler, you can just split the line on the space and do
> a `switch` on the first part.
> Not having a space means you'd have to test if the line starts with
> different things, which would be very annoying and slower in most
> cases.
> Having the whitespace is easier for clients,
I've seen this argument in the gitlab issue too, but sorry, I don't
believe it. In what language(s) is splitting a string faster than
checking for a prefix? Splitting requires the allocation of multiple
objects, while the prefix only requires a scan of the first few bytes.
To be more precise: splitting on a space will always be slower than
checking for a prefix even if we ignore the cost of allocating the
strings, because you'd first have to scan the string for the first space
(which can be far into the line) and then pay the cost of comparing strings
(i.e. another scan), while checking for a prefix only ever requires
comparing the first few bytes.
Even if we eventually decide to mandate a whitespace, checking for a
prefix would still lead to better and faster code.
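To make the comparison concrete, here is a small Python sketch of the two approaches for list items. The helper names are hypothetical; the point is only that the split version allocates new strings and depends on the space being present, while the prefix version looks at the first byte and works either way.

```python
def is_list_item_by_split(line):
    # Scans up to the first space and allocates a list plus new strings,
    # then compares the first field against the indicator.
    return line.split(" ", 1)[0] == "*"


def is_list_item_by_prefix(line):
    # Only inspects the beginning of the line; no allocation needed.
    return line.startswith("*")
```

With these definitions, `"* item"` is recognised by both functions, whereas `"*item"` is recognised only by the prefix check, which is exactly the case the mandatory-whitespace question is about.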
> and also looks better.
I totally agree! It *absolutely* looks better, but I think we shouldn't
account for aesthetics too much in the spec, as they tend to change from
time to time and from one person to another.
> I see no downside to enforcing it in the spec (a SHOULD or MUST).
My argument is kind of the opposite: if there isn't a (strong) reason for
requiring something, then that something MUST be optional. Whitespaces
are required in the link line to separate unambiguously the link from
the label, the other whitespaces in the "special" lines don't serve this
purpose so they need to be completely optional.
7. Chris McGowan (cmcgowan9990 (a) gmail.com)
- Sent: 2021-10-11 13:44
- Message 7 of 50
>As someone who's making a basic gemini client, having the whitespace
>makes it a lot simpler, you can just split the line on the space and do
>a `switch` on the first part.
>Not having a space means you'd have to test if the line starts with
>different things, which would be very annoying and slower in most
>cases.
Doesn't the spec say that line type indicators are only three characters
maximum? It also implies that line type indicators should be the first
thing on the line and that nothing should come before them (i.e. no
whitespace before the indicator).
That should mean that simply taking a three character substring of the
line should be enough to determine whether it has a line type indicator
and, if so, which type. That should be relatively easy and quick to parse
as there's only about 5-6 different cases to handle.
8. Omar Polo (op (a) omarpolo.com)
- Sent: 2021-10-11 14:13
- Message 8 of 50
Stephane Bortzmeyer <stephane@sources.org> writes:
> On Mon, Oct 11, 2021 at 08:57:59AM +0200,
> Omar Polo <op@omarpolo.com> wrote
> a message of 87 lines which said:
>
>> 3. close_notify
>>
>> Is it still a problem? :D
>
> Yes :-(
>
>> (Sometimes I've left dangling questions like this hoping for Bortzmeyer
>> to chime in and share some stats. In the past it worked, hope he share
>> some this time too ;-)
>
> "50.4 % of URLs do NOT send a proper TLS shutdown (application
> close). Even 36.8 % of those who return status 20 are in that case."
It's worse than I thought! Do we know what software these servers are
using?
Thanks for chiming in and also for sharing the excerpt about
close_notify :)
> The future RFC on HTTP (completely rewritten and reorganised) has a
> nice explanation:
>
> 9.8. TLS Connection Closure
>
> TLS uses an exchange of closure alerts prior to (non-error)
> connection closure to provide secure connection closure; see
> Section 6.1 of [TLS13]. When a valid closure alert is received, an
> implementation can be assured that no further data will be received
> on that connection.
>
> When an implementation knows that it has sent or received all the
> message data that it cares about, typically by detecting HTTP message
> boundaries, it might generate an "incomplete close" by sending a
> closure alert and then closing the connection without waiting to
> receive the corresponding closure alert from its peer.
>
> An incomplete close does not call into question the security of the
> data already received, but it could indicate that subsequent data
> might have been truncated. As TLS is not directly aware of HTTP
> message framing, it is necessary to examine the HTTP data itself to
> determine whether messages were complete. Handling of incomplete
> messages is defined in Section 8.
>
> When encountering an incomplete close, a client SHOULD treat as
> completed all requests for which it has received as much data as
> specified in the Content-Length header or, when a Transfer-Encoding
> of chunked is used, for which the terminal zero-length chunk has been
> received. A response that has neither chunked transfer coding nor
> Content-Length is complete only if a valid closure alert has been
> received. Treating an incomplete message as complete could expose
> implementations to attack.
>
> A client detecting an incomplete close SHOULD recover gracefully.
>
> Clients MUST send a closure alert before closing the connection.
> Clients that do not expect to receive any more data MAY choose not to
> wait for the server's closure alert and simply close the connection,
> thus generating an incomplete close on the server side.
>
> Servers SHOULD be prepared to receive an incomplete close from the
> client, since the client can often determine when the end of server
> data is.
>
> Servers MUST attempt to initiate an exchange of closure alerts with
> the client before closing the connection. Servers MAY close the
> connection after sending the closure alert, thus generating an
> incomplete close on the client side.
>
> And also:
>
> 11.3. Message Integrity
> ...
> Care is needed however to ensure that connection closure
> cannot be used to truncate messages (see Section 9.8). User agents
> might refuse to accept incomplete messages or treat them specially.
> For example, a browser being used to view medical history or drug
> interaction information needs to indicate to the user when such
> information is detected by the protocol to be incomplete, expired, or
> corrupted during transfer. Such mechanisms might be selectively
> enabled via user agent extensions or the presence of message
> integrity metadata in a response.
>
9. Alan Bunbury (gemini (a) bunburya.eu)
- Sent: 2021-10-11 14:44
- Message 9 of 50
On 11/10/2021 13:51, Oliver Simmons wrote:
> Clients can do what the hell they like IMO, as long as things that
> transmit over the net obey the spec.
> So gemtext is pretty unlimited, but making protocol requests is
> strictly limited.
> Something like replacing `---` is entirely a client-side thing and
> affects no one but the reader.
The current spec states:
Text lines should be presented to the user, after being wrapped to the
appropriate width for the client's viewport (see below). Text lines may
be presented to the user in a visually pleasing manner for general
reading, the precise meaning of which is at the client's discretion.
For example, variable width fonts may be used, spacing may be
normalised, with spaces between sentences being made wider than spacing
between words, and other such typographical niceties may be applied.
Clients may permit users to customise the appearance of text lines by
altering the font, font size, text and background colour, etc. Authors
should not expect to exercise any control over the precise rendering of
their text lines, only of their actual textual content.
This gives clients a broad discretion as to what visual modifications they
make to text lines by altering font size, colours, spacing, etc. It
doesn't appear to go as far as permitting clients to amend or replace the
actual text that appears on a text line, and appears to suggest that
authors should expect to exercise control over the precise rendering of
their "actual textual content". (At least, my interpretation of the second
last sentence is that clients may allow users to customise appearance of
text lines by altering text colour, not text itself, though I appreciate
it's slightly ambiguous.)
The problem I have with separators and similar visual niceties is that
they involve deleting or replacing text that was put there by the author.
What if an author didn't want to put a separator there, but really wanted
to put "---"? Unless the spec provides that "---" means a separator it is
not reasonable to expect authors to know that.
In truth I'm not sure in what circumstances a "---" text line would be
intended as something other than a separator, but I'm sure other authors
are more imaginative than I am. To take another example, I have regularly
encountered situations where a single * in a markdown document is
incorrectly interpreted as marking the beginning of italicised text, so
the rest of the document is italicised inappropriately. I'd like for that
not to become commonplace in Geminispace.
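For illustration, a cautious client-side treatment of separators might look like the Python sketch below. The function name `render_text_line` and the `pretty_separators` switch are hypothetical; the sketch only touches lines that consist of nothing but `---` and lets the user turn the behaviour off, so that the author's text is otherwise shown untouched.

```python
def render_text_line(line, width=40, pretty_separators=True):
    """Return the string a client would display for a plain text line."""
    # Only a line that is exactly "---" (ignoring surrounding whitespace)
    # is drawn as a rule; anything else is passed through unchanged.
    if pretty_separators and line.strip() == "---":
        return "\u2500" * width
    return line
```

A line such as `--- not a rule` is left alone, and with `pretty_separators=False` even a bare `---` is displayed literally.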
Separately, on the whitespace issue, I do think it would be helpful to
clarify in the spec whether whitespace is mandatory, particularly for
headers. For example, should the line "#### Hello" be interpreted as (i) a
level 3 header whose text is "# Hello", or (ii) a text line whose text is
"#### Hello"? AFAIK that is ambiguous unless there is a clear stance on
mandatory whitespace in the spec.
10. Plain Text (text (a) sdfeu.org)
- Subject Changed! New Subject: Re: [spec] [whitespace]
- Sent: 2021-10-11 15:15
- Message 10 of 50
On Mon, 11 Oct 2021 15:44:33 +0100, Alan Bunbury wrote:
> For example, should the line "#### Hello" be interpreted as (i)
> a level 3 header whose text is "# Hello", or (ii) a text line whose text
> is "#### Hello"? AFAIK that is ambiguous unless there is a clear stance
> on mandatory whitespace in the spec.
Considering 5.3 of the current spec, "#### Hello" is to be interpreted as
(i), i. e. as a level-three-header with content "# Hello", I guess:
> It is possible to unambiguously determine a line's type purely by
> inspecting its first three characters.
https://gemini.circumlunar.space/docs/specification.gmi
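Under that reading, a heading parser never needs to look past the third character. A minimal Python sketch follows; the function name `parse_heading` is hypothetical, and stripping the whitespace around the heading text is an assumption rather than something the quoted sentence states.

```python
def parse_heading(line):
    """Return (level, text) for a heading line, or None for anything else."""
    if not line.startswith("#"):
        return None
    # Count leading '#' characters, but never look past the third one.
    first3 = line[:3]
    level = len(first3) - len(first3.lstrip("#"))
    return level, line[level:].strip()
```

With this sketch, `parse_heading("#### Hello")` gives level 3 with the text `# Hello`, and `#Hello` and `# Hello` both come out as level 1 with the text `Hello`.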
11. Robert "khuxkm" Miles (khuxkm (a) tilde.team)
- Subject Changed! New Subject: Re: [spec] comments on the proposed gemini spec revisions
- Sent: 2021-10-12 20:27
- Message 11 of 50
October 11, 2021 10:44 AM, "Alan Bunbury" <gemini@bunburya.eu> wrote:
> In truth I'm not sure in what circumstances a "---" text line would be
> intended as something other than a separator, but I'm sure other authors
> are more imaginative than I am. To take another example, I have regularly
> encountered situations where a single * in a markdown document is
> incorrectly interpreted as marking the beginning of italicised text, so
> the rest of the document is italicised inappropriately. I'd like for that
> not to become commonplace in Geminispace.
I fail to see how replacing a line that has only `---` on it with a
graphical separator is anything like the runaway italics thing you
mentioned. Still, I can kind of see where you're going with that.
> Separately, on the whitespace issue, I do think it would be helpful to
> clarify in the spec whether whitespace is mandatory, particularly for
> headers. For example, should the line "#### Hello" be interpreted as (i) a
> level 3 header whose text is "# Hello", or (ii) a text line whose text is
> "#### Hello"? AFAIK that is ambiguous unless there is a clear stance on
> mandatory whitespace in the spec.
That is not ambiguous, with or without mandatory whitespace. As Plain Text
pointed out, the max amount of characters used to determine the linetype
is the first 3, per 5.3 in the gemtext spec (awkwardly numbered because it
was originally part of the protocol spec):
> It is possible to unambiguously determine a line's type purely by
> inspecting its first three characters.
Therefore, any (good) client will see that the first 3 characters of the
line are "###" and correctly call it what it is: a level 3 header with the
text "# Hello". I fail to see how that would be ambiguous (I guess the
spec doesn't do *that* good of a job explaining it, but I would think you
could catch on by the fact that the section on header lines only gives
examples of #, ##, and ###).
Just my two cents,
Robert "khuxkm" Miles
12. Oliver Simmons (oliversimmo (a) gmail.com)
- Sent: 2021-10-13 18:54
- Message 12 of 50
On Mon, 11 Oct 2021 at 15:12, Omar Polo <op@omarpolo.com> wrote:
> I've seen this argument in the gitlab issue too […]
I haven't checked the issue yet, will do after sending this.
> In what language(s) splitting a string is faster than
> checking for a prefix? Splitting requires the allocation of multiple
> objects, while the prefix only requires a scan of the first few bytes.
I said simpler, not faster. What you said is true in some cases, but
not everyone is striving for optimisation speed-wise.
It'll depend on the language used, but splitting allows you to use a
simple equality switch statement, which isn't possible by checking
with a prefix.
The way I understand your message, I would have to use an else-if
list, which is hardly ideal.
e.g. in C#:
```
// If it's <3 chars then just treat it as a text line (the default).
// Split() drops the separator, so the cases must not include the space.
switch ((line.Length < 3) ? "" : line.Substring(0, 3).Split(" ", 2)[0])
{
    case "=>": …
    case "*": …
    … and so on …
    default: …
}
```
vs
```
if (line.StartsWith("=> ")) {
    …
} else if (line.StartsWith("* ")) {
    …
} … and so on …
else { … }
```
At the least, it should be required for link (as you said) and list
lines ("* "). I've seen where people have tried to use *emphasis* at
the start of a line and got a bullet point by mistake.
> > I see no downside to enforcing it in the spec (a SHOULD or MUST).
>
> My argument is kind the opposite: if there isn't a (strong) reason for
> requiring something, then that something MUST be optional. Whitespaces
> are required in the link line to separate unambiguously the link from
> the label, the other whitespaces in the "special" lines don't serve this
> purpose so they need to be completely optional.
At the very least it should be recommended by the spec IMO.
-Oliver Simmons (GoodClover)
13. Oliver Simmons (oliversimmo (a) gmail.com)
- Sent: 2021-10-13 19:02
- Message 13 of 50
On Mon, 11 Oct 2021 at 15:07, Chris McGowan <cmcgowan9990@gmail.com> wrote:
> Doesn't the spec say that line type indicators are only three characters
> maximum? It also implies that line type indicators should be the first
> thing on the line and that nothing should come before them (i.e. no
> whitespace before the indicator).
Yes and yup, we're talking about the space after the indicator.
> That should mean that simply taking a three character substring of the
> line should be enough to determine whether it has a line type indicator
> and, if so, which type. That should be relatively easy and quick to parse
> as there's only about 5-6 different cases to handle.
Unfortunately, no. For example, take this line: `# Foo bar I'm a level-1 title`
A 3-char substring of that would yield "# F", which isn't useful.
It would work if the spec required (MUST) you to add whitespace
padding the indicator to three characters, but that's not how it is.
To determine the line-type you have to do a starts-with check or split
on the space like me and Omar are saying.
-Oliver Simmons (GoodClover)
14. Chris McGowan (cmcgowan9990 (a) gmail.com)
- Sent: 2021-10-13 20:38
- Message 14 of 50
> Unfortunately, no. For example, take this line: `# Foo bar I'm a level-1 title`
> A 3-char substring of that would yield "# F", which isn't useful.
In what way isn't it useful? It tells you literally everything you need to
know. An example (in Perl):
```
my $first3 = substr $line, 0, 3;
# Patterns are anchored with ^ because the indicator must be the first
# thing on the line. $1 captures the run of 1 to 3 '#' characters, so
# its length is the header level.
if ( $first3 =~ m/^(#{1,3})/ )
{
    my $level = length $1;
    return "Level $level header";
}
elsif ( $first3 =~ m/^=>/ )
{
    return "Link";
}
elsif ( $first3 =~ m/^```/ )
{
    return "preformatted";
}
elsif ( $first3 =~ m/^\*/ )
{
    return "list item";
}
elsif ( $first3 =~ m/^>/ )
{
    return 'Blockquote';
}
else
{
    return "Text";
}
```
That's a simplified, very naive gemtext parser I wrote in my email client
in about 3 minutes. It took longer to remember all of the line types than
it did to write the code for them. In fact, the substring isn't even
necessary in this code as I could anchor the regex at the start of the line like so:
```
if ( $line =~ m/^\*/ )
{
    return "list item";
}
```
but that's largely true for languages which have decent regex support. If
you weren't using one of those (i.e. C) or are for some reason allergic to
regexes you could simply index the string to determine the line type
(note: this would likely improve speed, but probably only an imperceptibly
small amount and likely wouldn't be worth it.)
Just to really drive home the point that this isn't a difficult task,
here's the version I wouldn't write unless I was using C (still in Perl though):
```
# Note: split here is because perl doesn't allow direct subscripting of
# strings. In languages that do allow that, this other array is
# unnecessary and you could use $line directly.
my @first3 = split( "", substr( $line, 0, 3));
if ( $first3[0] eq '#' )
{
    if ( $first3[1] eq '#' )
    {
        if ( $first3[2] eq '#' )
        {
            return "Level 3 header";
        }
        return "Level 2 header";
    }
    return "Level 1 header";
}
elsif ( $first3[0] eq '=' && $first3[1] eq '>' )
{
    return "link";
}
elsif ( $first3[0] eq '*' )
{
    return 'List Item';
}
elsif ($first3[0] eq '>' )
{
    return "Blockquote";
}
elsif( $first3[0] eq '`' && $first3[1] eq '`' && $first3[2] eq '`' )
{
    return "preformatted";
}
else
{
    return "Text";
}
```
It's a bit more annoying to write, sure but it's still really simple.
That's ~33 lines of code (mostly because of the Allman style braces,
honestly.) It only took me 5 minutes to write.
In summary, I hardly think it's impossible or even difficult to
unambiguously parse gemtext without having a mandatory space.
15. Omar Polo (op (a) omarpolo.com)
- Sent: 2021-10-13 20:56
- Message 15 of 50
Oliver Simmons <oliversimmo@gmail.com> writes:
> On Mon, 11 Oct 2021 at 15:12, Omar Polo <op@omarpolo.com> wrote:
>> I've seen this argument in the gitlab issue too […]
>
> I haven't checked the issue yet, will do after sending this.
>
>> In what language(s) splitting a string is faster than
>> checking for a prefix? Splitting requires the allocation of multiple
>> objects, while the prefix only requires a scan of the first few bytes.
>
> I said simpler, not faster. What you said is true in some cases, but
> not everyone is striving for optimisation speed-wise.
You didn't say "faster", true, but you said that (emphasis mine)
> Not having a space means you'd have to test if the line starts with
> different things, which would be very annoying and **slower** in most
> cases.
I was contradicting that.
> It'll depend on the language used, but splitting allows you to use a
> simple equality switch statement, which isn't possible by checking
> with a prefix.
(btw, checking for equality inside a switch statement doesn't work for
strings in languages like C or Java. Err... yes, it works, but it's not
the same equality you mean ;-)
> The way I understand your message, I would have to use an else-if
> list, which is hardly ideal.
This depends on the language design. Some languages allow expressions
inside switches, like Go IIRC, so you could write
    switch {
    case strings.HasPrefix(line, "*"):
        // ...
    case strings.HasPrefix(line, "###"):
        // ...
    ...
    }
others allow you to do more elaborate things (clojure for example)
    (defn has-prefix? [prefix str]
      (str/starts-with? str prefix))
    (condp has-prefix? line
      "*"   :item
      "=>"  :link
      "###" :header-3
      ,,,)
Even when we take into account an ancient language like C, you could
take advantage that the first byte of a line is enough to get an idea of
its type and greatly reduce the number of chained ifs:
(this is more or less what I have in telescope)
    switch (*line) {
    case '*': return LINE_ITEM;
    case '>': return LINE_QUOTE;
    case '=':
        if (line[1] == '>')
            return LINE_LINK;
        break;
    case '#':
        /* some ifs to check whether is a level 1, 2 or 3 */
        ...
    case '`':
        /* check for a ``` marker */
        ...
    }
    return LINE_TEXT;
I don't think taking into account the particularities of one specific
programming language is a wise choice for a markup language meant to be
written by humans for humans.
The question should thus become: is it intuitive for a random user that
#hello world
and
# hello world
are effectively the same line?
Let's forget the code when tackling these issues, we think better when
we're not in front of a keyboard.
> [...]
>
> At the least, it should be required for link (as you said) and list
> lines ("* ").
Probably I was too ambiguous. My point was that in a link line a space
is necessary between the link and the label, not after the marker. So,
outside of the mandatory space to separate a link and its label,
whitespaces are irrelevant.
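A short Python sketch of link-line parsing along these lines: whitespace after the `=>` marker is optional, while whitespace between the URL and the label is what separates the two fields. The function name `parse_link` is hypothetical, and `str.split` with no separator accepts any whitespace, which is slightly more lenient than spaces and tabs only.

```python
def parse_link(line):
    """Return (url, label) for a link line, or None if it is not one.

    label is None when the line carries only a URL.
    """
    if not line.startswith("=>"):
        return None
    rest = line[2:].strip()  # whitespace after the marker is optional
    if not rest:
        return None          # "=>" with no URL is not a usable link
    parts = rest.split(None, 1)  # first whitespace run separates URL and label
    url = parts[0]
    label = parts[1] if len(parts) > 1 else None
    return url, label
```

Here `=>gemini://example.org/ Example` and `=> gemini://example.org/ Example` parse identically, with `example.org` standing in as a placeholder host.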
> I've seen where people have tried to use *emphasis* at
> the start of a line and got a bullet point by mistake.
I've seen people writing like that, and a conforming client (IMHO)
should consider those lines items.
It's like using => something like this <= to highlight text and then
complaining that a client mis-renders a line because the author tried to
"highlight" the first words and now it's a link.
Who cares? Gemini doesn't have inline formatting, so why bother trying
to support it?
(I've used some *emphasis* on some pages too, but the more I write, the more
I think I shouldn't; it's easier to read without too much noise. That's
my opinion, at least.)
Anyway, whatever the final decision will be, I hope we could at least
ensure that all the clients are consistent in their rendering.
>> > I see no downside to enforcing it in the spec (a SHOULD or MUST).
>>
>> My argument is kind the opposite: if there isn't a (strong) reason for
>> requiring something, then that something MUST be optional. Whitespaces
>> are required in the link line to separate unambiguously the link from
>> the label, the other whitespaces in the "special" lines don't serve this
>> purpose so they need to be completely optional.
>
> At the very least it should be recommended by the spec IMO.
>
> -Oliver Simmons (GoodClover)
16. Plain Text (text (a) sdfeu.org)
- Subject Changed! New Subject: Re: [spec] [whitespace]
- Sent: 2021-10-13 21:12
- Message 16 of 50
On Mon, 11 Oct 2021 15:15:44 +0000, Plain Text wrote:
> https://gemini.circumlunar.space/docs/specification.gmi
My attempt at identifying line types using Python re named groups,
which became a quite unreadable line and is also missing ```, sorry.
line.py
import re, sys
for line in sys.stdin:
m =
re.match(r'((?P<heads>(?P<h3>###)|(?P<h2>##)|(?P<h1>#))|(?P<list>\*
)|(?P<link>=> (?P<url>[^\s]+))|(?P<quote>>))\s*(?P<content>.*)