Scheme Section 2 quibble

📧 Messages: 31
🗣️ Authors: 14
📅 First Message: 2020-11-16 23:39
📅 Last Message: 2020-11-19 19:37

acdw <acdw (a) acdw.net>

📅 Sent: 2020-11-16 23:39
📧 Message 1 of 31

Hi gemilist (listini?),

I've got a minor quibble with the spec, section 2, paragraph ... 3(?), 
which I'll quote here.

> <URL> is a UTF-8 encoded absolute URL, of maximum length 1024 bytes. If
> the scheme of the URL is not specified, a scheme of gemini:// is implied.

Specifically, the "scheme of gemini:// is implied" clause is confusing.  
According to the URL spec (https://tools.ietf.org/html/rfc3986), 

> The authority component is preceded by a double slash ("//") and is
> terminated by the next slash ("/"), question mark ("?"), or number sign
> ("#") character, or by the end of the URI.

Meaning that the scheme does not, in fact, include a "//" at the end, but 
rather that "//" is a separator between the scheme and the authority.  In 
fact, to actually encode a scheme-agnostic URL in a link, an author needs 
to write "//example.com/path".  For an example, see the links in flounder.online.

I bring this issue up because there have been instances of geminauts linking like this:

=> example.com/path An example link

Which resolves, not to gemini://example.com/path, but to 
./example.com/path on the current server.
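
A minimal sketch of that resolution, assuming Python's urllib.parse as the
resolver (nothing in this thread uses it) and a hypothetical page at
gemini://current.example/docs/index.gmi. urllib only resolves references for
schemes it has registered, so the sketch first adds "gemini" to its
uses_relative and uses_netloc lists; that is a workaround for the library,
not something the spec prescribes.

```python
from urllib.parse import urljoin, uses_netloc, uses_relative

# Workaround (an assumption of this sketch, not part of any spec): urllib
# only resolves references for schemes it has registered, so add "gemini".
for registry in (uses_relative, uses_netloc):
    if "gemini" not in registry:
        registry.append("gemini")

# Hypothetical page that contains the link lines.
BASE = "gemini://current.example/docs/index.gmi"


def resolve(link: str) -> str:
    """Resolve a link target against BASE (RFC 3986, section 5.2)."""
    return urljoin(BASE, link)


print(resolve("example.com/path"))
# gemini://current.example/docs/example.com/path
print(resolve("//example.com/path"))
# gemini://example.com/path
```

The first link stays on the current host as a relative path; only the
"//" form switches hosts while inheriting the base scheme.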

To resolve this confusion, I propose to either

(a) strip the "//" (and probably ":", though I found no particular 
reference to it in the spec) from the "scheme of gemini:// is implied" 
portion of the above paragraph, or

(b) remove the scheme bit altogether.  I personally prefer this because 
it's maximally precise.

I'd love to hear your thoughts on the matter.

-- 
~ acdw
acdw.net | breadpunk.club/~breadw

colecmac@protonmail.com <colecmac (a) protonmail.com>

📅 Sent: 2020-11-16 23:51
📧 Message 2 of 31

I think you're confusing what that section is talking about. I believe it is
referring to sending a URL for a request only. It's saying, "when you make a
request, you can leave the gemini:// part out". I don't think it speaks to
links in documents at all, which are governed by the URL RFC.

I agree it's confusing however, because of the use of the word "scheme", while also
including the colon and slashes. I think it's totally fine that those characters
can be left off in the request, but this line should be more clear. How
about saying:

> If the URL does not begin with `gemini://`, then that prefix is implied.
> Leaving off just the `gemini:` portion and starting with `//` also implies
> the gemini scheme, in accordance with the URL spec.

That might be too wordy, and perhaps requiring that all request URLs have a //
would be better. But I don't want to break backwards compatibility.


makeworld

Ali Fardan <raiz (a) stellarbound.space>

📅 Sent: 2020-11-17 01:47
📧 Message 3 of 31

On Mon, 16 Nov 2020 23:39:19 +0000
acdw <acdw at acdw.net> wrote:
> > The authority component is preceded by a double slash ("//") and is
> > terminated by the next slash ("/"), question mark ("?"), or number
> > sign ("#") character, or by the end of the URI.  
> 
> Meaning that the scheme does not, in fact, include a "//" at the end,
> but rather that "//" is a separator between the scheme and the
> authority.  In fact, to actually encode a scheme-agnostic URL in a
> link, an author needs to write "//example.com/path".  For an example,
> see the links in flounder.online.
> 
> I bring this issue up because there have been instances of geminauts
> linking like this:
> 
> => example.com/path An example link  
> 
> Which resolves, not to gemini://example.com/path, but
> to ./example.com/path on the current server.

This is wrong, even by web standards: when referring to a different
host, one must explicitly write a valid URL. You DON'T see:

>	<a href="example.tld/index.html"></a>

> To resolve this confusion, I propose is to either
> 
> (a) strip the "//" (and probably ":", though I found no particular
> reference to it in the spec) from the "scheme of gemini:// is
> implied" portion of the above paragraph, or

In my humble opinion, "//example.tld/" is an implementation-specific
hack and has no place in the protocol. A URI like that is invalid and
should not be respected by servers. What should actually work is
providing authority and path like so: "example.tld/path/". This is
discouraged by RFC 3986 (section 4.5), but it actually makes sense if
the context is defined; in this case, the context is Gemini, so a scheme
of gemini is implied.

Also, this is the default behavior for web browsers implying scheme of
http(s), which I think is acceptable and convenient behavior, so I
agree with you on that, assuming that's what you meant.

> (b) remove the scheme bit altogether.  I personally prefer this
> because it's maximally precise.

The scheme bit in requests allows for proxies to work, for example,
when I host a proxy instance at "gemini://raiz.proxy/" someone sends a
request of "https://example.tld/", my proxy can fetch the page and send
it back to the client through gemini, I think that's why it's there.

Perhaps there are many many other use cases for this that I haven't
thought of.
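
A rough sketch of how a server might use the scheme of the request URL to
decide between serving and proxying. The host set, the scheme set, and the
function name route are all hypothetical; "raiz.proxy" is simply the example
host named above.

```python
from urllib.parse import urlsplit

# Hypothetical configuration for illustration only.
LOCAL_HOSTS = {"raiz.proxy"}         # hosts this server answers for itself
PROXIED_SCHEMES = {"http", "https"}  # schemes it is willing to fetch for clients


def route(request_url: str) -> str:
    """Classify a request URL as 'serve', 'proxy', or 'refuse'."""
    parts = urlsplit(request_url)
    if parts.scheme == "gemini" and parts.hostname in LOCAL_HOSTS:
        return "serve"
    if parts.scheme in PROXIED_SCHEMES:
        return "proxy"
    return "refuse"


for url in ("gemini://raiz.proxy/", "https://example.tld/", "gopher://example.tld/"):
    print(url, "->", route(url))
```

Without a scheme in the request, the server could not tell the second case
from the first.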

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-11-17 02:07
📧 Message 4 of 31

It was thus said that the Great acdw once stated:
> I've got a minor quibble with the spec, section 2, paragraph ... 3(?),
> which I'll quote here.
> 
> > <URL> is a UTF-8 encoded absolute URL, of maximum length 1024 bytes. If
> > the scheme of the URL is not specified, a scheme of gemini:// is
> > implied.

  [ snip ]

> To resolve this confusion, I propose is to either
> 
> (a) strip the "//" (and probably ":", though I found no particular
> reference to it in the spec) from the "scheme of gemini:// is implied"
> portion of the above paragraph, or
> 
> (b) remove the scheme bit altogether.  I personally prefer this because
> it's maximally precise.
> 
> I'd love to hear your thoughts on the matter.

  This has come up before [1][2], and as I have stated [3][4], the '//' is
considered part of the host (or at least, a marker for the host portion of a
URL) and thus, I think the wording of section 2 should be changed to read

	<URL> is a UTF-8 encoded absolute URL, of maximum length 1024 bytes.
	If the scheme of the URL is not specified, a scheme of gemini: is
	implied.

  -spc

[1]	https://lists.orbitalfox.eu/archives/gemini/2020/001006.html

[2]	https://lists.orbitalfox.eu/archives/gemini/2020/002954.html

[3]	https://lists.orbitalfox.eu/archives/gemini/2020/001009.html

[4]	https://lists.orbitalfox.eu/archives/gemini/2020/002964.html

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-11-17 02:19
📧 Message 5 of 31

It was thus said that the Great Ali Fardan once stated:
> On Mon, 16 Nov 2020 23:39:19 +0000
> acdw <acdw at acdw.net> wrote:
> > > The authority component is preceded by a double slash ("//") and is
> > > terminated by the next slash ("/"), question mark ("?"), or number
> > > sign ("#") character, or by the end of the URI.  
> > 
> > Meaning that the scheme does not, in fact, include a "//" at the end,
> > but rather that "//" is a separator between the scheme and the
> > authority.  In fact, to actually encode a scheme-agnostic URL in a
> > link, an author needs to write "//example.com/path".  For an example,
> > see the links in flounder.online.
> > 
> > I bring this issue up because there have been instances of geminauts
> > linking like this:
> > 
> > => example.com/path An example link  
> > 
> > Which resolves, not to gemini://example.com/path, but
> > to ./example.com/path on the current server.
> 
> This is wrong, even by web standards, when referencing to a different
> host, one must explicitly write a valid URL, you DON'T see:
> 
> >	<a href="example.tld/index.html"></a>
> 
> > To resolve this confusion, I propose is to either
> > 
> > (a) strip the "//" (and probably ":", though I found no particular
> > reference to it in the spec) from the "scheme of gemini:// is
> > implied" portion of the above paragraph, or
> 
> In my humble opinion, I think that "//example.tld/" is an
> implementation specific hack and has no place in the protocol, a URI
> like that is invalid and should not be respected by servers, 

  It *is* allowed though---it's a schemeless URI and in a given context, it
can be inferred.  Check out RFC-3986 section 5.2.2 (Transform
References, aka, resolving a URL with a base URL) and section 5.3
(Component Recomposition) where ':' is appended to the scheme, and '//' is
prefixed to the authority (host) section.

  So, given a URL like this:

	//example.net/path/to/resource

in a resource, if the resource was served up via HTTP, then the scheme is
'http:'; if HTTPS, then 'https:' and if gemini, 'gemini:'.  

  A URL like this:

	example.net/path/to/resource

is, again, per RFC-3986 parsing rules, to be interpreted as a path, not an
authority section then path.  Need I create an example to show this?  I can.
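
  One such example, sketched with Python's urllib.parse (an assumption made
for illustration; any parser that follows RFC 3986 should split these the
same way).  The helper name components is made up for this sketch.

```python
from urllib.parse import urlsplit


def components(reference: str) -> dict:
    """Split a URI reference into the parts discussed above."""
    parts = urlsplit(reference)
    return {
        "scheme": parts.scheme,   # "" when the reference has no scheme
        "host": parts.hostname,   # None when there is no "//authority"
        "path": parts.path,
    }


for ref in ("//example.net/path/to/resource", "example.net/path/to/resource"):
    print(ref, "->", components(ref))
```

  Only the first reference yields a host; the second is all path.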

  -spc

Ali Fardan <raiz (a) stellarbound.space>

📅 Sent: 2020-11-17 02:28
📧 Message 6 of 31

On Mon, 16 Nov 2020 21:19:16 -0500
Sean Conner <sean at conman.org> wrote:
>   It *is* allowed though---it's a schemeless URI and in a given
> context, it can be inferred.  Check out RFC-3986 section 5.2.2
> (Transform References, aka, resolving a URL with a base URL) and
> section 5.3 (Component Recomposition) where ':' is appended to the
> scheme, and '//' is prefixed to the authority (host) section.
> 
>   So, given a URL like this:
> 
> 	//example.net/path/to/resource
> 
> in a resource, if the resource was served up via HTTP, then the
> scheme is 'http:'; if HTTPS, then 'https:' and if gemini, 'gemini:'.  
> 
>   A URL like this:
> 
> 	example.net/path/to/resource
> 
> is, again, per RFC-3986 parsing rules, to be interpreted as a path,
> not an authority section then path.  Need I create an example to show
> this?  I can.

You are correct.

Felix Queißner <felix (a) masterq32.de>

📅 Sent: 2020-11-17 08:41
📧 Message 7 of 31

Heya!

>   It *is* allowed though---it's a schemeless URI and in a given context, it
> can be inferred.  Check out RFC-3986 section 5.2.2 (Transform
> References, aka, resolving a URL with a base URL) and section 5.3
> (Component Recomposition) where ':' is appended to the scheme, and '//' is
> prefixed to the authority (host) section.
> 
>   So, given a URL like this:
> 
> 	//example.net/path/to/resource
> 
> in a resource, if the resource was served up via HTTP, then the scheme is
> 'http:'; if HTTPS, then 'https:' and if gemini, 'gemini:'.  

I'm using this on gemini sites that are also hosted in web space. This
allows cross-server linking without changing protocol, it's very convenient.

>   A URL like this:
>
> 	example.net/path/to/resource
>
> is, again, per RFC-3986 parsing rules, to be interpreted as a path, not an
> authority section then path.  Need I create an example to show this?
> I can.

Exactly.

Philip Linde <linde.philip (a) gmail.com>

📅 Sent: 2020-11-17 09:19
📧 Message 8 of 31

On Tue, 17 Nov 2020 04:47:45 +0300
Ali Fardan <raiz at stellarbound.space> wrote:

> In my humble opinion, I think that "//example.tld/" is an
> implementation specific hack and has no place in the protocol, a URI
> like that is invalid and should not be respected by servers, what
> should actually work is providing authority and path like so:
> "example.tld/path/", this is discouraged by RFC 3986 (section 4.5) but
> it actually makes sense if context is defined, in this case, context is
> gemini so a scheme of gemini is implied.

With respect to RFC3986, it's not a matter of opinion.

It's very much not an implementation specific hack. It's defined in RFC
3986 as "relative-ref", a "network-path reference" specifically.
Non-URIs of the "example.com/hello" style on the other hand are an
implementation specific hack, as you've noted, discouraged by RFC 3986
and not specified in any of the syntaxes it defines. It's obviously
unsuitable for links because it's ambiguous with relative-ref.

Gemini however explicitly only allows "absolute URL" in requests. It
also says that "If the scheme of the URL is not specified, a scheme of
gemini:// is implied."

In terms of RFC 3986, this is nonsense. "gemini://" isn't the scheme.
"gemini" is the scheme, "//" is the beginning of hier-part or
relative-part, and ":" separates the scheme from hier-part.

I've previously called for clarification on this point. One might read
that last sentence as saying that requests by suffix reference are
allowed (which is what you get when you omit "gemini://"), or that some
relative-refs are allowed (which is what you get if you literally omit
the scheme and scheme separator).

I'd prefer if the spec could just refer to an expected syntax as
defined in RFC 3986. This would reduce confusion significantly. Skip
all the hacks and allow only e.g. the URI syntax (which does not
include relative-ref) for requests and URI-reference syntax (which
includes URI and relative-ref) for links. Adopt the language of RFC
3986 to describe them.

Last I checked, if you connect to gemini://gemini.circumlunar.space and
request "gemini.circumlunar.space/" you get an error. You may however
request "//gemini.circumlunar.space/" and get the appropriate 20
response. Should gemini.circumlunar.space be considered to be running a
canonical implementation of Gemini?

-- 
Philip

Philip Linde <linde.philip (a) gmail.com>

📅 Sent: 2020-11-17 09:37
📧 Message 9 of 31

On Mon, 16 Nov 2020 21:07:54 -0500
Sean Conner <sean at conman.org> wrote:

>   This has come up before [1][2], and as I have stated [3][4], the '//' is
> considered part of the host (or at least, a marker for the host portion of a
> URL) and thus, I think the wording of section 2 should be changed to read
> 
> 	<URL> is a UTF-8 encoded absolute URL, of maximum length 1024 bytes.
> 	If the scheme of the URL is not specified, a scheme of gemini: is
> 	implied.
> 

"gemini:" is not a valid scheme. ":" is part of the URI and
absolute-URI syntaxes defined in RFC 3986, not the scheme. The spec
should be able to express any sensible acceptable URI syntax in terms of
the syntaxes and terminology defined in RFC 3986. There's no need to add
weird exceptions outside RFC 3986 that aren't already covered in it.

For example, the spec can read: "<URL> is a UTF-8 encoded URI or
network-path reference as defined in RFC 3986" (requests) and "<URL> is
a UTF-8 encoded URI-reference as defined in RFC 3986" (links).

If we want requests like "gemini.circumlunar.space/" to be valid, it
can additionally read that <URL> allows suffix references.
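
A minimal sketch of how a server might apply the "URI or network-path
reference" wording to a request line, assuming Python's urllib.parse. The
function name normalize_request is hypothetical, and the sketch makes one
simplifying assumption beyond the proposed wording: it requires an
authority, so URIs without one (such as "urn:..." names) are rejected too.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_URL_BYTES = 1024  # limit quoted from section 2 of the spec


def normalize_request(url: str) -> str:
    """Return an absolute request URL, or raise ValueError."""
    if len(url.encode("utf-8")) > MAX_URL_BYTES:
        raise ValueError("URL is longer than 1024 bytes")
    parts = urlsplit(url)
    if not parts.netloc:
        # No "//authority": a bare path or suffix reference.
        raise ValueError("request needs a URI or a network-path reference")
    if not parts.scheme:
        # Network-path reference: fill in the scheme "gemini" (no ":" or "//").
        parts = parts._replace(scheme="gemini")
    return urlunsplit(parts)


for request in ("//gemini.circumlunar.space/", "gemini.circumlunar.space/"):
    try:
        print(request, "->", normalize_request(request))
    except ValueError as err:
        print(request, "-> rejected:", err)
```

Allowing suffix references would mean replacing the ValueError branch with
a guess about where the host ends, which is exactly the ambiguity under
discussion.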

To call it an "absolute URL" is especially concerning since Gemini
apparently allows fragments, but RFC 3986 defines an "absolute-URI"
syntax which does not.

-- 
Philip

Ali Fardan <raiz (a) stellarbound.space>

📅 Sent: 2020-11-17 14:42
📧 Message 10 of 31

On Tue, 17 Nov 2020 10:19:52 +0100
Philip Linde <linde.philip at gmail.com> wrote:
> With respect to RFC3986, it's not a matter of opinion.
> 
> It's very much not an implementation specific hack. It's defined in
> RFC 3986 as "relative-ref", a "network-path reference" specifically.
> Non-URIs of the "example.com/hello" style on the other hand are an
> implementation specific hack, as you've noted, discouraged by RFC 3986
> and not specified in any of the syntaxes it defines. It's obviously
> unsuitable for links because it's ambiguous with relative-ref.

I don't know about that. Section 3.2 states that the authority should be
preceded by a "//", not that it is a part of the authority component;
also, the ABNF representation has no "//" in it.

Suffix references (section 4.5) are only discouraged because of
possible misinterpretation. In the case of Gemini requests, however,
people can write their code to handle them just like they write their
code to handle "//example.tld"; it's not that hard and looks much, much
cleaner. The argument that it could be interpreted as a path should
apply to "//example.tld" too, because it could also be interpreted as a
path. If the author decides to handle such a case, it'll be handled
just fine: you can have your parser treat the text before the first
occurrence of '/' as the host subcomponent of the authority component
if the scheme is not specified, just like you have your parser treat
the text before the first occurrence of '/' after the "//" prefix as
the host subcomponent in the current way of handling schemeless
requests in Gemini. The Gemini protocol requires passing a full URL in
requests; therefore, such a request should not be interpreted as a
path, because Gemini requests don't allow a path without stating the
host.

So yeah, I'm not changing my mind: "//example.tld" is a hack because
that is not a valid URI and "//" is supposed to be present only when a
scheme is specified. However, "example.tld" is, while discouraged,
acceptable for this use case, and the RFC even acknowledges it.

Let me quote to you why it is that RFC 3986 discourages its use:

> Although this practice of using suffix references is common, it
> should be avoided whenever possible and should never be used in
> situations where long-term references are expected.

Gemini requests are not a 'long-term' reference; they're one-time
requests, so I don't see any downside to doing it.

> Last I checked, if you connect to gemini://gemini.circumlunar.space
> and request "gemini.circumlunar.space/" you get an error. You may
> however request "//gemini.circumlunar.space/" and get the appropriate
> 20 response. Should gemini.circumlunar.space be considered to be
> running a canonical implementation of Gemini?

You shouldn't look at any particular implementation as a reference for
the spec, I'm assuming gemini.circumlunar.space is running molly-brown,
do you know that molly-brown treats single '\n' as valid request
terminators instead of explicit '\r\n'? (see:
https://tildegit.org/solderpunk/molly-brown/src/branch/master/handler.go#L138),
do you know that if a transaction is finished, molly-brown waits for
the client to close the connection instead of closing it from the
server side, is that spec compliant?

The reason I think molly-brown accepted "//example.tld" in the first
place is because the Go standard library URL parser implementation
accepted this, I don't know if this was a bug or it is intended design,
but that's what it is, other URI parsers that are more strict with
compliance to the RFC will refuse to parse a URI without scheme
present, here is an excerpt from the library's documentation that might
give an idea of how they treat URLs:

> A URL represents a parsed URL (technically, a URI reference).
>
> The general form represented is:
>
> [scheme:][//[userinfo@]host][/]path[?query][#fragment]
>
> URLs that do not start with a slash after the scheme are
> interpreted as:
>
> scheme:opaque[?query][#fragment]

Notice that [scheme:] is enclosed in brackets implying that it is
optional, while [//host] is optional too, the "//" is considered a part
of the authority component by the Go URL parser implementation, this is
why "//example.tld" is accepted while "example.tld" is not, try passing
both strings to url.Parse() and see what you get.

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-11-17 22:10
📧 Message 11 of 31

It was thus said that the Great Ali Fardan once stated:
> On Tue, 17 Nov 2020 10:19:52 +0100
> Philip Linde <linde.philip at gmail.com> wrote:
> > With respect to RFC3986, it's not a matter of opinion.
> > 
> > It's very much not an implementation specific hack. It's defined in
> > RFC 3986 as "relative-ref", a "network-path reference" specifically.
> > Non-URIs of the "example.com/hello" style on the other hand are an
> > implementation specific hack, as you've noted, discouraged by RFC 3986
> > and not specified in any of the syntaxes it defines. It's obviously
> > unsuitable for links because it's ambiguous with relative-ref.
> 
> I don't know about that, section 3.2 states that authority should be
> preceded by a "//", not that it is a part of the authority component,
> also, the ABNF representation has no "//" in it.
> 
> Suffix references (section 4.5) are only discouraged because of
> possible misinterpretation, however in the case of Gemini requests,
> people can write their code to handle them just like they write their
> code to handle "//example.tld", it's not that hard and looks much much
> cleaner, the argument that it could be interpreted as path should also
> apply for "//example.tld" too, because it could be interpreted as a
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> path too, however if the author decided to handle such case, it'll be
  ^^^^^^^^^
  Citation needed.

  I'm sorry, this just isn't the case.  From the full ABNF in Appendix A:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part     = "//" authority path-abempty
                 / path-absolute 
                 / path-rootless 
                 / path-empty

   URI-reference = URI / relative-ref
   
   absolute-URI  = scheme ":" hier-part [ "?" query ]

   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

   relative-part = "//" authority path-abempty
                 / path-absolute
                 / path-noscheme
                 / path-empty 

[ NON-PATH RELATED RULES OMITTED FOR SPACE I REPEAT NON-PATH RELATED RULES 
OMITTED FOR SPACE ]

   path          = path-abempty    ; begins with "/" or is empty
                 / path-absolute   ; begins with "/" but not "//"
                 / path-noscheme   ; begins with a non-colon segment
                 / path-rootless   ; begins with a segment
                 / path-empty      ; zero characters

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

  The path parsing rules state a single slash.  Not '/'+, nor '/'*, but a
single '/'.  The only place where more than a single slash is allowed PER
THE @#%@#$@$ ABNF is just prior to the authority, which contains the
hostname.  THE ONLY PLACE!  

  I will also draw your attention to the URI-reference rule, which is there
for some reason, which allows both a full URI, or a RELATIVE URI, which
means that

		//example.com/path/to/resource

IS A VALID URI!  IT IS NOT A HACK!  What part of the ABNF do you not
understand?

> handled just fine, you can have your parser treat the text before the
> first occurrence of '/' as host subcomponent of authority component if
> scheme is not specified just like you have your parser treat the first
> occurrence of '/' after the "//" prefix as host subcomponent in the
> current way of handling schemeless requests in Gemini, the Gemini
> protocol requires passing full URL in requests, therefore, such should
> not be interpreted as path because Gemini requests don't allow path
> without stating host.

  No, the spec allows both the full URI, and a relative URI as long as it
starts with '//' (it has the authority section).  The wording in the spec is
bad and should be changed to clarify it, but that's the current
specification.  

  Again,

		//example.com/path/to/resource

IS NOT A HACK!

> So yeah, I'm not changing my mind, "//example.tld" is a hack because
> that is not a valid URI and "//" is supposed to be only present when
> scheme is specified, however, "example.tld" is while discouraged,
> acceptable for this use case and the RFC even acknowledged it.
> 
> Let me quote to you why it is that RFC 3986 discourages its use:
> 
> > Although this practice of using suffix references is common, it
> > should be avoided whenever possible and should never be used in
> > situations where long-term references are expected.
> 
> In the case of Gemini requests, they are not a 'long-term' reference,
> they're one-time requests, I don't see any downside to not doing it.
> 
> > Last I checked, if you connect to gemini://gemini.circumlunar.space
> > and request "gemini.circumlunar.space/" you get an error. You may
> > however request "//gemini.circumlunar.space/" and get the appropriate
> > 20 response. Should gemini.circumlunar.space be considered to be
> > running a canonical implementation of Gemini?
> 
> You shouldn't look at any particular implementation as a reference for
> the spec, 

  I believe Philip used gemini.circumlunar.space because that's the server
written by solderpunk, author of the specification.  

> I'm assuming gemini.circumlunar.space is running molly-brown,

  Also written by solderpunk.  The bastard!  Writing a Gemini server that
doesn't follow his specification!

> do you know that molly-brown treats single '\n' as valid request
> terminators instead of explicit '\r\n'? (see:
> https://tildegit.org/solderpunk/molly-brown/src/branch/master/handler.go#L138),
> do you know that if a transaction is finished, molly-brown waits for
> the client to close the connection instead of closing it from the
> server side, is that spec compliant?
> 
> The reason I think molly-brown accepted "//example.tld" in the first
> place is because the Go standard library URL parser implementation
> accepted this, I don't know if this was a bug or it is intended design,

  It's by design---see the ABNF above.  

> but that's what it is, other URI parsers that are more strict with
> compliance to the RFC will refuse to parse a URI without scheme
> present, 

  If it does, it's broken by design.  Again, see the ABNF above.

> here is an excerpt from the library's documentation that might
> give an idea of how they treat URLs:
> 
> > A URL represents a parsed URL (technically, a URI reference).
> >
> > The general form represented is:
> >
> > [scheme:][//[userinfo@]host][/]path[?query][#fragment]
> >
> > URLs that do not start with a slash after the scheme are
> > interpreted as:
> >
> > scheme:opaque[?query][#fragment]
> 
> Notice that [scheme:] is enclosed in brackets implying that it is
> optional, while [//host] is optional too, the "//" is considered a part
> of the authority component by the Go URL parser implementation, this is
> why "//example.tld" is accepted while "example.tld" is not, try passing
> both strings to url.Parse() and see what you get.

  Yes, exactly.  Again, that's per the ABNF above.  Why do you not get this? 
Here, have one more excerpt from RFC-3986, this time from section 3:

   The following are two example URIs and their component parts:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose

and the URL parsing library I have parses those as:

['foo://example.com:8042/over/there?name=ferret#nose'] =
{
  fragment = "nose",
  query = "name=ferret",
  path = "/over/there",
  scheme = "foo",
  port = 8042.000000,
  host = "example.com",
}

['urn:example:animal:ferret:nose'] =
{
  path = "example:animal:ferret:nose",
  scheme = "urn",
}

and because I like belaboring the inanimate equus pleonastically:

["//example.com/path/to/resource"] =
{
  host = "example.com",
  path = "/path/to/resource",
}

["/example.com/path/to/resource"] =
{
  path = "/example.com/path/to/resource",
}

["example.com/path/to/resource"] =
{
  path = "example.com/path/to/resource",
}

  You should try those with the Go URL parser you use and see what YOU get.

  -spc

acdw <acdw (a) acdw.net>

📅 Sent: 2020-11-17 22:28
📧 Message 12 of 31

On 2020-11-17 (Tuesday) at 22:10, Sean Conner <sean at conman.org> wrote:

> 
> 		//example.com/path/to/resource
> 
> IS A VALID URI!  IT IS NOT A HACK!  What part of the ABNF do you not
> understand?
[snip]
>   Again,
> 
> 		//example.com/path/to/resource
> 
> IS NOT A HACK!
[snip]
>   -spc
>

Hear, hear!  I was only going to list the Regex implementation[1] at the 
end of the RFC as proof that this wasn't a hack, but I appreciate your 
thoroughness in explanation.
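
For reference, a small sketch that applies the Appendix B regular
expression directly, using Python's re module (the choice of Python and the
helper name split_reference are assumptions for illustration). The group
numbers are the ones Appendix B assigns.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref: str) -> dict:
    """Break a URI reference into its five components (None if absent)."""
    m = URI_REFERENCE.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_reference("//example.com/path/to/resource"))
print(split_reference("example.com/path/to/resource"))
```

The first call reports an authority of "example.com" and no scheme; the
second reports neither, with the whole string as the path.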

This is, in fact, why I brought it up (apparently, again, sorry about 
that) at all -- the current gemini spec is incompatible in this way with 
the URI spec.  Since a goal of gemini is stated as not reinventing the 
wheel (okay, citation needed, but I think it's pretty much the ~feeling~ 
around here), we should stick to the pre-existing spec as much as 
possible.  I liked the suggested solution from spc (the multiple ones, 
they're all fine, in fact!) for the update in the spec.

I sincerely hope that 99% of geminauts are using URLs as we've discussed 
here, and I just want the spec to reflect their correct usage.

[1]: https://tools.ietf.org/html/rfc3986#appendix-B

-- 
~ acdw
acdw.net | breadpunk.club/~breadw

John Cowan <cowan (a) ccil.org>

📅 Sent: 2020-11-17 22:45
📧 Message 13 of 31

On Tue, Nov 17, 2020 at 5:10 PM Sean Conner <sean at conman.org> wrote:

>   The path parsing rules state a single slash.  Not '/'+, nor '/'*, but a
> single '/'.  The only place where more than a single slash is allowed PER
> THE @#%@#$@$ ABNF is just prior to the authority, which contains the
> hostname.  THE ONLY PLACE!
>

Correct.

> I will also draw your attention to the URI-reference rule, which is there
> for some reason, which allows both a full URI, or a RELATIVE URI, which
> means that
>
>                 //example.com/path/to/resource
>
> IS A VALID URI!  IT IS NOT A HACK!  What part of the ABNF do you not
> understand?
>

Nope.  It is a valid URI reference, because it is a valid relative
reference.  It is *not* a valid URI.

In what follows, I am going to assume that "URL" and "URI" are synonymous,
which they have been for 15 years since RFC 3986 was published.

> No, the spec allows both the full URI, and a relative URI as long as it
> starts with '//' (it has the authority section).  The wording in the spec
> is bad and should be changed to clarify it, but that's the current
> specification.
>

There are two cases:

1) In a Gemini-protocol request line (section 2), the second sentence says
that an absolute URL (that is, a URI without a fragment identifier) is
required.  The third sentence says that if the "scheme://" portion is
missing (in which case it is not a URI, much less an absolute URI), it
should be prefixed with "gemini://" and presumably reparsed.  That's
straightforward.

2) In a link line (section 5.4.2), we are told that there may be an
absolute or a relative URL.  There are no relative URIs, so we can only
interpret this as meaning a relative reference.  We are also told that if
the URL lacks a scheme (which is impossible: a URI always has a scheme)
then the scheme is "gemini".

Now suppose a link line in a resource that is available from
"gemini://example.com/public/this.gmi" has the form "foo/bar/baz.gmi".  We
can interpret this in one of two incompatible ways:

2a) a truncated version of "gemini://foo/bar/baz.gmi".  Note that "foo" is
a perfectly valid host name.

2b) a relative reference, in which case it resolves to
"gemini://example.com/public/foo/bar/baz.gmi".

So the spec is self-contradictory.  In my view interpretation 2a is bogus
and the sentence "If the URL does not include a scheme, a scheme of
gemini:// is implied" in section 5.4.2 should be removed.  What is more, I
would like to see the equivalent sentence "If the scheme of the URL is not
specified, a scheme of gemini:// is implied" removed as well.
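
The two readings can be put side by side in a short sketch. The function
names are hypothetical, and the relative-reference branch is deliberately
simplified: it performs only the path merge of RFC 3986 section 5.2.3 and
assumes a plain relative path with no "." or ".." segments, query, or
fragment.

```python
BASE = "gemini://example.com/public/this.gmi"
LINK = "foo/bar/baz.gmi"


def as_truncated_url(link: str) -> str:
    """Reading 2a: glue the 'implied' gemini:// prefix onto the link text."""
    return "gemini://" + link


def as_relative_reference(base: str, link: str) -> str:
    """Reading 2b: replace everything after the last '/' of the base."""
    return base.rsplit("/", 1)[0] + "/" + link


print(as_truncated_url(LINK))
# gemini://foo/bar/baz.gmi
print(as_relative_reference(BASE, LINK))
# gemini://example.com/public/foo/bar/baz.gmi
```

The same link text names two different hosts depending on which sentence
of the spec a client follows.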

> > but that's what it is, other URI parsers that are more strict with
> > compliance to the RFC will refuse to parse a URI without scheme
> > present,
>
>   If it does, it's broken by design.  Again, see the ABNF above.
>

It is precisely the ABNF line in RFC 3986 section 3 that says a URI (as
opposed to a URI reference) has to begin with a scheme.

John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
It's the old, old story.  Droid meets droid.  Droid becomes chameleon.
Droid loses chameleon, chameleon becomes blob, droid gets blob back
again.  It's a classic tale.  --Kryten, Red Dwarf

Link to individual message.

Petite Abeille <petite.abeille (a) gmail.com>

📅 Sent: 2020-11-17 22:49
📧 Message 14 of 31



> On Nov 17, 2020, at 23:10, Sean Conner <sean at conman.org> wrote:
> 
> belaboring the inanimate equus

Ohhh... pig latin, my favorite! Oggingflay away eadday orsehay!

Quidquid latine dictum sit, altum sonatur!

Link to individual message.

Alex // nytpu <alex (a) nytpu.com>

📅 Sent: 2020-11-17 23:07
📧 Message 15 of 31

I'm going to use a real-world example here because people seem to not
get why this may be a problem.

Let's say I want to start hosting the git repo for my utility
gemlog.sh[a] on gemini. I make a directory on my site, so the full url
would be `gemini://nytpu.com/gemlog.sh/`. Now, say I put a link in my
root index.gmi (`/`) linking to `gemlog.sh`[b]. This is a perfectly
valid link to a directory on my server, but this would instead be
interpreted as the url `gemini://gemlog.sh/` if you use the faulty
method of parsing. (`.sh` is a valid TLD[c] so it wouldn't work even if
you have a whitelist of tlds).


Now, there's a few options to prevent this from happening:

1) Ban periods in all directory names. You'd also have to ban them in
filenames, because what if I make the relative link to a file called
`command.com`? Requires large, breaking spec changes.

2) Instead of documents being served as-is and having clients parse
urls, instead force servers to rewrite all urls, checking if it is a
valid directory or not before serving. All clients only expect well-
formed, full urls, and all existing server implementations are in
violation. Requires large, breaking spec changes.

3) Require that links to directories must not be relative if they could
be confused as a uri host. This is an inconsistent, quick fix that is
very ambiguous, because one client may think it's a valid host while
others may not. It also puts the burden on the authors of documents,
because now they have to remember when relative links are allowed and
when they aren't, and test their documents on a variety of clients to
ensure that it is compatible with all their parsing methods. Requires
large, breaking spec changes.

4) Follow the carefully and clearly defined specification[d] that is
over 15 years old and is well-adopted by existing uri parsing libraries.
Requires minimal, non-breaking spec changes, purely for clarity.


I know which one I'd choose. Obviously option 1 is the only real option
here, the outlandish ones like option 4 just make no sense.

[a]: https://tildegit.org/nytpu/gemlog.sh
[b]: so the full line would read:
     `=> gemlog.sh a utility for managing gemlogs from the command line`
[c]: https://en.wikipedia.org/wiki/.sh
[d]: https://tools.ietf.org/html/rfc3986

-- 
Alex // nytpu
alex at nytpu.com
GPG Key: https://www.nytpu.com/files/pubkey.asc
Key fingerprint: 43A5 890C EE85 EA1F 8C88 9492 ECCD C07B 337B 8F5B
https://useplaintext.email/

Link to individual message.

Philip Linde <linde.philip (a) gmail.com>

📅 Sent: 2020-11-17 23:18
📧 Message 16 of 31

On Tue, 17 Nov 2020 17:45:50 -0500
John Cowan <cowan at ccil.org> wrote:

> In what follows, I am going to assume that "URL" and "URI" are synonymous,
> which they have been for 15 years since RFC 3986 was published.

That may not be an entirely uncontroversial assumption. URLs were
AFAIK last defined by the IETF in RFC 1808, where relative URLs were
first specified and the distinction became necessary. In RFC 1808, an
URL is either an absolute URL or a relative URL (analogous to
relative-ref). In that sense, an URL is rather analogous with
URI-reference of RFC 3986.

I completely agree on all other points, and the point above is only
further reason for clarification. What is and isn't an URL is a bit
loosey-goosey throughout, which is why RFC 3986 is welcome.

-- 
Philip

Link to individual message.

John Cowan <cowan (a) ccil.org>

📅 Sent: 2020-11-17 23:23
📧 Message 17 of 31

On Tue, Nov 17, 2020 at 6:07 PM Alex // nytpu <alex at nytpu.com> wrote:

> Let's say I want to start hosting the git repo for my utility
> gemlog.sh[a] on gemini. I make a directory on my site, so the full url
> would be `gemini://nytpu.com/gemlog.sh/`. Now, say I put a link in my
> root index.gmi (`/`) linking to `gemlog.sh`[b]. This is a perfectly
> valid link to a directory on my server, but this would instead be
> interpreted as the url `gemini://gemlog.sh/` if you use the faulty
> method of parsing. (`.sh` is a valid TLD[c] so it wouldn't work even if
> you have a whitelist of tlds).
>

In any case, nothing says a hostname has to be absolute.  If your hostname
is "client.example.com" then you can refer to "server.example.com" as
simply "server".  The only way to tell if "server" is a meaningful host is
to ask the DNS, and the answer can change.

> 4) Follow the carefully and clearly defined specification[d] that is
> over 15 years old and is well-adopted by existing uri parsing libraries.
> Requires minimal, non-breaking spec changes, purely for clarity.
>

Requires a small breaking spec change to remove the sentence about
defaulting to "gemini://" in 5.4.2 and preferably in 2 as well.  But 5.4.2
is self-contradictory and has to be fixed.

My proposal is to rewrite section 2 to say this:

<URL> is an absolute URL according to RFC 3986, of maximum length 1024
bytes.

And to rewrite section 5.4.2 to say this:

<URL> is a URI reference according to RFC 3986.
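
A rough sketch of what the proposed section 2 wording would mean for a server checking a request line, written in Python. The function name is_acceptable_request_url and the constant MAX_URL_BYTES are hypothetical, and requiring a host is an extra assumption (RFC 3986's absolute-URI rule only demands a scheme and forbids a fragment).

```python
from urllib.parse import urlsplit

MAX_URL_BYTES = 1024  # limit quoted from section 2 of the Gemini spec


def is_acceptable_request_url(url: str) -> bool:
    """Accept only absolute URLs: a scheme, a host, and no fragment."""
    if len(url.encode("utf-8")) > MAX_URL_BYTES:
        return False
    parts = urlsplit(url)
    if not parts.scheme:
        return False  # "example.com/path" and "//example.com/path" land here
    if not parts.netloc:
        return False  # assumption: a Gemini request always names a host
    if "#" in url:
        return False  # absolute-URI has no fragment part
    return True


for candidate in ("gemini://example.com/path",
                  "//example.com/path",
                  "example.com/path"):
    print(candidate, is_acceptable_request_url(candidate))
```

Under the proposed section 5.4.2 wording, link lines would instead go through ordinary URI-reference resolution against the URL of the document containing them.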

Link to individual message.

Philip Linde <linde.philip (a) gmail.com>

📅 Sent: 2020-11-17 23:33
📧 Message 18 of 31

On Tue, 17 Nov 2020 16:07:08 -0700
Alex // nytpu <alex at nytpu.com> wrote:

> I'm going to use a real-world example here because people seem to not
> get why this may be a problem.
> 
> Let's say I want to start hosting the git repo for my utility
> gemlog.sh[a] on gemini. I make a directory on my site, so the full url
> would be `gemini://nytpu.com/gemlog.sh/`. Now, say I put a link in my
> root index.gmi (`/`) linking to `gemlog.sh`[b]. This is a perfectly
> valid link to a directory on my server, but this would instead be
> interpreted as the url `gemini://gemlog.sh/` if you use the faulty
> method of parsing. (`.sh` is a valid TLD[c] so it wouldn't work even if
> you have a whitelist of tlds).

I think that we all actually agree that this can't possibly work for
links. What Ali Fardan is suggesting is to allow suffix references only
in requests, where the ambiguity could be avoided for the simple reason
that the request must contain an authority.

I completely disagree that suffix references should be used anywhere,
but the suggestion is not quite so outlandish as to require any of
options 1-3. It should be avoided for the simple reason that it
precludes option 4.

-- 
Philip

Link to individual message.

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-11-17 23:46
📧 Message 19 of 31

It was thus said that the Great John Cowan once stated:
> 
> Requires a small breaking spec change to remove the sentence about
> defaulting to "gemini://" in 5.4.2 and preferably in 2 as well.  But 5.4.2
> is self-contradictory and has to be fixed.
> 
> My proposal is to rewrite section 2 to say this:
> 
> <URL> is an absolute URL according to RFC 3986, of maximum length 1024
> bytes.
> 
> And to rewrite section 5.4.2 to say this:
> 
> <URL> is a URI reference according to RFC 3986.

  I've gone over the past month of logs [1] on my Gemini server and pulled
some stats.

	Total number of requests:		103,422
	Total number of schemeless requests:	    275

And of the schemeless requests:

	client #1	  2 requests
	client #2	  3 requests
	client #3	270 requests

  Given the relative rarity of such requests (about 0.3% of all requests) and
the number of clients making schemeless requests (between 0.3% and 8% [2]) I
would agree with this proposal.  A Gemini request is an absolute URL (per
RFC-3986).
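
For anyone who wants to recheck the arithmetic, the shares can be recomputed from the counts given in this message and in footnote [2]. The variable names below are made up for the illustration.

```python
# Counts taken from the message and its footnote [2].
total_requests = 103_422
schemeless_requests = 275
schemeless_clients = 3
known_clients = 37
unique_ips = 1_187


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"schemeless requests: {percent(schemeless_requests, total_requests):.2f}%")
# schemeless requests: 0.27%
print(f"upper bound on clients: {percent(schemeless_clients, known_clients):.1f}%")
# upper bound on clients: 8.1%
print(f"lower bound on clients: {percent(schemeless_clients, unique_ips):.2f}%")
# lower bound on clients: 0.25%
```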

  -spc

[1]	It's all I keep

[2]	Okay, on the Gemini software page [3], I count 37 known clients. 
	There are some others not listed, like CAPCOM, Spacewalk and GUS,
	but even excluding those, 3 out of 37 is 8%.  And assuming that all
	1,187 unique IP addresses were using a unique client, then the
	percentage falls to 0.3%.  The truth is somewhere in between.

	Also, my server probably gets hit by *every* client, as it serves up
	the Gemini Client Torture test.

[3]	https://portal.mozz.us/gemini/gemini.circumlunar.space/software/

Link to individual message.

Waweic <waweic (a) activ.ism.rocks>

📅 Sent: 2020-11-18 00:51
📧 Message 20 of 31

Sean Conner wrote:

> The path parsing rules state a single slash.  Not '/'+, nor '/'*, but a
> single '/'.  The only place where more than a single slash is allowed PER
> THE @#%@#$@$ ABNF is just prior to the authority, which contains the
> hostname.  THE ONLY PLACE!

I am currently working on a bug in lagrange concerning this question.
It appeared to me, that multiple consecutive slashes might also be
allowed in the query, according to the ABNF, but I may be very wrong
there.

Link to individual message.

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-11-18 02:02
📧 Message 21 of 31

It was thus said that the Great Waweic once stated:
> Sean Conner wrote:
> 
> > The path parsing rules state a single slash.  Not '/'+, nor '/'*, but a
> > single '/'.  The only place where more than a single slash is allowed PER
> > THE @#%@#$@$ ABNF is just prior to the authority, which contains the
> > hostname.  THE ONLY PLACE!
> 
> I am currently working on a bug in lagrange concerning this question.
> It appeared to me, that multiple consecutive slashes might also be
> allowed in the query, according to the ABNF, but I may be very wrong
> there.

  In the query section, yes, it should be.  In the path section, it should
be disallowed.  Unfortunately, I checked the ABNF in RFC-3986 and it does
appear to allow double slashes in the path section.  The rules in question:

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )

   segment       = *pchar

  A segment can be 0 or more characters, so per the spec, you could end up
with multiple slashes, and the URL parsing library I use, written against the
ABNF of RFC-3986, does in fact, accept it:

	["path//to//resource"] =
	{
	  path = "path//to//resource",
	}

  There's nothing in the errata [1] about this, but it seems like it should
be fixed.
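
  The effect of an empty segment can be reproduced with a direct regex transcription of the rules quoted above. This is a Python sketch rather than the LPEG parser mentioned in this message, and the names PCHAR, PATH_ABEMPTY, PATH_ROOTLESS and matches are chosen just for the illustration.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = f"{PCHAR}*"      # segment    = *pchar  (may be empty)
SEGMENT_NZ = f"{PCHAR}+"   # segment-nz = 1*pchar

# path-abempty  = *( "/" segment )
PATH_ABEMPTY = re.compile(f"(?:/{SEGMENT})*")
# path-rootless = segment-nz *( "/" segment )
PATH_ROOTLESS = re.compile(f"{SEGMENT_NZ}(?:/{SEGMENT})*")


def matches(rule: re.Pattern, text: str) -> bool:
    """True when the whole of text is derivable from rule."""
    return rule.fullmatch(text) is not None


print(matches(PATH_ROOTLESS, "path//to//resource"))  # True
print(matches(PATH_ABEMPTY, "//"))                   # True
print(matches(PATH_ROOTLESS, "/resource"))           # False
```

  Because segment may match zero characters, every "//" inside a path is simply an empty segment between two slashes.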

  -spc

[1]	https://www.rfc-editor.org/errata_search.php?rfc=3986

Link to individual message.

Philip Linde <linde.philip (a) gmail.com>

📅 Sent: 2020-11-18 07:33
📧 Message 22 of 31

On Tue, 17 Nov 2020 21:02:09 -0500
Sean Conner <sean at conman.org> wrote:

>   There's nothing in the errata [1] about this, but it seems like it should
> be fixed.

Nothing needs to be fixed. Zero length path segments are allowed in
some circumstances, but they are never allowed in a circumstance where
they could cause ambiguities. For this purpose, there are multiple
definitions of path segments, with -nz (non-empty) and -nz-nc
(non-empty, no colon) suffixes:

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

You can see that relative-ref is designed in such a way as to disallow
any ambiguity, by only allowing path-absolute (which starts with a
single slash and a non-empty segment), path-noscheme (which starts with
a non-empty segment not containing a colon) or path-empty (which is
zero characters):

      relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

      relative-part = "//" authority path-abempty
                    / path-absolute
                    / path-noscheme
                    / path-empty

The "path" definition itself can not be distinguished from a
relative-ref or relative-part, but the path definition is never used by
any other definition in the document. If parsing a relative-ref or
URI-reference, this is never a problem.
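
A small Python sketch of how the first characters alone select the relative-part alternative. The function name classify_relative_part is hypothetical, and the sketch only looks at structure; it does not validate individual characters against pchar.

```python
def classify_relative_part(ref: str) -> str:
    """Name the relative-part alternative a scheme-less reference falls under."""
    # Drop the optional fragment and query; only the relative-part matters.
    part = ref.split("#", 1)[0].split("?", 1)[0]
    if part.startswith("//"):
        return "authority"        # "//" authority path-abempty
    if part.startswith("/"):
        return "path-absolute"    # one slash; the next segment is non-empty
    if part == "":
        return "path-empty"
    first_segment = part.split("/", 1)[0]
    if ":" in first_segment:
        # A colon in the first segment would read as a scheme delimiter,
        # so path-noscheme forbids it.
        return "not a relative-ref"
    return "path-noscheme"


for ref in ("//example.com/x", "/x//y", "foo/bar", "", "a:b/c"):
    print(repr(ref), "->", classify_relative_part(ref))
```

The checks never overlap: a leading "//" always means an authority follows, so an empty first path segment cannot be mistaken for one.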

--
Philip

Link to individual message.

Sudipto Mallick <smallick.dev (a) gmail.com>

📅 Sent: 2020-11-18 07:52
📧 Message 23 of 31

While you are discussing the specs, please have a look at how
the servers are currently responding to the edge cases.

http://ix.io/2EyQ

Request -> Response (first line only)
The list of known servers is from gemini://gus.guru/known-hosts, with
all nonexistent servers and *.flounder.online removed.
Test yourself: http://ix.io/2Etk

And if you can, forgive my madness.

Link to individual message.

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-11-18 08:42
📧 Message 24 of 31

It was thus said that the Great Sudipto Mallick once stated:
> While you are discussing about the specs, please have a look at how
> the servers are currently responding to the edge cases.
> 
> http://ix.io/2EyQ
> 
> Request -> Response (first line only)
> The list of known servers from gemini://gus.guru/known-hosts : removed
> all non existent servers and *.flounder.online
> Test yourself: http://ix.io/2Etk
> 
> And if you can, forgive my madness.

  Thank you for running this and reporting the results.  I can describe why
you got the results for my server: gemini.conman.org

	gemini.conman.org -> 59 Bad Request
	gemini.conman.org/ -> 59 Bad Request
	gemini.conman.org// -> 59 Bad Request

  These are bad because there's no scheme nor authority (missing a '//') and
thus, these are marked as a bad request.

	//gemini.conman.org -> 20 text/gemini
	//gemini.conman.org/ -> 20 text/gemini
	//gemini.conman.org// -> 59 Bad Request

  These are missing the scheme, but have an authority section [1].  The URL
parser I use adds a '/' for the path if the path does not exist.  That's why
my server does not do a 31-redirect with a missing '/' at the end.  The
double slash at the end is being checked by a modified path-abempty rule. 
The ABNF from the RFC is:

	   path-abempty  = *( "/" segment )

while the URL parser I'm using is doing:

	   path_abempty <- {~ ( '/' segment)+ ~}
                        /  '' -> '/'

  The parsing code is in LPEG [2] and is equivalent to

	   path-abempty = +( "/" segment)
			/ 0<pchar> # and return a '/'

and was written that way to fix an issue inherent with the ABNF of
"0<pchar>" and how parsing works with LPEG.  I can go into details of LPEG
if anyone is interested, but suffice to say, the path_abempty of LPEG is
different from the ABNF of the RFC for a good reason, and this is why the
trailing '//' from the authority section is not parsing.

	gemini://gemini.conman.org -> 20 text/gemini
	gemini://gemini.conman.org/ -> 20 text/gemini
	gemini://gemini.conman.org// -> 59 Bad Request

  A more normal request, and the same explanation from above.  No surprises
for my server (at least, to me).  A more interesting response is from
blekksprut.net and cadence.moe:

	blekksprut.net -> 20 text/gemini
	blekksprut.net/ -> 20 text/gemini
	blekksprut.net// -> 20 text/gemini
	//blekksprut.net -> 51 not found
	//blekksprut.net/ -> 51 not found
	//blekksprut.net// -> 51 not found
	gemini://blekksprut.net -> 20 text/gemini
	gemini://blekksprut.net/ -> 20 text/gemini
	gemini://blekksprut.net// -> 20 text/gemini

	cadence.moe -> 20 text/gemini; charset=utf-8; lang=en
	cadence.moe/ -> 20 text/gemini; charset=utf-8; lang=en
	cadence.moe// -> 20 text/gemini; charset=utf-8; lang=en
	//cadence.moe -> 50 Bliz server: Not found: //cadence.moe
	//cadence.moe/ -> 50 Bliz server: Not found: //cadence.moe/
	//cadence.moe// -> 50 Bliz server: Not found: //cadence.moe//
	gemini://cadence.moe -> 20 text/gemini; charset=utf-8; lang=en
	gemini://cadence.moe/ -> 20 text/gemini; charset=utf-8; lang=en
	gemini://cadence.moe// -> 20 text/gemini; charset=utf-8; lang=en

  These results probably stem from the same issue, but possibly different
servers.  Just going quickly through the results, if there was no problem
with the first grouping (just the domain name), it seems the servers *have* an
issue with the second grouping (leading '//').  Odd.

  Again, thanks for this.

  -spc

[1]	I've been debating if I should mark a missing scheme as a "bad
	request" as I've come around to support that a Gemini server should
	ONLY accept an absolute URL.  I haven't ... yet.

[2]	Lua Parsing Expression Grammar

Link to individual message.

bie <bie (a) 202x.moe>

📅 Sent: 2020-11-18 09:09
📧 Message 25 of 31

On Wed, Nov 18, 2020 at 03:42:57AM -0500, Sean Conner wrote:

>   A more normal request, and the same explanation from above.  No surprises
> for my server (at least, to me).  A more interesting response is from
> blekksprut.net and cadence.moe:
>
> [...] 
>
>   These results probably stem from the same issue, but possibly different
> servers.

They're definitely different servers - blekksprut.net is running on my
own code...
The results were a little surprising, so I'm going to be doing some
bugfixing again tonight ;)

>   Again, thanks for this.

Seconding the thanks, this is great stuff!

bie

Link to individual message.

Sudipto Mallick <smallick.dev (a) gmail.com>

📅 Sent: 2020-11-18 15:29
📧 Message 26 of 31

Statistics from the data I collected:

request
    response code -> percentage
    :
    :

"$host"
    59 -> 55%
    53 -> 22%
    20 -> 4.8%

"$host/"
    59 -> 55%
    53 -> 22%
    51 -> 7.7%
    20 -> 6.8%

"$host//"
    59 -> 55%
    53 -> 22%
    51 -> 7.7%
    20 -> 6.4%

"//$host"
    31 -> 55% (!)
    20 -> 29%
    59 -> 12%
    51 -> 7.7%

"//$host/"
    20 -> 67%
    59 -> 12%
    51 -> 7%
    53 -> 2%
    50 -> 2%

"//$host/"
    20 -> 61%
    59 -> 15%
    51 -> 10%

"gemini://$host"
    31 -> 57.6% (!!)
    20 -> 34%
    30 -> 1.6%

"gemini://$host/"
    20 -> 93%

"gemini://$host//"
    20 -> 84%
    51 -> 6%

out of http://ix.io/2EzQ

Link to individual message.

Sudipto Mallick <smallick.dev (a) gmail.com>

📅 Sent: 2020-11-18 15:35
📧 Message 27 of 31

Clicked send button too fast...
the second "//$host/" should be "//$host//"

Link to individual message.

Philip Linde <linde.philip (a) gmail.com>

📅 Sent: 2020-11-18 15:46
📧 Message 28 of 31

On Wed, 18 Nov 2020 20:59:53 +0530
Sudipto Mallick <smallick.dev at gmail.com> wrote:

Very interesting and good summary, Sudipto.

> "//$host"
>     31 -> 55% (!)
>     20 -> 29%
>     59 -> 12%
>     51 -> 7.7%

There is probably some overlap here with hosts that generally serve
redirects for empty paths.

> "gemini://$host"
>     31 -> 57.6% (!!)
>     20 -> 34%
>     30 -> 1.6%
> 
> "gemini://$host/"
>     20 -> 93%

This is alarming IMO. I have expressed it before in the mailing list,
but because of the normalization rules of RFC 3986, an empty path is
*equivalent* to the path "/". Serving a 3x redirect on one and a page on
the other is wrong.

In this case it's likely rather benign that they serve different
content, because I assume that a client will arrive at the same resource
after following a redirect, but it has to be understood that a client
might make these generalizations as well, in which case that client
can't access the resource that's served when requesting an empty path.

It would be interesting to figure out which server software is the
culprit.
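
As an illustration of the normalization in question (RFC 3986 section 6.2.3), a server or client could fold the empty path into "/" before comparing or dispatching. This is a minimal Python sketch; the function name normalize_empty_path is hypothetical, and other normalization steps such as case folding are deliberately left out.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_empty_path(url: str) -> str:
    """Rewrite an empty path after an authority as "/"."""
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(normalize_empty_path("gemini://example.com"))
# gemini://example.com/
print(normalize_empty_path("gemini://example.com/"))
# gemini://example.com/
```

With this applied first, both spellings of the root URL reach the same handler and get the same response.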

-- 
Philip

Link to individual message.

Remco <me (a) rwv.io>

📅 Sent: 2020-11-18 16:16
📧 Message 29 of 31

2020/11/18 16:46, Philip Linde:

>> "gemini://$host"
>>     31 -> 57.6% (!!)
>>     20 -> 34%
>>     30 -> 1.6%
>>
>> "gemini://$host/"
>>     20 -> 93%
>
> This is alarming IMO. I have expressed it before in the mailing list,
> but because of the normalization rules of RFC 3986, an empty path is
> *equivalent* to the path "/". Serving a 3x redirect on one and a page on
> the other is wrong.
>
> In this case it's likely rather benign that they serve different
> content, because I assume that a client will arrive at the same resource
> after following a redirect, but it has to be understood that a client
> might make these generalizations as well, in which case that client
> can't access the resource that's served when requesting an empty path.
>
> It would be interesting to figure out which server software is the
> culprit.

If you want to point fingers:

  https://github.com/michael-lazar/gemini-diagnostics/blob/master/gemini-diagnostics#L440

That's what I based my implementation on and I suspect many others did
so too.

R.

Link to individual message.

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-11-18 21:09
📧 Message 30 of 31

It was thus said that the Great Remco once stated:
> 2020/11/18 16:46, Philip Linde:
> 
> >> "gemini://$host"
> >>     31 -> 57.6% (!!)
> >>     20 -> 34%
> >>     30 -> 1.6%
> >>
> >> "gemini://$host/"
> >>     20 -> 93%
> >
> > This is alarming IMO. I have expressed it before in the mailing list,
> > but because of the normalization rules of RFC 3986, an empty path is
> > *equivalent* to the path "/". Serving a 3x redirect on one and a page on
> > the other is wrong.
> >
> > In this case it's likely rather benign that they serve different
> > content, because I assume that a client will arrive at the same resource
> > after following a redirect, but it has to be understood that a client
> > might make these generalizations as well, in which case that client
> > can't access the resource that's served when requesting an empty path.
> >
> > It would be interesting to figure out which server software is the
> > culprit.
> 
> If you want to point fingers:
> 
>   https://github.com/michael-lazar/gemini-diagnostics/blob/master/gemini-diagnostics#L440
> 
> That's what I based my implementation on and I suspect many others did
> so too.

  The test isn't *wrong* per se, it's just testing at the wrong level.  My
server will return:

	gemini://gemini.conman.org	-> 20
	gemini://gemini.conman.org/	-> 20

but

	gemini://gemini.conman.org/test	-> 31 gemini://gemini.conman.org/test/

  which is what that test is testing.  

  -spc

Link to individual message.

Michael Lazar <lazar.michael22 (a) gmail.com>

📅 Sent: 2020-11-19 19:37
📧 Message 31 of 31

On Wed, Nov 18, 2020 at 4:10 PM Sean Conner <sean at conman.org> wrote:
>
> It was thus said that the Great Remco once stated:
> >
> > If you want to point fingers:
> >
> > https://github.com/michael-lazar/gemini-diagnostics/blob/master/gemini-diagnostics#L440
> >
> > That's what I based my implementation on and I suspect many others did
> > so too.
>
>  The test isn't *wrong* per se, it's just testing at the wrong level. My
> server will return:
>
>     gemini://gemini.conman.org   -> 20
>     gemini://gemini.conman.org/   -> 20
>
> but
>
>     gemini://gemini.conman.org/test -> 31 gemini://gemini.conman.org/test/
>
>   which is what that test is testing.
>
>   -spc

Yes that's probably what I meant to do. It was difficult to write many of the
tests because they can't assume that any particular directory exists on the
server. I didn't realize that the root URL was special in this regard.

I think this is an interesting problem for the gemini protocol. In HTTP you
typically only have one way to write out this request so it's never a problem:

GET / HTTP/1.1

Even though "gemini://example.com" and "gemini://example.com/" are supposed to
be identical per the URL definition, good luck getting gemini developers to
read through 100+ pages of RFCs and implement this correctly.

- Michael

Link to individual message.
