💾 Archived View for gemi.dev › gemini-mailing-list › 000016.gmi captured on 2023-11-04 at 12:18:24. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

URLs in request lines

plugd <plugd (a) thelambdalab.xyz>

Hi all,

This question occurred to me when debugging some 5x error responses from
servers: how strict are servers expected to be when responding to
requests?  In particular, if a client is given a URL such as

gemini://example.com

(i.e. with an empty file name) is it expected to translate it into

gemini://example.com/  ?

The gopher-side of elpher tends to be on the strict side when
representing URLs, due to RFC 1436 strongly emphasizing that gopher
selectors should be regarded as meaningless "opaque" strings which
aren't to be messed with. (Kind of like actor addresses in the actor
model of computation.) So there is _no_ guarantee that a server will
return something sensible for the selector taken from the URL
gopher://example.com/1/ if all you've been given is
gopher://example.com/1.

Should I keep this behaviour for gemini too?  TBH I'd prefer at least to
be able to interpret gemini://example.com as gemini://example.com/ when
no filename is present, and leave everything else alone.  Which would
mean the "/" selector would become the equivalent of "" in gopher.

plugd

Link to individual message.

Sean Conner <sean (a) conman.org>

It was thus said that the Great plugd once stated:
> Hi all,
> 
> This question occurred to me when debugging some 5x error responses from
> servers: how strict are servers expected to be when responding to
> requests?  In particular, if a client is given a URL such as
> 
> gemini://example.com
> 
> (i.e. with an empty file name) is it expected to translate it into
> 
> gemini://example.com/  ?

  Yes.  In RFC-3986, if you follow the BNF, you'll find this bit:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

   path-abempty  = *( "/" segment )

  When parsing a URL like:

	gemini://example.com

we have the 'scheme' portion, then the two '//' which means we're following
the first rule in 'hier-part'.  'authority' is the host part (which I didn't
include) followed by a 'path-abempty', of which there can be 0 or more of,
so that's a perfectly cromulent URL.  It's the responsibility of the
*server* to handle the situation, not the client.

  Semantically speaking, these:

	gemini://example.com
	gemini://example.com/

are the same.

  A related problem is a relative URL like:

	/test/torture/../../test/./torture/../../../../test/./././torture/./0022

  That collapses down to:

	/test/torture/0022

but in this case, I think it would be the responsibility of the *client* to
collapse the first into the second (and then add the scheme, host, etc.).  I
know my own server does the collapsing, but I think I may change that.

  -spc

Link to individual message.

plugd <plugd (a) thelambdalab.xyz>

Sean Conner writes:
> It was thus said that the Great plugd once stated:
>> Hi all,
>> 
>> This question occurred to me when debugging some 5x error responses from
>> servers: how strict are servers expected to be when responding to
>> requests?  In particular, if a client is given a URL such as
>> 
>> gemini://example.com
>> 
>> (i.e. with an empty file name) is it expected to translate it into
>> 
>> gemini://example.com/  ?
>
>   Yes.  In RFC-3986, if you follow the BNF, you'll find this bit:

[snip]

Great, thank you for the explanation.  (I should have RTFM, sorry!)

>
> we have the 'scheme' portion, then the two '//' which means we're following
> the first rule in 'hier-part'.  'authority' is the host part (which I didn't
> include) followed by a 'path-abempty', of which there can be 0 or more of,
> so that's a perfectly cromulent URL.  It's the responsibility of the
> *server* to handle the situation, not the client.

Converting to this canonical form actually makes several things easier
on the client side too.

>   A related problem is a relative URL like:
>
> 	/test/torture/../../test/./torture/../../../../test/./././torture/./0022
>
>   That collapses down to:
>
> 	/test/torture/0022
>
> but in this case, I think it would be the responsibility of the *client* to
> collapse the first into the second (and then add the scheme, host, etc.).  I
> know my own server does the collapsing, but I think I may change that.

Fair enough.  The URL library elpher uses doesn't automatically do this,
but it seems simple enough to do (famous last words)..

Thanks for the help,

plugd

Link to individual message.

Sean Conner <sean (a) conman.org>

It was thus said that the Great plugd once stated:
> Sean Conner writes:
> >   A related problem is a relative URL like:
> >
> > 	/test/torture/../../test/./torture/../../../../test/./././torture/./0022
> >
> >   That collapses down to:
> >
> > 	/test/torture/0022
> >
> > but in this case, I think it would be the responsibility of the *client* to
> > collapse the first into the second (and then add the scheme, host, etc.).  I
> > know my own server does the collapsing, but I think I may change that.
> 
> Fair enough.  The URL library elpher uses doesn't automatically do this,
> but it seems simple enough to do (famous last words)..

  The algorithm for that is described in  RFC-3986, section 5.2.4.  If you
can handle the examples from the RFC, you should be good to go.
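
  A minimal Python sketch of that algorithm, written by following the steps
(A through E) of RFC-3986, section 5.2.4.  The function name
remove_dot_segments is an illustrative choice and not taken from any
library mentioned in this thread; treat it as a starting point rather than
a vetted implementation.

```python
def remove_dot_segments(path):
    """Collapse "." and ".." segments, following RFC 3986 section 5.2.4."""
    output = []   # completed segments, each kept with its leading "/"
    buf = path    # the "input buffer" from the RFC
    while buf:
        # Step A: drop a leading "../" or "./"
        if buf.startswith("../"):
            buf = buf[3:]
        elif buf.startswith("./"):
            buf = buf[2:]
        # Step B: "/./" (or a final "/.") becomes "/"
        elif buf.startswith("/./"):
            buf = buf[2:]
        elif buf == "/.":
            buf = "/"
        # Step C: "/../" (or a final "/..") becomes "/" and removes
        # the most recently completed segment, if there is one
        elif buf.startswith("/../"):
            buf = buf[3:]
            if output:
                output.pop()
        elif buf == "/..":
            buf = "/"
            if output:
                output.pop()
        # Step D: a lone "." or ".." is discarded
        elif buf in (".", ".."):
            buf = ""
        # Step E: move the first segment (with its leading "/", if any)
        # from the input buffer to the output
        else:
            end = buf.find("/", 1)
            if end == -1:
                end = len(buf)
            output.append(buf[:end])
            buf = buf[end:]
    return "".join(output)


# The two examples given in RFC 3986 section 5.2.4
print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```

  Feeding it the long /test/torture/... path quoted above yields
/test/torture/0022.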

  -spc

Link to individual message.

plugd <plugd (a) thelambdalab.xyz>

Hi again Sean,

Sean Conner writes:
> we have the 'scheme' portion, then the two '//' which means we're following
> the first rule in 'hier-part'.  'authority' is the host part (which I didn't
> include) followed by a 'path-abempty', of which there can be 0 or more of,
> so that's a perfectly cromulent URL.  It's the responsibility of the
> *server* to handle the situation, not the client.

I just read over this again and realised I'd been too hasty in my
earlier response.  You point out that according to the URI RFC an empty
path is a valid URL, and while this is good to know, does the following
necessarily follow?

>   Semantically speaking, these:
>
> 	gemini://example.com
> 	gemini://example.com/
>
> are the same.

For gopher, gopher://example.com/1 and gopher://example.com/1/ are not
semantically the same. (Although they are often - but not always -
treated as such.)  Section 6.2.3 on scheme-based normalization notes
that http://example.com and http://example.com/ are semantically
equivalent, and goes on to suggest that URIs of other schemes _should_
follow this example.  So I suppose we now say that gemini does?

plugd

Link to individual message.

Sean Conner <sean (a) conman.org>

It was thus said that the Great plugd once stated:
> Hi again Sean,

 Hello, plugd.

> Sean Conner writes:
> > we have the 'scheme' portion, then the two '//' which means we're following
> > the first rule in 'hier-part'.  'authority' is the host part (which I didn't
> > include) followed by a 'path-abempty', of which there can be 0 or more of,
> > so that's a perfectly cromulent URL.  It's the responsibility of the
> > *server* to handle the situation, not the client.
> 
> I just read over this again and realised I'd been too hasty in my
> earlier response.  You point out that according to the URI RFC an empty
> path is a valid URL, and while this is good to know, does the following
> necessarily follow?
> 
> >   Semantically speaking, these:
> >
> > 	gemini://example.com
> > 	gemini://example.com/
> >
> > are the same.
> 
> For gopher, gopher://example.com/1 and gopher://example.com/1/ are not
> semantically the same. (Although they are often - but not always -
> treated as such.)  Section 6.2.3 on scheme-based normalization notes
> that http://example.com and http://example.com/ are semantically
> equivalent, and goes on to suggest that URIs of other schemes _should_
> follow this example.  So I suppose we now say that gemini does?

  The URL spec is RFC-3986.  Gopher gets its own URL RFC with RFC-4266.  One
major difference is in the query portion.  To send in a "query" string with
a non-gopher URL, you do:

	http://example.com/?search%20for%20me     (yes, this is valid)

  The same example for Gopher would be:

	gopher://example.com/7search%09look%20for%20me

  It does NOT use the normal query syntax for URLs.  In fact, RFC-4266 even
states:

   A Gopher URL takes the form:

      gopher://<host>:<port>/<gopher-path>

   ...

   Within the <gopher-path>, no characters are reserved.

  So the intent (in my opinion) is that one can decode the <gopher-path>
portion and pass it (minus the first character) verbatim to a gopher server
(of course after decoding any URL-encoded characters, which means that %09
is translated to an ASCII HT (horizontal tab)).  Had Gopher been more in line
with RFC-3986, then a gopher URL might be more like:

	gopher://example.com/7search?look%20for%20me

but I suspect this wasn't done because of Gopher+, which is covered in
RFC-4266 but I don't know of *any* servers today that support it (although
I'm willing to be corrected on that).  The Gopher+ information, is, of
course, separated from the search portion by another %09 in the URL (see
RFC-4266 section 2.9 for a crazy example of that).
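
  As a rough illustration of that decoding, here is a Python sketch that
splits a gopher URL into item type, selector and search string.  The
function name split_gopher_url and the dictionary keys are made up for this
example, Gopher+ fields after a second tab are ignored, and the default
type of "1" for an empty path is taken from RFC-4266.

```python
from urllib.parse import unquote


def split_gopher_url(url):
    """Split a gopher URL into authority, item type, selector and search."""
    prefix = "gopher://"
    if not url.startswith(prefix):
        raise ValueError("not a gopher URL: %r" % url)
    # Everything after the first "/" is the <gopher-path>; nothing in it
    # is reserved, so no further URL-style splitting is attempted.
    authority, _, gopher_path = url[len(prefix):].partition("/")
    decoded = unquote(gopher_path)        # "%09" becomes a literal tab
    item_type = decoded[:1] or "1"        # an empty path means a type 1 menu
    fields = decoded[1:].split("\t")      # selector, then optional search
    return {
        "authority": authority,
        "type": item_type,
        "selector": fields[0],
        "search": fields[1] if len(fields) > 1 else None,
    }


link = split_gopher_url("gopher://example.com/7search%09look%20for%20me")
print(link["type"], link["selector"], link["search"])
```

  The selector is handed back exactly as decoded, with no attempt to
interpret slashes or dots inside it.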

  So, the upshot (as I see it) is that the gopher URL format is divorced
from the RFC-3986 URL and is its own thing.  You can't really say they have
the same semantic rules.  This is also reflected in the caps.txt file you
will sometimes find on gopher servers to address the bit in RFC-1436 that
gopher selectors are opaque and *no* meaning is to be inferred by the
client.

  As far as Gemini goes, I've been parsing Gemini URLs under RFC-3986, just
like http:, https:, ftp: and file:.

  -spc (Did that answer your question?)

Link to individual message.

Sean Conner <sean (a) conman.org>

It was thus said that the Great Sean Conner once stated:
>
>   So, the upshot (as I see it) is that the gopher URL format is divorced
> from the RFC-3986 URL and is its own thing.  You can't really say they have
> the same semantic rules.  This is also reflected in the caps.txt file you
> will sometimes find on gopher servers to address the bit in RFC-1436 that
> gopher selectors are opaque and *no* meaning is to be inferred by the
> client.
> 
>   As far as Gemini goes, I've been parsing Gemini URLs under RFC-3986, just
> like http:, https:, ftp: and file:.

  To further clarify things, I have one module to parse gopher URLs [1], and
another one to parse other URLs [2].  And because of how they were written,
I can parse both types of URLs very easily:

	local dump = require "org.conman.table".dump
	local url  = require "org.conman.parsers.url.gopher"
	           + require "org.conman.parsers.url" -- [3]

	dump("link",url:match "gopher://example.com/7search%09look%20for%20me")
	dump("link",url:match "gemini://example.com/search?look%20for%20me")

	link =
	{
	  type = "search",
	  port = 70.000000,
	  scheme = "gopher",
	  host = "example.com",
	  search = "look for me",
	  selector = "search",
	}
	
	link =
	{
	  port = 1965.000000,
	  scheme = "gemini",
	  host = "example.com",
	  path = "/search",
	  query = "look%20for%20me", -- [4]
	}

  A generic parsing of the gopher URL will result in:

	link =
	{
	  scheme = "gopher",
	  host = "example.com",
	  path = "/7search\tlook for me",
	}

  -spc (Gopher really is its own thing ... )

[1]	https://github.com/spc476/LPeg-Parsers/blob/master/url/gopher.lua

[2]	https://github.com/spc476/LPeg-Parsers/blob/master/url.lua

[3]	Yes, I'm adding the results of loading two modules.  This only works
	because of what I'm returning (an LPEG expression [5]), and yes, it
	would look weird to a seasoned Lua programmer.

[4]	I have reasons for not decoding the query string in generic URLs.

[5]	http://www.inf.puc-rio.br/~roberto/lpeg/

Link to individual message.

plugd <plugd (a) thelambdalab.xyz>

Sean Conner writes:
>   So, the upshot (as I see it) is that the gopher URL format is divorced
> from the RFC-3986 URL and is its own thing.  You can't really say they have
> the same semantic rules.  This is also reflected in the caps.txt file you
> will sometimes find on gopher servers to address the bit in RFC-1436 that
> gopher selectors are opaque and *no* meaning is to be inferred by the
> client.

Completely agree that gopher is special here, but my reading of RFC-3986
section 6.2.3 was that it actually allows for some scheme-dependent
behaviour anyway.

>   As far as Gemini goes, I've been parsing Gemini URLs under RFC-3986, just
> like http:, https:, ftp: and file:.
>
>   -spc (Did that answer your question?)

You almost certainly have answered my question, but I'm being really
daft and still not getting it.  (Feel free to give up!) I'm not asking
about what constitutes a valid gemini URL and more about whether it's
already been decided that a server which responds with a 2x status for
gemini://example.com/ must also respond with a 2x (and the same
document) for gemini://example.com - even though they're both valid
URLs.  Or are you saying that this semantic equivalence is already
mandated by RFC-3986?

plugd

Link to individual message.

Sean Conner <sean (a) conman.org>


  The TL;DR of this would be:  the following two URLs SHOULD be the same and
serve up the same page:

        gemini://example.com
        gemini://example.com/ 

  What's below is my thought process leading up to that.

It was thus said that the Great plugd once stated:
> Sean Conner writes:
> >   So, the upshot (as I see it) is that the gopher URL format is divorced
> > from the RFC-3986 URL and is its own thing.  You can't really say they have
> > the same semantic rules.  This is also reflected in the caps.txt file you
> > will sometimes find on gopher servers to address the bit in RFC-1436 that
> > gopher selectors are opaque and *no* meaning is to be inferred by the
> > client.
> 
> Completely agree that gopher is special here, but my reading of RFC-3986
> section 6.2.3 was that it actually allows for some scheme-dependent
> behaviour anyway.

You are right (now that I'm rereading that section).

> >   As far as Gemini goes, I've been parsing Gemini URLs under RFC-3986, just
> > like http:, https:, ftp: and file:.
> >
> >   -spc (Did that answer your question?)
> 
> You almost certainly have answered my question, but I'm being really
> daft and still not getting it.  (Feel free to give up!) 

  Give up?  Never give up!  Never surrender!  By Grabthar's hammer, by the
suns of Worvan, you shall be avenged!

  Oh, sorry.  Got carried away there.

  No, this is good.  I'm having to clarify my own thoughts on this.  You
have no idea how much I've written and deleted in writing this reply.

> I'm not asking
> about what constitutes a valid gemini URL and more about whether it's
> already been decided that a server which responds with a 2x status for
> gemini://example.com/ must also respond with a 2x (and the same
> document) for gemini://example.com - even though they're both valid
> URLs.  Or are you saying that this semantic equivalence is already
> mandated by RFC-3986?

  Technically speaking, these two URLs are NOT the same:

	http://example.com/foo
	http://example.com/foo/

and a webserver could technically serve up different content for the two
requests.  But RFC-3986, section 6.2.3 states that the following are the
same for http:

	http://example.com
	http://example.com/

  The question you are asking is---are these two examples the same for
Gemini?

	gemini://example.com
	gemini://example.com/

  Again, per RFC-3986, they are (unless some document describing Gemini URLs
states otherwise).  I naturally assumed they would be as they are for HTTP.
That's perhaps an unwarranted assumption, but I would have to think the two
*are* the same, because of the nature of URLs.

  The *only* character that can follow the authority section (hostname,
port, etc) of a URL is the slash.  There is no other valid character that
can go there.  So while it is possible for a Gemini server to serve up two
different resources for the example above, that would, in my opinion,
violate the rule of least surprise.  Even with gopher URLs, RFC-4266,
section 2.1 says the following are the same:

	gopher://example.com
	gopher://example.com/
	gopher://example.com/1

  The slash between the authority section and the selector is required not
only by URL parsing rules, but because there are alphabetic gopher types.
Also, RFC-1436 (Introduction) states that gopher selectors are opaque and
have no meaning, so the following two are distinct resources:

	gopher://example.com/1
	gopher://example.com/1/

  Additionally, these *are* already covered by RFC-3986 (first example
above), but I digress.  So while technically what you say is true (two
different resources) I would recommend against it (rule of least surprise).

  With that out of the way, I do handle the lack of a trailing slash with
the request <gemini://gemini.conman.org> in GLV-1.12556 by sending out a
redirect.  I think mine is the only server that actually does something
different than serve up the top level page [1]. I also do that for any
requests that map to a directory that don't end in a slash [2].
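
  To make the redirect behaviour concrete, here is a small Python sketch of
the decision.  It is not the GLV-1.12556 code: the function name, the
hard-coded set of directory paths and the choice of status 31 (permanent
redirect in the Gemini specification) rather than 30 are all assumptions
made for illustration.

```python
def response_header(host, path, directories):
    """Pick a Gemini response header for a request path.

    `directories` is a set of paths, each ending in "/", that this
    hypothetical server treats as directories.
    """
    # A request that names a directory without the trailing slash
    # (including the empty path for the top level) gets a redirect
    # to the same URL with the slash added.
    if not path.endswith("/") and path + "/" in directories:
        return "31 gemini://%s%s/\r\n" % (host, path)
    # Anything else is served normally (existence checks omitted).
    return "20 text/gemini\r\n"


dirs = {"/", "/docs/"}
for requested in ("", "/", "/docs", "/docs/", "/docs/page.gmi"):
    print(repr(requested), "->", repr(response_header("example.com", requested, dirs)))
```

  With a redirect in place, relative links inside a directory's index page
resolve against the slash-terminated URL.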

  -spc (Did this answer your question?)

[1]	I tried all other known Gemini servers by just requesting, say:

		gemini://mozz.us

	And they all returned their default page.

[2]	RFC-3986, section 6.2.4.  The Apache web server even has code to
	deal with a URL that maps to a filesystem directory but doesn't end
	in a slash:

		A "trailing slash" redirect is issued when the server
		receives a request for a URL http://servername/foo/dirname
		where dirname is a directory.  Directories require a
		trailing slash, so mod_dir issues a redirect to
		http://servername/foo/dirname/.

		http://httpd.apache.org/docs/2.4/mod/mod_dir.html

	I even had to deal with this in GLV-1.12556 to properly handle
	client certificates with respect to the rule of least surprise.

Link to individual message.

Bradley D. Thornton <Bradley (a) NorthTech.US>



On 9/14/2019 3:36 PM, Sean Conner wrote:
> 
>   The TL;DR of this would be:  the following two URLs SHOULD be the same and
> serve up the same page:
> 
>         gemini://example.com
>         gemini://example.com/ 
> 
>   What's below is my thought process leading up to that.
> 
> 
>   Technically speaking, these two URLs are NOT the same:
> 
> 	http://example.com/foo
> 	http://example.com/foo/
> 
> and a webserver could technically serve up different content for the two
> requests.

And that does happen sometimes in the real world, and confuses people.
To me, the trailing slash forces the implication that you're looking for
an index file in that particular directory, whatever kind is specified
by httpd, rather than a file named foo.

After all, there could exist, both a directory *AND* a file named 'foo'.


> But RFC-3986, section 6.2.3 states that the following are the
> same for http:
> 
> 	http://example.com
> 	http://example.com/
>

And IMO, they should be treated as such, but are sometimes not. I've
received 404's on occasion when omitting the trailing slash, and just got
into the habit of (most often) putting it there when I type in an URL.

But your example above isn't the same as the example above that ;)

I do my very best when referring to directory structure to practice this
too. If nothing else, I get to omit the word, "directory" in Howto's and
tuts or procedural documents I write. After all, without using the word
'directory', or using the trailing slash, one *could* be referring to a
regular file type instead of a directory file type ;)


>   The question you are asking is---are these two examples the same for
> Gemini?
> 
> 	gemini://example.com
> 	gemini://example.com/
> 
>   Again, per RFC-3986, they are (unless some document describing Gemini URLs
> states otherwise).  I naturally assumed they would be as they are for HTTP.
> That's perhaps an unwarranted assumption, but I would have to think the two
> *are* the same, because of the nature of URLs.

What about:
     gemini://example.com/foo
     gemini://example.com/foo/

In the first example, couldn't that refer to a request for either
an index.gmi in foo/ or a file in / called foo?

I'm not proposing anything. I'm just asking.

> 
>   The *only* character that can follow the authority section (hostname,
> port, etc) of a URL is the slash.  There is no other valid character that
> can go there.  So while it is possible for a Gemini server to serve up two
> different resources for the example above, that would, in my opinion,
> violate the rule of least surprise.  Even with gopher URLs, RFC-4266,
> section 2.1 says the following are the same:
> 
> 	gopher://example.com
> 	gopher://example.com/
> 	gopher://example.com/1
> 
>   The slash between the authority section and the selector is required not
> only by URL parsing rules, but because there are alphabetic gopher types.
> Also, RFC-1436 (Introduction) states that gopher selectors are opaque and
> have no meaning, so the following two are distinct resources:
> 
> 	gopher://example.com/1
> 	gopher://example.com/1/

Hm... I don't know that I know the difference there.  Perhaps I should
read up in the appropriate section of the RFC below. If that URI were
for HTTP, it would be saying, "Give me the file named '1'."  Actually,
before I overthink this, it would mean the same thing in HTTP.

> 
>   Additionally, these *are* already covered by RFC-3986 (first example
> above), but I digress.  So while technically what you say is true (two
> different resources) I would recommend against it (rule of least surprise).
> 

ANYWAY...

I've been invited over to another farm tonight to do some Drinkin'
Lincoln, and I know they've got good sippin' whiskey there lolz.

I've finished fiddling with the main gophermap on gopher://Vger.Cloud
and one client is giving me errors about:

<snip>
sh: 1: Syntax error: "(" unexpected
</snip>

And I have no idea where the heck that is. It's the only client slinging
that error to me and it was giving me two of those, so I apparently, and
somehow inadvertently, fixed one of those errors. I see no open paren's
without a corresponding closing paren, and then, only in the raw text of
a sentence.

I also Gave Tim a good plug for Elpher both there and on V'Ger's Gemini
site. I really like that browser :)

I also botched my install of Elpher, so maybe could use a little guidance ;)

While upgrading from 2.2.0 yesterday, I found out that I got a little
impatient and didn't terminate the upgrade with an 'x', after the 'u' to
upgrade, and C-Xc'd out. That left both versions installed, and now,
even though I've ripped it out and reinstalled it several times
(including at one point deleting the directory (elpher, IIRC) in emacs.d/),
I am now forever unable to get anything but:

<snip>
LOADING GEMINI... (use 'u' to cancel)
</snip>

At least gemini://vger.cloud gives me a definitive 51 error :)

<snip>

---- ERROR -----

When attempting to retrieve gemini://vger.cloud/:
Gemini server reports PERMANENT FAILURE for this request: "51	Not Found".

----------------

Press 'u' to return to the previous page.

</snip>

It's frustrating, because I'm not sure why, and every other single
client brings both sites up just dandy like. And I'm not trying to
access my server w/Elpher from localhost, if that's relevant.

That should answer your question too from yesterday about the errors
with your site Sean ;)

I promise to be patient from now on when updating Melpa packages, and
would like to avoid ripping out Emacs by its fdisking throat and
re-installing, so if someone's got any clues to fix it in situ, please,
do let me know. I promise to be good next time lol.

-- 
Bradley D. Thornton
Manager Network Services
http://NorthTech.US
TEL: +1.310.421.8268

Link to individual message.

Sean Conner <sean (a) conman.org>

It was thus said that the Great Bradley D. Thornton once stated:
> On 9/14/2019 3:36 PM, Sean Conner wrote:
> > 
> >   The TL;DR of this would be:  the following two URLs SHOULD be the same and
> > serve up the same page:
> > 
> >         gemini://example.com
> >         gemini://example.com/ 
> > 
> >   What's below is my thought process leading up to that.
> > 
> > 
> >   Technically speaking, these two URLs are NOT the same:
> > 
> > 	http://example.com/foo
> > 	http://example.com/foo/
> > 
> > and a webserver could technically serve up different content for the two
> > requests.
> 
> And that does happen sometimes in the real world, and confuses people.
> To me, the trailing slash forces the implication that you're looking for
> an index file in that particular directory, whatever kind is specified
> by httpd, rather than a file named foo.
> 
> After all, there could exist, both a directory *AND* a file named 'foo'.

  That would be a good trick.  I don't know of any filesystem that will
allow you to have a file *and* a directory with the same name in the same
location.  


  I had to do that with a custom non-filesystem handler.

> > But RFC-3986, section 6.2.3 states that the following are the
> > same for http:
> > 
> > 	http://example.com
> > 	http://example.com/
> >
> 
> And IMO, they should be treated as such, but are sometimes not. I've
> received 404's on occasion when omitting the trailing slash, and just got
> into the habit of (most often) putting it there when I type in an URL.

  That sounds like a webserver that hasn't been set up to automatically
redirect when it sees a resource that is an actual directory but the request
doesn't include a trailing slash.

> >   The question you are asking is---are these two examples the same for
> > Gemini?
> > 
> > 	gemini://example.com
> > 	gemini://example.com/
> > 
> >   Again, per RFC-3986, they are (unless some document describing Gemini URLs
> > states otherwise).  I naturally assumed they would be as they are for HTTP.
> > That's perhaps an unwarranted assumption, but I would have to think the two
> > *are* the same, because of the nature of URLs.
> 
> What about:
>      gemini://example.com/foo
>      gemini://example.com/foo/

  What about it?

	gemini://gemini.conman.org/foo
	gemini://gemini.conman.org/foo/

> In the first example, couldn't that refer to a request for either
> an index.gmi in foo/ or a file in / called foo?

  Try it.  [1]

> I'm not proposing anything. I'm just asking.

  And I'm answering 8-P

  But seriously, I don't think this should be done, even if it can be done.

> I've been invited over to another farm tonight to do some Drinkin'
> Lincoln, and I know they've got good sippin' whiskey there lolz.
> 
> I've finished fiddling with the main gophermap on gopher://Vger.Cloud
> and one client is giving me errors about:
> 
> <snip>
> sh: 1: Syntax error: "(" unexpected
> </snip>

  It's the server.  My own gopher client is failing to show the output. 
Doing it manually reveals the issue:

[spc]lucy:~/source/gopher>nc vger.cloud 70

sh: 1: Syntax error: "(" unexpected
iWelcome to Vger!       TITLE   null.host       1
i               null.host       1

  I only mention it in case it wasn't clear whether the error was in the client or
server.

  -spc 

[1]	I had to write a custom handler to do that.  I don't have a file
	named 'foo' or a directory named 'foo'.

Link to individual message.

solderpunk <solderpunk (a) SDF.ORG>

>   The TL;DR of this would be:  the following two URLs SHOULD be the same and
> serve up the same page:
> 
>         gemini://example.com
>         gemini://example.com/ 

I'll admit to having not yet done the RFC reading and thinking required
to know whether the above is in fact already mandated by RFC-3986, but
the behaviour above is what I *strongly* think should be correct for
Gemini, and if it's not RFC-mandated then we should make it so in our
own spec.

>   Again, per RFC-3986, they are (unless some document describing Gemini URLs
> states otherwise).  I naturally assumed they would be as they are for HTTP.
> That's perhaps an unwarranted assumption, but I would have to think the two
> *are* the same, because of the nature of URLs.

Certainly, I want Gemini URLs to be as "generic" as possible, and I
definitely reject the idea of a Gopher-style approach of having our own
URL standard which deviates from the rest of the world.  Being able to
do away with that was (a small) part of the motivation for putting
content type in the Gemini response header.

Are HTTP URLs perfectly generic or do they have some scheme-dependent
details in there?

So far I have basically treated Gemini URLs as following whatever rules
HTTP URLs do.  This is because the URL (un)parsing tools in the Python
standard library don't work nicely on schemes they don't recognise (like
gemini://), and they won't convert relative URLs to absolute URLs
correctly.  So, I have written some fairly ugly code where "gemini://"
is replaced by "http://" before doing stuff using those tools, and then
this is reversed at the end.  I don't really like doing this, but I
dislike it less than writing my own URL-mangling code.  Because of this
limitation, practically speaking Gemini URLs had better act identically
with *some* standard URL scheme that Python is aware of, and HTTP seems
a much more sensible choice than, say, FTP.
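
For anyone curious, the scheme-swapping workaround looks roughly like the
sketch below.  This is a reconstruction of the idea, not the actual code;
the function name gemini_urljoin is made up, and it relies only on
urllib.parse.urljoin and urlsplit from the standard library.

```python
from urllib.parse import urljoin, urlsplit

GEMINI = "gemini://"
HTTP = "http://"


def gemini_urljoin(base, reference):
    """Resolve `reference` against a gemini:// base URL.

    The base is temporarily rewritten to http:// so that urljoin applies
    its usual relative-resolution rules, then the scheme is put back.
    """
    # A reference that already carries a scheme is absolute; leave it alone
    # so that a genuine http:// link is not turned into a gemini:// one.
    if urlsplit(reference).scheme:
        return reference
    if not base.startswith(GEMINI):
        raise ValueError("expected a gemini:// base URL")
    joined = urljoin(HTTP + base[len(GEMINI):], reference)
    return GEMINI + joined[len(HTTP):]


# The trailing slash on the base decides where a relative link lands.
print(gemini_urljoin("gemini://example.com/docs/", "page.gmi"))
print(gemini_urljoin("gemini://example.com/docs", "page.gmi"))
```

The first call gives gemini://example.com/docs/page.gmi and the second
gives gemini://example.com/page.gmi, which is why a base URL that maps to a
directory needs its trailing slash.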

> With that out of the way, I do handle the lack of a trailing slash with
> the request <gemini://gemini.conman.org> in GLV-1.12556 by sending out a
> redirect.  I think mine is the only server that actually does something
> different than serve up the top level page [1]. I also do that for any
> requests that map to a directory that don't end in a slash [2].

Something I've written (Gegobi, probably?) sends a redirect to add a
missing trailing slash for requests that map to directories.  This was
necessary for relative URL absolutisation to function correctly.  I
remember thinking at the time it was kind of annoying to have to do
this, but I guess it's the price we have to pay for all of the nice
stuff that using URLs provides.

-Solderpunk

Link to individual message.

Sean Conner <sean (a) conman.org>

It was thus said that the Great solderpunk once stated:
> >   The TL;DR of this would be:  the following two URLs SHOULD be the same and
> > serve up the same page:
> > 
> >         gemini://example.com
> >         gemini://example.com/ 
> 
> I'll admit to having not yet done the RFC reading and thinking required
> to know whether the above is in fact already mandated by RFC-3986, but
> the behaviour above is what I *strongly* think should be correct for
> Gemini, and if it's not RFC-mandated then we should make it so in our
> own spec.
> 
> >   Again, per RFC-3986, they are (unless some document describing Gemini URLs
> > states otherwise).  I naturally assumed they would be as they are for HTTP.
> > That's perhaps an unwarranted assumption, but I would have to think the two
> > *are* the same, because of the nature of URLs.
> 
> Certainly, I want Gemini URLs to be as "generic" as possible, and I
> definitely reject the idea of a Gopher-style approach of having our own
> URL standard which deviates from the rest of the world.  Being able to
> do away with that was (a small) part of the motivation for putting
> content type in the Gemini response header.
> 
> Are HTTP URLs perfectly generic or do they have some scheme-dependent
> details in there?

  It's general.  RFC-3986 states:

	In general, a URI that uses the generic syntax for authority with an
	empty path should be normalized to a path of "/".  Likewise, an
	explicit ":port", for which the port is empty or the default for the
	scheme, is equivalent to one where the port and its ":" delimiter
	are elided and thus should be removed by scheme-based normalization. 

  So treating the two Gemini URLs listed above as semantically identical is
fine by the RFC.
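
  Applied to Gemini, the normalization in that passage could look like the
following Python sketch.  It assumes the default Gemini port of 1965; the
function name normalize_gemini_url is illustrative only, and the sketch
covers just the two rules quoted above (empty path and default or empty
port), not case or percent-encoding normalization.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORT_SUFFIX = ":1965"  # assumed default port for gemini://


def normalize_gemini_url(url):
    """Apply the empty-path and default-port rules of RFC 3986 6.2.3."""
    parts = urlsplit(url)
    netloc = parts.netloc
    # An explicit default port, or an empty port, is dropped along with ":".
    if netloc.endswith(DEFAULT_PORT_SUFFIX):
        netloc = netloc[:-len(DEFAULT_PORT_SUFFIX)]
    elif netloc.endswith(":"):
        netloc = netloc[:-1]
    # An empty path after the authority is normalized to "/".
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(normalize_gemini_url("gemini://example.com"))
print(normalize_gemini_url("gemini://example.com:1965/foo?bar"))
```

  Note that only the empty path is touched: gemini://example.com/foo is
left without a trailing slash, since /foo and /foo/ may name different
resources.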

> So far I have basically treated Gemini URLs as following whatever rules
> HTTP URLs do.  This is because the URL (un)parsing tools in the Python
> standard library don't work nicely on schemes they don't recognise (like
> gemini://), 

  That's probably because there are a ton of URL schemes [1] that don't
follow the generic URL format so it's best to only merge URL formats that
are recognized.

> > With that out of the way, I do handle the lack of a trailing slash with
> > the request <gemini://gemini.conman.org> in GLV-1.12556 by sending out a
> > redirect.  I think mine is the only server that actually does something
> > different than serve up the top level page [1]. I also do that for any
> > requests that map to a directory that don't end in a slash [2].
> 
> Something I've written (Gegobi, probably?) sends a redirect to add a
> missing trailing slash for requests that map to directories.  This was
> necessary for relative URL absolutisation to function correctly.  I
> remember thinking at the time it was kind of annoying to have to do
> this, but I guess it's the price we have to pay for all of the nice
> stuff that using URLs provides.

  True.  And the Apache webserver also does a redirect in such cases.  So
it's a known problem with a known solution.
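
A small Python sketch of why the trailing slash matters for relative links.
The helper resolve_gemini is hypothetical. It borrows urljoin's http rules by
temporarily swapping the scheme, on the assumption that urljoin does not
resolve references against schemes it does not recognise (as mentioned
earlier in this message).

```python
from urllib.parse import urljoin, urlsplit


def resolve_gemini(base: str, ref: str) -> str:
    """Resolve ref against a gemini:// base URL (hypothetical helper)."""
    prefix = "gemini://"
    if not base.startswith(prefix):
        raise ValueError("base must be a gemini:// URL")
    if urlsplit(ref).scheme:
        # ref is already absolute, nothing to resolve
        return ref
    # Pretend the base is http so that urljoin applies the generic rules
    joined = urljoin("http://" + base[len(prefix):], ref)
    return prefix + joined[len("http://"):]


# Without the trailing slash, "docs" is treated as a file and gets replaced
print(resolve_gemini("gemini://example.com/docs", "page.gmi"))
# With the trailing slash, the link resolves inside the directory
print(resolve_gemini("gemini://example.com/docs/", "page.gmi"))
```

The first call yields gemini://example.com/page.gmi and the second yields
gemini://example.com/docs/page.gmi, which is why a server redirects a
directory request that lacks the trailing slash.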

  -spc

[1]	https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

Link to individual message.

plugd <plugd (a) thelambdalab.xyz>

Hi Sean,

Sean Conner writes:

>   The TL;DR of this would be:  the following two URLs SHOULD be the same and
> serve up the same page:
>
>         gemini://example.com
>         gemini://example.com/ 

Great, so I will continue to do what I have been doing: automatically
replacing empty filenames with "/".

>> Completely agree that gopher is special here, but my reading of RFC-3986
>> section 6.2.3 was that it actually allows for some scheme-dependent
>> behaviour anyway.
>
> You are right (now that I'm rereading that section).

Great to know that I'm not going completely mad!

>> You almost certainly have answered my question, but I'm being really
>> daft and still not getting it.  (Feel free to give up!) 
>
>   Give up?  Never give up!  Never surrender!  By Grabthar's hammer, by the
> suns of Warvan, you shall be avenged!

holy moly 

>   Oh, sorry.  Got carried away there.

:-)

>   No, this is good.  I'm having to clarify my own thoughts on this.  You
> have no idea how much I've written and deleted in writing this reply.

Thanks for your persistence!

>   Technically speaking, these two URLs are NOT the same:
>
> 	http://example.com/foo
> 	http://example.com/foo/
>
> and a webserver could technically serve up different content for the two
> requests.  But RFC-3986, section 6.2.3 states that the following are the
> same for http:
>
> 	http://example.com
> 	http://example.com/
>
>   The question you are asking is---are these two examples the same for
> Gemini?
>
> 	gemini://example.com
> 	gemini://example.com/

Exactly.

>   Again, per RFC-3986, they are (unless some document describing Gemini URLs
> states otherwise).  I naturally assumed they would be as they are for HTTP.
> That's perhaps an unwarranted assumption, but I would have to think the two
> *are* the same, because of the nature of URLs.
>
>   The *only* character that can follow the authority section (hostname,
> port, etc) of a URL is the slash.  There is no other valid character that
> can go there.  So while it is possible for a Gemini server to serve up two
> different resources for the example above, that would, in my opinion,
> violate the rule of least surprise.  Even with gopher URLs, RFC-4266,
> section 2.1 says the following are the same:
>
> 	gopher://example.com
> 	gopher://example.com/
> 	gopher://example.com/1
>
>   The slash between the authority section and the selector is required not
> only by URL parsing rules, but because there are alphabetic gopher types. 
> Also, RFC-1436 (Introduction) states that gopher selectors are opaque and
> have no meaning, so the following two are distinct resources:
>
> 	gopher://example.com/1
> 	gopher://example.com/1/
>
>   Additionally, these *are* already covered by RFC-3986 (first example
> above), but I digress.  So while technically what you say is true (two
> different resources) I would recommend against it (rule of least surprise).

This is all tremendously helpful and clear.  I agree that semantic
equivalence of the "" and "/" filenames is the optimal
behaviour for servers.  Servers are always free to be relaxed about what
they accept, but clients need to take care that they don't make
assumptions that may not always be true.

In elpher I'm very picky about the gopher selector portion of the URL. I
map both gopher://example.com and gopher://example.com/ to
gopher://example.com/1, but *never* presume to alter anything after the
selector type character.  So gopher://example.com/1 is *never* converted
to gopher://example.com/1/ or vice versa.
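
A minimal Python sketch of that gopher mapping. This is not elpher's actual
code (elpher is written in Emacs Lisp); the function name
gopher_type_and_selector is hypothetical, and percent-decoding is left out.
The selector is kept as an opaque string.

```python
def gopher_type_and_selector(url: str):
    """Split a gopher URL into (item type, selector) without touching the selector."""
    prefix = "gopher://"
    if not url.startswith(prefix):
        raise ValueError("not a gopher:// URL")
    # Everything after the first "/" following the authority is the gopher-path
    _authority, _slash, gopher_path = url[len(prefix):].partition("/")
    if not gopher_path:
        # An empty gopher-path defaults to type 1 with an empty selector
        return "1", ""
    # First character is the item type, the rest is the opaque selector
    return gopher_path[0], gopher_path[1:]


for u in ("gopher://example.com",
          "gopher://example.com/",
          "gopher://example.com/1",
          "gopher://example.com/1/"):
    print(u, gopher_type_and_selector(u))
```

The first three URLs give the same pair, while the last one keeps its "/"
selector and so stays a distinct resource.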

For gemini I currently map gemini://example.com to gemini://example.com/
on parsing, but (besides dot segment conversions) never alter anything
else.  (Whether servers treat gemini://example.com/file and
gemini://example.com/file/ as equivalent is none of the client's
business.)
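
The path handling described here can be sketched in Python as follows. Again
this is an illustration rather than elpher's code: normalize_path is a
hypothetical name, the dot-segment handling is a simplified take on RFC-3986
section 5.2.4, and it assumes the path is either empty or absolute.

```python
def normalize_path(path: str) -> str:
    """Map "" to "/" and resolve "." and ".." segments; change nothing else."""
    if path == "":
        return "/"
    segments = path.split("/")
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == ".":
            if is_last:
                out.append("")  # "/a/." becomes "/a/"
        elif seg == "..":
            if len(out) > 1:
                out.pop()       # never pop the leading "" of an absolute path
            if is_last:
                out.append("")  # "/a/b/.." becomes "/a/"
        else:
            out.append(seg)
    return "/".join(out)


print(normalize_path(""))
print(normalize_path("/a/./b/../c"))
print(normalize_path("/file"), normalize_path("/file/"))
```

A trailing slash is never added or removed, so /file and /file/ stay as the
server published them.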

>   With that out of the way, I do handle the lack of a trailing slash with
> the request <gemini://gemini.conman.org> in GLV-1.12556 by sending out a
> redirect.  I think mine is the only server that actually does something
> different than serve up the top level page [1]. I also do that for any
> requests that map to a directory that don't end in a slash [2].

Yes, I noticed this when implementing redirects on the client
side. Elpher shouldn't be asking for "" anymore though.

>   -spc (Did this answer your question?)

Very clearly.  Thank you for your patience!

plugd

Link to individual message.

plugd <plugd (a) thelambdalab.xyz>


Hi Bradley, just quickly:

Bradley D. Thornton writes:
> On 9/14/2019 3:36 PM, Sean Conner wrote:
> I've been invited over to another farm tonight to do some Drinkin'
> Lincoln, and I know they've got good sippin' whiskey there lolz.

> I also Gave Tim a good plug for Elpher both there and on V'Ger's Gemini
> site. I really like that browser :)

Thanks!

> I also botched my install of Elpher, so maybe could use a little guidance ;)

In this order:
1. Try removing the package using M-x package-delete <RET> elpher <RET>
2. Get rid of any remaining elpher-prefixed directories under
~/.emacs.d/elpa/
3. Remove any trace of elpher from your ~/.emacs or ~/.emacs.d/init.el scripts.
4. Restart emacs
5. Install again using M-x package-install <RET> elpher <RET>

Any remaining problems are mine, not yours. :-)

> When attempting to retrieve gemini://vger.cloud/:
> Gemini server reports PERMANENT FAILURE for this request: "51	Not Found".

Elpher was in a bit of flux yesterday (there were a slew of patch
releases, at least one of which was too hasty), so it's entirely
possible that you were unlucky and installed a broken release.  In which
case, sorry!  Things should be more stable now.  (I usually try a bit
harder not to break master, but I was actually in the process of rushing
out a fix when I broke something else.  I really need some unit tests...)

plugd

Link to individual message.
