Question About Link Format

1. Ben (benulo (a) systemli.org)

Hello guys, I am not normally one to get involved in the Gemini spec. I 
got into a discussion with Martin about how UTF-8 characters are 
supposed to be handled in links in Gemini documents.

On my site I have a link in a document like this:

=> logarion/taĝikio--lando-montara.gmi Taĝikio: Lando Montara

This refers locally to the actual file name on the disk. The main 
question is, is this allowed in Gemini documents? I thought this should 
work because I believe that Gemini is UTF-8 native or by default, and my 
Unix file system (in this case FreeBSD UFS) appears to be in agreement.

The question is whether clients must support this as well. In my
experience so far all of them do, but Martin's client seems to reject
this link because it contains non-ASCII characters, and doesn't handle it.
He said that RFC 3986 does not allow links to contain non-ASCII characters,
but perhaps that isn't relevant to Gemini's internal encoding and
document format, only to exported URIs (i.e. made universal).
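Martin's client's exact behaviour isn't described in the thread, but a strict client might do something like the following: accept a link only if every character is one that RFC 3986 allows to appear literally in a URI. This is a minimal Python sketch under that assumption (the function name is illustrative); it checks the character set only, not the full grammar or the well-formedness of %XX escapes.

```python
import string

# Characters RFC 3986 allows to appear literally in a URI reference:
# unreserved, gen-delims and sub-delims, plus "%" for %XX escapes.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
ALLOWED = frozenset(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%")


def is_strict_uri_reference(ref: str) -> bool:
    """Return True if every character of ref may appear literally in a URI.

    Character-set check only: it does not validate %XX escapes or the
    overall RFC 3986 grammar.
    """
    return all(ch in ALLOWED for ch in ref)


print(is_strict_uri_reference("logarion/taĝikio--lando-montara.gmi"))       # False
print(is_strict_uri_reference("logarion/ta%C4%9Dikio--lando-montara.gmi"))  # True
```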

It does seem a proper URI should contain %C4%9D in place of ĝ, but
should I change it in the document? Does internal linking (in my case
the link is local/relative) even count as a URI?
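The %C4%9D form is just the two UTF-8 bytes of ĝ (U+011D), each written as %XX. A short Python sketch (the function name is illustrative):

```python
def percent_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte of text as %XX (uppercase hex)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


print(percent_encode("ĝ"))  # %C4%9D
```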

Ben

-- 
gemini://kwiecien.us/


2. Alex Schroeder (alex (a) gnu.org)

On Fri, 2020-07-17 at 13:51 +0430, Ben wrote:
> It does seem a proper URI should best contain %C4%9D in place of ĝ,
> but the question is whether I should change it in the document? Does
> the internal linking (in my case the link is local/relative) even
> count as a URI?

I think you should change it to %C4%9D; otherwise you're relying on
clients to do it for you (which might work). But in the example you
give, the stuff after => is a URL, just not an absolute one. It's a
relative URL, and the client is supposed to know how to combine it with
the current URL and request the resulting document if the user wants to
follow the link. So if the current URL is gemini://example.org/foo/bar,
then following the link below will take the user to
gemini://example.org/foo/logarion/ta%C4%9Dikio.

=> logarion/taĝikio--lando-montara.gmi Taĝikio: Lando Montara
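The resolution described above can be sketched with Python's urllib.parse.urljoin. One caveat: urljoin only resolves relative references for schemes listed in urllib.parse.uses_relative and uses_netloc, and the sketch assumes gemini is not in the default lists, so it registers the scheme first (the membership check makes this harmless if it already is). Note that this mutates module-level state in urllib.parse.

```python
from urllib.parse import urljoin, uses_netloc, uses_relative

# urljoin() returns the reference unchanged for schemes it doesn't know
# are hierarchical, so register "gemini" in both lists if it's missing.
for registry in (uses_relative, uses_netloc):
    if "gemini" not in registry:
        registry.append("gemini")

base = "gemini://example.org/foo/bar"
link = "logarion/ta%C4%9Dikio--lando-montara.gmi"

# The last segment of the base path ("bar") is replaced by the link.
print(urljoin(base, link))
# gemini://example.org/foo/logarion/ta%C4%9Dikio--lando-montara.gmi
```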

A good way to think about this would be spaces in file names. Assume
the filename is "taĝikio: lando montara.gmi". What would you write?
This won't work:

=> logarion/taĝikio: lando montara.gmi Taĝikio: Lando Montara

If you escape the spaces, why not escape the rest that needs escaping?

=> logarion/taĝikio:%20lando%20montara.gmi Taĝikio: Lando Montara
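If a client wanted to tolerate raw UTF-8 in links anyway (the "relying on clients" case above), one possible approach is to percent-encode only what may not appear literally and leave existing escapes alone, so the %20 in the last form is not double-encoded. This Python sketch uses urllib.parse.quote with a "safe" set chosen for illustration: the RFC 3986 reserved characters plus "~" and "%". The helper name is hypothetical.

```python
from urllib.parse import quote

# Left untouched: RFC 3986 reserved characters, "~", and "%" (so existing
# escapes such as %20 pass through). quote() never encodes ASCII letters,
# digits or "_.-"; everything else is UTF-8 encoded and written as %XX.
LENIENT_SAFE = "!#$%&'()*+,/:;=?@[]~"


def requote(url: str) -> str:
    """Percent-encode raw spaces and non-ASCII characters in a link URL."""
    return quote(url, safe=LENIENT_SAFE)


print(requote("logarion/taĝikio:%20lando%20montara.gmi"))
# logarion/ta%C4%9Dikio:%20lando%20montara.gmi
print(requote("logarion/taĝikio--lando-montara.gmi"))
# logarion/ta%C4%9Dikio--lando-montara.gmi
```

A literal "%" that does not start a valid escape is passed through untouched, so the result can still be malformed; a real client would have to decide how to handle that case.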

That's how I reason about it. Or if you want to go all-in, RFC 3986 has
you covered. The only characters that unambiguously never have to be
escaped, no matter where they appear, are the unreserved ones:

   Characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include uppercase and
   lowercase letters, decimal digits, hyphen, period, underscore,
   and tilde.

      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

   https://tools.ietf.org/html/rfc3986#section-2.3

And to be clear, ALPHA means a-z and A-Z, nothing else.
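Going all-in means encoding every byte outside that unreserved set. A minimal Python sketch, assuming the link is a path whose "/" separators should be kept as-is; the helper names are illustrative:

```python
import string

# RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~", ASCII only.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def encode_segment(segment: str) -> str:
    """Percent-encode every UTF-8 byte of a path segment that isn't unreserved."""
    out = []
    for byte in segment.encode("utf-8"):
        char = chr(byte)
        out.append(char if char in UNRESERVED else f"%{byte:02X}")
    return "".join(out)


def encode_path(path: str) -> str:
    """Encode each "/"-separated segment, keeping "/" as the separator."""
    return "/".join(encode_segment(seg) for seg in path.split("/"))


print(encode_path("logarion/taĝikio--lando-montara.gmi"))
# logarion/ta%C4%9Dikio--lando-montara.gmi
print(encode_path("logarion/taĝikio: lando montara.gmi"))
# logarion/ta%C4%9Dikio%3A%20lando%20montara.gmi
```

This also encodes ":" as %3A even though RFC 3986 permits a colon inside most path segments; the trade-off is a slightly longer URL in exchange for never having to reason about which reserved characters are safe in which position.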

Cheers
Alex


3. Solderpunk (solderpunk (a) posteo.net)

On Fri Jul 17, 2020 at 11:21 AM CEST, Ben wrote:
> Hello guys, I am not normally one to get involved in the Gemini spec. I
> got into a discussion with Martin about how UTF-8 characters are
> supposed to be handled in links in Gemini documents.
>
> On my site I have a link in a document like this:
>
> => logarion/taĝikio--lando-montara.gmi Taĝikio: Lando Montara
>
> This refers locally to the actual file name on the disk. The main
> question is, is this allowed in Gemini documents? I thought this should
> work because I believe that Gemini is UTF-8 native or by default, and my
> Unix file system (in this case FreeBSD UFS) appears to be in agreement.

Aaah, I figured we were going to have to deal with this sooner or later.
This has been one of those few remaining unpleasant details in the back
of my mind that I know needs to get sorted out. It's because of the
existence of things like this that I'm so averse to adding anything new
to the spec: it runs the risk of introducing more things like this,
which aren't obvious at first and only come up after a few months of
use.

The spec currently uses language like "UTF-8 encoded absolute URL",
which I have to admit has been there since the very earliest version and
which I wrote without any deeper awareness of how it intersected with
existing RFCs. I've since come to realise that this language may be
ambiguous at best, and contradictory at worst.

I suspect this is going to need a bit of reading and thinking to come up
with a clear stance and make appropriate changes to the spec...

> It does seem a proper URI should best contain %C4%9D in place of ĝ, but
> the question is whether I should change it in the document? Does the
> internal linking (in my case the link is local/relative) even count as a
> URI?

It definitely counts as a relative URI.
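The distinction can be made mechanically: a reference with a scheme is a URI, and one without is a relative reference, further split by how it begins (RFC 3986 sections 4.1 and 4.2). A minimal Python sketch; the function name and the returned labels are illustrative:

```python
from urllib.parse import urlsplit


def classify_reference(ref: str) -> str:
    """Classify a URI reference along the lines of RFC 3986 sections 4.1-4.2."""
    if urlsplit(ref).scheme:
        return "URI (has a scheme)"
    if ref.startswith("//"):
        return "network-path reference"
    if ref.startswith("/"):
        return "absolute-path reference"
    return "relative-path reference"


print(classify_reference("logarion/taĝikio--lando-montara.gmi"))
# relative-path reference
print(classify_reference("gemini://kwiecien.us/"))
# URI (has a scheme)
```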

Cheers,
Solderpunk
