Some reading on IRIs and IDNs

John Cowan <cowan at ccil.org> writes:

>> 2. Percent-encode reserved characters and non-US-ASCII characters in the
>>    path, query, and fragment components.

> You don't want to escape the ASCII reserved characters, because they should
> already be escaped.  Changing the path /foo/bar.gmi to %2Ffoo%2Fbar.gmi
> would be Evil and Wrong.  If you really want that path, you have to encode
> it yourself.

Yes, that is quite right. I suppose we are using a different
interpretation of the phrase "reserved characters" here. For clarity, I
meant characters such as those in the string " ?#", which are either
forbidden (when unencoded) within the path, query, and fragment
components or are used to delimit them.
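To make the distinction concrete, here is a small Python sketch. Escaping every reserved character, which is what John warns against, mangles the path separators. Escaping only the characters described above (space, "?", "#") plus non-ASCII characters leaves "/" and any existing %XX escapes alone. `encode_component` is a hypothetical helper meant to be applied to the path, query, and fragment separately. It is not a complete RFC 3987 IRI-to-URI mapping.

```python
from urllib.parse import quote

# Escaping every reserved character (safe="") turns "/" into %2F,
# which changes the meaning of the path.
print(quote("/foo/bar.gmi", safe=""))  # %2Ffoo%2Fbar.gmi

# Hypothetical helper: escape only " ", "?", "#" and non-ASCII characters.
# "/" and existing "%XX" escapes pass through, so nothing is double-escaped.
ESCAPE_ASCII = set(" ?#")

def encode_component(text: str) -> str:
    out = []
    for ch in text:
        if ch in ESCAPE_ASCII or ord(ch) > 127:
            # Percent-encode each UTF-8 byte of the character.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

print(encode_component("/foo/bar.gmi"))         # /foo/bar.gmi
print(encode_component("/caf\u00e9 menu.gmi"))  # /caf%C3%A9%20menu.gmi
print(encode_component("/a%20b.gmi"))           # /a%20b.gmi
```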

> 2.5. If the IRI is a relative reference, resolve it against the URI of the
> text/gemini file that contains it.

Yep.
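In Python, `urllib.parse.urljoin` performs RFC 3986 reference resolution. However, it returns the reference unchanged for any scheme missing from its module-level `uses_relative` and `uses_netloc` lists, and "gemini" is not in them by default. Below is a sketch of one workaround: register the scheme, then resolve against the URI the text/gemini page was fetched from. `resolve` is a hypothetical name.

```python
import urllib.parse

# urljoin only resolves references for schemes listed in uses_relative and
# uses_netloc; "gemini" is not there by default, so register it once.
for scheme_list in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
    if "gemini" not in scheme_list:
        scheme_list.append("gemini")

def resolve(base_uri: str, reference: str) -> str:
    """Resolve a (possibly relative) link against the URI of its page."""
    return urllib.parse.urljoin(base_uri, reference)

base = "gemini://example.com/docs/index.gmi"
print(resolve(base, "page.gmi"))                 # gemini://example.com/docs/page.gmi
print(resolve(base, "../about.gmi"))             # gemini://example.com/about.gmi
print(resolve(base, "/top.gmi"))                 # gemini://example.com/top.gmi
print(resolve(base, "gemini://other.example/"))  # gemini://other.example/
```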

>> 4. Send the punycode + percent-encoded URI as the request to the Gemini
>>    server.
>
> Note that fragments must not be sent, so if there is one, chop it off.

I'm not sure that is the case here. To quote the Gemini spec:

========================================================================
1.2 Gemini URI scheme

Resources hosted via Gemini are identified using URIs with the scheme
"gemini". This scheme is syntactically compatible with the generic URI
syntax defined in RFC 3986, but does not support all components of the
generic syntax. In particular, the authority component is allowed and
required, but its userinfo subcomponent is NOT allowed. The host
subcomponent is required. The port subcomponent is optional, with a
default value of 1965. The path, query and fragment components are
allowed and have no special meanings beyond those defined by the generic
syntax. Spaces in gemini URIs should be encoded as %20, not +.
========================================================================
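The rules in that section are simple enough to check mechanically. Here is a rough Python sketch that covers only what is quoted above: the scheme, the required host, the ban on userinfo, and the default port of 1965. `check_gemini_uri` is a hypothetical name. The last two lines illustrate the %20-versus-+ point. `quote` produces %20, while `quote_plus`, which is meant for HTML form encoding, produces +.

```python
from urllib.parse import urlsplit, quote, quote_plus

DEFAULT_PORT = 1965  # default from the spec text quoted above

def check_gemini_uri(uri: str) -> tuple[str, int]:
    """Return (host, port), or raise ValueError if the URI breaks the quoted rules."""
    parts = urlsplit(uri)
    if parts.scheme != "gemini":
        raise ValueError("scheme must be 'gemini'")
    # Any '@' in the authority means a userinfo subcomponent is present.
    if parts.username is not None:
        raise ValueError("userinfo subcomponent is not allowed")
    if not parts.hostname:
        raise ValueError("host subcomponent is required")
    # .port is None when omitted (and raises ValueError if it is not a number).
    port = parts.port if parts.port is not None else DEFAULT_PORT
    return parts.hostname, port

print(check_gemini_uri("gemini://example.com/docs/"))  # ('example.com', 1965)
print(quote("my file.gmi"))       # my%20file.gmi  (what the spec asks for)
print(quote_plus("my file.gmi"))  # my+file.gmi    (not for gemini URIs)
```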

Please note the text about fragment components being allowed. I'm not
currently aware of any good uses for them in Gemini, but the spec
supports them, so I've included that support in my server.
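The client-side steps quoted at the top (percent-encoding, punycoding the host, and sending the request) can be sketched together in Python. The sketch makes several assumptions. It uses Python's built-in "idna" codec, which follows the older IDNA 2003 rules. It treats the RFC 3986 path characters as safe, so only space, "#", other disallowed ASCII, and non-ASCII get encoded. It terminates the request with CRLF, as Gemini requests are. Whether to keep the fragment is exactly the point in dispute here, so the sketch exposes it as a flag rather than deciding. `build_request` and `keep_fragment` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters that may appear literally in a path; "%" is kept so that escapes
# already present in the IRI are not double-encoded.
SUB_DELIMS = "!$&'()*+,;="
PATH_SAFE = "/%:@" + SUB_DELIMS
QUERY_SAFE = PATH_SAFE + "?"  # query and fragment may also contain "?"

def build_request(iri: str, keep_fragment: bool = False) -> bytes:
    """Turn an absolute gemini IRI into request bytes (hypothetical helper)."""
    parts = urlsplit(iri)
    if not parts.hostname:
        raise ValueError("gemini URIs require a host")
    # Punycode the host with Python's built-in codec (IDNA 2003 rules).
    # IPv6 literals are not handled in this sketch.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Percent-encode non-ASCII characters (as UTF-8) and unsafe ASCII.
    path = quote(parts.path, safe=PATH_SAFE)
    query = quote(parts.query, safe=QUERY_SAFE)
    # Sending the fragment or not is left to the caller.
    fragment = quote(parts.fragment, safe=QUERY_SAFE) if keep_fragment else ""
    uri = urlunsplit(("gemini", netloc, path, query, fragment))
    # A Gemini request is the URI followed by CRLF.
    return (uri + "\r\n").encode("utf-8")

print(build_request("gemini://b\u00fccher.example/caf\u00e9 menu.gmi?q=a b#top"))
# b'gemini://xn--bcher-kva.example/caf%C3%A9%20menu.gmi?q=a%20b\r\n'
```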

>> 5. The server parses the URI into scheme, host, port, path, query, and
>>    fragment components and then percent-decodes the path, query, and
>>    fragment strings.
>
> Consequently, the server will not get a fragment string.  There would be no
> need for fragment strings if they were understood on the server side;
> they'd just be part of the path.

See above.
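On the server side (step 5), the order matters. Split the URI into components first, then percent-decode each one, so that an encoded "%3F" or "%23" in the path stays part of the path instead of being mistaken for a delimiter. Below is a minimal sketch using Python's `urllib.parse`. `parse_request` is a hypothetical name, and the default port comes from the spec text quoted above. The fragment comes back empty if the client chose not to send one.

```python
from urllib.parse import urlsplit, unquote

DEFAULT_PORT = 1965

def parse_request(line: str) -> dict:
    """Split a Gemini request line into decoded components (hypothetical helper)."""
    parts = urlsplit(line.rstrip("\r\n"))
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port if parts.port is not None else DEFAULT_PORT,
        # Decode only after splitting, so %3F / %23 stay inside their component.
        # unquote() does not turn "+" into a space, matching the %20 rule.
        "path": unquote(parts.path),
        "query": unquote(parts.query),
        "fragment": unquote(parts.fragment),
    }

print(parse_request("gemini://example.com/caf%C3%A9%20menu.gmi?x%3Fy#top\r\n"))
```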

>>  6. The parsed and decoded URI information can then either be used to
>>     perform a file retrieval, generate a directory listing, or run a
>>     CGI script, ultimately sending back a valid Gemini response to
>>     the client. Redirect responses should make sure to percent-encode
>>     the path, query, and fragment components of the redirected URI.
>>
>
> Except not the fragment.

Again, see above.
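For the redirect case in step 6, a Gemini redirect is a response header containing status 30 (temporary) or 31 (permanent), a space, the target URI, and CRLF. The sketch below re-encodes already-decoded components before writing that header. `redirect_header` is a hypothetical helper, and the host is assumed to be ASCII (already punycoded). Including a fragment is left to the caller, since that is the point in dispute. Here "%" is deliberately not treated as safe: the components are decoded text, so a literal "%" must become %25.

```python
from urllib.parse import urlunsplit, quote

SUB_DELIMS = "!$&'()*+,;="
PATH_SAFE = "/:@" + SUB_DELIMS  # no "%": inputs are decoded text
QUERY_SAFE = PATH_SAFE + "?"

def redirect_header(host: str, path: str, query: str = "", fragment: str = "",
                    permanent: bool = False) -> bytes:
    """Build a Gemini redirect header from decoded components (hypothetical helper)."""
    status = 31 if permanent else 30
    uri = urlunsplit((
        "gemini",
        host,  # assumed to be ASCII / punycode already
        quote(path, safe=PATH_SAFE),
        quote(query, safe=QUERY_SAFE),
        quote(fragment, safe=QUERY_SAFE),
    ))
    return f"{status} {uri}\r\n".encode("utf-8")

print(redirect_header("example.com", "/new dir/100%.gmi"))
# b'30 gemini://example.com/new%20dir/100%25.gmi\r\n'
```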

Yada yada...spec compliance...yada yada.

Happy hacking,
  Gary

-- 
GPG Key ID: 7BC158ED
Use `gpg --search-keys lambdatronic' to find me
Protect yourself from surveillance: https://emailselfdefense.fsf.org
