💾 Archived View for rawtext.club › ~sloum › geminilist › 001040.gmi captured on 2020-09-24 at 02:09:43. Gemini links have been rewritten to link to archived content

Query Strings

colecmac at protonmail.com

Mon May 25 16:54:32 BST 2020

- - - - - - - - - - - - - - - - - - - 

I think it might just make the most sense to say in the spec that encoding is required, and should be done with percent signs, for spaces too. Like in Sean's message:

?query=what%20is%20this%20madness&lang=en

makeworld

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, May 24, 2020 5:28 PM, Sean Conner <sean at conman.org> wrote:

It was thus said that the Great Brian Evans once stated:
Greetings,
I got a bug report recently for Bombadillo about how I have been handling
query strings.
[ snip ]
I think it would be good to clearly state what is expected of clients and
servers regarding the escaping of querystring values for gemini.
There are three standards being conflated here. They are:
[CGI] RFC-3875
[URI] RFC-3986
[WEBFORM] https://www.w3.org/TR/html401/interact/forms.html
I'm going to try to do a summary here (if anyone is interested in the gory
details, check the docs listed above). To encode a URL (per [URI]), the
following characters can be used AS IS:
ALPHA DIGIT - . _ ~
and the following characters MUST always be encoded [1]:
% < > [ ] { } | \ ^ SPACE CONTROL NON-ASCII
Whether any character outside these two sets must be encoded depends upon
where in the URL it appears (more on that below).
Encoding a character means converting it to its hex value and preceding
it with a '%':
##% -> %23%23%25
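
A minimal Python sketch of that encoding step. It assumes the standard library's urllib.parse.quote is an acceptable stand-in for the rules in [URI]; the helper name encode_all is made up for this illustration.

```python
from urllib.parse import quote

def encode_all(text: str) -> str:
    """Percent-encode everything except ALPHA DIGIT - . _ ~"""
    # safe="" drops '/' from quote()'s default safe set, so only the
    # unreserved characters pass through untouched.
    return quote(text, safe="")

print(encode_all("##%"))  # %23%23%25
print(encode_all("a b"))  # a%20b
```

Non-ASCII input is encoded as its UTF-8 bytes, one %XX triplet per byte.
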
Each section of a URL (scheme, authority [2], path, query, fragment)
allows certain characters that would otherwise be encoded to NOT be encoded.
I'll concentrate on the query portion since that's the part under question.
The query portion allows the following characters to appear non-encoded:
ALPHA DIGIT - . _ ~ / ? : @
The '=' and '&' are used as sub-delimiters (to separate name and value,
and to separate name/value pairs). If a '=' or '&' appears in a name or a
value, it has to be encoded.
The '+' sign is listed as a sub-delimiter in [URI], which otherwise says
nothing about it. [CGI] and [WEBFORM] define it differently. [CGI] allows
it, but only if '=' and '&' aren't used (section 4.4):
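
As a rough sketch, a query string built along these lines could look like the following in Python. The names QUERY_SAFE and encode_query are hypothetical, and the safe set simply mirrors the list of characters given above.

```python
from urllib.parse import quote

# Characters allowed unencoded in a query, in addition to the
# unreserved set that quote() never encodes (ALPHA DIGIT - . _ ~).
QUERY_SAFE = "/?:@"

def encode_query(pairs):
    """Join (name, value) pairs, encoding '=' and '&' inside names and values."""
    return "&".join(
        f"{quote(name, safe=QUERY_SAFE)}={quote(value, safe=QUERY_SAFE)}"
        for name, value in pairs
    )

print(encode_query([("query", "what is this madness"), ("lang", "en")]))
# query=what%20is%20this%20madness&lang=en
print(encode_query([("expr", "a=b&c")]))
# expr=a%3Db%26c
```
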
...?one+two+three '+' ALLOWED
...?one+two=3&three=3 '+' DISALLOWED
And in this case, the '+' is to be treated as a space. In any other case,
the space needs to be encoded:
...?query=what%20is%20this%20madness&lang=en DEFINED
...?query=what+is+this+madness&lang=en UNDEFINED
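
On the receiving side, the [CGI] reading described above might be sketched like this in Python. The function name decode_query is made up, and the rule it applies (a '+' is a word separator only when no '=' or '&' is present) is this message's summary of section 4.4, not a full implementation of it.

```python
from urllib.parse import unquote

def decode_query(query: str):
    """Decode a query string following the [CGI] reading summarized above."""
    if "=" not in query and "&" not in query:
        # Search-word style: '+' separates words, i.e. acts as a space.
        return [unquote(word) for word in query.split("+")]
    # name=value style: '+' has no defined meaning here, so it is left
    # alone; unquote() only decodes %XX sequences.
    pairs = []
    for item in query.split("&"):
        name, _, value = item.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs

print(decode_query("one+two+three"))
# ['one', 'two', 'three']
print(decode_query("query=what%20is%20this%20madness&lang=en"))
# [('query', 'what is this madness'), ('lang', 'en')]
```
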
[WEBFORM] defines the '+' to be a space, but only when the data is being
sent as part of a POST, and the content type is
"application/x-www-form-urlencoded". This doesn't apply at all to Gemini.
Now, it could be that there are webservers (or CGI scripts) that convert
'+' to spaces regardless. I'm just saying ...
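
To illustrate how easily the [WEBFORM] convention sneaks in: Python's urllib.parse.urlencode uses quote_plus by default, so it emits '+' for spaces unless told otherwise. The sketch below only shows that library's behavior; whether a given Gemini client or server does something similar is an assumption.

```python
from urllib.parse import quote, urlencode

params = {"query": "what is this madness", "lang": "en"}

# Default: form-style encoding, spaces become '+'.
form_style = urlencode(params)
# With quote_via=quote, spaces become %20 instead.
percent_style = urlencode(params, quote_via=quote)

print(form_style)     # query=what+is+this+madness&lang=en
print(percent_style)  # query=what%20is%20this%20madness&lang=en
```
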
Hopefully, this clears it all up (said as he wipes the mud off his face).
-spc (Don't hesitate to ask any questions ... )
[1] You'd be hard pressed to see these listed in [URI] since they aren't
listed! RFC-1738 lists those characters explicitly, so that's four
references. Sorry.
[2] [URI] calls the host portion "authority".