💾 Archived View for rawtext.club › ~sloum › geminilist › 001034.gmi captured on 2020-09-24 at 02:09:59. Gemini links have been rewritten to link to archived content


Query Strings

Sean Conner sean at conman.org

Sun May 24 22:28:58 BST 2020

- - - - - - - - - - - - - - - - - - - 

It was thus said that the Great Brian Evans once stated:

> Greetings,
> I got a bug report recently for Bombadillo about how I have been handling
> query strings.

[ snip ]

> I think it would be good to clearly state what is expected of clients and
> servers regarding the escaping of querystring values for gemini.

There are three standards being conflated here. They are:

[CGI] RFC-3875
[URI] RFC-3986
[WEBFORM] https://www.w3.org/TR/html401/interact/forms.html

I'm going to try to do a summary here (if anyone is interested in the gory details, check the docs listed above). To encode a URL (per [URI]), the following characters can be used AS IS:

ALPHA DIGIT - . _ ~

and the following characters MUST always be encoded [1]:

% < > [ ] { } | \ ^ SPACE CONTROL NON-ASCII

Whether a character in neither list must be encoded depends upon where in the URL it appears (more on that below).

Encoding a character means converting it to its two-digit hex value and preceding it with a '%':

##% -> %23%23%25
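
As a rough illustration of the rule above, here is a minimal Python sketch. The function name percent_encode and its safe parameter are made up for this example, and it assumes the text is turned into UTF-8 bytes before encoding. The standard library's urllib.parse.quote() does the same job and is shown only for comparison.

```python
from urllib.parse import quote

# Characters that [URI] allows AS IS everywhere: ALPHA DIGIT - . _ ~
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str, safe: str = "") -> str:
    """Hypothetical helper: encode every byte that is not unreserved
    and not explicitly listed in `safe` as '%' plus two hex digits."""
    out = []
    for byte in text.encode("utf-8"):   # assumption: UTF-8 for non-ASCII
        ch = chr(byte)
        if ch in UNRESERVED or ch in safe:
            out.append(ch)              # allowed as is
        else:
            out.append(f"%{byte:02X}")  # e.g. '#' (0x23) becomes %23
    return "".join(out)

print(percent_encode("##%"))   # %23%23%25
print(quote("##%", safe=""))   # same result from the standard library
```

The safe parameter is there because each part of a URL allows a few extra characters to pass through unencoded.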

Each section of a URL (scheme, authority [2], path, query, fragment) allows certain characters that would otherwise be encoded to NOT be encoded. I'll concentrate on the query portion since that's the part in question. The query portion allows the following characters to appear non-encoded:

ALPHA DIGIT - . _ ~ / ? : @

The '=' and '&' are used as sub-delimiters (to separate name and value, and to separate name/value pairs). If a '=' or '&' appears in a name or a value, it has to be encoded.
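
A small Python sketch of building a query string under these rules follows. The name encode_query is invented for the example, and the set of pass-through characters is taken from the summary above, not from any Gemini specification. It relies on urllib.parse.quote(), which always leaves ALPHA, DIGIT and - . _ ~ alone.

```python
from urllib.parse import quote

# Extra characters the summary above allows unencoded in a query.
QUERY_SAFE = "/?:@"

def encode_query(pairs):
    """Hypothetical helper: join (name, value) pairs with '=' and '&',
    percent-encoding anything else, including literal '=', '&' and '+'
    that occur inside a name or a value."""
    return "&".join(
        quote(name, safe=QUERY_SAFE) + "=" + quote(value, safe=QUERY_SAFE)
        for name, value in pairs
    )

print(encode_query([("query", "what is this madness"), ("lang", "en")]))
# query=what%20is%20this%20madness&lang=en

print(encode_query([("a=b", "c&d")]))
# a%3Db=c%26d
```

Encoding each name and value separately, before joining, is what keeps a literal '=' or '&' in the data from being mistaken for a delimiter.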

The '+' sign is listed as a sub-delimiter in [URI], but [URI] otherwise says nothing about it. [CGI] and [WEBFORM] define it differently. [CGI] allows it, but *only* if '=' and '&' aren't used (section 4.4):

...?one+two+three         '+' ALLOWED
...?one+two=3&three=3     '+' DISALLOWED

And in this case, the '+' is to be treated as a space. In any other case, the space needs to be encoded:

...?query=what%20is%20this%20madness&lang=en     DEFINED
...?query=what+is+this+madness&lang=en           UNDEFINED
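
Here is one way the receiving side might apply the [CGI] reading described above, as a hedged Python sketch. The name decode_query and the tagged return value are invented for the example. Leaving '+' untouched in the name/value case is just one possible choice for the UNDEFINED situation, not something any of the standards require.

```python
from urllib.parse import unquote

def decode_query(query: str):
    """Hypothetical helper following the summary above.

    No '=' or '&' present: treat the query as search words separated
    by '+' and return ("words", [...]).
    Otherwise: split into name/value pairs, percent-decode them, and
    leave any '+' as a literal plus sign; return ("pairs", [...])."""
    if "=" not in query and "&" not in query:
        return ("words", [unquote(word) for word in query.split("+")])
    pairs = []
    for field in query.split("&"):
        name, _, value = field.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return ("pairs", pairs)

print(decode_query("one+two+three"))
# ('words', ['one', 'two', 'three'])

print(decode_query("query=what%20is%20this%20madness&lang=en"))
# ('pairs', [('query', 'what is this madness'), ('lang', 'en')])
```

Note that urllib.parse.unquote() only handles %XX sequences and never turns '+' into a space, which is why it is used here.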

[WEBFORM] defines the '+' to be a space, but only when the data is being sent as part of a POST, and the content type is "application/x-www-form-urlencoded". This doesn't apply at all to Gemini.

Now, it could be that there are webservers (or CGI scripts) that convert '+' to spaces regardless. I'm just saying ...
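
Python's standard library happens to show both behaviours, which makes for a quick illustration (this only demonstrates existing urllib.parse functions; it says nothing about what any particular server does). unquote() decodes %XX only, while unquote_plus() and parse_qs() also turn '+' into a space in the web-form style.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

raw = "what+is%20this"

print(unquote(raw))        # what+is this   ('+' kept as a plus sign)
print(unquote_plus(raw))   # what is this   ('+' treated as a space)

# parse_qs() applies the web-form convention to every name and value.
print(parse_qs("query=what+is+this+madness&lang=en"))
# {'query': ['what is this madness'], 'lang': ['en']}
```

So a script that hands a Gemini query straight to parse_qs() would convert '+' to spaces regardless, and a client sending a literal '+' is safest encoding it as %2B.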

Hopefully, this clears it all up (said as he wipes the mud off his face).

-spc (Don't hesitate to ask any questions ... )

[1] You'd be hard pressed to see these listed in [URI] since they aren't listed! RFC-1738 lists those characters explicitly, so that's four references. Sorry.

[2] [URI] calls the host portion "authority".