I am running a CGI word puzzle game [1] on tilde.team. Recently I noticed that in many cases the input is not URL-encoded correctly.
Direct URL test: sending abc@:[]#/abc via the URL. Only @ and : are allowed to be unencoded.
Direct URL test 1 abc@:[]#/abc
Direct URL test 2 abc@:[?]#/abc
Input field test: when prompted, enter the string abc@:[? ]#/abc (note the space inside the brackets!).
The ideal result is:
1: |abc@:%5B%5D%23%2Fabc|
2: |abc@:%5B%3F%5D%23%2Fabc| and
3: |abc@:%5B%3F%20%5D%23%2Fabc|
with @ and/or : possibly encoded as well. None of the tests should fail. There should be no [, ], #, /, ? or spaces left unencoded.
While it is not specified (or is it?) that the client should clean up the URL by encoding the query component, it really should, as the alternative is confusion and unexpected results.
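For reference, the ideal results can be reproduced with Python's standard library. This is only a sketch of the encoding I expect, not how any particular client does it; the helper name encode_query is made up for this example.

```python
from urllib.parse import quote

# The three test inputs from this post.
TESTS = [
    "abc@:[]#/abc",
    "abc@:[?]#/abc",
    "abc@:[? ]#/abc",
]

def encode_query(text: str) -> str:
    # safe="@:" leaves only @ and : unencoded. Everything else outside
    # the unreserved set (letters, digits, - . _ ~) is percent-encoded.
    return quote(text, safe="@:")

for number, text in enumerate(TESTS, start=1):
    print(f"{number}: |{encode_query(text)}|")
```

Passing safe="" instead would encode @ and : as well, which is also acceptable.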
In direct test 1, Lagrange does not touch the URL and the result is |abc@:[]|.
In direct test 2, Lagrange reports |abc@:[?]|.
In the input field case, Lagrange returns QUERY_STRING: |abc@%3A[%3F%20]%23%2Fabc|, which is a bit wrong, as the [ and ] _should_ be encoded.
I know, from seeing the requests, that some clients are even worse, and fail to encode spaces.
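A check along these lines is enough to spot the problem on the server side. It is a rough sketch: the function name and the exact set of characters are my own choice for this example, based on the list given above.

```python
# Characters that, per this post, should never arrive raw in QUERY_STRING.
MUST_BE_ENCODED = "[]#/? "

def leftover_characters(query_string: str) -> list[str]:
    """Return the distinct raw characters that should have been encoded."""
    return sorted({ch for ch in query_string if ch in MUST_BE_ENCODED})

# The QUERY_STRING received in the input field test:
print(leftover_characters("abc@%3A[%3F%20]%23%2Fabc"))
```

For that string the check reports the two brackets. A client that fails to encode spaces would show up the same way.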
Please verify your client with the above CGIs to see what happens. If the client does not encode the special characters (except maybe @ and :), contact the developer and file a bug report. Send me an email (stack at ctrl-c.club) and I will post a summary of how the clients are doing.
The Gemini protocol provides only one way for clients to send information to the server: by attaching it to the URL after a '?' as a query string. Since the entire request is a URL, some special characters in the query string must be 'percent-encoded' in order to avoid confusion. The encoding is a '%' character followed by the two-digit hexadecimal ASCII code of the character being encoded.
Section 3.2.1, 1x (INPUT), of the Gemini spec [2] clearly specifies:
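To make the mechanics concrete, here is a hand-rolled sketch of percent-encoding. The names percent_encode and keep are invented for this example, and encoding non-ASCII input as UTF-8 bytes first is my assumption, following common practice.

```python
# Characters that RFC3986 calls "unreserved" and that never need encoding.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str, keep: str = "") -> str:
    """Replace every byte that is not unreserved (or in keep) with %XX."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if byte < 128 and (char in UNRESERVED or char in keep):
            out.append(char)            # safe character, pass it through
        else:
            out.append(f"%{byte:02X}")  # '%' plus two uppercase hex digits
    return "".join(out)

print(percent_encode(" "))                   # a space becomes %20
print(percent_encode("[?]"))                 # %5B%3F%5D
print(percent_encode("abc@:[]", keep="@:"))  # abc@:%5B%5D
```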
Reserved characters used in the user's input must be "percent-encoded" as per RFC3986, and space characters should also be percent-encoded.
The Gemini spec also refers to RFC3986, the URL/URI specification document. That document specifies that the characters : / ? # [ ] @ are reserved and must be percent-encoded, but : and @ may appear unencoded within the path, query, and fragment...
A bit confusing, but it is clear that spaces and / ? # [ ] and @ should be encoded.
There are two possible ways to provide a query string to a server: (1) by directly typing it into the URL field of the browser, the '?' and all; and (2) as a consequence of the server's 1x (INPUT) response, in which case the browser is expected to somehow get the query from the user and reissue the request.
In the first case, the browser really should sanitize the URL prior to making the request. In the second case, the browser must encode the user input prior to attaching it to the URL and making the request.
If I am somehow misunderstanding the spec or the reasons behind certain implementations, please let me know (stack at ctrl-c.club).
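The two cases could look roughly like this in a client. This is a sketch under my own assumptions, not taken from any existing browser: the function names and the example.org URL are made up, and everything after the first '?' in a typed URL is treated as the query, including any '#'.

```python
from urllib.parse import quote

def sanitize_typed_url(url: str) -> str:
    """Case 1: re-encode the query part of a URL typed by hand."""
    base, sep, query = url.partition("?")
    if not sep:
        return url  # no query string, nothing to do
    # '%' is kept so that escapes the user already typed are not doubled.
    return base + "?" + quote(query, safe="@:%")

def request_for_input(url: str, user_input: str) -> str:
    """Case 2: attach the answer to a 1x (INPUT) prompt to the URL."""
    base = url.split("?", 1)[0]  # drop any query already present
    return base + "?" + quote(user_input, safe="@:")

print(sanitize_typed_url("gemini://example.org/cgi-bin/test?abc@:[?]#/abc"))
print(request_for_input("gemini://example.org/cgi-bin/test", "abc@:[? ]#/abc"))
```

Keeping '%' in the first case is a judgment call: it avoids double-encoding, but it means a literal '%' typed by the user goes through untouched.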
[1]
[2]