
[spec] The Tragedy of &

Gary Johnson lambdatronic at disroot.org

Sun Jan 31 18:07:51 GMT 2021

- - - - - - - - - - - - - - - - - - - 

Sean Conner <sean at conman.org> writes:

Not if the CGI interface is properly written. All I had to do was write
this CGI script and drop it into my tests directory [1]:
gemini://gemini.conman.org/test/pathseg.cgi
It uses only three of the RFC-3875 defined variables, QUERY_STRING,
SCRIPT_NAME and PATH_INFO to do all the work. The script will ask for three
fields and then present a final page with all three fields. But the script
will only work if all three variables are defined per RFC-3875 (PATH_INFO is
the tricky one).
Yes, it's a bit ugly and yes, it's a second class citizen and yes, it
requires a proper CGI module to work, but it can be done without the
configuration you think it does. The script just simply appends each input
field as the path, so if you enter
and a one
and a two
skidoosh
as the values, the final URL will be:
gemini://gemini.conman.org/test/pathseg.cgi/and%20a%20one/and%20a%20two/skidoosh
Yes, I could have done a bit more processing, naming each segment:
/test/pathseg.cgi/name=and%20a%20one/age=and%20a%20two/action=skidoosh
but I was lazy and wanted to just do a proof-of-concept here.

Thanks for sharing some code, Sean. I, of course, realize that one could write a CGI script to pick apart the PATH_INFO for user inputs. The issue I raised in my message was that this doesn't make any sense in the context of a CGI script which is looked up using the path on the remote filesystem.

In your example, your script is located at /test/pathseg.cgi. However, lacking side information, I see no indicator (outside of the -- admittedly optional -- cgi extension on your file name) of which path segments should be considered part of the CGI filename lookup and which parts are meant to be user input data in your example link:

/test/pathseg.cgi/name=and%20a%20one/age=and%20a%20two/action=skidoosh

This feels like a massive hack to me and an abuse of path segments TBH.

If I were to embrace this approach, I can see that I would have to reprogram my server to do some additional path preprocessing magic. I could either:

1. Check every sequence of path segments starting from the document root to see if any of them correspond to an executable file or have the blessed CGI file extension for my server.

2. (To use Chris Babcock's suggestion), check every sequence of path segments starting from the document root to see if any of them correspond to a directory containing an index.cgi file.

3. Include an input parameter to my server (on the command line or in a config file) that specifies a particular mapping between path segments and CGI scripts on my filesystem. That is, I would be defining a routing table at server start time. This approach has the unfortunate side effect of preventing users on a pubnix from installing CGI scripts in their ~/public_gemini capsules without getting the server administrator to update the global routing table on their behalf. Alternatively, it would require each ~/public_gemini capsule that wants to support CGI scripts to include its own routing table config file. The server would have to read and parse these files at start time, and then again either periodically or in response to filesystem events, in order to pick up new user scripts without a restart.
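To make option 1 concrete, here is a minimal sketch in Python of the kind of lookup a server would need. The function name split_script_path and its return shape are my own inventions for illustration, not taken from any existing server. It assumes that "is a CGI script" means "is a regular file with the executable bit set", and it leaves the remaining segments exactly as received (no percent-decoding).

```python
import os
from pathlib import Path


def split_script_path(doc_root, url_path):
    """Walk url_path one segment at a time below doc_root.

    Returns (script_name, path_info) for the first prefix that names an
    executable regular file, or None if no prefix does.
    Hypothetical helper: the name and return shape are assumptions.
    """
    segments = [s for s in url_path.split("/") if s]
    # Refuse dot segments so a request cannot climb out of doc_root.
    if any(s in (".", "..") for s in segments):
        return None

    current = Path(doc_root)
    for i, segment in enumerate(segments):
        current = current / segment
        if current.is_file() and os.access(current, os.X_OK):
            script_name = "/" + "/".join(segments[: i + 1])
            rest = segments[i + 1:]
            # Everything after the script is treated as user input data.
            path_info = "/" + "/".join(rest) if rest else ""
            return script_name, path_info
        if not current.is_dir():
            # Neither a script nor a directory: nothing further to find.
            return None
    return None
```

Note that this costs one filesystem check per path segment on every request, which is part of why it feels like preprocessing magic to me.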

Once one of these 3 approaches enables the server to successfully detect that a particular path corresponds to a CGI script that is not actually located where that path is pointing, then the server would need to execute that script with PATH_INFO bound to the entire path. Every installed CGI script would then be responsible for manually removing SCRIPT_NAME from PATH_INFO and splitting it up to get the user inputs, which puts an additional burden on CGI developers.
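As a rough illustration of that burden, this is the sort of boilerplate each script would need to carry. It is only a sketch: inputs_from_path is a hypothetical helper name, and I am assuming the server hands the script PATH_INFO as the full, still percent-encoded request path, as described above.

```python
from urllib.parse import unquote


def inputs_from_path(script_name, path_info):
    """Strip a leading SCRIPT_NAME from PATH_INFO and decode the segments.

    Hypothetical helper; assumes path_info is still percent-encoded.
    """
    if path_info.startswith(script_name):
        remainder = path_info[len(script_name):]
        # Only strip when the match ends on a segment boundary.
        if remainder == "" or remainder.startswith("/"):
            path_info = remainder
    return [unquote(segment) for segment in path_info.split("/") if segment]
```

In a real script the two arguments would come from os.environ["SCRIPT_NAME"] and os.environ["PATH_INFO"]. With Sean's example URL, this should yield the three fields "and a one", "and a two", and "skidoosh".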

So I've now heard from multiple folks that we should all just get on with these path segment hacks and accept that as the best we can do in Gemini.

While I can see that it's technically possible (though arguably ugly) to do so, I suppose my question is:

"What exactly does Gemini lose by allowing chained query parameters (with &)?"

I can't for the life of me see any downside. It should literally be one line of code changed in your favorite Gemini client. Just append inputs to the query string if one already exists rather than replacing the query string outright.
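For the sake of discussion, here is roughly what I mean on the client side, sketched in Python. The function add_input and its append flag are made up for this example and are not from any particular client, and example.org is just a placeholder host. The difference between the two behaviors is the single branch in the middle.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def add_input(url, user_input, append=True):
    """Build the follow-up request URL after a 10 INPUT response.

    Hypothetical client helper. With append=True, the new input is
    chained onto an existing query string with "&"; with append=False,
    any existing query string is replaced outright.
    """
    parts = urlsplit(url)
    encoded = quote(user_input, safe="")  # percent-encode the raw input
    if append and parts.query:
        query = parts.query + "&" + encoded
    else:
        query = encoded
    # Fragments are dropped; they are not sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

Starting from gemini://example.org/form.cgi?and%20a%20one and entering "and a two", the appending branch would request gemini://example.org/form.cgi?and%20a%20one&and%20a%20two, while the replacing branch would drop the first field.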

I believe John Cowan is right that "include" is too vague a word in Solderpunk's current specification for the 10 INPUT field. Both appending and replacing are forms of inclusion, so any Gemini client author who chooses to append shouldn't be in violation of the spec as it is currently worded.

And changing that one line in your client could save every CGI script writer in Geminispace a lot of additional work (as clearly demonstrated by the examples shared by Katarina, Chris, and Sean).
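To show what I mean by saved work, here is a sketch of what a CGI script would need if clients chained inputs with &. The helper name inputs_from_query is hypothetical, and I am assuming the server passes QUERY_STRING through still percent-encoded, as RFC 3875 describes. No SCRIPT_NAME stripping and no server-side path guessing is involved.

```python
from urllib.parse import unquote


def inputs_from_query(query_string):
    """Split a chained query string on "&" and decode each field.

    Hypothetical helper; assumes query_string is still percent-encoded.
    """
    if not query_string:
        return []
    return [unquote(field) for field in query_string.split("&")]
```

In a real script the argument would come from os.environ.get("QUERY_STRING", "").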

Seems like a very positive return on investment for a small change.

What am I missing here, folks?

Any chance of weighing in here, Solderpunk?

With best intentions, Gary

-- 
GPG Key ID: 7BC158ED
Use `gpg --search-keys lambdatronic' to find me
Protect yourself from surveillance: https://emailselfdefense.fsf.org
=======================================================================
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Why is HTML email a security nightmare? See https://useplaintext.email/

Please avoid sending me MS-Office attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html