
CGI, SCGI and Certificates (was Re: [ANN] Gemini browser for iOS)

Sean Conner sean at conman.org

Thu Jun 11 20:45:58 BST 2020

- - - - - - - - - - - - - - - - - - - 

It was thus said that the Great solderpunk once stated:

> On Tue, Jun 09, 2020 at 09:02:24PM -0400, Michael Lazar wrote:
>> I believe this is using SCRIPT_NAME incorrectly per RFC 3875. The SCRIPT_NAME
>> should be the part of the URI path that comes before the PATH_INFO [1]. So in
>> your example:
>>
>> GEMINI_URL=gemini://lucy.roswell.area51/cgi-bin/beta/foobar?one=1&two=2
>> SCRIPT_NAME=/cgi-bin/beta
>> PATH_INFO=/foobar
>
> Is this how cgi-bins are traditionally handled?

Yes.

> If a URI path's prefix
> matches the configured cgi-bin path, the standard mapping from URI paths
> to the filesystem is interrupted, and the first component of the URI path
> *after* the cgi-bin prefix (here `beta`) is the only thing looked for on
> the disk, with everything else passed along to PATH_INFO? If there is,
> for example, a /var/gemini/cgi-bin/beta/ directory on the disk, the
> server does not check for an executable named `foobar` in it?

To answer that last question, no.

To explain, let me describe my setup. GLV-1.12556 allows one to use multiple directories per virtual host for content. I have the following set up on my development box:

    {
      path      = "^(/cgi%-bin/)(.*)", -- [5]
      module    = "GLV-1.handlers.filesystem",
      directory = "/home/spc/projects/gemini/non-checkin/cgi-bin",
      -- ... there are some other directives, not important right now
    },

    {
      path      = ".*", -- [5]
      module    = "GLV-1.handlers.filesystem",
      directory = "/home/spc/projects/gemini/non-checkin/lucy.roswell.area51",
      -- ... more directives ...
    }

Note that depending upon how things are configured, CGI [1] can be in any directory or restricted to a single directory [2]. With GLV-1.12556, any file with the 'execute' bit set will be treated as a CGI script [3][4]. I just added a CGI script to my main Gemini server:

gemini://gemini.conman.org/test/a-script/foobar?one=1&two=2

The URL is broken up:

    location =
    {
      host   = "gemini.conman.org",
      port   = 1965.000000,
      path   = "/test/a-script/foobar",
      scheme = "gemini",
      query  = "one=1&two=2",
    }
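For readers outside Lua, here is a rough Python sketch (not GLV-1.12556's code) of the same breakdown using the standard library's `urllib.parse.urlsplit`. The function name `break_up` is hypothetical, and falling back to port 1965 when the URL names none is an assumption based on Gemini's default port; the port comes out as an integer rather than Lua's `1965.000000`.

```python
from urllib.parse import urlsplit


def break_up(url):
    """Split a Gemini URL into the fields shown in the location table."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # Assumption: use Gemini's default port when the URL gives none.
        "port": parts.port or 1965,
        "path": parts.path,
        "query": parts.query,
    }


print(break_up("gemini://gemini.conman.org/test/a-script/foobar?one=1&two=2"))
```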

The path is matched against each handler's path (in order, first match wins) and the matching one is handed the request. Per the configuration, this match result will be:

match = { "/", "test/a-script/foobar", }
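The "in order, first match wins" selection could look roughly like this Python sketch. `HANDLERS` and `select_handler` are hypothetical names, the patterns are translated into Python regex syntax (Lua's `%-` escape becomes a plain `-`), and the catch-all `.*` is written in the rewritten `^(/)(.*)` form that footnote [5] describes.

```python
import re

# Hypothetical handler table mirroring the two configuration blocks above,
# kept in the same order. Patterns are Python regexes, not Lua patterns.
HANDLERS = [
    {
        "path": r"^(/cgi-bin/)(.*)",
        "directory": "/home/spc/projects/gemini/non-checkin/cgi-bin",
    },
    {
        # '.*' as rewritten for a single-match pattern (see footnote [5])
        "path": r"^(/)(.*)",
        "directory": "/home/spc/projects/gemini/non-checkin/lucy.roswell.area51",
    },
]


def select_handler(path):
    """Return (handler, groups) for the first handler whose pattern matches."""
    for handler in HANDLERS:
        # re.search, like Lua's string.match, is anchored only by an explicit '^'
        m = re.search(handler["path"], path)
        if m:
            return handler, m.groups()
    return None, None  # no handler claimed the request


handler, match = select_handler("/test/a-script/foobar")
print(handler["directory"], match)
```

Because the list is scanned in order, the more specific `/cgi-bin/` entry has to come before the catch-all, exactly as in the configuration.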

The filesystem handler will break down the second match element (the first is considered the "URL filesystem space"---remember, GLV-1.12556 supports multiple directories per vhost) and check each segment (for permissions, CGI script or SCGI script). So the first check is for:

<directory> .. "test"

This is a directory, so it continues, walking down the path. Next it tries:

<directory> .. "test/" .. "a-script"

This is a file with the execute bit set, so it's run. The rest of the match is used to construct the PATH_INFO

PATH_INFO="/foobar"

and PATH_TRANSLATED

PATH_TRANSLATED=<directory> .. "/foobar"

This does not imply that such a file or directory actually exists. If there is nothing more to the path (say, the request was to "/test/a-script"), then PATH_INFO and PATH_TRANSLATED would not be set.
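The segment-by-segment walk can be sketched in Python as follows. This is an illustration under assumptions, not GLV-1.12556's implementation: `find_cgi` is a hypothetical name, only the execute bit is checked (the permission and SCGI checks are skipped), and SCRIPT_NAME is taken to be the URL path up to and including the script, as in the RFC 3875 example quoted earlier.

```python
import os


def find_cgi(directory, url_prefix, rest):
    """Walk `rest` (the second match element) one segment at a time under
    `directory`, stopping at the first executable file.

    Returns (script_path, env) or (None, None). Sketch only: no permission
    or SCGI checks are made here.
    """
    segments = [s for s in rest.split("/") if s]
    current = directory
    for i, segment in enumerate(segments):
        current = os.path.join(current, segment)
        if os.path.isdir(current):
            continue  # a directory: keep walking down the path
        if os.path.isfile(current) and os.access(current, os.X_OK):
            # Execute bit set: this is the script.
            env = {"SCRIPT_NAME": url_prefix + "/".join(segments[: i + 1])}
            remaining = segments[i + 1 :]
            if remaining:  # only set when there is more to the path
                path_info = "/" + "/".join(remaining)
                env["PATH_INFO"] = path_info
                env["PATH_TRANSLATED"] = directory.rstrip("/") + path_info
            return current, env
        break  # missing or non-executable file: not a CGI request
    return None, None
```

Since the walk stops at the first executable, a request for `/test/a-script/foobar` never looks for a file named `foobar`, which is the point made earlier in the thread.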

A Gemini server doesn't have to do what I do. It is certainly in line with Apache to require CGI scripts to have a particular extension, look for said extension and handle things that way without having to walk down the filesystem checking each component. So, hypothetically speaking, given a request like:

gemini://example.net/foo/bar.cgi

the server can scan for ".cgi", find it, and know it's going to execute a CGI script. Since there is nothing more to the URL path, it would not set PATH_INFO and PATH_TRANSLATED. But for this:

gemini://example.net/foo/bar.cgi/baz

it would find the .cgi extension, extract the path up through the extension ("/foo/bar.cgi") and, because there's more, set up PATH_INFO and PATH_TRANSLATED. There's another message on this list where I give a real-life example of using PATH_INFO and PATH_TRANSLATED, here:
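For contrast, the extension-based approach just described needs no filesystem access to split the path. A minimal Python sketch, assuming the script is the first segment ending in the extension; `split_on_extension` and its `.cgi` default are hypothetical, and this is not how GLV-1.12556 works.

```python
def split_on_extension(path, ext=".cgi"):
    """Hypothetical extension-based split: everything up to and including
    the first segment ending in `ext` is the script; the rest (if any)
    becomes PATH_INFO. Returns (script_name, path_info) or (None, None)."""
    segments = path.split("/")
    for i, segment in enumerate(segments):
        if segment.endswith(ext):
            script_name = "/".join(segments[: i + 1])
            rest = segments[i + 1 :]
            # No remaining path means PATH_INFO is left unset (None).
            path_info = "/" + "/".join(rest) if rest else None
            return script_name, path_info
    return None, None


print(split_on_extension("/foo/bar.cgi"))
print(split_on_extension("/foo/bar.cgi/baz"))
```

The trade-off is the one mentioned above: a string scan is cheaper than stat-ing each component, but it ties "is this a script?" to naming rather than to the execute bit.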

https://lists.orbitalfox.eu/archives/gemini/2020/001485.html

> Semi-related: when the server forks off the CGI process, is it
> conventional to set that process' working directory to the CGI bin?

It would be conventional to set the working directory to the main directory for the host. In my case, given that a host can have multiple directories, I set the working directory to the handler's directory setting. That value is also set in GEMINI_DOCUMENT_ROOT.
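A minimal Python sketch of spawning the script that way, with the working directory and GEMINI_DOCUMENT_ROOT both set to the handler's directory. `run_cgi` is a hypothetical helper, and passing only the supplied variables (rather than inheriting the server's whole environment) is an assumption; a real server would also pass the full set of RFC 3875 meta-variables.

```python
import subprocess


def run_cgi(script, directory, cgi_env):
    """Run `script` with its working directory set to the handler's
    directory, which is also exported as GEMINI_DOCUMENT_ROOT.

    `cgi_env` holds the other CGI variables (SCRIPT_NAME, PATH_INFO, ...).
    Returns the script's standard output as text.
    """
    env = dict(cgi_env)
    env["GEMINI_DOCUMENT_ROOT"] = directory
    result = subprocess.run(
        [script],
        cwd=directory,  # working directory = handler's directory
        env=env,        # only the variables given here reach the script
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```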

-spc

[1] And SCGI; I support that as well.

[2] That's why I have 'cgi-bin'---to test that configuration.

[3] I didn't bother with extensions for this. I felt that checking for the 'execute' bit was more elegant than just an extension. Also, if CGI has been disabled (server wide, host or directory---the configuration is very fine grained) then I return an error to the client.

[4] There's another method for SCGI.

[5] This is a Lua-style regex. The patterns in () are groupings and the filesystem handler wants two groups---the first is the leading portion in URL space that doesn't map to the filesystem, the second is the portion that does map to the filesystem. The original syntax for this only required one match and I kept that---in that case, the match is redone slightly so that the leading '/' from the URL portion is the first match, then the rest. So the '.*' pattern (which is basically "match all") becomes the pattern "^(/)(.*)". This is an implementation detail of GLV-1.12556, but I thought I should mention it.
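The single-match fallback might be sketched in Python like this. It is a guess at the behaviour described in this footnote, not GLV-1.12556's code: `match_path` is a hypothetical name, Python regexes stand in for Lua patterns, and the fallback simply splits the leading '/' off whatever the pattern matched, which gives the same result as "^(/)(.*)" for '.*'.

```python
import re


def match_path(pattern, path):
    """Return (url_prefix, fs_path) for `path`, or None if `pattern` misses."""
    m = re.search(pattern, path)  # unanchored unless the pattern uses '^'
    if m is None:
        return None
    if m.re.groups >= 2:
        # Two-group form: use the groups as given.
        return m.group(1), m.group(2)
    # One-match form: take the single result (the capture if there is one,
    # else the whole match) and redo it as ('/', rest).
    whole = m.group(1) if m.re.groups == 1 else m.group(0)
    if whole is None or not whole.startswith("/"):
        return None
    return "/", whole[1:]
```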