💾 Archived View for thrig.me › blog › 2023 › 08 › 29 › gmid-and-cgi.gmi captured on 2023-11-04 at 11:34:09. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
gemini://gemini.omarpolo.com/post/gmid-2.0-first-alpha.gmi
gemini://gemini.omarpolo.com/post/cgi-simple-vs-easy.gmi
So the next version of gmid will drop CGI support. This caused a complaint on the #gemini IRC channel. The problem is that CGI support is pretty trivial to add, something like
    pipe(fdpair);
    pid = fork();
    if (pid > 0) { // parent
        close(fdpair[1]);
        deal_with_output(fdpair[0]);
        close(fdpair[0]);
        waitpid(pid, &status, 0);
    } else { // child
        dup2(fdpair[1], STDOUT_FILENO);
        close(fdpair[1]);
        execv(cgipath, cgiargs);
        err(1, "execv failed");
    }
which omits a bunch of very important error handling, but shows the gist: fork a new process, wire up standard output of that process to a pipe in the parent, exec the CGI program, parent reads the output from that program and presumably copies it off to some client. Simple! What's the problem here?
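For the curious, here is roughly what the same gist looks like with some of that error handling put back. This is only a sketch under assumptions: run_cgi is a made-up name (nothing from gmid), the output is captured into a fixed buffer instead of being streamed to a client, and a real server would still need timeouts, environment setup, and some way to not block on a slow script.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run path with argv and copy its standard output into buf (at most
 * buflen - 1 bytes, NUL terminated). Returns the exit status of the
 * child, or -1 on failure. Hypothetical helper for illustration.
 */
int
run_cgi(const char *path, char *const argv[], char *buf, size_t buflen)
{
    int fdpair[2], status;
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    assert(buf != NULL && buflen > 0);
    if (pipe(fdpair) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fdpair[0]);
        close(fdpair[1]);
        return -1;
    }
    if (pid == 0) {             /* child */
        close(fdpair[0]);
        if (dup2(fdpair[1], STDOUT_FILENO) == -1)
            _exit(126);
        close(fdpair[1]);
        execv(path, argv);
        _exit(127);             /* only reached when execv fails */
    }
    /* parent */
    close(fdpair[1]);
    while (used < buflen - 1) {
        n = read(fdpair[0], buf + used, buflen - 1 - used);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            break;
        used += (size_t)n;
    }
    buf[used] = '\0';
    close(fdpair[0]);           /* a child still writing gets SIGPIPE */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Already a fair bit longer than the original dozen lines, and it still does nothing about the load problems.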
Each request forks and execs a new process. This means that a large number of requests (due to, say, the slashdot effect) may cause something bad to happen: requests fail, the server becomes unresponsive, on linux the out-of-memory killer may start blasting away somewhat randomly at the process table, etc. In other words, CGI does not scale well, which can result in poor user experience, and some poor sysadmin (or devops or whatever it is they are called these days) may have to go in and manually clean things up when the system goes sideways, which can be expensive.
Ideally the server would need to support limits so that only so many CGI programs could spawn in a given time period, perhaps to have a "whoops!" page when things are too busy, etc. This is more code that will need to live somewhere (in gmid, or scattered across the CGI scripts), and maybe the implementation is no longer so simple as shown above. But spawning is hardly the worst problem of CGI.
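A limit of that sort might be sketched as a fixed-window counter, as below. The names (spawn_limit, spawn_limit_allow) and the choice of a fixed window are assumptions for illustration, not anything gmid does; a real limiter would probably also want to cap how many CGI processes are running at once.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical limiter: at most max_spawns CGI starts per window. */
struct spawn_limit {
    time_t window_start;    /* when the current window began */
    time_t window;          /* window length in seconds */
    unsigned int max_spawns;
    unsigned int spawned;   /* starts counted in the current window */
};

void
spawn_limit_init(struct spawn_limit *sl, unsigned int max_spawns,
    time_t window)
{
    assert(sl != NULL && window > 0);
    sl->window_start = 0;
    sl->window = window;
    sl->max_spawns = max_spawns;
    sl->spawned = 0;
}

/*
 * Returns 1 if a CGI program may be started at time now, or 0 if the
 * caller should send the "too busy" response instead.
 */
int
spawn_limit_allow(struct spawn_limit *sl, time_t now)
{
    if (now - sl->window_start >= sl->window) {
        /* a new window begins: forget the old count */
        sl->window_start = now;
        sl->spawned = 0;
    }
    if (sl->spawned >= sl->max_spawns)
        return 0;
    sl->spawned++;
    return 1;
}
```

The server would call spawn_limit_allow(&sl, time(NULL)) before each fork. When it returns 0, the Gemini status 44 (SLOW DOWN) would be one candidate for the "whoops!" response.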
Fork can be slow: shell scripts are often too slow to benchmark on account of needing to fork most everything they do. How bad this is depends on how slow the CGI process is to start, and what all else that process loads or forks out to. Energy efficiency, and thus probably a compiled language, should be an end goal here, though sometimes one will want to throw something together in a prototyping language.
A server such as gmid may often be run in a chroot. With CGI this means that everything the CGI program needs must be present within the chroot. Not much of a problem if the CGI is a static binary (a Go blob, for example) but more of a problem if there are dynamic libraries (and did you keep them up-to-date with the security patches?) or if a late 1980s dynamic language is used, in which case there will need to be the perlphppythonrubytcl interpreter and support files, perlphppythonrubytcl libraries, any system libraries required by any of the previous, various random configuration files, a small but growing reproduction of /etc, etc. And also a database? And are you keeping all that up-to-date with security patches? Not impossible to support, but it may be a lot of work, and thus not so simple.
chroot(8) is not the only security option these days, though CGI may not mix well with other options. One problem is that the gemini server must be able to execute external programs to support CGI, so the security policy must allow that. What happens if an attacker is able to change the cgipath variable to point to some other program, maybe to something that they have written under the chroot directory that lets them run arbitrary code, escape the chroot, or do who knows what?
    execv(cgipath, cgiargs);
gmid instead does not allow fork/exec calls, which will make it more difficult for an attacker to execute some external script or binary:
    void sandbox_main_process(void)
    {
        if (pledge("stdio rpath wpath cpath inet dns sendfd", NULL) == -1)
            fatal("pledge");
    }

    void sandbox_server_process(void)
    {
        if (pledge("stdio rpath inet unix dns recvfd", NULL) == -1)
            fatal("pledge");
    }

    void sandbox_crypto_process(void)
    {
        if (pledge("stdio recvfd", NULL) == -1)
            fatal("pledge");
    }

    void sandbox_logger_process(void)
    {
        if (pledge("stdio recvfd", NULL) == -1)
            fatal("pledge");
    }
http://man.openbsd.org/man2/pledge.2
If gmid retained support for CGI, this security gain would not be possible (or there would be the complication of a cgiexec_process, and wiring up yet more code to support that), as gmid would have to be able to fork and exec specific programs (and maybe an attacker could still run some other program...), and then those scripts would also need to be locked down with pledge or similar, which generally ranges from "unlikely" to "not at all" for perlphppythonrubytcl scripts. And there are plenty of security bugs that one can write in those languages. I once had to explain to folks in a top computer science department why their code wasn't so good. They were passing user-supplied arguments through a Perl system() call, which basically means arbitrary remote code execution. That was in the late 2010s—maybe security education has gotten better in the meantime?
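The system() problem is not specific to Perl. Here is a sketch of the same mistake in C, next to the usual fix of passing the user's text as a single argument so that no shell ever parses it. The function names are made up for illustration, and the example assumes /bin/sh and /bin/echo exist.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec path with argv, return the exit status (-1 on error). */
static int
run_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);     /* only reached when execv fails */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Risky: user text is pasted into a command line that /bin/sh parses. */
int
greet_shell(const char *user)
{
    char cmd[256];
    char *argv[] = { "sh", "-c", cmd, NULL };
    int n;

    assert(user != NULL);
    n = snprintf(cmd, sizeof(cmd), "echo hello %s", user);
    if (n < 0 || (size_t)n >= sizeof(cmd))
        return -1;
    return run_wait("/bin/sh", argv);
}

/* Safer: user text is one argv entry; shell metacharacters are inert. */
int
greet_exec(const char *user)
{
    char *argv[] = { "echo", "hello", (char *)user, NULL };

    assert(user != NULL);
    return run_wait("/bin/echo", argv);
}
```

With the input "x; exit 3" the first version lets the user run their own exit command, so the status comes back as 3. The second version only prints the text and exits 0. Swap "exit 3" for something less polite and that is the remote code execution.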
gmid does have landlock support on linux, though did drop seccomp: "seccomp is not maintainable and leads to some of the most horrible code I ever had to write". Someone who knows landlock can comment on what the benefits of that are, and whether CGI would be supportable there.
CGI has various other problems that I do not know about or have forgotten about. For the web stuff I supported in the previous decade it was all fastcgi: PHP or Mojolicious. (There was also a MoinMoin instance, but that was set up by someone else, and I eventually got rid of it due to the python 2 support gap and not wanting to support too many things in production: someone else had set up a customized mediawiki with a customized postgres backend, and the web guy had his custom stuff plus some very locked down wordpress instances, and the Windows guys had a sharepoint thing, and various users were asking for gitlab and unrestricted wordpress and drupal and ...)
Simplifying things might help, especially where there are multiple things doing more or less the same thing, and where there is not the bandwidth available to support everything under the sun.
FastCGI (which is supported by gmid) or SCGI are typical alternatives to CGI. These can use a socket, which means the fastcgi process can live off in a distinct chroot and have a distinct security policy, or might even live on some other system. FastCGI (or SCGI) is a bit more work to set up than the "run this here program" of CGI, but generally is worth it.
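To give a feel for what goes over that socket, here is a sketch that builds an SCGI request. SCGI is used here only because its wire format is simpler than the binary records of FastCGI. The request is a netstring of NUL-separated headers, starting with CONTENT_LENGTH and including SCGI set to 1, followed by the body. The helper name scgi_build is hypothetical, and which variables a gemini server would actually send is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build an SCGI request with an empty body into out. vars is a flat
 * list of name, value, name, value, ... ended by NULL. Returns the
 * request length in bytes, or 0 on error. Hypothetical helper.
 */
size_t
scgi_build(char *out, size_t outlen, const char *const vars[])
{
    /* Required headers; the literals are split so "\0" stays a NUL. */
    static const char fixed[] = "CONTENT_LENGTH\0" "0\0" "SCGI\0" "1";
    char hdr[1024];
    size_t used, len, i;
    int n;

    assert(out != NULL && vars != NULL);
    memcpy(hdr, fixed, sizeof(fixed));  /* includes the final NUL */
    used = sizeof(fixed);
    for (i = 0; vars[i] != NULL; i++) {
        len = strlen(vars[i]) + 1;      /* keep the NUL separator */
        if (len > sizeof(hdr) - used)
            return 0;
        memcpy(hdr + used, vars[i], len);
        used += len;
    }
    if (i % 2 != 0)                     /* a name without a value */
        return 0;
    /* netstring framing: "<length>:<headers>," */
    n = snprintf(out, outlen, "%zu:", used);
    if (n < 0 || (size_t)n + used + 1 > outlen)
        return 0;
    memcpy(out + n, hdr, used);
    out[(size_t)n + used] = ',';
    return (size_t)n + used + 1;
}
```

The result would be written to a unix or TCP socket where the application listens, and the response read back from the same socket. The application can then be restarted, chrooted, or moved to another host without the gemini server ever having to exec anything.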
Another option would be to have a fancy gemini server that executes the necessary code within itself for any dynamic bits. This has pros and cons: a big monolith with everything in one place can have benefits, or could be a great place for an attacker to do whatever they want should they break in.
CGI itself can be run over fastcgi (slowcgi/slowcgi-portable/fcgiwrap, etc) if you do want CGI where only fastcgi is supported, or have a CGI script you need to run right now while you work on migrating it to something else.
So CGI is like skipping the insulation: cheap and easy to deploy. But when the weather gets too cold, or too hot, then maybe you should have spent the time and put in some insulation.
tags #cgi #gmid