💾 Archived View for dcreager.net › redo › background-processes.gmi captured on 2024-05-12 at 14:52:01. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Background processes in redo

A recent conversation with a coworker reminded me of a ‘make’ replacement named ‘redo’. On a lark I decided to update a couple of my projects to use ‘redo’ — in particular, the repo that builds my personal website from a collection of gemtext files.

redo

While working on new posts, I often like to view them locally before syncing them up to the public-facing web server. Because I use vanity paths and ‘index.html’ files liberally, I can't just load the HTML files directly via ‘file:’ URLs. Instead I spin up a lightweight HTTP server (lighttpd) to serve files from the local output directory containing the built site content. And crucially, I have an additional ‘make serve’ target that spins up the server in the background for me, so that I don't have to remember the magic ‘lighttpd’ incantation to use. This takes advantage of how lighttpd will daemonize itself as a background process by default: the make target's rule invokes ‘lighttpd’, which spins up a background process, but make does not wait for that background process to exit before make itself exits.

lighttpd

When porting that make target over to redo, I discovered that redo handles background jobs differently, in a way that required some additional tweaking. Make is presumably waiting for the _shell process_ that it invokes to exit, but does not wait for any background child processes to exit. (I haven't verified this, but I assume it actually waits for the shell process's entire _process group_ to exit, but lighttpd's daemonization logic would remove the HTTP server process from its parent's process group.)
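
To make the "wait for the shell, not its background children" behavior concrete, here is a minimal Python sketch. It is not make's actual implementation. It only shows that waiting on a direct child returns as soon as that child exits, even if a background process it started is still running. The ‘sleep 1 &’ command stands in for a self-daemonizing server, and the standard streams are pointed at /dev/null so that nothing else ends up waiting on them.

```python
import subprocess
import time

start = time.monotonic()

# subprocess.run() waits only for the direct child (the shell).
# The shell starts `sleep` in the background and exits immediately.
result = subprocess.run(
    ["sh", "-c", "sleep 1 &"],
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Time spent waiting for the shell; the background sleep is still running.
elapsed = time.monotonic() - start
```

Here ‘elapsed’ should come out well under the one second that the background ‘sleep’ runs for, because nothing waited on it.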

Redo wasn't doing this—invoking ‘redo serve’ would fire off the background HTTP server as expected, but then redo would block waiting for the server to complete. I was very confused how this could be happening!

Redo seems to do something more complicated than make. It does call ‘waitpid’ for each target's subprocess (but not its process group), but only uses that to clean up the child process entry in its process table. That's not the mechanism that it uses to _wait_ for the process to finish. Instead, redo creates a pipe for each job subprocess that it creates, and uses a ‘select’ call to wait for the read end of any subprocess pipe to become readable.

redo job control file descriptors

redo waiting for a target process

Those pipes are never written into, and so they only become “readable” when the write file descriptor is closed. This is similar to a common shutdown notification pattern in Go using channels.
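
A small Python sketch of that "readable on close" behavior, using a bare pipe rather than redo's own code. The helper name ‘is_readable’ is made up for this illustration.

```python
import os
import select


def is_readable(fd, timeout=0.1):
    """Return True if select() reports fd as readable within `timeout` seconds."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return fd in ready


read_fd, write_fd = os.pipe()

# Nothing has been written and the write end is still open,
# so select() times out without reporting the read end.
before_close = is_readable(read_fd)

# Closing the only copy of the write end puts the pipe at end-of-file,
# which select() reports as "readable".
os.close(write_fd)
after_close = is_readable(read_fd)

# A read at end-of-file returns an empty bytes object instead of blocking.
data = os.read(read_fd, 1)
os.close(read_fd)
```

The read end only shows up as ready after the write end is closed, and the read that follows returns no data. That empty read is the "this job is done" signal.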

Starting and stopping things with a signal channel

However, Unix pipes add a wrinkle, because file descriptors are inherited by child processes when you ‘fork’. You have to close _all_ copies of the pipe's write file descriptor before the ‘select’ call unblocks anything waiting on the read file descriptor.

And that file descriptor inheritance even carries over to the HTTP server's background process! The server inherits a copy of the write file descriptor, and since ‘lighttpd’ itself doesn't know anything about it, it never closes it explicitly—it gets closed by default when the process exits, just like all file descriptors do. And since redo's ‘select’ call won't unblock until every copy of the write fd is closed, redo ends up waiting for the server process to exit.
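
The same effect can be reproduced without redo or lighttpd. In this Python sketch (an illustration under my own assumptions, not redo's code), the first ‘fork’ stands in for the job's shell, the second for the daemonized server, and ‘run_job_with_daemon’ and ‘wait_for_eof’ are hypothetical names.

```python
import os
import select
import time


def wait_for_eof(fd, timeout):
    """Return True if fd becomes readable (here: hits EOF) within `timeout` seconds."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return fd in ready


def run_job_with_daemon(daemon_lifetime):
    read_fd, write_fd = os.pipe()
    job_pid = os.fork()
    if job_pid == 0:
        # Job process: stands in for the shell running the .do script.
        os.close(read_fd)
        if os.fork() == 0:
            # Background process: stands in for the daemonized server.
            # It inherited write_fd and never closes it explicitly.
            time.sleep(daemon_lifetime)
            os._exit(0)
        # The job exits right away, which closes its own copy of write_fd.
        os._exit(0)

    # Parent: stands in for the build tool waiting on the job's pipe.
    os.close(write_fd)
    os.waitpid(job_pid, 0)  # the job process itself is already gone

    # The background process still holds a copy of the write end, so no EOF yet.
    eof_right_after_job = wait_for_eof(read_fd, 0.2)
    # EOF only arrives once the background process exits.
    eof_after_daemon = wait_for_eof(read_fd, daemon_lifetime + 2)
    os.close(read_fd)
    return eof_right_after_job, eof_after_daemon
```

Calling ‘run_job_with_daemon(1.0)’ should report no end-of-file right after the job exits, and end-of-file only once the background process has finished its one-second sleep.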

To get around this, I had to update the ‘serve.do’ job file to create a subshell where I close all file descriptors other than stdin, stdout, and stderr; and then invoke ‘lighttpd’ from that subshell. (I can't close the file descriptors in the parent shell, since I do still want redo to wait for the parent shell to finish!) That ensures that the background process does not inherit a copy of the write file descriptor, and therefore that redo will not block waiting for it to exit.
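
Here is a Python analogue of that fix, as a sketch rather than the actual ‘serve.do’ script. The background process closes every descriptor above stderr before it settles in. The upper bound of 1024 is an arbitrary assumption for the illustration, and the function names are hypothetical.

```python
import os
import select
import time

# Assumed upper bound on inherited descriptor numbers (illustrative only).
MAX_FD = 1024


def wait_for_eof(fd, timeout):
    """Return True if fd becomes readable (here: hits EOF) within `timeout` seconds."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return fd in ready


def run_job_with_detached_daemon(daemon_lifetime):
    read_fd, write_fd = os.pipe()
    job_pid = os.fork()
    if job_pid == 0:
        # Job process: stands in for the shell running the .do script.
        os.close(read_fd)
        if os.fork() == 0:
            # Background process: drop everything except stdin, stdout and
            # stderr before doing any long-running work.
            os.closerange(3, MAX_FD)
            time.sleep(daemon_lifetime)
            os._exit(0)
        os._exit(0)

    # Parent: stands in for the build tool waiting on the job's pipe.
    os.close(write_fd)
    os.waitpid(job_pid, 0)

    # No copy of the write end survives, so EOF arrives without waiting
    # for the background process to exit.
    eof_before_daemon_exits = wait_for_eof(read_fd, 0.5)
    os.close(read_fd)
    return eof_before_daemon_exits
```

With a one-second background process, ‘run_job_with_detached_daemon(1.0)’ should see end-of-file within the half-second timeout, which means the waiting side no longer depends on the background process's lifetime.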

Closing file descriptors in bash

Invoke lighttpd from a subshell