💾 Archived View for thrig.me › blog › 2024 › 01 › 05 › crontab-footguns.gmi captured on 2024-05-26 at 15:12:46. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
That's right, one for each foot! Or at least one for each foot, as you may find more, and anyways I don't know how many feet you have.
*/5 * * * * date +%Y >> /tmp/year
Ignore the terrible /tmp file handling (that's what mktemp(1) and such are for) and also ignore why one might want to append the current year to a file every five minutes. Because this code is actually broken, and does not write anything to /tmp/year. What it actually does is run "date +" with "Y >> /tmp/year" as standard input, and send the output to, probably, email. These sorts of details are available in the fine manual, crontab(5) in particular, but folks may have never encountered (or have forgotten about) what "%" means in a crontab file.
A generally better approach is to avoid putting random shell code into the crontab file, and instead only list a command to run. This has the advantage of keeping various special characters out of the crontab file (unless you put "%" into filenames, which is probably a bad idea), and also gives you an external script that can be easily run and tested outside of cron. Hide them under ~/libexec or something like that if you don't want them to appear in PATH.
The other way is to escape all the "%" suitably so that "Y >> /tmp/year" is not passed as standard input to the program, but you have to remember to do that, and it can be hard to debug a shell script that's encoded up in some random crontab file instead of being in a nice file.
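As a rough sketch of both fixes: the job moves into a script, and the crontab line shrinks to a bare command. The script name ~/libexec/log-year is made up for illustration and is not from the original entry. The escaped inline form is shown in a comment for comparison.

```shell
#!/bin/sh
# ~/libexec/log-year - hypothetical name, holds what used to be inline.
#
# crontab entry with no special characters left in it:
#   */5 * * * * $HOME/libexec/log-year
# the escaped inline alternative would instead be:
#   */5 * * * * date +\%Y >> /tmp/year

# log_year FILE: append the current year to FILE
log_year() {
    date +%Y >> "$1"
}

# default to the file from the original entry; pass a path to test elsewhere
log_year "${1:-/tmp/year}"
```

The script can now be run by hand, say with a scratch file as the argument, long before cron gets involved.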
The command field (the rest of the line) is the command to be run. The entire command portion of the line, up to a newline or % character, will be executed by /bin/sh or by the shell specified in the SHELL variable of the crontab. Percent signs (`%') in the command, unless escaped with a backslash (`\'), will be changed into newline characters, and all data after the first `%' will be sent to the command as standard input.
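To poke at that rule without waiting on cron, here is a rough emulation of the "%" handling as the manual text describes it. This is not cron's actual code. The function name cron_percent is hypothetical, the placeholder byte \001 is assumed never to appear in the command field, and the trailing newline on the input plus the /dev/null standard input for "%"-free commands are assumptions about details the quoted text does not spell out.

```shell
#!/bin/sh
# cron_percent FIELD: run FIELD roughly the way crontab(5) describes.
# Text before the first unescaped % is the command; the rest becomes
# standard input, with each unescaped % turned into a newline.
cron_percent() {
    esc=$(printf '\001')
    # hide escaped \% behind the placeholder byte
    field=$(printf '%s\n' "$1" | sed "s/\\\\%/$esc/g")
    case $field in
    *%*)
        cmd=${field%%\%*}     # text before the first unescaped %
        input=${field#*\%}    # text after it
        # % becomes newline in the input, then placeholders turn back into %
        printf '%s\n' "$input" | tr '%' '\n' | tr "$esc" '%' |
            /bin/sh -c "$(printf '%s' "$cmd" | tr "$esc" '%')"
        ;;
    *)
        # no unescaped %, so nothing is fed to standard input
        /bin/sh -c "$(printf '%s' "$field" | tr "$esc" '%')" </dev/null
        ;;
    esac
}
```

Something like cron_percent 'cat % what%now' shows the two lines that end up on standard input, and cron_percent 'date +%Y >> /tmp/year' shows the original footgun: "date +" runs and /tmp/year is never touched.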
It may be useful to have a script to debug the run-time environment of something run from somewhere, be it a cron job or SSH command or whatever. And here an instance of that script is!
#!/bin/sh
# yak - diagnostic tool that saves invocation details to a temporary
# file. another option is to instead use tools like strace or sysdig
myname=`basename "$0"`
umask 077
yakout=`mktemp "/tmp/yak.$myname.XXXXXXXXXX"` || exit 1
(
    # program name, arguments, and process ID; $0 is passed as data
    # so that a "%" in the path cannot confuse printf
    printf '%s %s\n\npid=%s\n' "$0" "$*" "$$"
    id ; groups ; printf '\n'
    tty ; printf '\n'
    env ; printf '\n'
    # record standard input, unless it is a terminal
    if [ -t 0 ]; then
        printf 'istty\n'
    else
        cat
    fi
) >> "$yakout"
Then with a "*/5 * * * * /path/to/yak" one can inspect the resulting temporary files and see such things as the environment and whatnot. This may be especially useful if you lack the source code to whatever it is you are trying to work with, and have no idea how the plugins or whatever are being run.
$ crontab -l | grep yak
50 23 04 01 * yak % what does this do??
$ grep what /tmp/yak.*
what does this do??
Testing that the documentation is correct is another good thing to try out now and then.
Commands are executed by cron(8) when the minute, hour, and month fields match the current time, and when at least one of the two day fields (day-of-month or day-of-week), match the current time.
So the fields use "and", except for day-of-month and day-of-week, which instead get the "or" treatment when both are restricted (that is, neither is "*"). Tricky! This can also be hard to test for or notice, as it may take a while for a job to run, or folks may not notice soon enough that the job is running more often than one might expect. The usual solution if you need something complicated is to use a different scheduling system, or to run the script more often than needed and have the script check whether or not it needs to do any work.
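A sketch of the "script checks for itself" approach, using a made-up job that should run only on Friday the 13th. An entry such as "0 9 13 * 5" would fire on every 13th and also on every Friday, so here cron restricts only the day-of-month and the script tests the weekday. The path and the should_run name are hypothetical.

```shell
#!/bin/sh
# crontab entry (hypothetical path), restricting only day-of-month:
#   0 9 13 * * $HOME/libexec/friday13

# should_run DOW: succeed only when DOW is Friday, where DOW is in the
# form printed by "date +%u" (1 is Monday, 7 is Sunday)
should_run() {
    [ "$1" -eq 5 ]
}

if should_run "$(date +%u)"; then
    echo "it is Friday, doing the work"
fi
```

Taking the weekday as an argument means the check can be tested on any day of the week.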
P.S. An idea for a script is some code that generates a cron job that will run "soon", where "soon" is enough time for you to get and save it into a crontab. Or similar for other scheduling systems. Removing these test jobs is left as an exercise to the reader.
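One possible shape for such a script, as a sketch only. The soon_entry name and its argument order are invented here; the hour and minute are passed in rather than read from the clock so that the arithmetic can be tested.

```shell
#!/bin/sh
# soon_entry HOUR MINUTE DELAY COMMAND: print a crontab line that fires
# DELAY minutes after HOUR:MINUTE. Hypothetical helper for illustration.
soon_entry() {
    # strip one leading zero so 08 and 09 are not rejected as bad octal
    h=${1#0}
    m=${2#0}
    total=$(( ${h:-0} * 60 + ${m:-0} + $3 ))
    # crontab order is minute first, then hour; wrap past midnight
    printf '%d %d * * * %s\n' "$(( total % 60 ))" "$(( total / 60 % 24 ))" "$4"
}
```

Usage might look like soon_entry $(date '+%H %M') 2 /path/to/yak, with the output pasted into crontab -e. The day fields are left as "*", so the entry repeats daily until removed, and any "%" in the command would still need escaping.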
P.P.S. Why these features exist may require historical research, probably into what Paul Vixie was thinking back in 1987 or whenever, and however the feature originated. The "why" details are often not documented in the code or anywhere. There are some interesting comments about pipe buffers potentially filling up and then blocking the parent, which might be good to know about in other contexts.
P.P.P.S. Also do not schedule cron jobs during one of those missing or doubled hours due to the clock being randomly jerked around to "save" on daylight. Additional logic in the script to lock or not run too often or so forth might also be good, depending on how paranoid you are and how bad the consequences are if, say, a payments batch gets run too many times or not at all. I guess that's a third footgun.
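For the "lock" part, one cheap sketch: mkdir either creates the directory or fails, so a directory can stand in for a lock. The run_locked name, the lock path, and the exit status 75 (borrowed from EX_TEMPFAIL in sysexits.h) are choices made for illustration. A job that gets killed leaves the lock directory behind, so anything as touchy as a payments batch probably wants something sturdier.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]: run COMMAND only if LOCKDIR can
# be created, and remove LOCKDIR afterwards. Hypothetical helper.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir exists, skipping this run" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}
```

A cron script could then wrap its real work along the lines of run_locked "$HOME/.batch.lock" /path/to/batch, where both paths are placeholders.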