It seems that lately I've written several shell implementations of a simple queue that enforces ordered execution of jobs that may arrive out of order. After writing this for the nth time in bash, I decided it was time to do it properly. But first, a word on the *why* of it all.
My needs arose primarily from handling Backups[1] over Asynchronous Communication[2] methods - in this case, NNCP[3]. When backups contain incrementals that are unpacked on the destination, they must be applied in the correct order.
2: /asynchronous-communication/
In some cases, like ZFS[4], the receiving side will detect an out-of-order backup file and exit with an error. In those cases, processing in random order is acceptable but can be slow if, say, hundreds or thousands of hourly backups have stacked up over a period of time. The same goes for using gitsync-nncp[5] to synchronize git repositories. In both cases, a best effort based on creation date is sufficient to produce a significant performance improvement.
In other cases, such as tar or dar backups, the receiving side cannot detect out-of-order incrementals. In those situations, the incrementals absolutely must be applied in strict order. Many other situations have the same need. Filespooler[6] is the answer to these.
Before writing my own program, I of course looked at what was out there already. I looked at celery, gearman, nq, rq, cctools work queue, ts/tsp (task spooler), filequeue, dramatiq, GNU parallel, and so forth.
Unfortunately, none of these met my needs at all. They all tended to have properties like:
Many also lacked some nice-to-haves that I implemented for Filespooler:
Filespooler[9] is a tool in the Unix tradition: that is, do one thing well, and integrate nicely with other tools using the fundamental Unix building blocks of files and pipes. Filespooler itself doesn't provide transport for jobs, but instead is designed to cooperate extremely easily with transports that can be written to as a filesystem or piped to -- which is to say, almost anything of interest.
Filespooler is written in Rust and has an extensive Filespooler Reference[10] as well as many tutorials on its homepage[11]. To give you a few examples, here are some links:
12: /using-filespooler-over-syncthing/
13: /using-filespooler-over-nncp/
14: /compressing-filespooler-jobs/
15: /encrypting-filespooler-jobs-with-gpg/
16: /encrypting-filespooler-jobs-with-age/
17: /guidelines-for-writing-to-filespooler-queues-without-using-filespooler/
Filespooler is intentionally simple:
The names of job files on disk match a pattern for identification, but beyond that pattern the filename is not significant; only the header matters.
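To illustrate what identification by pattern means in practice, here is a minimal shell sketch that lists the files in a directory whose names match the `fspl-*.fspl` job pattern and ignores everything else (temporary files, partial syncs, stray notes). The helper `list_job_files` is hypothetical and not part of Filespooler; on a real queue, `fspl queue-ls` is the tool to use. The usage line assumes jobs live in the queue's `jobs/` subdirectory, as the `FSPL_JOB_FULLPATH` value printed by `fspl queue-info` suggests.

```sh
# Hypothetical helper, not part of Filespooler: print the names of files in
# directory "$1" that match the job-file pattern fspl-*.fspl.
# Runs in a subshell ( ... ) so the cd does not affect the caller.
list_job_files() (
    cd "$1" || exit 1
    for f in fspl-*.fspl; do
        # With no matches the glob stays literal; skip that case.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done
)

# Usage (path assumed):
#   list_job_files ~/sync/b64queue/jobs
```

Nothing about the middle of the name is interpreted here, which mirrors the point above: the name only marks a file as a job, and the job's metadata lives in its header.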
You can send job data in three ways:
1. By piping it to `fspl prepare`
2. By setting certain environment variables when calling `fspl prepare`
3. By passing additional command-line arguments to `fspl prepare`, which can optionally be passed to the processing command at the receiver.
Data piped in is added to the job "payload", while environment variables and command-line parameters are encoded in the header.
Here I will excerpt part of the Using Filespooler over Syncthing[18] tutorial; consult it for further detail. As a bit of background, Syncthing[19] is a FLOSS decentralized directory synchronization tool akin to Dropbox (but with a much richer feature set in many ways).
18: /using-filespooler-over-syncthing/
First, on the receiver, you create the queue (passing the directory name to `-q`):
```
receiver$ fspl queue-init -q ~/sync/b64queue
```
Now, we can send a job like this:
```
sender$ echo Hi | fspl prepare -s ~/b64seq -i - | fspl queue-write -q ~/sync/b64queue
```
Let's break that down:
* `-s seqfile` gives the path to a *sequence file* used on the sender side. This file contains a single number: a counter that is incremented for every generated job file, so each job gets a unique sequence number. It is matched against the `nextseq` file within the queue to make sure that the receiver processes jobs in the correct order. It MUST be separate from the `nextseq` file in the queue and should NOT be placed within the queue. There is no need to sync this file, and ideally it should not be synced.
* The `-i` option tells `fspl prepare` to read a file for the packet payload. `-i -` tells it to read stdin for this purpose. So, the payload will consist of three bytes: "Hi\n" (that is, including the terminating newline that `echo` wrote).
* `fspl queue-write` reads stdin and writes it to a file in the queue directory in a safe manner. The file will ultimately match the `fspl-*.fspl` pattern and have a random string in the middle.
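The two sender-side ideas in this breakdown, a counter file kept outside the queue and a "safe" write into the queue directory, can be sketched in plain shell. This is not Filespooler's implementation. It assumes that "safe" means writing under a temporary name that does not match `fspl-*.fspl` and then renaming the file into place, so a transport such as Syncthing never sees a half-written job. The names `next_seq` and `safe_write` are hypothetical, the data written is not a real Filespooler job (it has no header), and there is no locking, so a single sender process is assumed. For writing real jobs without `fspl`, follow the guidelines page linked above instead.

```sh
# Hypothetical: read the counter stored in "$1" (0 if the file is absent),
# print counter+1, and store the new value. Keep this file OUTSIDE the queue.
next_seq() (
    n=0
    [ -f "$1" ] && n=$(cat "$1")
    n=$((n + 1))
    # Write via a temp file + rename so the counter is never half-written.
    printf '%s\n' "$n" > "$1.new" && mv "$1.new" "$1" && printf '%s\n' "$n"
)

# Hypothetical: copy stdin into directory "$1" under a temporary name that
# does NOT match fspl-*.fspl, then rename it so it does. rename within one
# filesystem is atomic, so readers see either no job file or a complete one.
safe_write() (
    tmp=$(mktemp "$1/.tmp-XXXXXXXX") || exit 1
    if cat > "$tmp"; then
        # Reuse mktemp's random suffix as the "random string in the middle".
        mv "$tmp" "$1/fspl-${tmp##*/.tmp-}.fspl"
    else
        rm -f "$tmp"
        exit 1
    fi
)
```

The temporary file is created in the target directory itself because a rename is only atomic when source and destination are on the same filesystem.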
At this point, wait a few seconds (or however long it takes) for the queue files to be synced over to the recipient.
On the receiver, we can see if any jobs have arrived yet:
```
receiver$ fspl queue-ls -q ~/sync/b64queue
ID   creation timestamp          filename
1    2022-05-16T20:29:32-05:00   fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
```
Let's say we'd like some information about the job. Try this:
```
receiver$ fspl queue-info -q ~/sync/b64queue -j 1
FSPL_SEQ=1
FSPL_CTIME_SECS=1652940172
FSPL_CTIME_NANOS=94106744
FSPL_CTIME_RFC3339_UTC=2022-05-17T01:29:32Z
FSPL_CTIME_RFC3339_LOCAL=2022-05-16T20:29:32-05:00
FSPL_JOB_FILENAME=fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
FSPL_JOB_QUEUEDIR=/home/jgoerzen/sync/b64queue
FSPL_JOB_FULLPATH=/home/jgoerzen/sync/b64queue/jobs/fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
```
This information is intentionally emitted in a format convenient for parsing.
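Because every line has the form `NAME=value`, the output can be consumed with ordinary shell tools. Below is a small sketch that extracts a single field; `get_field` is a hypothetical helper, and it assumes values never contain newlines (true of the sample output above).

```sh
# Hypothetical helper: read NAME=value lines on stdin and print the value of
# the variable named "$1". Only the first '=' separates name from value, so
# values that themselves contain '=' are kept intact. Returns 1 if not found.
get_field() {
    while IFS= read -r line; do
        case $line in
            "$1="*) printf '%s\n' "${line#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Usage:
#   seq=$(fspl queue-info -q ~/sync/b64queue -j 1 | get_field FSPL_SEQ)
```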
Now let's run the job!
```
receiver$ fspl queue-process -q ~/sync/b64queue --allow-job-params base64
SGkK
```
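There are two new parameters here: `--allow-job-params`, and `base64`, which is the command to run. `fspl queue-process` runs `base64` once for each job, in sequence order, and each time the command will: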
* Have environment variables set as we just saw in `queue-info`
* Have the text we previously prepared - "Hi\n" - piped to it
By default, `fspl queue-process` doesn't do anything special with the command's output; see Handling Filespooler Command Output[20] for other options. So `base64` simply printed the base64-encoded version of our string, "SGkK", to the terminal. We successfully sent a packet using Syncthing as a transport mechanism!
20: /handling-filespooler-command-output/
At this point, if you do a `fspl queue-ls` again, you'll see the queue is empty. By default, `fspl queue-process` deletes jobs that have been successfully processed.
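The ordering guarantee behind all of this can be illustrated with a toy model of the receiver loop. This is a hedged sketch, not how `fspl queue-process` is implemented. To stay short, it assumes each job is a plain file named `<seq>.job` holding only the payload (real jobs carry a header and use `fspl-*.fspl` names), and that the queue keeps the next expected number in a `nextseq` file. It runs the command only for the job numbered `nextseq`, passing the payload on stdin and setting `FSPL_SEQ` (mirroring one of the variables shown by `fspl queue-info`). On success it advances the counter and deletes the job; it stops as soon as the next job is missing or the command fails, so a later job that arrives early simply waits. The function `toy_process` and the `<seq>.job` naming are hypothetical.

```sh
# Toy model (hypothetical, not Filespooler's code). Assumed queue layout:
#   $1/nextseq     next sequence number to run (created as 1 if missing)
#   $1/<seq>.job   payload for job number <seq>
# Remaining arguments are the command to run for each job.
toy_process() (
    q=$1; shift
    [ -f "$q/nextseq" ] || echo 1 > "$q/nextseq"
    while :; do
        n=$(cat "$q/nextseq")
        job="$q/$n.job"
        # Next job in sequence hasn't arrived: stop, even if
        # later-numbered jobs are already present.
        [ -f "$job" ] || exit 0
        # Payload on stdin, FSPL_SEQ in the environment; stop on failure
        # and leave the job in place for a retry.
        FSPL_SEQ=$n "$@" < "$job" || exit 1
        # Advance the counter before deleting the job, so an interruption
        # between the two steps cannot leave the loop waiting for a job
        # that is already gone.
        echo $((n + 1)) > "$q/nextseq"
        rm -f "$job"
    done
)
```

For example, if job 2 arrives before job 1, nothing runs until job 1 shows up, and then both run in order:

```sh
q=$(mktemp -d)
printf 'second\n' > "$q/2.job"
toy_process "$q" base64    # prints nothing: job 1 is missing
printf 'Hi\n' > "$q/1.job"
toy_process "$q" base64    # prints SGkK, then c2Vjb25kCg==
```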
See the Filespooler homepage[21].
(c) 2022-2024 John Goerzen