Filespooler[1] is designed around careful sequential processing of jobs. It doesn't have native support for parallel processing; those tasks may be best left to the queue managers that specialize in them. However, there are some strategies you can consider to achieve a similar effect even in Filespooler.
Because Filespooler queues are so lightweight, you can easily create dozens (or thousands, whatever). You could simply have your creator system rotate through writing new jobs to each one in turn, and then kick off queue processors for each one.
This doesn't have a great deal of elegance, but it could get the job done.
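The rotation itself can be sketched in a few lines of bash. This is only an illustration: `QUEUE_ROOT`, the queue names, and `pick_queue` are made up for the example, and the step that actually writes a job into the chosen queue is left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical location and names for the queues; each one would be
# created beforehand with `fspl queue-init -q <dir>`.
QUEUE_ROOT=${QUEUE_ROOT:-/tmp/fspl-demo}
QUEUES=("$QUEUE_ROOT/queue0" "$QUEUE_ROOT/queue1" "$QUEUE_ROOT/queue2")

# Print the queue directory that job number $1 (0, 1, 2, ...) should go to.
pick_queue() {
    local jobnum=$1
    local idx=$(( jobnum % ${#QUEUES[@]} ))
    echo "${QUEUES[idx]}"
}

# Show where the first five jobs would land.
for n in 0 1 2 3 4; do
    echo "job $n -> $(pick_queue "$n")"
done
```

With three queues, jobs 0, 1 and 2 go to queue0, queue1 and queue2, and job 3 wraps around to queue0 again.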
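Let's say you have two queue-processing processes. Here's what you can do: give them one shared incoming queue plus one private queue each. Whenever a process finds its private queue empty, it pulls work from the incoming queue into its private queue, and then it processes its private queue as usual.
Let's consider how that might work.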
First, create some queues:
    fspl queue-init -q ~/incoming
    fspl queue-init -q ~/proc1
    fspl queue-init -q ~/proc2
Now, we'll have a processing script that we'll use to move things out of the incoming queue. Call it incomingproc.sh, and let's assume it takes the path to a destination queue as its first parameter, `$1`:
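    #!/usr/bin/env bash
    set -euo pipefail

    # Hardlink the job file into the destination queue's jobs directory.
    ln "$FSPL_JOB_FULLPATH" "$1/jobs/$FSPL_JOB_FILENAME"

    # Read and discard the payload that arrives on stdin.
    cat > /dev/null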
Now, here's a script that we might use on `proc1`:
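    #!/usr/bin/env bash
    set -euo pipefail

    QPATH=~/proc1

    # If our own queue has no jobs, pull work from the incoming queue into it.
    if ! fspl queue-ls -q "$QPATH" | grep -q fspl- ; then
        fspl queue-process -q ~/incoming ~/incomingproc.sh -- "$QPATH"
    fi

    # Process our own queue, ordered by timestamp rather than sequence number.
    fspl queue-process --order-by=Timestamp -q "$QPATH" command_goes_here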
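Since the script for `proc2` would differ only in its queue path, one way to avoid keeping two copies is to make the path a parameter. The following is a sketch of that idea, not something Filespooler provides: `run_processor` is a name invented here, and it adds no coordination of its own between the processors beyond what the `fspl` commands already do.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_processor QPATH COMMAND [ARGS...]
# Same logic as the proc1 script, with the private queue as a parameter.
run_processor() {
    local qpath=$1
    shift

    # Refill the private queue from the incoming queue if it is empty.
    if ! fspl queue-ls -q "$qpath" | grep -q fspl- ; then
        fspl queue-process -q ~/incoming ~/incomingproc.sh -- "$qpath"
    fi

    # Process the private queue with the command given by the caller.
    fspl queue-process --order-by=Timestamp -q "$qpath" "$@"
}

# Example (not run here): one processor per private queue, side by side.
#   run_processor ~/proc1 command_goes_here &
#   run_processor ~/proc2 command_goes_here &
#   wait
```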
Let's analyze how this works:
1. In proc1, we first check to see if the proc1 queue is empty. If it is, we try to get a job to add to it.
2. To do that, we process the incoming queue using the incomingproc.sh script.
3. incomingproc.sh uses the environment variables that `queue-process` sets (see the Filespooler Reference[2] for details), so that processing a job in the incoming queue has the effect of adding it to the proc1 queue. It simply hardlinks the job file into proc1, which is one of the safe methods of adding a job to a queue (see Guidelines for Writing To Filespooler Queues Without Using Filespooler[3]). Then it reads and discards the payload (so that `fspl queue-process` doesn't get errors writing it). Because the script exits with success, `fspl queue-process` will (by default) go ahead and delete the job from the incoming queue, but the job will live on in proc1.
4. Now we process the target queue like normal.
3: /guidelines-for-writing-to-filespooler-queues-without-using-filespooler/
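The reason the job survives its deletion from the incoming queue is ordinary hardlink behavior, which can be tried without Filespooler at all. In this sketch the directories and the job file name are stand-ins created in a temporary directory; a real job file would be written by Filespooler.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_hardlink_move() {
    local work
    work=$(mktemp -d)
    mkdir -p "$work/incoming/jobs" "$work/proc1/jobs"

    # Stand-in for a job file sitting in the incoming queue.
    echo "payload" > "$work/incoming/jobs/fspl-example.fspl"

    # What incomingproc.sh does: give the same file a second name in proc1.
    ln "$work/incoming/jobs/fspl-example.fspl" "$work/proc1/jobs/fspl-example.fspl"

    # What happens after a successful run: the incoming name is removed.
    rm "$work/incoming/jobs/fspl-example.fspl"

    # The data is still reachable through the remaining name.
    cat "$work/proc1/jobs/fspl-example.fspl"

    rm -rf "$work"
}

demo_hardlink_move   # prints "payload"
```

Note that `ln` without `-s` only works when both names are on the same filesystem, so the incoming and processing queues need to live on one filesystem for this approach.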
Notice the use of Timestamp ordering instead of sequence ordering. Since we are pulling jobs from the incoming queue into various processors, the sequence number in any given processor will not be contiguous. That implies a lack of strict ordering of queue processing -- but then parallel processing carries that implication anyhow.
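To see why the sequence numbers end up with gaps, here is a toy simulation. It assumes, purely for illustration, six incoming jobs numbered 1 to 6 and two processors that happen to take turns pulling one job each; real pulls would depend on timing.

```shell
#!/usr/bin/env bash
set -euo pipefail

proc1=()
proc2=()

# Idealized alternation: odd-numbered jobs go to proc1, even ones to proc2.
for seq in 1 2 3 4 5 6; do
    if (( seq % 2 )); then
        proc1+=("$seq")
    else
        proc2+=("$seq")
    fi
done

echo "proc1: ${proc1[*]}"   # prints "proc1: 1 3 5"
echo "proc2: ${proc2[*]}"   # prints "proc2: 2 4 6"
```

Neither queue holds a contiguous run of sequence numbers, so neither could be processed in strict sequence order.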
(c) 2022-2024 John Goerzen