Filespooler[1] has a powerful concept called a *decoder*. A decoder is a command that any Filespooler command reading a queue uses to decode the files within that queue. The concept is generic: it can support compression, encryption, cryptographic authentication, and so forth.
Here I will introduce it as a concept for supporting compression with gzip. This page also functions as a tutorial for encoders and decoders. If you aren't already familiar with Filespooler, you should probably read the tutorial at Using Filespooler over Syncthing[2] before proceeding.
2: /using-filespooler-over-syncthing/
These are some useful Filespooler properties that will play out as we work through this discussion:
1. `fspl queue-write` does not inspect the data stream in any way, and doesn't care what's in it.
2. `fspl prepare` dumps its packet to stdout with the expectation that it is piped to some other command.
3. Because of 1 and 2, you can insert something in the pipeline between `prepare` and `queue-write`.
4. All commands that process a Filespooler queue accept a `-d DECODECMD` parameter that lets you give a command to decode packets. This decode command would typically undo whatever the commands you inserted into the pipeline in step 3 did.
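Property 4 amounts to requiring that the decoder be the inverse of whatever was inserted in step 3. A minimal sketch of that round trip, using plain bytes as a stand-in for the output of `fspl prepare` (the variable names are illustrative, and no Filespooler command is run here):

```shell
# Stand-in for the bytes that `fspl prepare` would write to stdout.
packet='example packet bytes'

# Encoder (what you would insert before `queue-write`), then decoder
# (what you would pass as -d DECODECMD).
roundtrip=$(printf '%s' "$packet" | gzip | zcat)

if [ "$roundtrip" = "$packet" ]; then
    echo "decoder undoes encoder"
fi
```

Any encoder and decoder pair that passes this kind of check should be usable in the same two positions.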
We're going to mimic some of the examples in the Syncthing tutorial, this time with compression.
First, we create a queue, just as we did there:
sender$ fspl queue-init -q ~/sync/gzqueue
Now, we'll add a request:
sender$ echo Hi | fspl prepare -s ~/gzseq -i - | gzip | fspl queue-write -q ~/sync/gzqueue
This is the same command as before, just with the addition of `gzip` in the pipeline. The difference is that now the file in the jobs directory is compressed with gzip. Let's take a look:
receiver$ fspl queue-ls -d zcat -q ~/sync/gzqueue
ID                   creation timestamp          filename
1                    2022-05-16T20:29:32-05:00   fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
Ah ha, there it is. We can get info about it too:
receiver$ fspl queue-info -d zcat -q ~/sync/gzqueue -j 1
FSPL_SEQ=1
FSPL_CTIME_SECS=1652940172
FSPL_CTIME_NANOS=94106744
FSPL_CTIME_RFC3339_UTC=2022-05-17T01:29:32Z
FSPL_CTIME_RFC3339_LOCAL=2022-05-16T20:29:32-05:00
FSPL_JOB_FILENAME=fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
FSPL_JOB_QUEUEDIR=/home/jgoerzen/sync/gzqueue
FSPL_JOB_FULLPATH=/home/jgoerzen/sync/gzqueue/jobs/fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
Let's take a look at what's happening under the hood when we run one of these commands:
receiver$ fspl --log-level trace queue-ls -d zcat -q ~/sync/gzqueue
TRACE fspl: Parsed options are Cli { globalopts: GlobalOpts { log_level: Level(Trace) }, command: QueueLs(QueueOptsWithDecoder { qopts: QueueOpts { queuedir: "/home/jgoerzen/sync/gzqueue" }, decoder: Some("zcat") }) }
DEBUG filespooler::jobqueue: Reading header from "/home/jgoerzen/sync/gzqueue/jobs/fspl-30b1a4f2-da30-4722-b22a-fd6e1d8aea36.fspl"
DEBUG with_decoder{decoder="zcat"}: filespooler::jobqueue: Preparing to invoke decoder: "/bin/bash" ["-c", "zcat"]
DEBUG with_decoder{decoder="zcat"}: filespooler::jobqueue: Decoder PID 4037302 started successfully
TRACE filespooler::jobqueue: Killing decoder
TRACE filespooler::jobqueue: Waiting for decoder to terminate
TRACE filespooler::jobqueue: Decoder termination status Ok(ExitStatus(ExitStatus(0)))
ID                   creation timestamp          filename
1                    2022-05-18T07:54:02-05:00   fspl-30b1a4f2-da30-4722-b22a-fd6e1d8aea36.fspl
Note that the decoder, unlike the command given to `fspl queue-process`, is interpreted by the shell, so you can actually set up a decoder pipeline. Filespooler invoked zcat and piped the content of the packet to it. In this case, it only needed to read the header, so once it had read the header, it killed the decoder to keep it from needlessly spending cycles on a large payload.
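Because the decoder string is handed to `bash -c`, as the trace shows, it may itself contain a pipe. The sketch below imitates that invocation outside Filespooler. The extra `base64` stage is purely hypothetical, chosen only to give the pipeline two steps; it is not something Filespooler requires, and `base64 -d` assumes the GNU coreutils flag.

```shell
# Hypothetical two-stage decoder; with Filespooler it would be passed as -d "$decoder".
decoder='base64 -d | zcat'

# Encode in the reverse order: gzip first, then base64.
encoded=$(printf 'Hi\n' | gzip | base64)

# Imitate the invocation seen in the trace ("/bin/bash" ["-c", DECODER]),
# with the encoded bytes supplied on stdin.
decoded=$(printf '%s\n' "$encoded" | bash -c "$decoder")

echo "$decoded"
```

The stages of the decoder run in the opposite order from the stages used when encoding, which is the main thing to get right when chaining more than one.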
If you had multiple files in the queue, you'd see Filespooler invoke zcat for each one, in precisely this manner, since `queue-ls` needs to read the header from each.
If you forget to include `-d` on a command line, Filespooler will act as if the file doesn't exist. This does not cause an error exit; generally people don't want the mere presence of invalid data to prevent the proper working of the queue. However, with debugging turned on, you can see what happens:
receiver$ fspl --log-level debug queue-ls -q ~/sync/gzqueue
DEBUG filespooler::jobqueue: Reading header from "/home/jgoerzen/sync/gzqueue/jobs/fspl-30b1a4f2-da30-4722-b22a-fd6e1d8aea36.fspl"
DEBUG filespooler::jobfile: Error reading FSPrefix: Input doesn't appear to be a filespooler file
ID                   creation timestamp          filename
Technically, what happens is that Filespooler attempts to read the first few bytes of the file and detects that they don't contain a Filespooler header (of course; the file has a gzip header!). So it skips processing the rest of the file.
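To see why the header check fails, look at how gzip output begins. Every gzip stream starts with the two magic bytes `1f 8b`, and that is what Filespooler finds where it expects its own header. A small sketch (the variable name is illustrative):

```shell
# Print the first two bytes of a gzip stream as hex.
magic=$(printf 'Hi\n' | gzip | head -c 2 | od -An -tx1 | tr -d ' \n')
echo "$magic"   # gzip magic number: 1f8b
```

The same check run against a file in the jobs directory is a quick way to tell whether a job was written through `gzip`.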
Every queue operation works exactly like normal - you just have to always supply the `-d`. `fspl queue-process -d zcat -q queuedir` will process a queue, and so forth.
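Since `-d` has to accompany every queue command, it may be convenient to wrap it. The function below is a hypothetical helper, not part of Filespooler, and `gzfspl` is a made-up name. It places `-d zcat` directly after the subcommand, matching the argument order used in the examples on this page.

```shell
# Hypothetical helper (not part of Filespooler): run a queue subcommand
# with the gzip decoder always supplied.
gzfspl() {
    subcmd=$1
    shift
    fspl "$subcmd" -d zcat "$@"
}
```

With this defined, `gzfspl queue-ls -q ~/sync/gzqueue` would expand to `fspl queue-ls -d zcat -q ~/sync/gzqueue`. Global options such as `--log-level` go before the subcommand, so this simple wrapper does not handle them.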
Commands such as `fspl stdin-info` read a packet from stdin. They don't have a `-d` option because you can just as well pipe the decoded data to them. For instance:
$ cat queuefile | zcat | fspl stdin-info
(c) 2022-2024 John Goerzen