2023-06-29
tef has written another great post about pipelines, and in particular why (and how!) not to use message queues to implement them:
how (not) to write a pipeline [tef, cohost]
The tl;dr is that your “background job” is really a state machine. Even the simplest possible job:
def do_something(inputs) -> outputs:
    # do something
    pass
Is a state machine with 4 possible states:
And a simple pipeline based on a message broker almost certainly does not support:
So bite the bullet and implement this using a proper database table (or equivalent) to track the state of each job!
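To make that concrete, here is a minimal sketch of such a table using Python's built-in sqlite3 module. Everything specific in it is an assumption made for illustration: the table layout, the helper names (open_db, submit, claim, finish, state_of), and the four state names, which are placeholders and not necessarily the states the post has in mind.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) jobs table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " inputs TEXT NOT NULL,"
        " outputs TEXT,"
        # Placeholder state names, assumed for this sketch.
        " state TEXT NOT NULL DEFAULT 'pending'"
        " CHECK (state IN ('pending', 'running', 'succeeded', 'failed')))"
    )
    return db


def submit(db, inputs):
    """Record a new job in the 'pending' state and return its id."""
    with db:  # commits on success, rolls back on error
        cur = db.execute("INSERT INTO jobs (inputs) VALUES (?)", (inputs,))
    return cur.lastrowid


def claim(db):
    """Move one pending job to 'running'; return (id, inputs) or None."""
    with db:
        row = db.execute(
            "SELECT id, inputs FROM jobs"
            " WHERE state = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The state guard means a job that was claimed in the meantime
        # is not claimed a second time.
        updated = db.execute(
            "UPDATE jobs SET state = 'running'"
            " WHERE id = ? AND state = 'pending'",
            (row[0],),
        )
        if updated.rowcount != 1:
            return None
    return row


def finish(db, job_id, outputs=None, ok=True):
    """Record the outcome of a running job."""
    with db:
        db.execute(
            "UPDATE jobs SET state = ?, outputs = ?"
            " WHERE id = ? AND state = 'running'",
            ("succeeded" if ok else "failed", outputs, job_id),
        )


def state_of(db, job_id):
    """Look up the current state of a job."""
    row = db.execute(
        "SELECT state FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return row[0]
```

The point of the sketch is that every transition is an explicit, guarded UPDATE on a row that outlives the worker, so "what state is job 17 in?" is a query and not a guess about what is sitting in a broker.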
You might still end up with a queue, but as an optimization, not as a load-bearing part of the design:
the queue buffers the results of a more expensive database query
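One way to read "queue as an optimization" is sketched below: an in-memory buffer holds a batch of pending job ids fetched from the table, and every hand-out is still confirmed against the table. The schema, the state names, and the BufferedJobSource class are hypothetical names chosen for this example; it is a sketch under those assumptions, not tef's design.

```python
import sqlite3
from collections import deque


def open_db(path=":memory:"):
    """Create a minimal (hypothetical) jobs table for the example."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " state TEXT NOT NULL DEFAULT 'pending')"
    )
    return db


class BufferedJobSource:
    """Hands out pending job ids, refilling from the table in batches."""

    def __init__(self, db, batch_size=100):
        self.db = db
        self.batch_size = batch_size
        self.buffer = deque()  # only a cache; the table is the truth

    def _refill(self):
        # Stand-in for the "more expensive database query".
        rows = self.db.execute(
            "SELECT id FROM jobs WHERE state = 'pending'"
            " ORDER BY id LIMIT ?",
            (self.batch_size,),
        ).fetchall()
        self.buffer.extend(job_id for (job_id,) in rows)

    def next_job(self):
        """Return the id of a job now marked 'running', or None."""
        while True:
            if not self.buffer:
                self._refill()
                if not self.buffer:
                    return None
            job_id = self.buffer.popleft()
            # The buffer may be stale, so the table decides whether
            # this job is still available.
            with self.db:
                cur = self.db.execute(
                    "UPDATE jobs SET state = 'running'"
                    " WHERE id = ? AND state = 'pending'",
                    (job_id,),
                )
            if cur.rowcount == 1:
                return job_id
```

If the process dies and the buffer is lost, nothing is lost with it: the next refill rebuilds it from the table, which is what makes the queue an optimization and not a load-bearing part.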