Alright. One small gripe about my job.
We are attempting to transition our batch processing and Event Stream Processing from a centralized system (Mainframe) to a Distributed computing system.
There are many flaws with the ideologies behind both, and there are valid points on both sides of this argument. The big dollar guys can’t seem to stick to their preferred side, though. The IBM subscription fees are enormous. The cost to upgrade the Distributed servers is huge every five years. In reality it’s all capex for the company, so they can write it all off. The infuriating part for me is that half of the jobs for our operations are scheduled on the mainframe’s ESP to run in the Distributed network.
So if a job fails, there’s no way for the operators to get a log of any sort, since it’s all generated “in the cloud” and only the analysts have access to those logs. But half of the time the analysts are asking us for logs, and we as operators can’t give them any. Now if we chucked those jobs onto the Distributed ESP (yes, we have two separate ESP schedulers. I know, gross), I could barf out a spool file without issue.
TL;DR
If you’re a big dollar manager for a company that is in the same boat:
The operators and analysts don’t care which one. Just pick a platform.
Distributed as in SLURM? Or distributed as in some P2P scheduler?
IMO nothing beats a centralised service, just because of how easy it is to manage. Yes, it runs hot and is always overloaded, but you don't have to deal with the nightmare of streaming resources over a spotty network or running from machine to machine trying to figure out why a job didn't take.