Distributed as in SLURM? Or distributed as in some P2P scheduler?

IMO nothing beats a centralised service, just because of how easy it is to manage. Yes, it runs hot and is always overloaded, but you don't have to deal with the nightmare of streaming resources over a spotty network, or running from machine to machine trying to figure out why a job didn't take.

~madqubit wrote:

Much like SLURM. We use Broadcom (CA) workload directors: one server acts as the scheduler, and all of the workloads get pushed from it to the other servers. Mainly Oracle’s suite of computation, Informatica, Dataserv, etc. We have a lot of internal data processing as well, but they’re really trying to get off of it and shoehorn it into something else. It’s quite sad, really. The in-house tools are so reliable and beautifully done on the backend. The front end… well, it was definitely made by a backend dev. It isn’t really pretty to look at, but dang it, everything is there, and where everything lives makes logical sense.

Spotty networks are the bane of my existence. Oh? This super important job failed because of a ping spike? Let me call an analyst at 3 AM and deal with their grumpiness, all because a reroute pushed latency up for a few seconds.

The only time something fails on the mainframe is when the analyst broke the code or forgot to update the JCL, or when a resource like IMS is down for maintenance and they forgot to hold the job.