💾 Archived View for adamthiede.com › log › 2023-10-13.gmi captured on 2024-09-29 at 00:26:18. Gemini links have been rewritten to link to archived content


SRE and SysAdmin Incentives

I read someone else's blog recently:

Maybe I in particular am just a bad candidate for oncall work, maybe this isn't actually a problem for people that aren't neurodivergent, but I really think that we have normalized waking people up in the middle of the night to follow a playbook that ChatGPT (or even a bunch of shitty if statements) can do better than any of us. I really think that the best course of action is to fundamentally change the incentives at play so that maintenance is rewarded more than new feature work, but I don't know how to do that.

and

The problem with maintenance work (and SRE [Site Reliability Engineering] in general) is that success is a negative. Things don't go wrong. People don't have issues. Under that lens of analysis, it's very easy to understand why it doesn't get people rewarded. It's also easy to understand why people sometimes deliberately design systems to fail so that they can get rewarded for fixing them.

source

This cuts deep for me. It would be awesome to work in an industry where all-nighters and back-breaking weekend shifts weren't "hero moments" but embarrassments. Fixing problems for good is awesome.

Actual Example of SRE

That story is an actual example of a reliable site. We should strive to be more like that. Simple, proven, old tech, instead of shiny new cloud BS that breaks so often that it ensures we have a job.
