Linux: The Derpening
I recently heard that I had not complained about Linux enough!
- A Java process that would crash, but only when it was started from cron and not from somewhere else. Of course "system validation" had been done, which meant the process had to be started from cron; launch it any other way and they'd have to re-validate the system. A kernel patch would also require re-validation! (linux + politics == derp ** N) The kernel is maybe in better shape, these days?
- The Linux OOM killer (picture of a drunk deputy with a shotgun) used to nuke sshd. Learned about that at four in the morning when sshd was killed because some cron jobs ran, needed memory, and boom! monitoring is up about sshd being down. Whoops, tee-hee, our bad, better whitelist that sshd by default. (The scientists had figured they could use "all the memory" on the system, and use it they did!)
- Folks wondered why htop crashed a lot on OpenBSD. Probably the htop devs were not checking their malloc calls (or other such slop that OpenBSD will murderize your process for) because linux is "sure, like, y'know, whatever" when it comes to allocs right up until it's paperclip unimpacting reset button time, again. Unbent paperclips were hung on all the rack doors.
- vixie-cron used to sometimes vanish from the process table. Oh, what jobs? This was back in the day when I eventually had CFEngine checking to see if cron needed to be started, and a cron job running to see if CFEngine needed to be run, and more things happening over SSH, and other things monitoring all those other things. Duct tape and baling wire, all the way down. It ran. Mostly.
- At a load of 5,000 a dualproc linux system was writing bits of output into various random file descriptors. Were the Very Important Payment Batch Files being checksummed? Nope. (It might have been a Java + Linux + C++ hairball that they had had the developers on-call for (to the point of renting a condo across the way so the on-call could pajama over) and all those developers had ended up quitting for who knows what reasons. Life is pretty strange.)
- Sysv init would sometimes randomly wedge at shutdown. And so would systemd! Paperclip time! Yay!
- gnome-shell in RedHat Enterprise Linux 7 leaked memory, hence a cron job that looked for gnome-shell processes that had grown "too large" and killed them. RHEL7 had something of a "this seems not actually ready for production?" vibe to it. (I started on RedHat somewhere back in the 1990s, so am not unfamiliar with the line.) Also, why does a terminal need a JavaScript engine?
- Also, sssd in RHEL7 would, sometimes, forget LDAP group members: shoot sssd in head, nuke cache, wash, rinse, repeat. Like an old laundry machine you need to hit with a broom now and then, but this was new Enterprise grade code, not some old clunker?
- Selinux... I spent a good many years watching the critical exploits roll in like the tide and wondering where selinux was--helping the attacker bounce off of X to gain root, in one case--while also wondering why selinux would so often hinder legitimate work. Poor defaults and too much complexity are something of a theme.
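For the curious, the gnome-shell memory babysitter mentioned above might look something like the following. This is a sketch from memory of the general idea, not the actual job: the 2 GiB threshold, the function names, and the choice of SIGTERM are all assumptions made up for illustration. It reads the Name and VmRSS fields from /proc/PID/status, which Linux reports in kB.

```python
import os
import signal

LIMIT_KIB = 2 * 1024 * 1024   # hypothetical threshold: 2 GiB resident
TARGET = "gnome-shell"        # process name as shown in /proc/PID/status


def parse_status(text):
    """Return (name, rss_kib) from the text of /proc/PID/status.

    rss_kib is None when there is no VmRSS line (kernel threads).
    """
    name, rss = None, None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss = int(value.split()[0])   # e.g. "VmRSS:   123456 kB"
    return name, rss


def too_large(text, target=TARGET, limit_kib=LIMIT_KIB):
    """True if this status text is a target process over the limit."""
    name, rss = parse_status(text)
    return name == target and rss is not None and rss > limit_kib


def bloated_pids(proc="/proc"):
    """Yield PIDs of target processes whose resident memory is over the limit."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue   # process exited while we were looking
        if too_large(text):
            yield int(entry)


def kill_bloated():
    """Send SIGTERM to every bloated target process (needs privileges)."""
    for pid in bloated_pids():
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass   # already gone
```

Call kill_bloated() from a cron job run as root. The parsing is kept apart from the killing so that the "is it too large?" decision can be checked without shooting anything.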
After reading some internet reviews, the general conclusion is that the ext4 filesystem brings some performance improvements over ext3 but it has also dangerous default options.
https://www.pointsoftware.ch/2014/02/05/linux-filesystems-part-4-ext4-vs-ext3-and-why-delayed-allocation-is-bad/
- NetworkManager for RedHat Enterprise Linux with a server system install would stop DHCP client requests after like three tries. This was too short an interval for a network switch to get itself online after the power came back. Extended outage because you gave up at networking? Can do! (I rebooted them all, maybe swore a bunch, changed the dhcp config across the fleet--no, really, keep trying! you can do it!--and set a lot of servers back to using static IPs. DHCP is cute and all, but if the DNS and whatnot are all down because NetworkManager defaults to "herp, derp, ima laptop" mode... no, I hadn't set them to use DHCP in the first place.)
- chattr +i /etc/resolv.conf will prevent a clown show of services from randomly breaking it. I ran the DNS infrastructure, so had a pretty good idea of what the contents of that file should be, as carefully set by Ansible. But yet more services were caught fiddling with it, and there were well more than 40 hours of work to do in any given week, so, hey, have an immutable bit. Next issue!
- Why were there 20+ syslog messages for each and every mailman cron job, systemd? Did you even think about the log noise, and why that might be bad to send to syslog by default? No?
- Operating systems have a maintenance cost to them, X people working Y hours to learn and produce result Z. RedHat Linux is rather more towards the Windows end of the spectrum, while OpenBSD tends towards the other. Perhaps there is too much code and too much complexity? But there is no money in flensing a fatburger... rather the opposite.
- RedHat took to putting blue on black, which is unreadable to my eyes--not just annoying like colors usually are. Their vim defaults also stray from upstream; why not put those random customizations into an optional package? Eventually (I'm not the fastest learner) I took to compiling my own vim.
- Documentation... it could be good but too often it's xkcd://979
- The severe split-brain between userland and kernel makes for "uh, well, we can't, like, ever fix that, so we'll just invent some new tool that's probably buggy and/or undocumented." Have fun!
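On the DHCP point above, the "no, really, keep trying!" policy can be sketched as a retry loop with a capped backoff. To be clear, this is not how NetworkManager or any DHCP client is configured or implemented; the function names, the 4 second base, and the 300 second cap are assumptions picked for illustration. The point is only that a server should never run out of attempts while the switch is still booting.

```python
import time


def backoff_delays(base=4.0, cap=300.0):
    """Yield retry delays in seconds: base, 2*base, 4*base, ... capped at cap, forever."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def keep_trying(attempt, delays=None, sleep=time.sleep):
    """Call attempt() until it returns something truthy; never give up.

    Returns (result, number_of_attempts).
    """
    for tries, delay in enumerate(delays or backoff_delays(), start=1):
        result = attempt()
        if result:
            return result, tries
        sleep(delay)
```

Here attempt would be whatever hypothetical function asks for a lease and returns it, or None on a timeout. The cap keeps the worst-case wait after the network returns to a few minutes rather than letting the delay double forever.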
Let's call it the "Cascade of Attention-Deficit Teenagers" model, or "CADT" for short. It hardly seems worth even having a bug system if the frequency of from-scratch rewrites always outstrips the pace of bug fixing. Why not be honest and resign yourself to the fact that version 0.8 is followed by version 0.8, which is then followed by version 0.8?
https://www.jwz.org/doc/cadt.html
Speaking of gnome, the GUI options on RHEL were... not good. A poor copy of some random Windows or a tablet (??) interface (this applies broadly to unix). I think XFCE at least was A) actually available in some RPM repo that didn't cause too many package conflicts and B) not too terrible? Didn't use that system much. Give me a FVWM or a CWM with some xterms and I'm good, or maybe iTerm.app after I've turned off the noise Apple ships with. I don't need a basket of undebuggable desktop processes with their buckets o' security vulns (to say less of spywared and adspamd) and certainly no cutesy animations so the CPU can have a li'l whirl.
Crashreporter got itself purged because the systems shat core files all day every day. Maybe it was the Very Large (and expensive!) Enterprise Modeling Software doing that? But again, there's only so many working hours in the day, all the while on-call murders your sleep away!
So yeah linux is very much in the "pay me lots of money and I would try to support it" category. It's definitely not in "useful" or "fun" or "interesting" like it was back in the 90s. Too much time spent on the L5 detention block AA-23 support service systems, bicycling through Pioneer Square at two AM to get there for the trouble... of course a sysadmin will be much exposed to the worst that operating systems have to offer, so may develop markedly different views from that of a developer or someone in marketing.
/blog/2023/02/10/unix-agitprop.gmi
https://blog.farhan.codes/2018/06/25/linux-maintains-bugs-the-real-reason-ifconfig-on-linux-is-deprecated/
tags #linux #oldsysadminyellsatcloud
P.S. I probably should learn how to drive one of these years, if you're wondering why someone would ever bicycle in certain places at certain times. But keeping my transportation budget below $100 a year has been something of a hobby for a while now.