💾 Archived View for thrig.me › blog › 2023 › 07 › 25 › no-change-friday.gmi captured on 2024-07-09 at 01:07:52. Gemini links have been rewritten to link to archived content
Sysadmins (or whatever it is they are called these days) may push for no changes to production on various days, Friday for example. There is plenty else to do meanwhile. Some developers instead push for all changes all the time. This probably depends on where one is on the "change nothing, ever" to "move fast and break things" line and may also depend on who is going to be woken up at two in the morning to do triage (a sysadmin, usually). There will be variance by industry: an experimental lab may differ from a bank as to what regulations apply, when changes can be made in production, etc.
Numerous industry experts had raised concerns about the safety of the vessel. OceanGate executives, including Rush, had not sought certification for Titan, arguing that excessive safety protocols and regulations hindered innovation.
— "Titan submersible implosion". Wikipedia.
Where did No Change Friday come from? Experience. For example, late in June a Canadian bank made a change that broke something. Due to folks being unusually absent from offices in both Canada and America around that time, the issue only got around to being resolved on July 8th or so. Canada Day, Independence Day, and exactly where those fall with regard to the weekend can make for a pretty long outage. Another story involves the upgrade of the (previously working) mail servers to Slackware right before a big ski trip. This resulted in non-working mail services, and someone else trying to resolve the issue.
Exceptions can be made if there is an urgent issue. Ideally this will require approval from the groups involved and sign-off by senior management. This avoids the developers simply lobbing something over the fence to ops. The folks involved must be available for some time afterwards. Perhaps a Saturday is traded for some other day off. Issues may take time to manifest; one disaster started two weeks after the change. Afterwards, process improvements were made to more quickly surface results of a change. These sorts of improvements should be hashed out in the post-mortem meeting, which is one of the few meetings I think isn't a total waste of time: what went wrong, and how to be less bad in the future. (That the meeting was only held after something bad happened also helped.)
What should be blocked out as no change time? This will vary. Probably one should avoid changes late in the day or before the weekend or before holidays. Saturday may however be great if those involved have a Saturday through Tuesday or Wednesday schedule, and changes on the weekend make sense. Also, those involved with the change must be available afterwards: no "ship it and skip". It can be difficult to get Irish developers back from their pub after a mission accomplished.
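For those who would rather have a script say "no" than a sysadmin, the rules of thumb above could be sketched as a small gate that a deploy tool consults. Everything here is an assumption for illustration: the function name, the 15:00 cutoff, the choice of Friday, and the holiday list are made up and would need to match local schedules and regulations. The urgent flag only stands in for the approval and sign-off process; it does not replace it.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical policy knobs, not a recommendation; adjust to taste.
NO_CHANGE_WEEKDAYS = {4}          # Monday is 0, so 4 is Friday
LATEST_CHANGE_TIME = time(15, 0)  # assumed cutoff: nothing at or after 15:00
HOLIDAYS = {date(2023, 7, 1), date(2023, 7, 4)}  # Canada Day, Independence Day


def change_allowed(when: datetime, urgent_signoff: bool = False) -> bool:
    """Return True if a production change may start at `when` (local time)."""
    if urgent_signoff:
        # Exception path: assumes approval was obtained elsewhere.
        return True
    today = when.date()
    tomorrow = today + timedelta(days=1)
    if today.weekday() in NO_CHANGE_WEEKDAYS:
        return False  # no change day, e.g. right before the weekend
    if today in HOLIDAYS or tomorrow in HOLIDAYS:
        return False  # on a holiday or the day before one
    if when.time() >= LATEST_CHANGE_TIME:
        return False  # too late in the day
    return True
```

A team on a Saturday through Tuesday schedule would put a different day in NO_CHANGE_WEEKDAYS. Note that this only checks the calendar; it cannot check whether the people involved will actually be around afterwards.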