Archived view of thrig.me › blog › 2024 › 10 › 20 › killnine.gmi, captured on 2024-12-17 at 10:42:20.
The KILL signal is quite possibly the worst signal to send to a shell script, as this may result in a dangling tree of processes. Shell scripts often fork things, so there might be something like (as reported by pstree(1))
| |-+= 16599 jhqdoe -ksh (ksh)
| | \-+= 39459 jhqdoe sh tree
| |   \-+- 07502 jhqdoe sh tree
| |     \--- 75513 jhqdoe sleep 99
in the process tree when the KILL signal (-9) arrives at the top controlling process, 39459, and then afterwards there may (depending on exactly what those child processes are doing) be a dangling process tree:
|-+- 07502 jhqdoe sh tree
| \--- 75513 jhqdoe sleep 99
Note how 7502 has been re-parented on account of 39459 being murdered with KILL; exactly what happens may vary depending on the flavor of unix and how many knobs that unix has.
The KILL signal is thus best reserved as a last resort for when a process is not responding to other, more typical signals (TERM, INT, HUP). The exception is something known to be buggy, such as a too-complicated web browser that often gets itself tied up in some mess; exceptional software like that can get a documentation note of "buggy, okay to murder with -9".
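One way to keep KILL as the last resort is to wrap the escalation in a small helper. This is only a sketch: the name stop_politely and the default five second grace period are made up for illustration, and a real script may want HUP or INT before TERM.

```sh
#!/bin/sh
# stop_politely PID [GRACE]: hypothetical helper, not from the original post.
# Sends TERM first and only falls back to KILL if the process is still
# around after GRACE seconds (default 5, an arbitrary choice).
stop_politely() {
    pid=$1
    grace=${2:-5}

    # kill -0 sends no signal; it only checks that the process can be
    # signalled. Note that an unreaped zombie still passes this check.
    kill -0 "$pid" 2>/dev/null || return 0

    kill -TERM "$pid" 2>/dev/null
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Last resort: the process was still there for the whole grace period.
    kill -0 "$pid" 2>/dev/null || return 0
    echo "stop_politely: $pid did not exit after TERM, sending KILL" >&2
    kill -KILL "$pid"
}
```

Something like "stop_politely 39459 10" would then give the script ten seconds to exit on its own before the -9 goes out.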
TERM, INT, or HUP type signals actually give the targeted process a chance to clean up, and play much better with the job control a shell is likely doing to run some fork tree.
#!/bin/sh
( ls | ( sleep 99 ; ( sleep 99 ; cat ) ) )
That script is highly contrived, mostly to make it easy to stick a -9 signal onto the parent shell process, but it's not too hard for a real shell script to build an even more complicated process tree, and for a bad kill -9 to leave even more orphaned processes lingering in memory.
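The chance to clean up that TERM, INT, or HUP give a script can be used with trap. The following is a minimal sketch, not how the script above was written; the function name run_child is hypothetical. KILL cannot be trapped, so none of this cleanup runs after a kill -9.

```sh
#!/bin/sh
# run_child CMD [ARGS...]: hypothetical wrapper, not from the original post.
# Runs CMD in the background and, if this shell gets HUP, INT, or TERM,
# passes a TERM on to the child and reaps it before exiting.
run_child() {
    child=
    # Single quotes: "$child" is expanded when the trap fires, not now.
    trap 'kill -TERM "$child" 2>/dev/null; wait "$child" 2>/dev/null; exit 1' HUP INT TERM

    "$@" &
    child=$!

    # wait returns early when a trapped signal arrives, and the trap runs.
    wait "$child"
    status=$?

    # The child finished on its own: restore the default signal handling.
    trap - HUP INT TERM
    return "$status"
}
```

A script that ends with "run_child sleep 99" and is then sent TERM should take its sleep down with it rather than leaving the sleep to be re-parented.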
Also maybe look at the process tree from time to time to see if users or poorly written scripts are leaving stuff stuck in memory, as not everyone has gotten (or will get) the memo that throwing KILL signals around is a bad idea. This may be hard to spot if the process dies and the kernel cleans up quickly, so if the source is available look for KILL signals, or maybe try process tracking things (or kernel logging) to see if -9 signals are being needlessly thrown around.
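For the occasional look at the process tree, a rough first pass is to list a user's processes whose parent is PID 1. This sketch assumes orphans get re-parented to PID 1, which varies by system (some hand them to a different process), and the helper names are made up. Daemons started on purpose will also show up, so the output is a list of candidates, not culprits.

```sh
#!/bin/sh
# only_init_children: hypothetical filter. Reads "PID PPID COMMAND" style
# ps output and keeps the header line plus rows whose PPID (column 2) is 1.
only_init_children() {
    awk 'NR == 1 || $2 == 1'
}

# list_orphans [USER]: hypothetical helper. Shows processes of USER
# (default: the current user) that are direct children of PID 1.
list_orphans() {
    ps -U "${1:-$(id -un)}" -o pid,ppid,args | only_init_children
}
```

Running "list_orphans jhqdoe" after the bad kill -9 above would be expected to show the re-parented "sh tree", but not its "sleep 99" child, whose parent is still 07502.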