2018-09-05 Toadfarm and Monit

Every now and then I wake up to a website that’s dead. When I try to restart the Toadfarm using Monit, nothing happens. I connect to the box using `ssh` and try to restart it from the command line:

$ ./farm reload
 * Reloading the process farm
Can't create listen socket: Address already in use at /home/alex/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/Mojo/IOLoop.pm line 126.
   ...fail!

So the farm is up and holding on to the port, but it’s not actually serving anything. Now what? According to `top`, all the processes are asleep:

top - 07:24:55 up 150 days, 15:34,  2 users,  load average: 0.00, 0.03, 0.00
Tasks: 124 total,   2 running, 122 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.7 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.3 st
KiB Mem :  3084768 total,   336608 free,   737820 used,  2010340 buff/cache
KiB Swap:  1022972 total,       16 free,  1022956 used.  2119480 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
16448 alex      20   0  331468 125888   3760 S  0.0  4.1   0:00.33 /home/alex/farm
16563 alex      20   0  378652 104660   3668 S  0.0  3.4   0:00.38 /home/alex/farm
 6681 alex      20   0  358828  92952   3940 S  0.0  3.0   0:00.89 /home/alex/farm
  709 alex      20   0  347792  84888   5484 S  0.0  2.8   0:01.06 /home/alex/farm
 6730 alex      20   0  342380  74416   3940 S  0.0  2.4   0:01.13 /home/alex/farm
  710 alex      20   0  335376  65868   5424 S  0.0  2.1   0:00.99 /home/alex/farm
22714 alex      30  10  118320  45080   7772 S  0.0  1.5   0:00.53 perl
18640 alex      20   0   85272  25896   6612 S  0.0  0.8   0:00.43 perl
 9579 alex      20   0  343312  25568   3848 S  0.0  0.8   0:00.45 /home/alex/farm
13534 alex      20   0  331080  25548   3848 S  0.0  0.8   0:00.09 /home/alex/farm
 8731 alex      20   0  362272  25520   3848 S  0.0  0.8   0:00.10 /home/alex/farm
10399 alex      20   0  330736  25500   3812 S  0.0  0.8   0:00.35 /home/alex/farm
31772 alex      20   0  327720  25468   3848 S  0.0  0.8   0:01.46 /home/alex/farm
 1931 alex      20   0  328808  25396   3848 S  0.0  0.8   0:00.19 /home/alex/farm
  974 alex      20   0  329248  25380   3848 S  0.0  0.8   0:00.27 /home/alex/farm

And so I’m doing what I usually do, without actually solving the underlying problem:

for pid in $(ps -u alex | grep /farm | cut -c 1-5); do kill $pid; done

Do you have a better idea? What’s going on, here?
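
One small improvement to the loop above, which does nothing to explain the sleeping processes: `cut -c 1-5` only works while every PID fits into the first five columns of the `ps` output. A minimal sketch that picks the PID by field instead, assuming `ps` reports the command line as `/home/alex/farm/farm`; `farm_pids` is a hypothetical helper name, not part of Toadfarm or Monit.

```bash
#!/bin/bash
# farm_pids: read "PID COMMAND..." lines on stdin and print the PIDs of the
# lines whose first command word is exactly the given name.
# (Hypothetical helper for illustration only.)
farm_pids() {
    local name="$1"
    awk -v name="$name" '$2 == name { print $1 }'
}

# Usage, commented out so that sourcing this file kills nothing.
# "xargs -r" (do nothing on empty input) is a GNU extension.
# ps -u alex -o pid=,args= | farm_pids /home/alex/farm/farm | xargs -r kill
```

Matching the whole field also skips the `grep` process itself, which a plain `grep /farm` would otherwise catch.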

​#Toadfarm ​#Monit ​#Administration

Comments

Gah. Had to kill it all, again. Why‽

I fixed my `kill-farm` script:

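#! /bin/bash
# Kill every process whose command (first word of the command line) is
# exactly /home/alex/farm/farm. SIGKILL, because nothing else gets through.
for pid in $(ps -xo pid,command|perl -ne '@s=split; print $s[0] . "\n" if $s[1] eq "/home/alex/farm/farm"'); do
    kill -9 "$pid"
done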

From `/var/log/syslog`:

Linux sibirocobombus 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u5 (2017-09-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Oct  1 19:40:53 2018 from 2a02:168:4823:0:9c0b:e651:b5d7:2f4f
root@sibirocobombus:~# cat /tmp/log
Oct  1 16:51:54 sibirocobombus kernel: [15296489.567141] /home/alex/farm: page allocation stalls for 10472ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567150] CPU: 0 PID: 13804 Comm: /home/alex/farm Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u5
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567151] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567153]  0000000000000000 ffffffffb9f285b4 ffffffffba5febb0 ffffb9ccc10cbb68
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567156]  ffffffffb9d84f3a 024200ca00000002 ffffffffba5febb0 ffffb9ccc10cbb08
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567159]  ffff8f1a00000010 ffffb9ccc10cbb78 ffffb9ccc10cbb28 98dccf6c149eda9d
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567161] Call Trace:
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567168]  [<ffffffffb9f285b4>] ? dump_stack+0x5c/0x78
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567171]  [<ffffffffb9d84f3a>] ? warn_alloc+0x13a/0x160
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567173]  [<ffffffffb9d8592d>] ? __alloc_pages_slowpath+0x95d/0xbc0
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567175]  [<ffffffffb9d85d8e>] ? __alloc_pages_nodemask+0x1fe/0x260
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567177]  [<ffffffffb9dd7c3e>] ? alloc_pages_vma+0xae/0x260
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567181]  [<ffffffffb9daf069>] ? wp_page_copy+0x89/0x700
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567183]  [<ffffffffb9db0361>] ? do_wp_page+0x161/0x7d0
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567186]  [<ffffffffb9db3170>] ? handle_mm_fault+0x8d0/0x1350
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567188]  [<ffffffffb9c24701>] ? __switch_to+0x2c1/0x6c0
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567191]  [<ffffffffb9c5fd84>] ? __do_page_fault+0x2a4/0x510
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567194]  [<ffffffffba207788>] ? async_page_fault+0x28/0x30
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567195] Mem-Info:
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567199] active_anon:563083 inactive_anon:141043 isolated_anon:813
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567199]  active_file:3237 inactive_file:1620 isolated_file:0
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567199]  unevictable:0 dirty:21 writeback:7962 unstable:0
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567199]  slab_reclaimable:8695 slab_unreclaimable:14080
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567199]  mapped:5449 shmem:8956 pagetables:15088 bounce:0
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567199]  free:14229 free_pcp:0 free_cma:0
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567209] Node 0 active_anon:2252332kB inactive_anon:564172kB active_file:12948kB inactive_file:6480kB unevictable:0kB isolated(anon):3252kB isolated(file):0kB mapped:21796kB dirty:84kB writeback:31848kB shmem:35824kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:2 all_unreclaimable? no
Oct  1 16:51:55 sibirocobombus kernel: [15296489.567210] Node 0 DMA free:12124kB min:232kB low:288kB high:344kB active_anon:3492kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:12kB slab_unreclaimable:88kB kernel_stack:0kB pagetables:184kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567214] lowmem_reserve[]: 0 2975 2975 2975 2975
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567217] Node 0 DMA32 free:44792kB min:44820kB low:56024kB high:67228kB active_anon:2248840kB inactive_anon:564172kB active_file:12948kB inactive_file:6480kB unevictable:0kB writepending:31932kB present:3129216kB managed:3068860kB mlocked:0kB slab_reclaimable:34768kB slab_unreclaimable:56232kB kernel_stack:5392kB pagetables:60168kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567221] lowmem_reserve[]: 0 0 0 0 0
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567224] Node 0 DMA: 5*4kB (MEH) 5*8kB (UME) 4*16kB (ME) 5*32kB (UEH) 9*64kB (UMEH) 6*128kB (UEH) 5*256kB (UEH) 2*512kB (ME) 2*1024kB (UH) 3*2048kB (UME) 0*4096kB = 12124kB
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567236] Node 0 DMA32: 2599*4kB (UMEH) 1538*8kB (UMEH) 672*16kB (UMEH) 265*32kB (UME) 23*64kB (MEH) 1*128kB (H) 1*256kB (H) 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 44812kB
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567248] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567249] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567249] 26825 total pagecache pages
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567251] 13006 pages in swap cache
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567252] Swap cache stats: add 3064172, delete 3051166, find 484392409/484925615
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567253] Free swap  = 147028kB
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567253] Total swap = 1022972kB
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567254] 786302 pages RAM
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567255] 0 pages HighMem/MovableOnly
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567255] 15110 pages reserved
Oct  1 16:51:56 sibirocobombus kernel: [15296489.567256] 0 pages hwpoisoned
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538179] /home/alex/farm invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), nodemask=0, order=0, oom_score_adj=0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538181] /home/alex/farm cpuset=/ mems_allowed=0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538186] CPU: 0 PID: 13764 Comm: /home/alex/farm Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u5
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538189]  0000000000000000 ffffffffb9f285b4 ffffb9ccc0e9bc20 ffff8f1a4e8f8100
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538193]  ffffffffb9dfe020 0000000000000000 0000000000000000 0000000000000001
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538195]  ffffffffb9d844e7 0000004252982a80 ffffffffc0454695 0000000000000001
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538197] Call Trace:
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538204]  [<ffffffffb9f285b4>] ? dump_stack+0x5c/0x78
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538206]  [<ffffffffb9dfe020>] ? dump_header+0x78/0x1fd
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538209]  [<ffffffffb9d844e7>] ? get_page_from_freelist+0x3f7/0xb40
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538214]  [<ffffffffc0454695>] ? virtballoon_oom_notify+0x25/0x70 [virtio_balloon]
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538217]  [<ffffffffb9d8047a>] ? oom_kill_process+0x21a/0x3e0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538220]  [<ffffffffb9d800fd>] ? oom_badness+0xed/0x170
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538221]  [<ffffffffb9d80911>] ? out_of_memory+0x111/0x470
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538223]  [<ffffffffb9d85b4f>] ? __alloc_pages_slowpath+0xb7f/0xbc0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538225]  [<ffffffffb9d85d8e>] ? __alloc_pages_nodemask+0x1fe/0x260
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538227]  [<ffffffffb9dd7c3e>] ? alloc_pages_vma+0xae/0x260
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538230]  [<ffffffffb9daf069>] ? wp_page_copy+0x89/0x700
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538233]  [<ffffffffb9db0361>] ? do_wp_page+0x161/0x7d0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538235]  [<ffffffffb9db3170>] ? handle_mm_fault+0x8d0/0x1350
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538238]  [<ffffffffb9c24701>] ? __switch_to+0x2c1/0x6c0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538240]  [<ffffffffb9c5fd84>] ? __do_page_fault+0x2a4/0x510
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538243]  [<ffffffffba207788>] ? async_page_fault+0x28/0x30
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538244] Mem-Info:
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538248] active_anon:571476 inactive_anon:143154 isolated_anon:532
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538248]  active_file:204 inactive_file:222 isolated_file:0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538248]  unevictable:0 dirty:0 writeback:3178 unstable:0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538248]  slab_reclaimable:3157 slab_unreclaimable:13740
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538248]  mapped:1215 shmem:7913 pagetables:15023 bounce:0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538248]  free:14207 free_pcp:37 free_cma:0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538252] Node 0 active_anon:2285904kB inactive_anon:572616kB active_file:816kB inactive_file:888kB unevictable:0kB isolated(anon):2128kB isolated(file):0kB mapped:4860kB dirty:0kB writeback:12712kB shmem:31652kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:5391 all_unreclaimable? yes
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538252] Node 0 DMA free:12124kB min:232kB low:288kB high:344kB active_anon:3484kB inactive_anon:4kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:12kB slab_unreclaimable:88kB kernel_stack:0kB pagetables:188kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538257] lowmem_reserve[]: 0 2975 2975 2975 2975
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538260] Node 0 DMA32 free:44704kB min:44820kB low:56024kB high:67228kB active_anon:2282420kB inactive_anon:572612kB active_file:816kB inactive_file:888kB unevictable:0kB writepending:12712kB present:3129216kB managed:3068860kB mlocked:0kB slab_reclaimable:12616kB slab_unreclaimable:54872kB kernel_stack:5376kB pagetables:59904kB bounce:0kB free_pcp:148kB local_pcp:148kB free_cma:0kB
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538264] lowmem_reserve[]: 0 0 0 0 0
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538266] Node 0 DMA: 4*4kB (UEH) 2*8kB (E) 2*16kB (E) 5*32kB (UMEH) 8*64kB (UMEH) 5*128kB (UEH) 4*256kB (UEH) 1*512kB (M) 3*1024kB (UEH) 3*2048kB (UME) 0*4096kB = 12128kB
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538278] Node 0 DMA32: 2108*4kB (UEH) 1402*8kB (UMEH) 746*16kB (UMEH) 286*32kB (UMEH) 40*64kB (MEH) 1*128kB (H) 1*256kB (H) 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 44704kB
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538289] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538291] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538291] 18160 total pagecache pages
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538293] 9813 pages in swap cache
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538294] Swap cache stats: add 3100981, delete 3091168, find 484392412/484925620
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538294] Free swap  = 0kB
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538295] Total swap = 1022972kB
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538296] 786302 pages RAM
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538296] 0 pages HighMem/MovableOnly
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538297] 15110 pages reserved
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538298] 0 pages hwpoisoned
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538298] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538302] [  188]     0   188    15502      833      27       3       80             0 systemd-journal
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538304] [  376]     0   376    11625       55      26       3      104             0 systemd-logind
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538306] [  381]     0   381     7416       35      20       3       50             0 cron
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538308] [  387]     0   387    62560      163      30       3      351             0 rsyslogd
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538309] [  388]     0   388     3152        9      12       3       37             0 rsync
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538311] [  393]     0   393     1054        0       8       3       35             0 acpid
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538312] [  394]   105   394    11283       60      27       3       78          -900 dbus-daemon
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538314] [  450]     0   450     3634        0      12       3       38             0 agetty
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538316] [  458]     0   458    28698      198      26       3       93             0 monit
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538317] [  464]     0   464    17486       41      37       3      164         -1000 sshd
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538319] [  614]     0   614    12949      767      30       3     1563             0 munin-node
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538329] [  823]   107   823    27941       23      50       3      365             0 exim4
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538331] [10368]   100 10368    31821       19      32       3      119             0 systemd-timesyn
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538333] [10648]     0 10648    11308       12      24       3       91         -1000 systemd-udevd
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538335] [25971]  1000 25971    27465      378      54       3     6922             0 /home/alex/camp
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538337] [22812]     0 22812    26416      842      58       3       46             0 apache2
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538338] [ 2794]     0  2794   169401     3422      61       4      455             0 fail2ban-server
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538340] [29327]    33 29327   216079     2688     100       4      530             0 apache2
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538342] [23538]  1000 23538    81856      167     143       3    29692             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538343] [14140]  1000 14140    81929      249     144       3    29760             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538345] [15321]  1000 15321    85002      249     149       3    31614             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538346] [15742]  1000 15742    82686      247     144       3    29465             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538348] [15822]  1000 15822    82672      253     144       3    29467             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538349] [18583]  1000 18583    85088      250     149       3    31258             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538351] [16180]  1000 16180    83100      164     148       3    30518             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538353] [16240]  1000 16240    84600      170     151       3    31838             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538354] [17135]  1000 17135    84455      169     150       3    31723             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538356] [18081]  1000 18081    83276      169     147       3    29866             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538357] [18089]  1000 18089    82757      167     146       3    29665             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538359] [18097]  1000 18097    83000      167     146       3    29722             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538360] [28873]  1000 28873    83946      167     148       3    30644             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538362] [29477]  1000 29477    84020      168     148       3    30825             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538364] [17703]  1000 17703    85534    10436     152       3    22151             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538366] [25630]  1000 25630    84044    17581     147       3    13484             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538367] [23799]  1000 23799    83909    16024     149       4    15467             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538369] [23831]  1000 23831    88924    19813     159       4    16552             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538370] [17148]  1000 17148    27465     3262      52       3     4112             0 /home/alex/camp
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538372] [24680] 65534 24680    12608     2506      27       3        0             0 python
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538374] [ 6121]  1000  6121    84608    32664     151       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538376] [ 9051]    33  9051    26290      742      54       3       46             0 apache2
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538377] [ 9372]  1000  9372    79677    26765     141       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538379] [ 9393]  1000  9393    88087    36094     158       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538380] [ 9395]  1000  9395   104793    53630     189       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538382] [ 9396]  1000  9396    94890    43125     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538392] [ 9418]  1000  9418    21312     4988      45       3        0             0 perl
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538394] [ 9989]  1000  9989    17541     3956      37       3        0             0 perl
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538395] [11782]    33 11782   215922     2843      99       4       45             0 apache2
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538397] [11833]    33 11833   214475     2459      97       4       45             0 apache2
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538398] [26741]  1000 26741    92116    38897     165       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538400] [17934]  1000 17934    87620    35407     157       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538402] [18489]  1000 18489    85855    32863     152       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538403] [19093]    33 19093   215694     2608      99       4       45             0 apache2
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538405] [13755]  1000 13755    94098    41334     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538407] [13758]  1000 13758   106413    54375     195       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538408] [13759]  1000 13759    94098    41361     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538410] [13763]  1000 13763    94098    41501     170       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538411] [13764]  1000 13764    89787    36814     160       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538413] [13765]  1000 13765    93158    38962     166       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538414] [13766]  1000 13766    89786    36814     160       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538415] [13767]  1000 13767    94098    41334     171       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538417] [13768]  1000 13768   106413    54423     193       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538419] [13769]  1000 13769    93158    38962     166       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538420] [13771]  1000 13771    94098    41501     170       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538422] [13772]  1000 13772   106413    54423     193       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538423] [13773]  1000 13773    93158    38961     166       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538425] [13774]  1000 13774    89786    36814     160       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538427] [13775]    33 13775   214080      930      95       4       46             0 apache2
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538428] [13804]  1000 13804    94098    41501     170       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538430] [13805]  1000 13805    89786    36814     160       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538431] [13806]  1000 13806    95932    43127     174       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538433] [13807]  1000 13807   106413    54423     193       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538435] [13808]  1000 13808    92634    38961     164       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538436] [13809]  1000 13809    89786    36815     160       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538438] [13810]  1000 13810    95932    43127     174       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538439] [13812]  1000 13812    92634    38961     164       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538441] [13813]  1000 13813   105709    54206     190       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538442] [13814]  1000 13814    95408    43126     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538444] [13815]    33 13815   213919      868      95       4       46             0 apache2
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538446] [13817]  1000 13817    89164    36681     158       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538447] [13844]  1000 13844    92634    38898     164       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538449] [13845]  1000 13845   105609    54032     189       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538450] [13846]  1000 13846    95408    43126     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538452] [13847]  1000 13847   105789    54335     190       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538453] [13848]  1000 13848    89263    36789     158       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538455] [13849]  1000 13849    95408    43126     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538456] [13850]  1000 13850    92634    38898     164       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538458] [13851]  1000 13851    95408    43126     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538459] [13852]  1000 13852   105311    53674     189       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538461] [13853]  1000 13853    88605    36095     157       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538462] [13854]  1000 13854    92634    38940     164       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538463] [13855]  1000 13855    88605    36095     157       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538465] [13856]  1000 13856   105311    53631     189       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538466] [13857]  1000 13857    92634    38898     164       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538468] [13858]  1000 13858    95408    43126     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538469] [13859]  1000 13859    88605    36095     157       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538471] [13860]  1000 13860   105311    53631     189       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538472] [13861]  1000 13861    92634    38898     164       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538474] [13862]  1000 13862    95408    43126     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538475] [13863]  1000 13863    88605    36095     157       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538477] [13864]  1000 13864   105311    53737     189       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538478] [13865]  1000 13865    92116    38883     162       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538480] [13866]  1000 13866    95408    43126     172       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538481] [13867]  1000 13867    88087    36080     155       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538483] [13868]  1000 13868   105311    53631     189       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538484] [13869]  1000 13869    94890    43125     171       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538486] [13870]  1000 13870    92116    38883     162       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538488] [13871]  1000 13871    88087    36080     155       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538489] [13872]  1000 13872   104793    53616     188       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538491] [13873]  1000 13873    94890    43111     171       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538492] [13875]  1000 13875    88087    36070     155       3        0             0 /home/alex/farm
Oct  1 16:51:56 sibirocobombus kernel: [15296512.538494] Out of memory: Kill process 13768 (/home/alex/farm) score 53 or sacrifice child
Oct  1 16:51:56 sibirocobombus kernel: [15296512.867096] Killed process 13768 (/home/alex/farm) total-vm:425652kB, anon-rss:217024kB, file-rss:668kB, shmem-rss:0kB
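
The process table in the report gives `rss` and `swapents` in pages, so the share taken by the farm can be added up. A minimal sketch, assuming 4 kB pages (which matches the report: 563083 pages of `active_anon` are listed as 2252332kB); `oom_rss_summary` is a hypothetical helper name.

```bash
#!/bin/bash
# oom_rss_summary: read an oom-killer report on stdin and print, for the
# processes with the given name, how many there are and their summed rss and
# swapents, converted from pages to kB.
# Fields are counted from the end of the line because the "[ pid ]" column
# splits into one or two words depending on the width of the PID.
oom_rss_summary() {
    local name="$1" page_kb="${2:-4}"   # page size in kB, 4 assumed
    awk -v name="$name" -v page_kb="$page_kb" '
        $NF == name && $(NF-5) ~ /^[0-9]+$/ && $(NF-2) ~ /^[0-9]+$/ {
            n++
            rss  += $(NF-5)   # resident pages
            swap += $(NF-2)   # pages swapped out
        }
        END {
            printf "%d processes, rss %d kB, swap %d kB\n",
                   n, rss * page_kb, swap * page_kb
        }
    '
}

# Usage: oom_rss_summary /home/alex/farm < /tmp/log
```

With several dozen rows named `/home/alex/farm` in the table, the total shows how much of the 3 GB the farm alone had claimed when the kernel started killing.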

– Alex Schroeder 2018-10-01 17:02 UTC

---

OK, I wrote myself a `cull-farm` script:

#! /home/alex/perl5/perlbrew/perls/perl-5.26.1/bin/perl
use Modern::Perl;
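# Map pid => elapsed seconds for every farm process whose parent is not
# process 1, i.e. for the children but not for the top level process.
my %data = map { $_->[0] => $_->[1] }
           grep { $_->[2] eq "/home/alex/farm/farm" and $_->[3] ne "1" }
           map { [ split ] }
           (qx(ps -xo pid,etimes,command,ppid));
# Youngest first.
my @ids = sort { $data{$a} <=> $data{$b} } keys %data;
# say "pids: @ids";
# Keep the four youngest children; everything older gets a SIGTERM.
my @old_ids = @ids[4 .. $#ids];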
if (@old_ids > 0) {
  say "SIGTERM for @old_ids";
  kill 'TERM', @old_ids;
} else {
  say "The only child processes we have are @ids";
}

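It runs `ps`, which prints the process id, the elapsed time in seconds, the command name (so that I can filter for my processes), and the parent process id. The four youngest children are kept and all the older ones get a SIGTERM. Processes whose parent is process 1 are skipped, because I don’t want to kill the top level process that keeps spawning the children!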
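
The same selection can be tried as a dry run that only prints the PIDs it would signal. A minimal sketch, assuming the `ps` columns are in the order used by the script; `cull_candidates` is a hypothetical name.

```bash
#!/bin/bash
# cull_candidates: read "PID ETIMES COMMAND PPID" lines on stdin and print the
# PIDs of the matching processes whose parent is not process 1, except for
# the `keep` youngest ones (default: 4).
cull_candidates() {
    local cmd="$1" keep="${2:-4}"
    awk -v cmd="$cmd" '$3 == cmd && $4 != 1 { print $2, $1 }' |
        sort -k1,1n -k2,2n |            # smallest elapsed time first
        awk -v keep="$keep" 'NR > keep { print $2 }'
}

# Usage (prints only, kills nothing):
# ps -xo pid,etimes,command,ppid | cull_candidates /home/alex/farm/farm
```

The header line of `ps` needs no special handling because its third word is `COMMAND`, which never matches.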

– Alex Schroeder 2018-10-07 17:17 UTC

---

I fiddled with that script again because I noticed something. Here’s an example output when sorted by etimes, the elapsed time in seconds:

alex@sibirocobombus:~$ ps -x -k etimes -o pid,etimes,command,ppid | grep /home/alex/farm/farm
22634       0 grep /home/alex/farm/farm   16815
22541      75 /home/alex/farm/farm        32495
22542      75 /home/alex/farm/farm        32495
22539      76 /home/alex/farm/farm        32495
22540      76 /home/alex/farm/farm        32495
32495   27081 /home/alex/farm/farm            1
 4376  217327 /home/alex/farm/farm            1
 4352  217356 /home/alex/farm/farm            1
 4285  217497 /home/alex/farm/farm            1
 4238  217538 /home/alex/farm/farm            1

In this situation we want to start killing from the end. Look at all those old processes that are still hanging around with parent process id 1, left over from before the current top level process (32495) was started. These need to go!

#! /home/alex/perl5/perlbrew/perls/perl-5.26.1/bin/perl
use Modern::Perl;

if (grep /^(-h|--help)$/, @ARGV) {
  say "Use -d for debugging";
  exit;
}

my $debug = 0;
$debug = 1 if (grep /^(-d|--debug)$/, @ARGV);

my %data =
    map { $_->[0] => { etimes => $_->[1], ppid => $_->[3] }}
    grep { $_->[2] eq "/home/alex/farm/farm" }
    map { [ split ] }
    # -o is output; 0: pid; 1: etimes; 2: command; 3: ppid
    # -k is sorting; etimes is elapsed time
    (qx(ps -x -k etimes -o pid,etimes,command,ppid));

# Example result:
# 22634       0 grep /home/alex/farm/farm   16815
# 22541      75 /home/alex/farm/farm        32495
# 22542      75 /home/alex/farm/farm        32495
# 22539      76 /home/alex/farm/farm        32495
# 22540      76 /home/alex/farm/farm        32495
# 32495   27081 /home/alex/farm/farm            1
#  4376  217327 /home/alex/farm/farm            1
#  4352  217356 /home/alex/farm/farm            1
#  4285  217497 /home/alex/farm/farm            1
#  4238  217538 /home/alex/farm/farm            1
# In this situation, the process that we do not want to kill is 32495 and we
# want to start killing from the back: 4376, 4352, 4285, 4238.

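# A process that is the parent of another farm process is a top level process
# that still has children: protect it by removing it from the list.
for my $pid (keys %data) {
  if ($data{$pid} and $data{$data{$pid}->{ppid}}) {
    say "protected pid: $data{$pid}->{ppid}" if $debug;
    delete $data{$data{$pid}->{ppid}};
  }
}
# Sort by elapsed time, youngest first. The values of %data are hash
# references, so compare their etimes fields and not the references.
my @ids = sort { $data{$a}->{etimes} <=> $data{$b}->{etimes} } keys %data;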
say "pids: @ids" if $debug;
my @old_ids = @ids[4 .. $#ids];
if (@old_ids > 0) {
  say "SIGTERM for @old_ids";
  kill 'TERM', @old_ids unless $debug;
} else {
  say "The only child processes we have are @ids";
}
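
The intended selection can be checked against the example listing without sending any signals. A minimal sketch in shell, assuming the same column order as the `ps` call in the script; `old_farm_pids` is a hypothetical name.

```bash
#!/bin/bash
# old_farm_pids: read "PID ETIMES COMMAND PPID" lines on stdin. Drop every
# process that is the parent of another matching process, sort the rest by
# elapsed time, skip the `keep` youngest (default: 4) and print the other PIDs.
old_farm_pids() {
    local cmd="$1" keep="${2:-4}"
    awk -v cmd="$cmd" '
        $3 == cmd { etimes[$1] = $2; ppid[$1] = $4 }
        END {
            # A PID that shows up as somebody else'"'"'s parent is protected.
            for (pid in ppid)
                if (ppid[pid] in etimes) protected[ppid[pid]] = 1
            for (pid in etimes)
                if (!(pid in protected)) print etimes[pid], pid
        }' |
        sort -k1,1n -k2,2n |            # smallest elapsed time first
        awk -v keep="$keep" 'NR > keep { print $2 }'
}

# Usage (prints only, kills nothing):
# ps -x -o pid,etimes,command,ppid | old_farm_pids /home/alex/farm/farm
```

Fed the example listing, this should leave 32495 and its four children alone and print the four leftover processes whose parent is 1.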

– Alex Schroeder 2018-10-26 12:04 UTC