2016-05-31 Zombie Apocalypse on the Toadfarm

I’ve been using the Toadfarm/Mojolicious/Oddmuse setup for alexschroeder.ch for quite a while now. I’ve decided to switch to a similar setup for emacswiki.org, but small differences keep annoying me. I already wrote about my inability to start and stop the user process using Monit. Here’s another problem: zombies are accumulating, even though I’m not using the Toadfarm at the moment. Port 80 is nginx acting as the proxy cache for Apache, which runs on port 8080, whereas Toadfarm runs on port 8081 and thus isn’t getting used unless I use it (plus port scanners, I guess).

Today I killed and restarted Toadfarm and all the zombies were cleaned up. Good! But that doesn’t sound like a good solution. What could be the difference between alexschroeder.ch and emacswiki.org to result in one system accumulating lots of zombies?

emacswiki.org:

One hundred zombies

alexschroeder.ch:

Null problemo

I noticed that on the system that doesn’t have a zombie problem, I’ve been using some older software. Let’s upgrade the working system and see whether the zombies start appearing. I know something is wrong about this line of reasoning but I can’t help thinking that newer is better.

+--------------------------+------------------+---------------+
|       Perl Module        | alexschroeder.ch | emacswiki.org |
+--------------------------+------------------+---------------+
| Toadfarm                 |             0.67 |          0.74 |
| Mojolicious              |             6.19 |          6.62 |
| Mojolicious::Plugin::CGI |             0.23 |          0.32 |
+--------------------------+------------------+---------------+

So, a few hours ago I upgraded the software on alexschroeder.ch. When looking at the following numbers, remember that Toadfarm on emacswiki.org is basically not being used.

emacswiki.org:

One zombie

alexschroeder.ch:

A thousand zombies

I’ve added the following rule to my Monit setup:

    if children > 250 then restart

The result after a few hours:

The reaper comes

I already dread trying to convince the developers that they broke something in that recent upgrade.

On the same server, I have a pure Toadfarm/Mojolicious application that doesn’t use Mojolicious::Plugin::CGI which I just restarted. Let’s see whether korero.org also starts accumulating zombies.

For the moment, korero.org after a restart (and thus using the new versions of Toadfarm and Mojolicious):

alex@kallobombus:~/korero.org$ sudo ps fax|grep korero\\.org
 6239 ?        Ss     0:00 /home/alex/korero.org/server.pl
 6240 ?        S      0:00  \_ /home/alex/korero.org/server.pl
 6241 ?        S      0:00  \_ /home/alex/korero.org/server.pl
 6242 ?        S      0:00  \_ /home/alex/korero.org/server.pl
 6243 ?        S      0:00  \_ /home/alex/korero.org/server.pl

As compared to alexschroeder.ch (using the new versions of Toadfarm and Mojolicious and Mojolicious::Plugin::CGI):

alex@kallobombus:~/korero.org$ sudo ps fax|grep "farm.*defunct"|wc -l
243
alex@kallobombus:~/korero.org$ sudo ps fax|grep farm\\b|tail
 5991 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 5992 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 5999 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 6001 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 6013 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 6246 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 6293 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 6322 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 6460 ?        Z      0:00      \_ [/home/alex/farm] <defunct>
 6545 ?        Z      0:00      \_ [/home/alex/farm] <defunct>

A little stress testing showed that Mojolicious::Plugin::CGI must be the culprit.

Sending a few hundred requests to korero.org, which doesn’t use Mojolicious::Plugin::CGI, using this:

I=1; for f in *; do I=$(( $I+1 )); (curl --silent https://korero.org/ > test$I &); done
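
The loop above borrows its iteration count from whatever files happen to be in the current directory and leaves a pile of test files behind. A tidier variant might look like the following sketch. `run_parallel` is a made-up helper name, and discarding the output is an assumption about what the test needs: only the load on the server matters, not the responses.

```shell
# Sketch: start N copies of a command in the background, then wait for all.
# run_parallel is a hypothetical helper, not part of curl or Toadfarm.
run_parallel() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        # Throw the response away; we only care about hitting the server.
        "$@" > /dev/null 2>&1 &
        i=$((i + 1))
    done
    # Block until every background job started by this shell has exited.
    wait
}

# Usage: run_parallel 100 curl --silent https://korero.org/
```

Waiting for the jobs to finish before counting zombies also means the count is taken after all requests have been served, not while some are still in flight.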

Checking for zombies:

alex@kallobombus:~$ sudo ps fax|grep korero
 9021 pts/0    S+     0:00              \_ grep korero
 6239 ?        Ss     0:00 /home/alex/korero.org/server.pl
 6240 ?        S      0:00  \_ /home/alex/korero.org/server.pl
 6241 ?        S      0:00  \_ /home/alex/korero.org/server.pl
 6242 ?        S      0:00  \_ /home/alex/korero.org/server.pl
 6243 ?        S      0:00  \_ /home/alex/korero.org/server.pl

No zombies!

Let’s check how many zombies there are for alexschroeder.ch, which uses Mojolicious::Plugin::CGI:

alex@kallobombus:~$ sudo ps fax|grep "farm.*defunct"|wc -l
161

Sending a few hundred requests:

I=1; for f in *; do I=$(( $I+1 )); (curl --silent https://alexschroeder.ch/ > test$I &); done

Count zombies again:

alex@kallobombus:~$ sudo ps fax|grep "farm.*defunct"|wc -l
500
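
Counting with grep on the full ps listing also counts the grep process itself, since its own command line contains the pattern. A possible alternative is to filter on the process state column, as in this sketch. `count_zombies` is a hypothetical helper; it assumes input lines of the form "STAT COMMAND", which is what `ps -e -o stat= -o args=` prints.

```shell
# Sketch: count processes in state Z (zombie) whose command line contains
# the string given as $1. Reads "STAT COMMAND..." lines on standard input.
# count_zombies is a hypothetical helper name.
count_zombies() {
    # $1 ~ /^Z/ matches the state column (Z, Z+, ...); index() does a plain
    # substring match on the whole line; n + 0 prints 0 when nothing matched.
    awk -v pat="$1" '$1 ~ /^Z/ && index($0, pat) { n++ } END { print n + 0 }'
}

# Usage: ps -e -o stat= -o args= | count_zombies farm
```

Because the grep or awk process is never in state Z, it cannot inflate the count.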

OK, so next is to figure out when this change appeared.

The only strange thing is that for emacswiki.org, which uses Mojolicious::Plugin::CGI, but doesn’t get much traffic, the zombie count has stayed at 1. Let’s test it!

Count:

kensanata@po6:~$ sudo ps fax|grep \\bdefunct | wc -l
1

Stress:

I=1; for f in test*; do I=$(( $I+1 )); (curl --silent http://www.emacswiki.org:8081/wiki > test$I &); done

Count:

kensanata@po6:~$ sudo ps fax|grep \\bdefunct | wc -l
15

Hm, that’s not a lot. I wonder what’s going on. A file that isn’t being closed? A log file that cannot be written to?

Well, the bug apparently got introduced in Mojolicious::Plugin::CGI 0.27. Note that in the following output, the one hit I keep getting is the grep process itself. 😉

alex@kallobombus:~$ cd src/mojolicious-plugin-cgi/
alex@kallobombus:~/src/mojolicious-plugin-cgi$ git checkout 0.25
Previous HEAD position was 035d5e3... Released version 0.24
HEAD is now at 5df5779... Released version 0.25
alex@kallobombus:~/src/mojolicious-plugin-cgi$ sudo monit restart toadfarm
alex@kallobombus:~/src/mojolicious-plugin-cgi$ sudo ps fax|grep "farm.*defunct"|wc -l
1
alex@kallobombus:~/src/mojolicious-plugin-cgi$ for i in `seq 100`; do (curl --silent https://alexschroeder.ch/ > /dev/null &); done
alex@kallobombus:~/src/mojolicious-plugin-cgi$ sudo ps fax|grep "farm.*defunct"|wc -l
1
alex@kallobombus:~/src/mojolicious-plugin-cgi$ git checkout 0.26
Previous HEAD position was 5df5779... Released version 0.25
HEAD is now at dd98a07... Released version 0.26
alex@kallobombus:~/src/mojolicious-plugin-cgi$ sudo monit restart toadfarm
alex@kallobombus:~/src/mojolicious-plugin-cgi$ sudo ps fax|grep "farm.*defunct"|wc -l
1
alex@kallobombus:~/src/mojolicious-plugin-cgi$ for i in `seq 100`; do (curl --silent https://alexschroeder.ch/ > /dev/null &); done
alex@kallobombus:~/src/mojolicious-plugin-cgi$ sudo ps fax|grep "farm.*defunct"|wc -l
1
alex@kallobombus:~/src/mojolicious-plugin-cgi$ git checkout 0.27
Previous HEAD position was dd98a07... Released version 0.26
HEAD is now at 83949da... Released version 0.27
alex@kallobombus:~/src/mojolicious-plugin-cgi$ sudo monit restart toadfarm
alex@kallobombus:~/src/mojolicious-plugin-cgi$ for i in `seq 100`; do (curl --silent https://alexschroeder.ch/ > /dev/null &); done
alex@kallobombus:~/src/mojolicious-plugin-cgi$ sudo ps fax|grep "farm.*defunct"|wc -l
41
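
The manual bisection above boils down to finding the first release whose zombie count jumps. As a sketch, with the three measurements from the transcript fed in by hand: `first_bad_version` is a hypothetical helper, and the threshold of 1 is an assumption that allows for the grep process showing up in every count.

```shell
# Sketch: read "version count" pairs in release order on standard input and
# print the first version whose count exceeds the threshold given as $1.
# first_bad_version is a hypothetical helper name.
first_bad_version() {
    awk -v max="$1" '$2 + 0 > max + 0 { print $1; exit }'
}

# The three measurements from the transcript above; prints 0.27.
printf '%s\n' '0.25 1' '0.26 1' '0.27 41' | first_bad_version 1
```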

I’m going to leave it at 0.26 and then I can look at it when I have some time.

Looking good...

One way to install 0.26:

cpanm --sudo https://cpan.metacpan.org/authors/id/J/JH/JHTHORSEN/Mojolicious-Plugin-CGI-0.26.tar.gz

#Web #Administration #Mojolicious #Munin

Comments

I opened an issue on GitHub, and I wrote a test to go along with it.

– Alex Schroeder 2016-06-03 11:04 UTC

---

And it was merged!

– Alex Schroeder 2016-06-07 22:17 UTC