What the hell happened around 18:00 yesterday?
Was it an unusual amount of requests? I don’t think so:
alex@sibirocobombus:~$ sudo cat /var/log/apache2/access.log.1 | bin/time-grouping 15/Feb/2017:06 426 1% 15/Feb/2017:05 949 3% 15/Feb/2017:04 1050 3% 15/Feb/2017:03 1320 4% 15/Feb/2017:02 1292 4% 15/Feb/2017:01 1272 4% 15/Feb/2017:00 1253 4% 14/Feb/2017:23 1205 4% 14/Feb/2017:22 1181 4% 14/Feb/2017:21 1195 4% 14/Feb/2017:20 1161 4% 14/Feb/2017:19 1015 3% 14/Feb/2017:18 965 3% 14/Feb/2017:17 1121 4% 14/Feb/2017:16 1129 4% 14/Feb/2017:15 1022 3% 14/Feb/2017:14 852 3% 14/Feb/2017:13 1085 4% 14/Feb/2017:12 1094 4% 14/Feb/2017:11 1154 4% 14/Feb/2017:10 1115 4% 14/Feb/2017:09 1198 4% 14/Feb/2017:08 1408 5% 14/Feb/2017:07 973 3% 14/Feb/2017:06 531 1%
Nothing unusual in the Toadfarm logs.
I noticed something weird in my Apache logs, though:
alex@sibirocobombus:~$ sudo cat /var/log/apache2/access.log.1 | bin/leech-detector | head IP Number hits bandw. hits% interv. status code distrib. 127.0.0.1 1590 4K 5% 54.3s 200 (99%), 301 (0%), 404 (0%) 62.210.215.100 1255 2K 4% 68.7s 301 (33%), 200 (33%), 302 (32%), 503 (0%) ...
Who is 62.210.215.100 and what are they doing?
communitywiki.org:80 62.210.215.100 - - [15/Feb/2017:06:38:14 +0100] "GET /de?action=rss;rcidonly=OpenWLAN HTTP/1.1" 301 487 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36" communitywiki.org:443 62.210.215.100 - - [15/Feb/2017:06:38:14 +0100] "GET /de?action=rss;rcidonly=OpenWLAN HTTP/1.1" 302 4026 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36" communitywiki.org:443 62.210.215.100 - - [15/Feb/2017:06:38:14 +0100] "GET /wiki?action=rss;rcidonly=OpenWLAN HTTP/1.1" 200 4687 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36" communitywiki.org:80 62.210.215.100 - - [15/Feb/2017:06:39:45 +0100] "GET /en?action=rss;rcidonly=ConservationOfRationality HTTP/1.1" 301 521 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36" communitywiki.org:443 62.210.215.100 - - [15/Feb/2017:06:39:45 +0100] "GET /en?action=rss;rcidonly=ConservationOfRationality HTTP/1.1" 302 4060 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36" communitywiki.org:443 62.210.215.100 - - [15/Feb/2017:06:39:45 +0100] "GET /wiki?action=rss;rcidonly=ConservationOfRationality HTTP/1.1" 200 4698 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36" ...
What we see here is that somebody is requesting various pages via HTTP (port 80) and getting redirected permanently (status 301) to HTTPS (port 443) and getting redirected temporarily (status 302) to a different URL (`/wiki` instead of `/en`). We also see that they are getting the URL in various languages.
Just to make sure, I added the following to my `.htaccess` file:
1. idiots downloading every single RSS file order allow,deny deny from 62.210.215.100 allow from all
And now they’re getting something else:
communitywiki.org:80 62.210.215.100 - - [15/Feb/2017:08:51:57 +0100] "GET /cw-fr?action=rss;rcidonly=LackOfReworking HTTP/1.1" 301 507 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36" communitywiki.org:443 62.210.215.100 - - [15/Feb/2017:08:51:57 +0100] "GET /cw-fr?action=rss;rcidonly=LackOfReworking HTTP/1.1" 403 3930 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36"
Their HTTP request gets redirected to HTTPS and then they’re getting an access denied (status 403).
But wait, are they perhaps the cause for all these problems?
alex@sibirocobombus:~$ sudo cat /var/log/apache2/access.log.1 | grep 62.210.215.100 | bin/time-grouping 15/Feb/2017:06 24 1% 15/Feb/2017:05 60 4% 15/Feb/2017:04 54 4% 15/Feb/2017:03 44 3% 15/Feb/2017:02 48 3% 15/Feb/2017:01 48 3% 15/Feb/2017:00 57 4% 14/Feb/2017:23 78 6% 14/Feb/2017:22 38 3% 14/Feb/2017:21 41 3% 14/Feb/2017:20 45 3% 14/Feb/2017:19 42 3% 14/Feb/2017:18 33 2% 14/Feb/2017:17 65 5% 14/Feb/2017:16 46 3% 14/Feb/2017:15 69 5% 14/Feb/2017:14 24 1% 14/Feb/2017:13 45 3% 14/Feb/2017:12 68 5% 14/Feb/2017:11 62 4% 14/Feb/2017:10 69 5% 14/Feb/2017:09 63 5% 14/Feb/2017:08 45 3% 14/Feb/2017:07 48 3% 14/Feb/2017:06 39 3%
Nope, they didn’t start yesterday at around 18:00.
Oh well. 😟
Restarted Toadfarm. Manually killed all the farm processes I found via `top`. Restarted Apache for good measure. I think we’re good.
I also checked `/var/log/syslog.1` and found nothing surprising around 18:00. Spamd is checking mails, Munin is running, more spam filtering, and that basically covers the time from 17:00 to 19:00.
☯
Well, things are still not good.
top - 10:39:14 up 29 days, 23:18, 1 user, load average: 5.26, 5.60, 5.69 Tasks: 66 total, 6 running, 57 sleeping, 0 stopped, 3 zombie %Cpu(s): 98.7 us, 0.7 sy, 0.0 ni, 0.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 4194304 total, 2157836 used, 2036468 free, 0 buffers KiB Swap: 1048576 total, 480444 used, 568132 free. 1456564 cached Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 18835 alex 20 0 217104 71452 2284 R 19.0 1.7 20:59.77 /home/alex/farm 17130 alex 20 0 216768 71140 2268 R 18.6 1.7 37:40.49 /home/alex/farm 18286 alex 20 0 217932 72364 2308 R 18.6 1.7 23:40.29 /home/alex/farm 19937 alex 20 0 218128 72652 2332 R 18.6 1.7 16:56.54 /home/alex/farm 22862 alex 20 0 218780 73208 2332 R 18.6 1.7 7:35.41 /home/alex/farm 16345 alex 20 0 209860 68980 3432 S 1.7 1.6 0:52.01 /home/alex/farm 16344 alex 20 0 212272 69720 2180 S 0.7 1.7 0:52.34 /home/alex/farm 16346 alex 20 0 208112 65936 2012 S 0.7 1.6 0:55.12 /home/alex/farm 16347 alex 20 0 210904 68176 2120 S 0.7 1.6 0:52.76 /home/alex/farm
And they keep coming back. I’m slowly taking sites offline to where the trouble lies.
#Apache #Toadfarm #Administration
(Please contact me if you want to remove your comment.)
⁂
OK, I started disabling sites in my Toadfarm configuration until I only had two sites left: campaignwiki.org and oddmuse.org. A few hours I checked back and I had ten farm processes running using 10% CPU each. Still a problem!
Then I disabled campaignwiki.org and tried calling it from the command line. Mojolicious allows me to do that, just like the CGI module does, too. It turns out that I can run a few requests, but when I load RecentChanges, I get a timeout.
Here’s me getting a regular page:
alex@sibirocobombus:~/farm$ perl campaignwiki.pl get '/?action=browse&id=Hauptseite' [Wed Feb 15 22:39:27 2017] [debug] GET "/" [Wed Feb 15 22:39:27 2017] [debug] Routing to a callback [Wed Feb 15 22:39:27 2017] [debug] [CGI:CODE(0x1506388):11783] START /home/alex/farm/wiki.pl [Wed Feb 15 22:39:28 2017] [debug] 200 OK (0.32674s, 3.061/s) <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> ... </body> </html>
Here’s me getting RecentChanges:
alex@sibirocobombus:~/farm$ perl campaignwiki.pl get '/RecentChanges' [Wed Feb 15 22:39:42 2017] [debug] GET "/RecentChanges" [Wed Feb 15 22:39:42 2017] [debug] Routing to a callback [Wed Feb 15 22:39:43 2017] [debug] [CGI:CODE(0x1552388):11786] START /home/alex/farm/wiki.pl [Wed Feb 15 22:39:43 2017] [debug] 200 OK (0.350602s, 2.852/s) <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> ... <div class="rc"><hr />[Wed Feb 15 22:39:58 2017] [debug] Inactivity timeout [Wed Feb 15 22:40:03 2017] campaignwiki.pl: Problem loading URL "http://127.0.0.1:40810/RecentChanges": Inactivity timeout
When I check top, I see a perl process running eating all available CPU for every time I try to view RecentChanges and got an inactivity timeout. That’s suspiciously like the problem I’ve been experiencing.
So here’s the current status: All the web apps are back online except for campaignwiki.org.
– Alex Schroeder 2017-02-15 21:50 UTC
---
I added some debug warning messages and discovered that the infinite loop was somehow caused by StripRollbacks for the Gridmapper namespace. I looked at the rc.log file and found a single rollback entry. I removed the entire section and everything seems to work. So campaignwiki.org is up again and the bug remains a mystery.
It took me a while to get load back down
– Alex Schroeder 2017-02-15 22:16 UTC