I’m not sure what to make of this. At some point this evening I was alerted to the fact that all my sites were down. When I came back home, I checked the log files and saw that the first `oom-killer` (oom means *out of memory*) invocation in `syslog` happened at 17:00:
Aug 24 17:00:18 sibirocobombus kernel: [12013817.356181] /home/alex/farm invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
And from then on, it just kept bombing. The last one was killed about an hour later:
Aug 24 18:10:25 sibirocobombus kernel: [12018023.798769] Killed process 10775 (/home/alex/farm) total-vm:338508kB, anon-rss:127416kB, file-rss:0kB, shmem-rss:0kB
After that, everything was down. Apache was still running but the backend was not:
[Fri Aug 24 23:08:51.927171 2018] [proxy:error] [pid 3594:tid 139759988967168] (111)Connection refused: AH00957: HTTP: attempt to connect to 178.209.50.237:8080 (campaignwiki.org) failed
The backend consists of my wikis and web applications using the *Mojolicious* framework, running under *Toadfarm*. That’s the `/home/alex/farm` in those out-of-memory messages above.
When I checked using `ps aux`, I saw many `farm` processes. I ended up using something like this:
for pid in $(ps -xo pid,command | grep 'farm ' | cut -d ' ' -f 1); do kill -9 $pid; done
And for those lines starting with a space (why are you doing this to me, `ps`?), one more time:
for pid in $(ps -xo pid,command | grep 'farm ' | cut -d ' ' -f 2); do kill -9 $pid; done
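In hindsight, `pkill` matching against the full command line would probably have done the same thing in one go; a sketch, untested in the heat of the moment:
# kill every process whose command line mentions the farm
pkill -9 -f /home/alex/farm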
And I still don’t see anything suspicious in my log files.
Clearly, something happened between 16:00 and 18:00! Let’s check IP numbers:
$ sudo grep 24/Aug/2018:1[67]: /var/log/apache2/access.log | bin/leech-detector | head
IP Number    hits    bandw.    hits%    interv.    status code distrib.
OK, now we’re getting closer. My `robots.txt` says, amongst many other things:
User-agent: *
Crawl-delay: 20
So anybody averaging less than a second between requests is clearly an idiot. Who might it be? `whois` says “Vultr Holdings, LLC”. An ominous name!
Grepping the access log for the IP number in question shows that they were basically trying to download `communitywiki.org`.
Their first request?
[24/Aug/2018:16:59:55 +0200] "GET /wiki/BannedHosts HTTP/1.1" 200 7180 "https://communitywiki.org/wiki/BannedHosts" "Pcore-HTTP/v0.44.0"
Notice that this was just seconds before it all came crashing down!
The first HTTP 503 status code (“Service Unavailable”) is right there, in the same second:
[24/Aug/2018:16:59:55 +0200] "GET /wiki?action=reindex HTTP/1.1" 503 5573 "https://communitywiki.org/wiki?action=reindex" "Pcore-HTTP/v0.44.0"
From then on, there were more and more server errors. But they didn’t care...
Pcore is a Perl framework.
Stopgap measure for Apache:
RewriteCond "%{HTTP_USER_AGENT}" "Pcore"
RewriteRule ".*" "-" [redirect=403,last]
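Assuming `mod_rewrite` is enabled for the sites in question, a quick test from the shell might look like this (a hypothetical request; it should print 403):
curl -s -o /dev/null -w "%{http_code}\n" -A "Pcore-HTTP/v0.44.0" https://communitywiki.org/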
I am still astonished about the speed. Something about my setup is not cool. I am particularly annoyed because my Monit setup didn’t allow me to recover from this situation. And I’m annoyed because these idiots stopped hitting the site at 17:05:33 and yet my setup wasn’t able to recover. It went ballistic and died *an hour later*. Not good.
Any ideas? What would you do?
#Web #Administration
(Please contact me if you want to remove your comment.)
⁂
I guess if I wanted to have a solution that also worked for Gopher (see 2018-07-15 Russian Gopher Idiots at Work), I cannot rely on solutions based on Apache, tempting as they are. It would have to be some sort of firewall. @sheogorath recommended taking another look at `fail2ban` (which I already use), so that’s promising. Must look at `/etc/fail2ban/filter.d/`, tomorrow.
Here’s their toot on it:
The `fail2ban` daemon watches the specified log files for changes (like you can do with `journalctl -f`) and passes the content to a regex. These regexes are specified in `/etc/fail2ban/filter.d/`.
You only have to enable them in `/etc/fail2ban/jail.local` or `/etc/fail2ban/jail.d`.
When a regex matches the specified number of times within the specified time, it triggers an action, like banning the IP in the firewall. These actions are specified in `/etc/fail2ban/action.d`.
– Alex Schroeder 2018-08-24 22:45 UTC
---
OK, first attempt...
This is the filter for my Apache access.log file, stored in `/etc/fail2ban/filter.d/alex-apache.conf`:
# Author: Alex Schroeder <alex@gnu.org>
[Definition]
# ANY match in the logfile counts!
# communitywiki.org:443 000.000.000.000 - - [24/Aug/2018:16:59:55 +0200] "GET /wiki/BannedHosts HTTP/1.1" 200 7180 "https://communitywiki.org/wiki/BannedHosts" "Pcore-HTTP/v0.44.0"
failregex = ^[^:]+:[0-9]+ <HOST>
# Except cgit!
# alexschroeder.ch:443 0:0:0:0:0:0:0:0 - - [28/Aug/2018:09:14:39 +0200] "GET /cgit/bitlbee-mastodon/objects/9b/ff0c237ace5569aa348f6b12b3c2f95e07fd0d HTTP/1.1" 200 3308 "-" "git/2.18.0"
ignoreregex = ^[^"]*"GET /cgit
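To check a filter against real data, `fail2ban-regex` takes a log file and a filter file; something like this, with the paths as used on this system:
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/alex-apache.conf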
This is the filter for the log file of my gopher server, stored in `/etc/fail2ban/filter.d/alex-gopher.conf`:
# Author: Alex Schroeder <alex@gnu.org>
[Init]
# 2018/08/25-09:08:55 CONNECT TCP Peer: "[000.000.000.000]:56281" Local: "[000.000.000.000]:70"
datepattern = ^%%Y/%%m/%%d-%%H:%%M:%%S
[Definition]
# ANY match in the logfile counts!
failregex = CONNECT TCP Peer: "\[<HOST>\]:\d+"
And this sets up a few jails using these filters, stored in `/etc/fail2ban/jail.d/alex.conf`. With a `findtime` of 40 seconds and a `maxretry` of 20, anybody averaging more than one request every two seconds gets banned:
[alex-apache]
enabled = true
port = http,https
logpath = %(apache_access_log)s
findtime = 40
maxretry = 20

[alex-gopher]
enabled = true
port = 70
logpath = /home/alex/farm/gopher-server.log
findtime = 40
maxretry = 20

[alex-gopher-ssl]
enabled = true
filter = alex-gopher
port = 7443
logpath = /home/alex/farm/gopher-server-ssl.log
findtime = 40
maxretry = 20
Reload and show status:
# fail2ban-client reload
# fail2ban-client status
Status
|- Number of jail:	4
`- Jail list:	alex-apache, alex-gopher, alex-gopher-ssl, sshd
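There is also a per-jail status that shows the currently banned addresses:
# fail2ban-client status alex-apache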
We’ll see how it goes!
– Alex Schroeder 2018-08-25 07:35 UTC
---
Well, the situation is pretty much under control now. I see that `fail2ban` has between zero and four hosts banned. That looks reasonable, actually.
– Alex Schroeder 2018-08-26 08:07 UTC
---
The old `fail2ban` that comes with Debian stable doesn’t support IPv6 addresses, but the newer one does.
1. search for the `fail2ban` package
2. follow the links to a version 0.10 or better
3. download file (in my case that meant `wget http://ftp.ch.debian.org/debian/pool/main/f/fail2ban/fail2ban_0.10.2-2_all.deb`)
4. install (in my case using `dpkg -i fail2ban_0.10.2-2_all.deb`; see the note below if dependencies are missing)
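If `dpkg` complains about unmet dependencies at this point, the usual remedy should work (a sketch, assuming an otherwise healthy apt setup):
# let apt pull in whatever the manually installed package needs
apt-get install -f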
I have made one change to my `/etc/fail2ban/fail2ban.conf`: I just want to know about banning and unbanning, not about all the IP numbers seen.
loglevel = NOTICE
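A restart makes sure the changed log level is picked up (assuming a systemd-based Debian):
# systemctl restart fail2ban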
– Alex Schroeder 2018-08-27 11:28 UTC
---
I’ve had to fiddle with the filter again and again. Loading the Munin page, for example, generated enough requests for an instant ban. So now I’m trying to find a good balance where requests for most simple files aren’t counted towards banning. This is my current `/etc/fail2ban/filter.d/alex-apache.conf`:
# Author: Alex Schroeder <alex@gnu.org>
[Definition]
# ANY match in the logfile counts!
# communitywiki.org:443 000.000.000.000 - - [24/Aug/2018:16:59:55 +0200] "GET /wiki/BannedHosts HTTP/1.1" 200 7180 "https://communitywiki.org/wiki/BannedHosts" "Pcore-HTTP/v0.44.0"
failregex = ^[^:]+:[0-9]+ <HOST>
# Except cgit, css files, images...
# alexschroeder.ch:443 0:0:0:0:0:0:0:0 - - [28/Aug/2018:09:14:39 +0200] "GET /cgit/bitlbee-mastodon/objects/9b/ff0c237ace5569aa348f6b12b3c2f95e07fd0d HTTP/1.1" 200 3308 "-" "git/2.18.0"
ignoreregex = ^[^"]*"GET /(robots\.txt |favicon\.ico |[^/ ]+\.(css|js) |cgit/|css/|fonts/|pics/|1pdc/|gallery/|static/|munin/)
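A quick way to check the `ignoreregex`: `fail2ban-regex` also accepts a single log line as a string instead of a file, so a made-up sample line like the following should report one match that then gets ignored:
fail2ban-regex 'alexschroeder.ch:443 127.0.0.1 - - [04/Sep/2018:10:00:00 +0200] "GET /css/default.css HTTP/1.1" 200 1234 "-" "Mozilla/5.0"' /etc/fail2ban/filter.d/alex-apache.conf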
– Alex Schroeder 2018-09-04 10:25 UTC