I’m on a trip and once again the bot fuckers are at it. The long-term load average is 0.5; I have two cores, so I can handle an average load of 2.0; but when I looked at my sites this morning, load was at 110. Some idiot is training their AI again, is my guess.
I switched off Emacs Wiki and Community Wiki for the time being. They'll be back when the AI bros are tired of downloading every link.
Munin shows load going up and up and then Munin breaks down…
#Butlerian Jihad #Emacs
Compare this shit I should be dealing with to the stuff I want to look at:
So here's the new deal: I'm redirecting any request containing "rcidonly" in the query string to nobots. This parameter is used when showing recent changes for just a single page ("show all changes"), or for showing a feed for just one page. These are valid links but when the Internet ingesting machines start slurping, these endpoints are too expensive. This redirect to the static page already saves us a lot of resources.
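The real redirect lives in the web server configuration, which isn't shown here. As a rough sketch of the decision it makes, assuming the static page is served at a hypothetical `/nobots` path (the post only calls it "nobots"), the logic could look like this in Python:

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical path of the static page; the post only calls it "nobots".
NOBOTS_URL = "/nobots"


def redirect_target(request_url: str) -> Optional[str]:
    """Return where to redirect an expensive request, or None to serve it normally."""
    query = urlsplit(request_url).query
    # Plain substring test: any request "containing rcidonly in the query string".
    if "rcidonly" in query:
        return NOBOTS_URL
    return None


# A "show all changes" link for a single page gets redirected ...
print(redirect_target("/emacs?action=rc&all=1&rcidonly=Comments_on_FacesPerBuffer&showedit=1"))
# ... while ordinary recent changes are still served.
print(redirect_target("/emacs?action=rc&all=1"))
```

The substring test is deliberately crude: it also catches page names that merely contain "rcidonly", which seems an acceptable trade-off for endpoints this expensive.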
In addition to that, I wrote a fail2ban filter `/etc/fail2ban/filter.d/alex-bots.conf` containing this:
    [Definition]
    # www.emacswiki.org:443 000.000.000.000 - - [19/Feb/2025:00:05:14 +0100] "GET /emacs?action=rc&all=1&from=1728421966&rcidonly=Comments_on_FacesPerBuffer&showedit=1 HTTP/1.1" 200 7168 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"
    failregex = ^(www\.emacswiki\.org|communitywiki\.org|campaignwiki\.org):[0-9]+ <HOST> .*rcidonly=
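To sanity-check a filter like this outside of fail2ban, the regular expression can be tried against the sample log line in Python. This is only a sketch: the `HOST_PATTERN` stand-in for fail2ban's `<HOST>` tag is a simplified, IPv4-only assumption (the real substitution is more elaborate), and the user agent in the sample line is shortened.

```python
import re

# The failregex from the filter, verbatim.
FAILREGEX = r"^(www\.emacswiki\.org|communitywiki\.org|campaignwiki\.org):[0-9]+ <HOST> .*rcidonly="

# Assumption: a simplified IPv4-only stand-in for fail2ban's <HOST> tag.
HOST_PATTERN = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

PATTERN = re.compile(FAILREGEX.replace("<HOST>", HOST_PATTERN))

# Sample line from the filter comment, with the user agent shortened.
SAMPLE = (
    'www.emacswiki.org:443 000.000.000.000 - - [19/Feb/2025:00:05:14 +0100] '
    '"GET /emacs?action=rc&all=1&from=1728421966'
    '&rcidonly=Comments_on_FacesPerBuffer&showedit=1 HTTP/1.1" 200 7168 "-" "Mozilla/5.0"'
)


def offending_host(line):
    """Return the client address if the line matches the filter, else None."""
    match = PATTERN.search(line)
    return match.group("host") if match else None


print(offending_host(SAMPLE))
```

Note that the regex anchors on the virtual host and port at the start of the line, so it only works with a log format that puts those first.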
And I added a section using this filter to my jail `/etc/fail2ban/jail.d/alex.conf`:
    [alex-bots]
    enabled = true
    port = http,https
    logpath = %(apache_access_log)s
    findtime = 3600
    maxretry = 2
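With `findtime = 3600` and `maxretry = 2`, an address gets banned once it produces two matching requests within an hour. The following is a simplified model of that sliding window, not fail2ban's actual implementation; the `Jail` class and its `record` method are made up for illustration.

```python
from collections import defaultdict, deque

FINDTIME = 3600  # seconds, as in the jail above
MAXRETRY = 2     # matches within FINDTIME that trigger a ban


class Jail:
    """Hypothetical, simplified model of the findtime/maxretry window."""

    def __init__(self, findtime=FINDTIME, maxretry=MAXRETRY):
        self.findtime = findtime
        self.maxretry = maxretry
        self.hits = defaultdict(deque)  # ip -> timestamps of recent matches

    def record(self, ip, timestamp):
        """Record one matching request; return True if the IP should now be banned."""
        window = self.hits[ip]
        window.append(timestamp)
        # Forget matches that are older than findtime.
        while window and window[0] <= timestamp - self.findtime:
            window.popleft()
        return len(window) >= self.maxretry


jail = Jail()
print(jail.record("192.0.2.1", 0))     # first match: no ban yet
print(jail.record("192.0.2.1", 1800))  # second match within the hour: ban
```

A human clicking "show all changes" once is unaffected; a crawler following every such link hits the limit on its second request.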
And when I checked, it had already caught some bots:
    # fail2ban-client status alex-bots
    Status for the jail: alex-bots
    |- Filter
    |  |- Currently failed: 166
    |  |- Total failed: 474
    |  `- File list: /var/log/apache2/access.log
    `- Actions
       |- Currently banned: 0
       |- Total banned: 9
       `- Banned IP list:
The web interface to my IRC servers no longer worked. The Lounge reported timeout errors.
So I investigated the IRC server and that seemed to work. I noticed that no numbers appeared on Munin so I was confused. I added a config for Monit, thinking that perhaps I'd get more info. But no, everything seemed to work. I even managed to log on using the `irc` program (ircII):
    irc blog-reader SSLIRC/campaignwiki.org:6697
Munin also didn't show other info, like Apache data. Apparently that's due to me deleting the `000-default.conf` file which ensures that localhost is in fact a virtual host. Without it, there's no way to access `http://localhost/server-info` and `http://localhost/server-status`. Well, I fixed that, but still no numbers.
This is when I realized that The Lounge managed to connect to other IRC servers. It was only failing for my own server. And I started thinking.
All these symptoms pointed to me being blocked by my own firewall. And so it was!
This even explained another small setup problem where two of my sites behind Apache were being served by one service that listened on the same port but for two domains. Those requests were also getting lost. I rewrote that particular setup so that each site is now proxied directly and therefore listens on `localhost` and a port instead of a hostname and a port. Requests to a hostname and a port go through the firewall; requests to `localhost` don't.
Yikes! So much headache only because I had blocked myself.
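A check along these lines might have caught it sooner. It's a sketch that parses text in the shape of the `fail2ban-client status` output and looks for the server's own addresses in the ban list. It assumes the banned addresses follow "Banned IP list:" separated by whitespace; the addresses in the example are made up, taken from the documentation ranges.

```python
import ipaddress


def banned_ips(status_output):
    """Extract the banned addresses from fail2ban-client status output."""
    for line in status_output.splitlines():
        if "Banned IP list:" in line:
            _, _, rest = line.partition("Banned IP list:")
            return {ipaddress.ip_address(token) for token in rest.split()}
    return set()


def self_bans(status_output, own_addresses):
    """Return the server's own addresses that appear in the ban list."""
    own = {ipaddress.ip_address(address) for address in own_addresses}
    return banned_ips(status_output) & own


# Hypothetical status output and server address, for illustration only.
EXAMPLE_STATUS = """Status for the jail: alex-bots
`- Actions
   |- Currently banned: 2
   `- Banned IP list: 192.0.2.10 203.0.113.7
"""
print(self_bans(EXAMPLE_STATUS, ["203.0.113.7"]))
```

In practice the status text would come from running `fail2ban-client status` for each jail, and the list of own addresses would have to include every address the server uses to talk to itself.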