2005-12-17 Wikis Spam

Today some spam slipped through on EmacsWiki:RecentChanges. The list below shows, for each day, how many edits each listed IP number made, what percentage of all edits that day those comprised, how many of them were rejected, and what percentage of that IP number’s edits were rejected.

Day 0 has only just started, so the total number of edits is still very small, which explains the large 28%. Other than that, I think the numbers show that no huge barrage of WikiSpam is hitting EmacsWiki.

day               IP      Edits  %Total  Rejects %Edits
0
      218.18.168.242         19  28%          8   42%
1
        218.73.40.82          7   2%          2   28%
       85.100.24.112          1   0%          1  100%
2
       85.100.30.196          1   0%          1  100%
        218.76.12.67          1   0%          1  100%
3
4
       61.49.133.236          3   1%          1   33%
       218.80.158.89          3   1%          3  100%
5
6
7
8
9
10
11
       220.168.99.47          1   0%          1  100%
12
13
       220.168.99.47          1   0%          1  100%
14
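
For illustration, the per-IP figures in a table like the one above could be computed roughly as follows. This is only a sketch, not the actual script. It assumes an Apache combined-format access log, that every edit is a POST request, and that a rejected edit is answered with status 403. The names `spam_report` and `format_row` are made up here. Percentages are truncated rather than rounded, which is consistent with the 28% shown for 2 rejects out of 7 edits, and only IP numbers with at least one rejected edit are listed, as appears to be the case in the table.

```python
import re
from collections import Counter

# Assumed Apache "combined" log line; only IP, method and status are used.
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) [^"]*" (\d{3}) ')


def spam_report(lines):
    """Return (ip, edits, %total, rejects, %edits) for IPs with rejected edits."""
    edits, rejects = Counter(), Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue                  # skip lines that do not parse
        ip, method, status = m.groups()
        if method != "POST":          # assumption: every edit is a POST
            continue
        edits[ip] += 1
        if status == "403":           # assumption: rejected edits get 403
            rejects[ip] += 1
    total = sum(edits.values())
    rows = []
    for ip, n in edits.most_common():
        r = rejects[ip]
        if r == 0:
            continue                  # only report IPs with a rejected edit
        # Integer division truncates the percentages.
        rows.append((ip, n, n * 100 // total, r, r * 100 // n))
    return rows


def format_row(row):
    """Lay out one row in the column widths used by the table."""
    return "%20s %10d %3d%% %10d %4d%%" % row
```

Feeding it one day's log, for example `spam_report(open("access.log"))`, and printing each row with `format_row` would give one block of the table.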

Why am I mentioning this at all? The EmacsWiki:BannedContent list is getting too big. If somebody like DrewAdams updates more than 80 source files on EmacsWiki in a single day, the Oddmuse:Despam Action crashes. The script is killed before it can finish, leaving lockfiles behind, etc.

I tried it with a much shorter list of banned regular expressions and it worked like a charm. I therefore think I will do the following:

1. Automatically expire older regular expressions (e.g. after a year)

2. Change the format to the SharedAntiSpam format in order to add dates
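
The expiry in step 1 could look something like this sketch. It assumes a hypothetical line format of `regexp # YYYY-MM-DD optional comment`; the real SharedAntiSpam format may differ, and `expire` is an invented name. Lines without a parseable date are kept, so undated entries never expire by accident.

```python
import re
from datetime import date, timedelta

# Hypothetical format: "<regexp> # <YYYY-MM-DD> <optional comment>"
DATED = re.compile(r'#\s*(\d{4})-(\d{2})-(\d{2})\b')


def expire(lines, today, max_age_days=365):
    """Drop entries whose date is more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    kept = []
    for line in lines:
        m = DATED.search(line)
        if m:
            try:
                added = date(*map(int, m.groups()))
            except ValueError:
                added = None          # impossible date: treat as undated
            if added is not None and added < cutoff:
                continue              # older than the cutoff, so expired
        kept.append(line)             # undated or recent entries stay
    return kept
```

Called as `expire(lines, date.today())`, this returns the shortened list, which can then be written back to the page.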

​#Wikis ​#Spam

Comments

It would be good to keep a timestamp when a blacklist regexp is added, but even more useful would be if the engine kept a timestamp of when a regexp last caught some spam. Should be possible hey?

I mentioned this and some other ideas for a more advanced blacklist format in the discussion here: http://wiki.chongqed.org//ContentBanning. Taken to extremes, you can end up trying to devise DNS style trust mechanisms, but maybe that’s a bit ambitious.

– Halz 2006-01-27 13:40 UTC

---

That is true. However, when I look at the numbers, I don’t think it is worth my time – yet. See WebServerLogs for the script.

                  IP        requests      edit denied

aschroeder@thinkmo:~$ spam-detector < /org/org.emacswiki/logs/access.log
aschroeder@thinkmo:~$ spam-detector < /org/org.emacswiki/logs/access.log.1
       209.22.11.124          1   0%          1  100%
aschroeder@thinkmo:~$ for n in 2 3 4 5 6 7 8 9 10 11 12 13 14; do f=access.log.$n.gz; echo $f; zcat /org/org.emacswiki/logs/$f | spam-detector; done
access.log.2.gz
access.log.3.gz
       61.82.152.224          1   0%          1  100%
access.log.4.gz
          80.58.5.46          1   1%          1  100%
access.log.5.gz
access.log.6.gz
access.log.7.gz
       84.73.213.191         19  14%          3   15%
         85.185.3.21          1   0%          1  100%
      219.254.42.113          1   0%          1  100%
access.log.8.gz
      222.240.20.233          2   2%          2  100%
      81.213.170.190          1   1%          1  100%
access.log.9.gz
      222.240.20.233          2   2%          2  100%
access.log.10.gz
      222.240.20.233          3   2%          3  100%
      144.132.244.81          1   0%          1  100%
access.log.11.gz
      216.114.169.72          7   5%          1   14%
      222.240.20.233          2   1%          2  100%
       62.64.141.202          1   0%          1  100%
access.log.12.gz
       222.240.21.96          2   2%          2  100%
      220.169.26.233          1   1%          1  100%
access.log.13.gz
       222.240.21.96          3   5%          3  100%
       195.58.242.97          1   1%          1  100%
access.log.14.gz
       222.240.21.96          1   2%          1  100%

I agree that it would make sense to list the reason an edit was denied. Was it the IP number? Was it a regular expression? Which one?
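
A minimal sketch of what recording the reason might involve, combined with the "last caught some spam" timestamp suggested in the comment above. Nothing here is Oddmuse code: the `Blacklist` class and its methods are hypothetical, and only the regular-expression case is covered, not banned IP numbers.

```python
import re
from datetime import datetime, timezone


class Blacklist:
    """Hypothetical blacklist that reports which pattern matched and when."""

    def __init__(self, patterns):
        # Keep the pattern text next to the compiled regexp for reporting.
        self.rules = [(p, re.compile(p)) for p in patterns]
        self.last_hit = {}            # pattern text -> time of latest match

    def check(self, text, now=None):
        """Return the text of the first matching pattern, or None."""
        now = now or datetime.now(timezone.utc)
        for source, rx in self.rules:
            if rx.search(text):
                self.last_hit[source] = now
                return source         # the reason the edit is denied
        return None

    def unused(self):
        """Patterns that have not matched anything since loading."""
        return [s for s, _ in self.rules if s not in self.last_hit]
```

The value returned by `check` could be written to the log next to the denied edit, and `unused` gives candidates for expiry once the list has been in use for a while.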

As the numbers are so small, I figure that for my sites a simple expiry mechanism will be enough.

– Alex Schroeder 2006-01-29 13:06 UTC
