Today there was some uncaught spam on EmacsWiki:RecentChanges. If you look at the list below, you’ll see how many edits a certain IP number made, what percentage of all edits those comprised, how many edits were rejected, and what percentage of the edits made were rejected.
Day 0 has only just started, so the total number of edits is still very small, which explains the large 28%. Other than that, however, I think we can see that there is no huge barrage of WikiSpam hitting EmacsWiki.
    day  IP               Edits  %Total  Rejects  %Edits
      0  218.18.168.242      19     28%        8     42%
      1  218.73.40.82         7      2%        2     28%
         85.100.24.112        1      0%        1    100%
      2  85.100.30.196        1      0%        1    100%
         218.76.12.67         1      0%        1    100%
      3
      4  61.49.133.236        3      1%        1     33%
         218.80.158.89        3      1%        3    100%
      5
      6
      7
      8
      9
     10
     11  220.168.99.47        1      0%        1    100%
     12
     13  220.168.99.47        1      0%        1    100%
     14
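As a rough sketch of how the four numbers per IP relate to each other (this is not the actual script that produced the table): assume the edit attempts for one day are already available as `(ip, rejected)` pairs. The function name `reject_stats` and the input shape are made up for illustration. The percentages are truncated rather than rounded, because that is what the table seems to do (2 rejects out of 7 edits shows up as 28%), and only IPs with at least one rejected edit are listed, which also appears to match the table.

```python
from collections import Counter

def reject_stats(edits):
    """edits: iterable of (ip, rejected) pairs, one per edit attempt.

    Returns rows (ip, edits, pct_total, rejects, pct_edits) for every IP
    with at least one rejected edit. Percentages are truncated integers.
    """
    edits = list(edits)
    total = len(edits)
    per_ip = Counter(ip for ip, _ in edits)
    rejected = Counter(ip for ip, was_rejected in edits if was_rejected)
    rows = []
    for ip, n_rejects in rejected.items():
        n_edits = per_ip[ip]
        rows.append((ip,
                     n_edits,
                     100 * n_edits // total,      # share of all edits that day
                     n_rejects,
                     100 * n_rejects // n_edits))  # share of this IP's edits rejected
    rows.sort(key=lambda row: row[1], reverse=True)  # most active IP first
    return rows

# Illustrative data only (documentation IP addresses, not from the logs):
sample = ([("192.0.2.1", True)] * 2 + [("192.0.2.1", False)] * 5
          + [("192.0.2.2", False)] * 3)
print(reject_stats(sample))  # [('192.0.2.1', 7, 70, 2, 28)]
```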
Why am I mentioning this at all? The EmacsWiki:BannedContent list is getting too big. If somebody like DrewAdams updates more than 80 source files on EmacsWiki on a single day, then the Oddmuse:Despam Action crashes. The script is killed without being able to finish, leaving lockfiles behind, etc.
I tried it with a much shorter list of banned regular expressions and it worked like a charm. I therefore think I will do the following:
1. Automatically expire older regular expressions (e.g. after a year)
2. Change the format to the SharedAntiSpam format in order to add dates
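
The two steps above could look roughly like the following sketch. It assumes a hypothetical one-entry-per-line format, `regexp # YYYY-MM-DD optional comment`; this is only an illustration and has not been checked against the actual SharedAntiSpam format. The name `expire` and the one-year default are likewise assumptions. Lines without a recognisable date are kept, so nothing disappears just because it was never dated.

```python
import re
from datetime import date, timedelta

# Hypothetical entry format: "<regexp> # <YYYY-MM-DD> <optional comment>"
ENTRY = re.compile(r"^(?P<pattern>\S+)\s+#\s+(?P<added>\d{4}-\d{2}-\d{2})\b")

def expire(lines, today, max_age=timedelta(days=365)):
    """Return the lines whose date is at most max_age old.

    Undated lines and lines with an invalid date are kept unchanged.
    """
    kept = []
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            kept.append(line)  # no date: never expires automatically
            continue
        try:
            added = date.fromisoformat(match.group("added"))
        except ValueError:
            kept.append(line)  # looks like a date but is not a valid one
            continue
        if today - added <= max_age:
            kept.append(line)
    return kept
```

Keeping undated lines is a conservative choice; dropping them instead would shrink the list faster, at the risk of losing expressions that are still catching spam.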
#Wikis #Spam
⁂
It would be good to keep a timestamp when a blacklist regexp is added, but even more useful would be if the engine kept a timestamp of when a regexp last caught some spam. Should be possible hey?
In the discussion at http://wiki.chongqed.org//ContentBanning I mentioned this and some other ideas for a more advanced blacklist format. Taken to extremes, you can end up trying to devise DNS-style trust mechanisms, but maybe that's a bit ambitious.
– Halz 2006-01-27 13:40 UTC
---
That is true. However, when I look at the numbers, I don’t think it is worth my time – yet. See WebServerLogs for the script.
    IP requests edit denied
    aschroeder@thinkmo:~$ spam-detector < /org/org.emacswiki/logs/access.log
    aschroeder@thinkmo:~$ spam-detector < /org/org.emacswiki/logs/access.log.1
    209.22.11.124 1 0% 1 100%
    aschroeder@thinkmo:~$ for n in 2 3 4 5 6 7 8 9 10 11 12 13 14; do f=access.log.$n.gz; echo $f; zcat /org/org.emacswiki/logs/$f | spam-detector; done
    access.log.2.gz
    access.log.3.gz
    61.82.152.224 1 0% 1 100%
    access.log.4.gz
    80.58.5.46 1 1% 1 100%
    access.log.5.gz
    access.log.6.gz
    access.log.7.gz
    84.73.213.191 19 14% 3 15%
    85.185.3.21 1 0% 1 100%
    219.254.42.113 1 0% 1 100%
    access.log.8.gz
    222.240.20.233 2 2% 2 100%
    81.213.170.190 1 1% 1 100%
    access.log.9.gz
    222.240.20.233 2 2% 2 100%
    access.log.10.gz
    222.240.20.233 3 2% 3 100%
    144.132.244.81 1 0% 1 100%
    access.log.11.gz
    216.114.169.72 7 5% 1 14%
    222.240.20.233 2 1% 2 100%
    62.64.141.202 1 0% 1 100%
    access.log.12.gz
    222.240.21.96 2 2% 2 100%
    220.169.26.233 1 1% 1 100%
    access.log.13.gz
    222.240.21.96 3 5% 3 100%
    195.58.242.97 1 1% 1 100%
    access.log.14.gz
    222.240.21.96 1 2% 1 100%
I agree that it would make sense to list the reason for the edit denied. Was it the IP number? Was it a regular expression? Which one?
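
To make those three questions concrete, here is a minimal sketch of a check that returns the reason along with the verdict. It is not how Oddmuse does it; the function `denial_reason` and its parameters are invented for illustration, and the banned patterns are assumed to be ordinary regular expressions.

```python
import re

def denial_reason(text, ip, banned_ips, banned_patterns):
    """Return a short reason if the edit should be denied, else None."""
    if ip in banned_ips:
        return f"banned IP {ip}"
    for pattern in banned_patterns:
        if re.search(pattern, text):
            # Report which expression matched, so it can be logged.
            return f"banned content matching {pattern}"
    return None
```

Logging the returned string next to each rejected edit would answer "which one?" directly, and would also show which expressions still catch anything.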
As the numbers are so small, I figure that for my sites a simple expiry mechanism will be enough.
– Alex Schroeder 2006-01-29 13:06 UTC