What am I doing against Community:WikiSpam, WikiVandalism, and DenialOfService attacks?
1. Most spam consists of links. The aim is to increase the target’s PageRank in search engines that count inbound links. Therefore, BannedContent lists the target URLs that are banned: you cannot link to them. Since free web hosts are often used for indirect links (the spam points to pages on free web hosts, which in turn point to the target page), many free web hosts are – unfortunately – banned as well.
2. Sometimes a page finds its way into a spam target database. A sure sign of this is that every two weeks the page is spammed with something new, usually looking very similar to the spam posted the last time. In this case, I just lock the page until there’s reason to unlock it again. By then, enough time will hopefully have passed for the page to drop out of the spam target database.
3. If vandalism comes from a certain range of IP addresses, I put a regular expression matching them on the BannedHosts page. I pick one of the edits and pass its IP number to `whois`. If I do this for 85.98.61.4, for example, I find that the network covers the numbers 85.98.48.0 to 85.98.63.255. I therefore use the regular expression `^85\.98\.(4[89]|5[0-9]|6[0-3])\.` on the BannedHosts page, matching third octets 48 through 63. Users from that network can still read the site, but they can no longer edit it.
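Before putting a new pattern on BannedHosts, it is worth checking that it matches exactly the addresses it should. Here is a minimal sanity check for the pattern from item 3, written in Python purely for illustration; it is not part of Oddmuse, and the test addresses are just examples:

    import re

    # Pattern for the 85.98.48.0 - 85.98.63.255 range (third octet 48-63).
    pattern = re.compile(r'^85\.98\.(4[89]|5[0-9]|6[0-3])\.')

    tests = {
        '85.98.48.1':   True,   # inside the range
        '85.98.61.4':   True,   # the address from the whois lookup
        '85.98.63.255': True,   # upper end of the range
        '85.98.5.10':   False,  # third octet 5, outside the range
        '85.99.50.1':   False,  # a different network
    }

    for ip, expected in tests.items():
        assert bool(pattern.match(ip)) == expected, ip
    print('pattern behaves as expected')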
You can use a cron job to automatically update BannedContent and BannedHosts. Personally, I use Oddmuse:merge-banned-lists and Oddmuse:wikiput, as follows:
    merge-banned-lists \
        http://www.emacswiki.org/cgi-bin/wiki/raw/BannedContent \
        http://localhost/cgi-bin/wiki/raw/BannedContent \
      | wikiput -u Inquisition http://localhost/cgi-bin/wiki/BannedContent
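For example, the pipeline above can go into a small shell script that cron runs once a night. A possible crontab entry – the script path and the time of day are placeholders, not taken from any real setup:

    # Merge the remote and local banned lists every night at 04:15.
    # /path/to/update-banned-lists is a placeholder for a script
    # containing the pipeline shown above.
    15 4 * * * /path/to/update-banned-lists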
There are various kinds of “leeches”: Clueful, clueless, and dangerous. (A leech is somebody who downloads a lot of pages automatically. This uses a lot of bandwidth, which is bad enough, but in the case of a wiki, it also starts a process on the server for every request. Soon enough, load on the server rises and everything slows down dramatically. This is aggravating, particularly in a shared hosting environment.)
1. Clueful leeches run into the Oddmuse SurgeProtector. Ordinary users can only view ten pages in twenty seconds. Thus, a burst of ten pages is allowed, and on average one page view every two seconds is fine, too. Requests beyond that limit get error responses. A clueful leech will notice this and stop leeching. (The basic rate-limiting idea is sketched below the list.)
2. A clueless leech will also request pages that nobody should need: search results, old revisions of pages, and so on. Broken spiders (programs that download pages) sometimes do this, because they follow every link instead of obeying the instructions embedded in Oddmuse pages. If these requests take longer than two seconds to answer, such leeches can bring the server down without ever exceeding the SurgeProtector limit. This is where a second line of defense comes in: I use the Oddmuse:Limit Number Of Instances Running extension and limit the number of processes to ten (the general idea is sketched below as well). It’s also a good protection against infinite loops in the code when you’re developing. ;)
3. If I am getting leeched so hard that it practically amounts to a denial of service attack, such that the load on the server rises significantly, I have to prevent users in that network from starting my script at all. For that I use the `.htaccess` file in my cgi-bin directory. Just as before, I pick an IP number and pass it to `whois`. If I do this for 202.108.1.2, I find that the network covers the numbers 202.108.0.0 to 202.108.255.255. I therefore use the Apache Deny directive in my `.htaccess` file with the pattern 202.108, as shown below.
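For item 3, the resulting `.htaccess` fragment looks like this (Apache 2.2-style access control; on Apache 2.4 the same effect needs mod_access_compat, or a `Require not ip 202.108` rule inside a `<RequireAll>` block):

    # Block the 202.108.0.0/16 network from reaching the script at all.
    Order Deny,Allow
    Deny from 202.108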
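The SurgeProtector behaviour described in item 1 amounts to a sliding window per client: remember when the last few page views happened and refuse a request once more than ten of them fall within twenty seconds. Here is a minimal sketch of that idea in Python; it is an illustration only, not Oddmuse’s actual Perl code:

    import time
    from collections import defaultdict, deque

    WINDOW = 20  # seconds
    LIMIT = 10   # page views allowed per client within the window

    recent = defaultdict(deque)  # client key (e.g. IP address) -> timestamps

    def allow_request(client, now=None):
        """Return True if this client may view another page right now."""
        now = time.time() if now is None else now
        times = recent[client]
        while times and now - times[0] > WINDOW:
            times.popleft()   # forget views that have left the window
        if len(times) >= LIMIT:
            return False      # surge protection: answer with an error page
        times.append(now)
        return True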
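For item 2, one common way to enforce such a limit – not necessarily how the Oddmuse:Limit Number Of Instances Running extension does it – is to make each running copy of the script grab one of a fixed number of lock files and to answer with an error once all of them are taken. A rough sketch in Python, with a made-up lock directory:

    import fcntl
    import os
    import sys

    MAX_INSTANCES = 10
    LOCK_DIR = '/tmp/wiki-locks'  # made-up location for the lock files

    def acquire_slot():
        """Grab one of MAX_INSTANCES lock files; return None if all are busy."""
        os.makedirs(LOCK_DIR, exist_ok=True)
        for i in range(MAX_INSTANCES):
            f = open(os.path.join(LOCK_DIR, 'slot-%d' % i), 'w')
            try:
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return f      # keep this open until the request is done
            except OSError:
                f.close()     # slot busy, try the next one
        return None

    slot = acquire_slot()
    if slot is None:
        print('Status: 503 Service Unavailable')
        print('Content-Type: text/plain')
        print()
        print('Too many requests right now, please try again later.')
        sys.exit(0)
    # ... handle the request; the lock is released when the process exits.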
(Please contact me if you want to remove your comment.)
⁂
Thanks for writing this. Very helpful 😄
– greywulf 2006-04-26 18:58 UTC