2015-03-10 Fighting Wiki Spam

I recently wrote something about fighting Emacs Wiki spam on Google+.


Wikis

Lauren said “I’m hoping we can stay focused on *practical processes* and solutions *that are workable at very large scales*.” This post is about a wiki that is popular but not as popular as Wikipedia.


I’ve been running Emacs Wiki for a while now. It’s definitely not “at very large scales” but it was certainly a step up from all my previous wiki efforts. I got started back on the Portland Pattern Repository and Meatball Wiki and so I kept trying to live up to Soft Security. That’s why Emacs Wiki still doesn’t have logins and passwords for ordinary users. There are passwords for roles that allow you to lock pages and to edit locked pages, for example.


Rollback

The first line of defense I added was rolling back edits. The first wiki I used allowed you to edit an old revision and save it (click the history link, click a good revision, scroll to the bottom, click the edit link, scroll to the bottom, click the save button). I wanted to speed up the cleanup. Now you click the history link and then the rollback button.

The remaining features are reserved for administrators.

→ Wiki Spam


Banning URLs by regexp

We are mostly getting link spam. Therefore I soon introduced a list of regular expressions matching banned URLs (kept on a locked page that I can edit together with my co-administrators). To speed up adding entries, I added some code to rollbacks. If the rollback removed URLs from the page text, those are listed and you automatically get a form where you can write a regular expression based on the list you’re seeing. Clicking the submit button adds this regular expression to the ban list.

→ Banned Content
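As a rough illustration of the idea (not the wiki's actual code), here is a minimal Python sketch. The function names and the URL pattern are assumptions made for this example: one function lists the URLs a rollback removed, the other checks new text against the ban list.

```python
import re

# Simplistic URL matcher, an assumption for this sketch; real wiki markup needs more care.
URL_PATTERN = re.compile(r'https?://[^\s<>"\]]+')


def removed_urls(spammed_text, restored_text):
    """Return URLs present in the spammed revision but gone after the rollback."""
    before = set(URL_PATTERN.findall(spammed_text))
    after = set(URL_PATTERN.findall(restored_text))
    return sorted(before - after)


def is_banned(text, banned_patterns):
    """Return the first banned pattern that matches a URL in text, or None."""
    for url in URL_PATTERN.findall(text):
        for pattern in banned_patterns:
            if re.search(pattern, url):
                return pattern
    return None
```

The list returned by `removed_urls` is what an administrator would look at while writing a new regular expression, and `is_banned` is the check that would run before saving an edit.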


Banning IP numbers

This is a very crude measure. Luckily enough, it still works often enough. Perfect for defense in depth. I added more code to rollbacks. After a rollback, administrators are presented with a link to “ban contributors”. If you click on it, you’re presented with the editors of recent page revisions, together with a note indicating whether they have been banned or not. You can check the IP numbers of the contributors not yet banned and click a button to add an appropriate regular expression to the list.

→ Banned Hosts
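Turning a checked IP number into an "appropriate regular expression" mostly means escaping the dots and anchoring the pattern. A minimal Python sketch, with hypothetical function names and not taken from the wiki's code:

```python
import re


def ip_to_ban_regexp(ip):
    """Build an anchored regexp that matches exactly this IP number."""
    return "^" + re.escape(ip) + "$"


def unbanned_contributors(contributors, banned_patterns):
    """Return contributor IPs not yet matched by any ban, without duplicates."""
    result = []
    for ip in contributors:
        if ip in result:
            continue
        if not any(re.search(pattern, ip) for pattern in banned_patterns):
            result.append(ip)
    return result
```

Without the escaping and the anchors, a ban for `192.0.2.7` would also match `192.0.2.70`, which is worth avoiding on a list that is applied automatically.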


Banning key words

This is also a very crude measure. It’s our last automatic defense. We’ve added a few regular expressions to this list such as the Russian word for porn because we were getting vandals that posted links to forums and the like, together with some keywords, and those forum posts would then contain the link to the material we wanted to ban. The indirection via forum, URL shortener and the like circumvented our earlier ban mechanisms. This was our solution.

→ Banned Regexps


Mass Rollback

If we’re under a large scale attack, we can always lock the wiki and wait. Once the damage is done, however, we can reset the wiki to a particular edit, generating the appropriate rollbacks for every page (i.e. these rollbacks are all regular edits and can again be undone by other administrators).
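The mass rollback can be pictured as follows. This is only a sketch in Python under stated assumptions: the `Revision` type, the in-memory `pages` dictionary and the handling of pages created after the target time (they are blanked here) are all invented for illustration and are not how the wiki stores its data.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    timestamp: int
    text: str
    summary: str = ""


def mass_rollback(pages, target_time, now):
    """Append a rollback edit to every page changed after target_time.

    pages maps a page name to its list of revisions, oldest first.
    Returns the sorted names of the pages that were rolled back.
    """
    rolled_back = []
    for name, history in pages.items():
        if not history or history[-1].timestamp <= target_time:
            continue  # untouched since the target time
        good = [rev for rev in history if rev.timestamp <= target_time]
        # Assumption: a page that did not exist at target_time is blanked.
        text = good[-1].text if good else ""
        history.append(Revision(now, text, f"Rollback to {target_time}"))
        rolled_back.append(name)
    return sorted(rolled_back)
```

Because each rollback is appended as a new revision rather than rewriting history, another administrator can undo it like any other edit.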

There are also other defense mechanisms unrelated to banning.

CAPTCHA

We only ask for a CAPTCHA once. Answering the question sets a cookie that bypasses the CAPTCHA. Clearly, this only works for a low profile site.

→ reCaptcha Extension
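The "ask once, then set a cookie" behaviour could look roughly like this in Python. The cookie name `captcha_ok` and the one-year lifetime are assumptions for this sketch, not the extension's actual values.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "captcha_ok"  # hypothetical name for this sketch


def needs_captcha(cookie_header):
    """Return True if the request carries no proof of an earlier answer."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    return COOKIE_NAME not in cookie


def captcha_passed_header():
    """Return a Set-Cookie value to send after a correct answer."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = "1"
    cookie[COOKIE_NAME]["max-age"] = 365 * 24 * 60 * 60  # assumed lifetime
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie.output(header="").strip()
```

A cookie like this is trivial to forge, which is exactly why the approach only suits a low profile site.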


Surge Protection

If you load too many pages in a short time window, the wiki will start responding with error messages, assuming that you’re a bot and not a human.

→ Surge Protection
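One common way to implement this kind of protection is a sliding window per client. The Python sketch below uses that technique; the class name and the limits (10 requests in 20 seconds) are illustrative assumptions, not the wiki's actual settings or code.

```python
from collections import defaultdict, deque


class SurgeProtection:
    """Reject clients that request too many pages in a short time window."""

    def __init__(self, max_requests=10, window_seconds=20):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.visits = defaultdict(deque)  # client -> timestamps of recent requests

    def allow(self, client, now):
        """Record a request at time now; return False if it should be refused."""
        recent = self.visits[client]
        # Forget requests that have fallen out of the window.
        while recent and recent[0] <= now - self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

A request handler would call `allow(ip, time.time())` first and answer with an error page whenever it returns `False`.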


Missing features

There is currently no way to automatically ban the entire IP range given an IP number.

There is currently a semi-automatic expiry process for bans. It would be better if this were automatic. These days I have to run a whois query, type the two IP numbers into my little Python program, and type the result into my `.htaccess` file.
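The post does not show that Python program. Assuming the two IP numbers are the first and last address of the range reported by whois, a program of that kind could be sketched with the standard `ipaddress` module. The function names and the `Deny from` directive (Apache 2.2 style) are assumptions for this example.

```python
import ipaddress


def range_to_cidrs(first_ip, last_ip):
    """Summarize an inclusive IP range as a list of CIDR blocks."""
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def htaccess_lines(first_ip, last_ip):
    """Format the range as lines for .htaccess (assumed directive syntax)."""
    return [f"Deny from {cidr}" for cidr in range_to_cidrs(first_ip, last_ip)]
```

A range that is not aligned to a single network comes out as several blocks, so the result may be more than one line.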

There used to be a ban network sharing those regular expression lists but it was brittle and so I abandoned it.

#Wikis #Spam

Comments

(Please contact me if you want to remove your comment.)

Great post.

There is one thing about Soft Security that makes me feel worried. What if some insane person wants to terrorize the wiki? That is, a directional attack, without links, without keywords, from different IPs. And all that stuff is mixed with minor edits, impersonation and malicious rollbacks. The only way to undo this mess is to restore the content from a backup (which is rather easy to do with the git extension I use, but it is still a pain).


The only thing that prevents this is the lack of motivation (which is somewhat associated with the absence of competition), but I’m not sure if that applies to all other wikis out there.

– AlexDaniel 2015-03-10 23:55 UTC


---

Here are my disorganized thoughts on the limits of Soft Security.

I think we can talk about two different kinds of attacks: a long term infiltration under the radar and a massive attack on multiple levels. Both attacks need to be stopped by fish-bowling the wiki: *making it read-only*.

In order to detect long term infiltration, you need *constant peer review*. I myself have bookmarks for my wikis, obviously. They all look like this: `...wiki?action=rc;showedit=1;days=3;rollback=1;lang=;css=` or similar. I want to see rollbacks and minor edits! This is important.

Clearly, *Soft Security does not work as well if the community is small*, visits rarely, or if there are long periods of inactivity.

In order to defend against sudden onslaughts, the ability to lock the wiki needs to be available to a lot of people that are constantly watching. You also need the ability to mass revert the vandalism. If your site does not provide this, restoring from backup will be necessary.

1. Do you know how to lock the wiki? It must also be easy to unlock—easy enough for your friends to do it, hard enough to prevent the vandals from doing it.

2. Oddmuse has the ability to mass revert vandalism, but only up to the “keep days” window (14 days by default). This time window is important: Do you check at least once in this window? Including holidays and business trips?

3. Do you have backups from various points in time?


In a way, for small communities using Soft Security, *defense in depth* now means *defense in time*.

For Campaign Wiki, for example, there is a way for people to lock and unlock a wiki. The setting is easy enough to find (I hope) but hidden well enough to elude casual spammers.


– Alex Schroeder 2015-03-11 09:47 UTC