2022-04-03 Where does wiki spam come from?

It’s early Sunday afternoon and we’re reading social media. My wife tells me all the things I did not know about Newt Scamander because a new movie is out. If only J. K. Rowling weren’t such a Trans-Exclusionary Radical Feminist (TERF). Also, why wasn’t Hermione the main character? Anyway…

TERF

For a while now I’ve been handling wiki spam by reverting changes and banning the entire IP range the spammer comes from, based on the assumption that almost all of these ranges belong to commercial hosting providers. That is, the chance of blocking actual people is small (they would be on residential ranges). Also, I’m lucky and all the spammers seem to be using IPv4, which makes it easy for me to look up their IP range with whois. Now I want to confirm or refute my prejudice that Russia and Ukraine are the lead spammers (and no longer China).
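For what it’s worth, here is a minimal shell sketch of that range lookup. It assumes the whois output names the range in an inetnum field (as RIPE and APNIC do) or a NetRange field (as ARIN does). The function name whois_range is made up for illustration, and other registries may format things differently.

```shell
# whois_range: hypothetical helper that reads whois output on stdin
# and prints the first address range it finds.
# Assumption: the range sits in an "inetnum:" or "NetRange:" field.
whois_range() {
  awk -F': *' 'tolower($1) ~ /^(inetnum|netrange)$/ { print $2; exit }'
}
```

Something like `whois 192.0.2.1 | whois_range` would then print the range to ban, if the registry answers in one of these two formats.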

Trying to figure it out: going through the ranges on my blocklist (see BannedHosts).

BannedHosts

First, get the raw data:

wget https://campaignwiki.org/wiki/raw/BannedHosts

Then I call whois on the upper end of each range, look for any key matching “country”, and save the result in a file.

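#! /usr/bin/env perl
use Modern::Perl;
use Net::Whois::IP qw(whoisip_query);
while(<STDIN>) {
  # skip lines without a range; capture the address after the dash (the upper end)
  next unless /- (\S*)\]/;
  my $ip = $1;
  # whoisip_query returns a hash reference of whois fields
  my $response = whoisip_query($ip);
  foreach (sort keys(%{$response}) ) {
    # print every field whose name contains "country", ignoring case
    next unless $_ =~ /country/i;
    say "$_ $response->{$_}";
  }
}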

Save the countries:

./countries.pl < BannedHosts > countries

Sort, count, sort:

sort < countries | sed -e s/country\ //i | uniq -c | sort -gr | head
     23 RU
     14 VN
     12 NL
      9 US
      6 UA
      4 PL
      4 LT
      3 SE
      3 GB
      3 FR

Wiki spammers like hosting in Russia > Vietnam > Netherlands > USA > Ukraine, apparently. Hm. My prejudice partly disproven, I guess.

Also note this presumes that every IP range hosts one spammer. At the bottom end I think this is true. If I don’t block the IP range, I’ll often see the IP spam more pages, or repeatedly spam the same page over the next few days, with very similar spam.

At the upper end, I’m not so sure. There was a surge a while back where I suspect that XRumer added support for Oddmuse. If you’re a developer for this kind of software, I wish you the worst of luck. Anyway, multiple bans on my end could all be the same spammer using multiple networks.

Common defensive actions by webmasters are to institute IP-based posting bans on subnetworks used by the spammers. – XRumer

XRumer

Looking back, I reset the list on 2020-05-26, so many of these bans might be from before the invasion of Ukraine by Russia. In any case, the last few additions since March 2022:

Looks like Vietnam, the Netherlands, and the USA have been losing ground! 😆

@dentangle told me how georbl.info allows country lookup directly via DNS: reverse the octets of the IP address, append country.georbl.info, and ask for the TXT record. This is how to find that my server is in Switzerland:

@dentangle

dig -t txt 237.50.209.178.country.georbl.info | grep "^[^;]"
237.50.209.178.country.georbl.info. 86362 IN TXT "CH"
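The query name is just the IPv4 address with its octets reversed, followed by the zone. A minimal shell sketch, assuming IPv4 only; the function names georbl_name and georbl_country are made up for illustration, and the second one needs dig and network access:

```shell
# georbl_name: hypothetical helper that builds the georbl.info query name
# for an IPv4 address by reversing its four octets.
georbl_name() {
  printf '%s\n' "$1" |
    awk -F. '{ print $4 "." $3 "." $2 "." $1 ".country.georbl.info" }'
}

# georbl_country: hypothetical helper that asks for the TXT record and
# strips the quotes dig prints around the answer.
georbl_country() {
  dig +short -t txt "$(georbl_name "$1")" | tr -d '"'
}
```

With these, `georbl_name 178.209.50.237` produces the name used in the dig call above.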

So rewriting that Perl script:

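#! /usr/bin/env perl
use Modern::Perl;
use Net::DNS;
while(<STDIN>) {
  # skip lines without a range; capture the address after the dash (the upper end)
  next unless /- (\S*)\]/;
  my $ip = $1;
  # reverse the octets and append the georbl.info zone
  my $reverse = join(".", reverse(split(/\./, $ip))) . ".country.georbl.info";
  # rr() comes from Net::DNS; print the text of every TXT record found
  say join(", ", map { $_->txtdata } grep { $_->type eq "TXT" } rr($reverse, "TXT"));
}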

Let’s do it again:

wget https://campaignwiki.org/wiki/raw/BannedHosts
./countries.pl < BannedHosts > countries
sort < countries | uniq -c | sort -gr | head
     23 RU
     14 VN
     10 NL
      9 US
      8 GB
      7 SE
      6 UA
      4 RO
      4 LT
      4 DE

Wow. The lists are different!
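To see exactly where two tallies disagree, something like the following sketch might help. It assumes both files hold full `uniq -c` output (a count, then a country code), ideally without the trailing `| head`, since a country that merely dropped out of the top ten would otherwise show up with a zero. The function name tally_diff and the file names are made up for illustration.

```shell
# tally_diff: hypothetical helper that compares two "count country" tallies
# and prints "country first-count second-count" for every country whose
# counts differ. A country missing from one file is reported with count 0.
tally_diff() {
  awk '
    NR == FNR { a[$2] = $1; next }   # first file: remember count per country
    { b[$2] = $1 }                   # second file: same
    END {
      for (c in a) {
        n = (c in b) ? b[c] : 0
        if (a[c] != n) print c, a[c], n
      }
      for (c in b) if (!(c in a)) print c, 0, b[c]
    }
  ' "$1" "$2" | LC_ALL=C sort
}
```

Usage would be along the lines of `tally_diff tally-whois tally-georbl`, with each file produced by the `sort | uniq -c` pipeline of the corresponding run.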

@dentangle says:

@dentangle

The data in georbl.info is sourced directly from the regional registries (RIPE etc.), and reflects what the ASN has declared that IP to be. WHOIS should, in theory, be the same but whois data seems to be getting less reliable over time as various orgs mess with it and try to redirect you to web interfaces etc.

Sounds plausible.

​#Spam