WebServerLogs

Leech Detector

Wondering whether an IP is hitting you all the time?

#!/usr/bin/perl
# leech-detector: read an access log on STDIN and print one summary
# line per IP address, busiest first.
use strict;
use warnings;
use Time::ParseDate;

my (%count, %first, %last, %status, %size);
my $total = 0;

while (<STDIN>) {
  # Fields: ip ident user [timestamp] "request" status size
  m/^(\S+) \S+ \S+ \[(.*?)\] ".*?" (\d+) (\d+|-)/ or die "Cannot parse:\n$_";
  my ($ip, $code, $size) = ($1, $3, $4);
  my $time = parsedate($2);
  $count{$ip}++;
  $first{$ip} = $time unless $first{$ip};
  $last{$ip} = $time;
  $status{$ip}{$code}++;                    # inner hash is autovivified
  $size{$ip} += $size eq '-' ? 0 : $size;   # "-" means no size was logged
  $total++;
}

# Sort IPs by number of hits, descending.
my @result = sort { $count{$b} <=> $count{$a} } keys %count;
foreach my $ip (@result) {
  # Average seconds between consecutive hits from this IP.
  my $avg = 0;
  if ($first{$ip} and $last{$ip} and $count{$ip} > 1) {
    $avg = ($last{$ip} - $first{$ip}) / ($count{$ip} - 1);
  }
  printf("%20s %10d %5dK %3d%% %7s  %s\n",
         $ip,
         $count{$ip},
         $size{$ip} / $count{$ip} / 1024,
         100 * $count{$ip} / $total,
         $avg ? sprintf('%.1fs', $avg) : '',
         join(', ', map { sprintf("%3d (%d%%)",
                                  $_,
                                  100 * $status{$ip}{$_} / $count{$ip})
                        } sort { $status{$ip}{$b} <=> $status{$ip}{$a} } keys %{$status{$ip}}));
}

Result (edited):

aschroeder@thinkmo:~$ cat /org/org.emacswiki/logs/access.log | leech-detector | head
      82.125.175.xxx        296     3K   6%   15.7s  304 (59%), 200 (28%), 301 (7%), 302 (2%), 404 (2%)
        82.212.8.xxx        192     2K   3%   89.5s  200 (50%), 301 (50%)
       82.52.178.xxx        166    34K   3%   98.7s  304 (36%), 200 (32%), 301 (30%), 302 (1%)
        69.17.54.xxx        145     9K   2%  120.0s  200 (100%)
       84.73.118.xxx        144    30K   2%   88.0s  200 (57%), 304 (22%), 301 (9%), 302 (9%), 404 (0%)
        148.87.1.xxx        110    28K   2%   32.5s  200 (50%), 304 (43%), 302 (3%), 301 (1%)
      83.137.100.xxx         97    71K   1%  159.7s  200 (88%), 302 (10%), 301 (1%)
     130.246.135.xxx         90    35K   1%   83.2s  200 (57%), 302 (27%), 304 (12%), 301 (2%)
       207.46.98.xxx         68     0K   1%  218.9s  403 (100%)
     221.186.146.xxx         62    39K   1%  256.5s  301 (51%), 200 (48%)

It lists **IP**, **number of hits**, **average size of response** (if known), **percentage of total hits**, **average time between hits**, and **HTTP status distribution**.

Maybe I can eventually improve this to weight results by the action taken: people starting searches should count more than people viewing pages, and people getting 304 NOT MODIFIED should count less than people getting 200 OK.
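One way such a weighting might look is sketched below. This is only an illustration, not part of the leech detector: the `hit_weight` function, the weight values, and the assumption that a search shows up as a `search=` parameter in the request URL are all hypothetical and would need adjusting to the actual wiki.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical weights; none of these numbers come from the original script.
my %status_weight = (200 => 1.0, 304 => 0.2);

# Return the weight of one hit, given its request line and status code.
sub hit_weight {
  my ($request, $code) = @_;
  # Status codes without an entry count like a plain 200 in this sketch.
  my $weight = exists $status_weight{$code} ? $status_weight{$code} : 1.0;
  # Assumption: a search appears as a "search=" parameter in the URL.
  $weight *= 5 if $request =~ /[?;&]search=/;
  return $weight;
}

# Sum the weights per IP and print the heaviest users first.
my %score;
while (<STDIN>) {
  # Lines that do not look like access log entries are skipped here.
  m/^(\S+) \S+ \S+ \[.*?\] "(.*?)" (\d+)/ or next;
  $score{$1} += hit_weight($2, $3);
}
printf "%20s %10.1f\n", $_, $score{$_}
  for sort { $score{$b} <=> $score{$a} } keys %score;
```

Like the leech detector, this reads the log on STDIN, so it can be used as a filter in the same way.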

Writing it as a filter was a great idea. Interesting combinations:

(Banned) Spammer Detector

When the BannedContent test matches, the POST is answered with a 403 FORBIDDEN status. The following script counts POST requests per IP and lists every IP that received at least one 403:

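#!/usr/bin/perl
# spam-detector: read an access log on STDIN and list every IP whose
# POST requests were answered with 403 at least once.
use strict;
use warnings;
use Time::ParseDate;

my (%count, %first, %last, %status);
my $total = 0;

while (<STDIN>) {
  # Fields: ip ident user [timestamp] "METHOD ..." status
  m/^(\S+) \S+ \S+ \[(.*?)\] "([A-Z]+).*?" (\d+)/ or die "Cannot parse:\n$_";
  my ($ip, $stamp, $type, $code) = ($1, $2, $3, $4);
  next unless $type eq 'POST';   # only page edits are of interest
  my $time = parsedate($stamp);
  $count{$ip}++;
  $first{$ip} = $time unless $first{$ip};
  $last{$ip} = $time;
  $status{$ip}{$code}++;         # inner hash is autovivified
  $total++;
}

# Sort IPs by number of POSTs, descending.
my @result = sort { $count{$b} <=> $count{$a} } keys %count;
foreach my $ip (@result) {
  next unless $status{$ip}{403};
  my $percent_of_total = 100 * $count{$ip} / $total;
  my $percent_forbidden = 100 * $status{$ip}{403} / $count{$ip};
  printf "%20s %10d %3d%% %10d %4d%%\n", $ip, $count{$ip}, $percent_of_total,
    $status{$ip}{403}, $percent_forbidden;
}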

Result (edited):

aschroeder@thinkmo:~/bin$ spam-detector < /org/org.emacswiki/logs/access.log
       60.25.120.218          6   2%          6  100%
      221.197.19.228          3   1%          3  100%
      218.69.192.126          3   1%          3  100%
       222.47.24.227          2   0%          1   50%

It lists **IP**, **number of page edits** (POST requests), **percentage of total page edits**, **how often banned content was triggered** (number of 403 FORBIDDEN status codes), and **percentage of page edits blocked**. The higher this last number, the more likely it is that the IP belongs to a spammer.
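To turn that last column into a yes/no answer, the output could be piped through a small filter such as the following sketch. The function name `looks_like_spammer` and both cut-off values are hypothetical; they are assumptions for illustration and not something the original script defines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cut-offs; tune them against your own logs.
my $MIN_FORBIDDEN = 2;    # ignore IPs that were blocked only once
my $MIN_PERCENT   = 75;   # share of POSTs that must have been blocked

# Decide from the POST count and the 403 count whether an IP is suspicious.
sub looks_like_spammer {
  my ($posts, $forbidden) = @_;
  return 0 unless $posts and $forbidden >= $MIN_FORBIDDEN;
  return 100 * $forbidden / $posts >= $MIN_PERCENT ? 1 : 0;
}

# Read spam-detector output on STDIN: IP, POSTs, percent, 403s, percent.
while (<STDIN>) {
  my ($ip, $posts, undef, $forbidden) = split;
  next unless defined $forbidden;   # skip blank or short lines
  print "$ip\n" if looks_like_spammer($posts, $forbidden);
}
```

With these cut-offs, an IP with 6 POSTs that were all blocked is reported, while an IP with 2 POSTs of which 1 was blocked is not.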