Wondering whether an IP is hitting you all the time?
    #!/usr/bin/perl
    use Time::ParseDate;

    # Read an access log from STDIN and collect statistics per IP.
    while (<STDIN>) {
      m/^(\S+) \S+ \S+ \[(.*?)\] ".*?" (\d+) (\d+|-)/ or die "Cannot parse:\n$_";
      $ip = $1;
      $code = $3;
      $size = $4;
      $time = parsedate($2);
      $count{$ip}++;
      $first{$ip} = $time unless $first{$ip};
      $last{$ip} = $time;
      $status{$ip} = {} unless exists $status{$ip};
      $status{$ip}{$code}++;
      $size{$ip} += $size;
      $total++;
    }

    # Report the IPs with the most hits first.
    @result = sort {$count{$b} <=> $count{$a}} keys %count;
    foreach $ip (@result) {
      # Average time between hits: time span divided by the number of gaps.
      $avg = 0;
      if ($first{$ip} and $last{$ip} and $count{$ip} > 1) {
        $avg = ($last{$ip} - $first{$ip}) / ($count{$ip} - 1);
      }
      printf ("%20s %10d %5dK %3d%% %7s %s\n",
              $ip,
              $count{$ip},
              $size{$ip} / $count{$ip} / 1024,
              100 * $count{$ip} / $total,
              $avg ? sprintf('%.1fs', $avg) : '',
              # Status codes for this IP, most frequent first.
              join(', ', map { sprintf("%3d (%d%%)", $_, 100 * $status{$ip}{$_} / $count{$ip}) }
                         sort { $status{$ip}{$b} <=> $status{$ip}{$a} }
                         keys %{$status{$ip}}));
    }
Result (edited):
    aschroeder@thinkmo:~$ cat /org/org.emacswiki/logs/access.log | leech-detector | head
          82.125.175.xxx        296     3K   6%   15.7s 304 (59%), 200 (28%), 301 (7%), 302 (2%), 404 (2%)
            82.212.8.xxx        192     2K   3%   89.5s 200 (50%), 301 (50%)
           82.52.178.xxx        166    34K   3%   98.7s 304 (36%), 200 (32%), 301 (30%), 302 (1%)
            69.17.54.xxx        145     9K   2%  120.0s 200 (100%)
           84.73.118.xxx        144    30K   2%   88.0s 200 (57%), 304 (22%), 301 (9%), 302 (9%), 404 (0%)
            148.87.1.xxx        110    28K   2%   32.5s 200 (50%), 304 (43%), 302 (3%), 301 (1%)
          83.137.100.xxx         97    71K   1%  159.7s 200 (88%), 302 (10%), 301 (1%)
         130.246.135.xxx         90    35K   1%   83.2s 200 (57%), 302 (27%), 304 (12%), 301 (2%)
           207.46.98.xxx         68     0K   1%  218.9s 403 (100%)
         221.186.146.xxx         62    39K   1%  256.5s 301 (51%), 200 (48%)
It lists **IP**, **number of hits**, **average size of response** (if known), **percentage of total hits**, **average time between hits**, and **HTTP status distribution**.
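The average time between hits is the time span between an IP's first and last hit, divided by the number of gaps between hits. That means it can be turned back into the approximate time window during which an IP was active. A minimal sketch of that arithmetic; the helper names `average_interval` and `observed_span` are hypothetical and not part of leech-detector:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average seconds between hits, following the formula used in the script:
# the span from first to last hit divided by the number of gaps.
sub average_interval {
    my ($first, $last, $count) = @_;
    return 0 if $count < 2;    # a single hit has no gaps
    return ($last - $first) / ($count - 1);
}

# The inverse: approximate span in seconds covered by all hits of an IP.
# Only approximate, because the report rounds the average to 0.1s.
sub observed_span {
    my ($avg, $count) = @_;
    return $avg * ($count - 1);
}

# First row of the sample output: 296 hits, 15.7s apart on average.
printf "%.1f minutes\n", observed_span(15.7, 296) / 60;
```

For that first row the sketch prints 77.2 minutes: 296 hits in a little over an hour, which is a lot more aggressive than 62 hits spread over more than four hours (the last row).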
Maybe I can improve this eventually to weigh results by action taken: people starting searches should count more than people doing page views, and people getting 304 NOT MODIFIED should count less than people getting 200 OK...
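One way such weighting could look is sketched here. Everything specific in it is an assumption made for illustration: the weights in `%status_weight`, the factor `$search_factor`, and the idea that a search can be recognised by a `search=` parameter in the request line. The sample log lines use documentation addresses, not real visitors.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical weights per HTTP status; the numbers are illustrative only.
my %status_weight  = (200 => 1.0, 304 => 0.25);
my $default_weight = 1.0;
# Hypothetical factor for searches; "search=" as the marker is an assumption.
my $search_factor  = 3.0;

# Weight of a single hit, given its request line and status code.
sub hit_weight {
    my ($request, $code) = @_;
    my $weight = exists $status_weight{$code}
        ? $status_weight{$code}
        : $default_weight;
    $weight *= $search_factor if $request =~ /[?;&]search=/;
    return $weight;
}

# Sum the weights per IP for a list of access log lines.
# Lines that do not parse are skipped.
sub weighted_hits {
    my @lines = @_;
    my %score;
    for my $line (@lines) {
        next unless $line =~ m/^(\S+) \S+ \S+ \[.*?\] "(.*?)" (\d+) (?:\d+|-)/;
        $score{$1} += hit_weight($2, $3);
    }
    return %score;
}

my @sample = (
    '192.0.2.1 - - [01/Jan/2006:00:00:00 +0000] "GET /wiki/SiteMap HTTP/1.1" 200 1024',
    '192.0.2.1 - - [01/Jan/2006:00:00:05 +0000] "GET /wiki/SiteMap HTTP/1.1" 304 -',
    '192.0.2.2 - - [01/Jan/2006:00:00:07 +0000] "GET /wiki?search=foo HTTP/1.1" 200 2048',
);
my %score = weighted_hits(@sample);
printf "%20s %6.2f\n", $_, $score{$_}
    for sort { $score{$b} <=> $score{$a} } keys %score;
```

With these made-up weights, the IP with a single search ends up ahead of the IP with two page views, one of which was only a 304.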
Writing it as a filter was a great idea. Interesting combinations:
When a POST matches BannedContent, the wiki answers it with a 403 FORBIDDEN status. The following script counts POST requests per IP and reports the IPs that received at least one 403:
    #!/usr/bin/perl
    use Time::ParseDate;

    # Read an access log from STDIN and collect statistics for POST requests only.
    while (<STDIN>) {
      m/^(\S+) \S+ \S+ \[(.*?)\] "([A-Z]+).*?" (\d+)/ or die "Cannot parse:\n$_";
      $type = $3;
      next unless $type eq 'POST';
      $ip = $1;
      $code = $4;
      $time = parsedate($2);
      $count{$ip}++;
      $first{$ip} = $time unless $first{$ip};
      $last{$ip} = $time;
      $status{$ip} = {} unless exists $status{$ip};
      $status{$ip}{$code}++;
      $total++;
    }

    # Report the IPs with the most POSTs first, skipping IPs without any 403.
    @result = sort {$count{$b} <=> $count{$a}} keys %count;
    foreach $ip (@result) {
      next unless $status{$ip}{403};
      $percent_of_total = 100 * $count{$ip} / $total;
      $percent_forbidden = 100 * $status{$ip}{403} / $count{$ip};
      printf "%20s %10d %3d%% %10d %4d%%\n",
        $ip, $count{$ip}, $percent_of_total, $status{$ip}{403}, $percent_forbidden;
    }
Result (edited):
    aschroeder@thinkmo:~/bin$ spam-detector < /org/org.emacswiki/logs/access.log
                                  6   2%          6  100%
                                  3   1%          3  100%
                                  3   1%          3  100%
                                  2   0%          1   50%
It lists **IP**, **number of page edits** (POST requests), **percentage of total page edits**, **how often banned content was triggered** (number of 403 Forbidden status codes), and **percentage of page edits blocked**. The higher this last number, the more likely it is that the IP belongs to a spammer.
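To turn that last column into a shortlist, a cut-off could be applied to the per-IP counts. The following is only a sketch under stated assumptions: the thresholds `$min_posts` and `$min_forbidden`, the function name `likely_spammers`, and the sample counts are all made up for illustration and are not part of spam-detector.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cut-offs; tune them against your own logs.
my $min_posts     = 2;     # ignore IPs with fewer POSTs than this
my $min_forbidden = 75;    # minimum percentage of POSTs answered with 403

# Given { ip => { posts => N, forbidden => M } }, return the suspect IPs,
# those with the most blocked POSTs first (ties broken by IP string).
sub likely_spammers {
    my ($stats) = @_;
    my @suspects = grep {
        my $s = $stats->{$_};
        $s->{posts} >= $min_posts
            and 100 * $s->{forbidden} / $s->{posts} >= $min_forbidden;
    } keys %$stats;
    return sort {
        $stats->{$b}{forbidden} <=> $stats->{$a}{forbidden} or $a cmp $b
    } @suspects;
}

# Made-up counts shaped like the spam-detector columns.
my %stats = (
    '192.0.2.10' => { posts => 6, forbidden => 6 },
    '192.0.2.11' => { posts => 3, forbidden => 3 },
    '192.0.2.12' => { posts => 2, forbidden => 1 },
);
print "$_\n" for likely_spammers(\%stats);
```

Requiring a minimum number of POSTs keeps a single blocked edit (1 of 1, which is also 100%) from looking as damning as six blocked edits in a row.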