2019-07-09 Web Requests

I have dozens of little scripts, bash functions and aliases...

This one, for example, shows me the distribution of HTTP status codes Apache is returning, grouped into time buckets. The table explains the codes that show up in the output further down:

+------+--------------------------------+
| code |            message             |
+------+--------------------------------+
|  200 | OK, page was served            |
|  301 | redirect (permanent, this      |
|      | should go down)                |
|  302 | redirect (temporary, this      |
|      | might change in the future)    |
|  304 | not modified (no need to serve |
|      | page, this is the best)        |
|  400 | bad request                    |
|  403 | forbidden (this is a suspected |
|      | bot or spammer)                |
|  404 | not found (should be changed   |
|      | into a redirect)               |
|  408 | timeout (this is a problem)    |
|  502 | bad gateway (this is a         |
|      | problem)                       |
|  503 | service unavailable            |
+------+--------------------------------+
root@sibirocobombus:~# time-grouping
   09/Jul/2019:12:40        524     5%  403 (41%), 200 (40%), 301 (7%), 404 (4%), 304 (3%), 408 (1%), 302 (0%), 400 (0%)
   09/Jul/2019:12:30        677     6%  200 (69%), 301 (14%), 408 (4%), 404 (4%), 304 (3%), 403 (2%), 302 (0%)
   09/Jul/2019:12:20        628     6%  200 (69%), 301 (10%), 304 (5%), 408 (4%), 302 (3%), 404 (3%), 403 (2%)
   09/Jul/2019:12:10        860     8%  200 (57%), 403 (17%), 301 (13%), 304 (3%), 408 (3%), 404 (3%), 400 (0%), 302 (0%)
   09/Jul/2019:12:00        936     9%  403 (42%), 200 (40%), 301 (8%), 408 (3%), 304 (2%), 404 (1%), 302 (0%)
   09/Jul/2019:11:50        578     5%  200 (69%), 301 (12%), 304 (7%), 408 (3%), 403 (2%), 302 (1%), 404 (1%)
   09/Jul/2019:11:40        705     7%  200 (65%), 301 (10%), 503 (9%), 304 (4%), 408 (3%), 404 (2%), 403 (2%), 502 (1%), 302 (0%)
   09/Jul/2019:11:30        719     7%  200 (72%), 301 (9%), 304 (6%), 408 (5%), 404 (3%), 403 (1%), 302 (0%)
   09/Jul/2019:11:20        740     7%  200 (71%), 301 (10%), 304 (6%), 403 (4%), 408 (4%), 404 (1%), 302 (0%)
   09/Jul/2019:11:10        673     6%  200 (68%), 301 (13%), 408 (6%), 404 (4%), 304 (3%), 403 (2%), 302 (0%)
   09/Jul/2019:11:00        694     6%  200 (70%), 301 (12%), 304 (6%), 408 (4%), 404 (3%), 403 (1%), 302 (0%), 502 (0%)
   09/Jul/2019:10:50        580     5%  200 (72%), 301 (14%), 304 (5%), 404 (3%), 408 (2%), 403 (1%), 302 (0%), 400 (0%)
   09/Jul/2019:10:40        707     7%  200 (76%), 301 (9%), 304 (4%), 408 (3%), 404 (3%), 403 (1%), 302 (1%)
   09/Jul/2019:10:30        805     8%  200 (71%), 301 (14%), 408 (4%), 403 (3%), 404 (2%), 304 (2%), 302 (0%)
   09/Jul/2019:10:20        174     1%  200 (70%), 301 (12%), 304 (6%), 408 (6%), 404 (1%), 403 (1%), 302 (0%)

`time-grouping` is an alias for:

tail -n 10000 /var/log/apache2/access.log | /home/alex/bin/time-grouping 10
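
The same pipeline can also be sketched as a shell function, so that the number of log lines becomes an argument. This is only a sketch under assumptions: the function name `tg` and the override variables `TIME_GROUPING_LOG` and `TIME_GROUPING_BIN` are made up for this example and are not part of the original setup. The defaults are the paths from the alias above.

```shell
# Hypothetical wrapper around the alias pipeline shown above.
# $1                 number of log lines to read (default 10000, as in the alias)
# TIME_GROUPING_LOG  assumed override for the log file path
# TIME_GROUPING_BIN  assumed override for the parser script path
tg() {
  tail -n "${1:-10000}" "${TIME_GROUPING_LOG:-/var/log/apache2/access.log}" \
    | "${TIME_GROUPING_BIN:-/home/alex/bin/time-grouping}" 10
}
```

Called without arguments, `tg` does what the alias does; `tg 50000` looks further back in the log.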

And `/home/alex/bin/time-grouping` is a Perl script to parse `access.log`:

#!/usr/bin/env perl
use Modern::Perl;
use Time::ParseDate;

die "Pass '10' as an argument to use 10-minute buckets instead of hourly ones\n" if grep { $_ eq '--help' } @ARGV;

my $bucket = '1h';
$bucket = '10min' if grep { $_ eq '10' } @ARGV;

my $ESC = "\x1b";

sub red {
  my $text = shift;
  return '' unless $text;
  return "$ESC\[31m$text$ESC\[0m";
}

sub bold_red {
  my $text = shift;
  return '' unless $text;
  return "$ESC\[1;31m$text$ESC\[0m";
}

sub green {
  my $text = shift;
  return '' unless $text;
  return "$ESC\[32m$text$ESC\[0m";
}

sub yellow {
  my $text = shift;
  return '' unless $text;
  return "$ESC\[33m$text$ESC\[0m";
}

sub color_code {
  my $code = shift;
  return '' unless $code;
  return green($code) if substr($code,0,1) eq '2' or $code eq '304'; # or not modified
  return red($code) if substr($code,0,1) eq '5' or $code eq '408'; # server errors or timeout
  # return yellow($code) if substr($code,0,1) eq '3'; # redirects
  return yellow($code) if substr($code,0,1) eq '4'; # error messages
  return $code;
}

my %latest;
my %status;
my %count;
my $total;
while (<STDIN>) {
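  # optional "vhost:port " prefix, then client IP, two ignored fields, [timestamp], "request", status code
  m/^(?:(\S+):\S+ )?(\S+) \S+ \S+ \[(.*?)\] "(.*?)" (\d+)/ or die "Cannot parse:\n$_";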
  my $host = $1;
  my $ip = $2;
  my $ts = $3;
  my $url = $4;
  my $code = $5;
  my $time = parsedate($ts);
  my $label;
  $label = substr($ts,0,14) if $bucket eq '1h';
  $label = substr($ts,0,16) . '0' if $bucket eq '10min';
  $total++;
  $latest{$label}=$time;
  $status{$label} //= {};
  $status{$label}{$code}++;
  $count{$label}++;
}
my @result = sort {$latest{$b} <=> $latest{$a}} keys %count;
foreach my $label (@result) {
  printf "%20s %10d   %3d%%  %s\n", $label, $count{$label}, 100 * $count{$label} / $total,
      join(', ', map { sprintf("%3s (%d%%)",
                               color_code($_),
                               100 * $status{$label}{$_} / $count{$label})
           } sort { $status{$label}{$b} <=> $status{$label}{$a} } keys %{$status{$label}});
}
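
The bucketing in the script is plain string truncation. The first 14 characters of an Apache timestamp are the date and the hour, and the first 16 characters plus a literal `0` round the minute down to the nearest ten. Here is a small shell sketch of the same idea, where `bucket_label` is a hypothetical helper name used only for illustration:

```shell
# Truncate an Apache timestamp to its bucket label, mirroring the two substr calls above.
# $1  timestamp as it appears between [ and ] in access.log
# $2  bucket size: "10min" or anything else for hourly (the default)
bucket_label() {
  case ${2:-1h} in
    10min) printf '%.16s0\n' "$1" ;;  # keep the tens digit of the minute, append 0
    *)     printf '%.14s\n' "$1" ;;   # keep date and hour only
  esac
}

bucket_label '09/Jul/2019:12:47:31 +0200' 10min   # prints 09/Jul/2019:12:40
bucket_label '09/Jul/2019:12:47:31 +0200'         # prints 09/Jul/2019:12
```

Because the label is cut from the string and not computed from the parsed time, it always reflects the time zone written in the log.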

#Administration