gemini://alexschroeder.ch:1965/page/2023-07-16%20Bots%20crawling%20my%20sites
perl -ne 'print "$1\n" if /"([^"]*bot[^"]*)"$/i' \
< /var/log/apache2/access.log.1 \
| sort | uniq -c | sort -n | tail
A perlrun(1) optimization would be the -l switch, which removes newlines from the input, and puts them back on for the output:
perl -nle 'print $1 if /"([^"]*bot[^"]*)"$/i' \
Next, there are five forks. "sort | uniq -c | sort -n" can be replaced with tally(1), which will be a little bit faster (158%, by one random benchmark). tally(1) got itself written because "sort | uniq -c | sort -n" came up a lot when doing log searches.
Also "sort -nr | head" may be better than "sort -n | tail" given that tail may end up throwing away too much data. Why not put the lines you want up at the start of the pipe?
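
The two orderings can be compared on a small sample. This is only a sketch: the bot names are invented, and "-n 2" stands in for the default of ten lines so the difference shows on six lines of input.

```sh
# Made-up user agents standing in for the output of the perl filter.
agents=$(printf '%s\n' botA botB botA botC botA botB)

# Ascending counts, keep the end: tail must read everything first.
by_tail=$(printf '%s\n' "$agents" | sort | uniq -c | sort -n | tail -n 2)

# Descending counts, keep the start: same lines, largest first.
by_head=$(printf '%s\n' "$agents" | sort | uniq -c | sort -nr | head -n 2)

printf 'sort -n | tail:\n%s\nsort -nr | head:\n%s\n' "$by_tail" "$by_head"
```

Both pipelines keep the same two entries; only the order differs, with the busiest bot on the first line in the "sort -nr | head" form.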
Or, the tally can be done within perl itself, but sorting a hash and printing the keys thereof gets pretty long. Here you may want handy little functions in a custom library, maybe a "hf" for "hash frequency" for easy one-liner access:
$ cat ~/perl5/lib/perl5/y.pm
*main::hf = sub (+) {
    my ($href) = @_;
    my @s;
    for my $k ( sort { $href->{$b} <=> $href->{$a} } keys %$href ) {
        push @s, join $, // "\t", $href->{$k}, $k;
    }
    join $/, @s;
};
1;
$ cat log
test
foo
bar
test
foo
test
$ perl -My -nle '$x{$1}++ if /(.+)/ }{ print hf %x' < log
3	test
2	foo
1	bar
The }{ is a terribly clever trick that probably should not see use in production code. A deparse will show the expansion, but not the horrible mechanism behind it: with -n the text is put into a { ... } block, but then we snuck a }{ into that block, making two blocks, one of which happens after the loop, at the end of the script. Less insane code would use an END block, but that's more typing.
$ perl -MO=Deparse -nle '$x++ }{ print $x'
...
$ perl -My -nle '$x{$1}++ if /(.+)/; END { print hf %x }' < log
"Golf" in the title is a term for doing something with as few keystrokes as possible, which may have been more important on 300 baud lines.
tags #perl #logs