💾 Archived View for thrig.me › blog › 2023 › 07 › 20 › golf-log-commands.gmi captured on 2024-08-31 at 12:15:29. Gemini links have been rewritten to link to archived content

Golf Log Commands

gemini://alexschroeder.ch:1965/page/2023-07-16%20Bots%20crawling%20my%20sites

    perl -ne 'print "$1\n" if /"([^"]*bot[^"]*)"$/i' \
    < /var/log/apache2/access.log.1 \
    | sort | uniq -c | sort -n | tail

A perlrun(1) optimization would be the -l switch, which removes newlines from the input, and puts them back on for the output:

    perl -nle 'print $1 if /"([^"]*bot[^"]*)"$/i' \
    < /var/log/apache2/access.log.1 | sort | uniq -c | sort -n | tail

Next, the pipeline forks five processes: perl, sort, uniq, sort, and tail. The "sort | uniq -c | sort -n" part can be replaced with tally(1), which will be a little bit faster, or 158% by one random benchmark. tally(1) got written because "sort | uniq -c | sort -n" came up a lot when doing log searches.

tally is somewhere herein

Also "sort -nr | head" may be better than "sort -n | tail", given that tail must read everything and may end up throwing away most of it. Why not put the lines you want up at the start of the pipe?
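
A small sketch of the difference, using made-up input (the agents function and its bot names are hypothetical). Both pipelines pick out the two most frequent entries; they differ in output order and in which end of the sorted stream gets read.

```shell
# Hypothetical input: one user agent per line.
agents() {
    printf '%s\n' botA botB botA botC botA botB
}

# Ascending sort: the largest counts come last, so tail reads past everything else.
agents | sort | uniq -c | sort -n | tail -n 2

# Descending sort: the largest counts come first, so head can stop after two lines.
agents | sort | uniq -c | sort -nr | head -n 2
```

The exact column padding of the counts depends on the local uniq(1) implementation.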

Or, the tally can be done within perl itself, but sorting a hash and printing its keys gets pretty long for a one-liner. Here you may want handy little functions in a custom library, maybe a "hf" for "hash frequency", for easy one-liner access:

    $ cat ~/perl5/lib/perl5/y.pm
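    # Install hf() into main::; the (+) prototype lets "hf %x" pass the hash by reference.
    *main::hf = sub (+) {
        my ($href) = @_;
        my @s;
        # Walk the keys by descending count.
        for my $k ( sort { $href->{$b} <=> $href->{$a} } keys %$href ) {
            # Build "count<sep>key", using $, when it is set, else a tab.
            push @s, join $, // "\t", $href->{$k}, $k;
        }
        # Join the rows with $/; print (under -l) supplies the final newline.
        join $/, @s;
    };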
    1;
    $ cat log
    test
    foo
    bar
    test
    foo
    test
    $ perl -My -nle '$x{$1}++ if /(.+)/ }{ print hf %x' < log
    3       test
    2       foo
    1       bar

The }{ is a terribly clever trick that probably should not see use in production code. A deparse will show the expansion, but not the horrible mechanism behind it: with -n the text was put into a "while (<>) { ... }" block, but then we snuck a "}{" into that block, making two blocks, one of which happens after the loop, at the end of the script. Less insane code would use an END block, but that's more typing.

    $ perl -MO=Deparse -nle '$x++ }{ print $x'
    ...
    $ perl -My -nle '$x{$1}++ if /(.+)/; END { print hf %x }' < log

"Golf" in the title is a term for doing something with as few keystrokes as possible, which may have been more important on 300 baud lines.

tags #perl #logs