Archived View for thrig.me › blog › 2022 › 10 › 25 › shell-while-loop-considered-harmful.gmi captured on 2023-09-28 at 16:35:11.
    $ printf 'one\ntwo' | while read l; do echo "$l"; done
    one
    $
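Whoops, silent data loss. POSIX defines a text file as lines that each end in a newline, so a file lacking the ultimate newline is technically not a text file, but in practice that newline may be absent. read then hits end-of-file, returns a non-zero status with the partial line still sitting in the variable, and the loop exits before the body ever sees it. How many shell scripts are out there and how many ultimate lines have been lost this way?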
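To make the failure mode visible, here is a minimal sketch, assuming a POSIX-ish sh. The helper name read_status is made up for illustration and is not from the original scripts; it reports whether read claimed success and what it left in the variable.

```shell
# read_status (hypothetical helper): do a single read from stdin, then say
# whether read succeeded and show what ended up in the variable.
read_status() {
    if IFS= read -r line; then
        printf 'success: [%s]\n' "$line"
    else
        printf 'failure: [%s]\n' "$line"
    fi
}

printf 'two\n' | read_status   # success: [two]
printf 'two'   | read_status   # failure: [two]
```

The data is there in both cases; only the exit status differs, and a bare "while read" looks at nothing but the exit status.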
A less terrible version is, well, verbose.
    $ printf 'one\ntwo' | while IFS= read -r l || [ -n "$l" ]; do printf '%s\n' "$l"; done
    one
    two
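What the extra verbiage buys can be sketched side by side. The function names naive_cat and careful_cat are invented for this example, and the input is a contrived line with leading and trailing spaces, a backslash, and no final newline.

```shell
# naive_cat: the short loop. read without -r eats backslashes, the default
# IFS trims leading and trailing whitespace, and an unterminated last line
# never reaches the loop body.
naive_cat() {
    while read l; do printf '%s\n' "$l"; done
}

# careful_cat: the verbose loop. IFS= keeps the whitespace, -r keeps the
# backslashes, and the [ -n "$l" ] test rescues an unterminated last line.
careful_cat() {
    while IFS= read -r l || [ -n "$l" ]; do printf '%s\n' "$l"; done
}

# Input is two lines: "  a\b  " and an unterminated "last".
printf '  a\\b  \nlast' | naive_cat
printf '  a\\b  \nlast' | careful_cat
```

The naive loop should emit a single mangled line, while the careful one passes both lines through intact, at the cost of adding a newline the input never had.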
And, slow. Do you want under a second (C, Perl), or seven seconds (ksh)?
    $ perl -E 'say for 1..1_000_000' | time ./shellwhile >/dev/null
        0m07.34s real     0m03.48s user     0m03.80s system
    $ perl -E 'say for 1..1_000_000' | time ./perlwhile >/dev/null
        0m00.71s real     0m00.45s user     0m00.25s system
    $ perl -E 'say for 1..1_000_000' | time ./cwhile >/dev/null
        0m00.58s real     0m00.03s user     0m00.11s system
Typically the shell will be an order, or orders, of magnitude slower, especially when it forks external tools; the above uses the shell's internal echo to make the numbers for the shell less bad. echo is neither portable nor safe for arbitrary input, but the portable printf(1) involves a fork in shells that lack a printf builtin, which would make the shell performance even worse.
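A small sketch of the "not safe for random input" problem, assuming a POSIX-ish sh. emit_echo and emit_printf are hypothetical names; what echo does with these inputs varies by shell, so only the printf results are stated as certain.

```shell
# emit_echo / emit_printf (hypothetical helpers): print one argument as a line.
emit_echo()   { echo "$1"; }
emit_printf() { printf '%s\n' "$1"; }

emit_printf '-n'      # prints -n
emit_printf 'a\tb'    # prints a\tb, backslash and all

emit_echo '-n'        # many shells treat this as an option and print nothing
emit_echo 'a\tb'      # some shells expand \t to a tab, others do not
```

With printf the data only ever appears as an argument to %s, never as the format or an option, which is why it behaves the same everywhere.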
Okay for fast prototyping, terrible for most anything else.
Therefore, if shell code will have something tricky and non-performant like a while loop in it, I generally write that code in some other language.
    $ cat shellwhile
    #!/bin/ksh
    while IFS= read -r l || [ -n "$l" ]; do echo "$l"; done
    $ cat perlwhile
    #!/usr/bin/perl
    print while readline;
    $ cat cwhile.c
    #include <stdio.h>
    int main(void)
    {
        char *line = NULL;
        size_t linesize = 0;
        ssize_t linelen;
        while ((linelen = getline(&line, &linesize, stdin)) != -1)
            fwrite(line, linelen, 1, stdout);
        return 0;
    }
    $ cat lispwhile.lisp
    (defun main ()
      (loop for line = (read-line *standard-input* nil nil)
            while line do (write-line line)))
    (sb-ext:save-lisp-and-die "lispwhile" :executable t :toplevel 'main)
Commands or groups of commands that are often run should probably be rewritten to be more efficient; consider counting how often each input line occurs, where the input might realistically be some portion of a logfile:
    $ printf 'a\nb\na\na\nb\nc\n' | sort | uniq -c | sort -nr
       3 a
       2 b
       1 c
This gets the job done, but is slow. Somewhat faster is to place all the lines into a hash of line => count pairs, and then to sort by the count, all within a single process. More efficient, but you have to actually notice the pattern, worry about the CPU waste, and then write a specific tool for it.
            Rate shell  perl tally
    shell 88.0/s    --  -41%  -61%
    perl   149/s   70%    --  -34%
    tally  227/s  158%   52%    --
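The hash-of-counts idea can be approximated without leaving the shell toolbox. This is a rough sketch only: tally_awk is a made-up name, it uses awk's associative array as the hash, and it is not the tally tool benchmarked above. It also still hands the final ordering to sort(1), so it is two processes instead of one, though sort only sees one line per distinct input line and not the whole input.

```shell
# tally_awk (hypothetical name): count identical input lines in an awk
# associative array, then order the "count line" pairs by descending count.
# Ties are broken by the line text so that the output is stable.
tally_awk() {
    awk '{ count[$0]++ }
         END { for (line in count) print count[line], line }' |
        sort -k1,1nr -k2
}

printf 'a\nb\na\na\nb\nc\n' | tally_awk
```

The counts are not padded the way uniq -c pads them, so anything parsing the output by column position would need adjusting.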
https://thrig.me/src/scripts.git
tags #ksh #c #sh #perl #lisp