💾 Archived View for ecs.d2evs.net › posts › 2023-11-24-awk.gmi captured on 2024-05-12 at 14:50:52. Gemini links have been rewritten to link to archived content

how to write good awk

awk is one of my favorite languages. it's not a good fit for all (or even most) tasks, but when it does work, it works *really* well

however, good awk code looks pretty different from good code in any other language. awk is by nature an event-based language: a program is a list of pattern-action rules, and each input line triggers every rule whose pattern matches it. idiomatic awk tries to keep as much of its logic in the patterns as it can. in particular, try to avoid putting a bunch of control flow into a single big action with an empty pattern. for example, this:

{
	if ($0 ~ /hi/) {
		print "hii"
	} else {
		print $0
	}
}

would be better as

/hi/ { print "hii"; next }
1

(the `1` here is a fallback: as a pattern it's always true, and a rule with no action uses the default action of printing $0. awk has no `true` literal, which would be clearer here, and arguably either `//` or `{ print }` is better than `1`)
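
as a quick check that the two versions above behave the same, here's a sketch that wraps each one in a shell function and feeds both the same input. the function names and the sample input are made up for illustration:

```sh
# the single big action with an empty pattern
big_action() { awk '{ if ($0 ~ /hi/) print "hii"; else print $0 }'; }

# the same logic moved into the patterns ({ print } stands in for the 1 fallback)
in_patterns() { awk '/hi/ { print "hii"; next } { print }'; }

printf 'hi there\nbye\n' | big_action
printf 'hi there\nbye\n' | in_patterns
# both print:
#   hii
#   bye
```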

simple actions should go entirely on one line, and more complex actions should have one statement per line

you can and should take advantage of the fact that uninitialized variables default to 0 (and to the empty string) in order to simplify boolean state. as an example, here's a simple gemtext parser:

/^```/ { preformatted = !preformatted; next }
preformatted { display_preformatted($0); next }

BEGIN { FS = "[ \t]" }
/^=>/ {
	url = $2;
	$1 = "";
	$2 = "";
	sub(/^ */, "");
	display_url(url, $0);
	next;
}

{ display_text($0) }
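
the parser above calls three display_* functions that it doesn't define, so it won't run on its own. here's a sketch that fills them in with hypothetical stand-ins which just tag each line, wrapped in a made-up shell function called `gemtext_demo`:

```sh
gemtext_demo() {
	awk '
	# hypothetical stand-ins for the display_* functions, which are not defined above
	function display_preformatted(line) { print "PRE  " line }
	function display_url(url, label) { print "LINK " url " [" label "]" }
	function display_text(line) { print "TEXT " line }

	BEGIN { FS = "[ \t]" }

	# preformatted starts out as 0, so the first fence line switches it on
	/^```/ { preformatted = !preformatted; next }
	preformatted { display_preformatted($0); next }

	/^=>/ {
		url = $2
		$1 = ""
		$2 = ""
		sub(/^ */, "")   # strip the blanks left behind by the two emptied fields
		display_url(url, $0)
		next
	}

	{ display_text($0) }
	'
}

printf '=> gemini://example.org/ an example\n' | gemtext_demo
# prints: LINK gemini://example.org/ [an example]
```

because the preformatted rule comes before the link rule and ends with `next`, a line starting with => inside a preformatted block never reaches the link rule.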

sometimes you don't want to end an action with `next`, as in this code, which wraps indented lines in a <pre>. the first rule falls through to the second, so the line that opens the block still gets printed:

/^\t/ && !pre { print "<pre>"; pre = 1 }
/^\t/ { print substr($0, 2); next }
pre { print "</pre>"; pre = 0 }
1
END { if (pre) print "</pre>" }
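
to see the fall-through in action, here's a sketch that runs these rules on a small made-up input. `wrap_pre` is a hypothetical name, and the sketch includes an END rule so that a block running to the end of the input still gets closed:

```sh
wrap_pre() {
	awk '
	/^\t/ && !pre { print "<pre>"; pre = 1 }   # no next: fall through to the rule below
	/^\t/ { print substr($0, 2); next }
	pre { print "</pre>"; pre = 0 }
	1
	END { if (pre) print "</pre>" }             # close a block that is still open at EOF
	'
}

printf 'intro\n\tcode 1\n\tcode 2\noutro\n' | wrap_pre
# prints:
#   intro
#   <pre>
#   code 1
#   code 2
#   </pre>
#   outro
```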

awk is also an old language, with its share of warts. in particular, every variable is global except for function parameters. there's no way to declare a local variable, but you can emulate one by adding an extra parameter to a function and not passing an argument for it:

function printn(x, i) { for (i = 1; i <= x; i += 1) print $i }
{ i = 5; printn(2); print i }
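
here's a sketch of what goes wrong without the extra parameters. `sum_to` and the shell function names are made up for illustration. the extra whitespace before the locals in the parameter list is a common convention for separating them from the real arguments:

```sh
# i and total are extra parameters, so they are local to sum_to
with_locals() {
	awk '
	function sum_to(n,    i, total) { for (i = 1; i <= n; i += 1) total += i; return total }
	BEGIN { i = 5; s = sum_to(3); print s, i }
	'
}

# same function without them: i and total are globals, so the loop clobbers the i set in BEGIN
without_locals() {
	awk '
	function sum_to(n) { for (i = 1; i <= n; i += 1) total += i; return total }
	BEGIN { i = 5; s = sum_to(3); print s, i }
	'
}

with_locals      # prints "6 5"
without_locals   # prints "6 4"
```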

you should make use of this for all local variables, even when it's not strictly necessary

i don't think i've written a single good conclusion to any of the posts i've written here