Yesterday I wrote about "The awk of the future" from Rob Pike's structural regular expressions paper. I've been hacking together something to try out those ideas, and this post serves as my current brain dump.
Instead of setting RS to different values, what if you specified a pattern to match records, and then sub-patterns to match within each record? Recreating awk's default of \n separating records would look like this:
    /^.*$/ {
        /^#/ { }  # matches a "#" at the start of a line
        /^>/ { }  # ditto for ">"
    }
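To check that the nesting idea hangs together, here is a minimal Python sketch of how it might be evaluated. It assumes records are found with `re.finditer` (using `re.MULTILINE`, so `^.*$` yields one record per line) and that each sub-pattern is tested with `re.search` against the record alone, so `^` anchors to the start of the record. The `run` function and the `(pattern, action)` rule format are hypothetical, made up for illustration.

```python
import re
from typing import Callable, List, Tuple

# A rule pairs a sub-pattern with an action to run on each matching record.
Rule = Tuple[str, Callable[[str], None]]


def run(text: str, record_pattern: str, rules: List[Rule]) -> None:
    """Hypothetical evaluator: every match of record_pattern is a record,
    and each rule's sub-pattern is tested against that record on its own."""
    # With re.MULTILINE, ^ and $ match at line boundaries, so ^.*$ yields one
    # record per line (plus an empty one after a trailing newline), much like
    # awk's default RS of "\n".
    for m in re.finditer(record_pattern, text, re.MULTILINE):
        record = m.group(0)
        for sub_pattern, action in rules:
            # Searching only the record means ^ anchors to the record's start.
            if re.search(sub_pattern, record):
                action(record)


comments: List[str] = []
quotes: List[str] = []
run("# note\nplain\n> quoted\n", r"^.*$",
    [(r"^#", comments.append), (r"^>", quotes.append)])
# comments == ["# note"], quotes == ["> quoted"]
```

Testing sub-patterns against the record text, rather than the whole input, is what lets `/^#/` mean "start of a line" here and "start of a word" once the outer pattern matches words instead.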
You could change this to operate on words:
    /[^ \t\n]+/ {
        /^#/ { }  # matches a "#" at the start of a word
    }
Or perhaps every word before an exclamation mark:
    /([^ \t\n!]+)!+/ {
        /^#/ { }  # matches a "#" at the start of a word before a "!"
    }
If the pattern contains capture groups, the text matched by the first group becomes the record (here, the word without its trailing exclamation marks).
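The first-group rule is cheap to implement on top of Python's `re`, since a compiled pattern's `groups` attribute counts its capturing groups. A sketch, with `records` as a hypothetical helper name:

```python
import re
from typing import List


def records(text: str, pattern: str) -> List[str]:
    """Hypothetical helper: the records are the matches of pattern, except
    that when pattern has capturing groups, group 1 is used instead."""
    compiled = re.compile(pattern, re.MULTILINE)
    # Pattern.groups is the number of capturing groups in the pattern.
    index = 1 if compiled.groups > 0 else 0
    return [m.group(index) for m in compiled.finditer(text)]


print(records("hi there! wow!!", r"([a-zA-Z]+)!+"))  # ['there', 'wow']
print(records("hi there", r"[a-zA-Z]+"))            # ['hi', 'there']
```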
I'm not sure how this applies to FS; perhaps it isn't needed, since you could keep using sub-patterns. Here's how I picture that working:
    /^.*$/ {                # $0 = whole file, $1 = line 1, $2 = line 2, etc.
        /([a-zA-Z]+)+/ {    # $0 = whole line, $1 = word 1, $2 = word 2, etc.
        }
    }
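One way to read the nested example is that a block's pattern produces the fields of the text it is nested in: each match becomes `$1`, `$2`, and so on, while `$0` stays the enclosing text. A Python sketch under that assumption, also applying the first-group rule; `fields` is a hypothetical name, and index 0 of the returned list plays the role of `$0`:

```python
import re
from typing import List


def fields(record: str, pattern: str) -> List[str]:
    """Hypothetical helper: index 0 is the whole record ($0); indexes 1..n
    are the successive matches of pattern inside it ($1, $2, ...)."""
    compiled = re.compile(pattern, re.MULTILINE)
    # Same rule as for records: use group 1 if the pattern has any groups.
    index = 1 if compiled.groups > 0 else 0
    return [record] + [m.group(index) for m in compiled.finditer(record)]


file_fields = fields("one two\nthree", r"^.*$")
# file_fields == ["one two\nthree", "one two", "three"]
line_fields = fields(file_fields[1], r"([a-zA-Z]+)+")
# line_fields == ["one two", "one", "two"]
```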
A shorter way to do this (or sensible defaults so it acts like regular awk) would be good.
Awk's variables are global (function parameters are the only locals), so with nested patterns and actions we'll need some form of local scoping. Maybe per-action scoping, rather than full block scoping.
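Awk already has one escape hatch: because function parameters are local, extra unused parameters are the usual way to get local variables. Action scoping could borrow that idea. The sketch below assumes each action declares its local names up front; those live only for one run of the action, while every other name reads and writes the shared globals as in plain awk. `ActionScope` and the declaration mechanism are hypothetical.

```python
from typing import Any, Dict, Iterable


class ActionScope:
    """Hypothetical per-action scope: declared locals last for one run of an
    action; every other name reads and writes the shared globals."""

    def __init__(self, globals_: Dict[str, Any], local_names: Iterable[str]):
        self.globals = globals_
        # Locals start as "", standing in for awk's empty/zero uninitialized value.
        self.locals: Dict[str, Any] = {name: "" for name in local_names}

    def get(self, name: str) -> Any:
        if name in self.locals:
            return self.locals[name]
        return self.globals.get(name, "")

    def set(self, name: str, value: Any) -> None:
        if name in self.locals:
            self.locals[name] = value
        else:
            self.globals[name] = value


shared: Dict[str, Any] = {}
for word in ["a", "bb"]:
    scope = ActionScope(shared, local_names=["n"])  # fresh scope per action run
    scope.set("n", len(word))  # local: discarded when this run ends
    scope.set("total", (scope.get("total") or 0) + scope.get("n"))
# shared == {"total": 3}; "n" never reaches the globals
```

Scoping per action rather than per `{ }` block keeps the lookup rule to two levels, which seems closer to awk's flat model.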
gate
2020-08-26