💾 Archived View for ecs.d2evs.net › posts › 2024-02-07-awkbot.gmi captured on 2024-08-31 at 12:06:22. Gemini links have been rewritten to link to archived content

the world's awkiest irc bot

if you've hung out around me long enough, you'll know that i have a bit of an unhealthy love of awk. some of that is because it's just a nice language, but a lot of it can be traced back to this one exchange i had on irc a few years ago:

2021-09-22 22:02:51	@etj	hm
2021-09-22 22:02:54	@etj	acctually
2021-09-22 22:03:06	@etj	i think i know the right language for writing microbots in
2021-09-22 22:03:10	@ecs	do tell
2021-09-22 22:03:12	@etj	awk.

but before i can tell you that story, i have to tell you *this* story

on 2020-04-20, i decided to write an irc bot to display search results from duckduckgo. i'm not really sure why i wanted to do that, but i had this to say about it at the time

2020-04-20 16:50:54	@ecs	let's write a *useful* irc bot
2020-04-20 16:51:13	@ecs	something that'll interpret commands like:
2020-04-20 16:51:22	@ecs	!ddg this should show the top search result

for reasons that were funny at the time, she ended up being named "qta", and she soon grew to a healthy 1500loc. she could set reminders, forecast the weather, produce fascinating insights with a markov chain dynamically trained on messages sent in the channel, control an mpd server which i'm pretty sure only ever played metallica because look i don't know, and a bunch of other small things i ended up wanting her to do

on 2020-10-01, inspiration struck, and the first¹ of my jsbot clones was born

¹: technically second, but the scheme bot i wrote on a phone in the middle of a 6-week hiking trip and never actually used doesn't really count

Drew's blog post about jsbot

2020-10-01 00:39:57	@ecs	it would be nice to have a good™ embeddable scripting language
2020-10-01 00:40:36	@ecs	https://github.com/d5/tengo maybe
2020-10-01 00:41:17	@etj	!tengo when
2020-10-01 00:42:22	@ecs	https://github.com/traefik/yaegi huh
[snip]
2020-10-01 01:43:22	@ecs	!go fmt.Println("hello, world!")
2020-10-01 01:43:23	quaternia-test	=> hello, world!
2020-10-01 01:43:27	@ecs	\o/

we developed some infrastructure for writing little irc bots inline in go, but it never worked particularly well. go is a kinda verbose language, and that's a problem when your code needs to fit within a 512-byte irc message. we also hacked together persistent data by manually appending go code to a file in /etc that got sourced on boot: the idea was that if you wanted to, eg., increment a number, you'd run `foo++` and then append `foo++` to /etc/quaternia/init.go

this worked about as well as you'd expect it to

it got ~worse~ better

a year later, etj was complaining about just how bad the !go persistence mechanism was. after having messed with it manually in order to remove a macro that expanded "lol" to "lingerie of love" (don't ask), they were thinking of just removing it entirely ("i really think that the scriptable bot that you write bots in is a bad idea" "or at least langbot shouldn't be the final destination for bots"), and i just thought that "it would be nice to switch to a non-go language", because "livecoding one line at a time over im protocols is the best software development method". etj came up with a half-serious proposal of using awk, and i was "not actually as horrified by that as i feel like i should be", so i got to work implementing it

after a few hours of hacking and a brief break to determine qta's birthday (valentine's day in 1970, apparently):

2021-09-23 03:00:47	@ecs	let's see if this works
2021-09-23 03:01:12	@ecs	.awk add test /^\.awkping$/ { print("pong") }
2021-09-23 03:01:12	+qta	success
2021-09-23 03:01:16	@ecs	.awkping
2021-09-23 03:01:16	+qta	pong

(note: the syntax for adding snippets has changed since these logs)

initially, i was worried about persistence

2021-09-23 03:07:25	@ecs	one caveat about this is that
2021-09-23 03:08:03	@ecs	.awk add foo BEGIN { foo = 0 } /bar/ { foo++; print foo; }
2021-09-23 03:08:03	+qta	success
2021-09-23 03:08:07	@ecs	bar
2021-09-23 03:08:07	+qta	1
2021-09-23 03:08:08	@ecs	bar
2021-09-23 03:08:09	+qta	1
2021-09-23 03:08:24	@ecs	it doesn't retain state

but i've since come around. both gobot and jsbot have issues with persistence being optional: it's possible to write programs which look like they work, but which lose data when the bot is rebooted. because awkbot's awk context isn't kept around between messages, you're forced to use the postgres database it provides bindings for if you want to keep any data around at all, and rebooting the bot is guaranteed to never break anything
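the `foo` counter in the log above resets because every message starts from a blank awk context, so the only state that survives is whatever gets written to the database. a small go sketch of that contract follows. the `store` interface and the map-backed `memStore` are hypothetical stand-ins for the real postgres bindings, and the two handlers just mimic `BEGIN { foo = 0 } /bar/ { foo++; print foo; }` with and without a store.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// store is a hypothetical stand-in for the postgres bindings: the only
// state that outlives a single message.
type store interface {
	Get(key string) string
	Set(key, value string)
}

// memStore is an in-memory store used here in place of a real database.
type memStore map[string]string

func (m memStore) Get(key string) string { return m[key] }
func (m memStore) Set(key, value string) { m[key] = value }

// handleLocal mimics the snippet from the log: foo lives in a fresh
// context, so it prints 1 for every matching message.
func handleLocal(msg string) string {
	foo := 0 // BEGIN { foo = 0 } runs again for every message
	if !strings.Contains(msg, "bar") {
		return "" // /bar/ didn't match, so nothing is printed
	}
	foo++
	return strconv.Itoa(foo)
}

// handleStored keeps the counter in the store instead, so it survives
// between messages (and, with a real database, across reboots).
func handleStored(db store, msg string) string {
	if !strings.Contains(msg, "bar") {
		return ""
	}
	foo, _ := strconv.Atoi(db.Get("foo")) // a missing key is "", which parses as 0
	foo++
	db.Set("foo", strconv.Itoa(foo))
	return strconv.Itoa(foo)
}

func main() {
	db := memStore{}
	for i := 0; i < 2; i++ {
		fmt.Println(handleLocal("bar"), handleStored(db, "bar"))
	}
}
```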

once i'd written the initial version of awkbot, i slowly started making it more and more powerful in order to be able to rewrite more and more of the original bot in awk. i even managed to rewrite half of the bot itself in awk - the interface for adding, listing, and removing awk snippets was originally written in go, but once i added a function to run arbitrary sql queries from awk², i was able to delete those 127 lines of code

²: not a trivial task. goawk has ffi, but go functions that're callable from awk can only take in and return primitive types and strings, so i had to do some creative escaping in order to pull the argument and result arrays across that boundary
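since those go functions only see primitive types and strings, an array has to be flattened into one string on one side of the boundary and split back apart on the other. the actual escaping scheme isn't described here, so this go sketch uses one possible (hypothetical) encoding: every field is terminated by an ascii unit separator byte, and a backslash escapes any literal separator or backslash inside a field. the awk side would need a matching split-and-unescape loop, which isn't shown.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	sep byte = 0x1f // ascii unit separator, terminates every field
	esc byte = '\\' // escapes a literal sep or backslash inside a field
)

// encode flattens a slice of strings into a single string that can cross
// a strings-only ffi boundary. every field is terminated (not just
// separated) by sep, so an empty slice and a slice holding one empty
// string stay distinguishable.
func encode(fields []string) string {
	var b strings.Builder
	for _, f := range fields {
		for i := 0; i < len(f); i++ {
			if f[i] == sep || f[i] == esc {
				b.WriteByte(esc)
			}
			b.WriteByte(f[i])
		}
		b.WriteByte(sep)
	}
	return b.String()
}

// decode reverses encode. it returns an error for a dangling escape or an
// unterminated final field.
func decode(s string) ([]string, error) {
	fields := []string{}
	var cur strings.Builder
	escaped := false
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case escaped:
			cur.WriteByte(c)
			escaped = false
		case c == esc:
			escaped = true
		case c == sep:
			fields = append(fields, cur.String())
			cur.Reset()
		default:
			cur.WriteByte(c)
		}
	}
	if escaped || cur.Len() > 0 {
		return nil, fmt.Errorf("malformed encoding %q", s)
	}
	return fields, nil
}

func main() {
	row := []string{"id", "a\x1fb", `back\slash`, ""}
	dec, err := decode(encode(row))
	fmt.Println(len(dec), err, dec[1] == row[1]) // 4 <nil> true
}
```

working byte by byte rather than rune by rune keeps the round trip lossless even for strings that aren't valid utf-8.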

one substantial improvement i managed to make over jsbot is the ability to execute code at an arbitrary time, rather than being limited to replying to messages. you can call `at(date, cmd)` to add `cmd` to a table, marked with the timestamp `date`. every 10 seconds, the go code executes the snippet named "__ontick__", which looks through that table and executes any code whose timestamp is in the past. another hacked-together system sits on top of this for implementing repeating commands, allowing you to, for example, print "hi" every 10 minutes by running `.cron now "in 10m" '{ print "hi" }'`. don't ask how that works, you don't want to know

somewhere along the line we decided that it'd be a good idea to give awkbot an http client, so now there's some awk code to print out url titles, interface with the schedule api for my local public transit agency, check the weather, and a bunch more stuff. she also knows how to parse json, xml, and html, though i'm thinking of trying to rewrite some of that in awk

the old bot weighed in at around 2.2kloc at her peak, and i finished rewriting the last part (my rss reader³) in awk on 2022-12-10. the new bot consists of 156 snippets of awk code totalling 29,981 characters, and 428 lines (9,155 characters) of go code. she's grown a brainfuck interpreter, an implementation of the geohashing algorithm, half of a cube timer, and dozens of other things i don't have time to list

³: said rss reader once got me a politely worded email from someone whose blog i followed asking me to please stop hammering her rss feed once every 10 seconds. i fixed it; now qta only hammers rss feeds once every 10 minutes

every so often i come back and hack on her some more, but for the most part qta's just become part of my life at this point. sometimes i decide to add another organizational tool to her, and on occasion some of them even get a bit of use. i recently sorted out some race conditions in the output-channel management⁴, because a youtube rss feed outage was causing her to yell at me really really loudly and incessantly in dms. i also just rewrote the geocoding, reverse geocoding, shlexing, and human-friendly datetime parsing bits in awk, which shaved off around 80 lines of go code and got rid of quite a few dependencies

⁴: the go code provides a setchan() function, which controls the channel that data is printed to, but because user code used to run inside a `print eval(...)`, that data wasn't actually sent to the pseudo-stdout until it finished evaluating, which meant that if you did something along the lines of `print "hi"; setchan("#a")`, the "hi" would be sent to #a. in order to solve this i added a "passthrough" parameter to eval(), which tells it to immediately write everything to the channel in addition to buffering it up to be returned as a string. then i discovered that it still didn't work because for some reason i was having goawk print things into an io.Pipe which a different goroutine was reading from, rather than just having an io.Writer which writes messages to irc. anyways it works now, though i didn't quite manage to fix it before youtube fixed their feeds

there's no actual moral to this story. you probably shouldn't ever use this code, but the source code is linked below if you're curious. bye!

https://git.sr.ht/~ecs/awkbot

errata