# Another tiny scripting engine family

18 September 2020

Last year I came up with not one but two kinds of interpreter so tiny they can outright hide inside a bigger program and serve as scripting engines. Both were used as the basis for real languages that work, thus proving them viable. Yet when I needed to put one into the [Ramus](https://notimetoplay.org/engines/ramus/) revival this spring, I did something entirely different.

We programmers tend to complicate things, but the essence of scripting is being able to tell a computer: "do this; test that; if so then yay, else nay". It's really that simple: a series of statements, each beginning with a word which says how to (wait for it) interpret the remaining words, at least within the same statement.

Which in turn is great news for programmers, because it's very easy to parse a language like that: split input into lines and lines into words. Seriously, don't bother with anything more complicated for now. You don't need anything else to parse code like:

```
set a 0
set b $a
incr a
incr b 2
echo $a $b
```

Primitive? Sure, but it *works*, and people new to programming get it easily. So let's look at one way to make this work. To keep it short, I'll use Python 3. Hopefully it's not too hard to follow even for fans of other languages:

```python
variables = {}
commands = {}

def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        words = [parse_value(i) for i in line.split()]
        name = words[0]
        args = words[1:]
        commands[name](*args)

def parse_value(word):
    if word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word
```

How's that! I wrote smaller interpreters in the past, but 20 lines is pretty damn good. Could have made it even shorter by going with just one dictionary, but this way it's more clear what I mean. It ran almost on first try, too.
Well, it's not going to until we define the actual commands:

```python
def do_set(name, value):
    variables[name] = value

def do_incr(name, value=1):
    variables[name] += value

def do_echo(*args):
    print(*args)

commands["set"] = do_set
commands["incr"] = do_incr
commands["echo"] = do_echo
```

Yep, that's really all you need to run the example at the beginning. (Did you get "1 2" for output as expected?) Other basics can be added just as easily.

It might not be as obvious how to add loops and conditionals. After all, we have no way to declare a literal list, for example, never mind blocks of code. But we don't need to, because a command doesn't have to start at the first word of a line. Consider a script like:

```
lappend numbers 1 2 3 4 5
foreach i $numbers echo \$i
```

The `lappend` command is trivial to implement with the current setup:

```python
def do_lappend(name, *values):
    if name not in variables:
        variables[name] = list(values)
    else:
        variables[name].extend(values)

commands["lappend"] = do_lappend
```

On the other hand, adding a for-each loop requires reworking all the code so far (except the already defined commands):

```python
def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        run_command(line.split())

def run_command(words):
    words = [parse_value(i) for i in words]
    name = words[0]
    args = words[1:]
    commands[name](*args)

def parse_value(word):
    if word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word
```

See what I did there? Now it's possible to rerun a command that was already split up. Also, to pass through a word starting with a dollar sign so it's only parsed when it's supposed to.
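To see the escape at work in isolation, here is a small sketch that runs one escaped word through `parse_value` twice, the way a loop body gets parsed once on the original line and again when the loop reruns it. The variable `i` and its value 42 are made up for the demonstration.

```python
# Hypothetical starting state, just for this demonstration.
variables = {"i": 42}

def parse_value(word):
    # Same logic as the reworked version: a leading backslash is
    # stripped and the rest of the word passes through unparsed.
    if word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

# First pass: what happens when the whole line is parsed.
once = parse_value("\\$i")
# Second pass: what happens when the stored words are rerun.
twice = parse_value(once)

print(repr(once), repr(twice))
```

The first pass only removes the backslash; the variable lookup is deferred to the second pass, which is what lets a loop body see the current value each time around.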
It took exactly five more lines, and now we can demonstrate that loop:

```python
def do_foreach(name, values, *words):
    for i in values:
        variables[name] = i
        run_command(words)

commands["foreach"] = do_foreach
```

The way it works is, `do_foreach` receives all its arguments like any other command, but only keeps two for its own use; the rest make up the loop body. It's fragile and limited: better remember to escape those dollar signs, and you can only have one command in the body. We can work around that, as you'll see, but loops just aren't at home in our little language.

Conditionals are another story:

```
set a 3
test $a > 0
iftrue set b \$a
iftrue incr b -1
iftrue echo Yes!!
iffalse set b 0
iffalse echo Noo...
```

You can probably guess that `test` is supposed to set a flag that subsequent `iftrue` and `iffalse` commands can check whenever needed. But what about the conditions proper? Do we have to parse arithmetic expressions now? Nope! In fact the syntax of `test` is fixed, always taking the same form:

```python
def do_test(op1, test, op2):
    if test == ">":
        variables["*TEST*"] = (op1 > op2)
    else:
        print("Unknown operator in test:", test)

def do_iftrue(*words):
    if "*TEST*" not in variables:
        print("iftrue without test")
    elif variables["*TEST*"]:
        run_command(words)

def do_iffalse(*words):
    if "*TEST*" not in variables:
        print("iffalse without test")
    elif not variables["*TEST*"]:
        run_command(words)

commands["test"] = do_test
commands["iftrue"] = do_iftrue
commands["iffalse"] = do_iffalse
```

Yes, adding all the other operators would be tedious, but it can't be helped. And yes, the test result can live just fine in an ordinary variable. No need to modify the interpreter just for that.

Except the code doesn't work as shown, because `iftrue` and `iffalse` send their respective commands to be re-parsed, and `parse_value` chokes on anything that's not a string (anymore). It would be easy enough to escape arguments like 0 and -1, but seriously?
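To make the failure concrete, here is a minimal sketch that parses the words of `incr b -1` twice with the string-only `parse_value`, which is what effectively happens to the body of an `iftrue` line. The `outcome` variable is only there to record what happened.

```python
variables = {}

def parse_value(word):
    # The version that assumes every word is still a string.
    if word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

# First pass turns "-1" into the integer -1.
first_pass = [parse_value(w) for w in "incr b -1".split()]

# Second pass indexes into that integer and fails.
try:
    second_pass = [parse_value(w) for w in first_pass]
    outcome = "ok"
except TypeError as err:
    outcome = "TypeError"
    print("second pass failed:", err)
```

Indexing an integer with `word[0]` raises `TypeError`, so any already-converted number in a re-parsed command is enough to stop the script.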
So let's make another small change:

```python
def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word
```

Now it should finally run and print "Yes!!", but still. Our language just won't get very far without some means to type literal lists, strings, or anything that can work as blocks of code. Good thing it doesn't need to.

Well, there is something we can do. For one thing, notice how `run_command` happily accepts any words. Like for example those stored by `do_lappend`. We just need to tell them about each other:

```python
def do_eval(*words):
    run_command(*words)

commands["eval"] = do_eval
```

Just like that, you can now run code like:

```
lappend thumbs-up echo Okay, \$name !
set name Venus
eval $thumbs-up
set name Steve
eval $thumbs-up
```

and get (you've guessed it):

```
Okay, Venus !
Okay, Steve !
```

Except... it doesn't solve the quoting problem, which was the whole point. We're also going to need some way to tell the interpreter: "set aside any code you see from now on, until further notice, without any parsing". Which in turn means changing `run_script` again:

```python
compiling = False
proc_name = ""
proc_code = []

def run_script(text):
    global compiling, proc_name, proc_code
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        elif not compiling:
            run_command(line.split())
        elif line.strip() != "end":
            proc_code.append(line.split())
        else:
            variables[proc_name] = list(proc_code)
            proc_name = ""
            proc_code.clear()
            compiling = False
```

Note how procedures (as we're going to call them) are stored alongside variables, because they're not the same as commands. They don't take arguments for one thing.
So we'll need some other way to run them:

```python
def do_call(name):
    for i in variables[name]:
        run_command(i)

commands["call"] = do_call
```

But we still haven't given the interpreter a way to start compiling in the first place:

```python
def do_proc(name):
    global compiling, proc_name
    compiling = True
    proc_name = name

commands["proc"] = do_proc
```

Ta-da! Now we can finally run code that looks like,

```
proc count-up
incr counter
echo $counter
end

set counter 0
call count-up
call count-up
call count-up
```

Note how there's no need for a dollar sign in front of a procedure name when calling it. That's one less way to get in trouble.

So now the `foreach` command can do more than one thing in the loop body. Still in a clunky way however, and frankly procedures are a poor fit for the language. Worse, sooner or later we're going to want stuff like while loops, and it's just not possible, the way this works so far. So how about we set this first interpreter aside and start over briefly:

```python
variables = {}
commands = {}

code = []
labels = {}
program_counter = 0

def load_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        else:
            code.append(line.split())

def run_script():
    global program_counter
    program_counter = 0
    while program_counter < len(code):
        run_command(code[program_counter])
        program_counter += 1
```

You can copy-paste `run_command` and `parse_value` from the original version unchanged. Same for the three commands needed to run the first example, or `lappend` for that matter. Even `foreach` still works like before, so why did we bother? Because now we can also do this:

```
set d 1
label start
echo $d
incr d
test $d > 10
iffalse goto start
```

How?
With just two more commands:

```python
def do_label(name):
    global program_counter
    labels[name] = program_counter

def do_goto(name):
    global program_counter
    program_counter = labels[name]

commands["label"] = do_label
commands["goto"] = do_goto
```

Now you see why the new `run_script` relies on a while loop, and why the program counter is visible to the entire interpreter. Speaking of which: note how after a jump, execution continues with the line *after* the label. It can still happen for a label command to be called again, but only if we jump to another label placed before it. Which won't change anything, at least unless the code changes in the meantime. But hopefully no one is going to try while it's running!

How ironic. I've shown you two different ways to do this, and I'm still only three quarters into the intended word count. So let's take the time to look at another issue. Without a way to compose commands, how are we going to, say, take the size of a list and do something with it? Why, it's perfectly possible for a command to place results in predefined variables instead of returning them:

```python
def do_length(value):
    variables["#"] = len(value)

commands["length"] = do_length
```

That's right... we can name a variable anything we want. Try it:

```
length $c
echo $#
length hello
echo $#
```

In fact, list / string length is such an often used operation that Lua for example has an operator for it. And we can, too:

```python
def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word[0] == "#":
        return len(variables[word[1:]])
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word
```

Though of course the `echo #c` syntax only works for variables, so `length` is still good to have around.

So this is it. No, really: pretty much everything else is a detail. And it's very easy to add new commands; beware of overdoing it!
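For reference, here is one way the pieces of the second interpreter might fit together in a single file. It's a sketch assembled from the listings in this article, trimmed to the commands the counting loop needs, and not the version used in production. The sample script at the bottom is the `label` / `goto` loop from earlier.

```python
variables = {}
commands = {}
code = []
labels = {}
program_counter = 0

def load_script(text):
    # Store every non-blank line as a list of unparsed words.
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        code.append(line.split())

def run_script():
    global program_counter
    program_counter = 0
    while program_counter < len(code):
        run_command(code[program_counter])
        program_counter += 1

def run_command(words):
    words = [parse_value(i) for i in words]
    commands[words[0]](*words[1:])

def parse_value(word):
    if type(word) != str:
        return word  # already parsed on an earlier pass
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

def do_set(name, value):
    variables[name] = value

def do_incr(name, value=1):
    variables[name] += value

def do_echo(*args):
    print(*args)

def do_test(op1, test, op2):
    if test == ">":
        variables["*TEST*"] = (op1 > op2)
    else:
        print("Unknown operator in test:", test)

def do_iffalse(*words):
    if "*TEST*" not in variables:
        print("iffalse without test")
    elif not variables["*TEST*"]:
        run_command(words)

def do_label(name):
    # Remember the index of the label line itself.
    labels[name] = program_counter

def do_goto(name):
    # run_script adds 1 afterwards, so execution resumes after the label.
    global program_counter
    program_counter = labels[name]

commands.update({
    "set": do_set, "incr": do_incr, "echo": do_echo,
    "test": do_test, "iffalse": do_iffalse,
    "label": do_label, "goto": do_goto,
})

load_script("""
set d 1
label start
echo $d
incr d
test $d > 10
iffalse goto start
""")
run_script()
```

Running it counts from 1 to 10, one number per line, and leaves `d` at 11 once the test finally succeeds.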
That said, after reading the first draft, a friend (hi, Adrick!) asked what the point of this exercise was. Not that it's supposed to have a point. But it does. It was important for this language to be usable by non-programmers and still have a tiny implementation. Neither a Lisp-like nor a Forth-like would have fit the bill, whereas this still unnamed language was praised for proving more intuitive than JavaScript.

And the original version used in production was very much improvised. I should update it using insights from this article at some point. In the meantime, enjoy, and dare to be different.