💾 Archived View for gemini.ctrl-c.club › ~stack › gemlog › 2022-09-28.extreme.forth.gmi captured on 2023-05-24 at 18:40:42. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Forth is a curious animal, especially when stripped down to the minimum.
Forth has no syntax. The input is a list of words and literals. Tokenizing and compiling the source is normally a linear process (procedure calls just compile tokens that invoke the procedure when executed); however, immediate words can take over the parsing process and do whatever they want.
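A rough Python sketch of that linear process, with one immediate word taking over the input. The names compile_source and paren_comment, and the ("LIT", n) token shape, are hypothetical illustrations, not taken from any particular Forth.

```python
# Sketch of a linear, one-pass compiler with immediate words.
# compile_source, paren_comment and the ("LIT", n) token are hypothetical.

def compile_source(source, words, immediates):
    """Compile whitespace-separated source into a flat token list.

    words:      names of ordinary words; each compiles to one token.
    immediates: name -> function(tokens, stream), run at compile time.
    """
    tokens = []
    stream = iter(source.split())
    for name in stream:
        if name in immediates:
            # An immediate word takes over: it may read ahead in the same
            # input stream and append whatever it likes to the tokens.
            immediates[name](tokens, stream)
        elif name in words:
            tokens.append(name)                 # one word -> one token
        else:
            tokens.append(("LIT", int(name)))  # otherwise it must be a number
    return tokens


def paren_comment(tokens, stream):
    """Immediate word "(": skip input up to ")" and compile nothing."""
    for name in stream:
        if name == ")":
            break


code = compile_source("2 3 + ( add them ) .", {"+", "."}, {"(": paren_comment})
print(code)  # [('LIT', 2), ('LIT', 3), '+', '.']
```

Ordinary words map one-to-one onto tokens; only the immediate word "(" does something other than append itself.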
It is tempting to create a higher-level syntax with words like IF ... THEN, and even ELSE, which fuckles the address IF leaves on the stack during compilation and rearranges the jump targets. This is cool when you first see it, but...
Without such contrivances, Forth is decompilable to its original source: the codestream is equivalent to the source! But words like IF are not tokens; they compile conditional jumps, which are pretty ugly to look at, especially when ELSE is involved.
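To make the ugliness concrete, here is a Python sketch of IF, ELSE and THEN as compile-time words that emit 0BRANCH and BRANCH and back-patch their targets. The [opcode, target] token layout and the name compile_with_if are assumptions for illustration only.

```python
# Sketch: IF / ELSE / THEN compiled into 0BRANCH / BRANCH tokens with
# back-patched targets. A branch token is a hypothetical [opcode, target]
# pair, where target is an index into the token list.

def compile_with_if(source):
    tokens = []
    pending = []  # compile-time stack of branch slots awaiting a target
    for name in source.split():
        if name == "IF":
            pending.append(len(tokens))
            tokens.append(["0BRANCH", None])    # target not known yet
        elif name == "ELSE":
            if_slot = pending.pop()
            pending.append(len(tokens))
            tokens.append(["BRANCH", None])     # true path jumps over else part
            tokens[if_slot][1] = len(tokens)    # false path lands here
        elif name == "THEN":
            tokens[pending.pop()][1] = len(tokens)  # resolve the last jump
        else:
            tokens.append(name)
    return tokens


print(compile_with_if("x IF a ELSE b THEN c"))
# ['x', ['0BRANCH', 4], 'a', ['BRANCH', 5], 'b', 'c']
```

IF, ELSE and THEN leave no tokens of their own, so a decompiler would have to reconstruct them from the branch targets.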
Lisp has parentheses, which are not syntactic sugar: each parenthesised expression is actually a list, at least until a modern Lisp compiles it down to code. Originally, Lisp systems interpreted these lists directly... It annoys the hell out of me when Lisp newbies complain about the parentheses; the whole point of Lisp is having these wonderful parentheses! Every couple of years some idiot comes up with a way 'to eliminate the parentheses'... God help us all.
Take the IF example. In Lisp, it is a list of three or four items: the symbol IF, the test, the true clause and, if it exists, the else clause. There is no ambiguity or mystery there. A dumb printer can show the internal representation as source. Tokenized Forth does not mark the boundaries of the IF structure; they can be inferred from the 0BRANCH and BRANCH targets, but that requires knowledge of the inner workings of IF.
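A small Python sketch of the Lisp side, with nested lists standing in for Lisp lists. The printer show() is a hypothetical helper that knows nothing about IF, yet prints the structure straight back as source.

```python
# Sketch: code as nested lists, Lisp style. show() is a hypothetical
# "dumb printer": it knows nothing about IF, yet round-trips the source.

def show(form):
    """Render a nested list as an s-expression."""
    if isinstance(form, list):
        return "(" + " ".join(show(item) for item in form) + ")"
    return str(form)


expr = ["if", ["<", "x", 0], ["-", "x"], "x"]  # (if (< x 0) (- x) x)
print(len(expr), expr[0])  # 4 if
print(show(expr))          # (if (< x 0) (- x) x)
```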
So it is not unreasonable to say: don't do things like that. The token stream should decompile to the original source, more or less (we do lose comments, dereferenced constants lose their names, etc., but that is life).
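A Python sketch of what "decompiles to source" means for a straight-line token stream: each token maps back to exactly one word. The functions compile_line and decompile and the ("LIT", n) token are hypothetical.

```python
# Sketch: a straight-line token stream decompiles back to its source.
# compile_line, decompile and the ("LIT", n) token are hypothetical.

def compile_line(source):
    """Each word becomes one token; integers become ("LIT", n)."""
    tokens = []
    for name in source.split():
        try:
            tokens.append(("LIT", int(name)))
        except ValueError:
            tokens.append(name)
    return tokens


def decompile(tokens):
    """Inverse of compile_line: one token back to one word."""
    return " ".join(str(t[1]) if isinstance(t, tuple) else t for t in tokens)


src = "DUP 2 * SWAP 10 +"
print(decompile(compile_line(src)) == src)  # True
print(decompile(compile_line("007   1+")))  # 7 1+
```

The round trip is "more or less": runs of whitespace collapse and a literal written as 007 comes back as 7, in the same spirit as the lost comments and constant names.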
Such minimalism may seem brutal, but there are ways to deal with it, and perhaps we are better for it. ColorForth is one example of a system with brutal constraints, although I mostly disagree with the design decisions taken by Chuck Moore.
Take the FOR ... NEXT loop, for instance. It is tokenized as <FOR> ... <NEXT>, and its body is executed at least once. The implementation is simple: <FOR> puts the count and the IP onto the stack, and <NEXT> decrements the count, branching back to the stored IP if the count is not zero. When the count reaches zero, <NEXT> cleans up the stack and execution continues.
Now, it will execute at least once, and some people will bitch and moan. If your system allows such branches, you could <0BRANCH> over the loop.
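A Python sketch of such a token interpreter, with both the plain loop and one guarded by <0BRANCH>. Assumptions: the count and loop-start IP go on a separate return stack, branch tokens are (opcode, target) tuples, and the tiny primitive set is made up for the example.

```python
# Sketch of a token interpreter with <FOR> ... <NEXT>. Assumptions: the
# count and loop-start IP go on a separate return stack, branch tokens
# are (opcode, target) tuples, and the primitive set is made up.

def run(tokens, data):
    rstack = []  # [count, loop_start_ip] for each active FOR loop
    ip = 0
    while ip < len(tokens):
        tok = tokens[ip]
        ip += 1
        if isinstance(tok, int):
            data.append(tok)                     # literal
        elif isinstance(tok, tuple):
            op, target = tok
            if op == "<BRANCH>":
                ip = target                      # unconditional jump
            elif data.pop() == 0:                # <0BRANCH>: pop the flag
                ip = target                      # and jump if it is zero
        elif tok == "<FOR>":
            rstack.append([data.pop(), ip])      # count and IP of loop body
        elif tok == "<NEXT>":
            loop = rstack[-1]
            loop[0] -= 1
            # "> 0" rather than "!= 0": a starting count of 0 then runs
            # the body once instead of counting down forever.
            if loop[0] > 0:
                ip = loop[1]                     # go round again
            else:
                rstack.pop()                     # clean up and fall through
        elif tok == "+":
            data.append(data.pop() + data.pop())
        elif tok == "DUP":
            data.append(data[-1])
        elif tok == "DROP":
            data.pop()
        else:
            raise ValueError(f"unknown token {tok!r}")
    return data


# ( acc n -- acc+5n ) with the body always run at least once
PLAIN = ["<FOR>", 5, "+", "<NEXT>"]

GUARDED = [
    "DUP", ("<0BRANCH>", 7),   # count of zero: skip the whole loop
    "<FOR>", 5, "+", "<NEXT>",
    ("<BRANCH>", 8),           # normal exit jumps over the DROP
    "DROP",                    # index 7: discard the zero count
]                              # index 8: end of the token list

print(run(PLAIN, [0, 0]), run(GUARDED, [0, 0]), run(GUARDED, [0, 3]))
# [5] [0] [15]
```

The guarded version needs a DUP, a DROP and an extra <BRANCH> around a two-token loop body, which is exactly the kind of clutter a branch-free style avoids.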
Some extreme Forthwrights go a step further: they do not allow conditional branches at all, and use a conditional return instead. That sounds extreme, but in a tokenized Forth stream with no interruptions for heads, code like this is not that hard to deal with.
The cost of a call in a token interpreter, especially my interpreter, is low, so separating conditionals and loops into quasi-subroutines which have conditional returns is a viable strategy.
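A Python sketch of the branch-free style: words that use only calls and a conditional return. The token "?EXIT" (return if the flag is non-zero), the primitives, and the words ABS and SUMTO are hypothetical, chosen for illustration; this is not any particular interpreter.

```python
# Sketch: no branches at all, only calls and a conditional return.
# Assumptions: "?EXIT" (return if the flag is non-zero), the primitive
# set and the word bodies below are hypothetical, chosen for illustration.

def swap(s):
    s[-1], s[-2] = s[-2], s[-1]

PRIMS = {
    "DUP":    lambda s: s.append(s[-1]),
    "SWAP":   swap,
    "OVER":   lambda s: s.append(s[-2]),
    "+":      lambda s: s.append(s.pop() + s.pop()),
    "1-":     lambda s: s.append(s.pop() - 1),
    "NEGATE": lambda s: s.append(-s.pop()),
    "0=":     lambda s: s.append(-1 if s.pop() == 0 else 0),  # Forth true is -1
    "0<":     lambda s: s.append(-1 if s.pop() < 0 else 0),
}

WORDS = {
    # : ABS ( n -- |n| )  DUP 0< 0= ?EXIT NEGATE ;
    "ABS": ["DUP", "0<", "0=", "?EXIT", "NEGATE"],
    # : SUMTO ( acc n -- acc+n+...+1 0 )  DUP 0= ?EXIT SWAP OVER + SWAP 1- SUMTO ;
    "SUMTO": ["DUP", "0=", "?EXIT", "SWAP", "OVER", "+", "SWAP", "1-", "SUMTO"],
}


def run(word, data):
    rstack = []                      # saved (code, ip) pairs of callers
    code, ip = WORDS[word], 0
    while True:
        if ip == len(code):          # end of a word: return to the caller
            if not rstack:
                return data
            code, ip = rstack.pop()
            continue
        tok = code[ip]
        ip += 1
        if isinstance(tok, int):
            data.append(tok)
        elif tok == "?EXIT":
            if data.pop() != 0:
                ip = len(code)       # conditional return: skip to the end
        elif tok in PRIMS:
            PRIMS[tok](data)
        else:                        # a call is just a push and a jump
            rstack.append((code, ip))
            code, ip = WORDS[tok], 0


print(run("ABS", [-5]), run("ABS", [7]), run("SUMTO", [0, 4]))
# [5] [7] [10, 0]
```

SUMTO loops by calling itself, so every iteration pushes a return-stack frame in this sketch; a system could treat a call in tail position as a jump to avoid that, but this sketch does not.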
I am excited about trying some of these extreme techniques out.