💾 Archived View for gemini.ctrl-c.club › ~stack › gemlog › 2022-10-02.octoforth.gmi captured on 2023-04-26 at 13:47:13. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

OctoForth progress

Last month I fell into a Forth rabbit hole and never came out.

Having implemented a quick toy i386 Forth, in the hope of having a tool for an i386 OS written in assembly, I had so much fun reacquainting myself with the 32-bit i386 (and its limitations, viewed as advantages this time) that I just kept plowing ahead.

Over the last week or so I revived my old sliding-window interpreter, which lets me use 8-bit bytecodes without tying me to a fixed assignment: instead, it automatically adjusts the indirection table at compile time. It is a truly awesome idea, and amazingly, no one else has used it. It is a unique privilege to be (apparently) the only one in the world doing something.
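
The post does not spell out how the window slides, so what follows is only a guess at what such a scheme could look like, written in Python rather than i386 assembly. It assumes that a token is an offset into a 256-slot window of one long indirection table, and that the window's base is derived from the address of the code being compiled or executed. The names `window_base`, `compile_ref`, `resolve` and the `STEP` constant are all made up for illustration.

```python
WINDOW = 256  # an 8-bit token can select one of 256 slots
STEP = 16     # assumption: the window slides one slot per 16 bytes of code


def window_base(ip):
    """First table slot visible from code address ip (hypothetical rule)."""
    return ip // STEP


def compile_ref(table, ip, target):
    """Return an 8-bit token that refers to target when placed at address ip."""
    base = window_base(ip)
    # Make sure every slot of the current window exists.
    while len(table) < base + WINDOW:
        table.append(None)
    window = table[base:base + WINDOW]
    if target in window:
        return window.index(target)   # reuse an entry that is already in view
    if None not in window:
        raise RuntimeError("no free slot in the current window")
    slot = window.index(None)         # otherwise claim a free slot
    table[base + slot] = target
    return slot


def resolve(table, ip, token):
    """Run-time side: compute the window base, then do one table lookup."""
    return table[window_base(ip) + token]


# The same word compiled at two distant addresses gets its own window entry.
table = []
near = compile_ref(table, 0, "dup")
far = compile_ref(table, 5000, "dup")
```

In this sketch the compiler, not the programmer, decides which slot a word occupies, so no token value is permanently reserved for any particular word. Both `near` and `far` fit in a byte even though the table itself has grown well past 256 entries.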

I suppose there is little interest in threaded interpreters in the first place, and Forths that use bytecodes normally incur an extra performance penalty. However, having put on my who-cares-colored sunglasses, I can run an empty loop counting to one billion in less than 3 seconds on my 10-year-old notebook. That's really not bad for a dynamic interpreted language without any bullshit tricks.

And the beast is up and running! I dubbed it OctoForth, for its 8-bit tokens (and it seems no one else uses that name).

It's already a pretty complete Forth-like language in a few kilobytes. The bytecodes are ridiculously compact, and some words are less than 10 bytes/tokens long.

I am test-driving some new ideas:

As per above, the sliding window token interpreter
Separate heads: the dictionary is now pure code, and symbolic names are really labels. You can run one word directly into the next, which creates amazing opportunities: for instance, putting a <2x> token in front of some code will execute it twice.
No DOCOL: words are a bunch of tokens. Words that start with 0 are CODE words, with machine code following. That eliminates a ton of the overhead of an indirect-threaded Forth, yet this is not a direct-threaded Forth but the opposite, if you know what I mean. This allows me to run words into each other.
Hashes of names are used extensively for searching heads; heads don't even have names in them, but refer to source instead. Simple and efficient, and no more variable-sized names and byte counts and such.
Sources are kept in an append-only form, matching the dictionary and heads. Sources are appended as new definitions are compiled. It's nice to have sources for everything in the dictionary! Although the decompile is 99% as good, so I may drop sources and just keep comments!
No STATE - I always compile, and in interactive mode, immediately execute the compiled tokens, and clean up. So it looks like an interpreter. To define new words, I simply don't execute or clean up. This eliminates a whole bunch of Forth annoyances, such as having different versions of words for compiling and interpreting.
No spaghetti conditionals. All code uses { ... } lambdas, and control structures can run them when appropriate (and as many times as appropriate, for looping structures). Slava laid the foundation for this in Factor.
Extremely decompilable tokens: using lambdas means no more jumping around for control structures. The code is now completely sequential, and it is possible to reason about it without understanding how control structures work.
First-class code: with lambdas it is trivial to create pieces of code and pass them around at will. This allows higher-level functions to operate on code as data, making it possible to do the kinds of things Lisp and Haskell do.
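
To make the "No STATE" and lambda items a little more concrete, here is a toy model in Python. It is not OctoForth's implementation: nested Python lists stand in for compiled tokens, and every name in it (`compile_source`, `execute`, `interpret`, `define`, `WORDS`) is invented for the illustration. What it tries to show is that every line is compiled first, that a `{ ... }` lambda is just one token pushed on the stack, and that control words such as `if` and `times` decide whether and how often to run it.

```python
def compile_source(text):
    """Turn source text into a token list; { ... } becomes a nested list."""
    levels = [[]]
    for word in text.split():
        if word == "{":
            levels.append([])           # start collecting a lambda body
        elif word == "}":
            if len(levels) == 1:
                raise SyntaxError("unmatched }")
            body = levels.pop()
            levels[-1].append(body)     # the whole lambda is a single token
        else:
            levels[-1].append(word)
    if len(levels) != 1:
        raise SyntaxError("unmatched {")
    return levels[0]


def execute(tokens, data, words):
    """Run tokens strictly left to right; there are no jumps."""
    for tok in tokens:
        if isinstance(tok, list):
            data.append(tok)            # a lambda is pushed, not run
        elif tok in words:
            words[tok](data, words)
        else:
            data.append(int(tok))       # anything else must be a number


def w_dup(data, words):
    data.append(data[-1])


def w_add(data, words):
    b = data.pop()
    a = data.pop()
    data.append(a + b)


def w_if(data, words):                  # ( flag lambda -- )
    body = data.pop()
    flag = data.pop()
    if flag:
        execute(body, data, words)


def w_times(data, words):               # ( count lambda -- )
    body = data.pop()
    count = data.pop()
    for _ in range(count):
        execute(body, data, words)


WORDS = {"dup": w_dup, "+": w_add, "if": w_if, "times": w_times}


def interpret(line, data, words):
    """Interactive mode: compile, run at once, then drop the tokens."""
    tokens = compile_source(line)
    execute(tokens, data, words)


def define(name, source, words):
    """Defining a word: compile as usual, but keep the tokens instead."""
    body = compile_source(source)
    words[name] = lambda data, words: execute(body, data, words)
```

With an empty stack, `interpret("1 5 { dup + } times", data, WORDS)` leaves 32 on it, and after `define("double", "dup +", WORDS)` the line `21 double` leaves 42. Both paths go through the same `compile_source`, so there is no separate compiling and interpreting version of any word.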

Whew, that is a lot of new stuff.

Needless to say, I am happy as a pig in shit (if you believe that pigs are happy in shit). Anyway, I will be plowing ahead as I can't stop now. More later.
