💾 Archived View for elektito.com › gemlog › 16-starforth-1 captured on 2024-06-16 at 12:16:22. Gemini links have been rewritten to link to archived content

View Raw

More Information

⬅️ Previous capture (2023-07-10)

-=-=-=-=-=-=-

Starforth: A Minimal Self-Hosting Compiler

This is going to be the beginning of a series of articles on writing a
self-hosting compiler. This is an introductory post. No source code or actual
development is going to be discussed in this installment, or possibly not even
in the next one. But we'll get there...hopefully!
EDIT: Next installment is now available:

Starforth v1: How to Write a Compiler without a Compiler!

I've always been vaguely aware of Forth. It's the language that works with only

a stack. Which also means it uses reverse polish notation, that is, instead of

`2 + 2`, you write `2 2 +`.

My (inaccurate) knowledge of Forth ended there. But then recently I was reading

up on Forth and I realized something. Here's how you define a variable named

`foo` in Forth:

variable foo

Here's how you write `100` to `foo`:

100 foo !

And for reading it:

foo @

`@` and `!` are called "fetch" and "store" in Forth nomenclature. I sorta

understood why you'd need "store", but what about "fetch"? You don't need an

operator to read variables in other languages, do you? Also, if `@` reads a

variable, and puts its value on the stack, what exactly does `foo` put on the

stack?

A bit of experimentation, and it dawned on me: `foo` puts the address of the

variable on the stack, and `@` reads an address and de-references it. Forth has

pointers!

Just like that, Forth found its place in the language hierarchy I keep in my

head. If Lisp, is the highest level language, Forth is the lowest-level one

(except for assembly, obviously).

I liked that so much, that I decided it's time I start a new compiler project.

I'm gonna call it Starforth. For no reason at all!

Goals

I set these goals for myself:

- The compiler should produce machine code. I've written compiler's targeting virtual machines before. This time I want to generate real machine code.

- The compiler should be self-hosting, that is, it should be able to compile its own source code.

- This is going to be an ahead-of-time compiler. While Forths are generally written in an interpreter style known as threaded-code, I've always had a fondness for compilers that do their job and get out of the way. I realized later that there are good reasons for Forths being written the way they are, but the goal stands nonetheless.

- The compiler should be bootstrappable with nothing more than an assembler. I want the starting binary to be as small as possible.

And here are some decisions to make things more concrete:

- The target environment is Linux. Both the compiler, and the binary it produces run on Linux. It would be nice if I can write a version of the compiler capable of producing code that can run on bare metal, but that should be left for another time.

- The compiler will produce 32-bit x86 code. The code is smaller than 64-bit, and more similar to the good old days! Also, 4 gigabytes of memory should be enough for everyone, right? Note that 32-bit binaries can run just fine on Linux, as long as they don't reference any non-existing shared objects, which brings us to the next point...

- The compiler will output self-contained ELF binaries. No shared objects allowed, and libc is not used (neither statically nor dynamically).

Upcoming Series

I've already written Starforth at the time of this writing. You can find the source code on Sourcehut:

https://git.sr.ht/~elektito/starforth

I plan to write a series of articles about the development process, starting from my earliest iterations and going forward from there. I'll be using my memory, which is hopefully still fresh, and my trusty git logs.