💾 Archived View for mediocregopher.com › posts › ginger-its-alive.gmi captured on 2024-12-17 at 10:42:00. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2024-08-18)
-=-=-=-=-=-=-
The new best language for computing fibonacci numbers.
As a kind of Christmas present to myself I took a whole week off of work specifically to dedicate myself to working on ginger.
My concrete goal was to be able to run a ginger program to compute any Nth fibonacci number, a goal I chose because it would require the implementation of conditionals, some kind of looping or recursion, and basic addition/subtraction. In other words, it would require all the elements which comprise a Turing complete language.
And you know what? I actually succeeded!
The implementation can be found here. At this point ginger is an interpreted language running in a golang-based VM. The dream is for it to be self-hosted on LLVM (and other platforms after), but as an intermediate step to that I decided on sticking to what I know (golang) rather than having to learn two things at once.
In this post I'm going to describe the components of this VM at a high level, show a quick demo of it working, and finally talk about the roadmap going forward.
The core package of the whole project is the `graph` package. This package implements a generic directed graph datastructure.
The generic part is worth noting; I was able to take advantage of go's new generics which are currently in beta. I'd read quite a bit on how the generic system would work even before the beta was announced, so I was able to hit the ground running and start using them without much issue.
Ginger's unique graph datastructure has been discussed in previous posts in this series quite a bit, and this latest implementation doesn't deviate much at a high level. Below are the most up-to-date core datatypes and functions which are used to construct ginger graphs:
// Value is any value which can be stored within a Graph. Values should be // considered immutable, ie once used with the graph package their internal // value does not change. type Value interface { Equal(Value) bool String() string } // OpenEdge consists of the edge value (E) and source vertex value (V) of an // edge in a Graph. When passed into the AddValueIn method a full edge is // created. An OpenEdge can also be sourced from a tuple vertex, whose value is // an ordered set of OpenEdges of this same type. type OpenEdge[E, V Value] struct { ... } // ValueOut creates a OpenEdge which, when used to construct a Graph, represents // an edge (with edgeVal attached to it) coming from the vertex containing val. func ValueOut[E, V Value](edgeVal E, val V) *OpenEdge[E, V] // TupleOut creates an OpenEdge which, when used to construct a Graph, // represents an edge (with edgeVal attached to it) coming from the vertex // comprised of the given ordered-set of input edges. func TupleOut[E, V Value](edgeVal E, ins ...*OpenEdge[E, V]) *OpenEdge[E, V] // Graph is an immutable container of a set of vertices. The Graph keeps track // of all Values which terminate an OpenEdge. E indicates the type of edge // values, while V indicates the type of vertex values. type Graph[E, V Value] struct { ... } // AddValueIn takes a OpenEdge and connects it to the Value vertex containing // val, returning the new Graph which reflects that connection. func (*Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V] // ValueIns returns, if any, all OpenEdges which lead to the given Value in the // Graph (ie, all those added via AddValueIn). func (*Graph[E, V]) ValueIns(val Value) []*OpenEdge[E, V]
The current `Graph` implementation is *incredibly* inefficient, it does a lot of copying, looping, and equality checks which could be optimized out one day. That's going to be a recurring theme of this post, as I had to perform a balancing act between actually reaching my goal for the week while not incurring too much tech debt for myself.
There's a final operation I implemented as part of the `graph` package: MapReduce. It's a difficult operation to describe, but I'm going to do my best in this section for those who are interested. If you don't understand it, or don't care, just know that `MapReduce` is a generic tool for transforming graphs.
For a description of `MapReduce` we need to present an example graph:
+<--b--- + \ X <--a--+<--c----+<--f-- A + / + +<---g--- +<--d--+ +<---h--- \ Y <---------e----------- B
Plus signs indicate tuples, and lowercase letters are edge values while upper case letters are vertex values. The pseudo-code to construct this graph in go might look like:
g := new(Graph) fA := ValueOut("f", "A") g = g.AddValueIn( "X", TupleOut( "a", TupleOut("b", fA), TupleOut("c", fA), TupleOut( "d", ValueOut("g", "A"), ValueOut("h", "B"), ), ), ) g = g.AddValueIn("e", "B")
As can be seen in the code, `MapReduce`'s first argument is an `OpenEdge`, *not* a `Graph`. Fundamentally `MapReduce` is a reduction of the *dependencies* of a particular value into a new value; to reduce the dependencies of multiple values at the same time would be equivalent to looping over those values and calling `MapReduce` on each individually. Having `MapReduce` only deal with one edge at a time is more flexible.
So let's focus on a particular `OpenEdge`, the one leading into `X` (returned by `TupleOut("a", etc...)`. `MapReduce` is going to descend into this `OpenEdge` recursively, in order to first find all value vertices (ie the leaf vertices, those without any children of their own).
At this point `MapReduce` will use its second argument, the `mapVal` function, which accepts a value of one type and returns a value of another type. This function is called on each value from every value vertex encountered. In this case both `A` and `B` are connectable from `X`, so `mapVal` will be called on each *only once*. This is the case even though `A` is connected to multiple times (once with an edge value of `f`, another with an edge value of `b`). `mapVal` only gets called once per vertex, not per connection.
With all values mapped, `MapReduce` will begin reducing. For each edge leaving each value vertex, the `reduceEdge` function is called. `reduceEdge` accepts as arguments the edge value of the edge and the *mapped value* (not the original value) of the vertex, and returns a new value of the same type that `mapVal` returned. Like `mapVal`, `reduceEdge` will only be called once per edge. In our example, `<--f--A` is used twice (`b` and `c`), but `reduceEdge` will only be called on it once.
With each value vertex edge having been reduced, `reduceEdge` is called again on each edge leaving *those* edges, which must be tuple edges. An array of the values returned from the previous `reduceEdge` calls for each of the tuples' input edges is used as the value argument in the next call. This is done until the `OpenEdge` is fully reduced into a single value.
To flesh out our example, let's imagine a `mapVal` which returns the input string repeated twice, and a `reduceEdge` which returns the input values joined with the edge value, and then wrapped with the edge value (eg `reduceEdge(a, [B, C]) -> aBaCa`).
Calling `MapReduce` on the edge leading into `X` will then give us the following calls:
# Map the value vertices mapVal(A) -> AA mapVal(B) -> BB # Reduce the value vertex edges reduceEdge(f, [AA]) -> fAAf reduceEdge(g, [AA]) -> gAAg reduceEdge(h, [BB]) -> hBBh # Reduce tuple vertex edges reduceEdge(b, [fAAf]) -> bfAAfb reduceEdge(c, [fAAf]) -> cfAAfc reduceEdge(d, [gAAg, hBBh]) -> dgAAgdhBBhd reduceEdge(a, [bfAAfb, cfAAfc, dgAAgdhBBhd]) -> abfAAfbacfAAfcadgAAgdhBBhda
Beautiful, exactly what we wanted.
`MapReduce` will prove extremely useful when it comes time for the VM to execute the graph. It enables the VM to evaluate only the values which are needed to produce an output, and to only evaluate each value once no matter how many times it's used. `MapReduce` also takes care of the recursive traversal of the `Graph`, which simplifies the VM code significantly.
With a generic graph implementation out of the way, it was then required to define a specific implementation which could be parsed from a file and later used for execution in the VM.
The file extension used for ginger code is `.gg`, as in "ginger graph" (of course). The package name for decoding this file format is, therefore, also called `gg`.
The core datatype for the `gg` package is the `Value`, since the `graph` package takes care of essentially everything else in the realm of graph construction and manipulation. The type definition is:
// Value represents a value which can be serialized by the gg text format. type Value struct { // Only one of these fields may be set Name *string Number *int64 Graph *Graph // Optional fields indicating the token which was used to construct this // Value, if any. LexerToken *LexerToken } type Graph = graph.Graph[Value, Value] // type alias for convenience
Note that it's currently only possible to describe three different types in a `gg` file, and one of them is the `Graph`! These are the only ones needed to implement a fibonacci function, so they're all I implemented.
The lexing/parsing of `gg` files is not super interesting, you can check out the package code for more details. The only other thing worth noting is that, for now, all statements are required to end with a `;`. I had originally wanted to be less strict with this, and allow newlines and other tokens to indicate the end of statements, but it was complicating the code and I wanted to move on.
Another small thing worth noting is that I decided to make each entire `.gg` file implicitly define a graph. So you can imagine each file's contents wrapped in curly braces.
With the `gg` package out of the way I was able to finally parse ginger programs! The following is the actual, real-life implementation of the fibonacci function (though at this point it didn't actually work, because the VM was still not implemented:
out = { decr = { out = add < (in; -1;); }; n = tupEl < (in; 0;); a = tupEl < (in; 1;); b = tupEl < (in; 2;); out = if < ( isZero < n; a; recur < ( decr < n; b; add < (a;b;); ); ); } < (in; 0; 1;);
Finally, the meat of all this. If the `graph` and `gg` packages are the sturdy, well constructed foundations of a tall building, then the `vm` package is the extremely long, flimsy stick someone propped up vertically so they could say they built a structure of impressive height.
In other words, it's very likely that the current iteration of the VM will not be long for this world, and so I won't waste time describing it in super detail.
What I will say about it is that within the `vm` package I've defined a new `Value` type, which extends the one defined in `gg`. The necessity of this was that there are types which cannot be represented syntactically in a `.gg` file, but which *can* be used as values within a program being run.
The first of these is the `Operation`, which is essentially a first-class function. The VM will automatically interpret a graph as an `Operation` when it is used as an edge value, as has been discussed in previous posts, but there are also built-in operations (like `if` and `recur`) which cannot be represented as datastructures, and so it was necessary to introduce a new in-memory type to properly represent operations.
The second is the `Tuple` type. This may seem strange, as ginger graphs already have a concept of a tuple. But the ginger graph tuple is a *vertex type*, not a value type. The distinction is small, but important. Essentially the graph tuple is a structural element which describes how to create a tuple value, but it is not yet that value. So we need a new Value type to hold the tuple once it *has* been created during runtime.
Another thing worth describing about the `vm` package, even though I think they might change drastically, are `Thunk`s:
// Thunk is returned from the performance of an Operation. When called it will // return the result of that Operation having been called with the particular // arguments which were passed in. type Thunk func() (Value, error)
The term "thunk" is borrowed from Haskell, which I don't actually know so I'm probably using it wrong, but anyway...
A thunk is essentially a value which has yet to be evaluated; the VM knows exactly *how* to evaluate it, but it hasn't done so yet. The primary reason for their existence within ginger is to account for conditionals, ie the `if` operation. The VM can't evaluate each of an `if`'s arguments all at once, it must only evaluate the first argument (to obtain a boolean), and then based on that evaluate the second or third argument.
This is where `graph.MapReduce` comes in. The VM uses `graph.MapReduce` to reduce each edge in a graph to a `Thunk`, where the `Thunk`'s value is based on the operation (the edge's value) and the inputs to the edge (which will themselves be `Thunk`s). Because each `Thunk` represents a potential value, not an actual one, the VM is able to completely parse the program to be executed (using `graph.MapReduce`) while allowing conditionals to still work correctly.
EvaluateEdge is where all that happens, if you're interested, but be warned that the code is a hot mess right now and it's probably not worth spending a ton of time understanding it as it will change a lot.
A final thing I'll mention is that the `recur` operation is, I think, broken. Or probably more accurately, the entire VM is broken in a way which prevents `recur` from working correctly. It *does* produce the correct output, so I haven't prioritized debugging it, but for any large number of iterations it takes a very long time to run.
Finally, to show it off! I put together a super stupid `eval` binary which takes two arguments: a graph to be used as an operation, and a value to be used as an argument to that operation. It doesn't even read the code from a file, you have to `cat` it in.
The README documents how to run the demo, so if you'd like to do so then please clone the repo and give it a shot! It should look like this when you do:
# go run ./cmd/eval/main.go "$(cat examples/fib.gg)" 8 21
You can put any number you like instead of `8`, but as mentioned, `recur` is broken so it can take a while for larger numbers.
The following are all the things I'd like to address the next time I work on ginger:
* Allow for newlines (and `)` and `}`) to terminate statements, not just `;`.
* Allow names to have punctuation characters in them (maybe?).
* Don't read all tokens into memory prior to parsing.
* Fix `recur`.
* Implement tail call optimization.
* A bit of polish on the `eval` tool.
* Expose graph creation, traversal, and transformation functions as builtins.
* Create plan (if not actually implement it yet) for how code will be imported from one file to another. Namespacing in general will fall into this bucket.
* Create plan (if not actually implement it yet) for how users can extend/replace the lexer/parser.
I don't know *when* I'll get to work on these next, ginger will come back up in my rotation of projects eventually. It could be a few months. In the meantime I hope you're as excited about this progress as I am, and if you have any feedback I'd love to hear it.
Thanks for reading!
-----
Published 2021-12-31