💾 Archived View for mediocregopher.com › posts › ginger.gmi captured on 2024-09-29 at 00:00:53. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2024-08-18)
-=-=-=-=-=-=-
Yes, it does exist.
This post is about a programming language that's been bouncing around in my head for a *long* time. I've tried to actually implement the language three or more times now, but everytime I get stuck or run out of steam. It doesn't help that everytime I try again the form of the language changes significantly. But all throughout the name of the language has always been "Ginger". It's a good name.
In the last few years the form of the language has somewhat solidified in my head, so in lieu of actually working on it I'm going to talk about what it currently looks like.
Here's a hello world program in my favorite ASL language, brainfuck:
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.++ +.------.--------.>>+.>++.
(If you've never seen brainfuck, it's deliberately unintelligible. But it *is* an ASL, each character representing a single command, executed by the brainfuck runtime from left to right.)
ASLs did the job at the time, but luckily we've mostly moved on past them.
Eventually programmers upgraded to C-like languages. Rather than a sequence of commands, these languages were syntactically represented by an "abstract syntax tree" (AST). Rather than executing commands in essentially the same order they are written, an AST language compiler reads the syntax into a tree of syntax nodes. What it then does with the tree is language dependent.
Here's a program which outputs all numbers from 0 to 9 to stdout, written in (slightly non-idiomatic) Go:
i := 0 for { if i == 10 { break } fmt.Println(i) i++ }
When the Go compiler sees this, it's going to first parse the syntax into an AST. The AST might look something like this:
(root) |-(:=) | |-(i) | |-(0) | |-(for) |-(if) | |-(==) | | |-(i) | | |-(10) | | | |-(break) | |-(fmt.Println) | |-(i) | |-(++) |-(i)
Each of the non-leaf nodes in the tree represents an operation, and the children of the node represent the arguments to that operation, if any. From here the compiler traverses the tree depth-first in order to turn each operation it finds into the appropriate machine code.
There's a sub-class of AST languages called the LISP ("LISt Processor") languages. In a LISP language the AST is represented using lists of elements, where the first element in each list denotes the operation and the rest of the elements in the list (if any) represent the arguments. Traditionally each list is represented using parenthesis. For example `(+ 1 1)` represents adding 1 and 1 together.
As a more complex example, here's how to print numbers 0 through 9 to stdout using my favorite (and, honestly, only) LISP, Clojure:
(doseq [n (range 10)] (println n))
Much smaller, but the idea is there. In LISPs there is no differentiation between the syntax, the AST, and the language's data structures; they are all one and the same. For this reason LISPs generally have very powerful macro support, wherein one uses code written in the language to transform code written in that same language. With macros users can extend a language's functionality to support nearly anything they need to, but because macro generation happens *before* compilation they can still reap the benefits of compiler optimizations.
The ASL (assembly) is essentially just a thin layer of human readability on top of raw CPU instructions. It does nothing in the way of representing code in the way that humans actually think about it (relationships of types, flow of data, encapsulation of behavior). The AST is a step towards expressing code in human terms, but it isn't quite there in my opinion. Let me show why by revisiting the Go example above:
i := 0 for { if i > 9 { break } fmt.Println(i) i++ }
When I understand this code I don't understand it in terms of its syntax. I understand it in terms of what it *does*. And what it does is this:
This behavior could be further abstracted into the original problem statement, "it prints numbers 0 through 9 to stdout", but that's too general, as there are different ways for that to be accomplished. The Clojure example first defines a list of numbers 0 through 9 and then iterates over that, rather than looping over a single number. These differences are important when understanding what code is doing.
So what's the problem? My problem with ASTs is that the syntax I've written down does *not* reflect the structure of the code or the flow of data which is in my head. In the AST representation if you want to follow the flow of data (a single number) you *have* to understand the semantic meaning of `i` and `:=`; the AST structure itself does not convey how data is being moved or modified. Essentially, there's an extra implicit transformation that must be done to understand the code in human terms.
In my view the next step is towards using graphs rather than trees for representing our code. A graph has the benefit of being able to reference "backwards" into itself, where a tree cannot, and so can represent the flow of data much more directly.
I would like Ginger to be an ASG language where the language is the graph, similar to a LISP. But what does this look like exactly? Well, I have a good idea about what the graph *structure* will be like and how it will function, but the syntax is something I haven't bothered much with yet. Representing graph structures in a text file is a problem to be tackled all on its own. For this post we'll use a made-up, overly verbose, and probably non-usable syntax, but hopefully it will convey the graph structure well enough.
All graphs have nodes, where each node contains a value. A single unique value can only have a single node in a graph. Nodes are connected by edges, where edges have a direction and can contain a value themselves.
In the context of Ginger, a node represents a value as expected, and the value on an edge represents an operation to take on that value. For example:
5 -incr-> n
`5` and `n` are both nodes in the graph, with an edge going from `5` to `n` that has the value `incr`. When it comes time to interpret the graph we say that the value of `n` can be calculated by giving `5` as the input to the operation `incr` (increment). In other words, the value of `n` is `6`.
What about operations which have more than one input value? For this Ginger introduces the tuple to its graph type. A tuple is like a node, except that it's anonymous, which allows more than one to exist within the same graph, as they do not share the same value. For the purposes of this blog post we'll represent tuples like this:
1 -> } -add-> t 2 -> }
`t`'s value is the result of passing a tuple of two values, `1` and `2`, as inputs to the operation `add`. In other words, the value of `t` is `3`.
For the syntax being described in this post we allow that a single contiguous graph can be represented as multiple related sections. This can be done because each node's value is unique, so when the same value is used in disparate sections we can merge the two sections on that value. For example, the following two graphs are exactly equivalent (note the parenthesis wrapping the graph which has been split):
1 -> } -add-> t -incr-> tt 2 -> }
( 1 -> } -add-> t 2 -> } t -incr-> tt )
(`tt` is `4` in both cases.)
A tuple with only one input edge, a 1-tuple, is a no-op, semantically, but can be useful structurally to chain multiple operations together without defining new value names. In the above example the `t` value can be eliminated using a 1-tuple.
1 -> } -add-> } -incr-> tt 2 -> }
When an integer is used as an operation on a tuple value then the effect is to output the value in the tuple at that index. For example:
1 -> } -0-> } -incr-> t 2 -> }
(`t` is `2`.)
When a value sits on an edge it is used as an operation on the input of that edge. Some operations will no doubt be builtin, like `add`, but users should be able to define their own operations. This can be done using the `in` and `out` special values. When a graph is used as an operation it is scanned for both `in` and `out` values. `in` is set to the input value of the operation, and the value of `out` is used as the output of the operation.
Here we will define the `incr` operation and then use it. Note that we set the `incr` value to be an entire sub-graph which represents the operation's body.
( in -> } -add-> out 1 -> } ) -> incr 5 -incr-> n
(`n` is `6`.)
The output of an operation may itself be a tuple. Here's an implementation and usage of `double-incr`, which increments two values at once.
( in -0-> } -incr-> } -> out } in -1-> } -incr-> } ) -> double-incr 1 -> } -double-incr-> t -add-> tt 2 -> }
(`t` is a 2-tuple with values `2`, and `3`, `tt` is `5.)
The conditional is a bit weird, and I'm not totally settled on it yet. For now we'll use this. The `if` operation expects as an input a 2-tuple whose first value is a boolean and whose second value will be passed along. The `if` operation is special in that it has *two* output edges. The first will be taken if the boolean is true, the second if the boolean is false. The second value in the input tuple, the one to be passed along, is used as the input to whichever branch is taken.
Here is an implementation and usage of `max`, which takes two numbers and outputs the greater of the two. Note that the `if` operation has two output edges, but our syntax doesn't represent that very cleanly.
( in -gt-> } -if-> } -0-> out in -> } -> } -1-> out ) -> max 1 -> } -max-> t 2 -> }
(`t` is `2`.)
It would be simple enough to create a `switch` macro on top of `if`, to allow for multiple conditionals to be tested at once.
Loops are tricky, and I have two thoughts about how they might be accomplished. One is to literally draw an edge from the right end of the graph back to the left, at the point where the loop should occur, as that's conceptually what's happening. But representing that in a text file is difficult. For now I'll introduce the special `recur` value, and leave this whole section as TBD.
`recur` is cousin of `in` and `out`, in that it's a special value and not an operation. It takes whatever value it's set to and calls the current operation with that as input. As an example, here is our now classic 0 through 9 printer (assume `println` outputs whatever it was input):
// incr-1 is an operation which takes a 2-tuple and returns the same 2-tuple // with the first element incremented. ( in -0-> } -incr-> } -> out in -1-> } ) -> incr-1 ( in -eq-> } -if-> out in -> } -> } -0-> } -println-> } -incr-1-> } -> recur ) -> print-range 0 -> } -print-range-> } 10 -> }
This post is long enough, and I think gives at least a basic idea of what I'm going for. The syntax presented here is *extremely* rudimentary, and is almost definitely not what any final version of the syntax would look like. But the general idea behind the structure is sound, I think.
I have a lot of further ideas for Ginger I haven't presented here. Hopefully as time goes on and I work on the language more some of those ideas can start taking a more concrete shape and I can write about them.
The next thing I need to do for Ginger is to implement (again) the graph type for it, since the last one I implemented didn't include tuples. Maybe I can extend it instead of re-writing it. After that it will be time to really buckle down and figure out a syntax. Once a syntax is established then it's time to start on the compiler!
-----
Published 2021-01-09