đŸ’Ÿ Archived View for dcreager.net â€ș stack-graphs â€ș is-it-forth.gmi captured on 2023-07-22 at 16:29:52. Gemini links have been rewritten to link to archived content

View Raw

More Information

-=-=-=-=-=-=-

Should stack graphs just be Forth programs?

2022-07-31

Two anecdotes while chatting with Jason about writing stack graph rules for Rust.

On the one hand, we could possibly “fix” both of these in the small. For (1), add a way to construct partial paths directly, or maybe add a new node type that lets you provide an arbitrary pre- and postcondition. For (2), add those as new node types that you can construct, or implement them via the “arbitrary pre- and postcondition” node type.

But maybe...we should just be writing Forth programs? Is there a way to apply something analogous to the path finding and stitching algorithms to Forth programs? I can see the stack pre- and postcondition translating over pretty well, but stack graphs use the graph structure to encode quite a lot of the underlying semantics. So I'm not sure. But it's an interesting idea!

2022-08-10

Came across a good overview of concatenative programming languages that really reinforces this possibility. In particular, it calls out how functions in a concatenative language are typed in a way that *strongly* resemble a partial path's symbol stack pre- and post-conditions.

Concatenative programming languages

Digging further, the Factor language [Pestov2010] includes a “stack effect” notation, which describes how each word in the language effects the stack. This is also very much like partial path pre- and post-conditions, including a notion of “row variables” that lines up with our symbol stack variables, and the ability for elements of a stack effect to be nested stack effects — modeling how values on the stack can be quotations of code that should be executed later.

Pestov2010

Concatenative languages have multiple stacks; the minimum seems to be two, a control stack and a data stack. I think that in the stack graphs world, the symbol stack corresponds to the data stack, and the scope stack corresponds to the control stack. In a concatenative language, a continuation is “just a snapshot of the current stacks”:

Continuations in concatenative languages

2022-08-11

After talking with Beka, I think that “just write Forth” is the wrong goal to aim for, at least at first, because it conflates too many things. What if we were still using TSG to construct some kind of graph structure, but the _underlying formalism_ that the paths represent is more like the execution formalism of a concatenative language like Forth or Factor?

As mentioned above, in the simplest execution model the data stack corresponds to our symbol stack, and the control stack corresponds to our scope stack. If we were to make this kind of change, we'd need to think about what kinds of values can appear on these stacks.

The scope stack would actually stay mostly the same. The control stack typically holds some kind of “instruction pointer”. For us, nodes still identify states or points of time in the execution of the “program”. So the node identifiers are still a good equivalent to an “instruction pointer”.

But what about the symbol stack? A concatenative data stack contains the values that your program can operate on. We'd need several kinds of values, some of which already exist or have analogues:

With this change, we still need to think about what other “node types” we'd need to model the various stack manipulation primitives that Forth/Factor/etc provide (dup, roll, drop, etc). But I think this model is enough to let us implement callcc.

This also helps crystallize what our partial paths are. They're *not* types like Factor's stack effect annotations. Instead, they're a concise summary of a *set of execution traces*. I.e., there is a programming language down there (and these changes make it properly Turing-complete), but programs in this language aren't written as a set of instructions, they're defined as the set of execution traces (the partial paths) that we want the program to produce. (And in particular, if we *were* to write the programs out in a more traditional instruction-based way, there'd be a lot of use of ‘amb’.)

Still-open questions: