đŸ Archived View for dcreager.net âș swanson âș concatenative.gmi captured on 2024-09-29 at 00:09:32. Gemini links have been rewritten to link to archived content
âŹ ïž Previous capture (2023-09-28)
-=-=-=-=-=-=-
Or, put differently, should Sâ have a stack as its underlying data model, instead of an environment of named values?
This came about from thinking about how stack graphs relates to concatenative languages.
Concatenative programming languages
Should stack graphs just be Forth programs?
The especially interesting bit is what a *linear* concatenative language would look like. Quotations would be the equivalent to our existing closures/invokables, and I think they would definitely still need to have branches.
A closure (method set? menu?) would still be a set of named branches. You'd create one by closing over some number of values at the top of the stack (the values would be popped off and moved into the closure), and providing the list of branches along with their names. Each of the branches would need to accept at least as many inputs as were closed over, as well as an optional list of âadditionalâ inputs. (These correspond to the receiving set in the old semantics.)
You can invoke a closure when it's at top of stack. You provide the name of the branch you want to invoke. Doing so consumes the closure (like before in the environment-based semantics), popping it from the stack. The chosen branch's additional inputs should be at top of stack. Before passing control to the branch's body, the closed-over values are pushed onto the stack.
That means that the code of the branch sees all of its inputs (both closed-over and additional) immediately at top of stack. (The closed-over inputs will be higher than the additional inputs because of the push ordering described above.)
An individual branch could therefore be defined in isolation, and wouldn't need to distinguish between which values were âcontainingâ vs âreceivingâ. It would just operate on whatever values are at the top of the stack, which we'd call âinputsâ.
And that, in turn, means that we wouldn't need to trick we've been doing so far where an âobjectâ's âmethodsâ are defined twice â an internal version with all of its necessary data passed in via a receiving set, and the external version in the object itself delegating to that. (We did that so that methods could more easily call each other.)
As part of this, I'd like to add a notation for immediately invoking a branch â sugar for creating a single-branch closure with no closed-over values, and immediately invoking it.
The âsuffixâ of the stack that each closure doesn't operate on, or need any visibility to, is exactly like the globs that we currently have, but without the complexity of having to ensure that names don't clash between the named portion of an input specification and the glob portion. (That alone seems like a strong argument in favor of this approach.)
One drawback is that before, with environments, values had names, which could easily correspond to source code names/locations. My intent was that this would be helpful in constructing nice error messages. We'll need to think of a replacement âdebug infoâ setup if we move to this model.
When invoking something, the target has to be top-of-stack at the point of the âinvokeâ instruction. But where does the continuation go? Above the inputs, or below?
above: target $return $0 $1 ... below: target $0 $1 $return ...
Either one works from the caller â they can push the arguments and then the return continuation, or the reverse, just as easily. For the callee, âbelowâ is probably easier to consume, since you'll typically need to interact with your inputs before returning to your caller.
The target will typically not already be at top of stack, so the ordering will usually be:
Another option would be to have two stacks (control and data), like is typical in a concatenative language. The control stack would only hold invokables. The data stack would hold anything. You'd create closures onto the data stack. You'd invoke them from the control stack. The âmove to controlâ instruction would pop a closure from the data stack and move it to the control stack. The âjumpâ instruction would pop a closure from the control stack and pass control to it. (You have to provide the name of the branch that you want to jump to.) I can't immediately think of what situations this would make easier, but it's an option to keep in mind.
An interesting tweak to this idea is that Sâ would become concatenative, but Sâ would not. Sâ would still operate on environments of _named_ values, and the translation to Sâ would take care of turning name references into the appropriate stack manipulation statements. Throws a big monkey wrench in the initial bootstrap work I've been doing, but I think it's the right way to go.
I now think the two-stack option is the way to go, since it makes calls easier:
And now I can see why so many stack languages are going with multiple stacks, or stacks of stacks! (Dawn programs, for example, operate on a arbitrary set of named stacks.) It's already a hassle implementing something like âcloneâ, where you have to clone each of the top N items on the stack, but then also collate them together to close over each set in a distinct invokable!
That said, for Swanson, I think there's value in keeping the Sâ IR single-stack. (A single _data_ stack, that is â there would still be a control stack!) The multi-stack paradigm like Dawn uses would be one of many other styles that would be compiled down into single-stack Sâ. Since the goal of Sâ is to be simple, and easily translated into Sâ, I think I'm going to keep it single-stack as well, at least for now.
Update: Ah, but another thing we have to track is the âexpectationsâ of each invocation. At the very least, how many values will be removed and added as part of the call. In the long term, a full session type relating the pre- and postcondition. That will be easier to derive from the existing name-based Sâ syntax. (Not because it uses names to track values on the stack, but because there's an explicit list of inputs and outputs in the call syntax.) Or, we'd need to add some kind of stack effect annotation to the new concatenative syntax.
Need to write up the current notes on a concatenative Sâ, but an important part is that I'll need separate âsatisfiesâ and âunifiesâ relations. TAPL (p. 337) mentions good results in combining type reconstruction with row polymorphism:
The biggest success story in this arena is the elegant account of type reconstruction for record types proposed by Wand (1987) and further developed by Wand (1988, 1989b), Remy (1989, 1990; 1992a, 1992b, 1998), and many others. The idea is to introduce a new kind of variable, called a _row variable_, that ranges not over types but over entire ârowsâ of field labels and associated types. A simple form of _equational unification_ is used to solve constraint sets involving row variables.
Need to track down these papers and do some reading.
TAPL (p. 313)
Jim and Palsberg (1999) address type reconstruction for languages with subtyping and recursive types. As we have done in this chapter, they adopt a coinductive view of the subtype relation over infinite trees and motivate a subtype checking algorithm as a procedure building the minimal simulation (i.e., consistent set, in our terminology) from a given pair of types. They define notions of consistency and P1-closure of a relation over types, which correspond to our consistency and reachable sets.
[Jim1999] Type inference in systems of recursive types with subtyping
Found [Zhou2020], which goes into more detail about recursive types, subtypes, and type inference.