💾 Archived View for dcreager.net › stack-graphs › concatenative-language.gmi captured on 2024-05-12 at 15:41:50. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

A concatenative language for stack graphs

The CC language

CC is a concatenative continuation-passing language. CC programs operate on a stack of data values. Because CC is a continuation-passing language, there is no control stack or runtime call stack. (Though see below how we could implement a ‘call’ instruction with syntactic sugar that wraps the rest of the instruction stream in a quotation.)

There are three kinds of value that can appear on the data stack:

A “literal” is a basic data value. You can compare it for equality, and that's it. (For Code Nav, symbols would be atoms, and possibly also source locations)
A “quotation” represents an arbitrary chunk of code that you can pass control to. (This is a continuation-passing language, so this invocation is one-way.)
A “snapshot” is a sequence of values

P ≜ q
q ≜ i | q;q

A CC program ‘P’ consists of a single quotation. A clause ‘q’ is a sequence of one or more instructions.

The possible instructions are as follows. (Note that some instructions are “terminals”, which means that any instructions appearing after it in a quotation are ignored.)

General stack manipulation

[I] ‘drop’: Pops a value from the top of the stack and drops it.
[I] ‘dup’: Duplicates the value at the top of the stack.
[I] ‘rotate n’: Moves the value at position ‘n’ to the top of the stack. (0 is the current top of stack, 1 is the item just below that, etc. That makes ‘rotate 0’ a noop for this language.)

Atoms

[I] ‘push x’: Pushes a particular literal onto the stack.
[T] ‘ifeq’: Pops two literals and two quotations from the stack. If the literals are equal, passes control to the first quotation; otherwise to the second. If the stack doesn't contain at least 4 values, or if the top two are not both literals, or if the next two are not both quotations, abort.

Quotations

[I] ‘quote [q]’: Pushes a quotation representing the quotation ‘q’ onto the stack.
[T] ‘ijump’: “Indirect jump” Pops a quotation value from the stack and passes control to it. If the stack is empty or top-of-stack is not a quotation, abort.

Snapshot

TODO: Do we need snapshots?

[I] ‘pack’: Pops all values from the top of the stack, wraps them in a new snapshot value, and pushes that snapshot onto the stack.
[I] ‘unpack’: Pops a snapshot from the top of the stack and replaces the current stack with its contents. If the stack is empty or top-of-stack is not a snapshot, aborts.

Closure

A closure is a special tuple, which consists of a quotation and a stack (represented as a tuple). It gives you the ability to throw away your current execution state and resume somewhere else.

[This is more typically called a “continuation” in other concatenative languages, but I don't want to confuse names with the continuation used below in the execution semantics.]

[T] ‘continue-with’: Pops a value and a closure from the top of the stack. Replaces the current stack with the stack in the closure, pushes the value onto the (new) stack, and passes control the closure's quotation.

Syntactic sugar

All of the following instructions don't need to be first-class in the formalism, and can be implemented in terms of other instructions:

[I] ‘pushcc (q)’: Creates a new continuation consisting of the specified quotation and the current stack (which is cleared). Equivalent to ‘pack all; quote (q); pack 2’

[T] ‘abort’: Aborts the current program. Equivalent to (among many other possibilities) ‘push x; pop !x’ for any ‘x’

[T] ‘djump (q)’: “Direct jump” Jumps directly to a quotation. Equivalent to ‘quote (q); ijump’

[T] ‘call; b’: “Call with return” Wraps the rest of the instruction stream into a quotation, pushes that onto the stack, then invokes a quotation. Equivalent to ‘quote (b); rotate 1; ijump’

[T] ‘ite x (qt) (qe)’: If-then-else. Pops a value off of the stack and executes ‘qt’ if that value is ‘x’; executes ‘qe’ otherwise. Equivalent to ‘pop x; qt ⊓ pop !x; qe’ (though you'd have to distribute the ⊓ through the rest of the quotation's branches)

Syntactic sugar: Pairs

We don't need pairs, just store the two elements on the stack.

Syntactic sugar: Tagged unions

We need literals for the tag values (typically ‘LEFT’ and ‘RIGHT’, which is sufficient to construct any other union)

[I] ‘left’: Equivalent to ‘push LEFT’
[I] ‘right’: Equivalent to ‘push RIGHT’
[T] ‘match’: Pops a union value and two snapshots from the stack. If the union is a left union, passes control to the first quotation; otherwise to the second. If the stack doesn't contain at least 3 values, or if the top value isn't a union, or if the next two values are not both quotations, abort. Equivalent to ‘push LEFT; ifeq’

CC execution semantics

v ≜ x | q | s
s ≜ ∘ | v | s⋅s

∘⋅s = s⋅∘ = s
s₁⋅(s₂⋅s₃) = (s₁⋅s₂)⋅s₃
(s₁,s₂,...) ≜ s₁⋅s₂⋅...

CC programs operate on, and executing one produces, a stack of values. Literal values and quotations are represented directly. Tuples are represented by a stack containing the tuple's values.

S ≜ s▹κ
κ ≜ s→s
⟦⋅⟧ ≜ S↔S

The current state of a CC program consists of a stack, along with a continuation representing any additional work that needs to be performed. The interpretation ‘I’ of each language construct is a relation mapping a before state to an after state.

These semantics are similar to that of Oxcart, which also includes a continuation as part of the before- and after-state that each semantic function operates on.

Oxcart

Note that, because of how a quotation is defined via non-determinism, its interpretation is a _relation_, and not a function, of before states to after states.

⟦drop⟧ ∋ (v⋅s)▹κ ↦ s▹κ
⟦ijump⟧ (q⋅s)▹κ = s▹κ

⟦b⟧ ≜ S→S
⟦i;b⟧ s▹κ = ⟦i⟧ s▹(λs'.⟦b⟧ s'▹κ)

The interpretation of a CC branch is also a function mapping before states to after states. It is the sequential composition of the interpretation of each instruction in the branch.

Notes

- To implement loops, you'll need to ‘dup’ the loop body's quotation before invoking it. Otherwise you'd need to have a copy of the loop body within itself, which isn't syntactically possible.

The FCC language

FCC is a “flattened” version of CC. Whereas in CC, quotations are nested directly in the instruction stream, FCC programs consist of a flat list of _labeled_ quotations. In each instruction stream, quotations are referred to by their labels, instead of being embedded directly.

This will map nicely to our existing partial path construction, which includes a source and sink node. (And as we've mentioned in a few places, graph nodes are equivalent to instruction pointers are equivalent to non-closure quotations.)

The NCC language

TODO: Move non-determinism here. CC would be fully deterministic (and would use functions instead of relations in its semantics) (Note that means CC would need to grow a conditional instruction of some kind)

P ≜ q
q ≜ b | q⊓q
b ≜ i | b;b

An NCC program ‘P’ consists of a single quotation. A clause ‘q’ is the non-deterministic choice of one or more branches. A branch ‘b’ is a sequence of instructions.