2020-12-10
Note: This post is out of date! Swanson still has an S₀ language, but it looks completely different now.
In the previous post, we talked about Swanson's execution model, but didn't really describe what Swanson code _looks like_. In this post, we'll look at S₀ (pronounced "ess naught"), which is Swanson's "assembly language".
As we'll see, S₀ hews pretty closely to the Swanson execution model, and isn't really a language that you'll want to program in directly. Typically, you'll actually _write_ in some other higher-level language, which will be translated into S₀. We'll see in later posts how this process works. For now, don't be put off by the amount of boilerplate you see here; it's not something that you'll have to author directly!
We've carefully designed the concrete syntax of S₀ to be as simple to parse as possible: for instance, parsing it doesn't require a particular parser generator. (The reference implementation is a simple recursive descent parser that only requires a single character of lookahead.) In fact, it's more important for S₀ to be easy to parse than for it to be easy for humans to write. After all, you will very rarely have to write S₀ directly!
This simplicity is important since the first part of writing a new Swanson host is being able to load in "bootstrap" code, which is written in S₀. This part of the host will need to be written directly in the host's language, and so we want to make that host-specific custom code as simple to write as possible.
As mentioned in the previous post, Swanson names are _binary_. There are three ways to encode a Swanson name in S₀.
The first allows us to encode arbitrary binary content, written in hexadecimal within square brackets. Each octet in the name must be written with two hexadecimal characters, and there must be one or more whitespace characters in between each octet. For instance, the following is one way to encode the name `name`:
[6e 61 6d 65]
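To make the hex notation concrete, here is a minimal Python sketch of decoding it. It is not the reference implementation: the name `parse_hex_name` is hypothetical, and it assumes that whitespace just inside the brackets is tolerated, that uppercase hex digits are accepted, and that `[]` encodes the empty name.

```python
import re

# Each octet must be written as exactly two hex digits.
_OCTET = re.compile(r"[0-9a-fA-F]{2}")

def parse_hex_name(text: str) -> bytes:
    """Decode an S₀ hex name such as "[6e 61 6d 65]" into raw bytes (a sketch)."""
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("a hex name must be enclosed in square brackets")
    # split() breaks on runs of whitespace, so "6e616d65" stays a single token
    # and is rejected below: octets must be separated by whitespace.
    octets = text[1:-1].split()
    for octet in octets:
        if not _OCTET.fullmatch(octet):
            raise ValueError(f"invalid octet {octet!r}: need exactly two hex digits")
    return bytes(int(octet, 16) for octet in octets)

print(parse_hex_name("[6e 61 6d 65]"))  # b'name'
```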
The second notation is a shortcut syntax for names that are ASCII-encoded strings, since many higher-level languages use ASCII to encode their identifiers. A _bare name_ is an ASCII-encoded name that only consists of common identifier characters: in particular, alphanumerics, underscore, period, at-sign (`@`), and dollar sign (`$`). For instance, the following are all bare names:
$_ entry atom i ds.hashmap $0 entry@1 literal while primitives.bool
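The bare-name rule amounts to a character check. In this Python sketch the function names are hypothetical; it assumes a bare name must be non-empty, and treats a bare name as standing for the ASCII bytes of its characters.

```python
import string

# Characters allowed in a bare name: ASCII alphanumerics plus _ . @ $
BARE_NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_.@$")

def is_bare_name(text: str) -> bool:
    """Return True if text can be written as an S₀ bare name (assumed non-empty)."""
    return len(text) > 0 and all(ch in BARE_NAME_CHARS for ch in text)

def bare_name_bytes(text: str) -> bytes:
    """A bare name denotes the ASCII encoding of its characters."""
    if not is_bare_name(text):
        raise ValueError(f"{text!r} is not a bare name")
    return text.encode("ascii")

examples = "$_ entry atom i ds.hashmap $0 entry@1 literal while primitives.bool".split()
print(all(is_bare_name(name) for name in examples))  # True
print(is_bare_name("kebab-case"))                    # False
```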
You might wonder why `@` and `$` are considered bare name characters, since most programming languages limit identifiers to alphanumeric characters and underscores. (Lisps that allow `kebab-case` names are an exception!)
And that's actually the reason why! Since S₀ is a translation target for other languages, having a couple of atypical characters available lets us more easily construct "derived" Swanson names that cannot conflict with names that appear in the source languages.
The final notation is a syntax for names that are ASCII-encoded strings but don't consist _only_ of bare name characters. If a name contains only printable ASCII characters, you can enclose it in double quotes:
"entry" "kebab-case-with->arrows" "name with spaces" "name:with:colons"
Note that there are no escape sequences for this notation, which means in particular that you can't use it if the name contains a double-quote character. If it does, you have to use the hex notation described above.
S₀ code is organized into _modules_. Each module consists of a number of _blocks_, each of which has a distinct name. Blocks are used in _closures_, which are Swanson invokables whose branches are implemented in S₀.
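Going the other direction, a tool that emits S₀ has to pick a notation for each binary name. The hypothetical `format_name` below prefers a bare name, falls back to a quoted name, and uses hex for everything else, including names containing a `"` byte. The preference order is an assumption (all three notations denote the same name), as is writing the empty name as `[]` rather than `""`.

```python
import string

# Byte values allowed in a bare name: ASCII alphanumerics plus _ . @ $
BARE_NAME_BYTES = frozenset((string.ascii_letters + string.digits + "_.@$").encode("ascii"))
# Printable ASCII: space (0x20) through tilde (0x7e)
PRINTABLE_BYTES = frozenset(range(0x20, 0x7F))

def format_name(name: bytes) -> str:
    """Render a binary Swanson name in one of the three S₀ notations (a sketch)."""
    if name and all(b in BARE_NAME_BYTES for b in name):
        return name.decode("ascii")
    if name and all(b in PRINTABLE_BYTES and b != ord('"') for b in name):
        # Quoted names have no escape sequences, so a '"' byte rules this form out.
        return '"' + name.decode("ascii") + '"'
    # Hex notation can represent any name, including the empty one.
    return "[" + " ".join(f"{b:02x}" for b in name) + "]"

print(format_name(b"entry@1"))           # entry@1
print(format_name(b"name with spaces"))  # "name with spaces"
print(format_name(b'say "hi"'))          # [73 61 79 20 22 68 69 22]
```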
```
module test {
  $load: containing () receiving ($loaded) {
    $module = atom;
    -> $loaded;
  }
}
```
Each block starts with `containing` and `receiving` clauses, which define which names are available in the environment at the start of the block. The names in the `containing` clause are part of the block's "closure set": they represent values that are moved "into" the closure when it's created. The names in the `receiving` clause are the "inputs" of the block: the caller must ensure that the environment contains exactly these names when invoking the block's closure. When a block is invoked, its closure and input environments are merged together before execution proceeds, which means that the block's `containing` and `receiving` clauses can't have any names in common.
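These entry rules can be written down as a handful of checks. In this hypothetical Python sketch, environments are dicts from names to values, and names are written as Python strings for readability even though real Swanson names are binary.

```python
def enter_block(containing, receiving, closure_set, inputs):
    """Build a block's starting environment from its closure set and inputs.

    containing, receiving: the block's declared name lists.
    closure_set: dict of values moved into the closure when it was created.
    inputs: dict of values the caller left in the environment when invoking.
    """
    if set(containing) & set(receiving):
        raise ValueError("containing and receiving clauses must not share names")
    if set(closure_set) != set(containing):
        raise ValueError("closure set does not match the containing clause")
    if set(inputs) != set(receiving):
        raise ValueError("caller must provide exactly the receiving names")
    # The two name sets are disjoint, so this merge never overwrites anything.
    return {**closure_set, **inputs}

print(enter_block(["$finish"], ["$_", "$0"], {"$finish": "f"}, {"$_": "b", "$0": "t"}))
# {'$finish': 'f', '$_': 'b', '$0': 't'}
```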
Each block also contains a _body_, enclosed in curly braces. The body consists of zero or more _statements_ followed by exactly one _invocation_.
There are four kinds of statement in S₀:
A _create atom_ statement creates a new Swanson atom, distinct from all others, and adds it to the environment:
dest = atom;
A _create literal_ statement creates a new Swanson literal and adds it to the environment:
dest = literal [6e 61 6d 65];
A _create closure_ statement creates a new S₀ closure and adds it to the environment:
dest = closure containing (value1, value2) branch true = true_branch, branch false = false_branch;
The statement has a `containing` clause which _removes_ the specified values from the environment, moving them into the new closure. Each branch of the closure has a name (`true`, `false`), and refers to one of the blocks in the enclosing module (`true_branch`, `false_branch`). The `containing` clause of each of those blocks must match the `containing` clause of the _create closure_ statement.
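The matching rule is a static check over the enclosing module. A minimal Python sketch, assuming the `containing` clauses are compared as sets (so their order doesn't matter) and that `block_containing` is a hypothetical table mapping each block name in the module to its `containing` clause:

```python
def check_closure_statement(containing, branches, block_containing):
    """Check that every branch's block declares the same containing set
    as the create closure statement.

    containing: names listed in the statement's containing clause.
    branches: branch name -> block name, e.g. {"true": "true_branch"}.
    block_containing: block name -> that block's containing clause.
    """
    expected = set(containing)
    for branch, block in branches.items():
        if block not in block_containing:
            raise ValueError(f"branch {branch!r} refers to unknown block {block!r}")
        if set(block_containing[block]) != expected:
            raise ValueError(
                f"block {block!r} (branch {branch!r}) must contain exactly {sorted(expected)}")

blocks = {"true_branch": ["value1", "value2"], "false_branch": ["value2", "value1"]}
check_closure_statement(["value1", "value2"],
                        {"true": "true_branch", "false": "false_branch"}, blocks)  # passes
```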
There is a shortcut syntax for when the new closure has a single branch, with an empty name:
dest = closure containing (value1, value2) -> block;
is exactly equivalent to:
dest = closure containing (value1, value2) branch [] = block;
A _rename_ statement changes the name of a value in the environment:
dest = rename source;
For all of these statements, it's an error if there's already a value in the environment with the desired "destination" name. For create closure and rename statements, it's an error if there _isn't_ already a value in the environment with the desired "source" name.
And that's it! You'll notice that you can't really do anything interesting with S₀ statements. They're just used to set up the environment as needed for the block's invocation, which is where real computation happens.
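Here is a sketch of how an interpreter might apply each statement to the environment while enforcing the destination and source rules. Every representation choice is an assumption made for this example: statements are tuples, atoms get distinct IDs from a counter, and a closure value is a `("closure", closure_set, branches)` tuple.

```python
import itertools

_next_atom_id = itertools.count()  # stand-in for "distinct from all other atoms"

def execute_statement(env, stmt):
    """Apply one S₀ statement to env, a dict mapping names to values.

    Hypothetical tuple encoding used by this sketch:
      ("atom", dest)
      ("literal", dest, content)
      ("closure", dest, containing, branches)   # branches: branch name -> block name
      ("rename", dest, source)
    """
    kind, dest = stmt[0], stmt[1]
    if dest in env:
        raise ValueError(f"destination {dest!r} already exists")
    if kind == "atom":
        env[dest] = ("atom", next(_next_atom_id))
    elif kind == "literal":
        env[dest] = ("literal", stmt[2])
    elif kind == "closure":
        containing, branches = stmt[2], stmt[3]
        missing = [name for name in containing if name not in env]
        if missing:
            raise ValueError(f"missing sources {missing}")
        # The containing values are moved out of the environment, not copied.
        closure_set = {name: env.pop(name) for name in containing}
        env[dest] = ("closure", closure_set, dict(branches))
    elif kind == "rename":
        source = stmt[2]
        if source not in env:
            raise ValueError(f"missing source {source!r}")
        env[dest] = env.pop(source)
    else:
        raise ValueError(f"unknown statement kind {kind!r}")

env = {"value1": 1, "value2": 2}
execute_statement(env, ("closure", "dest", ["value1", "value2"], {"": "block"}))
print(sorted(env))  # ['dest']
```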
Each block body ends with _exactly one_ invocation:
-> value branch;
This _removes_ the value named `value` from the environment, and passes control to its branch named `branch`. (It's an error if there's no value in the environment named `value`, if that value isn't an invokable, or if that invokable doesn't have a branch named `branch`.)
There's a shortcut syntax for invoking a branch with an empty name:
-> value;
Whatever values are still in the environment (after removing the invokable that we're about to pass control to) are provided to the invokable as its inputs. If the invokable is an S₀ closure, then the `receiving` clause of the selected branch's block must match the set of names in the environment that are about to be provided as input.
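An invocation of an S₀ closure could be sketched as follows. This assumes a closure is represented as a `("closure", closure_set, branches)` tuple and that `blocks` is a hypothetical table mapping block names to their `(containing, receiving)` clauses. Host-provided invokables aren't modeled; this version only handles S₀ closures.

```python
def invoke(env, value, branch, blocks):
    """Sketch of the invocation "-> value branch;" for S₀ closures.

    env: dict mapping names to values (consumed by this call).
    branch: use "" for the shortcut "-> value;".
    blocks: block name -> (containing, receiving).
    Returns the selected block's name and its starting environment.
    """
    if value not in env:
        raise ValueError(f"no value named {value!r}")
    invokable = env.pop(value)  # the invoked value is removed first
    if not (isinstance(invokable, tuple) and invokable[0] == "closure"):
        raise ValueError(f"{value!r} is not an S₀ closure")
    _, closure_set, branches = invokable
    if branch not in branches:
        raise ValueError(f"{value!r} has no branch named {branch!r}")
    block = branches[branch]
    _containing, receiving = blocks[block]
    # Whatever is left in the environment becomes the block's input.
    if set(env) != set(receiving):
        raise ValueError(f"block {block!r} receives {sorted(receiving)}, got {sorted(env)}")
    # The closure set and the inputs are merged to form the new environment.
    return block, {**closure_set, **env}

blocks = {"main@2": (["$finish", "value"], [])}
env = {"$return": ("closure", {"$finish": "F", "value": "V"}, {"": "main@2"})}
print(invoke(env, "$return", "", blocks))
# ('main@2', {'$finish': 'F', 'value': 'V'})
```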
S₀ modules can be used as Swanson units. The module has a name, which is also used as the name of the unit. The first block in the module is its _loader block_, which is invoked to load the unit. The loader block's `containing` set must be empty.
The loader block's `receiving` set defines the dependencies of the unit. Each name is treated as the name of some other Swanson unit. The host will load those dependencies and put them into the environment as inputs before invoking the loader block (just like for the `receiving` set of any other block).
The input named `$loaded` is handled specially. Instead of loading a unit named `$loaded` as a dependency, the host provides a special invokable for this input, which the loader block will invoke with the "value" of the module (as an output named `$module`). The example module from earlier is a Swanson unit that produces a new atom when loaded:
```
module test {
  $load: containing () receiving ($loaded) {
    $module = atom;
    -> $loaded;
  }
}
```
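Loading a unit, as described above, boils down to building the loader block's input environment. In this Python sketch, `load_unit` and `loaded_callback` are hypothetical stand-ins for the host's own machinery, and a module is modeled as a list of `(block_name, containing, receiving)` tuples in source order.

```python
def prepare_loader(module_blocks, load_unit, loaded_callback):
    """Build the loader block's input environment (a sketch).

    module_blocks: list of (block_name, containing, receiving) in source order.
    load_unit: host function that loads a unit by name and returns its value.
    loaded_callback: the host's special invokable, passed in as $loaded.
    """
    loader_name, containing, receiving = module_blocks[0]  # first block is the loader
    if containing:
        raise ValueError("the loader block's containing set must be empty")
    inputs = {}
    for name in receiving:
        if name == "$loaded":
            inputs[name] = loaded_callback  # provided by the host, not loaded
        else:
            inputs[name] = load_unit(name)  # every other input is a dependency
    return loader_name, inputs

module_blocks = [("$load", [], ["$loaded", "primitive.bool"]),
                 ("main", ["primitive.bool"], ["$finish"])]
print(prepare_loader(module_blocks, lambda n: f"unit:{n}", "LOADED"))
# ('$load', {'$loaded': 'LOADED', 'primitive.bool': 'unit:primitive.bool'})
```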
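Putting it all together, here is an example module that depends on the `primitive.bool` unit. Its value is a closure that, when invoked with `$finish`, uses `primitive.bool` to create a `true` value, drops `primitive.bool`, and then evaluates that value, invoking the `succeed` branch of `$finish` if the value is true and its `fail` branch if it is false: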
```
module bool.can_evaluate_true {
  $load: containing () receiving ($loaded, primitive.bool) {
    $module = closure containing (primitive.bool) -> main;
    -> $loaded;
  }

  main: containing (primitive.bool) receiving ($finish) {
    $return = closure containing ($finish) -> main@1;
    -> primitive.bool true;
  }

  main@1: containing ($finish) receiving ($_, $0) {
    primitive.bool = rename $_;
    value = rename $0;
    $return = closure containing ($finish, value) -> main@2;
    -> primitive.bool drop;
  }

  main@2: containing ($finish, value) receiving () {
    $evaluate = closure containing ($finish)
      branch true = main@2$evaluate$true,
      branch false = main@2$evaluate$false;
    -> value evaluate;
  }

  main@2$evaluate$true: containing ($finish) receiving () {
    -> $finish succeed;
  }

  main@2$evaluate$false: containing ($finish) receiving () {
    -> $finish fail;
  }
}
```
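One way to check your reading of this example is to trace only the _names_ in the environment through each block body. The hypothetical sketch below does that (no values involved) and returns the names left over as inputs when the block's invocation runs; statements use an assumed tuple encoding.

```python
def names_at_invocation(containing, receiving, statements, invoked):
    """Return the names passed as inputs by a block's invocation.

    statements use a hypothetical tuple encoding:
      ("atom", dest), ("literal", dest),
      ("closure", dest, containing), ("rename", dest, source)
    """
    live = set(containing) | set(receiving)  # environment at block entry
    for stmt in statements:
        kind, dest = stmt[0], stmt[1]
        if dest in live:
            raise ValueError(f"destination {dest!r} already exists")
        if kind == "closure":
            moved = set(stmt[2])
            if not moved <= live:
                raise ValueError(f"missing sources {sorted(moved - live)}")
            live -= moved              # values move into the new closure
        elif kind == "rename":
            if stmt[2] not in live:
                raise ValueError(f"missing source {stmt[2]!r}")
            live.remove(stmt[2])       # the old name goes away
        live.add(dest)
    if invoked not in live:
        raise ValueError(f"no value named {invoked!r} to invoke")
    live.remove(invoked)               # the invoked value is consumed
    return live

# main@1 from the example module
print(sorted(names_at_invocation(
    ["$finish"], ["$_", "$0"],
    [("rename", "primitive.bool", "$_"),
     ("rename", "value", "$0"),
     ("closure", "$return", ["$finish", "value"])],
    "primitive.bool")))  # ['$return']
```

For `$load`, the same trace leaves only `$module`, which is exactly the output the host's `$loaded` invokable expects.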
Can you see why this isn't a language that you'd want to program in directly? In the next post, we'll learn about S₁, which provides some helpful syntactic sugar while still being a very low-level language.