💾 Archived View for vierkantor.com › xukut › manual › description.gmi captured on 2024-08-31 at 11:49:26. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2024-05-10)
-=-=-=-=-=-=-
This is an explanation of low-level design in XukutOS. We hope that the design works well enough that you can stick to higher-level stuff for day to day activities.
Planned: we might switch to a different, per-process allocation system. If so, the spirit of these docs will be kept but the details will not.
XukutOS partitions memory blocks into two main types: objects which consist of pointers to memory blocks, and flat data which is treated basically as a (very big) number. This OS-level partitioning of data is essential for the correct operation of the garbage collector; also your own programs need to know what the current data represents in order to do something with it. Swail does this by recording a description along with each object. The description contains enough information to run basic built-in systems, and you can use the description to record your own information. There is not much separating this notion of `description' from `types' as they are used in e.g. Python. I just dislike that "type" is such an overloaded concept.
XukutOS manual → Memory: explanation of memory blocks, flat data and objects
Take, for example, the humble `cons'. This is represented in memory as two words, each a pointer to another object. By itself, the contents of this piece of memory don't say that they are pointers; it could just be a coincidence that there is a large number there that divmodded by 2^bits-per-word just so happens to give the addresses of two objects. The garbage collector should know that they are indeed pointers, otherwise it might incorrectly decide to recycle the contents of that cons. The description of a cons notes that there are indeed 2 pointers stored within the cons.
Another important piece of information we need to know about an object is how to apply it. There are many different kinds of objects that can be applied to arguments: `fn's, compiled functions, continuations, ... These could all be special-cased in the evaluator, but that would make everything hard to extend well. Specifying how to apply an object by giving another object that can be applied in order to get the result, causes infinite loops. Instead, the description of each object contains a pointer to executable code, which will perform the application of the object on the arguments.
Here are some of the ways that systems try to record information about the kind of an object:
- The traditional approach is to reserve some bits in each word in memory, recording the kind of object in those words. Thus, a pointer to a generic object would have least significant bits 0b..000 (because the value it points to is word-aligned), a fixnum ends in 0b..1, a pointer to a cons would have least significant bits 0b..010, etc. This approach results in awkward implementations: to compute `x + y' represented as `0bxxxxxx1 + 0byyyyyy1', you need to get rid of the least significant bit before adding and add it back in later. It is also not so extensible, since you have maybe 8 different basic object types.
- We reserve some memory at the start of each block and we record there how to apply the object, which words in the object contain pointers, etc. The more components you add to your systems, the more memory you spend on each object. Storing a fixnum in your cons atomically is impossible, because you would need to write the fixnum and also update which words contain pointers.
- The approach currently used by XukutOS: we reserve some memory at the start of each block and write a pointer to the description in there. The description then contains pointers to code that return the necessary information given an object. This makes it easy and cheap to extend the basic information we want to know about objects (we only change how we work with descriptions, not each object), and results in more flexibility in how we store the information (if it's the same for all objects, just return that constant value; if it's stored in a field, just return the value of that field; if it's encoded in a compressed way, return the decompressed result).
- Instead of storing the description per-object, each area (or page or segment or ...) of memory contains only blocks that look the same: this area only contains conses, this one only fixnums, this one only 2-word long span blocks, etc. This is the efficient approach when many objects look exactly the same. Drawback is that many operations become a bit more indirect. Could be combined with the previous approach.
The in-memory layout of an object in memory will look something like:
-2w -1w 0 1w nw | data used by garbage collector | description | contents | more contents ... |
The pointer to the object, points to displacement 0 in the diagram, so that the "user" data in the object has nonnegative index and "system" data negative index. The -1'th word in the object points to the description of the object, along with some bits used by the memory system.
XukutOS manual → Memory: bits in description data and their use in the memory system.
A description is itself also an object, and contains the following fields:
- appfn-asm: code using Swail-like calling convention; `o0' contains object and becomes result, other registers as in Swail calling convention. Apply the object to the list of arguments in `args'.
- appfn-oar: code using OAR calling convention. Apply the object to the list of arguments (passed according to OAR calling convention), and return the result of applying (returned according to the OAR calling convention).
- get-pointer: given block in `o0' and index in `n0' (with default value `0'), return in `o2' the next object pointed to and `n0' the next index to check for pointers. If `n0 = 0' there are no more pointers (and the value of `o0' should be `nil).
- set-pointer!: given block in `o0', index in `n0' and value in `o2', set
You only need to define either of the two `appfn's per description, and the other can be set to `appfn-oar-to-asm' (if you already have the `appfn-oar' and need to fill in `appfn-asm') or `appfn-asm-to-oar' (vice versa).
If you really need the extra efficiency, it might be worth the effort to define both `appfn's.
For the `pointer' fields, suitable built-in values are `get-pointer-flat'/`set-pointer-flat!' (for flat objects) and `get-pointer-full'/`set-pointer-full!' (for objects completely consisting of pointers)
Any questions? Contact me:
By email at vierkantor@vierkantor.com