XukutOS manual → Memory design

This is more of a design overview and implementation guide than an instruction manual.

There is also a manual page for working with the memory system from a userspace programmer's perspective.

XukutOS is designed around communicating sequential processes written in Lisp. As such, the basic design will be that all processes and the kernel, share a single memory space which is garbage collected (GCd). One of the major constraints is that all processes are interruptable at any point, as far as the architecture allows. (It's not completely determined yet how much ordering we require between processes.) Moreover, there are no (userspace) memory barriers: as far as code is concerned, loads and stores are just that. The single memory space might change in the future if there turns out to be a simple way to implement that.

More specifically, the XukutOS memory model is that memory is divided into *blocks*, which is the kernel-level concept corresponding to a Lisp object. Blocks each have a specific size and description. The description specifies the type of the block, which is used by Lisp, and it allows the garbage collector to determine the list of pointers contained within the block. The rest of the data is then numbers without intrinsic meaning: *flat*. So a cons cell is a block of size two words, with description `cons' that says the two words are both pointers, and a bignum is a block of arbitrary size with description `int' that says the contents are all flat.

Blocks have a few attributes, set at allocation time. The most important of these is when a block is *permanent*. A permanent block will never be moved or deallocated by the garbage collector. (In fact, deallocation of permanent blocks is not implemented at all yet!) They are the roots of the collection. An important example of permanent blocks are the objects that the kernel uses to record process state, process stacks, and all code that the kernel runs. It might turn out to be useful to add a notion of immoveable block that may be deallocated but not moved by the GC.

Every pointer must be a honest address pointing to a block. We don't allow tagging tricks where the low-order bits indicate the typ:, all this information is stored with the actual block. An important motivation is that tagged words need to be untagged before an operation or retagged afterwards, and this would violate the strict consistency rule.

Instead, registers and stack are split between object-valued (o0-o14) and number-valued (n0-n14) (+ 2 stack pointers). It might remind you of the M68k. Register-register operations and load-store operations are only allowed if the kinds match, or in the following special cases:

* Turning a pointer into flat data is always allowed. However, the resulting number is not guaranteed to have a lasting meaning, unless the object is permanent. The object being pointed to may have been deallocated or moved in the meantime.

* Turning flat data into a pointer is allowed only when the flat data came from a pointer to a permanent object and this object has not been deallocated in the meantime. (Similar things would be allowed for immoveable blocks.)

We still need to figure out how pointer arithmetic precisely fits in these rules.

Allocation happens in two steps: a process has a collection of memory regions that it can directly allocate into. A region owns a bit of memory space (measured in pages, the same as the processor's MMU uses) that can fit up to 512 blocks and can scale up to hold blocks of practically unlimited size. If the process's regions are exhausted, it performs a system call to get new regions from the kernel. The kernel works similarly, except it can directly obtain new regions. Regions keep track of which of their blocks are used and marked, and record the description of the objects inside. Regions are not designed to be safely shared between processes.

To allow processes to allocate without freeing explicitly, the garbage collector runs periodically. It is implemented as a mix between mark-and-sweep and a copying collector. The current implementation stops the world, but this should be fixable. Starting with the permanent blocks, the GC marks every object pointed to, then when marking is done, it deallocates every unmarked block. So far, a classic mark and sweep.

A complication: once a region fills up, a process wants to stop trying to allocate into it. Otherwise things will start to run slower and slower as the process has to go through each region checking that it is indeed full, before it tries getting new regions from the kernel. In addition, processes might stop while leaving behind a nearly unused region. So when a region fills up or its owning process stops, the garbage collector will evacuate it: any remaining objects are moved to and the memory is returned to the kernel. More specifically, once the world is stopped, the GC will make a copy of each remaining object in the region, and during the marking phase, all pointers to the old copy are rewritten to point to the new copy instead. Once marking is done, there are no more references to the old region and we can safely deallocate it.

A complication on the complication: since regions are represented by regular old objects, and these are updated during the marking phase, how do we make sure the GC uses the new copies of this region? Moreover, how do we ever have hope for parallelizing this? Here we do use a memory barrier: when a region gets evacuated, its pages are remapped, so any attempt to access the memory described by the region causes an interrupt. The kernel handles the interrupt by redirecting accesses to the new copies of the blocks.

We use a similar trick for big allocations: all pointers mentioned thus far use virtual addresses. When memory is accessed, the processor maps them to physical addresses, such as actual RAM, using the *page table* datastructure. But why does each potential virtual address need a physical address? Physical addresses are limited to whatever RAM the computer has access to, virtual addresses can span up to 52 bits or so depending on your architecture. We can assign physical addresses lazily, instead, only when needed. So if you allocate a 1 TB object but will only ever write to 1 MB, only that MB will require physical addresses and the rest can stay virtual only.

The kernel handles this by having any unassigned memory access trigger an interrupt handler, that assigns a new page of physical memory and retries the access. In fact, an attempted read will always insert the same read-only page filled with zeroes, with its own interrupt handler for writes that only then assigns a new unused page to that address. So we save even more copying.

Any questions? Contact me:

By email at vierkantor@vierkantor.com

Through mastodon @vierkantor@mastodon.vierkantor.com

XukutOS user's manual

XukutOS main page