💾 Archived View for tilde.cafe › ~stack › gemlog › 2022-12-26.limits.gmi captured on 2024-08-31 at 13:06:16. Gemini links have been rewritten to link to archived content

On limits of Code and Data

The history of general-purpose computing has been a never-ending expansion of addressable memory space (and with it, the bit-width of the registers used to access that memory). It is my conjecture that we've long passed the limits of the reasonable.

I will stick to the practical aspects of Harn implementation -- based on much previous experience implementing 8-, 9-, 12-, 16-, 18-, 24-, 32- and 64-bit systems. When I say experience, I mean actual nose-to-the-ground coding in assembly, not setting a flag on some compiler and changing some #defines... (not that there is anything wrong with that...)

Fun tricks with memory

Harn, an x86-64 application in its current form, has some serious hardwired limits on the size of its code, data, and meta segments. In addition, the segments are pinned to specific locations:

Seg   Address     Size       All in hex...
===========================
CODE  40000000    40000000
DATA  80000000    40000000
META  C0000000    40000000

This is weird by modern standards, where we tend towards infinity. However, this is more than sufficient for my needs, and I dare say, any reasonable needs.

BTW, you can still malloc and mmap in the rest of the global memory space if you need 'large' data...
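
How Harn actually pins its segments is not described here. One plausible way to do it on a POSIX system is to ask mmap for the exact address and check the answer. This is only a sketch under that assumption: reserve_at and the *_BASE / SEG_SIZE names are made up for illustration, and only the numbers come from the table above.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical names; the values are the ones in the table (hex). */
#define SEG_SIZE  0x40000000UL
#define CODE_BASE 0x40000000UL
#define DATA_BASE 0x80000000UL
#define META_BASE 0xC0000000UL

/* Reserve `size` bytes of address space at exactly `base`, or return NULL.
 * MAP_FIXED is deliberately not used: the address is only a hint, so an
 * existing mapping is never clobbered, and we verify the hint was honoured. */
static void *reserve_at(uintptr_t base, size_t size)
{
    assert((base & 0xFFF) == 0);   /* assumes 4 KiB page alignment */
    void *p = mmap((void *)base, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uintptr_t)p != base) {    /* the kernel placed it elsewhere */
        munmap(p, size);
        return NULL;
    }
    return p;
}
```

Something like reserve_at(CODE_BASE, SEG_SIZE) would then claim the whole code segment without touching any physical memory; mprotect can open up the pages that are actually in use later.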

Benefits of this trickery.

Fixing the position of code and data in memory allows for a bunch of easy optimizations: all global addresses are constants (and Harn takes care of relocations). And they are 32-bit constants! That means we can load addresses into registers with very short instructions.
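
To put a number on "very short": on x86-64, loading a 32-bit constant into a register takes 5 bytes, while a full 64-bit constant takes 10. The sketch below just emits the two standard encodings for the rdi register so the sizes can be compared. The function names are made up for illustration, and this is not a claim about which instructions Harn's compiler actually picks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mov edi, imm32  ->  BF id  (5 bytes).
 * Writing a 32-bit register zero-extends into the full 64-bit rdi,
 * so any address below 4 GiB (including C0000000) loads correctly. */
static size_t emit_mov_edi_imm32(uint8_t *out, uint32_t addr)
{
    assert(out != NULL);
    out[0] = 0xBF;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(addr >> (8 * i));   /* little-endian */
    return 5;
}

/* movabs rdi, imm64  ->  48 BF io  (10 bytes).
 * This is what an arbitrary 64-bit address costs. */
static size_t emit_movabs_rdi_imm64(uint8_t *out, uint64_t addr)
{
    assert(out != NULL);
    out[0] = 0x48;                                  /* REX.W prefix */
    out[1] = 0xBF;
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(addr >> (8 * i));   /* little-endian */
    return 10;
}
```

Half the bytes per address load adds up quickly in code that references a lot of globals.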

Managing segments is easy: given a pointer, the top two bits indicate which segment it's in. Segment base is just a C0000000 mask away. And it makes the relocation engine much simpler.
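
The top-two-bits trick fits in a few lines of C. This is a sketch, not Harn's source: the enum and function names are invented here, and only the C0000000 mask and the segment addresses come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* With bases at 40000000, 80000000 and C0000000, bits 31..30 of an
 * address are 01, 10 and 11 respectively; 00 means "not ours". */
enum seg { SEG_NONE = 0, SEG_CODE = 1, SEG_DATA = 2, SEG_META = 3 };

#define SEG_MASK 0xC0000000u
#define SEG_SIZE 0x40000000u

/* Which segment does a (possibly 64-bit) address fall in? */
static enum seg seg_of(uint64_t addr)
{
    if (addr >> 32)                 /* above 4 GiB: outside every segment */
        return SEG_NONE;
    return (enum seg)((addr >> 30) & 3);
}

/* Segment base: keep the top two bits, clear the rest. */
static uint32_t seg_base(uint32_t addr)   { return addr & SEG_MASK; }

/* Offset inside the segment: the other 30 bits. */
static uint32_t seg_offset(uint32_t addr) { return addr & ~SEG_MASK; }

/* Rebuild an address from a segment and an offset. */
static uint32_t seg_addr(enum seg s, uint32_t offset)
{
    assert(s != SEG_NONE && offset < SEG_SIZE);
    return ((uint32_t)s << 30) | offset;
}
```

For the address 40000E40, seg_of gives SEG_CODE, seg_base gives 40000000 and seg_offset gives E40. No tables, no range comparisons.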

All internal references within Harn can get by with 32-bit pointers, which makes a difference!
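
To illustrate the difference, compare a linked record that stores 32-bit references with one that stores native pointers. The type and helper names below are hypothetical and not taken from Harn; the point is only that the record is half the size on a 64-bit machine.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ref32;            /* an address known to be below 4 GiB */

struct node32 {                    /* 8 bytes */
    ref32 next;
    ref32 name;
};

struct node64 {                    /* 16 bytes on x86-64 */
    struct node64 *next;
    const char    *name;
};

/* Widen a 32-bit reference to a real pointer (zero-extension). */
static inline void *ref_to_ptr(ref32 r)
{
    return (void *)(uintptr_t)r;
}

/* Narrow a pointer to a 32-bit reference; only valid below 4 GiB. */
static inline ref32 ptr_to_ref(const void *p)
{
    uintptr_t u = (uintptr_t)p;
    assert(u <= UINT32_MAX);
    return (ref32)u;
}
```

Twice as many records per cache line is the kind of difference that shows up in practice.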

Code is small...

A gigabyte of code is an absurdly large amount of code. We've been -- spoiled? ruined? -- by 'modern' applications. I am talking about idiocies like the 150MB utility, recommended by Raspberry Pi people, which does exactly what a single-line 'dd' invocation does -- create a disk image. It is completely insane!

Anyway, aside from including the Electron environment, code is pretty small. My lifetime of Forth programming is summed up in maybe a couple of hundred kilobytes (probably less).

Harn is currently around 40KB, and that's a complicated C program that is more of an operating system, one that maintains a referentially-correct managed execution environment and can juggle functions and data around while collecting garbage.

A simple function that prints "Hello World", compiled into Harn, is 24 bytes of code (12 bytes of which are the constant string compiled along with it).

void foo(){
    puts("Hello World");
}

>cc
ingested foo: extern void foo (void); 24 bytes at 0x40000E40
> hd 40000e40
0x40000e40 48 8D 3D 05 00 00 00 E9 A4 F8 FF FF 48 65 6C 6C H...........Hell
0x40000e50 6F 20 57 6F 72 6C 64 00 00 00 00 00 00 00 00 00 o World.........
>
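
Reading the dump by hand: 48 8D 3D 05 00 00 00 is lea rdi,[rip+5] (7 bytes) and E9 A4 F8 FF FF is jmp rel32 (5 bytes), which makes 12 bytes of instructions followed by the 12-byte string "Hello World" with its terminating zero. Both instructions are relative to the address of the instruction that follows them, and the small helper below (a name made up for this example) works out where they point.

```c
#include <assert.h>
#include <stdint.h>

/* Target of an x86-64 RIP-relative or rel32 operand:
 * address of the NEXT instruction plus the signed 32-bit displacement. */
static uint64_t rel32_target(uint64_t insn_addr, unsigned insn_len,
                             const uint8_t disp[4])
{
    assert(insn_len >= 1 && insn_len <= 15);   /* x86 instruction length limit */
    uint32_t raw = (uint32_t)disp[0]
                 | (uint32_t)disp[1] << 8
                 | (uint32_t)disp[2] << 16
                 | (uint32_t)disp[3] << 24;    /* little-endian */
    int32_t d = (int32_t)raw;                  /* reinterpret as signed */
    return insn_addr + insn_len + (uint64_t)(int64_t)d;
}
```

For the lea at 40000E40 with displacement 05 00 00 00 this gives 40000E4C, exactly where the string starts in the dump. For the jmp at 40000E47 with displacement A4 F8 FF FF (that is, -75C) it gives 400006F0, which is presumably puts or a stub for it, although the dump alone cannot confirm that. Note that the call has become a tail jump, so foo needs no ret of its own.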

I dare you to write a megabyte of code by yourself... I mean of course you can write a macro expanding to an absurdly large function, but why, really?

Small code is beautiful code, and fast code. Fitting into fewer cache lines is a big win, no matter how fast your processor.

Data is even smaller

By data I mean the data in the Harn data segment, which is static, global data. Data that needs to be there, at a fixed address, persistent. I mean, what would you put there?

Not the transient data you are processing, most likely. You are still free to malloc and mmap data in the rest of your multi-gigabyte memory, for transient stuff.

Harn data is special -- pointer variables which point to other Harn data or code are referentially managed and will always be correct. Pointing outside of Harn space is fine, of course, but when serializing/deserializing, you'd better fix those pointers yourself!

Metadata is God's portion

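As for metadata, a gig is plenty for me. Symbols take up tens of bytes, and we are not likely to have more than what - ten thousand? A hundred thousand? Even a hundred thousand symbols at a hundred bytes apiece come to 10MB, about one percent of the segment. A gig is probably too much. Don't sweat it.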
