💾 Archived View for gary.vern.cc › tinylog.gmi captured on 2024-09-29 at 00:07:30. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2024-08-18)
-=-=-=-=-=-=-
author: @gvv@noc.social
avatar: 🐐
license: CC SA
2024 V3 Roadmap:
If you come to read about my December Adventure 2023, you can find just those right here!
I often re-post (summaries of) these on mastodon too:
https://noc.social/@gvv/111507184640305505
I'm six weeks behind on my options trading classes... but spent a few hours writing the beginnings of an ISA execution test harness for V3, which was a lot more fun!
The posts in here have become far from tiny! In future I'll put the long form posts in the new gemlog directory, and keep these at a more digestible size!
V3 specification matches the implementation again
Started a Gemlog for post DecemberAdventure V3 musings
Day 三十二 of #DecemberAdventure
Would you be surprised to learn that I changed the assembly syntax again? And more or less rewrote the whole assembler and disassembler to implement it. I found the Dusk-OS inspired instruction modifiers cumbersome, so now everything that is not an operation mnemonic gets saved to the stack, and then each operation decides how many and what types of operands it expects, and then complains or writes out an opcode.
I'm a day late, and only have about 80% of the op-codes implemented in my V3 emulator using, so far, 118 lines of C for the main loop of nested switch statements... but, I have a working hello world at last!
This is the end of DecAdv2023, but over the coming days and weeks I feel confident that I can finish implementing the missing op-codes in the emulator, update the spec (yet again!) with the final assembly syntax, and continue forward with the stretch goals above, and beyond. I've procrastinated over doing this for a few years, so it's awesome to finally have something tangible. I'm sure I'll take part again in 2024.
$ cat hello.vas :_start %d strs.hello mov, <print-err> call, $0f $82 out, :<print-err> ; (str*--) while jmp, .do $13 @d+ out, .while @d $0 cmp, do jne, .end exit, :strs .hello "Hello, 32 , "World! 10 , 0 , $ lua v3asm.lua hello.vas hello.v3 assembled 39 cells, with 7 labels $ make printf '%s\n' -g -O0 -std=c99 -I. > compile_flags.txt cc -I. -g -O0 -std=c99 -o v3.o -c v3.c cc -o v3 v3.o $ ./v3 hello.v3 Hello, World!
Day 二十七 of #DecemberAdventure
Not much coding time the last few days, but I have:
updated the specification again, accordingly
Day 二十三 of #DecemberAdventure portal.mozz.us/gemini/gary.ver…
Added more tests today, which made me realize that I haven't implemented the `using` pseudo-operator in the bootstrap assembler 😳 Maybe its fine, as long as I don't need it for the self-hosted assembler?
Also, dusted off the remnants of an earlier experiment in making a VM in C and started code a new main loop for the V3 instruction set.
And cleaning up inconsistencies in the spec
Day 二十二 of #DecemberAdventure
Not too much coding time today. I was busy panic shopping for the Holidays. But...
All 347 unit tests written with the latest read syntax pass when using the rewritten bootstrap assembler, and using the new strategy of leveraging unused bits in the top-of-stack in-progress opcode to track what arguments have already been processed. One last thing before reviving the emulator: Now that those status bits tell us whether we already saw the source register, I'd rather avoid the spurious leading `>` on destination register arguments.
Possibly, optional destination (and any modifiers) followed by source (and any modifiers) can be canonicalized after reading the operation mnemonic to match how Forth normally orders arguments to non-commutative operations such as `divmod`? I'm postponing that decision until I write the assembly assembler though.
Day 二十一 of #DecemberAdventure
I spent the last couple of days adding test cases for the assembler, and fixing all the corner cases while becoming dismayed at the growing complexity. I plan to rewrite it all in assembly language as soon as I have a fully working Lua implementation, so that complexity is a multiplier on the required effort. But, today, I had an idea for how to simplify the internal logic significantly...
All the microcoded instructions are implemented as four individual nibbles:
1. a source register or reference
2. a destination register or reference
3. some flag bits: exit, post-increment, direct addressing etc
4. the ALU operation
In the assembly syntax, the operation is always parsed last, so I can use these 4-bits to keep track of what has already been seen and then swap them out for the operation nibble as the final step of assembling the instruction:
1. source nibble set?
2. destination nibble set?
3. flag bits set?
4. address mode set?
I think this will help immensely with de-spaghettification of all the code for deciding how to parse the incoming modifiers in context of what was already seen, where I've been relying on stack juggling and other dirty tricks that are hard to get right given that all values (including zero!) are valid in every instruction nibble.
Day 十八 of #DecemberAdventure
The Lua bootstrap assembler is essentially rewritten now, and while the implementation uses more than 3 times as many lines of code, this version consists almost entirely of nested tables of very similar tiny functions. The rest of the code is providing the data stack and memory model of the V3 virtual machine, so when I get to translating the Lua bootstrap assembler into a assembly language, I expect to be able to mechanically translate from Lua into assembly language and use the same table driven architecture.
All the tests need converting to the new syntax, and any bugs that flushes out will need fixing. Then I can finish the emulator and example programs and check that everything assembles and runs.
Day 十七 of #DecemberAdventure
Disassembler is now emitting much more Forth friendly source-code inspired a bit by DuskOS HAL, which should make writing the real assembler a lot more fun. Lack of round tripping has totally broken the testsuite I made yesterday though 😢:
~~~~~ old asm syntax ~~ hex object code ~~~ new disassembler syntax ~~~ :post-mnemonics deo %t, %b => $008b => %b out, dei %t, %b => $108b => %b in, mov %t, 2 => $2582 => 2 i) mov, inv %d, %b => $30db => %b >%d inv, and %b, 5 => $40b0 $0005 => %b 5 i) and, .test test %t, %b => $478b => %b test, .stackops pick %d, 0 => $8040 => dup, roll %d, 1 => $9041 => swap,
I'm itching to finish updating the emulator so I can run some actual programs (more likely: debug some actual programs), but if I don't update the bootstrap assembler for the new syntax first, might bootstrap an initial runnable assembler for the emulator and then never go back and finish. Gotta be disciplined!
Day 十六 of #DecemberAdventure
Cleaned up the disassember today, and added some make rules to run each test file first through the assembler, then back through the disassembler and check that the hexadecimal values for every cell are unchanged by the round trip:
$ make check lua v3asm.lua tests/encode-post-bits.vas tests/encode-post-bits.v3 assembled 24 cells, with 2 labels sed -n -e '/;/{s,^.*; *,,;p;}' tests/encode-post-bits.vas > tests/encode-post-bits.good lua v3dis.lua tests/encode-post-bits.v3 | sed -n -e '/;/{s,^.*; *,,;p;}' > tests/encode-post-bits.out lua v3asm.lua tests/lambda.vas tests/lambda.v3 assembled 10 cells, with 4 labels lua v3dis.lua tests/lambda.v3 | sed -n -e '/;/{s,^.*; *,,;p;}' > tests/lambda.out lua v3asm.lua tests/mnemonics.vas tests/mnemonics.v3 assembled 78 cells, with 11 labels lua v3dis.lua tests/mnemonics.v3 | sed -n -e '/;/{s,^.*; *,,;p;}' > tests/mnemonics.out check: encode-post-bits ok check: lambda ok check: mnemonics ok
It then occurred to me that using a traditional assembly language syntax is a terrible fit for a virtual Forth CPU, so I'm re-reading Brad Rodriguez' Moving Forth papers to help me settle on a syntax that will work well with the eventual Forth implemetation I want to bootstrap in the V3 emulator. I think the change of syntax ought to be moderately straight forward if I start by adjusting the disassembler output given a binary object file, and then when I'm happy with the result, adjust the assembler to read that syntax ensuring that the binary content remains the same throughout...
Day 十五 of #DecemberAdventure
Yesterday's quick fix for encoding `)` as `mov+x %a, %a` did not work. "mov" operations change the flag register, where exit should not. Fixed that problem by changing the encoding for "exit" to a non-sensical bit pattern for "set" (which does not change the flag register). Tests also uncovered a bug if there's a label for the address of `)` (any "exit" operation in fact), and it gets fused with the previous instruction then that label no longer matches the address of the `)`... The assembler has to also use an unfused "exit" immediately following any label to take care of that.
I have been validating output of the bootstrap assembler by eye-balling a dump of every memory cell and what operation it would represent as printed by a kludgey partial dissasembler phase at the end of the assembler run. It doesn't even differentiate instructions from immediate arguments!
Today, I wrote a separate (but naive and completely unfactored) disassembler for binary output files from the assembler, in just 120 lines of Lua. I'm pretty confident that rewriting the core instruction dispatch loop for the emulator to catch up with all of the changes I made to the specification in the last few days will be in similarly few lines of C. In fact I'll tackle that next so that I can actually begin writing and running the real assembler in assembly language to assemble itself.
And then I'll implement read syntax for all `post` bits of the common instruction microcode layout, and add support for the `using` pseudo instruction to set the `a` and `b` flags for call operations. That should allow my remaining tests to pass, and I'll have a bit under a week to complete the adventure...
Day 十四 of #DecemberAdventure
As I add more tests, I'm finding (and fixing) more bugs in the specification.
Here's the latest in progress version.
Turns out that relative addressing is of no use in V3, because:
1. Using an optional direct relative address operand means I have to know the relative distance to the destination before choosing a 1-cell direct operand instruction or a 2-cell immediate operand instruction;
2. That choice changes the destination address for forward jumps (multiplied by any other jumps in between), which would need a multi-pass assembler. Too much complexity for a project focussed on simplicity;
3. I could also use a 2-cell immediate operand instruction for relative jumps and give up some compactness, but since the the operand cell can already address all of memory, relative addressing would be slower (read program counter, read operand, add, write new program counter vs read operand, write new program counter) and add complexity to the instruction set, the assembler and the emulator;
4. Ergo, all label references are now assembled to an absolute address, which is faster and much simpler all around -- but, at the cost of longer instruction encoding for call and jmp.
With the freed up instruction slot in the ISA, I split "jmp" and "set" into separate instructions, and simplified to only 3 micro-coded bit layouts (from 5 previously):
|f e d c|b a 9 8|7 6 5 4|3 2 1 0| +-------+-------+-------+-------+ |cal/jmp|x|c|b|a| flags | m/src | | set |x| post| flags | s/dst | |therest|x| post| m/dst | m/src |
Additionally, for nested lambdas that end with consecutive `)` instructions (encoded as `exit`), learned that exit-fusion rolls all of those consecutive exits up onto the same last instruction cell by default, leaving the return stack unbalanced afterwards. Fixed by encoding `)` as `mov+x %a, %a` whenever the last instruction already has the x-bit set.
Day 十三 of #DecemberAdventure
Was starting to think that testing the bootstrap assembler so thoroughly was not the best use of time -- but, actually, writing a table-driven assembler is good practice for what will be the first program I write in assembly code, and I'm making a test-suite that will check that work too.
Test coverage is currently around 50%, but currently failing on relative jumps, so that's the next thing to fix before finishing comprehensive tests.
I added support for lambdas, where
:execute-lambda ( ; start anonymous function "counted string" ; anonymous string deo console.puts %d ; pop string address and write to console ) ; end anonymous function => exit call %d ; call it by address left on data stack exit ; fused into the previous instruction
In the example above, I'm toying with doing something similar for counted strings: assemble a forward jump over the following in-place counted string that leaves the start address on the data stack to be passed to the console string output port by the following `deo` instruction.
I also rearranged the ISA bits (again!) so that $0000 $0000 machine code means `deo system.fatal errorcode.null-pointer-dereference`. Hoping that will prevent some execution overruns, where it used to mean `mov %a, %a` (a null operation) so the instruction counter would run through the end of the memory segment, wrap around back to address $0000 and then start trying to execute the memory mapped registers and stacks as opcodes! 🔥
Day 十二 of #DecemberAdventure
After yesterday's whack-a-bug, and now that I'm happy with the instruction bit-layout, I decided to take a more disciplined approach to arriving at a bug free bootstrap assembler and emulator today. Starting with a very long assembly language input file for the assembler that aims to test all the instruction and argument encodings behave. Haven't figured out a strategy for testing the emulator -- maybe I'll get inspiration from the #uxn test rom...
While putting together the tests, I found a spare bit in the common instruction microcode layout that I'm not using any more, so repurposed that to flag the last nibble of the instruction to be a direct operand (rather than a memory/register reference). Not that there's much room in a nibble for direct operands, but encoding powers of two and one less than powers of two means a lot of common constants should be available as a direct value rather than spending 2 bytes of space in the instruction stream for an immediate operand.
I updated the specification with details.
Before I think too hard about testing the emulator, I'll first need to provide good coverage from the assembly test file, and that will also entail fixing label references (hence jumps and calls) to be relative. The idea is to make all code relocatable by default.
#DecemberAdventure 十一 Finished the V3 emulator, adding clarifications to the spec wherever I found the need. 120SLOC for the main loop. I assembled some small sample programs to run, and have been fixing bugs in assembler and emulator alike to get them working end to end ever since.
I think I'll add a rudimentary step debugger to the emulator next to save myself cycling the main loop in lldb over and over again to find the remaining bugs.
Also considering working towards having the emulator use the (non-bootstrap) assembler to assemble sources directly into memory ready for execution. I think that's a much neater way of avoiding endianness issues than changing the object file format.
#DecemberAdventure 十 No adventuring yesterday -- spent the day working out, Christmas shopping and teaching Kung Fu.
Today, I added some examples of assembly code as will as fixing all the inconsistencies I could find in the V3 Forth CPU spec I posted yesterday, then updated the Lua assembler to conform. I had hoped that I could sidestep endianness issues by only using 16-bit cells in a contiguous address space, but of course when writing those values to a binary object file, and when loading that object file back into memory endianness matters. For convenience, I'm currently using little-endian ordering for object files just to get things working, but I'm actually thinking of using the Unix `xxd` hex dump format or something similar instead of binary. But that will more than double the size of object files, so that might turn out to be a terrible trade off?
#DecemberAdventure 八 I went on a side quest to set up gemini hosting instead of coding today. Shout out to vern.cc Pubnix for easiest registration and setup! @eli_oat@tenforward.social kindly appended a proxied link from his https://eli.li/december-adventure page.
Perhaps more interesting:
I also put up the draft spec for the forth CPU I've been working on.
Note that I've flip-flopped on several design points, so there are probably still some inconsistencies.
Answering yesterday's question about direct addressing in deo/dei: it simplifies the emulator and the assembler to use the common micro-code format here, so now deo/dei instructions are the same as regular mov, only deo destination is in device address space and dei source is from device address space. All the other bits are the same, so x-bit fusion, auto-increment etc are consistent with the rest of the instruction set.
#DecemberAdventure 七 Finished typing in enough of an emulator to execute common instructions, but it still needs testing and debugging. Also found instruction set quirks after starting assembler in itself: With the x-bit used to distinguish call and jmp, I can't encode a conditional jmp followed by subroutine exit in one instruction as I had intended.
I'm also not thrilled with deo using direct addressing, because that takes up bits used for autoinc that make writing strings easier. What to do?
#DecemberAdventure 六 Answers for yesterday's questions:
1. 64k is small enough already, dei/deo with devices in their own address space;
2. tuck stack ops into otherwise invalid bit patterns, e.g:
mul %t, 2 ; %t *= 2 (double top of data stack) mul %d, 2 ; %w = pop %d * 2; push %d, %w (also doubles top of data stack)
The second of those is a slow way of doing the first, so I'm using that bit pattern for a stack only operation:
pick %d, 2 ; aka over
And, of course, I re-ordered the microde bits again.
And updated the documentation to match.
And made a skeleton for the emulator in C. But need to update the assembler again first...
#DecemberAdventure 五 rewrote documentation to keep up with tweaks I made to the instruction microcodes while implementing the bootstrap assembler; but putting off some final decisions until I have more assembly code written...
1. do I memory map devices to 0-page for efficient access with fetch/store direct operands, or rename to dei/deo and have their own address space?
2. use an operation slot for stack rotation instructions for simplicity, or tuck them in to otherwise invalid bit patterns?
#DecemberAdventure 四 finished enough of the Lua bootstrap assembler to assemble some examples that match my hand-assembled results. Back patching forward jumps was two lines of Lua:
1. add unknown forward label refs to fwdaddr->name map in first pass;
2. at the end, write address in final labeladdrs[name] into each fwdaddr.
An earlier implementation in C took hundreds of LOC. Lua tables are so great! 🥰 ...nonetheless, going back to C for writing an emulator. But first, assembling relative addrs!
#DecemberAdventure 三 J1/H2 Forth CPUs use a dedicated opcode 'x' bit to return from a subroutine call in the same clock cycle as its final instruction. But JMP & CALL repurpose that bit towards the target address -- you can't set the instruction pointer twice at the same time (RET from a subroutine and CALL another at once). Added x-bit fusion to my assembler, and learned that fusing CALL followed by RET sets an x-bit and makes a JMP instruction that gives us tail call elimination for free!!
Today's #DecemberAdventure: Repurposed a Lua tokenizer I wrote for an old project and made progress on the parser to assemble 1 of the 4 microcode instruction types I need (16 bits as 4 nibbles (nybbles?) for: opcode, dest register, 4 ALU flag bits, and src register). For simplicity, I decided not to use LPEG to bootstrap, and consequently already >200 LOC 🤯 Glad I decided against using C! Applied for a gemini hosting account, so I'll post more details there when I've finished registration...
#DecemberAdventure kickoff... Previously, I've noodled with an archival virtual machine spec for years, but never implemented anything. This seems like a fun way to get some traction without over-committing. So, following the most recent design for a "move machine" with 16-bit micro-coded machine instructions, I had already written some example code to output a 16-bit unsigned value as a 4-character hex string. Today I started on a simplistic recursive descent assembler in Lua to bootstrap...