💾 Archived View for gem.librehacker.com › gemlog › tech › 20220808-0.gmi captured on 2024-12-17 at 10:38:48. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I picked up a few more interesting bits of insight in working on this I2C driver project. Well, interesting to me at least.
With the ARMv6-M architecture you can't simply drop a 32-bit value into a register, even though the registers are 32 bits wide. What I mean is, you can't drop the value in as an IMMEDIATE value, that is to say, as part of the processor instruction itself. You can load a 32-bit value from a memory address, say with the LDR instruction. But if you want an immediate value dropped in, you are limited to 8 bits, which is the size of the immediate field in encoding T2 of the ADDS instruction in ARM's Thumb instruction set.
What is the difference? Memory accesses are (I'm told) much more expensive than direct manipulations on the register. If you want to load a 32-bit value without the memory access, the trick is to drop in the most significant byte, shift the register left by one byte, add the next byte, shift again, and so forth. So, for example, here is a simple word that drops DEADBEEFh on the stack, along with the disassembled code:
: foo $deadbeef ;  ok.
see foo
200203BE: 3F04  subs r7 #4
200203C0: 603E  str r6 [ r7 #0 ]
200203C2: 26DE  movs r6 #DE
200203C4: 0236  lsls r6 r6 #8
200203C6: 36AD  adds r6 #AD
200203C8: 0236  lsls r6 r6 #8
200203CA: 36BE  adds r6 #BE
200203CC: 0236  lsls r6 r6 #8
200203CE: 36EF  adds r6 #EF
200203D0: 4770  bx lr
 ok.
The first two lines of assembly store the previous TOS (top of stack) value, the last line is the return jump, and the rest is the shifting and adding which I described.
It is not so bad, though, if your constant is smaller, because then the Mecrisp compiler doesn't have to do as many add/shifts:
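As a sanity check on the arithmetic, here is a minimal Python sketch that replays the seven register instructions from the listing. It is only a model under stated assumptions: the helper names `movs`, `lsls`, and `adds` are made up for illustration, condition flags are ignored, and the 32-bit mask stands in for the width of r6.

```python
MASK32 = 0xFFFFFFFF  # models the 32-bit width of r6


def movs(imm8):
    """Load an 8-bit immediate into an otherwise cleared register."""
    assert 0 <= imm8 <= 0xFF
    return imm8


def lsls(reg, shift):
    """Logical shift left, discarding bits that fall off the top."""
    return (reg << shift) & MASK32


def adds(reg, imm8):
    """Add an 8-bit immediate, wrapping at 32 bits."""
    assert 0 <= imm8 <= 0xFF
    return (reg + imm8) & MASK32


def build_deadbeef():
    """Replay movs/lsls/adds in the same order as the disassembly."""
    r6 = movs(0xDE)
    r6 = adds(lsls(r6, 8), 0xAD)
    r6 = adds(lsls(r6, 8), 0xBE)
    r6 = adds(lsls(r6, 8), 0xEF)
    return r6


print(hex(build_deadbeef()))  # prints 0xdeadbeef
```

Every immediate passed in fits in 8 bits, which is the whole point: the 32-bit value is assembled without a single memory load.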
: foo $aa ; Redefine foo.  ok.
see foo
2002040A: 3F04  subs r7 #4
2002040C: 603E  str r6 [ r7 #0 ]
2002040E: 26AA  movs r6 #AA
20020410: 4770  bx lr
 ok.
That is really only one instruction, not counting adjusting the stack and returning from the subroutine.
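To see how the instruction count grows with the size of the constant, here is a rough Python sketch of a naive byte-at-a-time emitter. This is an assumption for illustration, not Mecrisp's actual algorithm: the function name `bytewise_sequence` is hypothetical, leading zero bytes are skipped, and every later byte costs one shift plus one add. The real compiler may well find shorter sequences for some values.

```python
def bytewise_sequence(value):
    """Return mnemonics for a naive byte-at-a-time load into r6.

    Assumption: leading zero bytes are skipped, and each remaining byte
    after the first costs one lsls plus one adds.
    """
    assert 0 <= value <= 0xFFFFFFFF
    # Big-endian so the most significant byte comes first.
    data = value.to_bytes(4, "big").lstrip(b"\x00") or b"\x00"
    ops = [f"movs r6 #{data[0]:02X}"]
    for byte in data[1:]:
        ops.append("lsls r6 r6 #8")
        ops.append(f"adds r6 #{byte:02X}")
    return ops


for constant in (0xAA, 0xDEADBEEF):
    ops = bytewise_sequence(constant)
    print(hex(constant), len(ops), "instructions")
    for op in ops:
        print("   ", op)
```

Under these assumptions $aa needs one instruction and $deadbeef needs seven, which matches the two listings for foo.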
Constants in FORTH simply do what is described above:
$deadbeef constant myconst  ok.
see myconst
200203EA: 3F04  subs r7 #4
200203EC: 603E  str r6 [ r7 #0 ]
200203EE: 26DE  movs r6 #DE
200203F0: 0236  lsls r6 r6 #8
200203F2: 36AD  adds r6 #AD
200203F4: 0236  lsls r6 r6 #8
200203F6: 36BE  adds r6 #BE
200203F8: 0236  lsls r6 r6 #8
200203FA: 36EF  adds r6 #EF
200203FC: 4770  bx lr
 ok.
I was surprised to find out that, in Mecrisp-Stellaris, this constant code is automatically inlined. So, take this little word I coded, which uses two register constants and a helper word i2c_tar' (which does not interest us here)...
$40044004 constant I2C0_IC_TAR
$40048004 constant I2C1_IC_TAR
: i2c_tar ( -- )
  cr
  ." I2C0 " I2C0_IC_TAR i2c_tar'
  ." I2C1 " I2C1_IC_TAR i2c_tar' ;
Here is the disassembled code:
see i2c_tar
2000B2CE: B500  push { lr }
2000B2D0: F7F7  bl  200027AA  --> cr
2000B2D2: FA6B
2000B2D4: F7F7  bl  20002878  --> .' I2C0 '
2000B2D6: FAD0
2000B2D8: 4905
2000B2DA: 4332
2000B2DC: 2030
2000B2DE: 3F04  subs r7 #4
2000B2E0: 603E  str r6 [ r7 #0 ]
2000B2E2: 2680  movs r6 #80
2000B2E4: 0336  lsls r6 r6 #C
2000B2E6: 3688  adds r6 #88
2000B2E8: 02F6  lsls r6 r6 #B
2000B2EA: 3604  adds r6 #4
2000B2EC: F7FF  bl  2000B238  --> i2c_tar'
2000B2EE: FFA4
2000B2F0: F7F7  bl  20002878  --> .' I2C1 '
2000B2F2: FAC2
2000B2F4: 4905
2000B2F6: 4332
2000B2F8: 2031
2000B2FA: 3F04  subs r7 #4
2000B2FC: 603E  str r6 [ r7 #0 ]
2000B2FE: 2680  movs r6 #80
2000B300: 0336  lsls r6 r6 #C
2000B302: 3690  adds r6 #90
2000B304: 02F6  lsls r6 r6 #B
2000B306: 3604  adds r6 #4
2000B308: F7FF  bl  2000B238  --> i2c_tar'
2000B30A: FF96
2000B30C: BD00  pop { pc }
 ok.
From a performance perspective this is good, but it is not good for code size: nearly half of the word (fourteen 16-bit instructions, or 28 of its 64 bytes) is devoted to dropping two values on the stack!
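It is worth noticing that the inlined constants are not built byte by byte here: the compiler shifts by 12 and then by 11 bits, so each value takes five instructions instead of seven. Below is a small Python sketch that replays those two sequences to confirm they produce the register addresses. The `run` helper and the tuple format are my own invention for this check, and flags are ignored.

```python
MASK32 = 0xFFFFFFFF  # models the 32-bit width of r6


def run(program):
    """Replay (mnemonic, operand) pairs against a model of r6."""
    r6 = 0
    for op, arg in program:
        if op == "movs":
            r6 = arg
        elif op == "lsls":
            r6 = (r6 << arg) & MASK32
        elif op == "adds":
            r6 = (r6 + arg) & MASK32
        else:
            raise ValueError(f"unsupported mnemonic: {op}")
    return r6


# Operands copied from the disassembly of i2c_tar.
I2C0_SEQ = [("movs", 0x80), ("lsls", 0xC), ("adds", 0x88),
            ("lsls", 0xB), ("adds", 0x4)]
I2C1_SEQ = [("movs", 0x80), ("lsls", 0xC), ("adds", 0x90),
            ("lsls", 0xB), ("adds", 0x4)]

print(hex(run(I2C0_SEQ)))  # prints 0x40044004
print(hex(run(I2C1_SEQ)))  # prints 0x40048004
```

Add the two-instruction push of the old TOS (subs, str) and each constant comes to seven instructions, hence fourteen for the pair.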
The helpful folks at #mecrisp@irc.hackint.org suggested, as one option, overriding CONSTANT with the following, which is not inlined:
: constant <builds , does> @ ;
Now, after recompiling i2c_tar, the constants are only four-byte calls to other code:
see i2c_tar
2000ABBE: B500  push { lr }
2000ABC0: F7F7  bl  200027AA  --> cr
2000ABC2: FDF3
2000ABC4: F7F7  bl  20002878  --> .' I2C0 '
2000ABC6: FE58
2000ABC8: 4905
2000ABCA: 4332
2000ABCC: 2030
2000ABCE: F7FF  bl  2000AB72  --> I2C0_IC_TAR
2000ABD0: FFD0
2000ABD2: F7FF  bl  2000AB20  --> i2c_tar'
2000ABD4: FFA5
2000ABD6: F7F7  bl  20002878  --> .' I2C1 '
2000ABD8: FE4F
2000ABDA: 4905
2000ABDC: 4332
2000ABDE: 2031
2000ABE0: F7FF  bl  2000AB9A  --> I2C1_IC_TAR
2000ABE2: FFDB
2000ABE4: F7FF  bl  2000AB20  --> i2c_tar'
2000ABE6: FF9C
2000ABE8: BD00  pop { pc }
 ok.
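To put a number on the saving, the size of each version of i2c_tar can be worked out from the first and last addresses in the two listings. This is just a back-of-the-envelope Python sketch: `span_bytes` is a hypothetical helper, and it assumes every listing row is one 16-bit halfword. It does not count the space taken by each constant's own definition under the new CONSTANT, which the listings do not show.

```python
HALFWORD = 2  # bytes per listing row (assumed: every row is 16 bits)


def span_bytes(first_addr, last_addr):
    """Bytes occupied from the first row to the last row, inclusive."""
    return last_addr - first_addr + HALFWORD


inlined = span_bytes(0x2000B2CE, 0x2000B30C)  # constants inlined
called = span_bytes(0x2000ABBE, 0x2000ABE8)   # constants as bl calls

print("inlined:", inlined, "bytes")
print("called: ", called, "bytes")
print("saved:  ", inlined - called, "bytes")
```

By this count the word shrinks from 64 to 44 bytes, that is, 10 bytes saved per constant use (14 bytes inlined against a 4-byte bl), at the price of an extra call and a memory fetch each time.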