Mecrisp Stellaris: Code Size Optimization for Constants

I picked up a few more interesting bits of insight in working on this I2C driver project. Well, interesting to me at least.

Loading 32-bit numbers into registers

With the ARMv6-M architecture you can't simply drop a 32-bit value into a register, even though the registers are 32 bits wide. That is, you can't supply the value as an IMMEDIATE, encoded as part of the processor instruction itself. You can load a 32-bit value from a memory address, say with the LDR instruction. But an immediate is limited to 8 bits, which is the size of the immediate field in the 16-bit Thumb encodings of MOVS (encoding T1) and ADDS (encoding T2).

What is the difference? Memory accesses are (I'm told) much more expensive than operating directly on the register. If you want to load a 32-bit value without touching memory, the trick is to drop in the first byte, shift the register left by one byte, add the second byte, shift again, and so forth. So, for example, here is a simple word that drops DEADBEEFh on the stack, along with the disassembled code:

: foo $deadbeef ;  ok.
see foo 
200203BE: 3F04  subs r7 #4
200203C0: 603E  str r6 [ r7 #0 ]
200203C2: 26DE  movs r6 #DE
200203C4: 0236  lsls r6 r6 #8
200203C6: 36AD  adds r6 #AD
200203C8: 0236  lsls r6 r6 #8
200203CA: 36BE  adds r6 #BE
200203CC: 0236  lsls r6 r6 #8
200203CE: 36EF  adds r6 #EF
200203D0: 4770  bx lr
 ok.

The first two instructions push the previous TOS value (held in r6) onto the data stack, the last instruction is the return, and the rest is the shifting and adding described above.
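
To check that the shift-and-add sequence really rebuilds the constant, the same steps can be typed at the prompt. This is only a sketch of the arithmetic, not what the compiler emits; it assumes cells of at least 32 bits and uses lshift and hex. for the shifting and printing.

```forth
\ Rebuild $DEADBEEF one byte at a time, mirroring movs/lsls/adds.
$DE              \ movs r6 #DE
8 lshift $AD +   \ lsls r6 r6 #8   adds r6 #AD
8 lshift $BE +   \ lsls r6 r6 #8   adds r6 #BE
8 lshift $EF +   \ lsls r6 r6 #8   adds r6 #EF
hex.             \ print the result; compare with  $DEADBEEF hex.
```

Each line after the first corresponds to one lsls/adds pair in the listing, which is why a full 32-bit value costs seven instructions on top of the two-instruction push.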

It is not so bad, though, if your constant is smaller, because then the Mecrisp compiler doesn't need as many shifts and adds:

: foo $aa ; Redefine foo.  ok.
see foo 
2002040A: 3F04  subs r7 #4
2002040C: 603E  str r6 [ r7 #0 ]
2002040E: 26AA  movs r6 #AA
20020410: 4770  bx lr
 ok.

That is really only one instruction, not counting adjusting the stack and returning from the subroutine.

Constants

Constants in FORTH simply do what is described above:

$deadbeef constant myconst  ok.
see myconst 
200203EA: 3F04  subs r7 #4
200203EC: 603E  str r6 [ r7 #0 ]
200203EE: 26DE  movs r6 #DE
200203F0: 0236  lsls r6 r6 #8
200203F2: 36AD  adds r6 #AD
200203F4: 0236  lsls r6 r6 #8
200203F6: 36BE  adds r6 #BE
200203F8: 0236  lsls r6 r6 #8
200203FA: 36EF  adds r6 #EF
200203FC: 4770  bx lr
 ok.

I was surprised to find out that, in Mecrisp Stellaris, this constant code is automatically inlined. So, take this little word I coded, which uses two register-address constants and a helper word i2c_tar' (which does not interest us here):


$40044004 constant I2C0_IC_TAR
$40048004 constant I2C1_IC_TAR

: i2c_tar ( -- ) cr
    ." I2C0 " I2C0_IC_TAR i2c_tar'
    ." I2C1 " I2C1_IC_TAR i2c_tar' ;

Here is the disassembled code:

see i2c_tar 
2000B2CE: B500  push { lr }
2000B2D0: F7F7  bl  200027AA  --> cr
2000B2D2: FA6B  
2000B2D4: F7F7  bl  20002878  -->  .' I2C0 '
2000B2D6: FAD0  
2000B2D8: 4905  
2000B2DA: 4332  
2000B2DC: 2030  
2000B2DE: 3F04  subs r7 #4
2000B2E0: 603E  str r6 [ r7 #0 ]
2000B2E2: 2680  movs r6 #80
2000B2E4: 0336  lsls r6 r6 #C
2000B2E6: 3688  adds r6 #88
2000B2E8: 02F6  lsls r6 r6 #B
2000B2EA: 3604  adds r6 #4
2000B2EC: F7FF  bl  2000B238  --> i2c_tar'
2000B2EE: FFA4  
2000B2F0: F7F7  bl  20002878  -->  .' I2C1 '
2000B2F2: FAC2  
2000B2F4: 4905  
2000B2F6: 4332  
2000B2F8: 2031  
2000B2FA: 3F04  subs r7 #4
2000B2FC: 603E  str r6 [ r7 #0 ]
2000B2FE: 2680  movs r6 #80
2000B300: 0336  lsls r6 r6 #C
2000B302: 3690  adds r6 #90
2000B304: 02F6  lsls r6 r6 #B
2000B306: 3604  adds r6 #4
2000B308: F7FF  bl  2000B238  --> i2c_tar'
2000B30A: FF96  
2000B30C: BD00  pop { pc }
 ok.

From a performance perspective this is good, but it is not good for code size: nearly half of the word (fourteen instructions, or 28 of its 64 bytes) is devoted to dropping two values on the stack!
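
Incidentally, the shifts in this listing are #C and #B rather than #8, so each address needs only two shift/add pairs instead of three. As a rough check of that arithmetic (a sketch typed at the prompt, not compiler output), the listed sequence for I2C0_IC_TAR can be replayed by hand:

```forth
\ Replay the inlined sequence for I2C0_IC_TAR from the listing.
$80              \ movs r6 #80
$C lshift $88 +  \ lsls r6 r6 #C   adds r6 #88
$B lshift $04 +  \ lsls r6 r6 #B   adds r6 #4
hex.             \ print the result; compare with  I2C0_IC_TAR hex.
```

The I2C1_IC_TAR sequence differs only in the middle byte ($90 instead of $88).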

The helpful folks at #mecrisp@irc.hackint.org suggested, as one option, overriding CONSTANT with the following, which is not inlined:

: constant <builds , does> @ ;
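
For readers who would rather not shadow the built-in word, here is a minimal sketch of the same idea under a different name. The name memconstant is my own invention for illustration, not part of Mecrisp Stellaris, and small is just a placeholder constant.

```forth
\ Hypothetical name: memconstant is not a Mecrisp Stellaris word.
: memconstant ( x "name" -- )
  <builds ,      \ create the new word and store x in its data field
  does> @ ;      \ at run time: fetch the stored x onto the stack

$40044004 memconstant I2C0_IC_TAR   \ compiled as a call, value read from memory
$aa       constant    small         \ built-in constant, still inlined
```

This keeps the inlined CONSTANT available for small values, where it costs a single movs, and reserves the call-based version for full-width register addresses.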

Now, after redefining the two constants with this CONSTANT and recompiling i2c_tar, each constant is only a four-byte call to other code, and the word shrinks from 64 to 44 bytes:

see i2c_tar 
2000ABBE: B500  push { lr }
2000ABC0: F7F7  bl  200027AA  --> cr
2000ABC2: FDF3  
2000ABC4: F7F7  bl  20002878  -->  .' I2C0 '
2000ABC6: FE58  
2000ABC8: 4905  
2000ABCA: 4332  
2000ABCC: 2030  
2000ABCE: F7FF  bl  2000AB72  --> I2C0_IC_TAR
2000ABD0: FFD0  
2000ABD2: F7FF  bl  2000AB20  --> i2c_tar'
2000ABD4: FFA5  
2000ABD6: F7F7  bl  20002878  -->  .' I2C1 '
2000ABD8: FE4F  
2000ABDA: 4905  
2000ABDC: 4332  
2000ABDE: 2031  
2000ABE0: F7FF  bl  2000AB9A  --> I2C1_IC_TAR
2000ABE2: FFDB  
2000ABE4: F7FF  bl  2000AB20  --> i2c_tar'
2000ABE6: FF9C  
2000ABE8: BD00  pop { pc }
 ok.