Code Density Compared Between Way Too Many Instruction Sets

A lot of the code density claims I see online - "RISC-V code density is best in class", "x86 code density is better than any RISC" - have always struck me as unlikely and inconsistent with what I've seen in the trenches. After I tried and failed to find a modern comparison covering a broad range of instruction sets, I decided to run my own. The cool-kid approach would be to use SPEC or similar and look at density alongside dynamic and static instruction counts, but I have a deep-seated loathing of both the SPEC tools and the subtests themselves, and had no desire to try to make them build for m68k or Xtensa. (nb: SPEC is actually a great benchmark - the best available. It just isn't always much fun.) Instead, I did it the janky way: do a buildroot run with -Os and as few changes to the default settings as possible, then count the bytes in the busybox executable. The results were unsurprising in places - Thumb2 being excellent, for instance - but I was surprised to see just how terrible the density of the "classic" RISCs is.
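For illustration, here's a rough sketch of what the "count the bytes" step could look like. This is not the script behind the results below, just one plausible way to tally code size: it assumes the pyelftools package is installed and uses the total size of executable (SHF_EXECINSTR) sections as the measure.

```python
#!/usr/bin/env python3
# Illustrative sketch only, not the script used for the table below:
# sum the bytes of executable sections in an ELF binary, e.g. the
# busybox produced by a buildroot run.
# Assumes pyelftools is installed: pip install pyelftools
import sys
from elftools.elf.elffile import ELFFile
from elftools.elf.constants import SH_FLAGS

def code_bytes(path):
    """Total size of all sections flagged SHF_EXECINSTR."""
    with open(path, "rb") as f:
        elf = ELFFile(f)
        return sum(s["sh_size"] for s in elf.iter_sections()
                   if s["sh_flags"] & SH_FLAGS.SHF_EXECINSTR)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {code_bytes(path)} bytes of code")
```

Point it at each architecture's busybox and the relative sizes fall out directly.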

Without further ado, here's the table. CSky didn't finish building and I wasn't particularly in the mood to diagnose it, so it's not included. (Sorry, CSky.) Every other supported ISA has at least one result; little-endian was preferred, and where further options were available I tried to match a common embedded config.

Observations