Thoughts on separate compilation (part 3)

In the previous parts I showed that separate compilation increases the size of the resulting binary, and discussed approaches for converting a build system based on separate compilation into a one-file one.

(part 1)

(part 2)

Today I am talking about my new experiment called "lodestar", which contains a couple of small C programs linked against a custom C library with a minimal runtime. The source is organized as one function per file, which simplifies dependency tracking and ensures that combining sources does not run into the issues described in the second part of this series.

https://git.sr.ht/~kaction/lodestar

Every program is compiled in three different ways:

* suffix .exe: normal separate compilation
* suffix .bin: combine all sources, compile them and link against a custom crt.o
* suffix .out: translate the compiler's .S output into .fasm and assemble everything with the fasm(1) assembler

Results:

-rwx--x--x    1 kaction  kaction       9152 Sep  1 17:51 lodestar/main/hello.bin
-rwx--x--x    1 kaction  kaction       9152 Sep  1 17:51 lodestar/main/hello.exe
-rwx--x--x    1 kaction  kaction        182 Sep  1 18:41 lodestar/main/hello.out
-rwx--x--x    1 kaction  kaction       9544 Sep  1 17:51 lodestar/main/mini-stat.bin
-rwx--x--x    1 kaction  kaction       9840 Sep  1 17:51 lodestar/main/mini-stat.exe
-rwx--x--x    1 kaction  kaction       1548 Sep  1 18:41 lodestar/main/mini-stat.out
-rwx--x--x    1 kaction  kaction       9608 Sep  1 17:51 lodestar/main/readdir.bin
-rwx--x--x    1 kaction  kaction       9808 Sep  1 17:51 lodestar/main/readdir.exe
-rwx--x--x    1 kaction  kaction       1214 Sep  1 18:41 lodestar/main/readdir.out
-rwx--x--x    1 kaction  kaction      10208 Sep  1 18:41 lodestar/main/sha256.bin
-rwx--x--x    1 kaction  kaction      10512 Sep  1 18:41 lodestar/main/sha256.exe
-rwx--x--x    1 kaction  kaction       2108 Sep  1 18:41 lodestar/main/sha256.out

The first observation is that ld(1) brings about 8Kb of overhead, but it is fixed. Secondly, combined compilation can save around 10-15% of raw code volume: the difference between sha256.bin and sha256.exe is about 300 bytes, while the raw code volume is around 2Kb.
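
The arithmetic behind these two observations can be checked with a few lines of C. This is only a sketch: the helper names are made up for illustration, and it assumes that the size of the .out file is a fair estimate of the raw code volume.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes saved by the smaller build variant relative to the bigger one. */
long saved_bytes(long bigger, long smaller)
{
    return bigger - smaller;
}

/* Saving as a percentage of raw code volume.
 * Assumption: raw code volume is approximated by the size of the .out file. */
double saved_percent(long saved, long raw_code)
{
    assert(raw_code > 0);
    return 100.0 * (double)saved / (double)raw_code;
}

/* Print the figures for sha256 and hello, using sizes from the listing above. */
void report(void)
{
    long saved = saved_bytes(10512, 10208);  /* sha256.exe vs sha256.bin */

    printf("combined build saves %ld bytes (%.1f%% of raw code)\n",
           saved, saved_percent(saved, 2108)); /* 2108 = sha256.out */
    printf("ld(1) overhead for hello: %ld bytes\n",
           saved_bytes(9152, 182));            /* hello.exe vs hello.out */
}
```

For sha256 this gives a saving of 304 bytes, a bit over 14% of the 2108 bytes of raw code, and for hello an ld(1) overhead of 8970 bytes.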

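Now, let's talk about performance. I'll analyze sha256, since everything else spends most of its time in kernel space. So, let's calculate the sha256 of 256Mb of junk (fallocate(1) produces a zero-filled file rather than random data, which is fine here, since the amount of work SHA-256 does depends only on the input length):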

$ fallocate -l256M junk.txt
$ # This is system busybox implementation, provided by Alpine=3.15
$ time sha256sum < junk.txt
a6d72ac7690f53be6ae46ba88506bd97302a093f7108472bd9efc3cefda06484  -
real   0m 1.40s
user   0m 1.37s
sys    0m 0.01s
$ time ./main/sha256.exe < junk.txt
a6d72ac7690f53be6ae46ba88506bd97302a093f7108472bd9efc3cefda06484
real   0m 1.43s
user   0m 1.42s
sys    0m 0.01s
$ time ./main/sha256.bin < junk.txt
a6d72ac7690f53be6ae46ba88506bd97302a093f7108472bd9efc3cefda06484
real   0m 1.40s
user   0m 1.39s
sys    0m 0.01s
$ time ./main/sha256.out < junk.txt
a6d72ac7690f53be6ae46ba88506bd97302a093f7108472bd9efc3cefda06484
real   0m 1.46s
user   0m 1.43s
sys    0m 0.03s

Numbers are very stable, with almost no variation.

The binary compiled combined is both smaller and faster than the binary compiled separately, which in turn is faster than the binary assembled with fasm. I think this is explained by inlining opportunities in the first case and alignment considerations in the second.
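
To illustrate the inlining point, here is a minimal sketch. It is not lodestar's actual source, and the function names are hypothetical. SHA-256 is built from tiny helpers such as 32-bit rotation. When both functions below end up in the same translation unit, as in the combined build, the compiler is free to inline rotr32 into small_sigma0. With one function per object file and no link-time optimization, every use remains a real call.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word right by n bits, 0 < n < 32.
 * In a one-function-per-file layout this would live in its own file. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x >> n) | (x << (32 - n));
}

/* The "small sigma 0" function of the SHA-256 message schedule:
 * ROTR(x, 7) xor ROTR(x, 18) xor SHR(x, 3).
 * It is called once for most words of the schedule, so whether the two
 * rotr32 calls get inlined affects both code size and speed. */
uint32_t small_sigma0(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}
```

Whether the compiler actually inlines here depends on the optimization level and on its heuristics. The sketch only shows the kind of opportunity that separate compilation hides.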

In a perfect world, combined compilation would be part of what constitutes a "release" build, as opposed to a "development" one. Unfortunately, that is unlikely to happen.