In previous parts I showed that separate compilation increases the size of the resulting binary and discussed approaches for converting a build system based on separate compilation into a single-file one.
Today I am talking about my new experiment called "lodestar", which contains a couple of small C programs linked against a custom C library with a minimal runtime. The source is organized one function per file, which simplifies dependency tracking and ensures that combining sources does not run into the issues described in the second part of this series.
https://git.sr.ht/~kaction/lodestar
Every program is compiled in three different ways:
* suffix .exe: normal separate compilation
* suffix .bin: combine all sources, compile them and link against the custom crt.o
* suffix .out: translate the compiler's .S output into .fasm and compile everything with the fasm(1) assembler
Results:
-rwx--x--x 1 kaction kaction  9152 Sep  1 17:51 lodestar/main/hello.bin
-rwx--x--x 1 kaction kaction  9152 Sep  1 17:51 lodestar/main/hello.exe
-rwx--x--x 1 kaction kaction   182 Sep  1 18:41 lodestar/main/hello.out
-rwx--x--x 1 kaction kaction  9544 Sep  1 17:51 lodestar/main/mini-stat.bin
-rwx--x--x 1 kaction kaction  9840 Sep  1 17:51 lodestar/main/mini-stat.exe
-rwx--x--x 1 kaction kaction  1548 Sep  1 18:41 lodestar/main/mini-stat.out
-rwx--x--x 1 kaction kaction  9608 Sep  1 17:51 lodestar/main/readdir.bin
-rwx--x--x 1 kaction kaction  9808 Sep  1 17:51 lodestar/main/readdir.exe
-rwx--x--x 1 kaction kaction  1214 Sep  1 18:41 lodestar/main/readdir.out
-rwx--x--x 1 kaction kaction 10208 Sep  1 18:41 lodestar/main/sha256.bin
-rwx--x--x 1 kaction kaction 10512 Sep  1 18:41 lodestar/main/sha256.exe
-rwx--x--x 1 kaction kaction  2108 Sep  1 18:41 lodestar/main/sha256.out
The first observation is that ld(1) brings about 8 KB of overhead, but that overhead is fixed. The second is that combined compilation can save on the order of 15% of raw code volume: the difference between sha256.bin and sha256.exe is 304 bytes, while the raw code volume is around 2 KB (sha256.out is 2108 bytes).
Now, let's talk about performance. I'll analyze sha256, since everything else spends most of its time in kernel space. So, let's calculate the sha256 of 256 MB of junk (a zero-filled file, since that is what fallocate creates):
$ fallocate -l256M junk.txt
$ # This is system busybox implementation, provided by Alpine=3.15
$ time sha256sum < junk.txt
a6d72ac7690f53be6ae46ba88506bd97302a093f7108472bd9efc3cefda06484  -
real    0m 1.40s
user    0m 1.37s
sys     0m 0.01s
$ time ./main/sha256.exe < junk.txt
a6d72ac7690f53be6ae46ba88506bd97302a093f7108472bd9efc3cefda06484
real    0m 1.43s
user    0m 1.42s
sys     0m 0.01s
$ time ./main/sha256.bin < junk.txt
a6d72ac7690f53be6ae46ba88506bd97302a093f7108472bd9efc3cefda06484
real    0m 1.40s
user    0m 1.39s
sys     0m 0.01s
$ time ./main/sha256.out < junk.txt
a6d72ac7690f53be6ae46ba88506bd97302a093f7108472bd9efc3cefda06484
real    0m 1.46s
user    0m 1.43s
sys     0m 0.03s
Numbers are very stable, with almost no variation.
The binary compiled from combined sources is both smaller and faster than the binary compiled separately, which in turn is faster than the binary assembled with fasm. I think this is explained by inlining opportunities in the first case and alignment considerations in the second.
In a perfect world, combined compilation would be one part of what constitutes a "release" build, as opposed to a "development" build. Unfortunately, that is unlikely to happen.