💾 Archived View for gemini.kaction.cc › log › 2022-07-14.1.gmi captured on 2024-08-18 at 17:21:06. Gemini links have been rewritten to link to archived content

Thoughts on separate compilation (part 1)

The C programming language supports separate compilation: program source can be split across multiple files, which are compiled separately and then linked together into the final executable. This keeps each source file at a manageable size, and only changed files need to be recompiled. But it comes at the price of lost opportunities for code size and performance optimization.

Opportunity missed

Here is a small example inspired by the GNU ed editor, version 1.4. Let us start with the following "foo.c" file, which defines a static variable and two functions:

#ifndef MY_STATIC
#define MY_STATIC
#endif

static volatile int foo;

MY_STATIC int get_foo() {
	return foo;
}

MY_STATIC void set_foo(int value) {
	foo = value;
}

In the GNU ed sources such functions are called in response to user input, but in our case we have to declare the variable volatile to prevent the compiler from realizing that all these manipulations are completely pointless. The "MY_STATIC" macro lets us compare static and non-static definitions.

Now we create a file "main.c" with the following content:

int main()
{
	int x = get_foo();
	set_foo(x + 12);
	return get_foo();
}

Since "foo" starts out as zero, this is clearly equivalent to a plain "return 12;", but because we marked the "foo" variable as volatile, the compiler is obliged to actually perform two reads and one write, no shortcuts.
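A quick way to convince yourself of the "return 12;" claim is to put both pieces into one file and look at the returned value. This is only a minimal sketch: "run_once" is a hypothetical stand-in for "main" so that the snippet can be called from a test program, and it assumes nothing has touched "foo" before the first call.

```c
/* Static objects start out as zero; volatile forbids folding the accesses. */
static volatile int foo;

static int get_foo(void)
{
	return foo;
}

static void set_foo(int value)
{
	foo = value;
}

/* Hypothetical stand-in for main() from "main.c". */
int run_once(void)
{
	int x = get_foo();	/* first read: 0 on a fresh start */
	set_foo(x + 12);	/* the only write */
	return get_foo();	/* second read: 12 */
}
```

Calling "run_once" a second time in the same process would return 24, since "foo" keeps its value between calls.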

And now three ways of putting these two parts together:

// split.c (link with foo.o)
extern int get_foo();
extern void set_foo(int);
#include "main.c"

// combined-nonstatic.c
#include "foo.c"
#include "main.c"

// combined-static.c
#define MY_STATIC static
#include "foo.c"
#include "main.c"

In the case of "split.c", where "foo" and "main" live in different translation units, the compiler never sees the bodies of "get_foo" and "set_foo" while compiling "main", and the linker does not inline anything, so both functions end up in the resulting binary. Here is the output of "objdump -d":

000000000040101f <main>:
  40101f:	50                   	push   %rax
  401020:	31 c0                	xor    %eax,%eax
  401022:	e8 10 00 00 00       	callq  401037 <get_foo>
  401027:	8d 78 0c             	lea    0xc(%rax),%edi
  40102a:	e8 0f 00 00 00       	callq  40103e <set_foo>
  40102f:	31 c0                	xor    %eax,%eax
  401031:	59                   	pop    %rcx
  401032:	e9 00 00 00 00       	jmpq   401037 <get_foo>

0000000000401037 <get_foo>:
  401037:	8b 05 c3 2f 00 00    	mov    0x2fc3(%rip),%eax        # 404000 <__bss_start>
  40103d:	c3                   	retq   

000000000040103e <set_foo>:
  40103e:	89 3d bc 2f 00 00    	mov    %edi,0x2fbc(%rip)        # 404000 <__bss_start>
  401044:	c3                   	retq   

In both the static and the non-static combined versions the compiler inlined access to "foo", as the output of "objdump -d" shows:

000000000040101f <main>:
  40101f:	83 05 da 2f 00 00 0c 	addl   $0xc,0x2fda(%rip)        # 404000 <__bss_start>
  401026:	8b 05 d4 2f 00 00    	mov    0x2fd4(%rip),%eax        # 404000 <__bss_start>
  40102c:	c3                   	retq   

When "get_foo" and "set_foo" are static, they are eliminated by the compiler. When they are non-static, they can be eliminated by the linker, but only if you pass the necessary flags; it does not happen by default.

./2021-01-17.1.gmi

So, having everything in one translation unit shrinks "main" from 8 instructions to 3 and from 24 bytes to 14, and eliminates two functions of 7 bytes each, for a total win of 24 bytes. For some reason, size(1) has a different idea and reports a difference of 80 bytes.

$ size -G combined-nonstatic combined-static split
   text	   data	    bss	    dec	    hex	filename
    855	      0	    100	    955	    3bb	combined-static
    855	      0	    100	    955	    3bb	combined-nonstatic
    935	      0	    100	   1035	    40b	split

I counted the bytes in the disassembly of the whole .text section, and the difference is exactly 24 bytes. On the other hand, the total size of the binaries also differs by 80 bytes, so size(1) definitely has a point:

$ stat -c '%s %n' split combined-static combined-nonstatic
9208 split
9128 combined-static
9128 combined-nonstatic

These are the sizes of static, stripped binaries compiled with dietlibc=0.34 and clang=11.1.0; your mileage may vary.
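The byte counts above can be tallied in a few lines of C. This is just the arithmetic from the listings written out; the helper names are made up for illustration, and the numbers are copied from the "objdump", "size" and "stat" output shown earlier.

```c
/* Byte counts read off the "objdump -d" listings above. */
enum {
	MAIN_SPLIT = 24,	/* 0x40101f..0x401036 */
	MAIN_COMBINED = 14,	/* 0x40101f..0x40102c */
	GET_FOO = 7,
	SET_FOO = 7
};

/* Savings visible in the disassembly: smaller main plus two dropped functions. */
int disassembly_delta(void)
{
	return (MAIN_SPLIT - MAIN_COMBINED) + GET_FOO + SET_FOO;
}

/* Difference in the "text" column reported by size(1). */
int size_delta(void)
{
	return 935 - 855;
}

/* Difference in file size reported by stat(1). */
int file_delta(void)
{
	return 9208 - 9128;
}

/* Bytes that the three function bodies alone do not explain. */
int unaccounted(void)
{
	return size_delta() - disassembly_delta();
}
```

So the listings explain 24 of the 80 bytes, and the remaining 56 come from somewhere other than these three functions.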

Opportunity recovered

Now that we know that putting all code into a single translation unit can win us a few dozen bytes, let's think about how to achieve it. These are just ideas, I haven't implemented any of this yet.

One doesn't simply concatenate all source files together and call it a day, because of the following scenarios:

Enumerations with clashing names are easy. We can safely rename every enumeration and every name within it, and things will keep working, since enumerations are essentially integers.
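As a rough sketch of what renaming enumerations could produce, assume two hypothetical files "a.c" and "b.c" that each defined "enum color" with the constants in a different order. The "a_" and "b_" prefixes are an arbitrary naming scheme picked for this example, not something an existing tool emits.

```c
/* a.c originally had: enum color { RED, GREEN }; */
enum a_color { a_RED, a_GREEN };

int a_is_red(enum a_color c)
{
	return c == a_RED;
}

/* b.c originally had: enum color { GREEN, RED }; note the different order. */
enum b_color { b_GREEN, b_RED };

int b_is_red(enum b_color c)
{
	return c == b_RED;
}
```

Each half keeps the integer values it had before concatenation (RED is 0 in the first, 1 in the second), which is all the code ever relied on.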

Clashing static definitions can be handled by automatically renaming them to something unique. Building the AST of a preprocessed C file, finding the static top-level definitions and renaming all references to them within the same file should be reasonably easy.
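Here is a sketch of the result of such renaming, assuming two hypothetical files that both used a static "counter" and a static "bump". The per-file prefixes and the public function names "next_id" and "next_ticket" are invented for the example.

```c
/* a.c originally had: static int counter; static void bump(void); */
static int a_counter;

static void a_bump(void)
{
	a_counter += 1;
}

int next_id(void)
{
	a_bump();
	return a_counter;
}

/* b.c originally had its own, unrelated counter and bump. */
static int b_counter = 100;

static void b_bump(void)
{
	b_counter += 10;
}

int next_ticket(void)
{
	b_bump();
	return b_counter;
}
```

After renaming, the two counters stay independent even though they now live in the same translation unit.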

Clashing struct and union definitions are hard to handle cleanly. If we decide to work with preprocessed files, then we won't know whether a definition is local to the file and should be renamed, or came from a header file and should not be.

I see two mutually exclusive approaches here. One is to rename every struct and union definition and rely on the fact that C compilers have traditionally accepted implicit conversion between incompatible pointer types. For example, the following snippet will compile (with a warning) and work.

struct foo { int x; int y; };
struct bar { int x; int y; };

void accept_foo(struct foo*);

int main()
{
	struct bar o = { 0 };
	accept_foo(&o);

	return 0;
}

Functions that take structures by value won't work, though. That is uncommon, but it happens; GNU dbm does it, for example.

The other approach is to build the AST of the concatenated source file and drop repeated identical struct/union definitions. If the repeated definitions are not identical, human intervention is required. It is quite uncommon to have multiple different struct or union definitions with the same name, so this approach may be viable.

Working with non-preprocessed source files would require re-implementing the C preprocessor, and still wouldn't handle the situation where one definition of structure "foo" is shared between two source files and another definition of structure "foo" is shared between another two source files, so I consider this approach strictly inferior to the ones described above.
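To illustrate the deduplication approach, here is a sketch of a concatenated file after the second, identical copy of a struct has been dropped. The struct "point" and both functions are hypothetical; the point is that with a single surviving type, passing the structure by value keeps working.

```c
#include <stdlib.h>

/* Defined identically in hypothetical a.c and b.c; only one copy is kept. */
struct point { int x; int y; };

/* From a.c: returns the structure by value. */
struct point make_point(int x, int y)
{
	struct point p = { x, y };
	return p;
}

/* From b.c: its own identical "struct point" definition was removed here. */
int manhattan(struct point p)
{
	return abs(p.x) + abs(p.y);
}
```

For instance, "manhattan(make_point(3, -4))" evaluates to 7, with no pointer conversions or warnings involved.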

Further research

The fully stripped static executable discussed in the previous section has around 1 KB of code in its text section but around 9 KB of size on disk. The text section holds not only the "main" function but also the C runtime, the code that runs between "_start" and "main", so that number is understandable. A file roughly ten times bigger is quite an overhead, though.

https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html