A Near Disaster Turns Into an Opportunity

TLDR: an unexpected gcc switch solves a dire problem and improves the system.

Last night I almost cried in despair! As you, dear reader, know, I've been trying to harness the C compiler into generating code snippets that I subsequently suck into my non-C code repository.

And while I do not care to keep much of C linkage semantics on the grand scale, I want to avoid surprises at the function level. And so I tried to do something simple -- load the address of a function. After all, I have metadata for all the functions in the system. So, right from the command line I should be able to:

> return &printf;

This should obviously return the address of the function printf (or more accurately, its in-system-binding). Unfortunately, it returned some garbage. Checking the ELF file shows that it's loading a 64-bit value from somewhere.

   0:	48 8b 05 00 00 00 00 	mov    0x0(%rip),%rax

0000000000000003  000000040000002a R_X86_64_REX_GOTPCRELX 0000000000000000 printf - 4

Shite. Looking at the relocations, it looks like it expects the address of printf to be in a GOT - the Global Offset Table - which I neither have, nor want to have.

Those of you who've seen the innards of ELF files know that 'modern' linkers create a jump table for all external functions, and another for data, and at load-time populate it with addresses of external objects. That works, but it's just dumb, and I am not doing it at all. And this relocation looks particularly nasty...
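To make the difference concrete, here is a small C-level model of the two ways of getting at an address. It is only a sketch: fake_got, fake_loader and the two accessor functions are made-up names for illustration, and a real GOT is built by the linker and filled by the loader, not by code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*printf_fn)(const char *, ...);

/* Hypothetical one-entry stand-in for a GOT; names are illustrative only. */
static printf_fn fake_got[1];

/* A loader would fill the slot once the real address is known. */
static void fake_loader(void) { fake_got[0] = &printf; }

/* Roughly what 'mov 0x0(%rip),%rax' does: read a pointer out of the table. */
static printf_fn address_via_table(void)
{
    assert(fake_got[0] != NULL);  /* the slot must be populated first */
    return fake_got[0];
}

/* Roughly what a pc-relative 'lea' does: yield the address, no memory read. */
static printf_fn address_direct(void) { return &printf; }
```

Both routes end at the same address; the table route simply costs an extra 8-byte slot and a memory load, and it only works if somebody has filled the slot.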

Another Idiocy To Deal With...

OK, that is unexpected. The compiler should compile code as generically as it can, and let the linker do the linking. Who would expect the compiler to rely on a linkage table, for no good reason at all?

Why do I say no good reason? Because the compiler can generate a call to printf without the GOT or PLT, and the linker resolves it in-place. So if it can compile a pc-relative call to printf, why can't it just compile a pc-relative 'lea' (load effective address) of printf? That is what a sane person would expect.

Just to make sure I am not crazy (who knows with Intel, perhaps the addressing mode does not exist or work), I assembled it in nasm -- and it works:

0:   48 8D 05 00 00 00 00   	lea	rax,[rel .printf]	;

And so I cursed, slapped my head in frustration, ran a couple of miles, watched some Stargate Atlantis and went to sleep, in a foul mood.

What to do (upon waking up)?

Sleeping on unsolved problems is a toss-up - sometimes I wake up with an epiphany. Not this morning - I woke up from a dream in which I was Cornholio from Beavis and Butt-Head, and my arms were sore from sticking up over my head... A couple of hours for keeping my meatsack alive and functioning, and I am back in the hotseat.

So even though my standards of acceptable C code are very low, I fully expect to be able to load an address of a function. If it fails silently, I failed.

My options are:

1. Prohibit taking addresses using normal C syntax. That just sucks! No way.
2. Detect and monkey-patch the code from a mov to lea. Detection is easy, because the ELF ingestor is signaling a special GOT relocation (R_X86_64_REX_GOTPCRELX). And it's just one byte to fix: the opcode 8b becomes 8d. However, that seems a bit too kludgy even for an experimental toy. I will keep it as a last resort option.
3. Figure out a way to get the compiler to output something more usable. Let's go with that.
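For the record, the monkey-patch from the second option could look roughly like the sketch below. It is a hypothetical helper, not code from harn: the function name and its arguments are invented for illustration. It assumes the relocation offset points at the 32-bit displacement, so the REX prefix, opcode and ModRM bytes are the three bytes just before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn 'mov sym@GOTPCREL(%rip),%reg' into
   'lea sym(%rip),%reg'. `code` holds the section bytes and
   `reloc_offset` is the offset of the relocated 32-bit displacement.
   Returns 1 if patched, 0 if the bytes are not the expected mov. */
static int patch_got_load_to_lea(uint8_t *code, size_t size, size_t reloc_offset)
{
    assert(code != NULL);
    if (reloc_offset < 3 || reloc_offset + 4 > size)
        return 0;

    uint8_t *insn = code + reloc_offset - 3;   /* REX, opcode, ModRM */
    int rex_w   = (insn[0] & 0xF8) == 0x48;    /* REX prefix with W set */
    int is_mov  = insn[1] == 0x8B;             /* mov r64, r/m64 */
    int rip_rel = (insn[2] & 0xC7) == 0x05;    /* mod=00, r/m=101: RIP-relative */
    if (!(rex_w && is_mov && rip_rel))
        return 0;

    insn[1] = 0x8D;                            /* same operands, but lea */
    return 1;
}
```

After the patch the relocation has to be resolved as a plain pc-relative offset to the symbol itself rather than to a GOT slot. Linkers perform the same mov-to-lea rewrite when they relax this relocation type, so the kludge is at least in good company.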

No pie, or PIE... Please!

After some reading I realized that all my problems have to do with how 'modern' compilers generate position-independent code, pie. Or is it PIE. That is the thorn in my side. What if I turn it off? -fno-pie -fno-PIE. What does it do? The manual does not say. Let's compile a simple function and see:

  printf("%p\n",&printf);

   0:	be 00 00 00 00       	mov    $0x0,%esi              # load address of printf
   5:	bf 00 00 00 00       	mov    $0x0,%edi              # load string
   a:	31 c0                	xor    %eax,%eax
   c:	e9 00 00 00 00       	jmp    11 <command_line+0x11>

Relocation section '.rela.text' at offset 0x128 contains 3 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000001  000000040000000a R_X86_64_32            0000000000000000 printf + 0
0000000000000006  000000020000000a R_X86_64_32            0000000000000000 .rodata.str1.1 + 0
000000000000000d  0000000400000004 R_X86_64_PLT32         0000000000000000 printf - 4

Son of a bitch! This is exactly what I want -- actually, better than what I had up to now. (the PLT32 relocation is fine -- it does not require a PLT!) Compare it with what I generated yesterday:

   0:	48 8b 35 00 00 00 00 	mov    0x0(%rip),%rsi        # load printf entry in GOT (GOD, WHY!)
   7:	48 8d 3d 00 00 00 00 	lea    0x0(%rip),%rdi        # load address of string
   e:	31 c0                	xor    %eax,%eax             # someday I will learn why
  10:	e9 00 00 00 00       	jmp    15 <command_line+0x15>  # call printf

It's 4 bytes shorter, to start with... and look, it uses 32-bit registers! For everyone else, this sucks, but I could not ask for such luck: as you may know, the entire system lives in the low 4GB, and is addressable with 32-bit absolute addresses. See "On Limits of Code and Data".

Yes, the relocation R_X86_64_32 indicates a 32-bit absolute address. To everyone else, this makes the code non-relocatable, but I have a relocation engine that fixes absolute addresses. My system supports this exact relocation (I call it A32 for absolute-32-bit relocation). It was my original relocation type back in 68000 days. I was just lamenting that only my internal metadata uses this wonderfully compact 32-bit pointer format...

The call is still PC-relative. No problems, I can handle 3 kinds of relocations: A32, R32 (pc-relative 32-bit offset) and A64, 64-bit pointers.
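A rough sketch of how those three fixups could be applied is below. This is not the harn source: the enum, the function name and its parameters are hypothetical, and it assumes a little-endian host so that memcpy stores the bytes in x86 order. The arithmetic follows the usual ELF convention of S + A for absolute fields and S + A - P for pc-relative ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Relocation kinds named in the text; the enum itself is hypothetical. */
enum reloc_kind { A32, R32, A64 };

/* Hypothetical fixup routine. `image` is the code as it will sit at
   address `base`, `offset` locates the field to patch, `sym` is the
   symbol's final address, `addend` comes from the ELF relocation. */
static void apply_reloc(uint8_t *image, uint64_t base, uint64_t offset,
                        enum reloc_kind kind, uint64_t sym, int64_t addend)
{
    uint64_t place = base + offset;            /* P: address of the field */
    uint64_t value = sym + (uint64_t)addend;   /* S + A */

    switch (kind) {
    case A32: {                                /* R_X86_64_32 */
        assert(value <= UINT32_MAX);           /* must live in the low 4GB */
        uint32_t v = (uint32_t)value;
        memcpy(image + offset, &v, sizeof v);
        break;
    }
    case R32: {                                /* pc-relative, e.g. PLT32 with no PLT */
        int64_t delta = (int64_t)(value - place);
        assert(delta >= INT32_MIN && delta <= INT32_MAX);
        int32_t d = (int32_t)delta;
        memcpy(image + offset, &d, sizeof d);
        break;
    }
    case A64:                                  /* R_X86_64_64 */
        memcpy(image + offset, &value, sizeof value);
        break;
    }
}
```

As a sanity check, take the Hello World function shown at the end of this post: loaded at 0x40000e40, with the string at 0x40000e4a and puts at 0x400006f0 (an address inferred from the jump offset in the hex dump, not stated anywhere). An A32 fixup at offset 1 and an R32 fixup at offset 6 with addend -4 produce exactly the bytes 4A 0E 00 40 and A6 F8 FF FF seen in that dump.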

Problem Solved

And so, I score another victory. Not only can I load addresses as expected, but the generated code will now be noticeably smaller and faster...

I will leave you with the disassembly of the Hello World function, which yesterday was 24 bytes, and today -- 22. And with absolute addresses, it is a little more readable (note how it loads the address of the string that follows, at 40000E4A):

int foo(){
    puts("Hello World");
}
0000000000000000 <foo>:
   0:	bf 00 00 00 00       	mov    $0x0,%edi
   5:	e9 00 00 00 00       	jmp    a <foo+0xa>

> cc
Ingested foo: extern int foo (void); 22 bytes
> hd 40000e40
0x40000e40 BF 4A 0E 00 40 E9 A6 F8 FF FF 48 65 6C 6C 6F 20 ..........Hello 
0x40000e50 57 6F 72 6C 64 00 00 00 00 00 00 00 00 00 00 00 World...........
> foo();
Hello World
>

Some days you just get lucky.

https://github.com/stacksmith/harn
