I'm looking at the code GCC (GNU Compiler Collection) [1] produced for a 32-bit system (I cut down the number of lines of code [2]):
    804836b:  68 ac 8e 04 08   push 0x8048eac
    8048370:  e8 2b ff ff ff   call 80482a0 <puts@plt>
    8048375:  68 ac 8e 04 08   push 0x8048eac
    804837a:  e8 21 ff ff ff   call 80482a0 <puts@plt>
    804837f:  68 ac 8e 04 08   push 0x8048eac
    8048384:  e8 17 ff ff ff   call 80482a0 <puts@plt>
    8048389:  68 ac 8e 04 08   push 0x8048eac
    804838e:  e8 0d ff ff ff   call 80482a0 <puts@plt>
    8048393:  68 ac 8e 04 08   push 0x8048eac
    8048398:  e8 03 ff ff ff   call 80482a0 <puts@plt>
    804839d:  68 ac 8e 04 08   push 0x8048eac
    80483a2:  e8 f9 fe ff ff   call 80482a0 <puts@plt>
    80483a7:  68 ac 8e 04 08   push 0x8048eac
    80483ac:  e8 ef fe ff ff   call 80482a0 <puts@plt>
    80483b1:  68 ac 8e 04 08   push 0x8048eac
    80483b6:  e8 e5 fe ff ff   call 80482a0 <puts@plt>
    80483bb:  83 c4 20         add  esp,0x20
My initial thought was "Why doesn't GCC just push the address once?" but then I remembered that in C, function parameters can be modified. That led me down a slight rabbit hole to see whether printf() (with my particular version of GCC) even changes its parameters. It turns out that no, it doesn't (your mileage may vary, though). So with that in mind, I wrote the following assembly code:
    bits 32
    global main
    extern printf

    section .rodata
    msg:    db      'Hello, world!',10,0

    section .text
    main:
            push    msg
            call    printf
            ;; 1,199,998 more calls to printf
            call    printf
            pop     eax
            xor     eax,eax
            ret
Yes, I cheated a bit by not repeatedly pushing and popping the stack. But I was also interested in seeing how well nasm [3] fares assembling 1.2 million lines of code. Not too badly, compared to GCC:
    [spc]lucy:/tmp>time nasm -f elf32 -o pg.o pg.a

    real    0m38.018s
    user    0m37.821s
    sys     0m0.199s
    [spc]lucy:/tmp>
I don't even need to generate a 17MB assembly file, though; nasm can do the repetition for me:
    bits 32
    global main
    extern printf

    section .rodata
    msg:    db      'Hello, world!',10,0

    section .text
    main:
            push    msg
    %rep 1200000
            call    printf
    %endrep
            pop     eax
            xor     eax,eax
            ret
It can skip reading 16,799,971 bytes and assemble the entire thing in 25 seconds:
    [spc]lucy:/tmp>time nasm -f elf32 -o pf.o pf.a

    real    0m24.830s
    user    0m24.677s
    sys     0m0.144s
    [spc]lucy:/tmp>
Nice. But then I was curious about Lua. So I generated 1.2 million lines of Lua:
    print("Hello, world!")
    -- 1,199,998 more calls to print()
    print("Hello, world!")
And timed how long it took Lua to load (but not run) the 1.2 million lines of code:
    [spc]lucy:/tmp>time lua zz.lua
    function: 0x9c36838

    real    0m1.666s
    user    0m1.614s
    sys     0m0.053s
    [spc]lucy:/tmp>
Sweet!