💾 Archived View for gemini.kaction.cc › log › 2022-08-06.2.gmi captured on 2024-12-17 at 10:12:35. Gemini links have been rewritten to link to archived content


Depths of the C runtime

Recently I have been reading the x86-64 SysV ABI documentation, and I found a curious paragraph there:

When "main" returns its value is passed to "exit" and if that has been
over-ridden and returns, _exit (which must be immune to user interposition).

Okay, so the standard says that the user should be able to override the "exit" function (odd, but easily done with weak symbols), and since that function may be overridden in a brain-damaged way, so that it returns, the C runtime must deal with it.

What this means is that every program that uses the standard C runtime pays the price, even though it almost certainly does not override the "exit" function. In x86-64 assembly, it looks like the following in the C runtime:

_start:
	...
	call main
	push rax  ; bogus

	mov rdi, rax
	call exit ; bogus

	pop rdi   ; bogus
	call _exit

The three lines marked "bogus" are useless 99.99% of the time, and yet if you want to be standard-compliant, you have to have them. There are a lot of gems like that.
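For readers who prefer C to assembly, here is a rough sketch of the same sequence. All names in it (finish_process, misbehaving_exit, recording_exit and so on) are made up for illustration and do not come from any real C library; the function pointers stand in for "exit" and "_exit" so that the fall-through path can actually be exercised.

```c
typedef int  (*main_fn)(void);
typedef void (*exit_fn)(int status);

/* Mirrors the assembly: keep the status alive across a call to an
   exit that might return, then fall through to the real terminator. */
void finish_process(main_fn entry, exit_fn overridable_exit, exit_fn final_exit)
{
    int status = entry();      /* call main, then "push rax" */
    overridable_exit(status);  /* call exit: a sane exit never returns */
    final_exit(status);        /* "pop rdi; call _exit": reached only if it did */
}

/* Stand-ins used only for demonstration (hypothetical). */
int last_status = -1;
int demo_main(void) { return 7; }
void misbehaving_exit(int status) { (void)status; /* returns, which exit must not do */ }
void recording_exit(int status) { last_status = status; }
```

Calling finish_process(demo_main, misbehaving_exit, recording_exit) leaves 7 in last_status: the saved status is needed only because the overridden "exit" came back.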

Take errno. On Linux, a failed system call returns a negative error code, and the C library stores it (negated) in a global variable instead of returning it directly. There is no benefit in doing so, but it adds extra code, and it was one of the reasons for creating the abomination called "thread-local storage".
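To make the errno complaint concrete, here is a minimal sketch of the translation step a C library wrapper performs on Linux, where a failing system call returns a value in the range -4095..-1. The function name raw_to_libc is hypothetical; real libraries do this inside their own syscall wrappers.

```c
#include <errno.h>

/* Hypothetical helper: convert a raw Linux system call result into
   the libc convention. On failure the kernel returns -errno, which
   always lies in the range [-4095, -1]. */
long raw_to_libc(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;  /* stash the error code in the (thread-local) errno */
        return -1;          /* and return an uninformative -1 instead */
    }
    return raw;             /* success: pass the result through unchanged */
}
```

For example, raw_to_libc(-ENOENT) returns -1 and sets errno to ENOENT, while raw_to_libc(3) returns 3 and leaves errno alone.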

A minimal C runtime (not to be confused with the C library) looks like the following:

_start:
	mov rdi, [rsp]                ; argc
	lea rsi, [rsp + 8]            ; argv
	lea rdx, [rsp + 8 * rdi + 16] ; envp, just past the NULL that ends argv
	call main

	mov rdi, rax
	mov rax, 60 ; sys_exit
	syscall

You need only that much to get from the "_start" entry point to the familiar world of C functions. (Okay, you need a dozen more instructions to pass a pointer to the aux vector to the "main" function.)

int main(int argc, char **argv, char **envp);
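The pointer arithmetic in that "_start" can also be written out in C. The sketch below assumes the x86-64 Linux layout of the initial stack (argc, then the argv pointers, a NULL, the envp pointers, a NULL, then the aux vector as pairs of words); the struct and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical container for what _start digs out of the stack. */
struct startup_args {
    long argc;
    char **argv;
    char **envp;
    uintptr_t *auxv;  /* (type, value) pairs, terminated by type 0 */
};

/* sp points at the word the kernel leaves at the top of the stack. */
struct startup_args decode_initial_stack(uintptr_t *sp)
{
    struct startup_args a;
    uintptr_t *env, *p;

    a.argc = (long)sp[0];        /* mov rdi, [rsp] */
    a.argv = (char **)(sp + 1);  /* lea rsi, [rsp + 8] */
    env = sp + 1 + a.argc + 1;   /* lea rdx, [rsp + 8 * rdi + 16] */
    a.envp = (char **)env;

    /* The aux vector starts right after the NULL that ends envp. */
    for (p = env; *p != 0; p++)
        ;
    a.auxv = p + 1;
    return a;
}
```

The loop at the end is the part the assembly version leaves out: finding the aux vector means walking past every environment pointer, which is where the extra instructions go.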

And yet every C program includes a lot more, whether it uses it or not, because the standards say so. Welcome to the wonderful world of historical baggage.

I have to admit, "a lot" is probably an exaggeration. Initializing the C runtime takes negligible time compared to the time it takes the kernel to create a new process or, for what it's worth, to initialize the runtime of any other language. Yet the purist inside me is unhappy. When I program in Python, I know what I am doing; when I program in C, I expect full control.

Anyway, that was just a curious observation. If programs break when ported between different C libraries, or even between different versions of the same library, because they violate the standard, what hope do we have of making a backward-incompatible change to the standard?

https://xkcd.com/1172

https://bugzilla.redhat.com/show_bug.cgi?id=638477