💾 Archived View for tilde.pink › ~kaction › log › 2022-03-05.1.gmi captured on 2023-04-19 at 23:03:39. Gemini links have been rewritten to link to archived content


Linux capability is not a downscaled suid

In the previous post I talked about assigning capabilities to an executable file using the native Linux interface, but once the executable is run and a process with capabilities is created, there are still some subtleties.

./2022-02-19.1.gmi

The least intuitive part is that capabilities are cleared on exec(2) unless you take steps to prevent it. I learned this the hard way: I wrote a simple wrapper around the busybox implementation of the NTP client, put 'cap_sys_time=eip' on the wrapper, and still got 'adjtimex: permission denied'.

To allow capabilities to propagate to child processes, one needs to populate the "ambient" set. As somebody aptly put it on the Internet, the "ambient" set is what most people expect "inheritable" to be.
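
To see why the wrapper lost its capability, it may help to write the execve(2) transformation rules from capabilities(7) as bit arithmetic. The sketch below is a model, not kernel code: the struct and function names are made up for illustration, and it ignores the special handling of root and set-user-ID-root programs.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 64-bit mask per process capability set; bit N is capability number N. */
struct cap_sets {
	uint64_t permitted, inheritable, effective, ambient, bounding;
};

/* Capabilities attached to the executable; "effective" is a single flag. */
struct file_caps {
	uint64_t permitted, inheritable;
	bool effective;
	bool privileged; /* file has capabilities or a set-user/group-ID bit */
};

/* Model of the execve(2) rules for a non-root process, per capabilities(7). */
struct cap_sets caps_after_exec(struct cap_sets p, struct file_caps f)
{
	struct cap_sets n;
	n.ambient = f.privileged ? 0 : p.ambient;
	n.permitted = (p.inheritable & f.inheritable)
	            | (f.permitted & p.bounding)
	            | n.ambient;
	n.effective = f.effective ? n.permitted : n.ambient;
	n.inheritable = p.inheritable;
	n.bounding = p.bounding;
	return n;
}
```

With the wrapper holding CAP_SYS_TIME (bit 25) only in permitted and effective, and busybox carrying no file capabilities, every term of the permitted formula is zero. Once the bit is also in the ambient set, the ambient term carries it across the exec.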

When a process with "cap_sys_time=ep" starts, both its inheritable and ambient sets are empty, and one has to perform several system calls to populate them.
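
One way to check this is to look at the Cap* lines of /proc/self/status, which show each set as a hexadecimal mask. The following is a small debugging sketch; the helper names parse_cap_line and dump_caps are hypothetical, and it assumes /proc is mounted and the kernel is recent enough (4.3 or later) to report CapAmb.

```c
#include <stdio.h>
#include <string.h>

/* Parse one status line such as "CapAmb:\t0000000002000000".
 * Returns 1 and stores the mask if the line carries the given field, else 0. */
int parse_cap_line(const char *line, const char *field, unsigned long long *mask)
{
	size_t n = strlen(field);
	if (strncmp(line, field, n) != 0 || line[n] != ':')
		return 0;
	return sscanf(line + n + 1, "%llx", mask) == 1;
}

/* Print the capability masks of the calling process to stderr. */
void dump_caps(const char *tag)
{
	static const char *fields[] = { "CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb" };
	char line[256];
	FILE *f = fopen("/proc/self/status", "r");
	if (!f)
		return;
	while (fgets(line, sizeof line, f)) {
		for (size_t i = 0; i < sizeof fields / sizeof fields[0]; ++i) {
			unsigned long long mask;
			if (parse_cap_line(line, fields[i], &mask))
				fprintf(stderr, "%s: %s=%016llx\n", tag, fields[i], mask);
		}
	}
	fclose(f);
}
```

Calling dump_caps("before") at the top of the wrapper and dump_caps("after") right before exec(2) should make the difference between the sets visible.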

The first step is to make the inheritable set the same as the permitted one, for which we use the capget(2) and capset(2) system calls (error handling code is omitted for brevity).

#include <linux/capability.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <unistd.h>

// Two 32-bit words cover capability numbers 0..63; pid 0 means the calling thread.
struct __user_cap_header_struct header = { 0 };
struct __user_cap_data_struct body[2];
header.version = _LINUX_CAPABILITY_VERSION_3;

// Read the current sets, copy permitted into inheritable and effective, write back.
syscall(SYS_capget, &header, body);
body[0].inheritable = body[0].effective = body[0].permitted;
body[1].inheritable = body[1].effective = body[1].permitted;
syscall(SYS_capset, &header, body);

Setting capabilities in the ambient set is more involved: it takes one prctl(2) call per capability. There is no way to raise the ambient set as a whole (the only whole-set operation is PR_CAP_AMBIENT_CLEAR_ALL).

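// Capability numbers 0..63 live in two 32-bit words; i == 64 would index past body[1].
for (unsigned i = 0; i < 64; ++i) {
	unsigned ix = i / 32;
	unsigned shift = i % 32;
	// 1u avoids shifting a signed int into its sign bit when shift == 31.
	if (body[ix].permitted & (1u << shift)) {
		prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, i, 0, 0);
	}
}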
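
Since the loop above ignores the return value of prctl(2), it can be useful to ask the kernel afterwards whether a given capability really landed in the ambient set. A minimal sketch using PR_CAP_AMBIENT_IS_SET; the helper name ambient_has is made up for illustration.

```c
#include <sys/prctl.h>

/* Returns 1 if capability number cap is in the ambient set of the caller,
 * 0 if it is not, and -1 (with errno set) on error. */
int ambient_has(unsigned cap)
{
	return prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_IS_SET, (unsigned long)cap, 0UL, 0UL);
}
```

For the NTP wrapper, ambient_has(CAP_SYS_TIME) (with CAP_SYS_TIME coming from <linux/capability.h>) would be expected to return 1 right before exec(2).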

Only after that can you exec(2) and have the child process keep the escalated capabilities. There are probably reasons why things are this complicated, but it is quite unfortunate that file-assigned capabilities behave so differently from suid binaries.
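
Putting the steps together with the error handling that was omitted above, a wrapper could look roughly like this. It is a sketch, not the code I actually run: the function names are hypothetical, it leaves the effective set alone, and it gives up at the first failing system call.

```c
#include <errno.h>
#include <linux/capability.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Test bit number cap in the permitted set returned by capget(2). */
int cap_is_permitted(const struct __user_cap_data_struct body[2], unsigned cap)
{
	return (body[cap / 32].permitted & (1u << (cap % 32))) != 0;
}

/* Copy permitted into inheritable, then raise every permitted capability
 * into the ambient set. Returns 0 on success, -1 with errno set on failure. */
int promote_permitted_to_ambient(void)
{
	struct __user_cap_header_struct header = { _LINUX_CAPABILITY_VERSION_3, 0 };
	struct __user_cap_data_struct body[2];

	if (syscall(SYS_capget, &header, body) != 0)
		return -1;
	for (int ix = 0; ix < 2; ++ix)
		body[ix].inheritable = body[ix].permitted;
	if (syscall(SYS_capset, &header, body) != 0)
		return -1;

	for (unsigned cap = 0; cap < 64; ++cap) {
		if (!cap_is_permitted(body, cap))
			continue;
		if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE,
		          (unsigned long)cap, 0UL, 0UL) != 0)
			return -1;
	}
	return 0;
}

/* Run argv[0] with arguments argv[1..]; argv must be NULL-terminated. */
int run_with_ambient_caps(char *const argv[])
{
	if (promote_permitted_to_ambient() != 0) {
		perror("promote_permitted_to_ambient");
		return 1;
	}
	execvp(argv[0], argv);
	perror("execvp"); /* only reached if exec failed */
	return 127;
}
```

A main() for the wrapper would then be little more than "return run_with_ambient_caps(argv + 1);" after checking that argc is at least 2.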

Update (2022-03-07): After a careful reading of the capabilities(7) manual page I realized that all these complex transformation rules make sense, but only for applications that juggle their capabilities. Too invasive to get much traction, if you ask me.