How to set a Linux capability and not allocate

Unix systems have had the concept of suid binaries since the beginning of time. The idea is that no matter which user runs the executable, the process gets the effective uid of the executable's owner, often root. This is usually used to allow non-privileged users to perform privileged operations in a controlled manner.

Binaries with the suid flag run in a very dangerous environment: file descriptors, environment variables, the current directory, the argv array and other things are under the control of a potentially malicious user. Should that user fool the executable into doing something it is not supposed to do, that something will be done with superuser privileges.

To limit the potential exposure, the concept of capabilities(7) was introduced. Instead of marking a binary as suid and granting it the full power of the superuser, we grant it only the capabilities it needs to do its job. For example, instead of running an NTP client as root, we can run it as an unprivileged user but grant CAP_SYS_TIME on the binary, so should an attacker take over the process, they will have only limited power.

On the command line we can set a capability with the setcap(8) tool, which is usually provided as part of the "libcap" package (https://git.kernel.org/pub/scm/linux/kernel/git/morgan/libcap.git/), using the following funny syntax:

$ setcap 'all= cap_sys_time=ep' ./a.out

You can read more about the effective, permitted and inheritable sets in capabilities(7), but the idea is that the "all=" part clears (lowers) all capabilities and "cap_sys_time=ep" grants (raises) the cap_sys_time capability in the effective and permitted sets.

Now, what if I want to set a capability on a file programmatically? The most natural idea, reading the documentation of libcap, leads to the discovery of the following functions:

int cap_set_file(const char *path_p, cap_t cap_p);
int cap_set_flag(cap_t cap_p, cap_flag_t flag, int ncap, const cap_value_t *caps, cap_flag_value_t value);
cap_t cap_init(void);
int cap_free(void *obj_d);
cap_t cap_from_text(const char *buf_p);

Quite a bit for the simple task of setting a bitmask. Furthermore, the "cap_t" type is opaque, and the only way to create it involves memory allocation. No, no, no, this is madness, there must be another way.

If we dig deeper, closer to the kernel, we learn that capabilities are stored as a fixed-size structure in the "security.capability" extended attribute, with the following definition (see "include/uapi/linux/capability.h" in the Linux kernel tree):

#define VFS_CAP_U32 2
#define VFS_CAP_REVISION_2 0x02000000
#define VFS_CAP_FLAGS_EFFECTIVE 0x000001
struct vfs_cap_data {
	__le32 magic_etc;            /* Little endian */
	struct {
		__le32 permitted;    /* Little endian */
		__le32 inheritable;  /* Little endian */
	} data[VFS_CAP_U32];
};
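
To make the layout concrete, here is a minimal sketch of decoding such a value once its bytes have been read (for example with getxattr(2)). The names "file_caps" and "file_caps_decode" are hypothetical, made up for this illustration, and the sketch assumes a revision-2 value exactly as defined above; anything else is rejected.

```c
#define _DEFAULT_SOURCE          /* for le32toh() on glibc */
#include <endian.h>
#include <linux/capability.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded view of a revision-2 "security.capability" value. */
struct file_caps {
	uint64_t permitted;
	uint64_t inheritable;
	int effective;           /* 1 if VFS_CAP_FLAGS_EFFECTIVE is set */
};

/* Decode raw xattr bytes. Returns 0 on success, -1 if the value does not
 * have the size and revision of the structure shown above. */
static int file_caps_decode(const void *buf, size_t len, struct file_caps *out)
{
	struct vfs_cap_data raw;
	uint32_t magic;

	if (len != sizeof raw)
		return -1;
	memcpy(&raw, buf, sizeof raw);   /* no alignment assumptions on buf */

	magic = le32toh(raw.magic_etc);
	if ((magic & VFS_CAP_REVISION_MASK) != VFS_CAP_REVISION_2)
		return -1;

	out->effective = (magic & VFS_CAP_FLAGS_EFFECTIVE) != 0;
	/* data[0] holds capabilities 0..31, data[1] holds 32..63. */
	out->permitted = (uint64_t)le32toh(raw.data[0].permitted)
	               | ((uint64_t)le32toh(raw.data[1].permitted) << 32);
	out->inheritable = (uint64_t)le32toh(raw.data[0].inheritable)
	                 | ((uint64_t)le32toh(raw.data[1].inheritable) << 32);
	return 0;
}
```

No allocation is needed on this path either: the caller passes a stack buffer of sizeof(struct vfs_cap_data) bytes to getxattr(2) and hands the result to the decoder.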

Knowing that, we can set file capabilities using a much simpler interface. And since we are using the kernel interface, we can be sure that it is rock-solid.

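#include <linux/capability.h>
#include <sys/xattr.h>
#include <endian.h>
#include <stdio.h>

int main(void) {
	struct vfs_cap_data conf = { 0 };
	int err;

	/* Revision 2 layout; "effective" is a single flag for the whole file. */
	conf.magic_etc = htole32(VFS_CAP_REVISION_2 | VFS_CAP_FLAGS_EFFECTIVE);
	/* CAP_SYS_TIME is below 32, so its bit lives in data[0]. */
	conf.data[0].permitted = htole32(1u << CAP_SYS_TIME);

	/* Writing this attribute requires CAP_SETFCAP (typically root). */
	err = setxattr("test.txt", "security.capability", &conf, sizeof conf, 0);
	if (err) {
		perror("setxattr failed");
		return 1;
	}
	return 0;
}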
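
The program above only touches data[0], which holds capabilities 0 to 31; higher-numbered ones, such as CAP_SYSLOG (34), belong in data[1]. Here is a minimal sketch of a helper that picks the right word. The name "vfs_cap_permit" is hypothetical, it is not part of the kernel headers or of any library.

```c
#define _DEFAULT_SOURCE          /* for htole32()/le32toh() on glibc */
#include <endian.h>
#include <linux/capability.h>
#include <stdint.h>

/* Hypothetical helper: raise capability number `cap` in the permitted set
 * of `conf`. Returns 0 on success, -1 if `cap` does not fit the structure. */
static int vfs_cap_permit(struct vfs_cap_data *conf, unsigned cap)
{
	unsigned word = cap / 32;        /* which element of data[] */
	uint32_t bits;

	if (word >= VFS_CAP_U32)
		return -1;

	/* Fields are stored little-endian, so convert on the way in and out. */
	bits = le32toh(conf->data[word].permitted);
	bits |= UINT32_C(1) << (cap % 32);
	conf->data[word].permitted = htole32(bits);
	return 0;
}
```

With it, the assignment to conf.data[0].permitted in main becomes vfs_cap_permit(&conf, CAP_SYS_TIME), and the same call works for capabilities above 31.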

By eliminating the abstraction layer and going straight to the binary format, we found an efficient and elegant solution. The reason why libcap provides such a poor interface can also be found in capabilities(7):

No standards govern capabilities, but the Linux capability
implementation is based on the withdrawn POSIX.1e draft standard; see
⟨https://archive.org/details/posix_1003.1e-990310⟩.

And indeed, on page 196 there is a declaration of the "cap_init" function:

cap_t cap_init (void);

As I discussed in another post, "A case against dynamic linking", such an initialization style forces sub-optimal implementations.

In general, I agree that standards, portability and inter-operability are good things (https://drewdevault.com/2017/11/13/Portability-matters.html), and I think capabilities are a great idea, but this is yet another case where I have to choose between an inefficient, roundabout POSIX interface and an efficient, to-the-point Linux interface.

And hey, designing a decent interface is not hard. Extended attributes are definitely an implementation detail of capabilities, but capabilities are inherently isomorphic to a bit mask, so we could have had the following interface instead:

#define CAP_FOO   0  /* set 0th bit in 0th byte */
#define CAP_BAR   1
...
#define CAP_BAZ  63 /* set 7th bit in 7th byte */
#define CAP_FROB 64 /* set 0th bit in 8th byte */
...
#define CAP_MAD  99

int capability_fd_set(int fd, const uint8_t *p, size_t plen,
                              const uint8_t *e, size_t elen,
                              const uint8_t *i, size_t ilen);
int capability_fd_get(int fd, uint8_t *p, size_t plen,
                              uint8_t *e, size_t elen,
                              uint8_t *i, size_t ilen);

This way we expose nothing about how exactly capabilities are stored, have no arbitrary limit on the number of capabilities, and preserve source and binary compatibility with existing applications when new capabilities are added.
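
The bit numbering in the comments above (capability N is bit N % 8 of byte N / 8) can be sketched with two tiny helpers. Both names, "cap_mask_set" and "cap_mask_test", are hypothetical and exist only to illustrate how a caller of the proposed interface would fill and inspect the byte arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: raise capability number `cap` in a byte-array mask
 * of `len` bytes. Returns 0 on success, -1 if the mask is too short. */
static int cap_mask_set(uint8_t *mask, size_t len, unsigned cap)
{
	size_t byte = cap / 8;

	if (byte >= len)
		return -1;
	mask[byte] |= (uint8_t)(1u << (cap % 8));
	return 0;
}

/* Hypothetical helper: return 1 if capability `cap` is raised in the mask.
 * Bits beyond the end of the mask read as lowered, so a short mask written
 * by an old application stays valid when new capabilities are added. */
static int cap_mask_test(const uint8_t *mask, size_t len, unsigned cap)
{
	size_t byte = cap / 8;

	if (byte >= len)
		return 0;
	return (mask[byte] >> (cap % 8)) & 1;
}
```

A caller would keep the mask on the stack, for example uint8_t p[13] = { 0 } to cover capabilities 0 to 103, raise bits with cap_mask_set and pass the array with its length to capability_fd_set; no allocation is involved.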

Why can't the people who write all these nice standards papers think about implementations?!