For a long time, I took for granted that POSIX is super-important since it allows my programs to run on different operating systems and hardware configurations. Probably this article by Drew DeVault summarizes this pro-POSIX point of view perfectly.
https://drewdevault.com/2017/11/13/Portability-matters.html
A week ago, I got a reason to question this belief.
I was reading the documentation for the s6 supervision suite, trying to decide whether I like it more than runit, which I use right now. There I found a statement that one particular program does not use dynamic memory, with a note in fine print that it calls the readdir(3) function, which does use dynamic memory.
https://skarnet.org/software/s6/
I did some quick checks with strace(1) and verified that this is true: using the readdir(3) function results in brk(2) syscalls. If we look at the function interface, it becomes quite clear that there is no way to implement it without either dynamic memory or arbitrary limits:
struct DIR; /* opaque */
DIR *opendir(const char *name);
struct dirent *readdir(DIR *dirp);
int closedir(DIR *dirp);
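For reference, here is a minimal sketch of how this interface is typically used. The function name posix_count_entries is made up for illustration. Running a program that calls it under strace(1) is one way to repeat the check described above; exactly which syscalls show up depends on the libc.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: count the entries of a directory through the
 * POSIX interface. Returns -1 on error. */
long posix_count_entries(const char *path)
{
    assert(path != NULL);

    /* The caller passes no buffer, so the library has to obtain the
     * DIR object and its read buffer by itself. */
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long total = 0;
    errno = 0; /* readdir() returns NULL both at the end and on error */
    while (readdir(dir) != NULL)
        total++;
    int saved = errno;

    closedir(dir);
    return saved != 0 ? -1 : total;
}
```

Note that nothing in the three calls lets the caller say where the memory comes from or how large the read buffer should be.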
Unfortunately, POSIX provides no better interface. But can we do better if we abandon POSIX and go straight to Linux syscalls?
Linux exposes the content of a directory with the getdents64(2) syscall (there are also readdir(2) and getdents(2), but they are strictly worse). This syscall expects the user to provide a buffer and populates it with as many structures representing directory entries as can fit into that buffer. This interface lets the user balance memory usage against the number of system calls, depending on the program's needs.
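To make the trade-off concrete, here is a rough sketch that walks a directory with raw getdents64(2) calls into a caller-supplied buffer and reports how many syscalls it took. The names raw_dirent64 and count_entries are made up for illustration; the record layout follows the getdents64(2) manual page. It assumes Linux and a buffer aligned to 8 bytes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* One record as the kernel writes it (layout from getdents64(2)).
 * The struct name is hypothetical. */
struct raw_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen; /* size of this record in bytes */
    unsigned char  d_type;
    char           d_name[]; /* null-terminated file name */
};

/* Count the entries of path using only the caller's buffer.
 * Stores the number of getdents64 calls in *ncalls.
 * Returns -1 on error, for example when the buffer is too small. */
long count_entries(const char *path, void *buf, size_t bufsize, long *ncalls)
{
    assert((uintptr_t)buf % 8 == 0); /* records hold 64-bit fields */

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    long total = 0;
    *ncalls = 0;
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, bufsize);
        ++*ncalls;
        if (n <= 0) { /* 0 means end of directory, negative means error */
            close(fd);
            return n < 0 ? -1 : total;
        }
        /* Step through the variable-length records in the buffer. */
        for (long pos = 0; pos < n; ) {
            struct raw_dirent64 *e =
                (struct raw_dirent64 *)((char *)buf + pos);
            total++;
            pos += e->d_reclen;
        }
    }
}
```

Calling it twice on the same directory, once with a 512-byte buffer and once with a 64 KiB one, should give the same entry count, and the larger buffer should never need more calls than the smaller one.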
For example, ls(1) is very slow in huge directories not because of inherent limitations of the kernel, but because readdir(3) uses a buffer of hard-coded size for its getdents(2) calls, and ls(1) can do no better while constrained to the POSIX interface.
Here is my take on how to design a better interface for enumerating the contents of a directory (full source is not published yet):
/* User is not supposed to touch fields with double underscores */
struct hpl_direntry {
    // 64-bit inode number
    ino64_t inode;
    // 64-bit offset to next structure
    off64_t __off;
    // Size of this dirent
    unsigned short __reclen;
    // File type
    unsigned char type;
    // Filename (null-terminated)
    char name[];
};

struct hpl_dir {
    int fd;
    void *buf;
    size_t bufsize;
    /* Bookkeeping stuff is omitted. Initialize it to zeros */
};

/* Return NULL on the end of the directory or when the error happened.
 * Check *errp to distinguish.
 *
 * If *errp = EINVAL, it means that buffer is too small for some entry
 * to fit. Assign bigger buffer to {buf} and {bufsize} fields, and try
 * again.
 */
struct hpl_direntry *hpl_readdir(struct hpl_dir *d, int *errp);
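Since the full source is not published, what follows is only a guess at how such a function could be built on top of getdents64(2), not the actual implementation. The bookkeeping fields pos and len and the helper hpl_count are hypothetical, and fixed-width integer types stand in for ino64_t and off64_t. It relies on struct hpl_direntry having the same layout as the record the kernel writes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

struct hpl_direntry {
    uint64_t       inode;    /* stand-in for ino64_t */
    int64_t        __off;    /* stand-in for off64_t */
    unsigned short __reclen;
    unsigned char  type;
    char           name[];
};

struct hpl_dir {
    int    fd;
    void  *buf;
    size_t bufsize;
    /* Hypothetical bookkeeping, zero-initialized by the caller. */
    size_t pos; /* offset of the next unread record in buf */
    size_t len; /* number of valid bytes in buf */
};

struct hpl_direntry *hpl_readdir(struct hpl_dir *d, int *errp)
{
    assert(d != NULL && errp != NULL);
    *errp = 0;
    if (d->pos >= d->len) { /* buffer exhausted: refill it */
        long n = syscall(SYS_getdents64, d->fd, d->buf, d->bufsize);
        if (n < 0) { /* includes EINVAL for a too-small buffer */
            *errp = errno;
            return NULL;
        }
        if (n == 0) /* end of directory, *errp stays 0 */
            return NULL;
        d->pos = 0;
        d->len = (size_t)n;
    }
    struct hpl_direntry *e =
        (struct hpl_direntry *)((char *)d->buf + d->pos);
    d->pos += e->__reclen;
    return e;
}

/* Hypothetical usage: count entries with a caller-owned buffer.
 * Returns -1 on error. */
long hpl_count(const char *path, void *buf, size_t bufsize)
{
    struct hpl_dir d = { .fd = open(path, O_RDONLY | O_DIRECTORY),
                         .buf = buf, .bufsize = bufsize };
    if (d.fd < 0)
        return -1;
    long total = 0;
    int err;
    while (hpl_readdir(&d, &err) != NULL)
        total++;
    close(d.fd);
    return err != 0 ? -1 : total;
}
```

In this sketch the library never allocates anything: the file descriptor and the buffer both belong to the caller, who opens and closes the former and sizes the latter.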
With this interface, the user is in charge of memory allocations and file descriptors, so only one function is needed, down from three. I think my interface is unequivocally closer to the lodestar of minimalistic C interfaces.
https://nullprogram.com/blog/2018/06/10/
Sticking to POSIX means your program will run on different software and hardware platforms at the price of sub-optimal interfaces that induce sub-optimal implementations. Is it worth it for you?
Not for me. I have never had access to a real computer (mobile phones do not count) with an architecture other than x86 or x86_64, and I have never used any POSIX-compatible operating system other than Linux for real work.
I do not expect this to change soon, and I do not want it to change. While I hate what is happening to the GNU/Linux ecosystem right now -- systemd, cater-to-idiots philosophy, web interfaces everywhere -- I am satisfied with Linux proper and my x86_64 system.