💾 Archived View for thrig.me › blog › 2023 › 09 › 27 › pidnull.gmi captured on 2024-07-09 at 01:11:04. Gemini links have been rewritten to link to archived content

pidnull

One mental model of a process on unix is something that can be forked: the result of executing a command like cat, ./pids, doas, chmod, or sed. This is not necessarily a bad model, though it can run into problems when process tools report Process Identifiers (PID) alien to that model.

    $ cat pids
    #!/usr/bin/env python3
    import psutil
    for pid in psutil.pids():
        try:
            p = psutil.Process(pid)
            with p.oneshot():
                print(f"{pid} {p.name()}")
        except psutil.NoSuchProcess:
            # the process exited between pids() and the lookup
            continue
    $ doas pkg_add py3-psutil
    ...
    $ chmod +x pids
    $ ./pids | sed 4q
    0 swapper
    1 init
    6910 getty
    7941 cwm

What the heck is PID zero? Or, what is a PID?

The kernel has a structure or record for each process (struct kinfo_proc). This record contains details for each process running on a system: the process ID, who owns the process, and other such metadata. Nothing prevents the kernel from putting kernel-internal code onto the process list. There may be various reasons to do so: prioritization comes to mind, so that userland and kernel code can be scheduled appropriately. Users of tools or abstractions built on top of the kernel interface therefore need to be aware that kernel-internal processes can appear in listings. Buyer beware?

    $      ps xH | perl -lane 'print if $F[0] == 0'
      PID     TID TT  STAT        TIME COMMAND
    $ doas ps xH | awk '$1 == "PID"{print}$1 == 0{print}'
      PID     TID TT  STAT        TIME COMMAND
        0  100000 ??  DK       0:02.97 (swapper)
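
The same filter can be sketched in Python, in case the listing is already being handled by a script. This is only an illustration: the function name rows_for_pid is made up here, and the sample text reuses the two ps(1) lines above plus one invented row for PID 1 so that there is something to filter out.

```python
def rows_for_pid(ps_output, pid=0):
    """Return the header line plus any rows whose first column equals pid."""
    wanted = str(pid)
    kept = []
    for line in ps_output.splitlines():
        fields = line.split()
        # keep the header (first column "PID") and rows for the wanted PID
        if fields and fields[0] in ("PID", wanted):
            kept.append(line)
    return kept


SAMPLE = (
    "  PID     TID TT  STAT        TIME COMMAND\n"
    "    0  100000 ??  DK       0:02.97 (swapper)\n"
    "    1  200000 ??  I        0:01.00 /sbin/init\n"  # invented row, not real output
)

if __name__ == "__main__":
    for row in rows_for_pid(SAMPLE):
        print(row)
```

Comparing the first column as a string keeps the header test and the PID test in one place, much as the awk version does with two patterns.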

Low Level Code

A tool like ps(1) or a library for some programming language may not give you the slice you want. An alternative is to use the kvm.h (or similar) interface. Advantages here are speed and getting exactly what you want; disadvantages include code that is not portable, and time wasted learning the low-level interface. Still, this may be less bad than debugging some scripting language library that maybe in turn calls ps(1), maybe does not get the information you need, and maybe drains the battery too much.

One method is to look at the source code for ps(1) and adapt that to your needs, as well as other tools that use the kernel interface, notably the source code for sysctl(8), if you need access to ACPI or other such kernel or system information. For example, the j.c in the following repository uses a low-level interface to find the TTY associated with various PIDs so that vi(1) or other processes can be associated with the tmux terminal they are running in, in the event you forgot where you put some vi(1) instance in all your tmux windows.

https://thrig.me/src/scripts.git
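
For comparison, the idea of grouping processes by TTY can be sketched at the ps(1) level in Python. This is not how j.c works (that uses the low-level interface); it is a rough stand-in that parses text with PID, TT, and COMMAND columns, perhaps from something like ps -ax -o pid,tt,command. The function names and every row of the sample listing are invented for illustration.

```python
from collections import defaultdict


def pids_by_tty(ps_output):
    """Group (pid, command) pairs by TTY from 'PID TT COMMAND' style text."""
    groups = defaultdict(list)
    for line in ps_output.splitlines()[1:]:  # first line is assumed to be the header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, tty, command = parts
        if tty == "??":  # no controlling terminal
            continue
        groups[tty].append((int(pid), command))
    return dict(groups)


def find_command(ps_output, name):
    """Return (tty, pid) pairs for commands whose first word is name."""
    hits = []
    for tty, procs in sorted(pids_by_tty(ps_output).items()):
        for pid, command in procs:
            if command.split()[0] == name:
                hits.append((tty, pid))
    return hits


# Invented listing; real output will differ.
SAMPLE = """\
  PID TT  COMMAND
 7941 ??  cwm
 1201 p0  -ksh
 1377 p0  vi notes.txt
 1402 p1  -ksh
 1519 p2  vi todo
"""

if __name__ == "__main__":
    for tty, pid in find_command(SAMPLE, "vi"):
        print(tty, pid)
```

Matching those TTY names against what tmux reports for each pane is left out here.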

Race Conditions

Another fun problem is that the list of processes changes in the kernel over time. This can lead to various race conditions or possibly security issues, like say if a program can fork itself when it notices someone starting a process listing iteration so that the new PID is missed by that process listing. Or maybe a process can play games to limit or conceal heavy resource use: use lots of CPU, but not have that usage surface in top(1). Various Common Vulnerabilities and Exposures (CVE) entries can probably be found for such.
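
A small Python sketch of the simplest such race: a PID shows up in a snapshot and is gone by the time anyone looks at it. The helper names pid_exists and vanishing_child are made up here, and the "listing" is just a one-element list standing in for a real process listing. Note also that kill(2) gives PID zero a special meaning (the caller's process group), so the helper refuses anything not strictly positive.

```python
import os
import subprocess
import sys


def pid_exists(pid):
    """Best-effort check; the answer can be stale the moment it is returned."""
    if pid <= 0:
        raise ValueError("kill(2) treats pid <= 0 as a group or broadcast target")
    try:
        os.kill(pid, 0)  # signal 0: only error checking is done, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to someone else
    return True


def vanishing_child():
    """Snapshot a child's PID, let the child exit, then look the PID up again."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    listed = [child.pid]  # stand-in for a process listing snapshot
    child.wait()          # the child exits and is reaped
    return [(pid, pid_exists(pid)) for pid in listed]


if __name__ == "__main__":
    for pid, alive in vanishing_child():
        print(pid, "still there" if alive else "gone")
```

The lookup is itself racy: in principle the kernel could hand the same PID to a new process before pid_exists runs, which is the kind of thing that makes these problems fun.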

tags #openbsd #process #unix