💾 Archived View for thrig.me › blog › 2023 › 09 › 28 › busy-stuck-process.gmi captured on 2023-12-28 at 15:50:43. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Can a stuck process use large amounts of CPU?

Obviously, yes, if I'm asking the question. The typical answer is "no", because if a program is stuck it cannot be doing anything, because it is stuck. Logic! So let's first devise a program that "gets stuck". One way is to block on a read from a pipe that nothing ever writes to; there are other ways this can happen.

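    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        char buf[1];
        int fds[2];
        if (pipe(fds) != 0) err(1, "pipe");
        /* Nothing ever writes to fds[1], so each read(2) blocks forever. */
        while (1) {
            if (read(fds[0], buf, 1) == -1) err(1, "read");
            /* Only reached if a byte somehow arrives. */
            fprintf(stderr, "not stuck\n");
        }
    }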

stuck.c

    $ make stuck && ./stuck
    cc -O2 -pipe    -o stuck stuck.c
    ...

Elsewhere, top(1) might report something along the lines of:

    $ top -d 1 all| awk '/PID/{print}/stuck/{print}'
      PID USERNAME PRI NICE SIZE  RES STATE   WAIT TIME   CPU COMMAND
    38172 jmates   -6  0    172K 856K idle  piperd 0:00 0.00% stuck

Is this process stuck stuck? WAIT shows "piperd", and STATE is idle. It is certainly not getting through the while loop (unless the "printf debugging" to stderr is broken, which is unlikely but not impossible, and easy to rule out by removing the blocking read). A process that never gets through its main loop could be considered stuck. Now we need a process that does work while also being stuck.

stuckbusy

stuckbusy.c

    $ CFLAGS=-pthread make stuckbusy && ./stuckbusy
    cc -pthread   -o stuckbusy stuckbusy.c
    ...

So this is a strange process according to top(1): it is waiting on "piperd", will usually show as being in the "idle" state, but will sometimes show CPU usage, and accumulates CPU time as it runs. (Process tools on other Unix systems may show different things.) The columns have been collapsed a bit to better fit an 80-column display.

    $ top -d 1 all| sed -n '6p;6q'
      PID USERNAME PRI NICE SIZE RES STATE  WAIT    TIME  CPU   COMMAND
    $ for i in `jot 4`; do top -d 1 all| grep stuck; sleep 1; done
    45647 jmates  -6  0 2604K 1532K idle    piperd  0:02  0.05% stuckbusy
    45647 jmates  -6  0 2604K 1532K idle    piperd  0:02  0.05% stuckbusy
    45647 jmates  -6  0 2604K 1532K idle    piperd  0:02  0.00% stuckbusy
    45647 jmates  -6  0 2604K 1532K idle    piperd  0:02  0.10% stuckbusy

Threads obviously complicate matters, as one part of a program might be stuck or wedged while other parts continue to work. top(1) and similar tools generally report wafer-thin snapshots of the process state, and so can show contradictory or confusing information for such processes. More detail may be obtained via process tracing (ktrace, strace, etc.), which can be hilariously verbose but will at least show which system calls are being made. Or you may need other forms of debugging, such as taking thread dumps in Java.

    $ kdump -f ktrace.out | sed 22q
      45647 stuckbusy RET   nanosleep 0
      45647 stuckbusy CALL  nanosleep(0x2b873a32018,0)
      45647 stuckbusy STRU  struct timespec { 0.000099999 }
      45647 stuckbusy RET   nanosleep 0
      45647 stuckbusy RET   nanosleep 0
      45647 stuckbusy RET   nanosleep 0
      45647 stuckbusy CALL  nanosleep(0x2b841317a78,0)
      45647 stuckbusy CALL  nanosleep(0x2b8a49f8868,0)
      45647 stuckbusy CALL  nanosleep(0x2b917a8ed98,0)
      45647 stuckbusy STRU  struct timespec { 0.000099999 }
      45647 stuckbusy STRU  struct timespec { 0.000099999 }
      45647 stuckbusy STRU  struct timespec { 0.000099999 }
      45647 stuckbusy RET   nanosleep 0
      45647 stuckbusy RET   nanosleep 0
      45647 stuckbusy RET   nanosleep 0
      45647 stuckbusy RET   nanosleep 0
      45647 stuckbusy CALL  nanosleep(0x2b873a32018,0)
      45647 stuckbusy CALL  nanosleep(0x2b8a49f8868,0)
      45647 stuckbusy CALL  nanosleep(0x2b917a8ed98,0)
      45647 stuckbusy CALL  nanosleep(0x2b841317a78,0)
      45647 stuckbusy STRU  struct timespec { 0.000099999 }
      45647 stuckbusy STRU  struct timespec { 0.000099999 }
      ...

Wacky states like this do occur in production, where there may be a daemon that accepts connections but does not get any work done, or one that uses blistering amounts of CPU but is stuck according to some metric. Debugging may take longer if folks roll to disbelieve the metrics ("a stuck process that uses CPU? that's unpossible!") instead of digging in to figure out what exactly is going on.

So can a stuck process use large amounts of CPU?

stuckverybusy.c

      PID USERNAME PRI NICE SIZE   RES STATE    WAIT    TIME    CPU COMMAND
    84474 jmates    10    0  17M 1740K onproc/9 nanoslp 2:25 519.58% stuckveryb

WAIT says that it is sleeping, but it is also using a bit of CPU. It might be interesting to investigate how the process tools on other Unix systems report such wacky conditions. Or, what other strange process states can you create?