💾 Archived View for thrig.me › tech › exit-status-word.gmi captured on 2024-09-29 at 01:02:24. Gemini links have been rewritten to link to archived content


Exit Status Word

The exit code of a unix process is held in the shell $? variable;

    $ false ; echo $?
    1
    $ true  ; echo $?
    0

this may lead some to claim that Perl is weird because it needs a >> 8 shift applied to obtain the same value as the shell:

    $ perl -E 'qx(false); say $? >> 8'
    1
    $ perl -E 'qx(true);  say $? >> 8'
    0

Actually, the >> 8 shift is perfectly normal; it is the shell that is weird for mangling the exit status word into a different form. The main confusion is that Perl and the shell use the same variable name: in Perl, $? holds the exit status word, not the shell-mangled value.

What is the exit status word? Simply put, it is a 16-bit value, something like JSON but scaled back to systems in the 1970s, where a 16-bit register was a rare commodity, especially after something complicated like fork(2). Within this 16-bit value various bits or ranges of bits carry different meanings, which can be checked by a shift or a mask of the bit pattern. Often there will be macros or such to abstract away the details.

The >> 8 is simply a right shift by eight bits, which means the exit code occupies the upper eight bits of the exit status word:

    $ perl -e 'qx(false); printf "word %016b\n>> 8 %016b\n", $?, $? >> 8'
    word 0000000100000000
    >> 8 0000000000000001

This may be easier to see if a particular pattern is put into the upper bits and shifted over:

    $ perl -e '$n=0b1010101011111111; printf "%016b\n%016b\n", $n, $n >> 8'
    1010101011111111
    0000000010101010
    $ perl -e 'exit 0b10101010' ; echo $?
    170
    $ perl -e 'printf "%08b\n", 170'
    10101010

The 16-bit exit status word can be defined as:

         1         0
    5432109876543210
    eeeeeeee????????

where the eeeeeeee get filled in with the exit code. The ? are as yet unknown. Here's some C code that does much the same as the Perl, only with more characters:

status-word.c

    $ make status-word
    cc -O2 -pipe    -o status-word status-word.c
    $ ./status-word
    256 1
    $ perl -e 'printf "%016b\n%016b\n", 256, 1'
    0000000100000000
    0000000000000001
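The linked status-word.c is not reproduced here, so what follows is only a guess at the general shape of such a program, not the author's code. The helper name child_status_word is made up for illustration, and the child exits directly rather than running false(1).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the author's status-word.c): fork a child
 * that exits with `code`, wait for it, and return the raw status word
 * that waitpid(2) filled in. Returns -1 if fork or waitpid fails. */
int child_status_word(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: leave at once with the code */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;              /* parent: the unshifted status word */
}
```

Printing child_status_word(1) next to child_status_word(1) >> 8 should give the same "256 1" pair as above, assuming the usual layout of the status word.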

For the ? that remain we now devise test programs that exit in novel ways; what happens if we signal a process?

    $ perl -e 'kill SIGTERM => $$' ; echo $?
    Terminated
    143
    $ perl -e 'printf "%08b\n", 143'
    10001111
    $ perl -E 'say 0b10000000'
    128
    $ perl -E 'say 0b1111'
    15

The shell shows 143 for a SIGTERM. A big clue is to know the number associated with a SIGTERM:

    $ kill -l | grep TERM
    15   TERM Terminated                    31   USR2 User defined signal 2
    $ grep SIGTERM /usr/include/signal.h
    $ grep SIGTERM /usr/include/sys/signal.h
    #define SIGTERM 15      /* software termination signal from kill */

From this we might guess that the shell has mangled the signal into 128 plus the signal number. But what did the exit status word contain before the shell mangled it?

    $ perl -e 'qx(perl -E "kill SIGTERM => \$\$"); printf "%016b\n", $?'
    1000111100000000

Whoops, that is a terrible test. qx/STRING/ (or `STRING`) in perl runs the command through a shell, so that's the same shell mangled value as seen in our prior test.

signally

Here a process forks, the parent waits and prints the binary form of the exit status word, and the child hits itself with SIGTERM. No /bin/sh involved.

    $ ./signally
    5432109876543210
    0000000000001111
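The signally source is only linked, not shown. A minimal sketch of the same idea might look like the following; the function name sigterm_status_word is an assumption for illustration, and the details of the real program may differ.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the author's signally): fork a child that
 * signals itself with SIGTERM, and return the raw status word the
 * parent collects. Returns -1 if fork or waitpid fails. */
int sigterm_status_word(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        signal(SIGTERM, SIG_DFL);   /* undo any inherited "ignore" */
        raise(SIGTERM);             /* child hits itself with SIGTERM */
        _exit(127);                 /* only reached if the signal was blocked */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;                  /* no shell has touched this value */
}
```

The low bits of the returned word should hold 15, matching the binary output above.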

A signal sets some number of the lower bits, as opposed to the upper eight for a non-signal exit. How many bits are reserved for signals might be good to know; the macros are probably listed in wait(2) and can be looked up from the standard include files.

    WIFSIGNALED(status)
            True if the process terminated due to receipt of a signal.
    ...
    WTERMSIG(status)
            If WIFSIGNALED(status) is true, evaluates to the number of the
            signal that caused the termination of the process.
    $ grep -rl WTERMSIG /usr/include
    /usr/include/sys/wait.h
    $ grep WTERMSIG /usr/include/sys/wait.h
    #define WTERMSIG(x)     (_WSTATUS(x))
    $ grep _WSTATUS /usr/include/sys/wait.h
    #define _WSTATUS(x)     ((x) & 0177)
    #define _WSTOPPED       0177            /* _WSTATUS if process is stopped */
    #define WIFSIGNALED(x)  (_WSTATUS(x) != _WSTOPPED && _WSTATUS(x) != 0)
    #define WTERMSIG(x)     (_WSTATUS(x))
    #define WIFEXITED(x)    (_WSTATUS(x) == 0)

For those bad at math (myself included) the ((x) & 0177) works out to a mask using the lowest

    $ perl -e 'printf "%016b\n", 0177'
    0000000001111111

seven bits, which means the 16-bit exit status word can be better defined as:

         1         0
    5432109876543210
    eeeeeeee?sssssss

where the "e" are for the (non-signal) exit code, if that exists, and "s" hold the signal number, if any. There remains one ? mystery bit unaccounted for. This is the coredump flag, so we must trigger one of those.

    $ perl -e CORE::dump
    Abort trap (core dumped)
    $ echo $?
    134
    $ dc
    134 128 -pq
    6
    $ kill -l | grep 6 | sed 1q
     6   ABRT Abort trap                    22   TTOU Stopped (tty output)

The shell may be mangling this value, so we also should inspect the actual exit status word:

    $ perl -e 'fork ? do { wait; printf "%016b\n", $? } : CORE::dump'
    0000000010000110
    $ rm perl.core
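Comparing 134 from the shell with the word seen by wait, the mangling can be sketched in C with the wait(2) macros. This is a guess at what the shells tested here appear to do, not code taken from any shell; shell_status is a made-up name, and other shells may use a different offset.

```c
#include <sys/wait.h>

/* Hypothetical helper: map a raw status word to the value the shells
 * in this article appear to store in $?. Stopped or continued
 * children are not handled. */
int shell_status(int word)
{
    if (WIFEXITED(word))
        return WEXITSTATUS(word);       /* the upper eight bits */
    if (WIFSIGNALED(word))
        return 128 + WTERMSIG(word);    /* the core flag is not carried over */
    return -1;                          /* stopped or otherwise unhandled */
}
```

With this, shell_status(0x86) gives 134 and shell_status(15) gives 143, the two values the shell reported earlier.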

The mystery bit is therefore a boolean that indicates whether the process dumped core; the lower seven bits still carry the signal involved, here 6 (ABRT). Therefore the complete definition for the exit status word is 8 bits for the exit status, if any, a boolean flag that indicates whether a coredump happened, and 7 bits for the signal number, if any.

         1         0
    5432109876543210
    eeeeeeeeCsssssss
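The layout above can be decoded by hand with shifts and masks. The struct and function names below are invented for illustration, and the bit positions are the ones worked out in this article; portable code should probably prefer the macros from <sys/wait.h>.

```c
#include <stdbool.h>

/* Hypothetical decoded form of the eeeeeeeeCsssssss layout. */
struct status_word {
    int  exit_code;     /* e: upper eight bits */
    bool core_dumped;   /* C: bit 7 */
    int  signo;         /* s: lower seven bits */
};

/* Split a raw status word into its three fields, assuming the layout
 * shown above. */
struct status_word decode_status_word(int word)
{
    struct status_word d;

    d.exit_code   = (word >> 8) & 0xff;
    d.core_dumped = (word & 0200) != 0;
    d.signo       = word & 0177;
    return d;
}
```

Feeding it the coredump word from earlier (binary 10000110) should yield signal 6 with the core flag set and an exit code of zero.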

But What Good Is All This For?

Good question! Many programs can get away with 0 == OK, otherwise NOT OK checks. Other programs (monitoring and unit tests come to mind) will want specific details on exactly how a program exits: we expect test N to coredump, so we check for the coredump flag. Hence the Test::UnixExit perl module. Some applications respond in specific ways to specific exit status codes, for example an EX_TEMPFAIL from a Mail Delivery Agent (mail.local, procmail) will typically cause a Mail Transport Agent (Sendmail, Postfix) to queue the mail message for redelivery instead of bouncing it.

    $ grep TEMP /usr/include/sysexits.h
     *      EX_TEMPFAIL -- temporary failure, indicating something that
    #define EX_TEMPFAIL     75      /* temp failure; user is invited to retry */
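As a rough illustration of acting on specific exit codes, here is how a caller might sort a delivery agent's status word into outcomes. This is a simplified sketch and not how Sendmail or Postfix actually decide; the enum, the function name action_for_status, and the choice to retry after a signal death are all assumptions.

```c
#include <sys/wait.h>
#include <sysexits.h>

/* Hypothetical outcomes for a mail delivery attempt. */
enum delivery_action { DELIVERED, QUEUE_FOR_RETRY, BOUNCE };

/* Decide what to do from the delivery agent's raw status word. */
enum delivery_action action_for_status(int word)
{
    if (WIFEXITED(word)) {
        int code = WEXITSTATUS(word);

        if (code == EX_OK)
            return DELIVERED;
        if (code == EX_TEMPFAIL)
            return QUEUE_FOR_RETRY;     /* 75: try again later */
        return BOUNCE;                  /* any other exit code */
    }
    /* Assumption of this sketch: a signal death counts as temporary. */
    return QUEUE_FOR_RETRY;
}
```

A status word of 75 << 8 lands in the retry queue here, while 1 << 8 bounces.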


tags #unix #sh #perl