💾 Archived View for thrig.me › blog › 2023 › 01 › 22 › unix-is-spawning-programs-weirdly.gmi captured on 2023-09-28 at 16:29:44. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Unix Is Spawning Programs Weirdly

no really unix is weird. it's like some hackers sat down on a PDP-7

and then history happened

gemini://blog.schmidhuberj.de/2023/01/21/lisp-is-spawning-programs-weirdly/

posits problems running programs via SBCL, in particular complicated programs that both interact with the standard I/O streams and also require a controlling terminal.

Let's explore this.

Newline Ultimatum

First up, on unix, always end every line with a newline. For example, you may have the following interface, which appears to work,

    $ cat runecho.lisp
    (require :uiop)
    (format t ">>>~a<<<~&"
            (with-output-to-string (stream)
              (uiop:run-program '("./echo.pl")
                                :output stream :input '("coi"))))
    $ cat echo.pl
    #!/usr/bin/env perl
    # an inefficient cat(1)
    print while readline
    $ sbcl --script runecho.lisp
    >>>coi<<<

but fails when echo.pl is replaced with the buggy shell script echo.sh

    $ cat echo.sh
    #!/bin/sh
    # an extremely inefficient and buggy cat(1)
    while read line; do printf '%s\n' "$line"; done
    $ cat runecho.lisp
    (require :uiop)
    (format t ">>>~a<<<~&"
            (with-output-to-string (stream)
              (uiop:run-program '("./echo.sh")
                                :output stream :input '("coi"))))
    $ sbcl --script runecho.lisp
    >>><<<

you now have silent data loss due to the missing ultimate newline. Some people will argue that shell scripts must not be used for tricky things like looping over a list of unknown inputs, and also that shell while loops are terribly slow, especially when corrected to account for invalid lines, but shell scripts will continue to be written and used where they are not a good fit, so always ultimate newline your lines on unix.

    (require :uiop)
    (format t ">>>~a<<<~&"
            (with-output-to-string (stream)
              (uiop:run-program '("./echo.sh")
                                :output stream
                                :input (list #.(format nil "coi~%")))))
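Why does the shell loop print nothing for the unterminated input? read in sh(1) reports failure when it reaches end-of-file before seeing a newline, so the loop body never runs for the trailing "coi". The same bookkeeping can be sketched in C; complete_lines is a name made up for this illustration and is not part of any of the programs above.

```c
// Sketch: count newline-terminated lines and leftover bytes in a buffer.
#include <stddef.h>
#include <string.h>

// Returns how many complete (newline-terminated) lines buf holds. The
// number of bytes after the final newline is stored in *leftover; a
// nonzero value there is the data a naive line reader may drop.
size_t
complete_lines(const char *buf, size_t len, size_t *leftover)
{
    size_t lines = 0;
    const char *p = buf, *end = buf + len, *nl;
    while (p < end && (nl = memchr(p, '\n', (size_t) (end - p))) != NULL) {
        lines++;
        p = nl + 1;     // continue just past the newline
    }
    *leftover = (size_t) (end - p);
    return lines;
}
```

For the three bytes "coi" this reports zero complete lines with three bytes left over, which is the situation echo.sh silently mishandles.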

Spawning Programs Weirdly

uiop:run-program is a pretty high level interface--who knows what system calls it makes behind the scenes. One could study the SBCL code (or that of whatever Common LISP implementation is in use) to see exactly what it is doing, or admit that Common LISP may not be a good systems programming language, where systems programming here means interacting with very specific unix system calls and functions related to I/O, the terminal, and process management. Non-portably one might use specific calls in SBCL (or whatever) to perform the necessary pipe, fork, exec, and dup calls while avoiding the buffering hell, portability problems, etc. involved with something "simple" like

    printf 'test\n' | someinteractiveprogram

First we will need someinteractiveprogram, ideally one that is simple and we have the source code for. Let's start with yet another bad implementation of cat(1) this time involving curses; our program reads a single (valid?) line, displays that line, then waits for a keypress before exiting.

    #!/usr/bin/env perl
    use Curses;
    chomp( my $line = readline );
    initscr;
    move 1, 1;
    addstring $line;
    getch;
    endwin;

A first attempt:

    // interact - reads a single line, displays it, awaits a key, exits
    
    #include <err.h>
    #include <ncurses.h>
    #include <stdio.h>
    
    int
    main(int argc, char *argv[])
    {
        char *line      = NULL;
        size_t linesize = 0;
        ssize_t linelen = getline(&line, &linesize, stdin);
        if (linelen < 0) {
            if (ferror(stdin))
                // e.g. `./interact <&-` to close stdin
                err(1, "getline failed");
            else
                // nothing was printed, e.g. `printf '' | ./interact`
                errx(1, "unexpected EOF");
        }
        // TWEAK one might instead replace the \n with \0 to accept
        // non-compliant input lines
        if (line[linelen - 1] != '\n')
            errx(1, "invalid line");
        else
            line[linelen - 1] = '\0';
        initscr();
        move(1, 1);
        printw("%s", line);
        getch();
        endwin();
    }

And a Makefile, because OpenBSD 7.2 ships with make(1). Other unix systems may vary as to what they provide, how to compile with curses, and exactly how this code behaves. The shell here is ksh(1) as shipped with OpenBSD, by the way, and all the man page references are to OpenBSD documentation.

    CFLAGS=-O2 -std=c99 -fno-diagnostics-color -fstack-protector-strong\
        -pipe -pedantic -Wall -Wextra\
        -Wcast-qual -Wconversion -Wformat-security -Wformat=2\
        -Wno-unused-function -Wno-unused-parameter -Wnull-dereference\
        -Wpointer-arith -Wshadow -Wstack-protector -Wstrict-overflow=3
    # TWEAK this may be necessary on some operating systems
    #CFLAGS+=`pkg-config --libs --cflags ncurses`
    CFLAGS+=-lncurses
    interact: interact.c

If built and run in a shell without input the program will wedge, waiting for something on standard input, where ■ represents the cursor:

    $ make interact
    cc -O2 -std=c99 -fno-diagnostics-color -fstack-protector-strong -pipe -pedantic -Wall -Wextra -Wcast-qual -Wconversion -Wformat-security -Wformat=2 -Wno-unused-function -Wno-unused-parameter -Wnull-dereference -Wpointer-arith -Wshadow -Wstack-protector -Wstrict-overflow=3 -lncurses   -o interact interact.c
    $ ./interact
    ■

Should a line be typed and entered, Curses will do the right thing, or at least it does for me. It may not work if the terminal is in a bad state; reset(1) is a thing for when you've been messing around with programs that mess around with the terminal. Meanwhile, the following fails; it does not wait for a keystroke before exiting:

    $ printf foo'\n' | ./interact

A reasonable hypothesis here is that since standard input is a pipe wired up by the shell from printf to ./interact, either the Curses calls failed because that pipe is in no way connected to a terminal, or there's nothing for getch to read from. Note that we did not need LISP to make the program run weirdly. It may be that both initscr(3x) and getch(3x) failed due to the pipe or some other unexpected condition--perhaps the terminal incorrectly defaults to a non-blocking, non-ICANON mode.

    $ grep getch interact.c
            if (getch() == ERR) warn("getch failed");

But that's not important--our program does not do what we want it to, regardless of how, where, and why it is broken ("the parable of the arrow" may apply).
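For the curious, the "standard input is a pipe" part of the hypothesis is cheap to check from inside a program. What follows is a minimal sketch using fstat(2) and isatty(3); the enum and the classify_fd name are invented here, and it assumes a system where pipe(2) descriptors show up as FIFOs in st_mode.

```c
// Sketch: report what kind of thing a file descriptor points at.
#include <sys/stat.h>
#include <unistd.h>

enum fd_kind { FD_ERROR, FD_TTY, FD_PIPE, FD_OTHER };

enum fd_kind
classify_fd(int fd)
{
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        return FD_ERROR;    // e.g. EBADF for a closed descriptor
    if (isatty(fd))
        return FD_TTY;      // something curses could talk to
    if (S_ISFIFO(sb.st_mode))
        return FD_PIPE;     // e.g. the read end of `printf ... |`
    return FD_OTHER;        // regular file, socket, /dev/null, ...
}
```

Calling classify_fd(STDIN_FILENO) early in main and printing the result to stderr would be one way to see what the shell wired up.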

Interact Version Two

Since standard input has been read from, and is a pipe not connected to a terminal, what we want is some way to connect our program with a terminal. This may not be possible, e.g. after setsid(2) and a "double fork", but here we will assume that some terminal is available if only we could get to it.

At least two methods are possible; the newterm(3x) call could be used instead of initscr(3x), or we could close standard input, open the terminal, and then call initscr as per usual. open(2) uses the lowest available file descriptor number, and if we close standard input, the terminal should be where initscr(3x) wants it to be.

    // interact - reads a single line, displays it, awaits a key, exits
    #include <err.h>
    #include <fcntl.h>
    #include <ncurses.h>
    #include <stdio.h>
    #include <unistd.h>
    int main(int argc, char *argv[]) {
        char *line      = NULL;
        size_t linesize = 0;
        ssize_t linelen = getline(&line, &linesize, stdin);
        if (linelen < 0) err(1, "getline failed");
    
        if (close(STDIN_FILENO) == -1) err(1, "close failed??");
        if (open("/dev/tty", O_RDWR) < 0) err(1, "open /dev/tty failed");
    
        initscr();
        move(1, 1);
        printw("%s", line);
        getch();
        endwin();
    }

This version operates correctly when invoked from the shell, for me:

    $ make interact && printf foo'\n' | ./interact
    ...
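Version two trusts that open(2) hands back descriptor 0 once standard input has been closed. A slightly more paranoid sketch checks that assumption; reopen_stdin is a hypothetical helper, not something from interact.c, and the path is a parameter so that it can be tried against something other than /dev/tty.

```c
// Sketch: replace standard input with a freshly opened file.
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Closes standard input and opens path in its place. Returns 0 when the
// new descriptor landed on STDIN_FILENO, -1 otherwise.
int
reopen_stdin(const char *path)
{
    // EBADF only means stdin was already closed, which is fine here
    if (close(STDIN_FILENO) == -1 && errno != EBADF)
        return -1;
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    if (fd != STDIN_FILENO) {
        // lowest free descriptor was not 0; something else grabbed it
        close(fd);
        return -1;
    }
    return 0;
}
```

In interact.c the call would be reopen_stdin("/dev/tty") just before initscr().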

But What About Common LISP?

Glad you asked! We first needed to lay some metaphorical groundwork with background material and a test program, which we can now use under Common LISP.

    $ cat runecho.lisp
    (require :uiop)
    (format t ">>>~a<<<~&"
            (with-output-to-string (stream)
              (uiop:run-program '("./interact")
                                :output stream
                                :input (list #.(format nil "coi~%")))))
    $ sbcl --script runecho.lisp
    Unhandled UIOP/RUN-PROGRAM:SUBPROCESS-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING
                                                             {1001990003}>:
      Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {10021E2FD3}>
     with command ("./interact")
     exited with error code 1

    Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {1001990003}>
    0: (SB-DEBUG::DEBUGGER-DISABLED-HOOK #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {1002482C13}> #<unused argument> :QUIT T)
    1: (SB-DEBUG::RUN-HOOK *INVOKE-DEBUGGER-HOOK* #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {1002482C13}>)
    2: (INVOKE-DEBUGGER #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {1002482C13}>)
    3: (CERROR "IGNORE-ERROR-STATUS" UIOP/RUN-PROGRAM:SUBPROCESS-ERROR :COMMAND ("./interact") :CODE 1 :PROCESS #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {10021E2FD3}>)
    4: (UIOP/RUN-PROGRAM::%CHECK-RESULT 1 :COMMAND ("./interact") :PROCESS #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {10021E2FD3}> :IGNORE-ERROR-STATUS NIL)
    5: (UIOP/RUN-PROGRAM::%USE-LAUNCH-PROGRAM ("./interact") :OUTPUT #<SB-IMPL::STRING-OUTPUT-STREAM {26041EE13}> :INPUT ("coi
    ...

Mercifully omitted here is most of the too long and too large stack trace generated by SBCL. The error could come from any of several calls that exit with code 1 in ./interact, and since running ./interact under a debugger under SBCL is not something I want to grapple with, I'll instead use "print debugging" and change the err(3) exit codes in interact.c to be unique for each call.

    ...
      Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {10021E2FD3}> with command ("./interact")
     exited with error code 4
    ...
    $ fgrep 4 interact.c
            if (open("/dev/tty", O_RDWR) < 0) err(4, "open /dev/tty failed");
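The unique exit codes travel back to the parent in the wait status. As a rough sketch of how a parent could read them without the stack trace, assuming nothing beyond fork(2), execv(3), and waitpid(2); exit_code_of is a name invented for this example.

```c
// Sketch: run a program and fetch its exit code.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs argv[0] with the given arguments. Returns the exit code of the
// program, 127 if it could not be executed, or -1 if fork or waitpid
// failed or the child was killed by a signal.
int
exit_code_of(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {             // child
        execv(argv[0], argv);
        _exit(127);             // exec failed; 127 follows shell custom
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Exit codes are only eight bits wide, so this trick works for a handful of err(3) calls and not much more.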

So, ./interact was not able to open /dev/tty. The standard error has helpfully been omitted, but instead of a debugger we can simply call a wrapper script that sends stderr to a file and then uses exec to replace itself with the real program:

    #!/bin/sh
    exec 2>err
    exec ./interact
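In C terms the `exec 2>err` line amounts to an open(2) followed by a dup2(2) onto descriptor 2. A minimal sketch of that step, with redirect_stderr being a made-up name; a C wrapper would call it and then execl the real program, which inherits the redirected descriptor.

```c
// Sketch: what `exec 2>path` does, more or less.
#include <fcntl.h>
#include <unistd.h>

// Points standard error at path, creating or truncating the file.
// Returns 0 on success, -1 on failure.
int
redirect_stderr(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return -1;
    if (fd != STDERR_FILENO) {
        if (dup2(fd, STDERR_FILENO) == -1) {
            close(fd);
            return -1;
        }
        close(fd);      // descriptor 2 now refers to the file
    }
    return 0;
}
```

The shell script is shorter, which is rather the point of having a shell.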

Some quick edits later, we obtain an error message (and an errno).

    $ grep -1 tty interact.c
            if (close(STDIN_FILENO) == -1) err(3, "close failed??");
            if (open("/dev/tty", O_RDWR) < 0)
                    err(4, "open /dev/tty failed (%d)", errno);

    $ cat wrapper
    #!/bin/sh
    exec 2>err
    exec ./interact
    $ grep wrapper runecho.lisp
              (uiop:run-program '("./wrapper")
    $ sbcl --script runecho.lisp 2>/dev/null
    $ cat err
    interact: open /dev/tty failed (6): Device not configured
    $ doas pkg_add moreutils
    ...
    $ errno 6
    ENXIO 6 Device not configured

The documentation is not very enlightening to me. Perhaps SBCL or uiop:run-program has done something like a "double fork" that makes the device behind /dev/tty unavailable?

    $ man 2 open | fgrep -2 ENXIO
         [ENFILE] The system file table is full.

         [ENXIO]  The named file is a character special or block special
                  file, and the device associated with this special file
                  does not exist.
    --

         [ENXIO]  The named file is a FIFO, the O_NONBLOCK and O_WRONLY
                  flags are set, and no process has the file open for
                  reading.

Here we reach a branching point; we could read up on the documentation for uiop:run-program--can we make it not block access to /dev/tty?--or we can dig around in the SBCL manual to see if we can obtain direct access to fork, pipe, dup, and all those necessary calls and make them directly. Or, we could dig through the source code for uiop:run-program and try to figure out why opening /dev/tty might error out. Yet another way would be to study the code behind open(2) and figure out the reasons ENXIO could happen and maybe that would help us figure out what SBCL or UIOP did to get us into this mess. If I had to guess, it's because we cannot create or access a controlling terminal.

Another approach is to try to replicate the error with a simple program, perhaps after looking at what calls SBCL makes under ktrace(1):

     29574 sbcl     CALL  pipe(0x21722fff8)
     29574 sbcl     STRU  int [2] { 5, 6 }

     29574 sbcl     CALL  pipe(0x21722fff8)
     29574 sbcl     STRU  int [2] { 7, 8 }

     29574 sbcl     CALL  pipe(0x21722fff8)
     29574 sbcl     STRU  int [2] { 10, 11 }

     29574 sbcl     CALL  fork()
     29574 sbcl     RET   fork 87734/0x156b6

     29574 sbcl     CALL  close(11)

     87734 sbcl     CALL  close(10)

     87734 sbcl     CALL  setsid()

     87734 sbcl     CALL  dup2(5,0)
     87734 sbcl     RET   dup2 0
     87734 sbcl     CALL  dup2(8,1)
     87734 sbcl     RET   dup2 1
     87734 sbcl     CALL  dup2(9,2)
     87734 sbcl     RET   dup2 2
     87734 sbcl     CALL  dup2(11,3)
     87734 sbcl     RET   dup2 3

     87734 sbcl     CALL  execve(0x10021e3100,0x250f4e008,0x7f7ffffd45c8)
     87734 sbcl     NAMI  "./interact"

I wonder if the setsid(2) causes any grief. Let's replicate something like these calls in C.

    // runner - runs a program something like how SBCL does it, but does not
    // wire up a pipe for stderr like SBCL does as that would be yet more
    // boilerplate and I'm pretty sure the stderr handling is irrelevant
    #include <sys/wait.h>
    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>
    char rbuf[640];
    int main(void) {
        int tochild[2], fromchild[2];
        pipe(tochild); // NOTE lack of error checking
        pipe(fromchild);
        pid_t pid = fork();
        if (pid < 0) err(1, "fork failed");
        if (pid == 0) { // child
            // NOTE also lacks various checks
            close(tochild[1]);
            dup2(tochild[0], STDIN_FILENO);
            close(tochild[0]);
            close(fromchild[0]);
            dup2(fromchild[1], STDOUT_FILENO);
            close(fromchild[1]);
            setsid();
            execl("./interact", "interact", (char *) 0);
            err(1, "execl failed");
        } else { // parent
            close(tochild[0]);
            close(fromchild[1]);
            write(tochild[1], "coi\n", 4);
            close(tochild[1]);
            // KLUGE bad, but it's getting late...
            ssize_t linelen = read(fromchild[0], rbuf, 639);
            printf("%.*s\n", (int) linelen, rbuf);
            close(fromchild[0]);
            int status;
            wait(&status);
        }
    }

Yep, it's the setsid(2) call that SBCL makes:

    $ make runner
    cc -O2 -std=c99 -fno-diagnostics-color -fstack-protector-strong -pipe -pedantic -Wall -Wextra -Wcast-qual -Wconversion -Wformat-security -Wformat=2 -Wno-unused-function -Wno-unused-parameter -Wnull-dereference -Wpointer-arith -Wshadow -Wstack-protector -Wstrict-overflow=3 -lncurses   -o runner runner.c
    $ ./runner
    interact: open /dev/tty failed (6): Device not configured
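The runner always calls setsid(2), so it has no control case. A smaller experiment, sketched below under the assumption that small errno values survive the trip through an exit status, toggles the setsid(2) call and reports what open(2) on /dev/tty said; tty_open_errno is a name invented here.

```c
// Sketch: does /dev/tty open in a child, with and without setsid(2)?
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Forks a child that optionally calls setsid(2) and then tries to open
// /dev/tty. Returns 0 if the open worked, the child's errno if it did
// not, or -1 if fork, setsid, or waitpid failed.
int
tty_open_errno(int with_setsid)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {             // child
        if (with_setsid && setsid() == -1)
            _exit(255);         // 255 reserved here for "setsid failed"
        int fd = open("/dev/tty", O_RDWR);
        // ASSUMPTION errno fits in the 8 bits of an exit code
        _exit(fd == -1 ? errno : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

Going by the error message above, tty_open_errno(1) should return ENXIO on the OpenBSD system used here. When run from a shell on a terminal, tty_open_errno(0) would be expected to return 0, though I have not checked other systems.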

So you'll need to find a way to make the (unportable!) pipe, fork, dup2, and exec calls directly in Common LISP, or find some way to disable the setsid(2) call. Or use a different language that plays better with the unix interface. Or, since SBCL does not perform a "double fork", it may be possible to recover a controlling terminal...
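One loose end from runner.c is the single read(2) marked KLUGE, which only works if the child's output arrives in one piece. A less kludgy loop might look like the following sketch; read_all is a hypothetical helper and the fixed-size buffer is kept to match the runner.

```c
// Sketch: read until EOF instead of hoping one read(2) gets it all.
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

// Reads from fd until EOF or until cap bytes are stored in buf.
// Returns the number of bytes read, or -1 on error.
ssize_t
read_all(int fd, char *buf, size_t cap)
{
    size_t used = 0;
    while (used < cap) {
        ssize_t n = read(fd, buf + used, cap - used);
        if (n == -1) {
            if (errno == EINTR)
                continue;       // interrupted by a signal, try again
            return -1;
        }
        if (n == 0)
            break;              // EOF: every writer closed their end
        used += (size_t) n;
    }
    return (ssize_t) used;
}
```

The parent must still close its copy of the write end first, as runner.c does, or the loop will never see EOF.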

What Have We Learned?

tags #unix #lisp #perl #c #debug
