Stringify the Arguments

This is a clever and fast, if perhaps not portable, way to convert the contents of argv into a single string. In higher-level languages one might use something like

    (format nil "~&~{~A~^ ~}~&" (cdr *posix-argv*))

    $ perl -E 'say "@ARGV"' foo bar
    foo bar

but these take more complicated objects and mangle them into a sequence of bytes, plus other overhead. A less expensive way is to directly manipulate argv, assuming nothing else has mangled that memory. This relies on a convention that I'm pretty sure unix-like systems follow, though someone out there may not: the arguments are stored as one contiguous blob of bytes, followed directly by a blob of bytes for the environment. The pointers in the argv array all (in theory, and by default) point to addresses inside the argument blob, and likewise the pointers in the environ array point into the environment blob.

             memory address 640         644         648           ...
                            f  o  o  \0 b  a  r  \0 P  A  T  H  = ...
                            ^           ^           ^
    argv    address         |           |           |
    0       640     --------/           |           |
    1       644     --------------------/           |
                                                    |
    environ address                                 |
    0       648     --------------------------------/
    ...

The above diagram simplifies matters; the program name has been skipped over (typical for code involving getopt(3) or similar), and the memory addresses are usually much larger numbers that can be troublesome for humans. Also on modern systems the addresses are typically randomized, which can make debugging more difficult.

    $ cfu -E 'foo bar' 'printf("%p %p %s\n", *argv, *environ, *argv)'
    0x71234ce1dea0 0x71234ce1deac cfu
    $ perl -E 'say 0x71234ce1deac - 0x71234ce1dea0'
    12
    $ cfu -E "foo bar" 'printf("%p %p %p\n", argv[0], argv[1], argv[2])'
    0x7542a21e4560 0x7542a21e4564 0x7542a21e4568

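The arguments are four bytes apart (three characters plus a NUL) and there are three of them, so the environment should start 12 bytes after the start of the arguments, which matches the difference computed above.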

    0  1  2  3  4  5  6  7  8  9  10 11 12
    c  f  u  \0 f  o  o  \0 b  a  r  \0 environment starts here ...
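
Since someone out there may not lay memory out this way, the assumption can be checked before relying on it. What follows is only a sketch: strings_are_contiguous is a name made up for this example, and it merely tests that each string begins one byte past the NUL of the previous one and that the last string is followed directly by the address given in next.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of any standard library: returns true
 * when strs[0] .. strs[count-1] sit back to back in memory and the last
 * one is followed directly by the byte that next points to. */
bool strings_are_contiguous(int count, char *const strs[], const char *next) {
    assert(count > 0);
    for (int i = 0; i < count; i++) {
        /* one past the terminating NUL of the current string */
        const char *expected = strs[i] + strlen(strs[i]) + 1;
        const char *actual = (i + 1 < count) ? strs[i + 1] : next;
        if (actual != expected) return false;
    }
    return true;
}
```

Inside main this could be called as strings_are_contiguous(argc, argv, *environ); with an empty environment *environ is NULL and the check fails, which is the safe answer.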

The fast trick for stringifying argv is to loop from somewhere in argv (usually just past the program name) to the beginning of the environment, replacing any NUL bytes with a space. Of course this modifies argv and will render it unsuitable for most other uses; if that's a problem you'll need to copy the bytes elsewhere and replace the NUL bytes as the copy goes along.

    $ cat buggy.c 
    #include <stdio.h>
    int main(int argc, char *argv[]) {
        extern char **environ;
        char *ap = *argv;
        while (ap < *environ) {
            if (*ap == '\0') *ap = ' ';
            ap++;
        }
        printf("%s\n", *argv);
    }
    $ make buggy
    cc -O2 -pipe    -o buggy buggy.c 
    $ ./buggy foo bar
    ./buggy foo bar _=./buggy

Err, the fast and less buggy trick for stringifying argv is to stop one byte before where environ starts, so that the final NUL of the argv bytes is not clobbered. This keeps the first environment variable out of the new arguments string. The bug does at least confirm that the environment follows directly after the arguments. Also we probably do not want to include the program name.

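    #include <stdio.h>
    int main(int argc, char *argv[]) {
        extern char **environ;
        /* bail out if there is nothing to join, or no environment
           to mark where the argument bytes end */
        if (argc < 2 || environ == NULL || *environ == NULL) return 1;
        argv++;                       /* skip the program name */
        char *ap   = *argv;
        char *halt = *environ - 1;    /* final NUL of the argument bytes */
        while (ap < halt) {
            if (*ap == '\0') *ap = ' ';
            ap++;
        }
        printf("%s\n", *argv);
    }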
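
If argv must stay usable afterwards, the copying variant mentioned earlier might look something like the following. This is a sketch under the same layout assumption; blob_to_string is a hypothetical name, and the caller decides where the byte range starts and ends.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the bytes in [start, end) into a freshly
 * allocated buffer, turning each NUL into a space along the way, and
 * terminate the copy with a NUL. The original bytes are not modified.
 * Returns NULL if malloc fails; the caller frees the result. */
char *blob_to_string(const char *start, const char *end) {
    assert(start <= end);
    size_t len = (size_t)(end - start);
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (start[i] == '\0') ? ' ' : start[i];
    out[len] = '\0';
    return out;
}
```

With at least one argument and a non-empty environment, blob_to_string(argv[1], *environ - 1) should produce the same string as the in-place version, at the cost of one allocation.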

On the other hand, programs probably do not spend much time in the "parsing the arguments list" part of the code, so most if not all code can get away with being less efficient. The exception might be something that spawns tens of thousands of nmap processes (don't ask; I was as usual making something up as I went along, and on unix one may start with a shell script), in which case what you need is a different design that does not involve forking off tens of thousands of nmap processes, not time spent optimizing argv processing.
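
For code that can afford to be less efficient, a conventional join that makes no assumption about where the argument bytes live could be sketched as below. join_args is a made-up name for illustration; it walks the strings twice, once to measure and once to copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join count strings with single spaces into a
 * newly allocated string. Returns NULL if malloc fails; the caller
 * frees the result. Nothing is assumed about memory layout. */
char *join_args(int count, char *const args[]) {
    assert(count >= 0);
    size_t total = 1;                    /* terminating NUL */
    for (int i = 0; i < count; i++)
        total += strlen(args[i]) + 1;    /* string plus a separator */
    char *out = malloc(total);
    if (out == NULL) return NULL;
    char *p = out;
    for (int i = 0; i < count; i++) {
        size_t len = strlen(args[i]);
        if (i > 0) *p++ = ' ';
        memcpy(p, args[i], len);
        p += len;
    }
    *p = '\0';
    return out;
}
```

Inside main, join_args(argc - 1, argv + 1) skips the program name, much as argv++ does above.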

"The death of optimizing compilers". Daniel J. Bernstein. 2015.

Those slides are however a bit hard to follow; maybe there's a video or paper somewhere?