Append Write Corruption

Someone on the #gemini IRC channel had guestbook code up for review; the results of malloc and fopen calls were not checked. These calls can and do fail, folks!

    // bad!
    char *foo = malloc(BAR);
    ...

    // okay (season with better error handling to taste)
    char *foo = malloc(BAR);
    if (!foo) abort();
    ...

(On linux the OOM killer may instead blast random holes in your process table rather than letting a malloc fail. This, uh, feature probably contributes to sloppy mallocs that lack proper error checks.)

However, lack of error checking is not the main point of this posting, which is rather whether code to append to a guestbook log file is atomic or not. That is, would the writes ever become interleaved should multiple log entries be written by two (or more) instances of the CGI program running at the same time?

    ...
    FILE *fh = fopen("guestbook", "a");
    // again, season with better error handling to taste
    if (!fh) abort();
    // NOTE the client might include '\n' in their input, what then?
    fprintf(fh, "... %s\n", input);
    fclose(fh);

Now if the server is single-threaded in that only one instance of the guestbook program can run at a given time, then there is no problem. However, if multiple processes (or threads) can append to the file at the same time, there may be a problem. Some example code will help explore the problem.

append.c

repcharcount.c

So we have two programs; "append" spawns multiple threads (it could also fork, same difference) that write strings of a unique character of a certain length to the same "out" file. "repcharcount" counts the length of repeated characters in a file. If everything goes well, we would expect to see contiguous blocks of our certain length—or factors of the length, say for length=42 writing "a" or "b" one might see 42*2 "a" followed by 42 "b"; maybe the code that's writing "a" was able to run twice because the code writing "b" was busy at that time. This code uses 99 instead of 42, go figure.

    $ CFLAGS=-lpthread make append
    cc -lpthread   -o append append.c
    $ make repcharcount
    cc -O2 -pipe    -o repcharcount repcharcount.c
    $ ./append
    $
    $ ./repcharcount out | sed 3q
    99 a
    198 b
    99 c
    $ ./repcharcount out | wc -l
          29
    $ ./repcharcount out | grep -v 99
    198 b

Here the thread writing "b" got two sequential calls in while "a" and "c" were doing who knows what. Otherwise the "a" "b" and "c" threads all took turns, and there was no obvious corruption of the file. No problem, right?

Wrong! The string size is too small to exhibit corruption. That is, it is lucky that this code works. It will fail, horribly, should the string become too long. Let's embiggen the string.

    $ grep STRINGSIZE append.c
    #define STRINGSIZE 99U
                    char *s = malloc(STRINGSIZE);
                    memset(s, 'a' + i, STRINGSIZE);
    $ ed append.c
    1215
    /STRINGSIZE
    #define STRINGSIZE 99U
    s/99/99999
    #define STRINGSIZE 99999U
    wq
    1218
    $ CFLAGS=-lpthread make append
    cc -lpthread   -o append append.c
    $ ./append
    $ ./repcharcount out | grep -v 99999
    32768 b
    32768 c
    32768 b
    32768 c
    32768 b
    32768 c
    1695 b
    32768 a
    1695 c
    65536 a
    32768 b
    1695 a
    32768 b
    32768 c
    32768 b
    32768 a
    32768 c
    1695 b
    32768 a
    32768 c
    32768 a
    1695 c
    34463 a
    32768 b
    32768 a
    32768 b
    32768 c
    32768 a
    32768 b
    32768 c
    1695 a
    1695 b
    34463 c
    65536 a
    32768 b
    32768 c
    32768 a
    32768 b
    32768 c
    1695 a
    32768 b
    32768 c
    1695 b
    1695 c
    32768 a
    65536 b
    32768 a
    32768 b
    32768 c
    32768 a
    1695 b
    32768 c
    1695 a
    34463 c
    32768 b
    32768 a
    32768 c
    32768 b
    32768 a
    32768 c
    32768 b
    32768 a
    32768 c
    1695 b
    1695 a
    1695 c
    32768 b
    32768 a
    32768 b
    32768 a
    32768 c
    32768 b
    32768 a
    32768 c
    1695 b
    1695 a
    34463 c
    65536 b
    65536 a
    32768 b
    32768 a
    32768 c
    1695 b
    1695 a
    67231 c
    32768 b
    32768 a
    32768 b
    32768 a
    32768 b
    32768 a
    32768 c
    1695 b
    1695 a
    67231 c
    98304 b
    32768 c
    1695 b
    67231 c

Whoops. Not a lot of "99999, or multiples thereof" in that output. Indeed, the output file shows corruption; writes have been interleaved because 99999 is larger than the buffer size for the write(2) calls involved, which someone clever can probably take a good guess as to what that value is. Less clever folks like me will process trace the script and see what it is doing.

    $ ktrace ./append
    $ kdump | grep write | grep RET | sed 10q
     62878 append   RET   write 32768/0x8000
     62878 append   RET   write 32768/0x8000
     62878 append   RET   write 32768/0x8000
     62878 append   RET   write 1695/0x69f
     62878 append   RET   write 32768/0x8000
     62878 append   RET   write 32768/0x8000
     62878 append   RET   write 32768/0x8000
     62878 append   RET   write 32768/0x8000
     62878 append   RET   write 32768/0x8000
     62878 append   RET   write 32768/0x8000

You'll not find writes larger than 32768 on this system, OpenBSD 7.3, though the buffer size may be smaller or larger on other systems. Older systems will typically have smaller buffer sizes, which will make them more prone to exhibit corruption problems. Put another way, modern systems will better hide the problem until a too large fprintf(3) call is made, which is then split across multiple write(2) calls, and then, surprise! you may have file corruption.

Fixes

Alternatives might be to write to a database, or to instead use a Maildir style inbox for guest messages where each message is written to a unique file and then something else parses those messages into a time-ordered fashion (or some algorithm injects ads). Another solution would be to having everything that interacts with the guestbook file use flock(2). These alternatives have various advantages and drawbacks, but generally do not suffer from the "guestbook is corrupted if large enough entries are written at the same time" problem.

tags #unix