Unit test this

Or, “What is a ‘unit test,’ part II [1]”

I saw a decent answer to my question [2] which makes sense for C. Another decent (if a bit vague) answer was:

So to answer Sean's question, a unit test is that which requires the least amount of work to setup but is able to reduce the need for coverage of the larger system. Whatever you want to consider a "unit" is up to you and the language you're using.

“Re: What is a "unittest"? [3]”

I left off my previous entry pointing to a function that I would love to have seen someone else “unit test,” but alas, no one did. But I always had plans on going all “The Martian [4]” on the code [5] and “unit test the XXXX out of it.”

So here's the code in question:

/***********************************************


#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <syslog.h>
#include <sysexits.h>

/************************************************************************/

bool run_hook(char const *tag,char const *argv[])
{
  assert(tag     != NULL);
  assert(argv    != NULL);
  assert(argv[0] != NULL);
  
  pid_t child = fork();
  
  if (child == -1)
  {
    syslog(LOG_ERR,"%s='%s' fork()='%s'",tag,argv[0],strerror(errno));
    return false;
  }
  else if (child == 0)
  {
    extern char **environ;
    int devnull = open("/dev/null",O_RDWR);
    if (devnull == -1)
      _Exit(EX_UNAVAILABLE);
    if (dup2(devnull,STDIN_FILENO) == -1)
      _Exit(EX_OSERR);
    if (dup2(devnull,STDOUT_FILENO) == -1)
      _Exit(EX_OSERR);
    if (dup2(devnull,STDERR_FILENO) == -1)
      _Exit(EX_OSERR);
    for (int fh = STDERR_FILENO + 1 ; fh <= devnull ; fh++)
      if (close(fh) == -1)
        _Exit(EX_OSERR);
    execve((char *)argv[0],(char **)argv,environ);
    _Exit(EX_UNAVAILABLE);
  }
  else
  {
    int status;
    
    if (waitpid(child,&status,0) != child)
    {
      syslog(LOG_ERR,"%s='%s' waitpid()='%s'",tag,argv[0],strerror(errno));
      return false;
    }
    
    if (WIFEXITED(status))
    {
      if (WEXITSTATUS(status) != 0)
      {
        syslog(LOG_ERR,"%s='%s' status=%d",tag,argv[0],WEXITSTATUS(status));
        return false;
      }
    }
    else
    {
      syslog(LOG_ERR,"%s='%s' terminated='%s'",tag,argv[0],strsignal(WTERMSIG(status)));
      return false;
    }
  }
  
  return true;
}

/************************************************************************/

As you can see, it's one function, in one file, with the only dependencies being the operating system. So this should be the “perfect unit” to write some “unit tests” for. The code does replicate a bit of the standard C function system(), so why not use system() in the first place? The answer comes from the manual page for Linux:

Do not use system() from a privileged program (a set-user-ID or set-group-ID program, or a program with capabilities) because strange values for some environment variables might be used to subvert system integrity. For example, PATH could be manipulated so that an arbitrary program is executed with privilege. Use the exec(3) [6] family of functions instead, but not execlp(3) [7] or execvp(3) [8] (which also use the PATH environment variable to search for an executable).

“system(3) - Linux manual page [9]”

This function runs as part of a set-user-ID program (mod_blog [10] in particular, for reasons beyond the scope of this entry) so no system() for me. Also, I avoid having to construct a command string that might have failed to properly escape the filename to avoid complications with the shell's use of certain characters. And it's not like the function was hard for me to write. I've done functions like this before, and it worked the first time without issue when I wrote it (and the small changes to it since have been a simplification of the parameters, and changes to the logging messages). It's also not a very long function (I'm sorry Ron Jefferies, but 14 lines of code [11] isn't “a lot of code”).

The reason I wanted some unit test proponents to look at this code is that it involves quite a bit of interaction with the operating system in C, a not-very-popular programming language these days, and I was curious as to the level of “unit testing“ that would be done. No bites, but my gut feeling is that a “unit test proponent” faced with this code would just throw two scripts to it, one to return successfully:

int main(void)
{
  return 0;
}

and one to return failure:

int main(void)
{
  return 1;
}

and call it “battle tested.” The two test cases themselves are pretty easy to write:

#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

#include "tap14.h"

extern bool run_hook  (char const *,char const **);

int main(void)
{
  tap_plan(2,NULL);
  tap_assert( run_hook("script",(char const *[]){ "./success" , NULL }),"success script");
  tap_assert(!run_hook("script",(char const *[]){ "./failure" , NULL }),"failure script");
  return tap_done();
}

(I'm using my own testing framework based on TAP (Test Anything Protocol) [12]. I wrote my own to be as minimal as possible to get the job done—other TAP (Test Anything Protocol) frameworks for C I looked at were too overblown for my tastes.)

An improvement would be to add a test script that terminates via a signal. It's again easy enough to write that script:

#include <signal.h>

int main(void)
{
  raise(SIGINT);
  return 1;
}

and the appropriate test:

tap_assert(!run_hook("script",(char const *[]){ "./terminate" , NULL }),"terminate script");

But that only tests half the code. How do I know I didn't mess up the codepath in the child process before I execute the script in question? At The Enterprise, it was expected our tests cover about 70% of the code at least— I'm short of that target here. And as I say, I'm aiming to “unit test the XXXX out of this” and get 100% code coverage, because shouldn't that be the goal of “unit testing?”

But to achieve that target, I'm going to have to deal with “failing” a bunch of existing functions, and according to my interprestation of “A Set of Unit Testing Rules [13],” if I'm not mocking, I don't have a “unit test.” So I have to mock some system calls.

And here is where I hit a problem—to do so will invoke the dreaded “undefined behavior of C.” Seriously–if I provide my own function for, say, dup2(), I am technically invoking undefined behavior of the C machine (this incredibly long flame war on the Lua mailing list [14] of all places, goes into the gory details behind that statement). Now granted, certain tool chains on certain operating systems allow one to override functions, but you can't rely upon this behavior in general. Given that I'm doing all of this on Linux, and Linux in general allows this, I can proceed (carefully) with mocking system functions.

That should be straightforward enough. The mocked open() function:

static int err_open;
static int ret_open;

int open(char const *pathname,int flags)
{
  (void)pathname;
  (void)flags;
  if (err_open != 0)
    errno = err_open;
  return ret_open; // XXX had bug here
}

This should be fine for my purposes as I don't actually need to read from the file. If I really needed to call into the original function, this might work:

static int err_open;

int myopen(char const *pathname,int flags)
{
  if (err_open == 0)
    return open(pathname,flags,0);
  errno = err_open;
  return -1;
}

#define open	myopen

But as the “A Set of Unit Testing Rules” article states, “A test is not a unit test if: it touches the file system.” So the above isn't a “true mock,” and I shall continue with my “true mocked” function instead. I can continue with similar implementations for the functions dup2(), close() and waitpid(). Unfortunately, there are three functions that may present some interesting challenges: fork(), execve(), and _Exit(). The first returns twice (kind of—if you squint and look sideways), the second only returns if there's an error, and the third never returns.

Now looking over the implementation of the function I'm testing, and thinking about things, I could do a similar implementation for fork()—the returning twice thing is where it returns once in the parent process, and once in the child process, but I can treat that as just a normal return, at least for purposes of testing. For execve(), I can only test the error path here as the script being “run” isn't being run. That just leaves _Exit() as the final function to mock. And for that one, I wrap the entire call to run_hook() (the function being “unit tested”) around setjmp() and longjmp() to simulate the not-returning aspect of _Exit(). So a test of the close() codepath would look like:

static bool X(char const *tag,char const *argv[])
{
  volatile int rc = setjmp(buf_exit);
  if (rc != 0)
    return false;
  return run_hook(tag,argv);
}

int main(void)
{
  /* ... */

  ret_open  = 4;
  err_dup2  = 0;
  ret_dup2  = 0;
  bad_dup2  = -1;
  err_close = EIO;
  ret_close = -1;
  tap_assert(!X("script",(char const *[]){ "./success" , NULL }),"close() fail");

  /* ... */
  return tap_done();
}

I got all the test cases written up and all 11 tests pass:

TAP version 14
1..11
ok 1 - success script
ok 2 - failure script
ok 3 - terminate script
ok 4 - fork() fail
ok 5 - open() fail
ok 6 - dup2(stdin) fail
ok 7 - dup2(stdout) fail
ok 8 - dup2(stderr) fail
ok 9 - close() fail
ok 10 - execve() fail
ok 11 - waitpid() fail

A successful “unit test” with 100% code coverage. But I'm not happy with this. First off, I don't get the actual logging information for each test case. All I get is:

Dec 21 19:34:10	user	err	/dev/log	test_run_hook	script='./failure' status=1
Dec 21 19:34:10	user	err	/dev/log	test_run_hook	script='./terminate' terminated='Interrupt'
Dec 21 19:34:10	user	err	/dev/log	test_run_hook	script='./success' fork()='Cannot allocate memory'
Dec 21 19:34:10	user	err	/dev/log	test_run_hook	script='./success' waitpid()='No child processes'

and not

Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./failure' status=1
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./terminate' terminated='Interrupt'
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' fork()='Cannot allocate memory'
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=69
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=71
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=71
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=71
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=71
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=69
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' waitpid()='No child processes'

(And **no!** I am not checking that syslog() got the right message in the test cases—been there, done that and all I got was a stupid tee-shirt and emotional scars. It's easy enough to just manually check after the test runs, at least for this entry.)

It just doesn't feel right to me that I'm testing in a faked environment. No, to get a better “unit test” I'm afraid I'm going to have to keep invoking undefined C behavior that is allowed by Linux, and interpose our functions by using LD_PRELOAD to override the functions. And I can set things up so that I can still call the original function when I want it to succeed. So all that needs to be done is write a shared object file with my versions of the functions, and include this function:

static pid_t (*___fork)   (void);
static int   (*___open)   (char const *,int,mode_t);
static int   (*___dup2)   (int,int);
static int   (*___close)  (int);
static int   (*___execve) (char const *,char *const [],char *const []);
static pid_t (*___waitpid)(pid_t,int *,int);

__attribute__((constructor))
void init(void)
{
  ___fork    = dlsym(RTLD_NEXT,"fork");
  ___open    = dlsym(RTLD_NEXT,"open");
  ___dup2    = dlsym(RTLD_NEXT,"dup2");
  ___close   = dlsym(RTLD_NEXT,"close");
  ___execve  = dlsym(RTLD_NEXT,"execve");
  ___waitpid = dlsym(RTLD_NEXT,"waitpid");
}

(I include all three parameters to open() even though the last one is optional—I don't want to have to deal with the variable argument machinery with C—this should work “just fine on my machine”—I'm already into territory that C formally forbids. I'm using triple leading underscores because single and double leading underscores are reserved to the C compiler and implementation, but nothing is mentioned about three leading underscores.)

Now, how to get information to my replacement functions about when to fail. I thought about it, and while there is a way to do it with global variables, it gets complicated and I'd rather do this as simply as possible. I figured I could sneak variables through to my replacement functions via putenv(), getenv() and unsetenv(). This will make the close() failed test case look like:

  putenv((char *)"SPC_CLOSE_FAIL=5"); /* EIO */
  tap_assert(!run_hook("script",(char const *[]){ "./success" , NULL }),"close() fail");
  unsetenv("SPC_CLOSE_FAIL");

And the corresponding close() function is:

int close(int fd)
{
  char *fail = getenv("SPC_CLOSE_FAIL");
  if (fail == NULL)
    return (*___close)(fd);
  errno = (int)strtoul(fail,NULL,10);
  return -1;
}

The other functions work simularly, and when run:

TAP version 14
1..11
ok 1 - success script
ok 2 - failure script
ok 3 - terminate script
ok 4 - fork() fail
ok 5 - open() fail
ok 6 - dup2(stdin) fail
ok 7 - dup2(stdout) fail
ok 8 - dup2(stderr) fail
ok 9 - close() fail
ok 10 - execve() fail
ok 11 - waitpid() fail

More importantly, since the functions can actually function as intended when I don't want them to fail, I get the full output I expect in the system logs. But per the “A Set of Unit Testing Rules” article, this is no longer a “proper unit test.”

I don't know. The more I try to understand “unit testing,” the less sense it makes to me. There is no real consensus as to what a “unit” is, and it seems strange (or in my more honest opinion, stupid) that we as programmers are not trusted to write code without tests, yet we're trusted to write a ton of code untested as long as such code is testing code. As I kept trying to impart to my former manager at The Enterprise before I left, the test case results aren't to be trusted as gospel (and it always was by him) because I didn't fully understand what the code was supposed to do (because the business logic in “Project Lumbergh [15]” has become a literal mess of contradictory logic and communication among the team seriously broke down).

So maybe we're not supposed to “unit test” functions that involve input, output, or system interactions. Maybe we're supposed to “unit test” more “pure functions” and leave messy real world details to, oh, I don't know, some other form of testing. Okay, I have one final function [16] that should be perfect for “unit testing.”

We shall see …

[1] /boston/2022/10/08.1

[2] /boston/2022/10/10.2

[3] gemini://gemini.sh0.xyz/log/2022-10-12.gmi

[4] https://www.imdb.com/title/tt3659388/

[5] https://github.com/spc476/mod_blog/blob/049e9dcf0985d4e089ac3593f39c987f077f0bcb/src/hooks.c#L39

[6] https://man7.org/linux/man-

[7] https://man7.org/linux/man-pages/man3/execlp.3.html

[8] https://man7.org/linux/man-pages/man3/execvp.3.html

[9] https://man7.org/linux/man-pages/man3/system.3.html#NOTES

[10] https://github.com/spc476/mod_blog

[11] https://ronjeffries.com/xprog/articles/oksudoku/

[12] https://testanything.org/

[13] https://www.artima.com/weblogs/viewpost.jsp?thread=126923

[14] http://lua-users.org/lists/lua-l/2018

[15] /boston/2018/09/11.2

[16] https://github.com/spc476/mod_blog/blob/67e97a769ea793f53a7be27f4b7568c5e102e758/src/wbtum.c#L147

Discussions about this entry

Unit test this | Lobsters

gemini://thrig.me/blog/2022/12/22/unit-tests.gmi

Gemini Mention this post

Contact the author