Computers excel at following instructions to the letter.
Programmers don't quite excel at giving instructions to the computer.
Case in point: the daemon I'm working on [1]. Through testing, I found that the automatic restarting [2] wasn't working in all cases. If the program ran in the foreground, it would restart properly upon a crash. If it started up as an actual daemon, though, the restart would fail. It took me a few hours to debug, primarily because I couldn't use gdb (the Unix debugger) on this problem, for a few reasons.
Painful as it is, the lack of a debugger can be worked around. And before I reveal the actual problem, here's the relevant code (sans error checking, as that only clutters things up):
```
#include <signal.h>
#include <stdlib.h>
#include <syslog.h>
#include <sys/types.h>
#include <unistd.h>

void daemon_init(void);
void crash_recovery(int sig);

char **global_argv;              /* globals assumed by this excerpt   */
int    gf_run_in_foreground = 0; /* nonzero to stay in the foreground */

int main(int argc,char *argv[])
{
  global_argv = argv; /* save argument list for later restarting */
  if (gf_run_in_foreground == 0)
    daemon_init();
  signal(SIGSEGV,crash_recovery);
  /* rest of program */
  return EXIT_SUCCESS;
}

void daemon_init(void)
{
  pid_t pid;

  pid = fork();
  if (pid > 0) /* parent exits, child process continues on */
    exit(EXIT_SUCCESS);
  chdir("/tmp"); /* safe place to execute from */
  setsid(); /* become a session leader */
  close(STDERR_FILENO); /* close these, we don't need them */
  close(STDOUT_FILENO);
  close(STDIN_FILENO);
}

void crash_recovery(int sig)
{
  extern char **environ;
  sigset_t sigset;

  syslog(LOG_ERR,"restarting program");
  /*---------------------------------
  ; unblock any blocked signals,
  ; including the one we're handling
  ;---------------------------------*/
  sigfillset(&sigset);
  sigprocmask(SIG_UNBLOCK,&sigset,NULL);
  /*---------------------------------
  ; restart ourselves. If the call
  ; to execve() fails, there's not
  ; much else to do but exit.
  ;---------------------------------*/
  execve(global_argv[0],global_argv,environ);
  _exit(EXIT_FAILURE);
}
```
Another bit of critical information: I would start the program thusly:
```
GenericUnixPrompt> ./obj/daemon
```
If you're good (say, the calibre of Mark [3]) you'll see the problem. If not, don't worry; it took me a few hours. Here's a hint: once I removed the call to chdir(), the code worked fine in daemon mode, and no, chdir() wasn't failing.
In fact, it didn't matter where I put the chdir() call; having it in there at all caused the re-exec to fail when running in daemon mode.
The problem?
Changing directories meant that the relative path I used to start the program (./obj/daemon) was no longer valid by the time crash_recovery() called execve(), and of all the places where I could have checked a return code, that wasn't one of them. The actual problem didn't dawn on me until I'd thought about it for a while after removing the call to chdir().
Sheesh.
Here was the program, doing exactly what I told it to do; I just didn't realize that what I was telling it to do wasn't what I thought I was telling it to do.
My brain hurts.
As a postscript to this, even if I had been able to start the program under gdb, follow it into the newly created process, and pass the segfault on to the signal handler, it wouldn't have revealed anything, because gdb runs the program using its full path, thus masking the real problem.
Lovely, huh? [4]
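The gdb masking effect suggests a cheap startup check. The following is a minimal sketch, not part of the original daemon: exec_path_depends_on_cwd() and warn_if_cwd_relative() are hypothetical helpers that flag an argv[0] which execve() would resolve against the current working directory. Started as ./obj/daemon it would log a warning; started under gdb, where argv[0] is the full path, it would stay silent, which is exactly why the bug hid there.

```c
#include <syslog.h>

/*-------------------------------------------------------------
; execve() never searches $PATH: any path not starting with '/'
; (including a bare "daemon") is resolved against the current
; working directory, so a later chdir() changes what it finds.
;-------------------------------------------------------------*/
int exec_path_depends_on_cwd(const char *path)
{
  return path[0] != '/';
}

/* Hypothetical startup check: warn before daemonizing. */
void warn_if_cwd_relative(const char *argv0)
{
  if (exec_path_depends_on_cwd(argv0))
    syslog(LOG_WARNING,
           "argv[0] \"%s\" is relative; a re-exec after chdir() may fail",
           argv0);
}
```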
[4] http://blogs.msdn.com/ishai/archive/2004/10/25/247471.aspx