I wrote crashreport() in an attempt to find out why glibc was reporting a double free (or memory corruption) [1], so imagine my surprise when I found other crashes happening [2]. I did find the root causes for the crashes yesterday, but I have yet to figure out why the memory corruption happened.
First off, no points to Apache [3] for failing to report the unexpected termination of a child process. I can certainly understand that the Apache developers don't expect anyone to use CGI (Common Gateway Interface) anymore, and that if people do, they'll use a CGI program written in a scripting language that probably won't core dump. But still, they ship the CGI module [4], the program that module executes can be written in anything, and hiding the fact that a program crashed due to SIGSEGV or SIGABRT is, to me, inexcusable.
Had Apache logged the crash, I probably would have found the error a few years ago (seriously). The actual crash only happened after the output was generated and sent to the browser, so I never saw anything unusual. And because Apache never said anything about a crash, well … everything is okay, right?
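For illustration, here is a minimal sketch of how a parent process can tell that a child died from a signal rather than exiting normally. This is not Apache's code, and the names describe_status(), run_and_report() and child_aborts() are made up for this example; the only real API assumed is POSIX fork(), waitpid() and the WIFSIGNALED() family of macros.

```c
#include <assert.h>
#include <signal.h>     /* SIGABRT, SIGSEGV */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a wait status into a log line.
 * Returns the signal number if the child was killed by a signal,
 * otherwise 0. */
static int describe_status(int status, char *buf, size_t len)
{
  assert(buf != NULL && len > 0);

  if (WIFSIGNALED(status))
  {
    int sig = WTERMSIG(status);
    snprintf(buf, len, "child killed by signal %d", sig);
    return sig;
  }

  if (WIFEXITED(status))
    snprintf(buf, len, "child exited with status %d", WEXITSTATUS(status));
  else
    snprintf(buf, len, "child stopped or continued");
  return 0;
}

/* Hypothetical helper: run fn() in a child process and describe how
 * it ended.  Returns -1 if fork() or waitpid() failed. */
static int run_and_report(void (*fn)(void), char *buf, size_t len)
{
  int   status;
  pid_t pid = fork();

  if (pid < 0)
    return -1;

  if (pid == 0)
  {
    fn();       /* child: run the "CGI program" */
    _exit(0);   /* only reached if fn() returns normally */
  }

  if (waitpid(pid, &status, 0) < 0)
    return -1;

  return describe_status(status, buf, len);
}

/* Stand-in for a program that dies the way glibc kills one on a
 * detected double free: by calling abort(). */
static void child_aborts(void)
{
  abort();
}
```

Calling run_and_report(child_aborts, buf, sizeof buf) should leave a "child killed by signal" line in buf, which is the sort of thing I'd have liked to see in the error log.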
Second, the crash was in a seldom-used code path: specifically, when the addentry.html page was requested. I normally use email to create entries, not the web interface. It's not that I never use the web interface, but I can safely count on two hands the number of times I've used it over the past thirteen years.
So to say it doesn't get a lot of use is an understatement.
Now, are there features I don't use? Yes. And such code is currently commented out. That code was written at a time when I expected other people might use the codebase, but alas, only one other person ever used mod_blog (only to stop blogging due to personal reasons) and now, as far as I know, I'm the only one who uses this codebase. That doesn't bother me, but it does indicate that I should probably remove the code that I don't use.
But the web interface? I use it just enough to justify its existence in the codebase.
Third, the addition of the command line and environment variables to the output of crashreport() (and I solved the global variable issues I had) certainly helped with the diagnosis. It revealed a request that would reliably crash the program (the aforementioned addentry.html page), and with a reliable way to crash the program, it's easy to isolate the buggy code (if a bit tedious).
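As a rough idea of what dumping that context can look like, here is a minimal sketch. It is not the actual crashreport() code; put() and dump_context() are hypothetical names, and the "ARG:"/"ENV:" prefixes are invented for the example. It sticks to write() instead of stdio, since write() is async-signal-safe and a crash handler can't count on much else working.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a whole string to fd using write(). */
static void put(int fd, const char *s)
{
  size_t len = strlen(s);

  while (len > 0)
  {
    ssize_t n = write(fd, s, len);
    if (n <= 0)
      return;   /* give up quietly; we may already be crashing */
    s   += n;
    len -= (size_t)n;
  }
}

/* Hypothetical helper: dump the NULL-terminated argument and
 * environment vectors to fd, one entry per line.  Returns the number
 * of lines written, or -1 if either vector is NULL. */
static int dump_context(int fd, char *const argv[], char *const envp[])
{
  int lines = 0;

  if (argv == NULL || envp == NULL)
    return -1;

  for (size_t i = 0; argv[i] != NULL; i++, lines++)
  {
    put(fd, "ARG: ");
    put(fd, argv[i]);
    put(fd, "\n");
  }

  for (size_t i = 0; envp[i] != NULL; i++, lines++)
  {
    put(fd, "ENV: ");
    put(fd, envp[i]);
    put(fd, "\n");
  }

  return lines;
}
```

For a CGI program, the environment is where the request lives (REQUEST_METHOD, PATH_INFO and so on), so a dump like this is enough to replay the request that caused the crash.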
And to tell the truth, the bug has existed since May 26th, 2009 [5], when I made the following commit:
Basically, I rewrote the core blogging engine over the past twelve hours. I still have yet to support adding new entries via the engine, but until I get that fixed, I can add them manually.
only I didn't quite update all the code properly. And since the code path in question isn't executed except when called as a CGI program (I should note that mod_blog can be run from the command line as well), and Apache never logs CGI programs that crash, no wonder I never saw this bug.