I spent several days testing the greylist daemaon [1] on the development server, and could not for the life of me reproduce the crash. I cleaned up the code a bit, and again, I couldn't get the program to crash on the development server.
Moved the latest version to the production server, with the checkpoint feature enabled, and after a few hours (about 4½ hours this time) it froze.
Disable the checkpoint feature, and it runs fine on the production server.
I'm giving up on this bug hunt for now. The program saves its state when it stops running—the only way we'll lose the state is if the server it's running on suddenly loses power, and if that's the case, then we have more issues to worry about.