Yup, August was definitely a bad month for coding [1].
Spent most of the day cleaning up the code for the greylist project [2]. The first major problem was incorrectly calling recvfrom(). So while I got the data properly, the IP (Internet Protocol) address of the remote side got munged (oh, I should mention that I'm writing a stand alone graylist server accessible over the network). The second major problem was a memory problem (not unexpected when writing in C) that would only show up after some 1,000 requests came in, but at least was very consistent (same place every time). It took a few hours to realize I confused the purpose of two different arrays.
Yeah, August—bad month [3].
On the good side though, the code is shaping up. The greylist concept works around three pieces of information—the sender's IP address, the sender email address and the recipient email address, stored as a tuple. I've been recording such tuples on my email server (testing the Postfix [4] interface) and have a testbed of 21,200 tuples to test (I've also found out I average about 1,800 emails per day).
To stress test the program, I've been pumping all 21,200 tuples through the code as fast as possible (most of the run time is spent just logging what comes through), and under the worst settings (setting what I call the “embargo timelimit” to just one second instead of the recommended one hour), only 93 tuples (not emails mind you, just IP, sender and recipient) made it through to the whitelist.
That's only 0.4%.
Not bad.
Of course, my server is just accepting all incoming emails, so some spam could be coming from what I'm calling “legitimate servers” (“legitimate servers” are those that actually requeue and deliver the email at a later time) so the final amount might be a bit larger, but so far I'm optimistic this will drastically cut the amount of spam I receive.