Email tracking

Since Friday, I've been trying to track down an email problem, and I still haven't been able to solve it. Part of it was trying to figure out what the problem was from the initial reporting of it:

Happening again with XXXXXXXXXXX XXXX—this time the email was from EVE [I'll be referring to the individual email addresses by name—they have no relation to the actual email addresses involved, but will make it easier to follow along. Trust me on this, it gets confusing] and it did not come through. Any suggestion?

It took about four exchanges and checking the account information before I had a handle on what exactly was happening. The actual events were something like this: DAVE sends a message to CAROL. The address CAROL then automatically forwards the email to ALICE (a local mailbox on the server) and BOB (somewhere else on the Internet). DAVE's messages get through. EVE (from the same domain as DAVE incidently) sends a message to CAROL. The message is delivered to ALICE, but not to BOB.

I have to figure out why.

And that, my friends, is easier said than done.

We have mail logs going back about a month. Five log files (including the current one) in total, and each one averages about 800M (Megabytes) in size (about 4,000,000 lines of logging information each). Log files. Which means a linear search through the file for any type of information needed. Yeah, it takes time, but I can do other things while I'm running a search. The major problem is that the information I need to check a single email through the system is spread out throughout the log. For instance, it took me about 10 minutes to fish the following information out of the log files for a single message (manual search, output formatted by hand):

>
```
**message-id:** 200603271643.k2RGhesn032068
**queue-id:** k2RGheP6012025
**from:** DAVE
**to:** CAROL
**stat:** queued
**at:** 11:43:40
**queue-id:** k2RGhesn032068
**from:** DAVE
**to:** CAROL
**stat:** sent
**at:** 11:43:40
**queue-id:** k2RGheP6012025
**to:** ALICE
**stat:** sent
**at:** 11:43:42
**queue-id:** k2RGhgR2032089
**from:** DAVE
**to:** ALICE
**to:** BOB
**stat:** sent
**at:** 11:43:45
```

This single message took a total of five seconds to be processed. In that five seconds, the information was spread across eight lines of the log file, among 30 such records made in that five second period. It took me four searches to retrieve the eight lines from the log file. You see, each email message (and this is true globally) has a unique message-id, but sadly, only two of the log records had the message-id. The message ended up with four different queue-ids (and that's my term for it) within the system (which is why it took multiple searches to obtain all the information).

And since we don't have a program to analyse the mail logs, this has to be done manually right now.

It also turns out that the mail from EVE isn't logged as being from EVE, but from MALLET, due to the software EVE's using to send email. DAVE is also being logged as coming from MALLET (same software, or email server), which is further confusing the issue (and since the message 200603271643.k2RGhesn032068 was delivered sucessfully, I'm assuming it's from DAVE and not EVE).

Sigh.

Even the thought of writing a program to analyze the logs is giving me a headache.

Update on Tuesday, March 28^th, 2006

Never did figure out where the supposed missing emails went.

Gemini Mention this post

Contact the author