Notes on a program

This is less a polished entry than a set of notes on a project I've been given. If it seems somewhat random and hard to fathom, that's why.

Smirk is drowning in email. As such, he's looking for a filtering solution whereby he can run a job daily that scans his email (using IMAP (Internet Message Access Protocol)) and shuffles email to different folders. The criteria are something like “I've read it and it's older than seven days, move to this folder. If it's unread and older than three days, move to this other folder. If I've read it and replied to it, move it to yet another folder.”
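As a rough illustration of those three criteria, here is a minimal sketch in Python (not the PHP the project calls for). The folder names and the function name `target_folder` are placeholders, and checking "replied" before the age tests is an assumption about precedence that the criteria don't actually spell out.

```python
from datetime import datetime, timedelta

def target_folder(seen, answered, received, now):
    """Return the folder a message should move to, or None to leave it alone.

    Folder names are hypothetical; the criteria only say "this folder",
    "this other folder" and "yet another folder".
    """
    age = now - received
    # Assumption: a read-and-replied message wins over the age-based rules.
    if seen and answered:
        return "Replied"
    # Read and older than seven days.
    if seen and age > timedelta(days=7):
        return "ReadArchive"
    # Unread and older than three days.
    if not seen and age > timedelta(days=3):
        return "Archive"
    return None
```

A message that matches none of the criteria stays where it is, which is why the function can return `None`.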

procmail [1] won't really handle that, as it's meant more for initial delivery and filtering of email. He also rejected sieve [2] as it apparently doesn't handle date parsing that well (or something like that). So he asked me if I could write such a program, preferably using PHP since he knows that language (and since I equally hate Perl and PHP, it's six of one, half dozen the other, and I would prefer C, but that's me).

So, the design of the program. Given some input file describing the filtering to do on email:

```
account imap://alice:zahg34!@mail.example.net/
{
  mailbox INBOX
  {
    foreach message
    {
      if (header.subject =~ /[Vv][Ii1][Aa@][Gg][Rr][Aa]/)
        moveto Trash;
      if (status = REPLIED) moveto Replied;
      if (header.date ~ "3 days ago" && status = UNREAD)
        moveto Archive;
      if (header.date ~ "7 days ago" && status != UNREAD)
        moveto ReadArchive;
    }
  }
  mailbox Archive
  {
    if (messages > 5000)
      sendmail("Yo! There are too many messages in the archive!");
    if ((messages > 3000)
        || (message[1].header.date >~ "6 months ago"))
      sendmail("Yo! Check your archive!");
  }
}
```

Okay, maybe nothing quite so grandiose, but some file to explain the rule sets for moving messages from one box to another, run as a job periodically (a cron job).
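Whatever the syntax ends up being, phrases like "3 days ago" in the rules have to become a concrete cutoff time before they can be compared with a message's date. The Python sketch below is one possible way to do that. The function name `cutoff`, the set of accepted units, and the treatment of a month as 30 days are all assumptions for illustration, not part of any real rule format.

```python
import re
from datetime import datetime, timedelta

# Assumed units; "month" is approximated as 30 days for simplicity.
_UNIT = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
}

# Matches phrases such as "3 days ago" or "6 months ago".
_PHRASE = re.compile(r"^\s*(\d+)\s+(hour|day|week|month)s?\s+ago\s*$",
                     re.IGNORECASE)

def cutoff(phrase, now):
    """Turn a relative phrase into the absolute time it refers to."""
    match = _PHRASE.match(phrase)
    if match is None:
        raise ValueError(f"unsupported date phrase: {phrase!r}")
    count = int(match.group(1))
    unit = match.group(2).lower()
    return now - count * _UNIT[unit]
```

A message is then "older than 3 days" when its date is earlier than `cutoff("3 days ago", now)`.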

Obligatory PHP Documentation Links

Since Smirk wants this in PHP, these are the manual sections I expect to need: IMAP [3], PCRE [4], mailparse [5], and iconv [6].

We need to retrieve information via IMAP. We need to parse the email headers. We'll need regular expressions, as well as date processing utilities (“3 days ago,” “less than 5 hours,” etc.). We'll need to read and parse the rules file (using whatever syntax I come up with). I would also like to translate all the text to some intermediate character set so we can filter consistently [7], which means using iconv (and parsing MIME (Multipurpose Internet Mail Extensions) specific headers and MIME-encoded headers).
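To make the character-set step concrete, here is a small Python sketch of decoding a MIME-encoded header into Unicode text. It uses Python's standard `email.header` module in place of PHP's iconv and mailparse, so it illustrates the idea rather than the calls the PHP program would make. The function name `header_to_unicode` is made up.

```python
from email.header import decode_header, make_header

def header_to_unicode(raw):
    """Decode RFC 2047 encoded words (=?charset?Q?...?=) into a Unicode string.

    Plain ASCII headers pass through unchanged.
    """
    return str(make_header(decode_header(raw)))

# The result is a Unicode string; encode it as UTF-8 if bytes are needed.
subject = header_to_unicode("=?ISO-8859-1?Q?Caf=E9?=")
subject_utf8 = subject.encode("utf-8")
```

Once every header goes through a step like this, a rule's regular expression only ever has to deal with one representation of the text.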

So the main program flow for processing each message would look something like:

```
get headers for next message
convert to consistent character set (probably UTF-8)
for each rule to check
  check conditions of rule against message
  if all conditions apply, apply action
```
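The loop above could be sketched in Python roughly as follows, working on in-memory messages rather than a live IMAP mailbox. The `Message` and `Rule` types, the `apply_rules` function, and the choice to stop at the first matching rule are all assumptions for illustration. The rules mirror the INBOX section of the example rule file.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Message:
    subject: str      # already decoded to Unicode
    seen: bool
    answered: bool
    age_days: int

@dataclass
class Rule:
    conditions: List[Callable[[Message], bool]]
    folder: str       # the action here is always "move to this folder"

def apply_rules(message: Message, rules: List[Rule]) -> Optional[str]:
    """Return the folder of the first rule whose conditions all hold."""
    for rule in rules:
        if all(condition(message) for condition in rule.conditions):
            return rule.folder  # assumption: first match wins
    return None

SPAM = re.compile(r"[Vv][Ii1][Aa@][Gg][Rr][Aa]")

RULES = [
    Rule([lambda m: SPAM.search(m.subject) is not None], "Trash"),
    Rule([lambda m: m.answered], "Replied"),
    Rule([lambda m: m.age_days > 3, lambda m: not m.seen], "Archive"),
    Rule([lambda m: m.age_days > 7, lambda m: m.seen], "ReadArchive"),
]
```

Keeping each rule as a list of small predicates plus an action matches the "if all conditions apply, apply action" step directly, and leaves the rules-file parser free to build the same structure from text.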

The hardest part appears to be getting a version of PHP with all the required extensions installed. Next would be defining the input file and parsing that into some internal format for processing. The rest pretty much just falls into place.

Most of the time will be spent in building the required version of PHP, and in playing with the various modules to figure out how they work and what exactly one gets. I would also need to set up a play IMAP account to test the program against (there's no way I want to run this on my email account, or on Smirk's for that matter).

[1] http://www.procmail.org/

[2] http://www.nada.kth.se/datorer/e-post/sieve-at-nada.shtml

[3] http://us3.php.net/manual/en/print/ref.imap.php

[4] http://us3.php.net/manual/en/print/ref.pcre.php

[5] http://us3.php.net/manual/en/print/ref.mailparse.php

[6] http://us3.php.net/manual/en/print/ref.iconv.php

[7] /boston/2005/04/27.2
