💾 Archived View for thrig.me › blog › 2023 › 04 › 07 › mbox-maildir-other.gmi captured on 2023-09-28 at 16:22:57. Gemini links have been rewritten to link to archived content
Mailbox is the traditional storage format for emails on unix; a Mail Transfer Agent (MTA) such as Sendmail would chat up a Mail Delivery Agent (MDA) such as mail.local or procmail, and eventually, if everything went well, the message would be appended to a file such as /var/mail/spongebob. That's the conventional BSD directory.
A major disadvantage is the problem of locking--how does the MDA append a message while another program, perhaps the user's mail client, is editing the file at the same time? Locking! This assumes both sides use the same locking scheme, and it may become terribly complicated should the mailbox files be located on an NFS server.
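A minimal sketch of an MDA-side append in Python, assuming the mail client also takes flock(2) locks on the same file. That assumption is the crux: other programs may use fcntl(2) locks or dotlock files such as /var/mail/spongebob.lock instead, and flock(2) behavior over NFS varies by system. The function name append_to_mbox and the example addresses are hypothetical.

```python
import fcntl
import time


def append_to_mbox(path: str, sender: str, message: str) -> None:
    """Append one message to an mbox file while holding an exclusive flock(2)."""
    # Separator line with a ctime(3)-style date (UTC here).
    separator = f"From {sender} {time.asctime(time.gmtime())}\n"
    lines = []
    for line in message.splitlines(keepends=True):
        # Minimal quoting so no body line can pass for a separator.
        lines.append(">" + line if line.startswith("From ") else line)
    body = "".join(lines)
    if not body.endswith("\n"):
        body += "\n"
    # Mode "a" opens with O_APPEND, so writes land at the current end of file.
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another flock() holder works
        try:
            # A blank line ends each message, before the next "From ".
            fh.write(separator + body + "\n")
            fh.flush()  # hand the data to the kernel before unlocking
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

The lock is held only across one write, which keeps the window short; a mail client that rewrites the whole file (say, to expunge deleted messages) must hold its lock for the entire rewrite, which is where the contention comes from.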
Another problem is that "From " is used to delimit messages, which means that no line within a message can begin with "From ". Such lines get quoted, so sometimes one might see ">From " and wonder where that > came from. It most likely came from a mailbox file.
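One common fix for the quoting is the mboxrd convention: on the way in, add one > to any line consisting of zero or more > followed by "From "; on the way out, remove one. Unlike plain >From quoting (mboxo), this round-trips, so a message that genuinely contained ">From " comes back intact. A minimal sketch; the function names are hypothetical.

```python
import re

# Lines that are "From " preceded by zero or more ">" characters.
_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
# The same lines after quoting: at least one leading ">" to strip.
_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxrd_quote(body: str) -> str:
    """Add one ">" to every (possibly already quoted) "From " line."""
    return _QUOTE.sub(r">\1", body)


def mboxrd_unquote(body: str) -> str:
    """Remove one ">" from every quoted "From " line, restoring the original."""
    return _UNQUOTE.sub(r"\1", body)
```

Note that "Fromage" is left alone: only "From " with the trailing space counts as a separator.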
Mailbox is probably okay if you do not deal with very many messages, if the locking between the mail client and the MDA is known to be solved, and if you know not to blindly copy out of a mailbox file without correcting for leading ">From". Or who cares what gets into Project Gutenberg in these glorious post-editor days?
maildir came about some years later; it uses the atomic rename(2) call to move a new message from the tmp directory over to the new directory, so there is no locking problem with a mail client, and no need to modify lines beginning with "From ", as each message is stored in an individual file.
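A minimal sketch of maildir delivery in Python. The filename scheme is a simplified take on the traditional time.pid.host convention, and the function name maildir_deliver is hypothetical; a real MDA would also handle errors and clean up stale files left in tmp.

```python
import itertools
import os
import socket
import time

# Per-process counter so two deliveries in the same nanosecond still differ.
_deliveries = itertools.count()


def maildir_deliver(maildir: str, message: bytes) -> str:
    """Write message into maildir/tmp, then rename(2) it into maildir/new."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Simplified time.pid.host style name; a full implementation would also
    # escape "/" and ":" should they appear in the hostname.
    unique = f"{time.time_ns()}.P{os.getpid()}Q{next(_deliveries)}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # O_EXCL refuses to clobber an existing file of the same name.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(message)
        fh.flush()
        os.fsync(fh.fileno())  # make the data durable before it becomes visible
    # tmp and new share a filesystem, so the rename is atomic: a client
    # scanning new/ sees either no file or the complete message.
    os.rename(tmp_path, new_path)
    return new_path
```

The message bytes are written as-is; a body line beginning with "From " needs no quoting because nothing else shares the file.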
There are disadvantages; in particular, "folders" with large numbers of messages may run into various filesystem limits: directory operations can become terribly slow, or commands may fail because a shell glob expands to too many files. A filesystem may also run out of inodes, after which no new files (and thus no new messages) can be created, even if free space remains. Check both `df` and `df -i`. The severity of this problem depends on the filesystem, how many files are in a directory, whether the user processes files with shell globs, and, if so, the maximum length of the arguments to an execv(3) call (ARG_MAX).
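The checks mentioned above can also be made from a script. A minimal Python sketch, assuming a POSIX system where os.statvfs and os.sysconf("SC_ARG_MAX") are available (they are on Linux and the BSDs); the helper names are hypothetical. Iterating with os.scandir sidesteps the glob problem entirely, since no argument list is ever built.

```python
import os


def inode_usage(path: str) -> tuple[int, int]:
    """Return (free, total) inodes for the filesystem holding path, as df -i reports."""
    st = os.statvfs(path)
    # Filesystems that allocate inodes dynamically (btrfs, for one) may report 0 here.
    return st.f_ffree, st.f_files


def arg_max() -> int:
    """Upper bound on the combined size of argv and environ for execv(3)."""
    return os.sysconf("SC_ARG_MAX")


def count_files(directory: str) -> int:
    """Count regular files by streaming directory entries; no glob, no argv."""
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                count += 1
    return count
```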
This format is generally good, unless you hoard email messages and therefore have "folders" with tens or hundreds of thousands of messages in them, or have users who do that. An expert will probably know to write software that rotates messages into weekly or monthly archives to limit the total message count per "folder". Or, they might save archived messages into a database.
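A minimal sketch of such rotation for a maildir, in Python. It assumes the archive lives on the same filesystem (so os.rename works) and uses each file's mtime, which is usually close to the delivery time, as a stand-in for parsing the Date: header; the names rotate_by_month and keep_days are hypothetical. A weekly variant could key on the ISO week instead of the month.

```python
import os
import time


def rotate_by_month(maildir: str, archive_root: str, keep_days: int = 30) -> int:
    """Move messages older than keep_days from maildir/cur into monthly maildirs.

    Each message lands in archive_root/YYYY-MM/cur, keyed on its mtime.
    Returns the number of messages moved.
    """
    cutoff = time.time() - keep_days * 86400
    cur = os.path.join(maildir, "cur")
    with os.scandir(cur) as it:
        entries = [e for e in it if e.is_file(follow_symlinks=False)]
    moved = 0
    for entry in entries:
        mtime = entry.stat().st_mtime
        if mtime >= cutoff:
            continue  # recent enough to stay in the main folder
        month = time.strftime("%Y-%m", time.localtime(mtime))
        dest = os.path.join(archive_root, month)
        for sub in ("tmp", "new", "cur"):
            os.makedirs(os.path.join(dest, sub), exist_ok=True)
        # os.rename only works within one filesystem, which is assumed here.
        os.rename(entry.path, os.path.join(dest, "cur", entry.name))
        moved += 1
    return moved
```

Run from cron, something like this keeps the main folder down to roughly a month of mail while each archive folder stays bounded by one month's volume.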
I generally follow "mailbox zero" and delete the heck out of messages; others are quite happy to have 70,000 messages in their inbox, 50,000 of them unread. That user complained that the IMAP server was slow.
The "MH Message Handling System" exists. Haven't ever much looked at it. pine, and then mutt, was good enough for me. Outlook was terrible.
Otherwise one will probably use a database, in which case one need not worry about the various file and filesystem issues described above.
Disadvantages include lack of client support; a custom MDA and mail client--the MUA or Mail User Agent, if you were perchance running low on acronyms--would need to know about the custom database, or maybe the database could be hidden away behind LMTP, IMAP, or other interfaces.
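To make the client-support point concrete, here is a minimal sketch of a custom store using Python's sqlite3 module. The schema and the functions open_store, store_message, and list_folder are hypothetical, not any real mail server's format, which is exactly the problem: the MDA-side insert and the client-side listing must both agree on this schema, unless something like an IMAP server sits in front of it.

```python
import sqlite3
from email import message_from_bytes
from email.policy import default


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical single-table message store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id      INTEGER PRIMARY KEY,
               folder  TEXT NOT NULL,
               sender  TEXT,
               subject TEXT,
               raw     BLOB NOT NULL,
               seen    INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS by_folder ON messages (folder)")
    return conn


def _header(msg, name):
    """Return a header as a plain str, or None if it is absent."""
    value = msg[name]
    return None if value is None else str(value)


def store_message(conn: sqlite3.Connection, folder: str, raw: bytes) -> int:
    """The MDA side: parse a few headers and insert the raw message."""
    msg = message_from_bytes(raw, policy=default)
    with conn:  # commit on success, roll back on exception
        cur = conn.execute(
            "INSERT INTO messages (folder, sender, subject, raw) VALUES (?, ?, ?, ?)",
            (folder, _header(msg, "From"), _header(msg, "Subject"), raw),
        )
    return cur.lastrowid


def list_folder(conn: sqlite3.Connection, folder: str) -> list[tuple[int, str]]:
    """The client side: list (id, subject) for one folder, oldest first."""
    rows = conn.execute(
        "SELECT id, subject FROM messages WHERE folder = ? ORDER BY id", (folder,)
    )
    return rows.fetchall()
```

Backing this up means copying one database file consistently (for example with SQLite's own backup facilities) rather than letting rsync walk a directory tree mid-write.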
Backups would probably be more complicated than "let rsync under rsnapshot or similar copy the files off elsewhere", which more or less works for mailbox and maildir. That is probably not an issue if you've got the database set up and managed aright, though having worked in IT, I can say that backups are often broken or untested.
Obviously there is complexity here: mailbox or maildir may suffice. What to use probably boils down to how many emails you deal with, what you do with them, how often you need to search them, archive requirements, how many mail clients you use, what those clients support, how okay you are with complexity, etc.
For me, the network latency of IMAP makes it unusable. I've never owned nor ever much used a smartphone, so I invariably have a single device to read mail on, and the number of emails I deal with is limited, except for the time years ago when a coworker had Nagios set up to generate ~10,000 messages per year. I generally delete all emails pretty quickly. Probably too quickly. noatime is set on the filesystems, which means that mutt may not detect new mail in a mailbox file. Therefore I use maildir.
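The mbox problem comes from how new mail is traditionally detected: a file modified after it was last read (mtime newer than atime) is assumed to hold new mail. With noatime, reading no longer updates atime, so that comparison stops reflecting whether the mail has been read. A minimal Python sketch of the heuristic next to the maildir equivalent, which only asks whether new/ has entries; both function names are hypothetical, and mutt's actual code handles more cases than this.

```python
import os


def mbox_has_new_mail(path: str) -> bool:
    """Classic mbox check: new mail if modified since it was last read."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    # Relies on atime being updated by reads, which noatime disables.
    return st.st_size > 0 and st.st_mtime > st.st_atime


def maildir_has_new_mail(maildir: str) -> bool:
    """maildir check: anything in new/ is new mail; timestamps do not matter."""
    with os.scandir(os.path.join(maildir, "new")) as entries:
        return any(not e.name.startswith(".") for e in entries)
```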