Written by Wim Stockman - on 06 Aug 2020 - Updated on 23 Feb 2021
All commands were run in bash on an Arch Linux system.
If you downloaded some mails in mbox format from Gmail (Google Takeout) or from a newsgroup,
you sometimes just want every mail as a single file.
Here is a nice one-liner to do this with Awk:
awk '/^From / {nr += 1; next} {print $0 >> sprintf("%06d.eml", nr)}' your_mbox_file
A Perl equivalent:
perl -pe 'open STDOUT, sprintf(">m%05d.mbx", ++$n) if /^From /' < your_mbox_file > before-first
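With a very large mbox the awk one-liner keeps every output file open, and some awk implementations may eventually run out of file handles. The following is a minimal sketch of the same idea that closes each file before opening the next. The function name split_mbox and the optional output directory are illustrative assumptions, not part of the original commands.

```shell
#!/usr/bin/env bash
# split_mbox FILE [OUTDIR]   (hypothetical helper name)
# Writes every message of FILE to OUTDIR/000001.eml, OUTDIR/000002.eml, ...
split_mbox() {
    local mbox=$1 outdir=${2:-.}
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^From / {                      # separator line starts a new message
            if (out != "") close(out)   # release the previous file handle
            out = sprintf("%s/%06d.eml", dir, ++nr)
            next                        # do not copy the separator itself
        }
        out != "" { print > out }       # ignore anything before the first "From "
    ' "$mbox"
}
```

Calling split_mbox your_mbox_file mails would put the messages into a folder named mails. Like the one-liner, it drops the "From " separator line from each message.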
When a coworker sends you a bunch of emails as attachments inside one email, you may want every mail separately.
Or you selected a lot of important mails and sent them to yourself in one mail so you could easily save them.
And now you want every mail as a single file for the new archiving system you are building.
Here are some steps to get you going.
Tool required: munpack. Source to install from: https://salsa.debian.org/debian/mpack
The command:
munpack -t yourmail.eml
This will extract the individual emails and name them part1, part2, part3, etc., without an extension.
If you want to rename them to mail1.eml, mail2.eml, and so on, you can run this command:
for f in part*; do mv "$f" "mail${f:4}.eml"; done
To combine both commands into a nice one-liner:
munpack -t yourmail.eml && for f in part*; do mv "$f" "mail${f:4}.eml"; done
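With more than nine parts, names like mail10.eml sort before mail2.eml. A small variation of the rename loop pads the number with zeros so the files list in order. This is only a sketch: the function name rename_parts and the four-digit width are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# rename_parts [DIR]   (hypothetical helper name)
# Renames part1, part2, ... in DIR to mail0001.eml, mail0002.eml, ...
rename_parts() {
    local dir=${1:-.} f n
    for f in "$dir"/part*; do
        [ -e "$f" ] || continue            # no match: the glob stays literal
        n=${f##*/part}                     # strip the path and the "part" prefix
        case $n in
            ''|*[!0-9]*) continue ;;       # skip names that are not part<number>
        esac
        # 10#$n forces base 10 so a number like 08 is not read as octal
        mv -- "$f" "$dir/$(printf 'mail%04d.eml' "$((10#$n))")"
    done
}
```

Run rename_parts in the folder where munpack -t wrote its output, or pass that folder as the argument.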
The best result is achieved with munpack from the mpack package.
I tried ripmime, but it fails too often.
You can install mpack from source from https://salsa.debian.org/debian/mpack or use your package manager.
The command is simple:
munpack yourmail
This will extract the attachments from your email. If you also want the text part, use the "-t" option.
munpack can only extract from one email at a time, but it does it really well.
So how do we get those hundreds of mails with attachments processed in one go?
Presuming all your mails have the suffix .eml, we can easily select them with the wildcard *.eml
So we throw in a bash for-loop and end up with this command:
for f in *.eml; do munpack "$f"; done
If munpack encounters duplicate file names, it will add a numbered suffix to them.
So you don't have to worry about that.
If your mail files do not share a common suffix, first copy them to a separate folder
and then use the wilder wildcard on it:
for f in *; do munpack "$f"; done
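The loops above drop all attachments into one folder. If you would rather keep the attachments of each mail apart, a possible variation is to unpack every mail into its own subfolder. This sketch relies on munpack reading the message from standard input when no file name is given. The function name unpack_each and the ".d" folder suffix are assumptions for illustration.

```shell
#!/usr/bin/env bash
# unpack_each [DIR]   (hypothetical helper name)
# Runs munpack on every *.eml in DIR and writes the attachments of
# mail.eml into DIR/mail.d/ so files from different mails never mix.
unpack_each() {
    local dir=${1:-.} f out
    for f in "$dir"/*.eml; do
        [ -f "$f" ] || continue          # nothing matched the wildcard
        out=${f%.eml}.d
        mkdir -p "$out" || return 1
        # The redirect is opened before the cd happens, so a relative
        # path in $f still points at the right file.
        ( cd "$out" && munpack ) < "$f" || echo "munpack failed on $f" >&2
    done
}
```

After unpack_each . the attachments of holiday.eml would end up in holiday.d, and so on.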
Say you have this nice mbox archive of a cartoonist who made a picture every week over the past decade.
You are interested in seeing all the pictures and not the comments he made.
So we combine the commands we learned in Chapter 1 and 2
into this one:
awk 'BEGIN {RS="\r\n"; cmd="munpack"} /^From / {if (mail != "") {print mail | cmd; close(cmd)} mail=""} {mail = mail $0 "\n"} END {if (mail != "") {print mail | cmd; close(cmd)}}' your_mbox_file
The RS="\r\n" is only needed if your mbox file was created in a DOS or Windows environment. I noticed it is needed with Google Takeout.
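If you are not sure which kind of line endings your mbox file has, you can check before deciding whether to set RS. A minimal sketch, with has_crlf being a made-up helper name:

```shell
#!/usr/bin/env bash
# has_crlf FILE   (hypothetical helper name)
# Succeeds when at least one line of FILE ends in a carriage return,
# which means the file uses DOS/Windows line endings.
has_crlf() {
    grep -q $'\r$' "$1"
}

# Example use:
#   if has_crlf your_mbox_file; then echo "set RS to \\r\\n"; fi
```

When the check fails, the file has plain Unix line endings and the BEGIN {RS="\r\n"} part should be left out.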
Sidenote: I love awk, that's why I made this work from inside awk where it calls munpack as a subprocess.
Have fun.