💾 Archived View for yasendfile.org › TipTricks › email_extraction.gmi captured on 2023-01-29 at 15:46:58. Gemini links have been rewritten to link to archived content

Email and Batch Attachments Extraction - Tips and Tricks

Written by Wim Stockman - on 06 Aug 2020 - Updated on 23 Feb 2021

Preface

All commands were run in bash on an Arch Linux system.

1. Extracting E-mails

1.1 Extracting e-mails from an mbox file

Say you downloaded some mail in mbox format, for example from Gmail via Google Takeout or from a newsgroup.

Sometimes you just want every mail as a separate file.

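Here is a one-liner that does this with awk. It starts a new numbered file at every "From " line (the separator line itself is left out) and closes the previous file, so awk does not run out of open files on a large mailbox:

awk '/^From / { if (out) close(out); out = sprintf("%06d.eml", ++nr); next } out { print > out }' your_mbox_file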

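A Perl alternative, which keeps the "From " line so that each output file is itself a one-message mbox (anything before the first "From " line ends up in before-first):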

perl -pe 'open STDOUT, sprintf(">m%05d.mbx", ++$n) if /^From /' < your_mbox_file > before-first
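To try the awk approach on a throwaway mailbox before pointing it at a real archive, here is a minimal sketch that wraps it in a bash function and runs it on a made-up two-message mbox. The function name split_mbox, the output-directory argument and the sample addresses are my own assumptions for illustration; it assumes a classic mbox where every message starts with a "From " line.

```shell
#!/usr/bin/env bash
# split_mbox (hypothetical helper): write every message of an mbox file to
# 000001.eml, 000002.eml, ... inside an output directory.
# The "From " separator line is not copied into the .eml files.
split_mbox() {
    local mbox=$1 outdir=$2
    mkdir -p -- "$outdir" || return
    # The input redirection is resolved in the caller's directory,
    # before the subshell changes into the output directory.
    (
        cd -- "$outdir" || exit
        awk '/^From / { if (out) close(out); out = sprintf("%06d.eml", ++nr); next }
             out      { print > out }'
    ) < "$mbox"
}

# Usage example with a made-up two-message mbox.
demo=$(mktemp -d)
printf '%s\n' \
    'From alice@example.com Mon Jan  2 10:00:00 2023' \
    'Subject: first' \
    '' \
    'Hello' \
    'From bob@example.com Mon Jan  2 11:00:00 2023' \
    'Subject: second' \
    '' \
    'World' > "$demo/sample.mbox"
split_mbox "$demo/sample.mbox" "$demo/out"
ls "$demo/out"    # lists 000001.eml and 000002.eml
```

Closing each file before opening the next matters on big archives: some awk implementations abort with "too many open files" otherwise.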

1.2 Extracting e-mails from encapsulated e-mails

Say a coworker sends you a bunch of e-mails as attachments inside one e-mail, and you want each of them separately.

Or you selected a lot of important mails and sent them to yourself in one mail so you could easily save them.

Now you want every mail as a single file for the new archiving system you are building.

Here are some steps to get you going.

Tool required: munpack, part of the mpack package (source: https://salsa.debian.org/debian/mpack)

The command:

munpack -t yourmail.eml

This extracts the embedded e-mails and names them part1, part2, part3, etc., without an extension.

To rename them to mail1.eml, mail2.eml, etc., run this command:

for f in part*; do mv "$f" "mail${f:4}.eml"; done

Both commands combined into a single one-liner:

munpack -t yourmail.eml && for f in part*; do mv "$f" "mail${f:4}.eml"; done
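munpack numbers the parts without leading zeros, so once you have ten or more messages a plain ls lists mail10.eml before mail2.eml. If that bothers you, a variant of the rename loop can pad the number. This is a sketch under my own assumptions: the function name pad_parts and the three-digit width are made up, and it relies on munpack's part1, part2, ... naming.

```shell
#!/usr/bin/env bash
# pad_parts (hypothetical helper): rename munpack's part1, part2, ...
# in the current directory to mail001.eml, mail002.eml, ...
pad_parts() {
    local f n
    for f in part*; do
        n=${f#part}                          # strip the "part" prefix
        [[ $n =~ ^[0-9]+$ ]] || continue     # skip anything that is not partN
        # 10#$n forces base 10 so a number like 08 is not read as octal
        mv -- "$f" "$(printf 'mail%03d.eml' "$((10#$n))")"
    done
}

# Usage: munpack -t yourmail.eml && pad_parts
```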

2. Extracting Attachments

2.1 Extract attachments of a single mail

The best result is achieved with munpack from the mpack package.

I also tried ripmime, but it fails too often.

You can install mpack from source (https://salsa.debian.org/debian/mpack) or with your package manager.

The command is simple:

munpack yourmail

This extracts the attachments from your e-mail into the current directory. If you also want the text parts, use the "-t" option.

2.2 Extract attachments of multiple emails

munpack can only extract from one email at a time but it does it really well.

So how do we process hundreds of mails with attachments in one go?

Assuming all your mails have the .eml suffix, we can easily select them with the wildcard *.eml.

Add a bash for loop and we end up with this command:

for f in *.eml; do munpack "$f"; done

If munpack encounters duplicate file names, it adds a numbered suffix.

So you don't have to worry about files overwriting each other.

If your mail files don't share a common suffix, first copy them to a separate folder

and then point the broader wildcard * at all of them:

for f in *; do munpack "$f"; done
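The loops above drop every attachment into one folder. If your mails are spread over subfolders, or you want to know which attachment came from which mail, you could give each mail its own output folder instead. A sketch, assuming bash and the .eml suffix; the function name unpack_tree and the <mail>.d folder naming are my own inventions. It feeds each mail to munpack on standard input, the same way the awk pipeline in section 2.3 does.

```shell
#!/usr/bin/env bash
# unpack_tree (hypothetical helper): run munpack on every *.eml file below a
# directory, giving each mail its own output folder, e.g.
# inbox/foo.eml -> attachments in inbox/foo.eml.d/
unpack_tree() {
    local root=${1:-.} f dir
    # Prune the *.eml.d output folders so extracted files are never re-processed.
    find "$root" -type d -name '*.eml.d' -prune -o \
         -type f -name '*.eml' -print0 |
    while IFS= read -r -d '' f; do
        dir=$f.d
        mkdir -p -- "$dir" || continue
        # munpack writes into its current directory, so run it from the
        # output folder and feed it the mail on standard input.
        ( cd -- "$dir" && munpack ) < "$f"
    done
}

# Usage: unpack_tree ~/mail-archive
```

Null-separated find output keeps file names with spaces intact, which a plain for loop over $(find ...) would not.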

2.3 Extracting all attachments from an mbox file

Say you have a nice mbox archive of a cartoonist who made a picture every week over the past decade.

You are interested in seeing all the pictures, not the comments he made.

This combines the techniques from chapters 1 and 2

into a single command:

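awk '{ sub(/\r$/, "") } /^From / { if (mail != "") { printf "%s", mail | "munpack"; close("munpack") } mail = ""; next } { mail = mail $0 "\n" } END { if (mail != "") { printf "%s", mail | "munpack"; close("munpack") } }' your_mbox_file

Every "From " line starts a new message, so at that point the previous message is piped to munpack, which writes its attachments to the current directory; the END block does the same for the last message.

The sub(/\r$/, "") strips the carriage returns left by DOS or Windows line endings. It is only needed if your mbox file was created in such an environment, but it is harmless otherwise. I noticed Google Takeout files need it.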

Side note: I love awk, which is why I made this work from inside awk, calling munpack as a subprocess.
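To keep the extracted pictures out of your working directory, the same idea can run inside a dedicated output folder. A minimal sketch, assuming bash and munpack on your PATH; the function name mbox_attachments and its arguments are hypothetical. It strips trailing carriage returns, skips the "From " separator lines and also hands the last message to munpack.

```shell
#!/usr/bin/env bash
# mbox_attachments (hypothetical helper): extract the attachments of every
# message in an mbox file into an output directory, one munpack call per mail.
mbox_attachments() {
    local mbox=$1 outdir=$2
    mkdir -p -- "$outdir" || return
    # stdin is opened in the caller's directory, then the subshell moves
    # into the output folder so munpack writes its files there.
    (
        cd -- "$outdir" || exit
        awk '
            # Hand the collected message to munpack on its stdin.
            function flush() {
                if (mail != "") { printf "%s", mail | "munpack"; close("munpack") }
                mail = ""
            }
            { sub(/\r$/, "") }              # tolerate DOS/Windows line endings
            /^From / { flush(); next }      # a new message starts here
            { mail = mail $0 "\n" }
            END { flush() }                 # the last message has no From after it
        '
    ) < "$mbox"
}

# Usage: mbox_attachments cartoons.mbox pictures
```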

Have fun.