Filtering spam using Rspamd and OpenSMTPD on OpenBSD

Introduction

I recently used SpamAssassin to get rid of the spam I had started to receive, but it proved to be quite useless against some kinds of spam, so I decided to give rspamd a try and write about it.

rspamd can filter spam but can also sign outgoing messages with DKIM. Here I will only cover the anti-spam side.

rspamd project website

Setup

The rspamd setup for spam filtering was incredibly easy on OpenBSD (6.9 when I wrote this). We need to install the rspamd service, the connector for OpenSMTPD, and redis, which rspamd requires in order to work.

pkg_add opensmtpd-filter-rspamd rspamd redis
rcctl enable redis rspamd
rcctl start redis rspamd

Modify your /etc/mail/smtpd.conf file to add this new line:

filter rspamd proc-exec "filter-rspamd"

Then modify your "listen on ..." lines to add the option filter "rspamd" to each of them, like in this example:

listen on em0 pki perso.pw tls auth-optional   filter "rspamd"
listen on em0 pki perso.pw smtps auth-optional filter "rspamd"

Restart smtpd with "rcctl restart smtpd" and you should have rspamd working!

Using rspamd

Rspamd automatically checks multiple criteria to assign a score to each incoming email. Above a high threshold the email is rejected. Between a lower threshold and that rejection threshold, it may be tagged with an "X-Spam" header with the value "yes".
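
The scores at which rspamd rejects or tags a message are set in its actions configuration. As a rough sketch, assuming local overrides go in /etc/rspamd/local.d/actions.conf and using what I believe are the upstream default values (check your own installation before relying on these numbers):

```
# /etc/rspamd/local.d/actions.conf (path and values are assumptions, verify locally)
reject = 15;      # score at or above which the message is refused
add_header = 6;   # score at or above which the X-Spam header is added
greylist = 4;     # score at or above which the sender is asked to retry later
```

Lowering add_header would tag more messages as spam without rejecting them, which is the safer direction to experiment in.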

If you want tagged emails to be filed automatically into your Junk directory, either use a sieve filter on the server side or use a local filter in your email client. The sieve filter would look like this:


# the fileinto extension must be declared before it is used
require ["fileinto"];

if header :contains "X-Spam" "yes" {
        fileinto "Junk";
        stop;
}

Feeding rspamd

If you want better results, the filter needs to learn what is spam and what is not (named ham). You need to scan new emails regularly to increase the effectiveness of the filter. In my example, I have a single user with a Junk directory and an Archives directory within the maildir storage, and I use crontab to run learning on mails newer than 24 hours.

0  1 * * * find /home/solene/maildir/.Archives/cur/ -mtime -1 -type f -exec rspamc learn_ham {} +
10 1 * * * find /home/solene/maildir/.Junk/cur/     -mtime -1 -type f -exec rspamc learn_spam {} +
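
If you have more folders or users, the same find command can be wrapped in a small shell function. This is only a sketch: learn_recent and the RSPAMC variable are names I made up for illustration, and RSPAMC exists so you can do a dry run before feeding anything to the filter.

```shell
#!/bin/sh
# Hypothetical helper: feed every mail modified in the last 24 hours
# from one maildir folder to rspamd.
# Set RSPAMC=echo to print the command instead of running rspamc.
learn_recent() {
    dir=$1    # maildir folder, e.g. /home/solene/maildir/.Junk/cur/
    verb=$2   # learn_spam or learn_ham
    find "$dir" -mtime -1 -type f -exec "${RSPAMC:-rspamc}" "$verb" {} +
}
```

Calling learn_recent /home/solene/maildir/.Junk/cur/ learn_spam does the same work as the second cron line above, and prefixing the call with RSPAMC=echo only shows which files would be learned.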

Getting statistics

rspamd comes with very nice reporting tools. You can get a WebUI on port 11334, which listens on localhost by default, so you either need to configure rspamd to listen on other addresses or use an SSH tunnel.
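
For the SSH tunnel, one possible approach is a forwarding entry in the SSH client configuration on your workstation. The host alias mailstats and the hostname mail.example.org are placeholders I picked for the example:

```
# ~/.ssh/config on the workstation (alias and hostname are placeholders)
Host mailstats
    HostName mail.example.org
    # forward local port 11334 to the WebUI listening on the server's localhost
    LocalForward 11334 127.0.0.1:11334
```

With this in place, running "ssh -N mailstats" keeps the tunnel open, and the WebUI should then be reachable at http://localhost:11334/ from your browser.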

You can get the same statistics on the command line using the command "rspamc stat" which should have an output similar to this:

Results for command: stat (0.031 seconds)
Messages scanned: 615
Messages with action reject: 15, 2.43%
Messages with action soft reject: 0, 0.00%
Messages with action rewrite subject: 0, 0.00%
Messages with action add header: 9, 1.46%
Messages with action greylist: 6, 0.97%
Messages with action no action: 585, 95.12%
Messages treated as spam: 24, 3.90%
Messages treated as ham: 591, 96.09%
Messages learned: 4167
Connections count: 611
Control connections count: 5190
Pools allocated: 5824
Pools freed: 5801
Bytes allocated: 31.17MiB
Memory chunks allocated: 158
Shared chunks allocated: 16
Chunks freed: 0
Oversized chunks: 575
Fuzzy hashes in storage "rspamd.com": 2936336370
Fuzzy hashes stored: 2936336370
Statfile: BAYES_SPAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 344; users: 1; languages: 0
Statfile: BAYES_HAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 3822; users: 1; languages: 0
Total learns: 4166
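
If you only care about one figure from that output, a bit of awk can extract it. This is a minimal sketch that assumes the line format shown above stays the same. spam_share is a hypothetical name, and the function reads the "rspamc stat" output on its standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage of messages treated as spam,
# reading "rspamc stat" output on stdin.
spam_share() {
    # The line looks like "Messages treated as spam: 24, 3.90%":
    # split on ": " first, then split the value on ", " to keep the percentage.
    awk -F': ' '/^Messages treated as spam/ { split($2, f, ", "); print f[2] }'
}
```

It would be used as "rspamc stat | spam_share".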

Conclusion

rspamd is for me a huge improvement in terms of efficiency: when I tag an email as spam, the next similar-looking one goes straight into Junk once the learning cron job has run. It also uses less memory than SpamAssassin and reports nice statistics. My SpamAssassin setup was rejecting emails directly, so I didn't have a good understanding of its effectiveness, but over the weeks I received too many identical messages that were never filtered. For now, rspamd has proved to be better here.

I recommend looking at the configuration files. Their settings are all disabled by default, but they offer many comments with explanations, which makes them a nice introduction to the features of rspamd. I preferred to keep the defaults and see how it goes before tweaking more.