[SOLVED] Teach rspamd to learn ham/spam when user moves mail to/out of Junk

Rspamd does a good job of filtering real mail from junk, though it can sometimes be hit or miss.

It is possible to teach rspamd through its web interface by uploading the raw message and classifying it as ham or spam. This process takes time and effort and is not always practical.
It’s also possible to create a cron job that teaches rspamd by scanning the mail in the Junk and Inbox folders.

It would be nice if mail users could ‘teach’ rspamd what is spam and what is ham immediately, as they move mail into or out of their Junk folder.

Some web searching, along with a reference in FreedomBox issue #56 on Salsa, turned up a description of the steps to achieve this. I’m planning to give it a try, but would first like to ask on the forum whether anyone has previous experience or knowledge on the matter.

Thanks.

EDIT: In the end, the cron job approach worked for me.

I couldn’t get “automated Bayesian spam/ham training” working, as Dovecot required some additional plugins that I’m not familiar with installing. Rather than spend too long on it, I figured out a “shortcut” with a simple script and cron job that does what I needed.

I’m sharing it here for anyone who might be interested.

STEP 1: Create a script for learning ham/spam.
rspamd’s client, rspamc, has commands (learn_ham and learn_spam) for learning ham/spam from specified file locations. Under my home folder, I created a simple (bash) script that scans my inbox for ham and my Junk folder for spam.

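#!/bin/bash

# Learn every message in INBOX as ham and every message in Junk as spam.
# Replace my_user with the mailbox owner's user name.
rspamc learn_ham /var/spool/mail/my_user/mail/mailboxes/INBOX/dbox-Mails/u.*
rspamc learn_spam /var/spool/mail/my_user/mail/mailboxes/Junk/dbox-Mails/u.*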

Please replace my_user with your own user name, and don’t forget to make the file executable with chmod +x.

I named this script rspamd_learn_my_user.sh. I preferred to create a dedicated script for each user.
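If you’d rather not keep one script per user, the same two rspamc commands can be wrapped in a single script that takes the user name as an argument. This is only a minimal sketch, assuming the same dbox layout and spool location as above; the script name rspamd_learn.sh and the MAIL_ROOT variable are hypothetical. It uses find with -exec ... {} + instead of a shell glob, so very large folders don’t overflow the argument list, and empty or missing folders are simply skipped instead of handing rspamc a literal u.* path.

```shell
#!/bin/bash
# Hypothetical rspamd_learn.sh: one learning script for any user.
# Assumes the dbox layout used above: mailboxes/<Folder>/dbox-Mails/u.*
# Usage (as root): /bin/bash rspamd_learn.sh my_user
set -u

MAIL_ROOT="/var/spool/mail"   # assumption: same spool location as above

# learn_folder ACTION DIR: pass every u.* message file in DIR to rspamc.
# "-exec ... {} +" batches file names into as few rspamc calls as possible
# and does not run rspamc at all when no file matches.
learn_folder() {
  local action="$1" dir="$2"
  if [[ ! -d "$dir" ]]; then
    echo "skipping missing folder: $dir" >&2
    return 0
  fi
  find "$dir" -maxdepth 1 -type f -name 'u.*' -exec rspamc "$action" {} +
}

# learn_user USER: INBOX is learned as ham, Junk as spam.
learn_user() {
  local base="${MAIL_ROOT}/$1/mail/mailboxes"
  learn_folder learn_ham  "${base}/INBOX/dbox-Mails"
  learn_folder learn_spam "${base}/Junk/dbox-Mails"
}

# Only do work when a user name is given (e.g. from cron).
if [[ $# -gt 0 ]]; then
  learn_user "$1"
fi
```

From root’s crontab it could then be called once per user, for example `0 3 * * * /bin/bash /root/rspamd_learn.sh my_user`.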

STEP 2: Create a root cron job (sudo crontab -e) to run this script once a day.
You can adjust the schedule to run it as often as you like. The script needs to run as root; otherwise it won’t be able to use the rspamc command and access the mail folders.

# learn spam and ham (run with bash to match the script's #!/bin/bash line)
0 3 * * * /bin/bash /home/my_user/rspamd_learn_my_user.sh

This entry runs the script once a day, at 3 a.m.

And that’s basically it. It’s more of a hack and may not be best practice, but it gets the job done.

A better way to do this, with real-time learning, would be to follow the steps shared in my previous post (using Sieve). So, although I’m closing this thread for now, if anyone gets “automated Bayesian learning” working, feel free to comment below.
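For the real-time route, the usual Dovecot setup (the imap_sieve plugin plus the vnd.dovecot.pipe extension from sieve_extprograms) has a Sieve script pipe each message that is moved into or out of Junk to an external program, which hands it to rspamc on standard input. The helper below is a minimal sketch of that last step only, assuming rspamc reads the message from stdin when no file is given. The script name, the spam/ham argument, and how it is wired into Dovecot are assumptions; the Sieve and Dovecot configuration itself is not shown.

```shell
#!/bin/bash
# Hypothetical helper for the Sieve approach: Dovecot pipes the moved
# message to this script on stdin, with "spam" or "ham" as the argument.
set -u

# learn_from_stdin CLASS: forward the message on stdin to rspamc.
learn_from_stdin() {
  case "${1:-}" in
    spam) rspamc learn_spam ;;   # message was moved into Junk
    ham)  rspamc learn_ham ;;    # message was moved out of Junk
    *)    echo "usage: $0 spam|ham" >&2; return 64 ;;
  esac
}

# Only act when called with an argument (as the Sieve pipe action would).
if [[ $# -gt 0 ]]; then
  learn_from_stdin "$1"
fi
```

Because Dovecot runs the program as the mail user rather than root, rspamc may additionally need controller connection options (for example -h for the controller address and -P for its password), depending on how rspamd is set up.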
