virtualmin postfix spamassassin clamd


https://www.virtualmin.com/node/19714#comment-form

2016-12-07 18:21:48 gstlouis

The link below explains how you can train SA (SpamAssassin):
 

http://faisal.com/docs/salearn.html

gstlouis
2016-12-09 19:40:05

Why Train?

SpamAssassin comes with a large set of rules that match likely spam behavior. These rules are somewhat effective, but they are static: they don't adapt to the mail you actually receive. Training lets SA's Bayesian filter learn what your spam (and your legitimate mail) looks like over time, and adjust its scores accordingly.

How To Train

First, file all your mail into folders which only contain spam or non-spam ("ham"), but not both.

Then, run the commands listed below, based on the format your mail server uses to store mail. Most unix systems use "mbox" format folders. Some use "mbx" folders, while others use Maildir format. If you aren't sure what your system uses, check with an administrator.

You'll run sa-learn once per folder, telling it whether to treat that folder's contents as spam or ham. It then uses those messages as examples to compare future mail against.

Make sure you have filed your mail into the correct folders before running sa-learn, and make sure you run sa-learn with the right flags. If you leave mail from your mother in a folder you train as spam, SA will start to think your mom is a spammer. If SA misclassifies a message (a false negative or a false positive), refile it into the correct folder and retrain; SA will learn that message's contents and should handle similar mail correctly in the future. For example, move a false positive into your inbox and retrain the inbox as ham.

SA's training starts to work effectively once:

  • You've trained about 3000 messages each of spam and ham.
  • You have trained equal amounts of spam and ham, or more ham than spam. This last point is important: if you train only spam, and not ham, the filter will become biased towards spam.
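As a rough self-check, the two rules of thumb above can be written as a small shell function. This is a sketch, not part of SpamAssassin: the function name bayes_ready and the MIN_EACH threshold of 3000 come from this guide's advice rather than from any SpamAssassin setting, and "equal or more ham" is simplified to a strict ham >= spam comparison.

```shell
#!/bin/sh
# Hypothetical helper: checks learned counts against this guide's rules of thumb.
# MIN_EACH=3000 mirrors the guide's suggestion; it is not a SpamAssassin option.
MIN_EACH=3000

# Usage: bayes_ready NSPAM NHAM
# Prints a verdict; returns 0 only if both counts reach MIN_EACH and ham >= spam.
bayes_ready() {
  nspam=$1
  nham=$2
  if [ "$nspam" -lt "$MIN_EACH" ] || [ "$nham" -lt "$MIN_EACH" ]; then
    echo "not yet: need at least $MIN_EACH each (spam=$nspam, ham=$nham)"
    return 1
  fi
  if [ "$nham" -lt "$nspam" ]; then
    # More spam than ham trained: the filter may lean towards calling mail spam.
    echo "biased: train more ham (spam=$nspam, ham=$nham)"
    return 1
  fi
  echo "ok (spam=$nspam, ham=$nham)"
  return 0
}
```

For example, `bayes_ready 2500 3100` reports that more spam still needs to be trained. The actual counts come from `sa-learn --dump magic`, described below.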

I've found that it makes sense to:

  • train regularly until SA becomes smart enough that spam isn't annoying me
  • train again when spam slips through
  • train intermittently, even when spam doesn't slip through, just to keep SA up to date (of course, this necessitates keeping old spam around)

You should sweep your spam folder occasionally to make sure you aren't accidentally trapping legitimate mail. If you are, refile and retrain. Be especially careful with false positives: training is a feedback loop, so legitimate mail learned as spam will cause more legitimate mail to be caught as spam.
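One way to do the sweep is to list the sender and subject of everything in the spam folder and scan for mail that doesn't belong. The sketch below assumes a Maildir-format spam folder (one file per message under cur/ and new/); list_headers is a hypothetical helper name, and the folder path is whatever you pass in. It reads only the header block of each message, stopping at the first blank line, so text in message bodies is ignored.

```shell
#!/bin/sh
# Hypothetical helper: print "FILE: From: ... | Subject: ..." for each message
# in a Maildir folder (assumes the usual cur/ and new/ subdirectories).

# Usage: list_headers MAILDIR_FOLDER
list_headers() {
  folder=$1
  for msg in "$folder"/cur/* "$folder"/new/*; do
    [ -f "$msg" ] || continue   # skip the unexpanded glob when a directory is empty
    headers=$(awk '
      /^$/ { exit }             # a blank line ends the header block
      /^(From|Subject):/ { out = out (out == "" ? "" : " | ") $0 }
      END { print out }
    ' "$msg")
    printf '%s: %s\n' "$msg" "$headers"
  done
}
```

For example, `list_headers ~/Maildir/.INBOX.Spam`. Anything legitimate you spot should be moved out of the spam folder before you retrain.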

Training with mbox format

The general format is:

sa-learn --no-sync [--spam or --ham] --mbox [folder]

For example, assuming you've filed all spam into a folder named spam in the ~/mail directory:

sa-learn --no-sync --spam --mbox ~/mail/spam

And assuming you've filed ham into several folders (in this example, "friends" and "lists", also in the ~/mail directory):

sa-learn --no-sync --ham --mbox ~/mail/friends
sa-learn --no-sync --ham --mbox ~/mail/lists

You'll also want to clear all spam out of your inbox and then train the inbox itself as ham. Most systems using the mbox format store the inbox in a special location. On Obscure, the server this guide was written for, the location is /var/spool/mail/[userid]. For example, for me the command is:

sa-learn --no-sync --ham --mbox /var/spool/mail/faisal

Because each command above uses --no-sync, sa-learn records what it learns in a journal instead of updating its database after every folder. Once you've trained all the folders you're using, run this command to merge the journal into the database:

sa-learn --sync
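Put together, the mbox steps above amount to a short script: train every spam folder, train every ham folder (including the inbox), and sync once at the end. This is a sketch under a few assumptions: the folder lists are this guide's example paths, they are space-separated (so individual paths must not contain spaces), and the SA_LEARN variable is an addition of this example so the commands can be previewed without training anything.

```shell
#!/bin/sh
# Sketch of the mbox workflow: --no-sync for every folder, then a single --sync.
# SA_LEARN is a hypothetical override hook; it defaults to the real sa-learn.
SA_LEARN=${SA_LEARN:-sa-learn}

# Example folders from this guide; space-separated, so paths must not contain spaces.
SPAM_FOLDERS="$HOME/mail/spam"
HAM_FOLDERS="$HOME/mail/friends $HOME/mail/lists /var/spool/mail/$(id -un)"

train_mbox() {
  for folder in $SPAM_FOLDERS; do
    "$SA_LEARN" --no-sync --spam --mbox "$folder"
  done
  for folder in $HAM_FOLDERS; do
    "$SA_LEARN" --no-sync --ham --mbox "$folder"
  done
  # Merge everything learned above into the Bayes database in one step.
  "$SA_LEARN" --sync
}
```

With SA_LEARN=echo set before the script is sourced, calling train_mbox prints the sa-learn commands instead of running them, which is a cheap way to double-check which folders are trained as spam and which as ham.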

If you'd like to see what's currently in the database, do:

sa-learn --dump magic

In the output, the nspam and nham lines show the number of spam and ham messages SpamAssassin has learned from.
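If you want those two counts in a script, they can be pulled out of the --dump magic output with awk. This sketch assumes the layout sa-learn commonly prints, where each "non-token data" line ends with the item's name and carries its value in the third column; bayes_counts and SA_LEARN are names made up for this example, and you may need to adjust the patterns if your version's output differs.

```shell
#!/bin/sh
# Sketch: extract nspam and nham from `sa-learn --dump magic`.
# Assumed line layout (value in column 3, name in the last column):
#   0.000          0       1234          0  non-token data: nspam
# SA_LEARN is a hypothetical override hook; it defaults to the real sa-learn.
SA_LEARN=${SA_LEARN:-sa-learn}

# Usage: bayes_counts   -> prints "nspam=N nham=M" (0 if a line is missing)
bayes_counts() {
  "$SA_LEARN" --dump magic | awk '
    /non-token data:/ && $NF == "nspam" { nspam = $3 }
    /non-token data:/ && $NF == "nham"  { nham = $3 }
    END { printf "nspam=%d nham=%d\n", nspam, nham }
  '
}
```

Calling `bayes_counts` prints a single line of the form `nspam=N nham=M`.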

Training with mbx format

Training with mbx format works much the same as training with mbox format, except that you use --mbx instead of --mbox in every command:

sa-learn --no-sync [--spam or --ham] --mbx [folder]

With mbx format, the inbox is generally the INBOX folder in the user's home directory:

sa-learn --no-sync --ham --mbx ~/INBOX

Training with Maildir format

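Maildir format is a bit different: it stores each message in a separate file within one of three subdirectories ('cur', 'new', and 'tmp'). Instead of pointing sa-learn at a specific mbox or mbx file, you point it at the 'cur' and 'new' directories and it reads every file inside ('tmp' only holds messages that are still being delivered). The {cur,new} shorthand below is shell brace expansion, which bash and zsh support but a plain POSIX sh such as dash does not.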

sa-learn --no-sync [--spam or --ham] [folder/{cur,new}]

For example:

sa-learn --no-sync --spam ~/Maildir/.INBOX.Spam/{cur,new}

or

sa-learn --no-sync --ham ~/Maildir/.INBOX/{cur,new}
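Since the three formats differ only in how the folder is handed to sa-learn, the choice can be wrapped in one helper. This is a sketch: train_folder and its arguments are hypothetical, it only assembles the flags shown in this guide, and SA_LEARN is an override hook added for this example. For Maildir it passes folder/cur and folder/new explicitly, which is exactly what the {cur,new} brace expansion produces, so it also works in shells without brace expansion.

```shell
#!/bin/sh
# Hypothetical helper covering mbox, mbx and Maildir with the flags from this guide.
# SA_LEARN is an override hook; it defaults to the real sa-learn.
SA_LEARN=${SA_LEARN:-sa-learn}

# Usage: train_folder FORMAT CLASS FOLDER
#   FORMAT: mbox | mbx | maildir     CLASS: spam | ham
train_folder() {
  format=$1 class=$2 folder=$3
  case $class in
    spam|ham) ;;
    *) echo "class must be spam or ham, got: $class" >&2; return 2 ;;
  esac
  case $format in
    mbox|mbx) "$SA_LEARN" --no-sync "--$class" "--$format" "$folder" ;;
    # Maildir: point sa-learn at cur/ and new/ directly; tmp/ is skipped.
    maildir)  "$SA_LEARN" --no-sync "--$class" "$folder/cur" "$folder/new" ;;
    *) echo "unknown format: $format" >&2; return 2 ;;
  esac
}
```

For example, `train_folder maildir spam ~/Maildir/.INBOX.Spam`. As with mbox, run `sa-learn --sync` once after the last folder has been trained.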

gstlouis
2017-03-27 19:27:52