    mysql> select username,preference,value from userpref WHERE
    (username='$GLOBAL' OR username='%' ORDER by username ASC

    | $GLOBAL | score USER_IN_WHITELIST | -10 |
    | $GLOBAL | score USER_IN_BLACKLIST | 10 |
    | | rewrite_header Subject | - |

Doing it this way guarantees proper sorting of prefs, so the last required_hits found would be that of the user if defined, or that of the domain if defined. If neither exists, the global required_hits applies. Obviously, you'd need to rewrite a couple of lines in any web administration package that allows users to modify their SQL prefs, so that it uses the proper global ($GLOBAL) and domain (%) references.

These days, the choice of spam filters comes down to Bogofilter and SpamAssassin. Other choices, like DSPAM, are no longer in development. Although a few other choices (e.g., SpamBayes) are available, when an email reader offers a plugin, it is almost always for either Bogofilter or SpamAssassin.

However, what is less often discussed is which filter is the best to use in which circumstances. Instead, most users simply nod solemnly when they read that both involve “Bayesian filtering.” Most of us – including many who use the phrase – have no idea what Bayesian filtering is, but it sounds scientific and reassures us that either choice is acceptable.

In fact, learning that Bogofilter and SpamAssassin are “Bayesian” is useless for choosing between them. To call them Bayesian means nothing more than that their structure is based on the 18th-century work of Thomas Bayes in statistics and probability. More specifically, both apply Bayes’ work by collecting words and assigning a probability that each word indicates spam. The more suspect words an email contains, the greater the chances it is spam. However, making an informed choice between spam filters requires considerably more detail.

Bogofilter has its roots in “A Plan for Spam,” a 2002 essay by English developer Paul Graham. After trying to develop filters based on the identifying characteristics of spam, Graham concluded that beyond a certain point, the more rules he added, the more false positives he obtained – that is, the more email messages were incorrectly identified as spam. Graham’s solution was to parse his samples of spam and non-spam into tokens, or individual words, and use Bayesian tools to assign each token a probability that it indicates spam, biasing the results slightly in favor of not being spam to minimize false positives. By examining the top 15 tokens in the header and body of each new email message, he calculated the probability that it was spam. If the probability was greater than 0.9, the message was considered spam.

According to Graham, the advantage of this statistical approach is that it refers to something real – the probability of being spam – and works with both neutral and spam-indicating words. However, he also recognized that the more personalized the filter was, the more accurate it would be. For this reason, he also included the possibility of using white lists to indicate non-spam, or “ham,” and black lists to indicate spam.

After reading Graham’s essay, Eric S. Raymond wrote the first version of Bogofilter. Today, Bogofilter is maintained by other developers, who have refined Graham’s calculations based on Gary Robinson’s suggestions. The modern refinements include recognizing MIME types, treating each hostname and IP address as a separate token (rather than dividing them up into separate words), and ignoring dates and Message-IDs as irrelevant. However, the basic approach remains that advocated by Graham.

The mathematically inclined can learn more about how Bogofilter assigns the probability of an email being spam by following the links and reading the man page for the filter. However, the most important point for the average user is that Bogofilter relies on statistical probability, supplemented by each user’s list of spam and ham. Advocates of this approach emphasize its simplicity, as well as its lower number of false positives once it is trained – that is, once the white and black lists are produced. By default, this data is kept in the .bogofilter folder in your home directory.

SpamAssassin takes a different approach from Bogofilter. SpamAssassin’s main approach is to identify the characteristics of spam and then run tests to locate them. Many tests, although not all, rely heavily on regular expressions to catch variations of words and phrases. You can view the Perl scripts used by SpamAssassin in /usr/share/spamassassin. More than 50 are listed in my current installation of Debian Stable. From their number alone, you can tell they are a varied lot, but they include tests for the common indicators of spam in headers, in the bodies of email, and in HTML code, as well as tests for recognizing offers for anti-viruses, drugs, and pornography. In the English version, some basic tests for French, German, and Italian are also included.
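To make the two approaches described above more concrete, here is a minimal Python sketch that puts them side by side. It is not the code of Bogofilter or SpamAssassin. The first half follows the scheme outlined for Graham's essay (per-token probabilities, ham counts doubled to bias against false positives, the 15 most decisive tokens, a 0.9 cutoff); the clamping to 0.01 and 0.99 and the 0.4 value for unknown tokens are taken from that essay as I understand it, and the minimum-occurrence cutoff is left out to keep the toy corpus small. The second half imitates rule-based scoring: the rule names, patterns, and scores are hypothetical, and the 5.0 threshold is assumed here as a typical required score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(spam_msgs, ham_msgs):
    """Estimate, for each token, the probability that it indicates spam."""
    spam_counts = Counter(t for m in spam_msgs for t in tokenize(m))
    ham_counts = Counter(t for m in ham_msgs for t in tokenize(m))
    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        # Doubling the ham count biases tokens slightly toward "not spam".
        good = min(1.0, 2 * ham_counts[tok] / len(ham_msgs))
        bad = min(1.0, spam_counts[tok] / len(spam_msgs))
        probs[tok] = max(0.01, min(0.99, bad / (good + bad)))
    return probs


def spam_probability(message, probs, top_n=15, unknown=0.4):
    """Combine the top_n most decisive token probabilities into one score."""
    scores = [probs.get(t, unknown) for t in set(tokenize(message))]
    # "Decisive" means farthest from the neutral value 0.5.
    scores.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = scores[:top_n]
    p_spam = math.prod(top)
    p_ham = math.prod(1.0 - p for p in top)
    return p_spam / (p_spam + p_ham)


# Hypothetical rules in the spirit of SpamAssassin: (name, pattern, score).
RULES = [
    ("DRUG_OFFER",
     re.compile(r"\b(?:cheap|discount)\s+(?:pills|meds)\b", re.I), 3.0),
    ("URGENT_SUBJECT",
     re.compile(r"^subject:.*\b(?:urgent|act now)\b", re.I | re.M), 2.5),
]


def rule_score(message, rules=RULES):
    """Add up the scores of every rule whose pattern matches the message."""
    return sum(score for _name, pattern, score in rules
               if pattern.search(message))


if __name__ == "__main__":
    spam = ["buy cheap pills now", "cheap pills offer now"]
    ham = ["meeting agenda for monday", "lunch on monday with the team"]
    probs = train(spam, ham)
    msg = "Subject: urgent offer\ncheap pills now"
    print("statistical verdict:", spam_probability(msg, probs) > 0.9)
    print("rule-based verdict:", rule_score(msg) >= 5.0)
```

The contrast to notice is where the knowledge lives: the statistical filter learns everything from the user's own spam and ham, while the rule-based filter depends on patterns someone wrote in advance.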