Full Disclosure mailing list archives
Re: Spam with PGP
From: Bob Apthorpe <apthorpe+fd () cynistar net>
Date: Tue, 7 Oct 2003 15:18:28 -0500 (CDT)
Hi,

Before you start explaining what SpamAssassin does and how it does it, I suggest you visit http://www.spamassassin.org/, specifically the README at http://www.spamassassin.org/full/2.7x/dist/README

On Tue, 7 Oct 2003, Jonathan A. Zdziarski wrote:
[missing attribution] wrote: Of course, SpamAssassin does Bayesian filtering as well. Heuristic + Bayesian is better than either alone, IMHO.
Actually the way SA does it weakens filtering. SA's bayesian filtering is only a very small piece of SA, and unfortunately not much attention has been given to it. The filter's final calculation is only a small percentage of the actual final score.
Here are SA's Bayesian scores; the four columns of scores are:

1: no network tests (DNSBLs, Razor, DCC, Pyzor), no Bayes
2: network tests, no Bayes
3: no network tests, Bayes
4: network tests, Bayes

score BAYES_00 0 0 -4.901 -4.900
score BAYES_01 0 0 -0.600 -1.524
score BAYES_10 0 0 -0.734 -0.908
score BAYES_20 0 0 -0.127 -1.428
score BAYES_30 0 0 -0.349 -0.904
score BAYES_40 0 0 -0.001 -0.001
score BAYES_44 0 0 -0.001 -0.001
score BAYES_50 0 0 0.001 0.001
score BAYES_56 0 0 0.001 0.001
score BAYES_60 0 0 1.789 1.592
score BAYES_70 0 0 2.142 2.255
score BAYES_80 0 0 2.442 1.657
score BAYES_90 0 0 2.454 2.101
score BAYES_99 0 0 5.400 5.400

The lowest non-trivial positive Bayesian score (BAYES_60 w/network tests) is 1.592, providing ~32% of the (default) 5 points necessary for a message to be flagged as spam. This would appear to counter your claims that SA's Bayesian classifier provides only a small fraction of the total score.
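
The ~32% figure can be checked with a few lines of Python. This is only a sketch of the arithmetic: the scores are copied from column 4 of the table above, the threshold of 5 is the default mentioned in this message, and the names BAYES_SCORES_NET and share_of_threshold are made up for illustration.

```python
# Column 4 of the table above: network tests enabled, Bayes enabled.
BAYES_SCORES_NET = {
    "BAYES_00": -4.900, "BAYES_01": -1.524, "BAYES_10": -0.908,
    "BAYES_20": -1.428, "BAYES_30": -0.904, "BAYES_40": -0.001,
    "BAYES_44": -0.001, "BAYES_50": 0.001, "BAYES_56": 0.001,
    "BAYES_60": 1.592, "BAYES_70": 2.255, "BAYES_80": 1.657,
    "BAYES_90": 2.101, "BAYES_99": 5.400,
}

# Default number of points needed to flag a message, as cited in the message.
SPAM_THRESHOLD = 5.0


def share_of_threshold(score, threshold=SPAM_THRESHOLD):
    """Return the score as a percentage of the spam threshold."""
    return 100.0 * score / threshold


# Print the share contributed by each spam-leaning Bayes rule.
for rule, score in BAYES_SCORES_NET.items():
    if score >= 1.0:
        print(f"{rule}: {score:+.3f} points, "
              f"{share_of_threshold(score):.0f}% of threshold")
```

Under these numbers a BAYES_99 hit alone exceeds the default threshold, and even the weakest spam-leaning bucket supplies roughly a third of it.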
Because true Bayesian filtering performs the vast majority of the same tests that SA performs, SA's own ruleset easily waters down any Bayesian findings whenever there are opposing values between the two.
The Bayesian classifier does not perform the same rule-based heuristic tests. Depending on how vigilant the end-user was in training the Bayesian classifier, it's rare that the statistical scores and the heuristic scores are both large and of opposite signs.
For example, a Pine MUA... SA thinks a Pine MUA suggests an innocent message, but the majority of the emails with a Pine MUA that my wife receives are spam. In this case, the hard-coded MUA rule will unfortunately water down the score, even if Bayes thinks a Pine MUA is spam. Obviously the Pine MUA is just a small rule, but if you apply this to the other rules, you get the same results.
SA 2.5x had a number of negative-scoring tests that were easily forged (various MUA signatures, REFERENCES, IN_REP_TO, PGP signatures, etc.). These rules have been dropped from SA 2.60 or have had their scores greatly reduced to counter this known problem.
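
To make the disagreement concrete, here is a rough Python sketch of additive scoring, assuming a message is flagged when the sum of its rule scores reaches the threshold. The BAYES_99 score comes from the table in this message; the rule name HYPOTHETICAL_MUA_OK and both of its scores are invented for illustration and are not real SpamAssassin values.

```python
SPAM_THRESHOLD = 5.0  # default threshold cited in this message


def total_score(hits, scores):
    """Sum the scores of every rule that matched the message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, scores, threshold=SPAM_THRESHOLD):
    """Assumed decision rule: flag when the total reaches the threshold."""
    return total_score(hits, scores) >= threshold


# A forgeable negative rule with a large (hypothetical) score, as in SA 2.5x.
old_scores = {"BAYES_99": 5.400, "HYPOTHETICAL_MUA_OK": -3.0}

# The same rule with its (hypothetical) score greatly reduced, as in SA 2.60.
new_scores = {"BAYES_99": 5.400, "HYPOTHETICAL_MUA_OK": -0.1}

hits = ["BAYES_99", "HYPOTHETICAL_MUA_OK"]

print(is_spam(["BAYES_99"], old_scores))  # Bayes alone
print(is_spam(hits, old_scores))          # forged header, large negative score
print(is_spam(hits, new_scores))          # forged header, reduced negative score
```

With the large negative score, a forged header pulls a strongly spammy message back under the threshold; with the reduced score, it no longer can.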
What's worse is that the last time I looked (this may have changed), SA's Bayesian filter did not appear to have a mechanism for learning, but was just a static dictionary. If users got spam, there was no way for them to forward their spam into the system for processing. Again, this may have changed, and if it has, that's great.
SA has included sa-learn for manual training ever since the Bayesian classifier was incorporated into the code (v2.50). Additionally, SA contains thresholds above/below which messages will be automatically learned as spam/ham, so the system trains itself (albeit slowly) without user intervention.
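
The threshold-based self-training described above can be sketched as follows. This is a simplified illustration, not SpamAssassin's actual code: the function name and both threshold values are assumptions chosen for the example, and the real auto-learning logic applies further conditions not modeled here.

```python
def auto_learn_decision(score, ham_below=0.1, spam_above=12.0):
    """Decide whether a scored message should be fed back into training.

    ham_below and spam_above are illustrative threshold values, not
    necessarily the defaults of any SpamAssassin release.
    Returns "spam", "ham", or None when the score is too ambiguous.
    """
    if score >= spam_above:
        return "spam"   # confidently spam: learn its tokens as spam
    if score <= ham_below:
        return "ham"    # confidently legitimate: learn its tokens as ham
    return None         # in between: leave for manual training


for score in (-2.5, 3.0, 15.2):
    print(score, auto_learn_decision(score))
```

Messages that land between the two thresholds are the ones a user would train by hand, which corresponds to running sa-learn with its --spam or --ham option.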
The product of Bayesian filtering includes all the heuristic tests as well, so having both _hurts_ you, and is not something you benefit from.
No, it does not, on all counts. You need to review the difference between heuristic and statistical classifiers.
It is much better to focus on creating a strong probability-based filter IMHO...and I think the statistics agree with me.
Then perhaps you should join forces with the people already performing such statistical comparisons between SpamAssassin, CRM114, bogofilter, and the like. The SA development list is at http://lists.sourceforge.net/mailman/listinfo/spamassassin-devel

This problem (evading spam-filtering by including a bogus PGP sig) is a recognized and dead issue. The solution is to keep your security tools up-to-date. As SA filters more spam, spammers will find new ways around the filters, heuristic, statistical, or otherwise.

--
Bob Apthorpe