Full Disclosure mailing list archives
Re: Spam with PGP
From: "Jonathan A. Zdziarski" <jonathan () nuclearelephant com>
Date: Tue, 07 Oct 2003 13:08:09 -0400
Sorry, I didn't mean to sound like a troll. I'll follow up with some information... guess I shouldn't have gone out to lunch after sending this email =)

First, check out http://www.paulgraham.com. He goes into great detail to explain how probability-based filters work. He explains it in the context of 'Bayesian' filtering, but the same reasoning applies to Chi-Square and other similar types of filtering that use mathematical probabilities.

Heuristic filters are based on a set of static rules which identify characteristics of spam. Some of the drawbacks of this approach are:

- The rules are not specific to each user's own behavior, which severely hampers accuracy
- The rules require constant updating, as spammers are always circumventing the latest rulesets
- Most such filters, such as SpamAssassin, have no way of learning; they must be reprogrammed

Here is a short excerpt from the DSPAM FAQ about the difference between DSPAM (my project) and SpamAssassin. I'm not knocking SpamAssassin; I think it's a great tool, and it's good if you need out-of-the-box filtering... but there are several long-term solutions that are much better.

<snip>
SpamAssassin is based primarily on a set of rules to detect the individual characteristics of spam. DSPAM, on the other hand, puts almost all of its weight on tokenized Bayesian filtering. The advantage of DSPAM's approach, I feel, is that almost all of the rules SpamAssassin uses to identify the characteristics of spam are covered automatically by DSPAM's analysis. On top of this, because DSPAM's analysis is done on a per-user basis, it is able to determine just how important each characteristic (or "rule" in SpamAssassin talk) is to each user, rather than collectively.

For example, SpamAssassin's first rule is to identify if the MUA is pine. Many users receive more spams from a pine MUA than not. DSPAM performs this check automatically as part of its Bayesian analysis and is able to calculate the probability on a per-user basis, so a user who receives a lot of innocent pine mail will get a more innocent probability than someone whose only pine mail is spam. This keeps DSPAM very lightweight and resource friendly.

Out of SpamAssassin's 921 rules, only 133 rules were not performed by the advanced Bayesian filtering of DSPAM. Out of those 133, 39 were duplicates, range rules, or nearly identical rules; 33 were blackhole rules; 31 were rare, very low scoring, or unmeaningful rules; and 4 were illogical. This left a total of 26 good rules performed by SpamAssassin that were not performed by DSPAM. While these 26 remaining rules are good, they do not by themselves positively identify spam, but only a few underlying characteristics that may or may not identify a particular message (innocent or spam).
</snip>

As far as other alternatives go, there's DSPAM, BogoFilter, Spambayes, and several others. I can't speak much about the rest, but I can tell you that DSPAM uses a much more advanced approach, implementing Chained Tokens for advanced language analysis, de-obfuscation techniques, etc. All of these are great tools... and probability-based filtering is why heuristic filters are obsolete... no trolling intended.

On Tue, 2003-10-07 at 12:34, Gregory A. Gilliss wrote:
Okay, maybe this is a troll, but in case it isn't, how about listing some recommendations for spam filters to replace SpamAssassin? I'm sure there are probably still a few people on the list using it who would be interested in what works better.
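
For anyone who wants to see the per-user idea from the FAQ excerpt in concrete terms, here is a rough sketch in Python. It is not DSPAM's code, and it simplifies the formulas Paul Graham describes. The class name, the token string "x-mailer:pine", the training data, and the 0.01/0.99 clamping bounds are all assumptions made up for illustration. It only shows how the same token can score as innocent for one user and as spam for another when the counts are kept separately per user.

```python
from collections import Counter, defaultdict


class PerUserFilter:
    """Toy per-user Bayesian token filter (illustrative only, not DSPAM's code)."""

    def __init__(self):
        # Token counts and message totals are kept separately for each user.
        self.spam_tokens = defaultdict(Counter)
        self.ham_tokens = defaultdict(Counter)
        self.spam_msgs = Counter()
        self.ham_msgs = Counter()

    def train(self, user, tokens, is_spam):
        counts = self.spam_tokens if is_spam else self.ham_tokens
        totals = self.spam_msgs if is_spam else self.ham_msgs
        counts[user].update(set(tokens))  # count each token once per message
        totals[user] += 1

    def token_probability(self, user, token):
        """Estimate P(spam | token) for one user, clamped to [0.01, 0.99]."""
        spam_total = self.spam_msgs[user]
        ham_total = self.ham_msgs[user]
        if spam_total == 0 or ham_total == 0:
            return 0.5  # not enough training data for this user
        spam_rate = self.spam_tokens[user][token] / spam_total
        ham_rate = self.ham_tokens[user][token] / ham_total
        if spam_rate + ham_rate == 0:
            return 0.5  # token never seen by this user: treat as neutral
        p = spam_rate / (spam_rate + ham_rate)
        return min(0.99, max(0.01, p))

    def spam_probability(self, user, tokens):
        """Combine the per-token estimates with the naive Bayes product rule."""
        spam_product = 1.0
        ham_product = 1.0
        for token in set(tokens):
            p = self.token_probability(user, token)
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)


PINE = "x-mailer:pine"  # hypothetical token name for "the MUA is pine"

filt = PerUserFilter()
for i in range(10):
    # alice: 9 of 10 innocent mails come from pine, only 1 of 10 spams does
    filt.train("alice", [PINE] if i < 9 else ["hello"], is_spam=False)
    filt.train("alice", [PINE] if i < 1 else ["offer"], is_spam=True)
    # bob: no innocent pine mail at all, 5 of 10 spams come from pine
    filt.train("bob", ["hello"], is_spam=False)
    filt.train("bob", [PINE] if i < 5 else ["offer"], is_spam=True)

print("alice:", filt.token_probability("alice", PINE))
print("bob:  ", filt.token_probability("bob", PINE))
```

With this made-up training data the pine token leans innocent for alice and leans spam for bob, which is the behavior a single global rule cannot give you. A real filter would also pick only the most "interesting" tokens and handle rare tokens more carefully; that is left out here to keep the sketch short.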
_______________________________________________ Full-Disclosure - We believe in it. Charter: http://lists.netsys.com/full-disclosure-charter.html