Security Incidents mailing list archives
RE: A question for the list...
From: "Jonathan A. Zdziarski" <jonathan () networkdweebs com>
Date: Thu, 29 May 2003 10:19:57 -0400
Hi,
Your arguments are definitely valid, but what I hear you saying is that spam filters generally do a poor job, and you're right...most do. Only a few filter adequately based on content. Approach is definitely an important factor, but my point is that a good spam filter is far more effective than network-based management. Static filters that search for certain phrases are useless for the reasons you mentioned; dynamic filtering, however, is much more accurate. Your mileage may vary, but most Bayesian filters rarely have more than a 0.05% risk of false positives, and that's at the upper boundary. SpamAssassin seems to have around 0.04%, and DSPAM ranges from 0.01% to 0.03%.
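To put those percentages in perspective, here is a rough sketch that converts a false-positive rate into an expected count of misfiled messages. The volume of 100,000 legitimate messages is an arbitrary assumption for illustration, the function name is hypothetical, and the rates are assumed to be measured per legitimate message.

```python
def expected_false_positives(legit_messages, fp_rate_percent):
    """Expected number of legitimate messages misfiled as spam."""
    return legit_messages * fp_rate_percent / 100.0


# Rates quoted in this post; the message volume is an assumption.
rates = {
    "Bayesian upper bound": 0.05,
    "SpamAssassin": 0.04,
    "DSPAM (low)": 0.01,
    "DSPAM (high)": 0.03,
}

for name, rate in rates.items():
    count = expected_false_positives(100_000, rate)
    print(f"{name}: about {count:.0f} per 100,000 legitimate messages")
```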
> From an empiric scalability perspective, spam is 'lots of the same thing',
> or bulk messages,
Out of the "lots", however, your network may only get hit with a few copies of the message at a time, making detection less accurate.
> If you content filter the Postmaster mailbox along with your other
> mailboxes
So don't content-filter the postmaster box =) This is another good reason that network-based spam filtering isn't necessarily a good idea. Many blackhole lists are already very inaccurate, so adding spam-filtering ISPs to the list certainly isn't good for anyone. But please keep in mind that spam was a muzzle-loader when this RFC was written, and is now more of an assault weapon. I think trying to place a single human element into the equation for spam filtering still results in the same effect: stolen resources.
> Content filtering can also be bad because of context - it has been seen to
> reject discussions of: chicken breasts and thighs; Breast Cancer;
> Erectile Dysfunction; and objectionable email.
Agreed, we need to make sure our content filters aren't this dumb. Julia Child needs to be able to discuss her chicken breasts, and lawyers need to discuss their erectile dysfunction...but if I receive any emails about either, they will most likely be spam. I believe strongly in per-user corpus-based filtering for this very reason. The DSPAM project maintains a separate dictionary for each user based on their email behavior, which is one reason it's so effective at what it does.
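As a rough illustration of the per-user idea, here is a minimal naive Bayes sketch that keeps a separate token dictionary for each mailbox. This is not DSPAM's actual algorithm or API. The class, the user names, and the training messages are all hypothetical, and the scoring is deliberately simplified (add-one smoothing, only tokens present in the message are considered).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class UserFilter:
    """Hypothetical per-user filter: one token dictionary per mailbox."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct token once per message.
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Start from the smoothed class prior, then add per-token evidence.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in set(tokenize(text)):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# One independent dictionary per user (user names are made up).
filters = defaultdict(UserFilter)

filters["chef"].train("roasted chicken breasts and thighs recipe", "ham")
filters["chef"].train("chicken thighs recipe for dinner", "ham")
filters["chef"].train("cheap pills click now", "spam")

filters["admin"].train("hot chicken breasts click now", "spam")
filters["admin"].train("breasts pills click here", "spam")
filters["admin"].train("incident report meeting notes", "ham")

message = "chicken breasts recipe"
for user in ("chef", "admin"):
    print(user, round(filters[user].spam_probability(message), 3))
```

With this toy training data, the same message scores as probably legitimate for the user who routinely discusses recipes and as probably spam for the user who never does, which is the point of keeping the dictionaries separate.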
> While I use content-based rules (if you can call header fields content) to
> process some of my email, those rules only serve to sort and categorize
> my email, not to reject it.
I agree, I think filtering based on "Characteristics of Spam" is generally bad, because characteristics change. A great example is the MUA. Tools like SpamAssassin will make an email "more innocent" if the MUA is pine...so what did spammers do? They started using a pine MUA. Headers change, and many spammers are smart enough to send from valid stockpiled domains...the one thing that never changes, however, is the content of the message.