Full Disclosure mailing list archives

Re: Spam with PGP

From: Devin Nate <devin.nate () bridgecomm net>
Date: Tue, 07 Oct 2003 22:59:00 -0600

Jonathan A. Zdziarski wrote:

Bayesian filters have had some amazing successes. The problem we (the company I work for) continue to have, and the reason we continue to choose SA, is that training a thousand users on how to use a Bayes system is pretty much impossible (and we're small compared to many!) Assuming that I give you (I do not believe it, but will give it for the sake of argument) that Bayes is the best theoretical solution, the Bayes folks have a problem in implementation. Training users is not easy; think about training your mother or grandmother but multiply by 1000.
This is why two features exist, both of which I think are components of any
good Bayesian solution:

1. User groups. ...
2. A merge tool. ...

Excellent points. There is definitely R&D to be done in sharing Bayes info. Just as antivirus is able to LiveUpdate the braindead easy to define viruses, so should Spam Software. Regrettably, one of the key points of Bayes is that it is individualized. A common 'Bayes' DB is somewhat more difficult.
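
To illustrate why a common Bayes DB is harder than it sounds, here is a minimal Python sketch of pooling per-user token counts. It is not the merge tool mentioned above (its details are not given in this thread); the data layout and the name merge_token_counts are assumptions made for illustration only.

```python
from collections import Counter

def merge_token_counts(user_dbs):
    """Pool per-user token counts into one shared table.

    Each user database is assumed (hypothetically) to look like
    {"spam": {token: count}, "ham": {token: count}}.
    """
    merged = {"spam": Counter(), "ham": Counter()}
    for db in user_dbs:
        # Counter.update adds counts rather than replacing them.
        merged["spam"].update(db.get("spam", {}))
        merged["ham"].update(db.get("ham", {}))
    return merged

alice = {"spam": {"viagra": 12}, "ham": {"mortgage": 1}}
bob = {"spam": {"mortgage": 9, "viagra": 3}, "ham": {}}
shared = merge_token_counts([alice, bob])
```

Pooling is simple, but it flattens exactly the individual differences that make Bayes work well: in this toy data "mortgage" is ham for one user and spam for the other, and the shared table can only record both counts.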

Global tools are also an invaluable asset to fighting spam.  We're
working on a magical blacklisting tool that will capture source ips from
incoming spam...when a threshold is exceeded, all incoming messages
from that source ip are marked/learned as spam for all users (system
wide) for whatever time period we specify.
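
The threshold idea described above can be sketched roughly as follows. This is a minimal illustration, not the tool being described: the class name SourceIPBlacklist, the default threshold, the counting window, and the block duration are all assumptions.

```python
import time
from collections import defaultdict, deque

class SourceIPBlacklist:
    """Hypothetical sketch: block a source IP system-wide once it has
    sent `threshold` spams within `window` seconds."""

    def __init__(self, threshold=5, window=3600.0, block_for=86400.0):
        self.threshold = threshold
        self.window = window
        self.block_for = block_for
        self._hits = defaultdict(deque)  # ip -> timestamps of recent spam
        self._blocked_until = {}         # ip -> time the block expires

    def report_spam(self, ip, now=None):
        now = time.time() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        # Forget reports that have fallen out of the counting window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.threshold:
            self._blocked_until[ip] = now + self.block_for

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        expiry = self._blocked_until.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            # The block has run out; the IP starts again with a clean slate.
            del self._blocked_until[ip]
            self._hits.pop(ip, None)
            return False
        return True
```

The expiry matters because a source that relays spam today may be clean later, so a block in this sketch always times out rather than being permanent.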

Indeed, many are working on such a solution. We have a similar system in production for our users, and have commented on similar ideas for the SA system. Regrettably, the number of IP addresses is actually fairly large in terms of tracking spam status. And the variety of ways that spam can be transmitted complicates matters. Nevertheless, a bug has been opened at SA to attack the IP addresses that spammers use. Also note that a number of high profile anti-spam DNS services have been DoS'ed into oblivion (a couple in the last 2 months). So whatever the solution, it needs to be resilient (either by having a holy ton of bandwidth, or peer to peer).

One of our ideas is a probability based system relating to the 'closeness' of an IP address to a spammer's subnet. The closer to the spammer, the more probable. An issue is, IPv4 has 2^32 addresses. Yes, I know many of those aren't used - let's assume that only 1/2 of the addresses are internet valid. That leaves you with 2^31 addresses. That requires a minimum of 2.1GB of disk space to represent. Again we get to cost/benefit. 2.1GB of disk space is not that expensive, but sysadmin, backup, etc. of all that disk is. (How would you feel that your AntiSpam solution just cost you 2.1GB?) That doesn't even account for the CPU required to access a 2.1 GB database. It also doesn't account for the fact that spammers are slime and rotate IP addresses, use relays, etc. Complicating matters is that 'once a spam relay' does not mean 'always a spam relay'. We've needed to retest IP addresses to verify their status.
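
A rough Python sketch of both points follows: the storage arithmetic, and one possible reading of 'closeness'. The 2.1GB figure works out if each of the 2^31 addresses gets one byte of state. The scoring function is purely an assumption (closeness measured as shared leading bits, scaled linearly); the post does not say how the probability would actually be computed.

```python
import ipaddress

def table_size_bytes(address_count, bits_per_address):
    """Size of a flat table holding a fixed amount of state per address."""
    return address_count * bits_per_address // 8

def shared_prefix_bits(ip_a, ip_b):
    """Number of leading bits two IPv4 addresses have in common (0..32)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    return 32 - (a ^ b).bit_length()

def closeness_score(ip, known_spam_ips, min_bits=16):
    """Hypothetical score in [0, 1]: 1.0 for a known spam source itself,
    0.0 for anything sharing fewer than `min_bits` leading bits with
    every known source, linear in between."""
    best = max((shared_prefix_bits(ip, s) for s in known_spam_ips), default=0)
    if best < min_bits:
        return 0.0
    return (best - min_bits) / (32 - min_bits)

one_byte_each = table_size_bytes(2**31, 8)  # 2147483648 bytes, about 2.1 GB
one_bit_each = table_size_bytes(2**31, 1)   # 268435456 bytes, about 268 MB
```

A score derived from known spam sources in this way would not need a row for every address, only for the sources actually seen, though it does nothing about rotating IPs or relays whose status changes.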

Note, however, that the learning process does not need to be
tech-savvy.  For example, we specifically sculpted our tool to be brain
dead easy for grandma.  You get your mail like normal, and if you get a
spam you forward it to grandma-spam () yourdomain com.  There are even
tools such as SpamSource (for Outlook) that can make this process a
simple click of a button.  The signature mechanism we use stores the
original tokenset in binary format in a temporary database on the server
(or in the form of message attachments), which our tool will then use to
relearn the message as spam.

You've done better than us. How have you managed to train your users to forward the email as the full email, incl. all headers, etc? We've found most forwarded messages do not include all headers, and therefore forwarded messages train the spam database with semi legit emails (i.e. headers are legit because they are forwarded).
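
One way a server could guard against the header problem is to train only on reports that carry the original message intact. The sketch below uses Python's standard email package and assumes the mail client forwards the spam as a message/rfc822 attachment; the function names and the "has a Received header" test are illustrative assumptions, not anything either system in this thread is known to do.

```python
from email import message_from_string

def extract_original(report_text):
    """Return the original message if the report carries it as a
    message/rfc822 attachment, otherwise None."""
    report = message_from_string(report_text)
    for part in report.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # The parser stores the attached message as a one-item list.
            if isinstance(payload, list) and payload:
                return payload[0]
    return None

def usable_for_training(report_text):
    """Accept a report only if the original headers survived forwarding."""
    original = extract_original(report_text)
    return original is not None and original.get_all("Received") is not None
```

An inline forward, where the client pastes the body under the user's own headers, fails this check and could be skipped rather than learned as spam.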

It sounds like you've moved about 8 steps beyond us, with some kind of a spam button interface. IMHO that's what SMTP really needs - a 'feedback loop' protocol to teach the server. Such a protocol would be similar to POP3 in the reverse direction (in particular, provide some form of authentication and then push a message), so that you could push a button and teach a central server (by whatever mechanism it chooses to learn by) that a message is SPAM. Nevertheless, we have not found a way to train our users to appropriately forward messages - they usually don't include the full headers, and therefore we miss the majority of the spam data.
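
No such protocol is specified here, so the following is only a sketch of the "authenticate, then push a message" shape, done in memory with no network layer. Everything in it is hypothetical: the FeedbackServer class, the per-user shared keys, and the choice of an HMAC-SHA256 signature as the authentication step.

```python
import hashlib
import hmac

def sign_report(key, label, raw_message):
    """Client side: sign the label and the full raw message with the
    user's shared key (an assumed authentication scheme)."""
    data = label.encode("ascii") + b"\n" + raw_message
    return hmac.new(key, data, hashlib.sha256).hexdigest()

class FeedbackServer:
    """Hypothetical server side of a 'teach the server' feedback loop."""

    def __init__(self, user_keys, learn):
        self._user_keys = user_keys  # user name -> shared secret (bytes)
        self._learn = learn          # callback: learn(user, label, raw_message)

    def handle_report(self, user, label, raw_message, signature):
        key = self._user_keys.get(user)
        if key is None or label not in ("spam", "ham"):
            return False
        expected = sign_report(key, label, raw_message)
        # Constant-time comparison of the two hex digests.
        if not hmac.compare_digest(expected, signature):
            return False
        # How the server learns is left entirely to the callback.
        self._learn(user, label, raw_message)
        return True
```

Because the client pushes the raw message bytes rather than a forwarded copy, the full headers would reach the server, which is the gap the forwarding approach leaves open.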

Anyhow, my point is, we're trying to improve the ease-of-use factor,
which is a big reason tools like SA are still useful...out-of-the-box
functionality...however that doesn't necessarily mean heuristics are not
obsolete from a scientific perspective.  I think we're getting to a
point where enough tools exist to make a deployment just as easy, and
hopefully if things continue at the rate they're going, companies like
yours that require this level of ease will be able to use Bayesian
solutions.

I love it, increasing spam protection is great. My perspective is that filtering 90% of spam for 1000 users (via SA, or whatever) is better than filtering 99% of spam for 1 user. Yes, the individual number is better in terms of percentage, however by doing the whole group of users, we block several hundred to a few thousand spam messages a day. It remains a difficult problem.


--

____________________________________________________________

Devin Nate
Chief Consultant & General Manager
BridgeComm Corporation
http://www.bridgecomm.net/
mailto:devin.nate () bridgecomm net

____________________________________________________________

