funsec mailing list archives

Re: Im lovin google spam filter


From: <michael.blanchard () emc com>
Date: Thu, 7 Apr 2011 12:49:34 -0400

Just for the sake of argument alone, not that I doubt your findings by any means:


Numbers can be skewed to prove anyone's point, too...

If we take 100 people on this list, have them all look in their backyards and report whether there is any paper or
plastic blowing around, I'll bet a fairly high percentage of us won't have any. I'll further bet that the number would
hold within a 4% margin of error. So, if 96% of us don't have any paper or plastic blowing around in our yards, could
we safely say that no one litters? :-)

Mike B

Michael P. Blanchard
Senior Security Engineer, CISSP, GCIH, CCSA-NGX, MCSE
Office of Information Security & Risk Management
EMC ² Corporation
32 Coslin Drive
Southboro, MA 01772


-----Original Message-----
From: funsec-bounces () linuxbox org [mailto:funsec-bounces () linuxbox org] On Behalf Of Rich Kulawiec
Sent: Thursday, April 07, 2011 11:57 AM
To: funsec
Subject: Re: [funsec] Im lovin google spam filter

On Thu, Apr 07, 2011 at 10:04:49AM -0400, Patrick Laverty wrote:
> I just checked my spam box for gmail and see 1500 messages.  A quick scan of
> the "From" and I saw zero false positives.

Alternatively: "I looked in my own back yard and there's no paper
or plastic blowing around, therefore nobody litters."

Meaningful tests of FP (and FN) rates require large sample sets (in
the sense of number of messages and number of accounts); moreover, they
require careful attention to the composition of those sample sets, both in
terms of how the addresses are actively used, and how they're passively
used (by spammers).  They also require far more than a single snapshot;
one day's sample is meaningless.  They require more than casual analysis:
human eyeballs are far too unreliable to accurately process that much
data.  And so on: this isn't an easy or quick measurement to make, even
for those of us who have been studying the problem for a very long time.
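
To put a rough number on the sample-size point alone, here is a minimal Python sketch, not anything used in this thread. It computes a Wilson score interval for an observed error proportion, using the figures quoted above (0 apparent false positives among 1500 messages in one spam folder). The function name wilson_interval and the 95% confidence level are assumptions chosen for illustration. It says nothing about sample composition, multiple accounts, multiple days, or the reliability of eyeballing the "From" column.

```python
import math


def wilson_interval(errors, total, z=1.96):
    """Wilson score interval for an observed proportion errors/total.

    z=1.96 corresponds to roughly 95% confidence (an assumption here).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= errors <= total:
        raise ValueError("errors must be between 0 and total")
    p = errors / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Figures from the quoted message: one spam folder, one snapshot.
low, high = wilson_interval(0, 1500)
print(f"observed 0/1500, 95% interval: {low:.4%} to {high:.4%}")
```

Even taken at face value, zero misfiled messages in 1500 is still consistent with an underlying rate of up to about 0.26% for that one folder on that one day, and the interval covers only sampling error within that single sample.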

I've done all that, which is how I know that Gmail's FP (and FN,
incidentally) classification performance is mediocre.

---rsk

_______________________________________________
Fun and Misc security discussion for OT posts.
https://linuxbox.org/cgi-bin/mailman/listinfo/funsec
Note: funsec is a public and open mailing list.

