Security Basics mailing list archives
Re: Where to get spam?
From: "Micheal Espinola Jr" <michealespinola () gmail com>
Date: Mon, 19 Feb 2007 22:33:17 -0500
That's where signatures and heuristics fail, and a properly tuned (truly randomized) Bayesian database succeeds.

For instance: The anti-spam product ASSP keeps two directories of saved emails: ham and spam. These collections are referred to as the corpuses. All messages, ham and spam, are saved into the corpuses. On a scheduled basis, each corpus is trimmed to a set maximum number of messages, which are then used to [re]build the Bayesian database.

BUT to prevent intentional spammer abuse, the cleanup of the corpus is randomized. Deletion of corpus messages is not based on date, so the corpus can contain messages that are years old. This also helps prevent newer waves of spam from intentionally skewing the Bayesian ham/spam word tables.

You might think that this poses a problem of the database becoming "stale" to newer forms of spam, but this simply is not the case, as demonstrated IRL. I would never claim it's perfect, because I don't believe any anti-spam product is, but anything that does make it through is quickly corrected and compensated for once the user who received the false-negative spam properly reports it back to ASSP.

ASSP is able to manage this appropriately by maintaining ~18,000 messages in each corpus.

I can't speak for other products, but this illustrates where the Bayesian aspects of ASSP prevail over other products that rely more heavily on signatures and heuristics.

On 2/19/07, Mark Teicher <mht3 () earthlink net> wrote:
This is a very interesting question. Why do you need spam from 2006/2007, SPAM TTL is <24, most SPAM engines will not detect SPAM > 30 days old. I have researched this problem for over a long period of time, most anti-spam products out there will have issues detecting any type of spam over 2 weeks old, since keeping signature/heuristic bases that huge will slow down the performance of the product, which is an interesting question in of itself. Why..

You are better off working with a university or local school that retains their mail for some period of time

Mark

At 04:16 PM 2/17/2007, secbasics () dusty ece cmu edu wrote:

That was almost perfect. Unfortunately since I am correlating spam data against other traffic types, I need the spam to be from 2006/2007, and the most recent one there is 2005. Thanks anyway though.

Aaron

On Sat, Feb 17, 2007 at 01:23:39PM +1100, David West wrote:
> Try the SpamAssassin public mail corpus..
> http://spamassassin.apache.org/publiccorpus/
>
> David West
>
> On 2/16/07, secbasics () dusty ece cmu edu <secbasics () dusty ece cmu edu> wrote:
> >Does anyone know organizations which give away spam captures? I mean,
> >obviously I will get lots of spam just from posting on this list (;)) but
> >I would like to
> >get more to analyze. It seems like every couple months some student does a
> >project which requires spam but they always have to start from ground
> >zero. Isn't
> >there anywhere which gives spam to security researchers?
> >
> >Thanks
> >
> >Aaron
> >
-- ME2
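
The randomized corpus cleanup described in the reply above can be sketched roughly as follows. This is not ASSP's actual code (ASSP is written in Perl); the function names `trim_corpus` and `word_counts`, the whitespace tokenizer, and the in-memory list of message texts are all assumptions made for illustration. Only the idea of random, age-independent deletion down to a fixed cap (~18,000 per corpus) comes from the post.

```python
import random
from collections import Counter

# Per-corpus cap mentioned in the post (~18,000 messages).
MAX_CORPUS_SIZE = 18_000


def trim_corpus(messages, max_size=MAX_CORPUS_SIZE, rng=None):
    """Keep at most max_size messages, chosen at random.

    Message age plays no part in the choice, so old messages can
    survive a trim just as easily as new ones (hypothetical helper).
    """
    if rng is None:
        rng = random.Random()
    messages = list(messages)
    if len(messages) <= max_size:
        return messages
    return rng.sample(messages, max_size)


def word_counts(messages):
    """Count how many messages each word appears in.

    A naive whitespace tokenizer is assumed here; a real Bayesian
    filter would tokenize headers and bodies far more carefully.
    """
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


if __name__ == "__main__":
    spam = [f"cheap pills offer {i}" for i in range(50)]
    kept = trim_corpus(spam, max_size=10, rng=random.Random(1))
    print(len(kept), "spam messages kept after trimming")
    print(word_counts(kept).most_common(3))
```

Because the survivors are picked uniformly at random, a sudden flood of new spam cannot push all of the older messages out of the corpus, which is the skew-resistance the post describes.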