Firewall Wizards mailing list archives

Re: regarding spam...


From: ant () notatla demon co uk (Antonomasia)
Date: Fri, 29 Mar 2002 23:55:16 +0000 (GMT)

From: Ryan Russell <ryan () securityfocus com>

I think the hard part becomes how do you tell if one piece of mail is the
same as another?  If they were absolutely identical, you could ship MD5
hashes around, and everything would be great.  One problem is that many
spam messages are unique in some small way to the recipient, i.e. they
contain tracking info.  Perhaps you then have an algorithm that can
produce a percentage match when two emails are compared?
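
A minimal sketch of the problem described above, assuming two made-up message bodies that differ only in a tracking id. The names md5_hex and percent_match are illustrative, and difflib's ratio is only one possible "percentage match" measure, not something proposed in this thread.

```python
import difflib
import hashlib


def md5_hex(body):
    """Exact-match fingerprint: any one-character change alters the digest."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def percent_match(a, b):
    """Similarity between 0.0 and 1.0 using difflib's SequenceMatcher ratio."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical spam bodies, identical except for the per-recipient id.
msg_a = "Cheap pills! Unsubscribe: http://spam.example/u?id=1001"
msg_b = "Cheap pills! Unsubscribe: http://spam.example/u?id=1002"

same_digest = md5_hex(msg_a) == md5_hex(msg_b)  # False: the hashes differ
similarity = percent_match(msg_a, msg_b)        # close to 1.0
```

The digests disagree even though the bodies are nearly identical, which is why a fuzzy comparison is needed for per-recipient spam.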

Before you leap into this, consider possible litigation from people
who say (truly or not) that you prevented their mail from arriving.
Public distribution of the spam characteristics may not be the best plan.

Suggestion for similarity checking:

1)  Feed all docs being processed through a formatter to
    remove all whitespace (space, tab, NL, CR).  Maybe fold case too.

2)  Use some form of chunk recognition to slice the doc into
    moderate sized chunks.  Something like paragraphs, but based
    on the data after step 1.  This could be delimiting chunks by
    short strings kept in a local file.  The strings might be a few
    characters long and selected from past postings at random to
    represent what is found in real traffic.
2b) If this is one-party replay detection the chunk delimiters can be
    secret and can undergo gradual change.

3) Hash each chunk and store the document description as the list
   of hashes, with an expiry period (say from now to now + 1 year).

4) Comparison of a new doc to the records of previous docs would
   result in rejection if some high-ish fraction of the chunks
   matched those of a previous doc.  (Order should probably matter.)
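
Steps 1 to 4 could be sketched roughly as follows. This is only an illustration under assumptions: the delimiter strings, the 0.8 threshold, the choice of SHA-256 and the ReplayStore name are all made up here, and "order matters" is interpreted as a longest-common-subsequence match over the chunk hashes.

```python
import hashlib
import re
import time

ONE_YEAR = 365 * 24 * 3600
# Hypothetical delimiters; step 2 suggests sampling these from past traffic.
# They must already be lower case and free of whitespace.
DELIMITERS = ["the", "ing", "ion"]


def normalize(text):
    """Step 1: strip all whitespace and fold to lower case."""
    return "".join(text.split()).lower()


def chunk(normalized, delimiters):
    """Step 2: cut the normalized text wherever a delimiter string occurs."""
    if not delimiters:
        return [normalized] if normalized else []
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [c for c in re.split(pattern, normalized) if c]


def describe(text, delimiters):
    """Step 3: the document description is the ordered list of chunk hashes."""
    return [hashlib.sha256(c.encode("utf-8")).hexdigest()
            for c in chunk(normalize(text), delimiters)]


def lcs_length(a, b):
    """Length of the longest common subsequence, so chunk order counts."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


class ReplayStore:
    def __init__(self, delimiters=DELIMITERS, threshold=0.8, ttl=ONE_YEAR):
        self.delimiters = delimiters
        self.threshold = threshold
        self.ttl = ttl
        self.records = []  # list of (expiry_time, chunk_hashes)

    def add(self, text, now=None):
        now = time.time() if now is None else now
        self.records.append((now + self.ttl, describe(text, self.delimiters)))

    def is_replay(self, text, now=None):
        """Step 4: true if enough chunks match an unexpired earlier doc."""
        now = time.time() if now is None else now
        self.records = [r for r in self.records if r[0] > now]
        hashes = describe(text, self.delimiters)
        if not hashes:
            return False
        return any(lcs_length(hashes, old) / len(hashes) >= self.threshold
                   for _, old in self.records)
```

Typical use would be to call is_replay() on each incoming message and add() it afterwards. For step 2b the delimiter list would be kept private and rotated gradually.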

I downgrade incoming mail based on Received: headers (whole country codes,
whole ISPs and non-resolving IPs) and on content (including HTML).
Frequently the downgrade is total: the mail is bounced unseen.
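
As a rough illustration of that kind of downgrading (not my actual filter), the sketch below scores a message from its Received: headers and content type. The suffix list, the penalty values and the function names are invented for the example, and the non-resolving-IP check is left out because it needs DNS lookups.

```python
import re
from email import message_from_string

# Hypothetical domain suffixes standing in for "whole country codes"
# and "whole ISPs".
BLOCKED_SUFFIXES = (".cc.example", ".bulk-isp.example")


def received_hosts(msg):
    """Return the host named after 'from' in each Received: header.

    This is the name the sending side claimed, so it can be forged; a
    stricter filter would look at the bracketed IP address instead.
    """
    hosts = []
    for header in msg.get_all("Received", []):
        match = re.search(r"\bfrom\s+(\S+)", header)
        if match:
            hosts.append(match.group(1).lower())
    return hosts


def penalty(raw_message, blocked_suffixes=BLOCKED_SUFFIXES,
            host_penalty=5, html_penalty=1):
    """Higher score means a bigger downgrade; 0 means no objection."""
    msg = message_from_string(raw_message)
    score = 0
    for host in received_hosts(msg):
        if host.endswith(tuple(blocked_suffixes)):
            score += host_penalty
    if msg.get_content_type() == "text/html":
        score += html_penalty
    return score
```

A total downgrade would then be a bounce whenever the score passes some locally chosen limit.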

--
##############################################################
# Antonomasia   ant notatla.demon.co.uk                      #
# See http://www.notatla.demon.co.uk/                        #
##############################################################
_______________________________________________
firewall-wizards mailing list
firewall-wizards () nfr com
http://list.nfr.com/mailman/listinfo/firewall-wizards

