Dailydave mailing list archives

Re: approximate string matching - Bloom filters


From: "Fausett, Mark (US SSA)" <mark.fausett () baesystems com>
Date: Fri, 1 Sep 2006 13:23:43 -0400

Bloom filters are approximate in a different sense though -- Think of
them as space efficient, but lossy token sets; you put a bunch of tokens
in, and subsequently can query whether a particular token was placed
into the set; to some degree of confidence.
Bloom filters are subject to false positives -- they'll sometimes
incorrectly tell you that a token is in the set -- but not false
negatives.  Because hashing functions are used to insert tokens into the
bloom filter, the false positives have nothing to do with approximate
string matches.

Cheers,

Mark Fausett
-----Original Message-----
From: dailydave-bounces () lists immunitysec com
[mailto:dailydave-bounces () lists immunitysec com] On Behalf Of Martin
Roesch
Sent: Friday, September 01, 2006 10:06 AM
To: Mateusz Berezecki
Cc: dailydave () lists immunitysec com
Subject: Re: [Dailydave] approximate string matching

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Have you taken a look at Bloom Filters?  I'm not sure what your exact
requirements are but Bloom Filters are fast and do probabilistic pattern
matching.  Check out:

http://en.wikipedia.org/wiki/Bloom_filter
http://www.arl.wustl.edu/~sarang/analysis.pdf

They're easy to implement and fast, so maybe that will work for you.   
Hope that helps!

      -Marty


...SNIP...

Attachment: Fausett, Mark (US SSA).vcf
Description: Fausett, Mark (US SSA).vcf

_______________________________________________
Dailydave mailing list
Dailydave () lists immunitysec com
http://lists.immunitysec.com/mailman/listinfo/dailydave

Current thread: