Full Disclosure mailing list archives

Re: [Tool] DeepToad 1.1.0


From: Dan Kaminsky <dan () doxpara com>
Date: Tue, 5 Jan 2010 16:08:44 +0100

Joxean's stuff is similar to Nilsimsa or (as he mentions) ssdeep, in that
it'll find mostly similar instances of the same underlying data, assuming
only small bit-level changes (such as from version shifts).  It's obviously
not a magic unpacker of any arbitrary virus, though.
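Here is a rough illustration of how an ssdeep-style fuzzy hash behaves. A rolling hash over a small window decides where each piece of the input ends. Each piece is reduced to one Base64 character. Two signatures are then compared with an edit-style similarity.

This is a simplified sketch, not DeepToad's or ssdeep's actual code. The window size, block size, seed and the use of difflib for comparison are all assumptions.

```python
from difflib import SequenceMatcher
import string

# Alphabet used to turn each piece hash into one signature character.
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
WINDOW = 7           # bytes of context that decide where a piece ends (assumed)
SEED = 0x28021967    # starting value of each piece hash (assumed)
FNV_PRIME = 0x01000193


def ctph(data: bytes, block_size: int = 64) -> str:
    """Simplified context-triggered piecewise hash, in the spirit of ssdeep."""
    sig = []
    window = [0] * WINDOW
    rolling = 0      # sum of the last WINDOW bytes
    piece = SEED     # FNV-style hash of the current piece
    for i, byte in enumerate(data):
        rolling += byte - window[i % WINDOW]
        window[i % WINDOW] = byte
        piece = ((piece * FNV_PRIME) ^ byte) & 0xFFFFFFFF
        # The trigger depends only on local context, so an edit elsewhere
        # does not move this piece boundary.
        if rolling % block_size == block_size - 1:
            sig.append(B64[piece % 64])
            piece = SEED
    sig.append(B64[piece % 64])  # final, possibly partial, piece
    return "".join(sig)


def compare(sig1: str, sig2: str) -> int:
    """Score 0-100; difflib's ratio stands in for ssdeep's edit distance."""
    return round(100 * SequenceMatcher(None, sig1, sig2, autojunk=False).ratio())
```

Piece boundaries depend only on the last few bytes. A local edit therefore changes only the characters for the pieces it touches, which is why small version changes leave most of the signature intact.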

His stuff, by its very nature, is a fuzzy similarity metric, meaning that if
you run it on small chunks of a file sequentially you can get a fuzzy diff.
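One way to read the fuzzy-diff idea is to cut both files into fixed-size chunks. Then, for each chunk of the new file, look for the most similar chunk of the old one.

This sketch uses difflib's SequenceMatcher ratio as a stand-in similarity metric. The chunk size and threshold are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def chunks(data: bytes, size: int) -> list:
    """Split data into consecutive fixed-size chunks (the last may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fuzzy_diff(old: bytes, new: bytes, size: int = 256, threshold: float = 0.6) -> list:
    """For each chunk of `new`, return (new_index, best_old_index or None, score)."""
    old_chunks = chunks(old, size)
    report = []
    for j, chunk in enumerate(chunks(new, size)):
        best_i, best_score = None, 0.0
        for i, candidate in enumerate(old_chunks):
            # autojunk=False so frequent byte values are not ignored
            score = SequenceMatcher(None, candidate, chunk, autojunk=False).ratio()
            if score > best_score:
                best_i, best_score = i, score
        # Below the threshold, treat the chunk as new content.
        matched = best_i if best_score >= threshold else None
        report.append((j, matched, round(best_score, 2)))
    return report
```

Fixed-size chunks lose alignment when an insertion is not a multiple of the chunk size. A real tool would more likely use content-defined boundaries.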

Detecting multiple files of the same file type is actually a different
problem, and sort of an interesting one.  The best thing to do here is take
a large number of samples that *are* your file type, and then a large number
of samples that *are not* your file type (and are not all of one other
wrong type), and look for either strings or statistical patterns that show
up in the member set and not in the alternate.  These fingerprints are then
sought in other samples.
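The member-set versus alternate-set approach can be sketched with byte n-grams. Keep the n-grams that appear in nearly all positive samples and in none of the negative ones, then look for them in new files.

The n-gram length, support threshold and function names here are illustrative assumptions. Statistical patterns, such as byte histograms or entropy, would need a different model.

```python
from collections import Counter


def ngrams(data: bytes, n: int = 4) -> set:
    """All distinct n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def learn_fingerprints(positives, negatives, n=4, min_support=0.9) -> set:
    """n-grams found in >= min_support of positives and in no negative sample."""
    support = Counter()
    for sample in positives:
        support.update(ngrams(sample, n))   # counts each n-gram once per sample
    excluded = set()
    for sample in negatives:
        excluded |= ngrams(sample, n)
    needed = min_support * len(positives)
    return {g for g, count in support.items() if count >= needed and g not in excluded}


def matches(sample: bytes, fingerprints: set, n=4, min_hits=1) -> bool:
    """True if the sample contains at least min_hits learned fingerprints."""
    return len(ngrams(sample, n) & fingerprints) >= min_hits
```

The negative set must be varied. If every negative sample is the same other type, the learned n-grams only separate those two types rather than describing yours.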

It's not terribly common that you actually need to do this, though.  Browsers
need to do it a bit because MIME types are wonky.  They do this by hand,
though.
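A hand-built sniffer often boils down to a table of leading magic bytes, checked before the declared type is trusted. This is a minimal sketch with a few well-known signatures.

It is not the rule set any particular browser uses. The WHATWG MIME Sniffing Standard describes the real, much longer algorithm.

```python
# (leading bytes, MIME type) pairs for a few well-known formats
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff(data: bytes, declared: str = "application/octet-stream") -> str:
    """Return the type implied by the leading bytes, else the declared type."""
    for magic, mime in MAGIC:
        if data.startswith(magic):
            return mime
    return declared
```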


On Tue, Jan 5, 2010 at 3:56 PM, T Biehn <tbiehn () gmail com> wrote:

I can see what you're saying, it could be useful for finding
differences in different versions of the same binary but from what I
can see Joxean's app is meant to group files of the same 'type,' not
provide 'diff' capabilities.

-Travis

On Tue, Jan 5, 2010 at 9:51 AM, Dan Kaminsky <dan () doxpara com> wrote:
I looked into a fair amount of this sort of normalization back when I was
playing with dotplots.  The idea was to upgrade from simple Levenshtein
string comparison (with no knowledge of variable-length x86 instructions,
pointers that shift from compile to compile, etc.) to something with at
least some domain-specific knowledge.  What I found, somewhat surprisingly,
was that dumb string comparison was more than enough.  In fact, when I
compared pre-patch and post-patch builds, it was easy to see directly when
content was added, removed, shifted in location, etc.  Joxean's going to
have much the same result: as basic as his similarity metric is, he'll get
the broad strokes just fine.
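For reference, the two "dumb" tools mentioned here are easy to write down:

- Levenshtein distance counts the single-element insertions, deletions and substitutions separating two sequences.
- A dotplot marks every position pair where two inputs share a short window.

This is a sketch, and the window length k is an assumption.

```python
def levenshtein(a, b) -> int:
    """Edit distance between two sequences (strings or bytes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y))) # substitute if different
        prev = cur
    return prev[-1]


def dotplot(a: bytes, b: bytes, k: int = 4) -> list:
    """(i, j) pairs where the k-byte window at a[i] equals the one at b[j]."""
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    return [(i, j) for i in range(len(a) - k + 1) for j in index.get(a[i:i + k], [])]
```

In the dotplot, a run of points along a diagonal is shared content. A run on a shifted diagonal is content that moved, and gaps show additions or removals.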

Ultimately the best approach is to build a graph of how functions interact
and measure graph isomorphism, but of course Halvar figured that out years
ago :)
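Exact graph isomorphism is expensive, so practical tools rely on heuristics. This sketch swaps in Weisfeiler-Lehman color refinement as a stand-in; it is not Halvar Flake's BinDiff algorithm. It compares two call graphs in three steps:

1. Label each function by its callee and caller counts.
2. Refine each label repeatedly from the labels of its neighbours.
3. Compare the resulting label histograms.

Matching histograms are necessary but not sufficient for isomorphism. The graph format (function name mapped to a list of callees) is an assumption.

```python
from collections import Counter


def wl_colors(graph: dict, rounds: int = 3) -> Counter:
    """graph maps each function name to the list of functions it calls."""
    nodes = set(graph) | {c for callees in graph.values() for c in callees}
    callers = {n: [] for n in nodes}
    for f, callees in graph.items():
        for c in callees:
            callers[c].append(f)
    # Initial label: (number of callees, number of callers); names are ignored.
    color = {n: (len(graph.get(n, [])), len(callers[n])) for n in nodes}
    for _ in range(rounds):
        # New label = old label plus sorted labels of callees and callers.
        color = {
            n: (color[n],
                tuple(sorted(color[c] for c in graph.get(n, []))),
                tuple(sorted(color[p] for p in callers[n])))
            for n in nodes
        }
    return Counter(color.values())


def callgraph_similarity(g1: dict, g2: dict, rounds: int = 3) -> float:
    """Fraction of refined labels the two graphs share (1.0 = indistinguishable)."""
    c1, c2 = wl_colors(g1, rounds), wl_colors(g2, rounds)
    total = sum(c1.values()) + sum(c2.values())
    if total == 0:
        return 1.0
    return 2 * sum((c1 & c2).values()) / total
```

Because labels never include function names, renamed or relocated functions still match. That is the property that makes graph-based comparison robust to recompilation.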

On Tue, Jan 5, 2010 at 3:41 PM, T Biehn <tbiehn () gmail com> wrote:

Hmm,
Wouldn't it be more useful to the sec community to have an algorithm
that abstracts at the -interpreted- content level? That is, when
analyzing binaries, I wouldn't expect this to classify two binaries with
near-identical functionality together, even though it discards a
significant chunk of information during the hash pass.

I would largely assume that your algorithm, as is, works best on
uncompressed bitmaps. Is there something I'm missing?

-Travis

On Sun, Jan 3, 2010 at 6:37 AM, Joxean Koret <joxeankoret () yahoo es> wrote:
Hi all,

I'm happy to announce the very first public release of the open source
project DeepToad, a tool for computing fuzzy hashes from files.

DeepToad can generate signatures, cluster files and/or directories
and compare them. It's inspired by the very good tool ssdeep [1] and, in
fact, both projects are very similar.
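Clustering on top of a fuzzy similarity score can be as simple as linking every pair above a threshold and taking connected components. This is a hypothetical sketch, not DeepToad's API.

The similarity function (difflib's ratio on raw bytes) and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: bytes, b: bytes) -> float:
    """Stand-in similarity score in [0, 1] (assumed, not DeepToad's metric)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster(samples: dict, threshold: float = 0.8) -> list:
    """Single-linkage clustering of {name: contents} via union-find."""
    parent = {name: name for name in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every pair whose similarity reaches the threshold.
    for a, b in combinations(samples, 2):
        if similarity(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Comparing every pair is quadratic in the number of files. That is acceptable for a sketch but is the first thing a real tool would need to avoid on large directories.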

The complete project is written in pure Python and is distributed under
the LGPL license [2].

Links:
Project's Web Page http://code.google.com/p/deeptoad/
Download Web Page http://code.google.com/p/deeptoad/downloads/list
Wiki http://code.google.com/p/deeptoad/w/list

References:
[1] http://ssdeep.sourceforge.net/
[2] http://www.gnu.org/licenses/lgpl.html

Regards && Happy new year!
Joxean Koret


_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/




--
FD1D E574 6CAB 2FAF 2921  F22E B8B7 9D0D 99FF A73C

http://pgp.mit.edu:11371/pks/lookup?search=tbiehn&op=index&fingerprint=on
http://pastebin.com/f6fd606da
