Dailydave mailing list archives

Re: Unknown Application Protocol Analysis


From: Jared DeMott <demottja () msu edu>
Date: Thu, 07 Sep 2006 07:23:16 -0400



Q. How do you run a quick one-pass analysis of some proprietary
application protocol?

I am certainly thinking about this problem.  My goal is to write a tool
that can "automatically" fuzz anything (TCP/UDP network application
protocols right now), whether it's ASCII-based (FTP), binary (DNS), or
includes hashes/encryption (IKE).  We should be able to do this in both
the client and server direction.  I'd like to:
- sniff real traffic
- tokenize that traffic into an internal format (for dns: sessionID,
Flags, ProtoID, ProtoID, ProtoID, len, ascii, len, ascii, etc.)
- fuzz with intelligence
    -fuzz binary numbers like ProtoID differently than we fuzz a len or
     ASCII field
    -fix up things like the sessionID, len, etc. so they are correct
- monitor the target app to fuzz with even more power
    -to perhaps automatically increase code coverage, target certain
     naughty functions, catch memory access violations, etc.
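
A minimal Python sketch of the "fuzz with intelligence" and "fix up"
steps, assuming a hypothetical token format: each field is a Token tagged
num16, len8 or ascii.  These kinds, the boundary-value lists and the
helper names mutate, fix_up and serialize are illustrative only, not
GPF's actual internals.  Numeric fields get integer boundary values,
string fields get long or format-string payloads, and every one-byte
length is then recomputed so the packet still parses.  Session-ID fix-up
needs state from a live session and is not shown.

```python
import random
import struct
from dataclasses import dataclass

@dataclass
class Token:
    kind: str      # hypothetical field types: "num16", "len8", or "ascii"
    value: object  # int for num16/len8, bytes for ascii

# Boundary values that often trip integer-handling bugs.
NUM16_CASES = [0x0000, 0x0001, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF]
# String attacks kept under 256 bytes so a one-byte length can still describe them.
ASCII_CASES = [b"A" * 250, b"%n" * 4, b"%s" * 64, b"\x00abc", b""]

def mutate(tokens, rng):
    """Return a copy of tokens with one field fuzzed according to its type."""
    out = [Token(t.kind, t.value) for t in tokens]
    targets = [i for i, t in enumerate(out) if t.kind in ("num16", "ascii")]
    i = rng.choice(targets)
    if out[i].kind == "num16":
        out[i].value = rng.choice(NUM16_CASES)
    else:
        out[i].value = rng.choice(ASCII_CASES)
    return out

def fix_up(tokens):
    """Recompute every len8 field from the ascii token that immediately follows it."""
    for i, t in enumerate(tokens):
        if t.kind == "len8" and i + 1 < len(tokens) and tokens[i + 1].kind == "ascii":
            t.value = len(tokens[i + 1].value)
    return tokens

def serialize(tokens):
    """Pack tokens back into wire bytes (big-endian, i.e. network order)."""
    parts = []
    for t in tokens:
        if t.kind == "num16":
            parts.append(struct.pack(">H", t.value))
        elif t.kind == "len8":
            parts.append(struct.pack(">B", t.value))
        else:
            parts.append(t.value)
    return b"".join(parts)
```

For example, serialize(fix_up(mutate(template, random.Random(0)))) on a
template such as [Token("num16", 0x1234), Token("len8", 3),
Token("ascii", b"www")] yields one mutated packet whose length byte
still matches its string.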

I'm taking a genetic algorithms class this semester.  I hope to use GAs
to help me do some of the above better.  You can download my tool (GPF)
from http://www.appliedsec.com/developers.html, but I'd wait a couple of
weeks.  I've made substantial changes lately (but haven't uploaded them
yet, since it's not quite working) as I work more and more toward the
above goals.  I'll post to this list when I've got something worth
downloading. :)

I know it's fairly easy to look at small subsets of traffic manually,
looking for the \x00 and slowly guesstimating where fields begin and end,
what constitutes a record, what the static offsets are, etc., but I'm
imagining a tool that would take in a batch of traffic and work out
roughly what's what, seeing the big picture.

I'd imagine this tool would first run a check looking for what might
constitute discrete units of information (possibly all those bounded by
\x00).
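
That first pass could be as simple as the following Python sketch, which
splits a captured payload on NUL bytes and flags which chunks are
printable ASCII.  The name candidate_units and the min_len cutoff are
hypothetical; NUL-splitting is only the heuristic suggested above and
will over-split binary fields that happen to contain zero bytes.

```python
def candidate_units(payload, min_len=1):
    """Split payload on NUL bytes and return (offset, chunk, printable)
    for each chunk of at least min_len bytes."""
    units = []
    start = 0
    # A sentinel NUL appended to the end flushes the final chunk.
    for i, b in enumerate(payload + b"\x00"):
        if b == 0:
            chunk = payload[start:i]
            if len(chunk) >= min_len:
                printable = all(0x20 <= c < 0x7F for c in chunk)
                units.append((start, chunk, printable))
            start = i + 1
    return units
```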

I'd imagine this tool would then look for some of the basic layouts of
TLV (type-length-value) protocols (which seem most common IMHO) by
working out the lengths of what appear to be strings and looking for
matching ints just before or after them.  Maybe it could even look for
MD5 or SHA-1 hashes that correspond to other data fields.  Then look for
repeating byte patterns, etc.
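
A rough Python sketch of the length and hash checks, under stated
assumptions: length prefixes are 1 or 2 bytes, big-endian (network
order), and sit before the string they describe (the "after" case is not
handled); "string" means printable ASCII; and the candidate fields for
hash matching are supplied by the caller.  find_length_prefixes and
find_hashes are hypothetical names.  Expect false positives; each hit is
a hypothesis to confirm across many packets.

```python
import hashlib

def _printable(chunk):
    """True if every byte is printable ASCII (0x20-0x7E)."""
    return all(0x20 <= c < 0x7F for c in chunk)

def find_length_prefixes(payload, min_str=2):
    """Return (offset, width, n) wherever a 1- or 2-byte big-endian int n
    is immediately followed by n printable bytes."""
    hits = []
    for i in range(len(payload)):
        for width in (1, 2):
            if i + width > len(payload):
                continue
            n = int.from_bytes(payload[i:i + width], "big")
            end = i + width + n
            if n >= min_str and end <= len(payload) and _printable(payload[i + width:end]):
                hits.append((i, width, n))
    return hits

def find_hashes(payload, fields):
    """Return (offset, algorithm, field) wherever the MD5 or SHA-1 digest
    of a known field appears verbatim in payload."""
    hits = []
    for name, algo in (("md5", hashlib.md5), ("sha1", hashlib.sha1)):
        for field in fields:
            off = payload.find(algo(field).digest())
            if off != -1:
                hits.append((off, name, field))
    return hits
```

On a DNS-style name such as b"\x03www\x07example\x03com\x00",
find_length_prefixes reports the three one-byte label lengths at offsets
0, 4 and 12.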

Once it understands the structure of a single packet, it would compare
it over time with other packets between similar hosts, looking for which
fields are constant, which ones change randomly (signifying GUIDs or
message IDs) and which change only slightly (perhaps timing fields).
This is where the real knowledge would lie, as assumptions made about
individual packets (e.g. what is really static or dynamic) could be
rectified over a larger data set.
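
One way to sketch this cross-packet comparison in Python, assuming the
packets are already the same message type and aligned byte-for-byte:
label each offset static, slight (consecutive values differ by at most a
small step, as a counter or timer byte would) or random, then collapse
runs of equal labels into a rough layout string.  Working per byte rather
than per field, and the max_step threshold of 4, are simplifying
assumptions; classify_offsets and summarize are hypothetical names.

```python
def classify_offsets(packets, max_step=4):
    """Label each byte offset present in every packet as 'static' (never
    changes), 'slight' (moves by at most max_step between consecutive
    packets, e.g. a counter or timer byte) or 'random' (anything else,
    e.g. part of a message ID or GUID)."""
    width = min(len(p) for p in packets)
    labels = []
    for off in range(width):
        column = [p[off] for p in packets]
        if len(set(column)) == 1:
            labels.append("static")
            continue
        deltas = []
        for a, b in zip(column, column[1:]):
            d = (b - a) % 256  # wrap-around difference of one byte
            deltas.append(d - 256 if d > 127 else d)
        if all(abs(d) <= max_step for d in deltas):
            labels.append("slight")
        else:
            labels.append("random")
    return labels

def summarize(labels):
    """Collapse runs of identical labels into '<label start:end>' segments."""
    out = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append(f"<{labels[start]} {start}:{i}>")
            start = i
    return "".join(out)
```

For three packets whose first two bytes never change, whose third byte
counts 1, 2, 3, and whose last two bytes vary wildly, for example
b"\xab\xcd\x01\x13\x9f", b"\xab\xcd\x02\xe0\x04" and
b"\xab\xcd\x03\x5a\xc1", summarize(classify_offsets(packets)) gives
"<static 0:2><slight 2:3><random 3:5>".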

Then print this out in a way like:

<static header><record 1><length><Unicode content><\x88\x88\x88>
<record 2><length><COMPUTER_NAME><record 3><CURRENT_TIME><unknown static crud>

Producing an Ethereal protocol definition file at the end would be icing
on the cake!

I've had a look at:
[1] http://research.microsoft.com/workshops/sysml/papers/sysml-Gopalratnam.pdf
[2] http://www.ub.utwente.nl/webdocs/ctit/1/000000ef.pdf

But I can't seem to find any public code that has attempted to solve the
same problem.  Has anyone else thought about this, or does anyone know of
code I should look at?

Rhys



_______________________________________________
Dailydave mailing list
Dailydave () lists immunitysec com
http://lists.immunitysec.com/mailman/listinfo/dailydave
