Full Disclosure mailing list archives

Re: [botnets] the world of botnets article and wrong numbers


From: Jose Nazario <jose () monkey org>
Date: Thu, 14 Sep 2006 17:16:21 -0400 (EDT)

On Thu, 14 Sep 2006, Dave "No, not that one" Korn wrote:

Can you go into detail about the methodology you're using here?  How do 
you "get to a number" of 15,000 from a number "between 200 and 800"? 
Is this a statistical extrapolation, or are you saying that your 
honeynet gets 200 to 800 unique samples a month, and so does that one 
over there, and that one, and that one.... and they all add up to 15000? 
Do you attempt to correct for variants that are simply re-packed using a 
different compressor, or other trivial changes?  Do you attempt to 
correct for complex polymorphic variants?

my numbers are based on unique MD5 values.
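
a minimal sketch of what "counting by unique MD5" means in practice, for illustration only. the function names and the sample byte strings are hypothetical and stand in for captured binaries; this is not the actual collection pipeline behind the numbers above.

```python
import hashlib
from collections import Counter


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a sample's raw bytes."""
    return hashlib.md5(data).hexdigest()


def count_unique_samples(samples):
    """Return (total captures, unique MD5 values) for an iterable of byte strings."""
    digests = Counter(md5_hex(s) for s in samples)
    return sum(digests.values()), len(digests)


# Hypothetical captures: the same binary seen twice, a repacked copy, and another bot.
captured = [b"bot-A", b"bot-A", b"bot-A repacked", b"bot-B"]
total, unique = count_unique_samples(captured)
print(f"{total} captures, {unique} unique MD5 values")
```

counted this way, any byte-level difference makes a "new" sample, so the figure measures distinct binaries rather than distinct bot families.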

the bulk of those are minor variants on a theme, ie repackaged bots or 
reconfigured bots, maybe a new module thrown in or something. only a small 
handful, maybe a dozen or so, are really new bots every month. very rarely 
do we see new bots or new capabilities added. the last major change was 
the use of the MS06-040 netapi exploit.

the bulk of the bot binaries i see are derivatives of well known families. 
very few new families emerge in any given timeframe, but in the HTTP bot
world, we're starting to see people develop tools and reuse them.

unique bot samples, ~12-15k or higher a month. many independent teams can 
back that ballpark figure up. new bot samples, truly new like i outlined 
above, are far fewer: about three orders of magnitude fewer, i.e. the dozen 
or so a month mentioned above.

by the way, in this day and age the bulk of people do not bother with 
polymorphism. they achieve it not through the classic - and elegant - 
methods of self modifying code but instead by churning out new bots fast 
and furious. same end result, though: confuse the naive, static detection 
tools out there.

Some kind of explanation for the huge disjunction between these numbers 
and our instinctive ideas about what's possible.  Being un-worked-out 
intuitive estimates, such ideas are of course entirely likely to be off 
the mark, but off the mark by two orders of magnitude? 
Hence the request for more methodological details.

i guess i'm curious about your position, then, and what you're meaning by 
"our instinctive ideas about what's possible".

it sounds like we're on the same page, but you may feel it's hyping the 
problem to talk about new bots based on unique MD5 values. it's not my 
favorite way of thinking about it, but it is easily underscored by a 
real-world fact: many AV vendors fail to detect the same bot source simply 
repackaged or re-configured (ie a new IRC server, everything else the 
same). hence, each new MD5 means a new detection hit for them. so the hype 
has a real-world backing, namely AV detection issues.
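
a small sketch of the reconfiguration case, under loud assumptions: the "bot" here is a placeholder byte string with a made-up `SERVER=` config field appended, and the server names are example domains. real bots store their configuration differently; the point is only that changing one field yields a different MD5 even though the rest of the bytes are identical.

```python
import hashlib

# Placeholder standing in for identical bot code (hypothetical, not a real binary).
BOT_BODY = b"\x90" * 64 + b"<identical bot code>"


def build_sample(irc_server: str) -> bytes:
    """Append a made-up config field to the shared body."""
    return BOT_BODY + b"SERVER=" + irc_server.encode("ascii")


def shared_prefix_len(x: bytes, y: bytes) -> int:
    """Count how many leading bytes two samples have in common."""
    n = 0
    for p, q in zip(x, y):
        if p != q:
            break
        n += 1
    return n


a = build_sample("irc.example.net")
b = build_sample("irc.example.org")
md5_a = hashlib.md5(a).hexdigest()
md5_b = hashlib.md5(b).hexdigest()

print("same MD5:", md5_a == md5_b)
print("shared leading bytes:", shared_prefix_len(a, b), "of", len(a))
```

a detector keyed on the whole-file hash treats these as two unrelated samples, which is why every new MD5 can count as a new detection problem.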

________
jose nazario, ph.d.                 jose () monkey org
http://monkey.org/~jose/            http://monkey.org/~jose/secnews.html
                                    http://www.wormblog.com/

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/