Nmap Development mailing list archives

Re: Defeating a Nmap decoy scan using statistics


From: Brandon Enright <bmenrigh () ucsd edu>
Date: Mon, 21 Jul 2008 22:20:04 +0000


Comments inline.

On Sun, 20 Jul 2008 17:22:40 -0500
Kris Katterjohn <katterjohn () gmail com> wrote:


Brandon Enright wrote:
Hi Kris.  I don't have time for a detailed response right now
because I'm hacking some book chapters but I'll send you a much
more detailed response when the book deadline isn't so pressing.


I've been waiting for a response on -dev, but I can't resist
continuing on a bit more :)  Feel free to quote this email and reply
on -dev (I'd prefer that, to keep the discussion open).

Sorry about not sending a response out earlier.  I prefer open
discussion too so thanks for this note.


The main point to having a persistent offset that is unique for each
host is that each host will average out to a different TTL.


But if you're just adding a value to it, does it really need to be
persistent?  It seems like adding a different random value every time
has more of an effect on the outcome (but see my concern below about
both of these options).

Okay, ignoring Fyodor's excellent insight on this for a moment, the
fundamental problem is that if you are able to determine the real
hop-distance of the attacker then you can figure out which decoy is the
real attacker.

Okay, so how do we prevent the victim from learning the hop distance?

We have discussed 2 ways to do that:

1) Add a random value to the TTL of each outgoing packet you send.
Nmap currently does this.

2) Add a random value to *all* TTLs that persists for the entire scan
duration.

We know we can "factor out" randomness by collecting enough data and
then averaging.  So the question then becomes how much data do we have
to collect before an average produces reasonably accurate results?

For "1" we are adding a random value to the TTL for every packet that
is sent.  If you use more than a few decoy hosts or scan more than a
few ports you should the victim should have _a_lot_ of data to average
out to determine the real hop distance.
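
To put a rough number on "a lot" (this is just my back-of-the-envelope
arithmetic, nothing measured): the base TTL is uniform over 23 values,
so its standard deviation is about 6.6, and the error in the victim's
running average shrinks like 6.6 / sqrt(n):

  stddev of uniform 37->59    = sqrt((23^2 - 1) / 12)  ~= 6.6
  error after n probes       ~= 6.6 / sqrt(n)
  n = 200 probes             -> error ~= 0.5 hops

So even a modest scan gives the victim enough probes per decoy to pin
the hop distance down to within a hop or so.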

For "2" the randomness was picked at the start of the scan.  That means
that it doesn't matter if you scan 10 ports or 10,000 ports, you aren't
providing more useful data to average with.  To factor out this
randomness the attacker has to scan you multiple times before you can
start to average out the randomness.

To recap: with technique "1" every probe helps you average.  With
technique "2" every whole scan helps you average.


That is, without the offset, if you are 5 hops away the average TTL
for all the hosts comes to 43.  If you have a different offset for
each host, the average for one host may be 45 while another host
could be 51.

Because each host has a different average, you can no longer tell how
far away the attacker is, because you don't know the expected original
average TTL.


I understand the point; however, the concern I expressed here:

Does adding another number from a uniform range really affect the
ability to find the attacker in the decoys?  You'll have to forgive
me for my lack of knowledge in this area, but it seems like you're
just making a bigger range to work with: instead of [37, 59], you
have [37, 74].  Is this not the case?


hasn't subsided.  If you're just adding another range (0-15 or not),
does that not just expand the possible range of TTLs and thus still
allow for averaging? Added separately or not, the overall "pool" of
values still seems susceptible to this (in my mind).

So for this code:

For each decoy, we generate a decoy-specific TTL offset.  We then
change the TTL generation code to look like this:

/* Time to live */
if (ttl == -1) {
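  /* decoy_offset is the decoy-specific offset described above: it is
     picked once per scan, so every packet from a given decoy gets the
     same shift on top of the usual random 37-59 base. */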
  myttl = (get_random_uint() % 23) + 37 + decoy_offset;
} else {
  myttl = ttl;
}

Does the outcome not come out to simply a larger range to choose from?

(37->59):
|-----------------------|

(0->15):
|---------------|

(37->59) + (0->15)
|--------------------------------------|

The same averaging operations hold true for the larger range as for
the smaller (original) one, don't they?


Actually, even a non-uniform distribution (Gaussian, Poisson, etc.)
will allow factoring out the randomness given enough data.

The point is that one must collect enough data before the distribution
can be factored out.

So we have two ranges, 37->59 + 0->15.

That isn't simply a single 37->74 range, even though it may seem like
it at first.  It would look more like this:

         <range 1>              <range 2>
|--------------------------| + |----------|


Where range1 is determined randomly with _each_packet_ and range2 is
determined randomly for _each_scan_.  You can average out both, but not
on the same timescale.  Each scan will let you fully average out
range1, while it will take several scans before you can average out
range2.
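
Putting rough numbers on that (assuming the 37->59 and 0->15 ranges
above, and an attacker h hops away):

  average within one scan  ~= 48 + offset - h    (offset unknown, 0..15)
  average over many scans  ~= 48 + 7.5 - h       (the per-scan offsets
                                                  themselves average to 7.5)

So a single scan only pins h down to a 16-value window; only repeated
scans narrow that window.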



Your response, though, brought up a good point.  The per-host range
should be increased from 0-15 to give it more variance for small
samples.

More later or hit me up on AIM.

Brandon


Thanks,
Kris Katterjohn


Good discussion.  My proposal is somewhat more complicated than it
really needs to be though.  Fyodor had a few great points and a simpler
way to fix this problem with decoys.  I'll respond to his note in a
moment.

Brandon



_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

