Nmap Development mailing list archives

Re: IPv6 Hop Limit as feature in FPEngine


From: Alexandru Geana <alex () alegen net>
Date: Mon, 23 Mar 2015 18:15:06 +0100


Thanks for doing these tests. What you describe sounds more complicated
than necessary. What I get from your table is that the calculation
method doesn't matter a whole lot: you get concentrations around a few
values in any case, and the ML should be able to deal with that.

It is true that the hop limits are concentrated around the few default
values, but my intention was to split them in such a way as to maximize
the placement in the correct category. Linking back to the table in my
previous email, I made the following observations:
    1) in the fingerprint database, the hop limits are quite varied as
    seen in the raw column
    2) there are not enough entries with an available scan line such
    that the hop limits can be concentrated to the default values; for
    entries for which there is no scan line, guessing is required
    3) even if guessing is used, there are (and will be) some values
    which are placed in the wrong categories (e.g. hop limits equal to
    1 placed in cat. 32, hop limits equal to 65 placed in cat. 128, hop
    limits equal to 260 which should not be possible)
My reasoning was to make such values equal to -1 since they are
erroneous and leave them to imputation (which I am currently focused on).
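
To make the idea concrete, here is a minimal sketch of guessing with an error limit. It is not the code from nmap.diff: the function name, the MISSING marker and the 30-hop limit are assumptions chosen only for illustration. An observed hop limit is rounded up to the nearest default value, and is marked as missing when the implied path length is implausible.

```python
# Hypothetical sketch, not the code from nmap.diff.
DEFAULT_HOP_LIMITS = (32, 64, 128, 255)  # categories named in this thread
MAX_HOPS = 30   # assumed error limit; the real threshold may differ
MISSING = -1    # marker for values left to imputation


def categorize_hop_limit(observed, max_hops=MAX_HOPS):
    """Guess the initial hop limit for an observed hop limit.

    Rounds up to the nearest default value. If max_hops is not None and
    the implied distance exceeds it, the value is treated as erroneous
    and MISSING is returned. max_hops=None gives plain rounding up.
    """
    if not 0 <= observed <= 255:
        return MISSING  # e.g. 260 cannot occur in an 8-bit field
    for default in DEFAULT_HOP_LIMITS:
        if observed <= default:
            distance = default - observed
            if max_hops is not None and distance > max_hops:
                return MISSING
            return default
    return MISSING


if __name__ == "__main__":
    for value in (1, 60, 65, 250, 260):
        print(value, categorize_hop_limit(value))
```

With these assumed settings, 1 and 65 are no longer forced into categories 32 and 128, and 260 is rejected outright, while values such as 60 or 250 still land in 64 and 255.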

Furthermore, my strictness with placing hop limits into their rightful
categories was driven by the fact that the hop limit is a categorical
variable. It is treated as though it can only have values from the set
{32, 64, 128, 255}. As such I discarded the vectorization strategies
which belong to the columns sl and sl||g from the aforementioned table.
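
As an illustration of what treating the hop limit as categorical can mean for the feature vector, here is a small sketch using a one-hot encoding. The encoding itself and the all-zero placeholder for missing values are assumptions made for this example; they are not taken from the patch or from FPEngine.

```python
# Hypothetical sketch: one-hot encoding of the hop limit category.
CATEGORIES = (32, 64, 128, 255)


def one_hot_hop_limit(category):
    """Return one indicator per category.

    A value outside CATEGORIES (such as -1 for an erroneous hop limit)
    yields an all-zero vector, used here only as a placeholder until
    imputation fills it in.
    """
    return [1 if category == c else 0 for c in CATEGORIES]


if __name__ == "__main__":
    print(one_hot_hop_limit(64))
    print(one_hot_hop_limit(-1))
```

Under this view, an intermediate value such as 65 has no meaning of its own, which is why it has to be mapped to one of the four categories or discarded.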

The real test is cross validation. Which one of your calculation methods
gives the best accuracy from train.py? If a simple technique works just
as well, do that.

The accuracy per method is as follows:
    1) using the value without any processing   66.037
    2) only the scan line                       66.037
    3) scan line or simple guessing             66.415
    4) scan line and simple guessing            66.037
    5) scan line and guessing with error limits 66.037
It seems that there is not enough data at the moment for this feature to
have a big impact, but I was curious as to why strategy #3 has a higher
accuracy. I found out that it all boils down to 6 packets in the group
"Equinox...", print 1 (of 2), which has the hop limits set to 1 for
all probe responses except NS which is 255. The other print of the same
group has hop limits of responses set to 64 except for NS which is 255
again.

I believe that my reasoning for the smart guessing method is better
suited, even if it is more complex than the other ones. The fact that it
places each hop limit in the correct category with a high degree of
accuracy and discards incorrect values (to be later filled in via
imputation) should increase accuracy in the long run.

Don't do that; no need to increase coupling between these different
functions.

Ok, I changed this with the new nmap.diff patch.

Best regards,
Alexandru Geana
alegen.net

Attachment: nmap.diff
Description:

Attachment: ipv6tests.diff
Description:

Attachment: signature.asc
Description: Digital signature

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/
