Nmap Development mailing list archives

Re: Ipv6 machine learning

From: Daniel Miller <bonsaiviking () gmail com>
Date: Mon, 21 Mar 2016 09:06:58 -0500

Tamim,

Thank you for your interest. In addition to the details that Mathias
provided, I'd like to point out some possible directions that research
could go in this area.

We do already have a logistic regression classifier for IPv6 fingerprints,
and it works well with our small set of classifications (just under 100 of
them), but we are starting to see some limitations. Specifically, the
classifier is not able to strongly match each release of Apple Mac OS X
separately; instead, there is a lot of cross-matching between versions.
This is despite the fact that one feature in particular can easily
distinguish between versions: if the TCP_WSCALE feature is 5, for
instance, it means OS X 10.10 or 10.11.

Our thought at this point is to have a multi-stage classifier. The first
classifier would distinguish between major families of OS: Windows, OS X,
BSD, VxWorks, or others. The second stage would then determine within that
classification what the specific OS version is. We hope that this would
overcome the challenges of having so many different classifications: our
IPv4 OS fingerprint database is more complete and has over 1200 different
general classifications, and many more granular Fingerprint names.
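
The two-stage idea described above can be sketched roughly as follows, in
Python. This is only an illustration under stated assumptions, not how Nmap
implements anything: the nearest-centroid learner is a stand-in for
whatever model each stage would really use, the two features (TCP window
scale and hop limit) and all of the training values are invented toy data,
and the names TwoStageClassifier, centroid_fit and centroid_predict are
hypothetical.

```python
from collections import defaultdict

def centroid_fit(rows, labels):
    """Stand-in learner (an assumption): average the feature vectors per label."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def centroid_predict(model, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(model[label], row))
    return min(model, key=dist)

class TwoStageClassifier:
    """Stage 1 picks the OS family; stage 2 picks a version within that family."""

    def fit(self, rows, families, versions):
        self.family_model = centroid_fit(rows, families)
        # Group the training rows by family so each family gets its own model.
        by_family = defaultdict(lambda: ([], []))
        for row, family, version in zip(rows, families, versions):
            by_family[family][0].append(row)
            by_family[family][1].append(version)
        self.version_models = {
            family: centroid_fit(r, v) for family, (r, v) in by_family.items()
        }
        return self

    def predict(self, row):
        family = centroid_predict(self.family_model, row)
        return family, centroid_predict(self.version_models[family], row)

# Invented toy data: [tcp_wscale, hop_limit]. Only "wscale 5 means OS X
# 10.10 or 10.11" comes from the text above; everything else is made up.
rows = [[5, 64], [5, 64], [4, 64], [4, 64], [8, 128], [8, 128], [2, 128]]
families = ["OS X"] * 4 + ["Windows"] * 3
versions = ["OS X 10.10/10.11"] * 2 + ["OS X (other)"] * 2 + \
           ["Windows (a)"] * 2 + ["Windows (b)"]

clf = TwoStageClassifier().fit(rows, families, versions)
print(clf.predict([5, 64]))  # ('OS X', 'OS X 10.10/10.11')
```

Because the second stage only ever sees fingerprints from one family, a
feature such as the window scale is compared against a handful of sibling
versions rather than against every class in the database.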

We are open to other ideas on solving this problem. Last year, apart from
GSoC, Alex Geana worked with Mathias on adding new features to the
classifier and implementing imputation of missing features. The benefit of
the imputation work is hard to determine relative to the effort of
maintaining and running it, so it has not been integrated yet.
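
For readers unfamiliar with the term, imputation means filling in feature
values that a probe could not observe before handing the vector to the
classifier. The sketch below uses per-column mean imputation purely as an
example; that choice is an assumption, since nothing above says which
method the actual work used, and impute_missing is a hypothetical name.

```python
def impute_missing(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for column in zip(*rows):
        seen = [v for v in column if v is not None]
        # Fall back to 0.0 if a feature was never observed (an arbitrary choice).
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [
        [means[i] if value is None else value for i, value in enumerate(row)]
        for row in rows
    ]

# Invented toy vectors with gaps where a response was not received.
filled = impute_missing([[5, None], [3, 64], [None, 128]])
print(filled)  # [[5, 96.0], [3, 64], [4.0, 128]]
```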

We are not machine learning experts, but we have a real-world problem, a
growing training corpus, and lots of domain-specific knowledge of what
works and what doesn't in classifying network stack fingerprints. We are
looking for an applicant with knowledge and experience with machine
learning techniques who can help us:

* choose an approach that works,
* measure the relative benefit of changes to the classifier,
* write code to implement these ideas, and
* clearly communicate the design and operation of his or her code so that
others can maintain and improve it.
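
As one possible way to approach the second point, two versions of a
classifier can be scored on the same held-out fingerprints, looking both at
overall accuracy and at which specific matches a change fixed or broke.
This is a minimal sketch with invented labels and predictions; the function
names are hypothetical, and a real evaluation would want cross-validation
and far more data.

```python
from collections import Counter

def accuracy(truth, predicted):
    """Fraction of fingerprints whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

def cross_matches(truth, predicted):
    """Count (true, predicted) pairs that disagree, most frequent first."""
    return Counter((t, p) for t, p in zip(truth, predicted) if t != p).most_common()

def compare(truth, baseline, candidate):
    """Summarize what a classifier change gained and lost on the same test set."""
    triples = list(zip(truth, baseline, candidate))
    return {
        "baseline_accuracy": accuracy(truth, baseline),
        "candidate_accuracy": accuracy(truth, candidate),
        "fixed": sum(b != t and c == t for t, b, c in triples),
        "broken": sum(b == t and c != t for t, b, c in triples),
    }

# Invented held-out labels and predictions, for illustration only.
truth     = ["10.9",  "10.10", "10.10", "10.11"]
baseline  = ["10.10", "10.10", "10.9",  "10.11"]
candidate = ["10.9",  "10.10", "10.10", "10.10"]

report = compare(truth, baseline, candidate)
print(report)
print(cross_matches(truth, candidate))
```

Reporting "fixed" and "broken" separately matters because a change can
raise overall accuracy while still introducing new cross-matches between
neighboring versions.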

Dan

On Fri, Mar 18, 2016 at 5:09 PM, Tamim Addari <tamim.tamim1382 () gmail com>
wrote:

Hi,
I am Tamim and I am interested in the IPv6 machine learning project.
I have a question: does Nmap already use logistic regression to
classify IPv6? The page
https://nmap.org/book/osdetect-guess.html#osdetect-guess-ipv6 seems to
imply that it is already implemented. If so, what would be the project
goal? If not, I was wondering whether there is a choice between logistic
regression, support vector machines, decision trees, and other
techniques.
Thank you

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/

