Nmap Development mailing list archives

Re: GSoC IPv6 Machine Learning


From: David Fifield <david () bamsoftware com>
Date: Fri, 18 Mar 2016 14:09:26 -0700

On Fri, Mar 18, 2016 at 08:45:09PM +0000, João Godinho wrote:
> Good evening,
>
> I'm interested in applying for GSoC, specifically for the Machine Learning
> IPv6 OS detection and I was wondering if I can get more information about
> the task at hand, as well as share my thoughts on it.
>
> The way IPv6 OS detection is implemented (as seen in
> https://nmap.org/book/osdetect-guess.html#osdetect-guess-ipv6) seems pretty
> straightforward, but I haven't seen information on how well the model fits
> the data, is there any information relative to this?
> About the data itself, how large is the current set? Is it easy to generate
> new data? How were the features selected? This might be a good starting
> point for the project itself.

For feature selection and more technical information, there's a paper
from 2015:
https://www.bamsoftware.com/papers/ipv6-os.pdf

The data set (checking now) is a text file of 6,000 lines and 500 KB.
There are 301 samples in 96 classes.

The training data have been moved into a private part of the SVN
repository, so unfortunately it's not easy to access them. There's a
slightly old version here:
https://svn.nmap.org/nmap-exp/luis/ipv6tests/?p=34606
The main data file is nmap.groups. There's a README and there's more
information on running the programs here:
https://secwiki.org/w/Nmap/IPv6_OS_Integration

I thought I had a script somewhere for converting the nmap.groups file
to ARFF, for easier experimentation with standard ML tools, but I can't
find it. Anyway, the vectorize.py program produces a feature vector for
a single training sample and it shouldn't be too hard to adapt to other
output formats.
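
As a rough illustration of that adaptation (this is not the missing
script), here is a minimal Python 3 sketch of an ARFF writer. It assumes
each training sample has already been reduced to a class label plus a
list of numeric feature values, for example by wrapping vectorize.py.
The function names, the feature names "f0" and "f1", and the sample
values are all made up for the example; the real nmap.groups format and
the vectorize.py interface are not shown here.

```python
import io


def arff_quote(text):
    """Single-quote a name or label for ARFF, escaping \\ and '."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def write_arff(out, relation, feature_names, samples):
    """Write (label, vector) pairs to `out` as ARFF.

    Every feature is declared NUMERIC and the class label becomes a
    nominal attribute listing all labels seen in `samples`.
    """
    classes = sorted({label for label, _ in samples})
    out.write("@RELATION %s\n\n" % arff_quote(relation))
    for name in feature_names:
        out.write("@ATTRIBUTE %s NUMERIC\n" % arff_quote(name))
    out.write("@ATTRIBUTE class {%s}\n\n"
              % ",".join(arff_quote(c) for c in classes))
    out.write("@DATA\n")
    for label, vector in samples:
        if len(vector) != len(feature_names):
            raise ValueError("sample %r has %d values, expected %d"
                             % (label, len(vector), len(feature_names)))
        row = [repr(float(x)) for x in vector] + [arff_quote(label)]
        out.write(",".join(row) + "\n")


if __name__ == "__main__":
    # Hypothetical data: two samples with two made-up features each.
    demo = [("Linux 3.X", [64, 1.0]), ("Windows 7", [128, 0])]
    buf = io.StringIO()
    write_arff(buf, "ipv6os", ["f0", "f1"], demo)
    print(buf.getvalue(), end="")
```

Quoting every name and label keeps class names that contain spaces
valid in the ARFF header and data rows.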
