Nmap Development mailing list archives

Re: IPv6 fingerprint database imputation of missing values


From: Alexandru Geana <alex () alegen net>
Date: Mon, 7 Sep 2015 12:46:55 +0200

Hello list,

After some discussions on IRC, we decided to check what would make a
suitable value for the novelty threshold, assuming the imputation
feature is included in the training stage.

To achieve this, I wrote a script which trains a classifier 90 times
(there are 90 OS groups in the training set), each time excluding one
group. Then, for each of the prints in the excluded group,
classification is performed and the novelty score is saved. The idea
was that the minimum of these scores would point to a new value for
the threshold.
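
For reference, here is a rough Python sketch of the procedure. It is
not the attached script; train_classifier() and novelty_score() are
hypothetical stand-ins for the actual training and scoring code.

    # Sketch of the leave-one-group-out procedure described above.
    # train_classifier() and novelty_score() are hypothetical stand-ins
    # for the real training and scoring code in the attached script.

    def leave_one_group_out_scores(groups):
        """groups: dict mapping OS group name -> list of fingerprints."""
        scores = []
        for excluded_name, excluded_prints in groups.items():
            # Train on all groups except the excluded one.
            training_set = {name: prints for name, prints in groups.items()
                            if name != excluded_name}
            classifier = train_classifier(training_set)
            # Score every print of the excluded group as if it were unknown.
            for fp in excluded_prints:
                scores.append(novelty_score(classifier, fp))
        return scores

    # The minimum of these scores is the candidate lower bound for a
    # new novelty threshold:
    # min_score = min(leave_one_group_out_scores(groups))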

The results were the following:
*) without imputation, the smallest score is 3.95506283351 obtained by
the second print in group "Equinox CCM4850 ..." (index 14, 0-based).
*) with imputation, the smallest score is 4.68074027082 obtained by the
only print in group "HP OfficeJet 8500 printer" (index 40).

Both of these values are below the current threshold, so I looked a
bit deeper to find a good value for a new threshold. I made two
histogram plots of the scores. While both plots have a similar mean,
the bins of the imputed scores follow the distribution more closely.

Based on the plot for the imputed scores, I would suggest a new novelty
threshold of 25, which is roughly just before the highest bins start.
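
For completeness, this is roughly how such a histogram can be produced
with matplotlib. It assumes the pickle file simply contains a list of
scores, which may not match the actual layout of the attached files.

    import pickle
    import matplotlib.pyplot as plt

    # Assumes the pickle holds a plain list of novelty scores; the
    # attached files may be structured differently.
    with open('imputed_novelty_score_statistics.pickle', 'rb') as f:
        scores = pickle.load(f)

    plt.hist(scores, bins=50)
    plt.xlabel('novelty score')
    plt.ylabel('number of prints')
    plt.axvline(25, linestyle='--', label='proposed threshold')
    plt.legend()
    plt.show()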

I am attaching the script I used to perform the calculations, two
Python pickle files with the results and two images of the plots.

Best regards,
Alexandru Geana
alegen.net

On 06/30, Alexandru Geana wrote:
Hello list,

Since my last email I have been busy finding the right parameters for
imputation. Today I am submitting a new set of patches (minor
modifications) and explaining some of my findings.

One of the early issues I discovered was that there is a lot of
variability in fingerprinting and even more with imputation. My initial
workflow was to generate an imputed feature matrix, train the model on
it, recompile nmap and run a scan. This was not an optimal approach,
since I later found out that scanning the same host in the same
environment twice may yield different results with respect to the
reported accuracy. As a result, I changed my approach to reusing the
same fingerprint(s) and checking the results with the predict.py
script.

There was no straightforward way to determine which imputation method
to apply to which sets of features, or the adequate number of imputed
sets and iterations per set. An exhaustive search was too expensive, so
instead I considered the following:

1) What is the value range of a feature and how many distinct values
does it take? The answer tells me whether the feature can be treated
as continuous or categorical, and based on this choice I select the
imputation method (see the sketch after this list).

2) Post-imputation, what does predict.py print? Multiple things
influenced my decision here. I discovered that running predict.py with
the same test fingerprint but different imputed feature matrices would
yield different accuracy values and/or different reported OS classes.
My aim here was to "stabilize" the results, meaning that if I run
imputation 10 times and test the same print, I get 10 roughly similar
results back. Each feature is a bit different, and after some educated
trial and error I would find the adequate parameters.
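
To illustrate point 1), this is the kind of check I mean. The cutoff
of 10 distinct values is only an example, not the value actually used,
and MISSING is assumed to be represented as None.

    def suggest_feature_type(values, max_categories=10):
        """Decide how to treat a feature for imputation based on how
        many distinct observed values it takes. The cutoff of 10 is
        only an illustrative example."""
        observed = [v for v in values if v is not None]  # skip MISSING
        if len(set(observed)) <= max_categories:
            return 'categorical'  # e.g. TC, HLIM
        return 'continuous'       # e.g. TCP_WINDOW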

Categorical variables (e.g. TC, HLIM) are easier to stabilize than
continuous variables (e.g. TCP_WINDOW). When imputing the latter, the
accuracy can either drop badly or increase impressively from one run
to the next. To reduce this variability for continuous variables I
tried two things: a) for the purpose of imputation, replace MISSING
with the average of the class values, and b) when integrating the
labels into the imputed matrix, instead of having one column with
values ranging from 0 to $no_classes, have $no_classes columns with
values of 0 and 1 depending on which class a print belongs to (a
sketch of this follows below). While these did show improvements, the
overall performance was not satisfactory and the variability was still
too great.
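
As an illustration of option b), here is a small numpy sketch of
turning a single label column into one indicator column per class.
This is not the actual code used in the patches.

    import numpy as np

    def one_hot_labels(labels, no_classes):
        """Replace one column of class indices (0 .. no_classes - 1)
        with no_classes columns of 0/1 indicators."""
        labels = np.asarray(labels, dtype=int)
        indicators = np.zeros((labels.shape[0], no_classes), dtype=int)
        indicators[np.arange(labels.shape[0]), labels] = 1
        return indicators

    # Example: three prints belonging to classes 2, 0 and 1
    # one_hot_labels([2, 0, 1], 3) ->
    # [[0, 0, 1],
    #  [1, 0, 0],
    #  [0, 1, 0]]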

The source of the variability lies in the mice library, which performs
the actual imputation. I have not gone through the mice code itself,
but based on some of the papers I read (some of which I shared in my
previous email), there is a random initialization step which I believe
is the cause of the randomness. I have not yet had time to pin down
exactly where in the code this happens.
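
To make the point concrete, here is a toy Python illustration of why a
random initialization leads to run-to-run differences and how fixing
the seed or averaging several imputed sets damps them. This is a
generic illustration, not the mice algorithm itself.

    import random

    def impute_once(column, rng):
        """Toy imputation: fill MISSING (None) with a random draw from
        the observed values, mimicking a random initialization step.
        This is not the mice algorithm."""
        observed = [v for v in column if v is not None]
        return [v if v is not None else rng.choice(observed)
                for v in column]

    column = [10, None, 30, None, 50]

    # Different seeds -> different imputed matrices -> different
    # results downstream.
    print(impute_once(column, random.Random(1)))
    print(impute_once(column, random.Random(2)))

    # Two ways to stabilize: fix the seed, or average several imputed
    # sets.
    runs = [impute_once(column, random.Random(seed)) for seed in range(10)]
    averaged = [sum(vals) / float(len(vals)) for vals in zip(*runs)]
    print(averaged)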

Enough talk, now for some results. Below I show the results of
applying the complete imputation process 10 times, to demonstrate that
the results are generally stable. Each line lists the reported
accuracy, the novelty score and the predicted OS class.

1) Fedora VM

Without imputation:
8.17%  19.12 Linux 3.12 - 3.18

With imputation:
63.14%  23.66 Linux 3.12 - 3.18
22.57%  23.66 Linux 3.12 - 3.18
22.48%  23.66 Linux 3.12 - 3.18
22.80%  23.66 Linux 3.12 - 3.18
22.46%  23.66 Linux 3.12 - 3.18
22.45%  23.66 Linux 3.12 - 3.18
22.69%  23.66 Linux 3.12 - 3.18
22.52%  23.66 Linux 3.12 - 3.18
22.64%  23.66 Linux 3.12 - 3.18
22.37%  23.66 Linux 3.12 - 3.18

Minor increase in the novelty score, but a larger one with regards to
the accuracy. Due to the variability, the first entry shows a much
higher accuracy than the others, but the rest are quite stable.

2) scanme.nmap.org from a hetzner dedicated

Without imputation:
76.59%   5.58 Linux 3.13 - 3.19

With imputation:
85.50%  15.01 Linux 3.13 - 3.19
85.44%  15.01 Linux 3.13 - 3.19
85.35%  15.01 Linux 3.13 - 3.19
85.38%  15.01 Linux 3.13 - 3.19
86.92%  15.01 Linux 3.13 - 3.19
85.43%  15.01 Linux 3.13 - 3.19
85.47%  15.01 Linux 3.13 - 3.19
85.42%  15.01 Linux 3.13 - 3.19
97.34%  15.01 Linux 3.13 - 3.19
85.43%  15.01 Linux 3.13 - 3.19

This follows the same pattern as the previous result.

3) Windows 8.1 VM

Without imputation:
99.67%   2.96 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview

With imputation:
99.66%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.66%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.66%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.66%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.66%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.55%  15.27 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.66%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.66%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.67%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview
99.66%  16.46 Microsoft Windows Vista SP2 or Windows 7 SP1 or Windows Server 2008 R2 SP1 or Windows 8 Consumer Preview

The accuracy is already rather high and only the novelty score is
slightly increased.

You can see the results of these tests (e.g. nmap.model files, used
fingerprints) here:
Fedora: https://github.com/alegen/nmap/blob/7ce1a7791fd78ddf858824f2d6412021164ae7d1/ipv6tests/fedora_test.tar.gz
Scanme: https://github.com/alegen/nmap/blob/7ce1a7791fd78ddf858824f2d6412021164ae7d1/ipv6tests/scanme_test.tar.gz
Win 8: https://github.com/alegen/nmap/blob/7ce1a7791fd78ddf858824f2d6412021164ae7d1/ipv6tests/windows_test.tar.gz

Let me know what you think and if you have any further suggestions!

Best regards,
Alexandru Geana
alegen.net

Attachment: novelty_threshold.py
Description:

Attachment: imputed_novelty_score_statistics.pickle
Description:

Attachment: unimputed_novelty_score_statistics.pickle
Description:

Attachment: signature.asc
Description: Digital signature

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/
