Nmap Development mailing list archives
Novelty detection for IPv6 OS detection
From: David Fifield <david () bamsoftware com>
Date: Tue, 17 Jan 2012 17:26:16 -0800
One of the goals behind using a new algorithm for IPv6 OS detection was to be more robust in the face of inexact matches, and not require a proliferation of many fingerprints for the same OS under slightly different configurations and networks. But this could work a little too well, in the sense that the engine would make a guess even when faced with an OS it had never been trained on. (Think of an algorithm trained to distinguish dogs and cats that suddenly sees an octopus--it may give you some answer, but it won't be meaningful.)

I've added an additional check to the algorithm to see if an observed fingerprint is similar enough to the other fingerprints in the class that it matches. If it is too dissimilar, you won't get an OS guess, rather a fingerprint to submit.

Here is a comment from FPEngine.cc that contains some more technical information.

/* Return a measure of how much the given feature vector differs from the other
   members of the class given by label.

   This can be thought of as the distance from the given feature vector to the
   mean of the class in multidimensional space, after scaling. Each dimension is
   further scaled by the inverse of the sample variance of that feature. This is
   an approximation of the Mahalanobis distance
   (https://en.wikipedia.org/wiki/Mahalanobis_distance), which normally uses a
   full covariance matrix of the features. If we take the features to be
   pairwise independent (which they are not), then the covariance matrix is just
   the diagonal matrix containing per-feature variances, leading to the same
   calculation as is done below. Using only the per-feature variances rather
   than covariance matrices is to save space; it requires only n entries per
   class rather than n^2, where n is the length of a feature vector.

   It happens often that a feature's variance is undefined (because there is
   only one example in the class) or zero (because there are two identical
   values for that feature). Both these cases are mapped to zero by train.py,
   and we handle them the same way: by using a small default variance. This will
   tend to make small differences count a lot (because we probably want this
   fingerprint in order to expand the class), while still allowing near-perfect
   matches to match. */

David Fifield
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
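For readers who want to experiment with the idea, the distance described in the FPEngine.cc comment quoted in this message can be sketched roughly as follows. This is not the Nmap code. The function names, the DEFAULT_VARIANCE value, the final square root, and the threshold parameter are all assumptions made for illustration; the message does not give the actual constants. The sketch also assumes the feature vector has already been scaled the same way as the class means.

```python
import math

# Hypothetical fallback variance; the value Nmap really uses is not stated
# in the message.
DEFAULT_VARIANCE = 0.01


def novelty(features, means, variances, default_variance=DEFAULT_VARIANCE):
    """Diagonal-covariance approximation of the Mahalanobis distance.

    features  -- observed feature vector (assumed already scaled)
    means     -- per-feature mean of the matched class
    variances -- per-feature sample variance of the matched class, where
                 undefined or zero variance is stored as 0
    """
    if not (len(features) == len(means) == len(variances)):
        raise ValueError("all vectors must have the same length")
    total = 0.0
    for x, mu, var in zip(features, means, variances):
        # Undefined and zero variances both arrive as 0; substitute a small
        # default so that small differences count a lot.
        if var <= 0.0:
            var = default_variance
        d = x - mu
        total += d * d / var
    # Taking the square root is an assumption; it makes the result a distance
    # rather than a squared distance.
    return math.sqrt(total)


def is_novel(features, means, variances, threshold):
    """Return True if the observation is too far from the class to trust a guess.

    threshold is a hypothetical tuning parameter, not a value from Nmap.
    """
    return novelty(features, means, variances) > threshold
```

With class means [0.0, 0.0] and variances [4.0, 0.0], the observation [2.0, 0.1] contributes about 1 from each feature: the second feature has a zero stored variance, so the default applies and its small difference weighs as much as the larger difference in the first feature. An exact match gives a distance of 0 regardless of the variances.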