Nmap Development mailing list archives

Ideas: Nmap Fingerprint Analyzer


From: "Zaid Aiman" <zaidaiman () gmail com>
Date: Tue, 1 Apr 2008 23:28:25 +0800

Abstract
------------
TCP/IP OS fingerprinting is a method of identifying a machine's operating
system from its protocol stack implementation.
Nmap offers this method as one of its options and has its own OS
fingerprint (signature) database for reference.
This database is a set of signatures representing the TCP/IP stack
implementations of many operating systems.

Even with an outstanding signature database, Nmap sometimes fails
to detect the target machine's operating system and outputs an
implausible result.
This happens not because the database lacks information, but because of
the design of the matching algorithm itself.

This project proposes a new way to analyze the fingerprint
obtained from an Nmap OS fingerprinting run.
Two methods are proposed for analyzing the fingerprint:

1) Statistical field (correlation matrix & principal component analysis) and
neural network
2) Calculating distance (Euclidean distance) and neural network


The outcome of the project might include both methods, depending on how
they perform in practice.
This leaves users free to choose whichever method works best.
In fact, any matching algorithm can be implemented in this framework.

Ideas
--------
        The intention of this project is to improve the reliability of
operating system detection,
        rather than relying only on Nmap's own fingerprint analysis result.

        Here, I propose two new methods to analyze the fingerprint and map it
to a corresponding neural network.

        This lets users cross-check the result of Nmap's fingerprint analysis
against the project's methods.
        In addition, if UmitMapper is successfully integrated into Umit,
        users browsing the radial map could use this project to analyze which
operating system
        each machine in the map is running.

        The idea of using the statistical field to analyze the fingerprint
originally came from a presentation at the
        Hack In The Box Security Conference 2006 Kuala Lumpur (HITB secconf
2006 KL),
        titled "Using Neural Network for OS Detection", by Javier Burroni and
Carlos Sarraute.
        They have successfully integrated their idea into the commercial
product called "Core Impact".
        The downside of this commercial software is that it is too costly for
the average person to buy.
        Rather than purchasing it, I took up the challenge of creating a
fingerprint analyzer based on their
        method.

        The Euclidean distance idea came from one of the Umit developers,
who proposed using Euclidean distance
        to calculate the distance between the target machine's fingerprint
and the database signatures and passing the result to a
        neural network.

        Before getting into how the methods are implemented, I need to design
a universal
        framework in which any method can be implemented.

        The fingerprints used are Nmap 2nd generation fingerprints.

        Once a fingerprint is obtained, it is translated into an input
dimension (a vector) made up of
        0/1 values and some other numerical values:

        Nmap conducts 13 tests. To translate the
fingerprint into an input dimension:
                *This format applies to all tests, which are SEQ, OPS, WIN, ECN,
T1-T7, U1 and IE*

                1) Field names are not included in the input dimension, except
for OPS/option fields, which are explained later
                        ex: in SEQ(SP=C6%GCD=1%...), only the values of SP and GCD are translated

                2) Any field whose value is hexadecimal is converted to an integer
                        ex: SEQ(SP=C6%GCD=1...) will be [198, 1,...]

                3) Any field with more than one possible response gets a fixed number of
inputs, one per possible response
                        *Translation follows the order given in Nmap's os-detect documentation*
                        ex: the flags field of T1-T7 has 7 possible responses, which are [E,U,A,P,R,S,F]
                        ex: T1(...F=AS...) will be [0,0,1,0,0,1,0]
                        ex: in SEQ(...TI=Z%...), TI has [Z,RD,RI,BI,I,hex value]
                        ex: it will be [1,0,0,0,0,0]

                4) For the OPS test, the rules in 2) & 3) apply, with extra features.
                   OPS has 6 main values [L,N,M,W,T,S], some of which [M,W,T] carry sub-values
                        *Translation follows the order [L,N,M,W,T,S]*
                        ex: OPS(O1=M5B4ST11NW0%O2=M5B4ST11%...)
                        ex: O1=M5B4ST11NW0 will be [0,1,1,1460,1,0,1,1,1,1]
                        ex: O2=M5B4ST11 will be    [0,0,1,1460,0,0,1,1,1,1]

                5) T2-T7 have a field named O (options), to which the rules in 4) apply
                        *Translation follows the order [L,N,M,W,T,S]*
                        ex: T2(...O=M400CST11NW7%...)
                        ex: will be [0,1,1,16396,1,7,1,1,1,1]
                        ex: O=LNNT11 and O=T11LNN will both be [1,1,0,0,0,0,1,1,1,0],
since the order of the options within the string is ignored
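
        The translation rules above can be sketched in Python as follows.
        This is only an illustration reconstructed from the worked examples
        in this mail: the function names are hypothetical, and storing a
        numeric TI value as an integer in the last slot is an assumption.

```python
# Illustrative sketch; names and layout are assumptions, not the project's code.

TCP_FLAGS = "EUAPRSF"                       # order used for the F field of T1-T7
TI_CLASSES = ["Z", "RD", "RI", "BI", "I"]   # symbolic TI values; 6th slot is numeric
HEX_DIGITS = "0123456789ABCDEF"
SIMPLE_OPTIONS = {"L": 0, "N": 1, "S": 9}   # options without a sub-value


def encode_hex(value):
    """Rule 2: a hexadecimal field value becomes an integer."""
    return int(value, 16)


def encode_flags(value):
    """Rule 3: one slot per TCP flag, 1 if the flag letter is present."""
    return [1 if flag in value else 0 for flag in TCP_FLAGS]


def encode_ti(value):
    """Rule 3: one slot per symbolic TI value plus one slot for a hex value."""
    vec = [0] * (len(TI_CLASSES) + 1)
    if value in TI_CLASSES:
        vec[TI_CLASSES.index(value)] = 1
    else:
        vec[-1] = int(value, 16)            # assumption: keep the number itself
    return vec


def encode_options(value):
    """Rules 4 and 5: layout [L, N, M, M-value, W, W-value, T, TSval, TSecr, S]."""
    vec = [0] * 10
    i = 0
    while i < len(value):
        opt = value[i]
        i += 1
        if opt in "MW":                     # option followed by a hex number
            start = i
            while i < len(value) and value[i] in HEX_DIGITS:
                i += 1
            base = 2 if opt == "M" else 4
            vec[base] = 1
            vec[base + 1] = int(value[start:i], 16) if i > start else 0
        elif opt == "T":                    # option followed by two 0/1 digits
            vec[6] = 1
            vec[7], vec[8] = int(value[i]), int(value[i + 1])
            i += 2
        elif opt in SIMPLE_OPTIONS:
            vec[SIMPLE_OPTIONS[opt]] = 1
    return vec


print(encode_hex("C6"))                 # 198
print(encode_flags("AS"))               # [0, 0, 1, 0, 0, 1, 0]
print(encode_ti("Z"))                   # [1, 0, 0, 0, 0, 0]
print(encode_options("M5B4ST11NW0"))    # [0, 1, 1, 1460, 1, 0, 1, 1, 1, 1]
```

        Because each option only sets its own slot, repeated options and
        their order in the string have no effect on the result.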

        The neural network topology has 3 levels, forming a
hierarchy:
                Relevancy -> OS Family -> Corresponding OS version
                1) Relevancy: How likely is it that the target machine is running a known
operating system?
                2) OS Family: Which OS family does the target machine belong to?
                3) OS Version: Which OS version is the target machine running?

                Through relevancy, I can filter out anything that is not within my scope,
                ex: a router, a firewall, etc.
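
        The three levels can be chained as in the sketch below. Everything
        here is hypothetical: the callables stand in for trained networks,
        the 0.5 threshold is an assumed cut-off, and the labels are
        placeholders rather than real results.

```python
# Hypothetical sketch: each "net" is any callable that takes an input vector.
# relevancy_net returns a score in [0, 1], family_net returns a family name,
# and version_nets maps a family name to a callable returning a version label.

def classify(vector, relevancy_net, family_net, version_nets, threshold=0.5):
    """Run the three levels in order, stopping early if the target is out of scope."""
    if relevancy_net(vector) < threshold:
        return None                      # not a known OS, e.g. a router or firewall
    family = family_net(vector)
    version = version_nets[family](vector)
    return family, version


# Stand-in callables so the sketch runs; real ones would be trained networks.
def demo_relevancy(vector):
    return 0.9 if sum(vector) > 0 else 0.1


def demo_family(vector):
    return "Linux"


demo_versions = {"Linux": lambda vector: "2.6.x"}

print(classify([0, 1, 1, 1460], demo_relevancy, demo_family, demo_versions))
print(classify([0, 0, 0, 0], demo_relevancy, demo_family, demo_versions))
```

        Keeping one version-level network per OS family means each of them
        only has to separate the versions within that family.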

        With that framework, it is possible to implement methods
other than the two I propose.

        A) Statistical field and neural network method
           Before carrying out any mathematical operation, the input
dimension from the target machine is stacked with
           other fingerprints, according to the level it belongs to, and must
be normalized.
           ex: for OS Family, I selected Microsoft's Windows, Linux, Solaris, *BSD
and Apple's Mac OS.

                1) Correlation matrix
                        - By calculating the correlation between every pair of fields in the input dimension,
                          I can select the fields whose correlation is not equal to 0, where 0 means no relationship.
                        - This greatly reduces my input dimension; what is left are the
important fields.
                2) Principal component analysis
                        - This is used to calculate a new vector after the correlation matrix step.
                        - The number of output dimensions depends on the level it belongs to.
                        - ex: Relevancy has 96 output dimensions
                        - ex: Each OS family has a different number of output dimensions
                3) Neural network
                        - The result of the principal component analysis is passed
directly to the neural network
                          for final computation
                        - The number of neurons in the input, hidden and output layers
depends on which
                          level of the neural network it belongs to.
                        - ex: Relevancy has 96 inputs, 20 hidden neurons and 1 output
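
           The preprocessing steps of method A can be sketched with NumPy
           alone. This is one possible reading of the description above,
           not the project's code: min-max scaling is an assumed choice of
           normalization, the column filter is an assumed interpretation of
           "not equal to 0", and PCA is done here through an SVD rather
           than through MDP. All function names are hypothetical.

```python
import numpy as np


def normalize(X):
    """Min-max scale every column to [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                 # avoid dividing by zero
    return (X - lo) / span


def correlated_columns(X, eps=1e-6):
    """Indices of columns that vary and correlate with at least one other column."""
    varying = np.flatnonzero(X.std(axis=0) > 0)
    if varying.size < 2:
        return varying
    corr = np.corrcoef(X[:, varying], rowvar=False)
    np.fill_diagonal(corr, 0.0)           # ignore each column's self-correlation
    return varying[np.abs(corr).max(axis=0) > eps]


def pca(X, n_components):
    """Project the rows of X onto the first n_components principal axes."""
    mean = X.mean(axis=0)
    centered = X - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components


# Rows are stacked fingerprints, columns are fields (toy values only).
stacked = normalize([[0, 0, 0, 5],
                     [1, 0, 2, 5],
                     [0, 1, 0, 5],
                     [1, 1, 2, 5]])
keep = correlated_columns(stacked)
projected, mean, components = pca(stacked[:, keep], n_components=1)
print(keep, projected.shape)
```

           In this toy data the constant column and the column unrelated to
           the others are dropped; the projected rows are what would be fed
           to the input layer of the network.
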
        B) Euclidean distance and neural network method
           Before carrying out this operation, the input dimensions from the
target machine and the database are normalized.

                1) Euclidean distance
                        - Calculate the distance between the two input dimensions; the result is
passed to the neural network.
                2) Neural network
                        - The same as in A).3).
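
           A minimal sketch of the distance step, using only the standard
           library. The min-max normalization, the function names and the
           signature labels are assumptions made for illustration; the
           resulting distances are what would be handed to the network.

```python
import math


def min_max_scale(vectors):
    """Scale every dimension to [0, 1] across all vectors; constant ones become 0."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distances_to_signatures(target, signatures):
    """Normalize target and signatures together; return one distance per signature."""
    names = list(signatures)
    scaled = min_max_scale([target] + [signatures[name] for name in names])
    return {name: euclidean(scaled[0], vec) for name, vec in zip(names, scaled[1:])}


# Toy vectors with placeholder labels, not real Nmap signatures.
signatures = {"sig_a": [198, 1, 0], "sig_b": [0, 1, 1]}
print(distances_to_signatures([198, 1, 0], signatures))
```

           Normalizing first keeps large raw values, such as a window size,
           from dominating the distance over the 0/1 fields.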

        The training phase might need a lot of data, probably around 10,000 samples.
        This figure is only an assumption and depends on the
outcome of the project.

        The programming language used is Python.

        Several dependencies are used to implement
these methods:
        A) Statistical field and neural network method
                1) Numerical Python (Numpy) - http://numpy.scipy.org/
                2) Modular toolkit for Data Processing (MDP) -
http://mdp-toolkit.sourceforge.net/
                3) Python Fast Artificial Neural Network (PyFANN) - http://leenissen.dk/fann/
        B) Euclidean distance and neural network method
                1) Numerical Python (Numpy) - http://numpy.scipy.org/
                2) Python Fast Artificial Neural Network (PyFANN) - http://leenissen.dk/fann/

        There are problems installing PyFANN on Windows, as it requires
Visual Studio 2003 to compile
        the source code into a .dll file. When compiling the source code, I
ran into numerous bugs and
        decided to drop Windows support for a while in order to
implement the idea and prove that it can actually
        do what it is supposed to do. The idea has been successfully integrated on
Linux, using Nmap 1st generation fingerprints.

        Above all, given the framework, it is possible to integrate any neural
network library into the system.
        This makes it easy to swap the neural network library, despite the
dependency problems on certain operating systems.

        Since Umit uses Python as its primary programming language, and my
idea also uses Python, it is possible to integrate it into
        Umit as a plug-in. This has been done in the d3vscan project
(http://d3vscan.sf.net), where the fingerprint analyzer has been
        successfully integrated as one of its plug-ins.

        Screenshot: http://zaidaiman.meroketkebintang.com/?p=36

        Project limitations (for now):
                1) Assumes that the target machine is not running a firewall
                2) Neural network library compatibility with the operating system
                3) A vast number of fingerprints is needed to train the neural network

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

