Nmap Development mailing list archives
Ideas: Nmap Fingerprint Analyzer
From: "Zaid Aiman" <zaidaiman () gmail com>
Date: Tue, 1 Apr 2008 23:28:25 +0800
Abstract
------------
TCP/IP OS fingerprinting is a method of identifying a machine's operating system from its protocol stack implementation. Nmap offers this method as one of its options and keeps its own OS fingerprint (signature) database for reference. This database is a set of signatures, each representing the TCP/IP stack implementation of an operating system. Even with an outstanding signature database, Nmap sometimes fails to detect the target machine's operating system and outputs a ridiculous result. This does not happen because the database lacks information; it is due to the design of the matching algorithm itself.

This project proposes a new method for analyzing the fingerprint obtained after performing Nmap OS fingerprinting. There are two proposed methods:

1) Statistical analysis (correlation matrix & principal component analysis) and a neural network
2) Distance calculation (Euclidean distance) and a neural network

The outcome of the project might include both methods, depending on how they perform in real-world situations. This leaves users free to choose whichever method works best. In fact, any matching algorithm can be implemented within this framework.

Ideas
--------
The intention of this project is to improve the reliability of operating system detection, rather than relying only on the result of Nmap's fingerprint analysis. Here, I propose two new methods that analyze the fingerprint and map it to a corresponding neural network. This lets users confirm the result of Nmap's fingerprint analysis with the project's method. Plus, if UmitMapper is successfully integrated into Umit, users browsing through the radial map could use this project to analyze which operating system is running on the machines in the map.

The idea of applying statistical analysis to the fingerprint originally came from a presentation at the Hack In The Box Security Conference 2006 Kuala Lumpur (HITB secconf 2006 KL).
The presentation was titled "Using Neural Network for OS Detection" and was given by Javier Burroni and Carlos Sarraute. They have successfully integrated their idea into a commercial product called "Core Impact". The downside of this commercial software is that it is too costly for the average person to buy. Rather than purchasing it, I took up the challenge and created a fingerprint analyzer based on their method. The Euclidean distance approach came from one of the Umit developers, who proposed calculating the Euclidean distance between the target machine's fingerprint and the database signatures and passing the result to a neural network.

Before getting into how each method is implemented, I needed to design a universal framework in which any method can be implemented. The fingerprints used are from Nmap's 2nd generation OS fingerprinting. Once a fingerprint is obtained, it is translated into an input dimension: a vector of values from 0 to 1 plus some other numerical values.

Nmap conducts 13 tests. To translate the fingerprint into the input dimension:

*This format applies to all tests: SEQ, OPS, WIN, ECN, T1-T7, U1 and IE*

1) Field names are not included in the input dimension, except for OPS/options, which is explained later.
   ex: in SEQ(SP=C6%GCD=1%...), only the values of SP and GCD are translated
2) Any field value given in hex is converted to an integer.
   ex: SEQ(SP=C6%GCD=1...) becomes [198, 1,...]
3) Any field that can take more than one response is expanded into one input per possible response.
   *Translation follows the order given in Nmap's os-detect documentation*
   ex: the flags field of T1-T7 has 7 possible responses, [E,U,A,P,R,S,F]
   ex: T1(...F=AS...) becomes [0,0,1,0,0,1,0]
   ex: in SEQ(...TI=Z%...), TI has [Z,RD,RI,BI,I,hex value]
   ex: it becomes [1,0,0,0,0,0]
4) For the OPS test, the same rules as 2) & 3) apply, but with extra features.
   OPS has 6 main values [L,N,M,W,T,S], and some of them [M,W,T] carry sub-values, giving 10 inputs: [L, N, M, MSS value, W, window scale value, T, first T digit, second T digit, S]
   *Translation follows the order [L,N,M,W,T,S]*
   ex: OPS(O1=M5B4ST11NW0%O2=M5B4ST11%...)
   ex: O1=M5B4ST11NW0 becomes [0,1,1,1460,1,0,1,1,1,1]
   ex: O2=M5B4ST11 becomes [0,0,1,1460,0,0,1,1,1,1]
5) T2-T7 contain a field named O (options); the same rule as 4) applies.
   *Translation follows the order [L,N,M,W,T,S]*
   ex: T2(...O=M400CST11NW7%...)
   ex: becomes [0,1,1,16396,1,7,1,1,1,1]
   ex: O=LNNT11 and O=T11LNN both become [1,1,0,0,0,0,1,1,1,0], since only the presence of each option is recorded, not its position

As for the neural network topology, there are 3 levels in its hierarchy:

Relevancy -> OS Family -> Corresponding OS version

1) Relevancy: How likely is it that the target machine runs a known operating system?
2) OS Family: Which OS family does the target machine belong to?
3) OS Version: Which OS version does the target machine run?

Through the relevancy level, I can filter out anything that is not within my scope, e.g. a router, firewall, etc. With that framework, it is possible to implement other methods besides the two that I propose.

A) Statistical analysis and neural network method

Before carrying out any mathematical operation, the input dimension from the target machine is stacked with other fingerprints according to the level it belongs to, and the result must be normalized.
   ex: for OS Family, I select Microsoft Windows, Linux, Solaris, *BSD and Apple's Mac OS.

1) Correlation matrix
   - By calculating the correlation between the fields of the input dimension, I can keep the fields whose correlation is not equal to 0, where 0 means no relationship.
   - This greatly reduces the input dimension; what is left are the important fields.
2) Principal component analysis
   - This is used to compute new vectors from the fields kept by the correlation matrix step.
   - The number of output dimensions depends on which level it belongs to.
   - ex: Relevancy has 96 output dimensions
   - ex: Each OS family has a different number of output dimensions
3) Neural network
   - The result of the principal component analysis is passed directly to the neural network for the final computation.
   - The number of neurons in the input, hidden and output layers depends on which level the neural network belongs to.
   - ex: Relevancy has 96 inputs, 20 hidden neurons and 1 output

B) Euclidean distance and neural network method

Before carrying out this operation, the input dimensions from the target machine and from the database are normalized.

1) Euclidean distance
   - The distance between two input dimensions is calculated, and the result is passed to the neural network.
2) Neural network
   - Same as A).3).

The training phase might need a lot of data, probably around 10,000 samples. This number is only an estimate and depends on the outcome of the project.

The programming language used was Python. The following dependencies were used to implement the methods:

A) Statistical analysis and neural network method
1) Numerical Python (Numpy) - http://numpy.scipy.org/
2) Modular toolkit for Data Processing (MDP) - http://mdp-toolkit.sourceforge.net/
3) Python Fast Artificial Neural Network (PyFANN) - http://leenissen.dk/fann/

B) Euclidean distance and neural network method
1) Numerical Python (Numpy) - http://numpy.scipy.org/
2) Python Fast Artificial Neural Network (PyFANN) - http://leenissen.dk/fann/

There are problems installing PyFANN on Windows, as it requires Visual Studio 2003 to compile the source code into a .dll file. While compiling the source code, I encountered numerous bugs and decided to drop Windows support for a while, in order to implement the idea and prove that it actually does what it is supposed to do. The idea has been successfully implemented on Linux, using Nmap 1st generation fingerprints. Above all, given the framework, it is possible to integrate any neural network library into the system.
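The fingerprint translation rules (hex values converted to integers, multi-valued fields expanded into one input per possible value, and OPS/O option strings expanded into 10 values) can be sketched in Python as follows. This is a minimal illustration, not the analyzer's actual code: `parse_test`, `hex_to_int`, `encode_flags` and `encode_options` are hypothetical names, and the option grammar (L, N, S standalone; M and W followed by a hex value; T followed by two binary digits) is assumed from the examples in the rules.

```python
import re

# Order of the flags field in T1-T7, as listed in rule 3: [E,U,A,P,R,S,F].
FLAG_ORDER = "EUAPRSF"

# Assumed option grammar: L, N, S stand alone; M and W carry a hex value;
# T carries two binary digits.
_OPTION_TOKEN = re.compile(r"L|N|S|M[0-9A-F]+|W[0-9A-F]+|T[01]{2}")
_OPTION_SLOTS = ["L", "N", "M", "Mval", "W", "Wval", "T", "T1", "T2", "S"]


def parse_test(line: str) -> tuple[str, dict[str, str]]:
    """Split a test line such as 'SEQ(SP=C6%GCD=1)' into its name and fields."""
    name, _, body = line.partition("(")
    pairs = (f.split("=", 1) for f in body.rstrip(")").split("%") if f)
    return name, dict(pairs)


def hex_to_int(value: str) -> int:
    """Rule 2: hex field values become integers, e.g. 'C6' -> 198."""
    return int(value, 16)


def encode_flags(flags: str) -> list[int]:
    """Rule 3: one input per possible flag, e.g. 'AS' -> [0,0,1,0,0,1,0]."""
    return [1 if flag in flags else 0 for flag in FLAG_ORDER]


def encode_options(ops: str) -> list[int]:
    """Rules 4/5: encode an options string into 10 values,
    [L, N, M, MSS value, W, window scale, T, T digit 1, T digit 2, S].
    Only presence is recorded, so option order does not matter."""
    vec = dict.fromkeys(_OPTION_SLOTS, 0)
    pos = 0
    while pos < len(ops):
        match = _OPTION_TOKEN.match(ops, pos)
        if match is None:
            raise ValueError(f"unrecognized option data: {ops[pos:]!r}")
        token = match.group()
        kind = token[0]
        vec[kind] = 1  # mark the option as present
        if kind in "MW":
            vec[kind + "val"] = hex_to_int(token[1:])
        elif kind == "T":
            vec["T1"], vec["T2"] = int(token[1]), int(token[2])
        pos = match.end()
    return list(vec.values())


name, fields = parse_test("SEQ(SP=C6%GCD=1)")
print(name, [hex_to_int(fields["SP"]), hex_to_int(fields["GCD"])])  # SEQ [198, 1]
print(encode_options("M5B4ST11NW0"))  # [0, 1, 1, 1460, 1, 0, 1, 1, 1, 1]
```

The encoder reproduces every worked example in rules 2 to 5, including the identical vectors for O=LNNT11 and O=T11LNN.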
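For method A, the correlation-based field selection and the principal component analysis could look roughly like the sketch below. It uses plain NumPy (`np.corrcoef`, `np.cov`, `np.linalg.eigh`) in place of MDP. It also assumes that "normalize" means min-max scaling each field to [0, 1], and that a field is kept when it is non-constant and has a non-zero correlation with at least one other field. Both are interpretations rather than details given in the proposal, and the helper names are hypothetical.

```python
import numpy as np


def minmax_normalize(X: np.ndarray) -> np.ndarray:
    """Scale every column (field) of the stacked fingerprints to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant fields
    return (X - lo) / span


def select_fields(X: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return indices of fields that vary and correlate with some other field."""
    varying = np.flatnonzero(X.std(axis=0) > eps)  # constant fields have no correlation
    C = np.atleast_2d(np.corrcoef(X[:, varying], rowvar=False))
    np.fill_diagonal(C, 0.0)  # ignore each field's correlation with itself
    related = np.abs(C).max(axis=0) > eps
    return varying[related]


def pca(X: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Project X onto its n_components directions of largest variance."""
    Xc = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]  # orthonormal principal directions
    return Xc @ W, W
```

For the Relevancy level, `pca(X[:, keep], 96)` would give the 96-dimensional input mentioned above, provided at least 96 fields survive the selection step.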
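For method B, the distance step might be sketched as follows using only the standard library (`math.dist`, Python 3.8+). Two assumptions are made here. Normalization is taken to be min-max scaling computed over the database signatures plus the target. The vector of distances, one per signature, is taken to be what gets passed to the neural network. `distance_features` and its helpers are hypothetical names.

```python
import math


def minmax_bounds(rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-field minimum and maximum over a set of input dimensions."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(vec: list[float], lo: list[float], hi: list[float]) -> list[float]:
    """Min-max scale one vector; constant fields map to 0.0."""
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(vec, lo, hi)]


def distance_features(target: list[float], signatures: list[list[float]]) -> list[float]:
    """Euclidean distance from the normalized target to each normalized signature."""
    lo, hi = minmax_bounds(signatures + [target])
    t = scale(target, lo, hi)
    return [math.dist(t, scale(sig, lo, hi)) for sig in signatures]


signatures = [[0, 1, 1, 1460], [1, 1, 0, 0]]
print(distance_features([0, 1, 1, 1460], signatures))
```

In this example the first distance is exactly 0.0, because the target matches the first signature field for field.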
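The Relevancy -> OS Family -> OS Version hierarchy, together with the claim that any neural network library can be plugged into the framework, can be sketched as a small interface. This is hypothetical throughout. `Classifier`, `TinyMLP` and `classify` are illustrative names. `TinyMLP` is an untrained NumPy stand-in for a FANN network, not PyFANN's API. The version labels are placeholders. For simplicity every level receives the same feature vector, whereas in method A each level would use its own PCA projection.

```python
from typing import Optional, Protocol

import numpy as np


class Classifier(Protocol):
    """Any object with predict() can serve as a level (FANN or another library)."""

    def predict(self, x: np.ndarray) -> np.ndarray: ...


class TinyMLP:
    """Untrained single-hidden-layer network with sigmoid units (placeholder)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))

    def predict(self, x: np.ndarray) -> np.ndarray:
        hidden = 1.0 / (1.0 + np.exp(-(x @ self.W1)))
        return 1.0 / (1.0 + np.exp(-(hidden @ self.W2)))


def classify(
    x: np.ndarray,
    relevancy: Classifier,
    family: Classifier,
    family_names: list[str],
    versions: dict[str, tuple[Classifier, list[str]]],
    threshold: float = 0.5,
) -> Optional[tuple[str, str]]:
    """Relevancy -> OS family -> OS version; None if not a known OS."""
    if relevancy.predict(x)[0] < threshold:
        return None  # e.g. a router or firewall: outside the scope
    fam = family_names[int(np.argmax(family.predict(x)))]
    version_net, version_names = versions[fam]
    return fam, version_names[int(np.argmax(version_net.predict(x)))]


families = ["Windows", "Linux", "Solaris", "*BSD", "Mac OS"]
relevancy_net = TinyMLP(96, 20, 1)  # Relevancy: 96 inputs, 20 hidden, 1 output
family_net = TinyMLP(96, 20, len(families), seed=1)  # hidden size assumed
# Placeholder version labels; real ones would come from the signature database.
versions = {
    name: (TinyMLP(96, 20, 2, seed=i + 2), [f"{name} (version A)", f"{name} (version B)"])
    for i, name in enumerate(families)
}
x = np.full(96, 0.5)  # stand-in for a reduced fingerprint vector
print(classify(x, relevancy_net, family_net, families, versions))
```

Because `classify` only calls `predict()`, swapping `TinyMLP` for a wrapper around another neural network library does not change the cascade.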
Because the framework does not depend on a particular library, the neural network library can be swapped freely to work around dependency problems on a given operating system. Since Umit uses Python as its primary programming language, and my idea also uses Python, it is possible to integrate it into Umit as a plug-in. This has been done in the d3vscan project (http://d3vscan.sf.net), where the fingerprint analyzer has been successfully integrated as one of its plug-ins.

Screenshot: http://zaidaiman.meroketkebintang.com/?p=36

Project limitations (for now):
1) The target machine must not be running a firewall
2) Neural network library compatibility with the operating system
3) A vast number of fingerprints is needed to train the neural network

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org