Nmap Development mailing list archives
RE: nmap+V
From: "Jay Freeman \(saurik\)" <saurik () saurik com>
Date: Sun, 27 Aug 2000 00:52:10 -0500
I was going to reply to a bunch of these messages individually, but decided to just build one large message (as some of my comments crossed different posts and I couldn't figure out how to organize it best).

BTW, quick note about my e-mails: if you want to reply to some comment made in a paragraph, make sure I didn't retract said comment in a later paragraph. I find it much more informative to go through my entire thought process as opposed to laying down exactly what I want to say upfront and just making points out of it. Especially since I'm thinking up a lot of new stuff as I go along that I might not have taken into consideration earlier in the message, and then the fact that I kept forgetting many of the comments that were made and then went back and skimmed over the thread again... this e-mail is just a mess :) (but I hope extremely useful).

First off, about getting nmap+V onto new versions of nmap (or just plain newer, better versions of nmap+V): been busy :(. I'm a college student who has been trying to start a business this summer, and have been rather busy working on code for our prototypes, working on business plans, etc. The process seems like it will be coming to an end soon, hopefully with a positive result. Positive or negative, I should have some more time. Going to work at an office every day and then coming home later will at least define a separation between what is work and what isn't work, and I will have much more time to work on stuff like this within the next month. If this all falls through, at college I had more time to work on things, period.

My time is already freeing up, and what I will likely do is patch up the latest version of nmap+V with a few new version detections and bring it to the latest beta (pretty much what Max has already done, but I believe I made a few minor changes since I wrote 2.1 to the CVS server that I never released, and of course the new versions file).
I'm using CVS as my development platform, and have Fyodor as a vendor import, so it shouldn't be difficult for me to import the latest versions of nmap and have CVS do most of the work. I've mainly been putting it off as I hadn't realized nmap had changed enough internally to break my existing patches to the point of totally not working :). Maybe I will see what Max was mentioning with the SMTP code in the process.

Now on to the meat of the message. The first thing I would like to address is Paul's comments on "banner scanning". I was actually working on that a while back, but was trying to solve some more inherent problems in nmap+V before releasing it. Version 2.1 (which I never announced on the list) has that functionality through a new set of options I added for "extra" information.

One thing I do every now and then is take an entire subnet block, scan it for open 80's, and then run a little TCL script I wrote which yanks out the <title/> tag from the document. This normally happens when I need to update the reverse DNS for my subnet (as the people who are building web pages and allocating IP addresses never bother telling me what web pages are where, and even move them around... *grumble*). I wanted to add something like this to nmap+V, but didn't want to clutter the output coming from the server. I was going to make it another switch, but finally decided on a second V, so -sVV (or I believe -sV -sV) will not only get the version information from the server, but also return some extra information. 2.1 had the following 2 "extra" extensions:

  HTTP: <title/> tag of document
  IRC: network identifier (such as EfNet)

I liked the idea, but I didn't really like the end result. I finally decided to format it as extra lines that come after the port's information line... and, as I have mentioned in the past (and to some people outside the list), I didn't implement any of this for machine readable scans, as I have no idea how I would do that without breaking existing parsers.
I would likely have to escape out things like "/" and ",", and then I can't be guaranteed that people didn't hardcode in the number of fields, etc. As I don't parse that output myself for anything, I don't know it well, or what the documentation claims is fixed or not, and I have never used any of the nmap wrappers such as nmap-web which use it, either.

What I really wanted was an XML format, so I started working on that (this is before nmap-dev was even around, or the thread had started here). I noted that Fyodor had mentioned in a couple places that he was contemplating it, so started working on it myself. It would have been very easy, but I refused to just sit down and add it as an output to the existing code base. I agreed with the messages in this thread that nmap isn't modular enough, and that adding these new features is messing too much with nmap's core. The example I especially like to bring up is the Cisco scan that was sent to nmap-hackers a while back. I usually end up dealing with Cisco routers, and this definitely would be useful for me, but I also agree with Fyodor that adding a lot of these specific scans is going to take the current nmap to the point of total chaos.

A consideration here is what would run these XML documents. I wanted to use Apache's Xerces-C library. The power and flexibility of a complete XML DOM container seemed well worth having to require another library (as some dinky little nmap-xml parser likely wouldn't support XML quite right, or not have the richness of syntax DOM has for getting access to the existing data). Comments are definitely needed here....

The first thing I did was port nmap over to g++. I patched up a version of nmap that compiles using both gcc and g++ on Linux, FreeBSD, and Solaris (my common test platforms, as I have accounts on these machines with permission to run amok on my own dinky test server I scan all the time) and sent that off to Fyodor.
This way nmap could slowly be molded into a C++ platform with objects for different scans. I finally decided that this didn't solve any of the inherent problems, but it _WAS_ a starting point. I finally decided to rewrite nmap from scratch entirely in C++ using modules from the ground up. This is where I was about 2 months ago.

Reading the messages on the mailing list (hadn't been paying attention to it, good thing Fyodor sent me an e-mail or I likely never would have checked this folder) I was extremely surprised that people were talking about modularizing nmap. I'm reading it and thinking "WOW!!!! I'm already doing that :) !". When I stopped working on nmap+V and moved to the new version of nmap I decided an entirely new design was required, and started coding it.

What I have is a working module loader (source code modules, separate source files but not separate binary files... but I built it in such a way that I hope it could be changed later) that allows modules to load up and register scan types and miscellaneous option flags with the main binary, which then parses the arguments and sends the information off to the modules. I have that much already. My architecture was designed to make it as easy as possible to build scans, used STL whenever possible to make the manipulations easier, and had a global reference counting memory service which lets you ask for memory regions, request memory regions, etc.

What I was envisioning was that a version scan could be written as a scan module that would "require" a port scan. Each scan type would "require", "request", or "provide" information in this fashion. I wanted nmap to then take the option arguments for scans, run them through the system, have the modules activate the scans and tell nmap what it needed, and nmap would sort it.
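
To make the "require"/"request"/"provide" ordering concrete, here is a minimal sketch of what such a sorter might look like. It is written in Python for brevity rather than the C++ the design itself targets, and every name in it (`MODULES`, `run_order`, the module and resource names) is hypothetical, not taken from nmap or nmap+V. It assumes "require" is a hard dependency and "request" is a soft one that is honored only if some loaded module provides the resource.

```python
# Hypothetical module table, deliberately listed in the "wrong" order.
MODULES = {
    "title_extra": {"provides": ["http_titles"], "requires": ["versions"], "requests": []},
    "version_scan": {"provides": ["versions"], "requires": ["ports"], "requests": ["os"]},
    "port_scan": {"provides": ["ports"], "requires": [], "requests": []},
}


def run_order(modules):
    """Return module names ordered so every provider runs before its consumers."""
    # Map each resource name to the module that provides it.
    provider = {}
    for name, spec in modules.items():
        for resource in spec["provides"]:
            provider[resource] = name

    order = []
    state = {}  # module name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("circular requirement involving " + name)
        state[name] = "visiting"
        for resource in modules[name]["requires"]:
            if resource not in provider:
                raise LookupError(name + " requires " + resource + ", which nothing provides")
            visit(provider[resource])
        for resource in modules[name]["requests"]:
            # A "request" is optional: skip it silently when nothing provides it.
            if resource in provider:
                visit(provider[resource])
        state[name] = "done"
        order.append(name)

    for name in modules:
        visit(name)
    return order


print(run_order(MODULES))
```

With something like this in place the order of scan options on the command line would stop mattering, since the order is derived from what each module declares.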
I have the loader, the parser, and the executor; I don't have the sorter (so far been making sure my arguments are in the right order on the command line to make sure the commands run in the right order).

<outofplace> I was thinking that the current options that generate lists of ports and lists of hosts to scan could be separated into modules that add to a global resource area that was created by the scan subsystem, etc. I added this paragraph here after I mentioned that I support sub-modules, so just keep this in mind and continue the trek forward... (man, this e-mail is long). </outofplace>

I'm using the global memory allocation system to define what things look like. You can ask for memory regions from this system by name, so another module could provide the same information and you wouldn't realize that your port scan was coming from a file or some other such device.

I then decided I needed a hook to get people using this new system when I finished it, so I started a process of porting it to NT as no one had a good version of nmap that worked on NT. I didn't spend much time on this, as pretty much at that point I got sidetracked and started working on other things. I got back into working on all of this a couple weeks ago for a few days, and made some modifications, but then eEye released their NT port of nmap, which killed the project for the summer.

During the 2 or 3 days I was thinking about it and setting up the required libraries I decided the modules really needed to pass around the information in XML. I had a good reason at the time; but I don't remember what it was. Regardless, thinking about it right now, the memory loader really needs to just be an XML document vector. Scans would then register XML DOMs with the memory loader, which they can then request by name later (same as my current system in that respect). That way a new scan module can support old modules requiring "Scan1.0" and also provide "Scan1.1" by adding new XML fields.
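
The name-keyed lookup and the "Scan1.0"/"Scan1.1" idea above could be sketched roughly as follows. This is Python for brevity, uses a plain dictionary in place of an XML DOM, and the class and field names (`ResourceArea`, `open_ports`, `source`) are made up for illustration; only the two version names come from the text.

```python
class ResourceArea:
    """Name-keyed store standing in for the global memory service (hypothetical)."""

    def __init__(self):
        self._regions = {}

    def provide(self, names, data):
        # One provider may publish the same data under several versioned names.
        for name in names:
            self._regions[name] = data

    def request(self, name):
        # Soft lookup: returns None when nobody has provided the name.
        return self._regions.get(name)

    def require(self, name):
        # Hard lookup: a missing name is an error.
        if name not in self._regions:
            raise KeyError(name)
        return self._regions[name]


area = ResourceArea()
# A newer port-scan module that read its results from a file; consumers cannot
# tell the difference because they only ever ask by name.
area.provide(["Scan1.0", "Scan1.1"], {"open_ports": [22, 80], "source": "file"})

old_view = area.require("Scan1.0")   # old module only reads open_ports
new_view = area.require("Scan1.1")   # new module may also read the added field
print(old_view["open_ports"], new_view["source"])
```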
On second thought, you could just do this in the way most API sets do this by having a size/version field and then tacking new information to the end of the struct, which then doesn't break the existing modules... especially considering right now they are all source modules compiled into a single binary.

I also believe strongly in modularizing the version scan so that they don't clutter each other up, but the big problem I had with this is that one of the requirements Fyodor laid down both to me and more recently to the mailing list in general is that he is looking for something that can go parallel well. I haven't yet come across any way to separate out the version scan elements totally from each other, have an easily readable file which doesn't feel like it is linking the scans together, _and_ make it parallel without threads. I was thinking TCL for a while to give powerful logic (which would be really useful in some really complex protocols), but Fyodor wasn't excited about requiring TCL for nmap to run (and neither was I, actually).

With that in mind, I'd like more feedback/discussion on the actual format of the nmap-versions file. I implemented a really simple parser for it simply because I didn't want to spend _that_ much time on the parser as I wasn't even sure I could get it working in nmap to begin with. I agree with Fyodor that it looks like line noise. I wrote a parser for a simple language that looked like it had functions a while back (which allowed for recursive function execution) that I could merge into there, but this just led me back to the whole "maybe just use TCL?" problem. It almost seemed easier to make a VIM syntax file and do highlighting on the existing format (which I haven't done, so don't ask for it :) ) than redo the parser to emulate an actual language.
If it's just easier on the eyes people are looking for, then it isn't difficult at all for me to change those funny symbols into words, maybe add some ","'s, and come up with a BASIC like syntax:

  SetExtra 1, "<Title>", "<[tT][iI][tT][lL][eE]>(.*)</[tT][iI][tT][lL][eE]>"

Instead of:

  & 1 {<Title>} <[tT][iI][tT][lL][eE]>(.*)</[tT][iI][tT][lL][eE]>

What I really wanted to do is build these as sub-modules. My module loader either already has or was really close to having (I forgot which, I am 99% sure I already wrote this and tested it however, so I'm going to say "already has") a pluggable module loader (I got a little carried away). So not only can the main application load up the module loader and have modules request and provide stuff, but the actual modules can as well.

I was planning on using this in two main areas at first: OS detection and version scan. The current OS fingerprinting would then be a module of the OS scan module, which would provide "fingerprint", and require nothing. Depending on what its result was, maybe another module would want to go forward and continue, so it could require "fingerprint" and do something based on that. The one thing I don't think I put in there already is a good way for the individual OS scan modules to provide something back up to the main layer, but I wasn't really planning on that happening anyway (as the OS scan really ends up returning what OS the computer is using, regardless of methods required, and then the other scans can use the OS string if they want to do something with that). Another sub-module that would go here would be the aforementioned "Cisco scan" that had been mentioned on the list.

This sounds like it would be a great general solution to the problem, and allow for some _really_ powerful scans (as there could be sub-modules for doing the version scan that are written very differently and have a large amount of logic code)... but it isn't parallel.
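
For reference, this is roughly what the `SetExtra` title line quoted above amounts to when applied to a response. It is a Python sketch, not the nmap+V parser; the regular expression is copied from the example, while the reading of the first two arguments (sub-expression number and output label) and the function name `extract_extra` are assumptions.

```python
import re

# Pattern copied verbatim from the SetExtra example.
TITLE_PATTERN = re.compile(r"<[tT][iI][tT][lL][eE]>(.*)</[tT][iI][tT][lL][eE]>")


def extract_extra(group, label, pattern, response):
    """Return (label, captured text) or None if the pattern does not match.

    Assumes `group` is the sub-expression to report and `label` is the text
    printed in front of it; both are guesses about the file format.
    """
    match = pattern.search(response)
    if match is None:
        return None
    return (label, match.group(group))


# Made-up response used purely for illustration.
sample = "HTTP/1.0 200 OK\r\n\r\n<html><head><TITLE>Test Page</TITLE></head></html>"
print(extract_extra(1, "<Title>", TITLE_PATTERN, sample))
```

Note that `.` does not match a newline here, so a title split across lines would be missed by this pattern as written.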
Fyodor mentioned that the custom file formats at least could be extended to be parallel using many of the methods already in nmap for doing the port scans in parallel: open some ports, send some data, wait with select() for all of them, when data comes in deal with it, and when the connections are done with, open new ones in their place.

Luckily, the two ways of going about it aren't that disconnected. One of the sub-modules could be a rather powerful scanner that uses a custom file format and does most of the work, and then it would need to deal with separation in its file format. Version scan information would be nice to get into the main module, but if it can't be for some reason it could be separated into different sub-modules. This may actually be the best approach for dealing with building a simple file format, _and_ supporting binary formats.

A quick note about binary protocols: nmap+V was trying to be really general and through regex has support for binary information. People have brought this up in the past, and maybe I broke something with the ability to escape hex expressions in my parser (actually, I think I did...), but the file format itself supports it. You can scan off the binary data, and send it, using escaped hex. For the output, use 0 for the sub-expression to key off of, and just specify actual information. It's rather annoying, but I do believe it's fully capable. Maybe a command like "!" that just appends to the end of the existing buffer is needed (that way you don't have to have all of the information you need at once, so if you need to check 3 binary flags for different information you don't end up with a couple thousand different permutations :) ). I mentioned that to a few people before.

Version 2.1 actually does do what the nmap-web guy had been asking me for. I added a date conversion command ("d": more of these would be needed, so you could do integer conversion, stuff like that).
What would be _really_ nice is if you could really verify that it even was a semi-valid date, though. This is where a nice programming language really starts getting handy. Or doing what I believe nmap-web does (can't remember) and doing some time zone conversion. By the time you let a person do all of that you have a rather powerful programming language with variables and comparisons and you might as well trade up for an existing supported language (like TCL).

The sub-modules can also thread. While the one module is doing its select() calls off a custom file format, there isn't anything that should keep the Cisco scan from doing its job, as they don't (or at least, I'm currently assuming for the sake of this point, maybe they would for some reason) rely on each other.

I was originally looking at just going until someone found a match, but finally decided that the main OS scan would register some memory with a variable array (I was throwing classes in the memory subsystem, so maybe a struct/class with a vector in it or some such... needs more testing). The modules would then add their thoughts into the list along with how confident they were. Something that was very deterministic (the Cisco scan seemed like this, but I didn't examine it that much) might be able to state that without a doubt this is correct, while the OS scan is only somewhat sure of it.

This isn't quite enough, however. Next step: make that shared memory area a struct of a vector that holds smart pointers (oh, and since I couldn't find a cross-compiler smart pointer in STL I built my own) to a struct which contains a string with the guess, an integer tolerance (with some special flag values #define'd somewhere for standard tolerance levels), and then a vector of strings which contains names of modules that this new module definitely obsoletes (hehe).
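
A rough sketch of that shared guess list, in Python rather than C++ and with no smart pointers, might look like the following. The three fields (guess string, integer tolerance, list of obsoleted module names) follow the description above; the `module` field, the constant values, the `best_guess` helper, and all the sample strings are hypothetical additions for the sake of a runnable example.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the #define'd standard tolerance levels.
CERTAIN = 100
SOMEWHAT_SURE = 50


@dataclass
class Guess:
    module: str      # which module made the guess (assumed; needed to apply obsoletes)
    text: str        # the guess itself
    tolerance: int   # how confident the module is
    obsoletes: list = field(default_factory=list)  # module names this guess overrides


def best_guess(guesses):
    """Drop guesses from obsoleted modules, then pick the most confident one."""
    obsoleted = {name for g in guesses for name in g.obsoletes}
    live = [g for g in guesses if g.module not in obsoleted]
    return max(live, key=lambda g: g.tolerance, default=None)


# Made-up entries: a generic fingerprint, a deterministic router scan, and a
# follow-up module that refines and obsoletes the router scan's answer.
shared = [
    Guess("os_fingerprint", "router OS, generic", SOMEWHAT_SURE),
    Guess("cisco_scan", "router model X", CERTAIN),
    Guess("cisco_patch_scan", "router model X, patch level Y", CERTAIN, ["cisco_scan"]),
]
print(best_guess(shared).text)
```

Because obsoleting is recorded in the list instead of by deleting entries, the modules can run in any order, or in separate threads, and the override still applies when the list is read at the end.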
That way if a certain Cisco router could get some more information out of it (maybe a service patch level of the version, something like that), the new module could require the Cisco scan module, make sure that it is that router that was found, then obsolete the old entry and continue from that point. In this specific case it isn't that important as you could just unregister the Cisco scan's guess as it was required to happen before the new one, anyway; but there might be cases where you want to override the default modules (or someone else's, no matter how sure it was about what it was doing) but still want to run parallel to it in another thread.

This also brings up the question of what kind of information the version scan needs to get. I would definitely look for (separately) the name of the software, the protocol, and the version. Right now I am combining the version and the software name internally, but would like to get rid of that. A vector of patches/modules would also be useful (which would also cut out the need for either truncating the full HTTP module response, or even having to relegate it to some generic "extra information" system). This would all come out rather nice in a resultant XML document that described it (assuming that route was taken), and should be rather extensible.

When I get back to working on nmap more in 2 or 3 weeks, this is probably what I am going to be concentrating on: building the module loader, and porting the scans over to the modules. I will need to learn a few things. File parsing, regular sockets, regular expressions, and elaborate schemes for code/memory encapsulation through modules... these are things I am good at. I have never done any work with libpcap, though. I understand what the packets look like, and work with tcpdump and hping2 a bunch, so I hope libpcap doesn't become my sticking point.
As I don't fully understand how much of the parallel parts of nmap work, I might just need to get the architecture on a public CVS server and get some help with the actual initial modules. I don't like the idea of releasing a worthless program (something which is listed as an exception to the "release early, release often" principle) for outside modification, but if people are interested in working on it this will likely become a necessity.

Not sure how much Fyodor is into this idea, either. I like to keep my hopes down, so I have worst case scenario set at: "never in a million years, and I don't want the nmap name associated with it in any way, not even a link to my website" :). I don't really expect that response, of course. Tentatively I have called it nmap++ until I get some notice to rename it away from nmap, rename it to something related to nmap but not nmap++, rename it _to nmap_ (a best case scenario if the program starts to work and generate results as well as nmap does), or to totally stop working on it as people don't seem to be interested for one reason or another. I do envision a problem with requiring a few extra libraries, and a large problem from some people on the idea of using threads, so I am definitely keeping that last option on my radar.

As I mentioned, it would be useful to keep a custom file format around through all of this for version detection for purposes of easy expandability into at least the simpler protocols. I don't think anyone wants to have thousands of modules for all different protocols, and I also don't believe protocols such as SMTP or FTP currently need that much logic... unless we started scanning all sorts of information on what is there, but that could be done in a post-processing module: first the port scan, which leads to the protocol scan, which is then consumed by an FTP scan which pulls out what kind of files are on the server (this one doesn't sound that sane, does it...).
HTTP was a better example: instead of having extra information such as the <Title/> (or what the company is trying to sell, as was pointed out :) earlier in the thread), that could be a post-processing module. So if I really needed some information like that, I could add the command line option to load that module and have it rescan all the ports that had a web server on them for the <Title/>. This is going to require more server connections, as holding onto the connections for work such as this is going to lead to all sorts of problems passing data around, but two things are working for us here: hopefully you have permission to do the scan (hehe), and now that we _know_ it's a web server, we probably won't set off many bells or whistles connecting to it a second time and getting some more information. HTTP, FTP, IRC, SMTP... these servers tend to get hit a lot: that way, when we connect back in with well formed requests (not randomly sending data trying to look for an SQL server or something), it will likely not be any more noticeable than the protocol scan alone.

Well, I've pretty much exhausted all the points I can think of offhand, as well as most of my energy for the night. I'm going to end this here for now :).

Sincerely,
Jay Freeman (saurik)
saurik () saurik com