Nmap Development mailing list archives

RE: nmap+V

From: "Jay Freeman (saurik)" <saurik () saurik com>
Date: Sun, 27 Aug 2000 00:52:10 -0500
I was going to reply to a bunch of these messages individually, but decided
to just build one large message (as some of my comments crossed different
posts and I couldn't figure out how to organize it best).  BTW, quick note
about my e-mails: if you want to reply to some comment made in a paragraph,
make sure I didn't retract said comment in a later paragraph.  I find it
much more informative to go through my entire thought process as opposed to
laying down exactly what I want to say upfront and just making points out of
it.  Especially since I'm thinking up a lot of new stuff as I go along that
I might not have taken into consideration earlier in the message, and then
the fact that I kept forgetting many of the comments that were made and then
went back and skimmed over the thread again... this e-mail is just a mess :)
(but I hope extremely useful).

First off, about getting nmap+V onto new versions of nmap (or just plain
newer, better versions of nmap+V): been busy :(.  I'm a college student who
has been trying to start a business this summer, and have been rather busy
working on code for our prototypes, working on business plans, etc.  The
process seems like it will be coming to an end soon, hopefully with a
positive result.  Positive or negative, I should have some more time.  Going
to work at an office every day and then coming home later will at least
define a separation between what is work and what isn't work, and I will
have much more time to work on stuff like this within the next month.  If
this all falls through, I will be back at college, where I had more time to
work on things, period.

My time is already freeing up, and what I will likely do is patch up the
latest version of nmap+V with a few new version detections and bring it to
the latest beta (pretty much what Max has already done, but I believe I made
a few minor changes since I wrote 2.1 to the CVS server that I never
released, and of course the new versions file).  I'm using CVS as my
development platform, and have Fyodor as a vendor import, so it shouldn't be
difficult for me to import the latest versions of nmap and have CVS do most
of the work.  I've mainly been putting it off as I hadn't realized nmap had
changed enough internally to break my existing patches to the point of
totally not working :).  Maybe I will see what Max was mentioning with the
SMTP code in the process.

Now on to the meat of the message: what I would like to address first is
Paul's comments on "banner scanning".  I was actually working on that a while
back,
but was trying to solve some more inherent problems in nmap+V before
releasing it.  Version 2.1 (which I never announced on the list) has that
functionality through a new set of options I added for "extra" information.
One thing I do every now and then is take an entire subnet block, scan it
for open 80's, and then run a little TCL script I wrote which yanks out the
<title/> tag from the document.  This normally happens when I need to update
the reverse DNS for my subnet (as the people who are building web pages and
allocating IP addresses never bother telling me what web pages are where,
and even move them around... *grumble*).  I wanted to add something like
this to nmap+V, but didn't want to clutter the output coming from the
server.

I was going to make it another switch, but finally decided on a second V,
so -sVV (or I believe -sV -sV) will not only get the version information
from the server, but also return some extra information.  2.1 had the
following 2 "extra" extensions:

HTTP: <title/> tag of document
IRC: network identifier (such as EfNet)

I liked the idea, but I didn't really like the end result.  I finally
decided to format it as extra lines that come after the port's information
line... and, as I have mentioned in the past (and to some people outside the
list), I didn't implement any of this for machine readable scans, as I have
no idea how I would do that without breaking existing parsers.  I would
likely have to escape out things like "/" and ",", and then I can't be
guaranteed that people didn't hardcode in the number of fields, etc.  As I
don't parse that output myself for anything, I don't know it well (or what
the documentation claims is fixed), and I have never used any of the
nmap wrappers such as nmap-web which use it, either.
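
To make the escaping idea concrete, here is a minimal sketch in Python (not
nmap code).  The record layout and the helper names escape_field and
split_fields are hypothetical; the only point is that separators inside a
value are backslash-escaped, so a parser that splits on unescaped "/" still
sees a fixed number of fields.

```python
def escape_field(text, specials="/,"):
    """Backslash-escape the separator characters (and the backslash itself)."""
    out = []
    for ch in text:
        if ch == "\\" or ch in specials:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def split_fields(line, sep="/"):
    """Split on unescaped separators and undo the escaping in each field."""
    fields, current, chars = [], [], iter(line)
    for ch in chars:
        if ch == "\\":
            current.append(next(chars, ""))   # keep the escaped character
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# Hypothetical record: port/state/protocol/service/extra
title = "News, Weather/Sports"
record = "/".join(["80", "open", "tcp", "http", escape_field(title)])
print(record)
print(split_fields(record))
```

A parser that splits on every "/" without honoring the backslash would still
miscount the fields, which is exactly the compatibility worry raised above.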

What I really wanted was an XML format, so I started working on that (this
is before nmap-dev was even around, or the thread had started here).  I
noted that Fyodor had mentioned in a couple places that he was contemplating
it, so started working on it myself.  It would have been very easy, but I
refused to just sit down and add it as an output to the existing code base.
I agreed with the messages in this thread that nmap isn't modular enough,
and that adding these new features is messing too much with nmap's core.
The example I especially like to bring up is the Cisco scan that was sent to
nmap-hackers a while back.  I usually end up dealing with Cisco routers, and
this definitely would be useful for me, but I also agree with Fyodor that
adding a lot of these specific scans is going to take the current nmap to
the point of total chaos.

A consideration here is what would run these XML documents.  I wanted to use
Apache's Xerces-C library.  The power and flexibility of a complete XML DOM
container seemed well worth having to require another library
(as some dinky little nmap-xml parser likely wouldn't support XML quite
right, or not have the richness of syntax DOM has for getting access to the
existing data).  Comments are definitely needed here....

The first thing I did was port nmap over to g++.  I patched up a version of
nmap that compiles using both gcc and g++ on Linux, FreeBSD, and Solaris (my
common test platforms, as I have accounts on these machines with permission
to run amok on my own dinky test server I scan all the time) and sent that
off to Fyodor.  This way nmap could slowly be molded into a C++ platform
with objects for different scans.  I finally decided that this didn't solve
any of the inherent problems, but it _WAS_ a starting point.  I finally
decided to rewrite nmap from scratch entirely in C++ using modules from the
ground up.  This is where I was about 2 months ago.

Reading the messages on the mailing list (hadn't been paying attention to
it, good thing Fyodor sent me an e-mail or I likely never would have checked
this folder) I was extremely surprised that people were talking about
modularizing nmap.  I'm reading it and thinking "WOW!!!! I'm already doing
that :) !".  When I stopped working on nmap+V and moved to the new version
of nmap I decided an entirely new design was required, and started coding
it.

What I have is a working module loader (source code modules, separate source
files but not separate binary files... but I built it in such a way that I
hope it could be changed later) that allows modules to load up and register
scan types and miscellaneous option flags with the main binary, which then
parses the arguments and sends the information off to the modules.  I have
that much already.  My architecture was designed to make it as easy as
possible to build scans, used STL whenever possible to make the
manipulations easier, and had a global reference counting memory service
which lets you ask for memory regions, request memory regions, etc.

What I was envisioning was that a version scan could be written as a scan
module that would "require" a port scan.  Each scan type would "require",
"request", or "provide" information in this fashion.  I wanted nmap to then
take the option arguments for scans, run them through the system, have the
modules activate the scans and tell nmap what it needed, and nmap would sort
it.  I have the loader, the parser, and the executer; I don't have the
sorter (so far been making sure my arguments are in the right order on the
command line to make sure the commands run in the right order).
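
The missing sorter is essentially a topological sort over the "require" and
"provide" declarations.  A minimal sketch in Python (not the nmap++ code
itself): the module names and the MODULES table are made up for
illustration, only "require" and "provide" are modeled (the optional
"request" is left out), and graphlib is the Python standard library
(3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical module table: what each scan needs before it can run
# ("requires") and what it makes available afterwards ("provides").
MODULES = {
    "version_scan": {"requires": ["open_ports"], "provides": ["versions"]},
    "port_scan":    {"requires": ["targets"],    "provides": ["open_ports"]},
    "target_list":  {"requires": [],             "provides": ["targets"]},
}


def run_order(modules):
    """Order modules so that every provider runs before its consumers."""
    provider = {}
    for name, spec in modules.items():
        for resource in spec["provides"]:
            provider[resource] = name
    graph = {}
    for name, spec in modules.items():
        missing = [r for r in spec["requires"] if r not in provider]
        if missing:
            raise LookupError(f"{name} requires unprovided {missing}")
        # Each module depends on the modules that provide what it requires.
        graph[name] = {provider[r] for r in spec["requires"]}
    return list(TopologicalSorter(graph).static_order())


print(run_order(MODULES))
```

With something like this in place, the order of the scan options on the
command line would no longer matter.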

<outofplace>
I was thinking that the current options that generate lists of ports and
lists of hosts to scan could be separated into modules that add to a global
resource area that was created by the scan subsystem, etc.  I added this
paragraph here after I mentioned that I support sub-modules, so just keep
this in mind and continue the trek forward... (man, this e-mail is long).
</outofplace>

I'm using the global memory allocation system to define what things look
like.  You can ask for memory regions from this system by name, so another
module could provide the same information and you wouldn't realize that your
port scan was coming from a file or some other such device.
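
A rough sketch of that lookup-by-name idea, in Python rather than C++.  The
Regions class, the region name "open_ports", and the three module functions
are hypothetical, and the reference counting mentioned earlier is not
modeled; the sketch only shows that the consumer cannot tell which module
filled the region in.

```python
class Regions:
    """Toy stand-in for a global store of named, shared results."""

    def __init__(self):
        self._regions = {}

    def provide(self, name, value):
        self._regions[name] = value

    def request(self, name):
        if name not in self._regions:
            raise KeyError(f"no module has provided {name!r}")
        return self._regions[name]


def port_scan_module(regions):
    # Pretend a real scan happened and these ports were found open.
    regions.provide("open_ports", [22, 80])


def port_file_module(regions, lines):
    # Same region name, but the data comes from a file-like list of lines.
    regions.provide("open_ports", [int(l) for l in lines if l.strip()])


def version_scan_module(regions):
    # The consumer only knows the region name, not who filled it in.
    return [f"probing port {p}" for p in regions.request("open_ports")]


scanned = Regions()
port_scan_module(scanned)
print(version_scan_module(scanned))

loaded = Regions()
port_file_module(loaded, ["443\n", "\n"])
print(version_scan_module(loaded))
```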

I then decided I needed a hook to get people using this new system when I
finished it, so I started a process of porting it to NT as no one had a good
version of nmap that worked on NT.  I didn't spend much time on this, as
pretty much at that point I got side tracked and started working on other
things.  I got back into working on all of this a couple weeks ago for a few
days, and made some modifications, but then eEye released their NT port of
nmap, which killed the project for the summer.

During the 2 or 3 days I was thinking about it and setting up the required
libraries I decided the modules really needed to pass around the information
in XML.  I had a good reason at the time; but I don't remember what it was.
Regardless, thinking about it right now, the memory loader really needs to
just be an XML document vector.  Scans would then request XML DOMs from the
memory loader, which they can then request by name later (same as my current
system in that respect).  That way a new scan module can support old modules
requiring "Scan1.0" and also provide "Scan1.1" by adding new XML fields.  On
second thought, you could just do this in the way most API sets do this by
having a size/version field and then tacking new information to the end of
the struct, which then doesn't break the existing modules... especially
considering right now they are all source modules compiled into a single
binary.
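
The size/version-field approach can be sketched as follows, using Python's
struct module as a stand-in for a C struct.  The field names (port, flags)
and both layouts are invented for illustration; the point is only that an
old reader ignores whatever was tacked onto the end, while a new reader
checks the size field before touching the extra data.

```python
import struct

V1 = struct.Struct("<HH")    # size, port
V2 = struct.Struct("<HHI")   # size, port, plus a new field tacked on the end


def pack_v2(port, flags):
    """Build a new-style record; the first field records its own size."""
    return V2.pack(V2.size, port, flags)


def read_v1(blob):
    """An old module: reads only the fields it knows about."""
    size, port = V1.unpack_from(blob)
    if size < V1.size:
        raise ValueError("record too short for the v1 layout")
    return port


def read_v2(blob):
    """A new module: uses the size field to see if the extra field exists."""
    size, port = V1.unpack_from(blob)
    if size >= V2.size:
        flags = struct.unpack_from("<I", blob, V1.size)[0]
    else:
        flags = None   # record was written by an old module
    return port, flags


new_record = pack_v2(80, 1)
old_record = V1.pack(V1.size, 80)
print(read_v1(new_record), read_v2(new_record), read_v2(old_record))
```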

I also believe strongly in modularizing the version scan so that they don't
clutter each other up, but the big problem I had with this is that one of
the requirements Fyodor laid down both to me and more recently to the
mailing list in general is that he is looking for something that can go
parallel well.  I haven't yet come across any way to separate out the
version scan elements totally from each other, have an easily readable file
which doesn't feel like it is linking the scans together, _and_ make it
parallel without threads.  I was thinking TCL for a while to give powerful
logic (which would be really useful in some really complex protocols), but
Fyodor wasn't excited about requiring TCL for nmap to run (and neither was
I, actually).

With that in mind, I'd like more feedback/discussion on the actual format of
the nmap-versions file.  I implemented a really simple parser for it simply
because I didn't want to spend _that_ much time on the parser as I wasn't
even sure I could get it working in nmap to begin with.  I agree with Fyodor
that it looks like line noise.  I wrote a parser for a simple language that
looked like it had functions a while back (which allowed for recursive
function execution) that I could merge into there, but this just led me
back to the whole "maybe just use TCL?" problem.  It almost seemed easier to
make a VIM syntax file and do highlighting on the existing format (which I
haven't done, so don't ask for it :) ) than redo the parser to emulate an
actual language.  If it's just easier on the eyes people are looking for,
then it isn't difficult at all for me to change those funny symbols into
words, maybe add some ","'s, and come up with a BASIC-like syntax:

SetExtra 1, "<Title>", "<[tT][iI][tT][lL][eE]>(.*)</[tT][iI][tT][lL][eE]>"

Instead of:

& 1 {<Title>} <[tT][iI][tT][lL][eE]>(.*)</[tT][iI][tT][lL][eE]>
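
For readers who want to see what that pattern captures, here is the same
regular expression run through Python's re module.  Python's regex engine is
only a stand-in here (it is an assumption that it behaves like the one
nmap+V uses for this pattern), and extract_title and the sample page are
made up for the example.

```python
import re

# The pattern from the two lines above, unchanged.
TITLE = re.compile(r"<[tT][iI][tT][lL][eE]>(.*)</[tT][iI][tT][lL][eE]>")


def extract_title(body):
    """Return the text of sub-expression 1, or None if nothing matched."""
    match = TITLE.search(body)
    return match.group(1) if match else None


page = "<html><head><TITLE>Test Server</TITLE></head><body>hi</body></html>"
print(extract_title(page))
```

Note that "." does not match a newline by default, so a title that spans
several lines would not be captured by this pattern as written.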

What I really wanted to do is build these as sub-modules.  My module loader
either already has or was really close to having (I forgot which, I am 99%
sure I already wrote this and tested it however, so I'm going to say
"already has") a pluggable module loader (I got a little carried away).  So
not only can the main application load up the module loader and have modules
request and provide stuff, but the actual modules can as well.  I was
planning on using this in two main areas at first:  OS detection and version
scan.

The current OS fingerprinting would then be a module of the OS scan module,
which would provide "fingerprint", and require nothing.  Depending on what
its result was, maybe another module would want to go forward and continue,
so it could require "fingerprint" and do something based on that.  The one
thing I don't think I put in there already is a good way for the individual
OS scan modules to provide something back up to the main layer, but I wasn't
really planning on that happening anyway (as the OS scan really ends up
returning what OS the computer is using, regardless of methods required, and
then the other scans can use the OS string if they want to do something with
that).  Another sub-module that would go here would be the aforementioned
"Cisco scan" that had been mentioned on the list.

This sounds like it would be a great general solution to the problem, and
allow for some _really_ powerful scans (as there could be sub-modules for
doing the version scan that are written very differently and have a large
amount of logic code)... but it isn't parallel.  Fyodor mentioned that the
custom file formats at least could be extended to being parallel using many
of the methods already in nmap for doing the port scans parallel: open some
ports, send some data, wait with select() for all of them, when data comes
in deal with it, and when the connections are done with, open new ones in
their place.  Luckily, the two ways of going about it aren't that
disconnected.
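
A minimal sketch of that select() loop, in Python.  This is not how nmap
does it internally; parallel_banners, its parameters, and the fake_connect
helper are assumptions made for the example.  The connect step is passed in
as a function so the loop can be tried without touching a network, and one
recv() per connection is a simplification (a real banner may arrive in
several pieces).

```python
import select
import socket
from collections import deque


def parallel_banners(targets, connect, max_parallel=4, timeout=2.0):
    """Read one banner per target, keeping at most max_parallel sockets open."""
    pending = deque(targets)
    active = {}    # open socket -> target it belongs to
    banners = {}
    while pending or active:
        # Top up the window: open new connections in place of finished ones.
        while pending and len(active) < max_parallel:
            target = pending.popleft()
            active[connect(target)] = target
        readable, _, _ = select.select(list(active), [], [], timeout)
        if not readable:
            # Nothing answered in time: give up on the sockets now open.
            for sock, target in active.items():
                banners[target] = b""
                sock.close()
            active.clear()
            continue
        for sock in readable:
            target = active.pop(sock)
            banners[target] = sock.recv(4096)
            sock.close()
    return banners


def fake_connect(target):
    """Stand-in for a real connect(): a local socket pair, canned banner."""
    ours, theirs = socket.socketpair()
    theirs.sendall(f"220 {target} ready\r\n".encode())
    theirs.close()
    return ours


print(parallel_banners(["a", "b", "c"], fake_connect, max_parallel=2))
```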

One of the sub-modules could be a rather powerful scanner that uses a custom
file format and does most of the work, and then it would need to deal with
separation in its file format.  Version scan information would be nice to
get into the main module, but if it can't be for some reason it could be
separated into different sub-modules.  This may actually be the best
approach for dealing with building a simple file format, _and_ supporting
binary formats.

A quick note about binary protocols:  nmap+V was trying to be really general
and through regex has support for binary information.  People have brought
this up in the past, and maybe I broke something with the ability to escape
hex expressions in my parser (actually, I think I did...), but the file
format itself supports it.  You can scan off the binary data, and send it,
using escaped hex.  For the output, use 0 for the sub-expression to key off
of, and just specify actual information.  It's rather annoying, but I do
believe it's fully capable.  Maybe a command like "!" that just appends to
the end of the existing buffer is needed (that way you don't have to have
all of the information you need at once, so if you need to check 3 binary
flags for different information you don't end up with a couple thousand
different permutations :) ).  I mentioned that to a few people before.

Version 2.1 actually does do what the nmap-web guy had been asking me for.
I added a date conversion command ("d": more of these would be needed, so
you could do integer conversion, stuff like that).  What would be _really_
nice is if you could really verify that it even was a semi-valid date, though.
This is where a nice programming language really starts getting handy.  Or
doing what I believe nmap-web does (can't remember) and do some time zone
conversion.  By the time you let a person do all of that you have a rather
powerful programming language with variables and comparisons and you might
as well trade up for an existing supported language (like TCL).
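
As an illustration of what an existing language buys you here, a short
Python sketch that checks whether a date string parses at all and converts
it to UTC.  The function name normalize_date is made up, treating a date
without zone information as UTC is an assumption, and this is not the "d"
command from nmap+V.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def normalize_date(raw):
    """Return the date in UTC as ISO 8601, or None if it does not parse."""
    try:
        parsed = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: a date without zone information is taken as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


print(normalize_date("Sun, 27 Aug 2000 00:52:10 -0500"))
print(normalize_date("not a date"))
```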

The sub-modules can also thread.  While the one module is doing its select()
calls off a custom file format, there isn't anything that should keep the
Cisco scan from doing its job, as they don't (or at least, I'm currently
assuming for the sake of this point, maybe they would for some reason) rely
on each other.  I was originally looking at just going until someone found a
match, but finally decided that the main OS scan would register some memory
with a variable array (I was throwing classes in the memory subsystem, so
maybe a struct/class with a vector in it or some such... needs more
testing).  The modules would then add their thoughts into the list along
with how confident they were.  Something that was very deterministic (the
Cisco scan seemed like this, but I didn't examine it that much) might be
able to state that without a doubt this is correct, while the OS scan is
only somewhat sure of it.

This isn't quite enough, however.  Next step: make that shared memory area a
struct of a vector that holds smart pointers (oh, and since I couldn't find
a cross-compiler smart pointer in STL I built my own) to a struct which
contains a string with the guess, an integer tolerance (with some special
flag values #define'd somewhere for standard tolerance levels), and then a
vector of strings which contains names of modules that this new module
definitely obsoletes (hehe).  That way if a certain Cisco router could get
some more information out of it (maybe a service patch level of the version,
something like that), the new module could require the Cisco scan module,
make sure that it is that router that was found, then obsolete the old entry
and continue from that point.  In this specific case it isn't that important
as you could just unregister the Cisco scan's guess as it was required to
happen before the new one, anyway; but there might be cases where you want
to override the default modules (or someone else's, no matter how sure it
was about what it was doing) but still want to run parallel to it in another
thread.
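
The guess list described above could look roughly like this.  It is a Python
sketch, so a dataclass stands in for the struct and plain lists for the
vector of smart pointers; the names Guess, CERTAIN, and best_guess, the
0..100 tolerance scale, and the module names in the example are all
hypothetical.

```python
from dataclasses import dataclass, field

CERTAIN = 100   # hypothetical "without a doubt" tolerance level


@dataclass
class Guess:
    module: str        # name of the module that made the guess
    text: str          # the guess itself
    tolerance: int     # how confident the module is (0..100)
    obsoletes: list = field(default_factory=list)   # modules it overrides


def best_guess(guesses):
    """Drop guesses from obsoleted modules, then take the most confident."""
    obsolete = {name for g in guesses for name in g.obsoletes}
    live = [g for g in guesses if g.module not in obsolete]
    return max(live, key=lambda g: g.tolerance) if live else None


guesses = [
    Guess("os_fingerprint", "some router OS", 60),
    Guess("cisco_scan", "Cisco router", CERTAIN),
    Guess("cisco_patch", "Cisco router, patch level known", 90,
          obsoletes=["cisco_scan"]),
]
print(best_guess(guesses).text)
```

Because the obsoleting entry names the module it overrides, the two modules
can run in parallel and the result is the same whichever finishes first.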

This also brings up the point about what kind of information does the
version scan need to get.  I would definitely look for (separately) the name
of the software, the protocol, and the version.  Right now I am combining
the version and the software name internally, but would like to get rid of
that.  A vector of patches/modules would also be useful (which would also
cut out the need for either truncating the full HTTP module response, or
even having to relegate it to some generic "extra information" system).
This would all come out rather nice in a resultant XML document that
described it (assuming that route was taken), and should be rather
extensible.
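
A sketch of what such a record might look like as XML, built with Python's
xml.etree.ElementTree.  The element and attribute names (service, software,
version, module) and the sample values are invented for illustration; no
such schema has been agreed on.

```python
import xml.etree.ElementTree as ET


def service_to_xml(port, protocol, software, version, modules=()):
    """Keep name, protocol, version, and modules as separate fields."""
    node = ET.Element("service", port=str(port), protocol=protocol)
    ET.SubElement(node, "software").text = software
    ET.SubElement(node, "version").text = version
    for name in modules:
        ET.SubElement(node, "module").text = name
    return ET.tostring(node, encoding="unicode")


doc = service_to_xml(80, "http", "ExampleServer", "1.0",
                     ["mod_example/0.1", "mod_other/2.3"])
print(doc)
```

Adding a field later means adding an element, which parsers written against
the older layout can simply ignore.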

When I get back to working on nmap more in 2 or 3 weeks, this is probably
what I am going to be concentrating on: building the module loader, and
porting the scans over to the modules.  Will need to learn a few things.
File parsing, regular sockets, regular expressions, and elaborate schemes
for code/memory encapsulation through modules... these are things I am good
at.  I have never done any work with libpcap.  I understand what the packets
look like, and work with tcpdump and hping2 a bunch, so I hope libpcap
doesn't become my sticking point.  As I don't fully understand how much of
the parallel parts of nmap work I might just need to get the architecture on
a public CVS server and get some help with the actual initial modules.  I
don't like the idea of releasing a worthless program (something which is
listed as an exception to the "release early, release often" principle) for
outside modification, but if people are interested in working on it this
will likely become a necessity.

Not sure how much Fyodor is into this idea, either.  I like to keep my hopes
down, so I have worst case scenario set at: "never in a million years, and I
don't want the nmap name associated with it in any way, not even a link to
my website" :).  I don't really expect that response, of course.
Tentatively I have called it nmap++ until I get some notice to rename it
away from nmap, rename it to something related to nmap but not nmap++,
rename it _to nmap_ (a best case scenario if the program starts to work and
generate results as well as nmap does), or to totally stop working on it as
people don't seem to be interested for one reason or another.  I do envision
a problem with requiring a few extra libraries, and a large problem from
some people on the idea of using threads, so I am definitely keeping that
last option on my radar.

As I mentioned, it would be useful to keep a custom file format around
through all of this for version detection for purposes of easy expandability
into at least the simpler protocols.  I don't think anyone wants to have
thousands of modules for all different protocols, and I also don't believe
protocols such as SMTP or FTP currently need that much logic... unless we
started scanning all sorts of information on what is there, but that could
be done in a post processing module:  first the port scan, which leads to
the protocol scan, which is then consumed by an FTP scan which pulls out
what kind of files are on the server (this one doesn't sound that sane, does
it...).

HTTP is a better example: instead of being built in as extra information,
fetching the <Title/> (or what the company is trying to sell, as was pointed
out :) earlier in the thread) could be a post-processing module.  So if I
really needed some information like that, I could add the command line
option to load that module and have it rescan all the ports that had a web
server on them for the <Title/>.  This is going to require more server
connections, as holding onto the connections for work such as this is going
to lead to all sorts of problems passing data around, but two things are
working for us here:  hopefully you have permission to do the scan (hehe),
and now that we _know_ it's a web server, we probably won't set off many
bells or whistles connecting to it a second time and getting some more
information.  HTTP, FTP, IRC, SMTP... these servers tend to get hit a lot,
so when we connect back in with well-formed requests (not randomly
sending data trying to look for an SQL server or something), it will likely
not be any more noticeable than the protocol scan alone.

Well, I've pretty much exhausted all the points I can think of offhand, as
well as most of my energy for the night.  I'm going to end this here for now
:).

Sincerely,
Jay Freeman (saurik)
saurik () saurik com

