Nmap Development mailing list archives

Re: [RFC] Improve NSE HTTP architecture.


From: Fyodor <fyodor () insecure org>
Date: Thu, 16 Jun 2011 17:17:50 -0700

On Tue, Jun 14, 2011 at 02:46:55PM +0100, Djalal Harouni wrote:

Henri and I started thinking about this at the beginning of GSoC, and
then I wrote this proposal to discuss and address the current
limitations of the NSE HTTP architecture and how we can improve it,
taking Nmap and NSE properties into consideration.

Hi Djalal, thanks for writing this up.  This covers a number of areas,
and there are definitely some good ideas here.  Regarding your
specific text:

Currently there are more than 20 HTTP scripts; most of them are discovery
scripts that perform checks/tests to identify HTTP applications. These
tests could be incorporated into the http-enum script to reduce the amount
of loaded and running code and to achieve better performance. Of course
this would reduce the number of HTTP scripts, but writing an entire NSE
script for a simple check that can be done in 5-10 Lua instructions is not
the best solution either.

Reducing the total code size and optimizing performance is indeed very
important.  But of course we also have to keep user interface factors
in mind.  Right now, many http discovery scripts such as html-title
and http-robots.txt run by default with -A or -sC.  If we moved them
into http-enum and users had to know about them and specify special
arguments, I think that would dramatically reduce usage of the
functionality.

This proposal relies on some of the Nmap information that should be
exported to NSE scripts:

* User specified script categories selection "--script='categories'".

That would be easy to add, but I worry about what scripts would do
with the information.  For example, suppose we have http-enum do vuln
checks if the 'vuln' category was selected.  Well, then what if the
user just specified script names directly (which may or may not be in
the vuln category)?  What if the user specified --script=all?  Maybe rather
than try to reimplement the category selection functionality, the
script(s) could be made to work with it.  For example, if the shared
work is done in a library anyway, maybe you could have a small
http-enum-vuln script which users could enable by name or category or
whatever.
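
To sketch what I mean (the httpenumlib module and its run_checks()
function are made up here, just to show the shape of it):

  local shortport = require "shortport"
  local httpenumlib = require "httpenumlib"  -- hypothetical shared library

  description = [[Runs only the vulnerability-related checks from the shared
  HTTP enumeration engine.]]
  categories = {"vuln", "intrusive"}

  portrule = shortport.http

  action = function(host, port)
    -- Which checks run is decided by which wrapper script the user selected,
    -- so we never need to know how --script was parsed.
    return httpenumlib.run_checks(host, port, {category = "vuln"})
  end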

* Nmap --version-intensity: version scan intensity.

This is another one which would be easy to implement, but I'm not sure
if it is desirable.  If a user specifies a --version-intensity, they
probably don't expect it to affect which tests are performed by an
http enumeration script.  As you note later, a different value like
http-enum.intensity could be used for this.  We also have timing
templates like -T3, -T4, and -T5 which already do affect scan
intensity in a more general way.
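
If we did want a per-script knob, it is only a couple of lines; a sketch
assuming the http-enum.intensity argument name from your proposal and an
arbitrary default of 5:

  local stdnse = require "stdnse"

  -- Read the knob from --script-args http-enum.intensity=N (default 5).
  local intensity = tonumber(stdnse.get_script_args("http-enum.intensity")) or 5

  -- Each probe could then carry a cost and be skipped when it exceeds the
  -- requested intensity, e.g.:
  --   if probe.cost and probe.cost > intensity then return end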

By looking at these scripts we can see that most of them are discovery
scripts; however, there are also brute-force and vulnerability scripts.

Some of these scripts can be included in http-fingerprints.lua and used by
http-enum.

To the extent that the combination reduces code size and complexity
and can be done without harming the user interface, I'm all for that.
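
For example, folding a small check into nselib/data/http-fingerprints.lua
could look roughly like this (the entry below is illustrative, not a
proposed addition):

  table.insert(fingerprints, {
      category = 'general',
      probes = {
        { path = '/robots.txt', method = 'GET' }
      },
      matches = {
        { match = 'Disallow', output = 'robots.txt with Disallow entries' }
      }
  })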

5) Crawler and http-enum:
-------------------------
(We assume that there is a crawler).

Yes, we hope to write one this summer.

Note: If caching is available

I do think it should be made available.  We don't want to spider the
same server over and over again.

then http-enum with its matching code and other HTTP scripts can end up
in a situation where they will not yield, since there are no network
operations.  A solution in the http-enum matching code (which is the bulk
of the code) would be to use coroutines and make them yield explicitly.

Have you experienced this problem or is it just speculation?  It is
probably worth trying to reproduce it (if you haven't already) before
spending much time trying to fix it.
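
If it does turn out to be real, I read the suggestion as something like
the following plain-Lua pattern (ignoring NSE scheduler specifics): run
the matching loop inside a coroutine that yields every so often so other
threads get a turn.

  local function matcher(pages, patterns)
    return coroutine.create(function()
      local done, hits = 0, {}
      for _, page in ipairs(pages) do
        for _, pat in ipairs(patterns) do
          if page.body and page.body:match(pat) then
            hits[#hits + 1] = { path = page.path, pattern = pat }
          end
          done = done + 1
          if done % 100 == 0 then
            coroutine.yield()        -- give other threads a turn
          end
        end
      end
      return hits
    end)
  end

  -- Driver: resume until the matcher finishes and returns its hits.
  -- local co = matcher(pages, patterns)
  -- repeat local ok, hits = coroutine.resume(co)
  -- until coroutine.status(co) == "dead"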

its threads. This will depend on the crawler internals design, which is
not discussed here, but perhaps we can perform context switches between
the crawler threads and http-enum threads or other scripts based on the
recursion depth level; I mean that crawler threads can signal and yield.

Yeah, the crawler internals design is going to be a big issue.  We're
hoping to take that on next month though.

So currently we consider that the crawler, which is a discovery
script, and other discovery scripts like http-enum must run at the
same dependency level.

For what it is worth, I had been assuming that the crawler would be a
library.  A script which needs spidering services would activate the
library and tell it what information is needed.  The spider library
would store (probably up to some limit) results so that it may not
have to make as many (or even any) requests when the next script asks
for similar information.
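
Purely as a sketch of that idea (the httpspider module name, functions,
and options below are invented for illustration, not an existing
interface):

  local shortport = require "shortport"
  local httpspider = require "httpspider"  -- hypothetical library

  portrule = shortport.http

  action = function(host, port)
    -- The library would serve pages from its cache when another script has
    -- already crawled this server, fetching only what is missing.
    local crawler = httpspider.new(host, port, { maxdepth = 3, maxpages = 200 })
    for _, page in ipairs(crawler:get_pages()) do
      -- inspect page.path, page.status, page.body ...
    end
  end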

6) Improve HTTP fingerprints and http-enum:
-------------------------------------------

This one seems pretty independent of some of your other suggestions, so,
if this is desired, it could at least be implemented at any time.
I do agree with you that it is often best to combine many similar http
tasks in one script and that there is room to enhance http-enum to do
a lot of that.

I do think we should try to avoid bloating things such that users need
to specify extra arguments to use scripts effectively, at least for
important/common scripts like the http-enum stuff.  Required options are
more reasonable for obscure/special-purpose scripts.

* http-brute: the design of this script can be improved a lot.
  If the crawler and the http-enum script are running, then a match table
  dynamically registered by the http-brute script, which checks the returned
  status code and the 'www-authenticate' header field, will be used by the
  http-enum script to discover multiple protected paths. These can be saved
  in the registry by the match's misc handler, and later the http-brute script
  will try to brute force them.
  So in this situation http-brute will depend on the http-enum script.

I agree that it would be great for http-brute to be able to use
information from enumeration/spidering scripts/libraries.  Though of
course the user should be able to use it to brute force a specific
page instead if desired.
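
The hand-off itself could be done with nmap.registry plus a dependencies
declaration; a sketch, where the http_protected_paths key is just a
convention invented for this example:

  local nmap = require "nmap"

  -- http-enum side (misc handler for a www-authenticate match):
  local function record_protected_path(host, path)
    nmap.registry[host.ip] = nmap.registry[host.ip] or {}
    local reg = nmap.registry[host.ip]
    reg.http_protected_paths = reg.http_protected_paths or {}
    table.insert(reg.http_protected_paths, path)
  end

  -- http-brute side (which would declare dependencies = {"http-enum"}):
  local function protected_paths(host)
    local reg = nmap.registry[host.ip] or {}
    return reg.http_protected_paths or { "/" }  -- fall back to the root
  end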

There are actually a pretty huge number of existing and planned (on
the script ideas page) scripts which could benefit from the spidering
system.  I'm really looking forward to that.

* http-auth: we have already said that this can be converted into a general
  match in the http-matchers.lua file. The downside of this is that we would
  remove this script. If we don't want to remove the script, we can modify it
  to register that match dynamically.

Well, a key feature of that script is that it runs by default and
includes a piece of information which is quickly and easily determined
(whether authentication is required at the root of the given web
server).  So we wouldn't want to remove this script until we have a
way to replicate that behavior, I think.  So the combined script would
have to run by default, I guess.

* http-date: we can also convert this script to a simple general fingerprint
  or make the script register the fingerprint dynamically.
  fingerprint {
      categories = {'discovery', 'safe'},
      probes = {path='/', method='HEAD'},
      matches = {
          status_code = 200,
          header = { date = "(.+)" },
          output_handler = function(date_1)
            -- parse #header.date_1#
          end,
      },
  }

Well, besides being default, http-date offers some nice features such
as telling the user how much the remote time differs from local time.
And we don't win much from eliminating this script since it is only 44
lines long (including documentation and empty lines).
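
(The skew computation itself is only a handful of lines of plain Lua; this
is a sketch of the idea, not the script's actual code:)

  local months = { Jan=1, Feb=2, Mar=3, Apr=4, May=5, Jun=6,
                   Jul=7, Aug=8, Sep=9, Oct=10, Nov=11, Dec=12 }

  local function skew_seconds(date_header)
    local d, mon, y, h, mi, s =
      date_header:match("(%d+) (%a+) (%d+) (%d+):(%d+):(%d+) GMT")
    if not d then return nil end
    -- os.time() interprets the table as local time, so shift by the
    -- local/UTC offset (ignoring DST corner cases) to compare in one base.
    local now = os.time()
    local utc_offset = os.difftime(now, os.time(os.date("!*t", now)))
    local remote = os.time{ year = tonumber(y), month = months[mon],
                            day = tonumber(d), hour = tonumber(h),
                            min = tonumber(mi), sec = tonumber(s) } + utc_offset
    return os.difftime(remote, now)  -- positive: remote clock is ahead
  end

  -- skew_seconds("Thu, 16 Jun 2011 17:17:50 GMT")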

I guess deciding when it is better to split or combine scripts is a
very tough decision.  We faced that last week with Gorjan's
ip-geolocation script.  At first he combined several geolocation
providers into one script, but later split it into five scripts.
Which is better?  I don't know.  Each approach has advantages and
drawbacks.  I guess a key is to identify the general factors we should
use when deciding whether to split or combine scripts.  Because if we
have some folks busily combining scripts while others are busy
splitting them up, we don't make much progress.

Thanks for your suggestions!

Cheers,
Fyodor
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/

