Nmap Development mailing list archives
Re: [RFC] Improve NSE HTTP architecture.
From: Fyodor <fyodor () insecure org>
Date: Thu, 16 Jun 2011 17:17:50 -0700
On Tue, Jun 14, 2011 at 02:46:55PM +0100, Djalal Harouni wrote:
Henri and I started thinking about this at the beginning of GSoC, and I then wrote this proposal to discuss and address the current limitations of the NSE HTTP architecture and how we can improve it, taking Nmap and NSE properties into consideration.
Hi Djalal, thanks for writing this up. This covers a number of areas, and there are definitely some good ideas here. Regarding your specific text:
Currently there are more than 20 HTTP scripts, most of them discovery scripts that perform checks/tests in order to identify HTTP applications. These tests can be incorporated into the http-enum script to reduce the size of the loaded and running code, and to achieve better performance. Of course this will reduce the number of HTTP scripts, but writing an entire NSE script for a simple check that can be done in 5-10 Lua instructions is not the best solution either.
Reducing the total code size and optimizing performance is indeed very important. But of course we also have to keep user interface factors in mind. Right now, many http discovery scripts such as html-title and http-robots.txt run by default with -A or -sC. If we moved them into http-enum and users had to know about them and specify special arguments, I think that would dramatically reduce usage of the functionality.
This proposal relies on some Nmap information that should be exported to NSE scripts:
* User-specified script category selection: --script='categories'.
That would be easy to add, but I worry about what scripts would do with the information. For example, suppose we have http-enum do vuln checks if the 'vuln' category was selected. Well, then what if the user just specified script names specifically (which may or may not be in vuln category)? What if user specified --script=all? Maybe rather than try to reimplement the category selection functionality, the script(s) could be made to work with it. For example, if the shared work is done in a library anyway, maybe you could have a small http-enum-vuln script which users could enable by name or category or whatever.
* Nmap --version-intensity: version scan intensity.
This is another one which would be easy to implement, but I'm not sure if it is desirable. If a user specifies a --version-intensity, they probably don't expect it to affect which tests are performed by an http enumeration script. As you note later, a different value like http-enum.intensity could be used for this. We also have timing templates like -T3, -T4, and -T5 which already do affect scan intensity in a more general way.
By looking at these scripts we can see that most of them are discovery scripts; however, there are also brute-force and vulnerability scripts. Some of these scripts can be included in http-fingerprints.lua and used by http-enum.
To the extent that the combination reduces code size and complexity and can be done without harming the user interface, I'm all for that.
5) Crawler and http-enum:
-------------------------

(We assume that there is a crawler).
Yes, we hope to write one this summer.
Note: If caching is available
I do think it should be made available. We don't want to spider the same server over and over again.
then http-enum, with its matching code, and other HTTP scripts can end up in a situation where they never yield, since there are no network operations. A solution in the http-enum matching code (which is the bulk of the code) would be to use coroutines and make them yield explicitly.
Have you experienced this problem or is it just speculation? It is probably worth trying to reproduce it (if you haven't already) before spending much time trying to fix it.
its threads. This will depend on the crawler internals design, which is not discussed here, but perhaps we can perform context switches between the crawler threads and http-enum threads or other scripts based on the recursion depth level; I mean that crawler threads can signal and yield.
Yeah, the crawler internals design is going to be a big issue. We're hoping to take that on next month though.
So currently we consider that the crawler, which is a discovery script, and other discovery scripts like http-enum must run at the same dependency level.
For what it is worth, I had been assuming that the crawler would be a library. A script which needs spidering services would activate the library and tell it what information is needed. The spider library would store (probably up to some limit) results so that it may not have to make as many (or even any) requests when the next script asks for similar information.
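A spider library along those lines might expose an interface like the following. This is only a sketch of the idea described above; every name in it (httpspider, get_pages, the cache layout) is hypothetical, since the crawler design has not been settled yet:

```lua
-- Hypothetical NSE library sketch: a shared spider with per-service caching.
-- The first caller triggers the crawl; later callers for the same host:port
-- reuse the stored results instead of issuing new requests.
local httpspider = {}
local cache = {}   -- keyed by "ip:port", holds previously crawled pages

function httpspider.get_pages(host, port, options)
  local key = host.ip .. ":" .. port.number
  if cache[key] then
    return cache[key]          -- no new network traffic for repeat callers
  end
  local pages = {}
  -- ... perform the actual crawl, bounded by options.maxdepth and
  -- options.maxpages, storing {path=..., status=..., header=...} entries ...
  cache[key] = pages
  return pages
end

return httpspider
```

A script would then call httpspider.get_pages(host, port, {maxdepth = 3}) and filter the cached results for whatever it needs, rather than crawling itself.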
6) Improve HTTP fingerprints and http-enum:
-------------------------------------------
This one seems pretty independent from some of your other suggestions. So, if this is desired, at least it could be implemented at any time. I do agree with you that it is often best to combine many similar http tasks in one script and that there is room to enhance http-enum to do a lot of that. I do think we should try to avoid bloating things such that users need to specify extra arguments to effectively use scripts. At least important/common scripts like http-enum stuff. Required options are more reasonable for obscure/special-purpose scripts.
* http-brute: the design of this script can be improved a lot. If the crawler and the http-enum script are running, then a match table dynamically registered by http-brute, which checks the returned status code and the 'www-authenticate' header field, will be used by http-enum to discover multiple protected paths. These paths can be saved in the registry by the match's misc handler, and later http-brute will try to brute force them. So in this situation http-brute will depend on the http-enum script.
I agree that it would be great for http-brute to be able to use information from enumeration/spidering scripts/libraries. Though of course the user should be able to use it to brute force a specific page instead if desired. There are actually a pretty huge number of existing and planned (on the script ideas page) scripts which could benefit from the spidering system. I'm really looking forward to that.
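The dynamic registration Djalal describes could look roughly like this. Every identifier below except nmap.registry is hypothetical (http_matchers in particular is the proposed http-matchers.lua library, which does not exist yet); this is only a sketch of the data flow, not a working script:

```lua
-- Hypothetical sketch: http-brute registers a match table that http-enum
-- evaluates against every response it sees, recording protected paths in
-- the registry for a later brute-force pass.
local function register_auth_match()
  http_matchers.register {
    status_code = 401,
    header = { ["www-authenticate"] = ".+" },
    -- misc handler: remember the protected path for later brute forcing
    output_handler = function(host, port, path, response)
      nmap.registry.http_auth_paths = nmap.registry.http_auth_paths or {}
      table.insert(nmap.registry.http_auth_paths,
                   { host = host.ip, port = port.number, path = path })
    end,
  }
end
```

http-brute would call register_auth_match() before http-enum runs (hence the dependency), then iterate over nmap.registry.http_auth_paths in its own action.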
* http-auth: we have already said that this can be converted into a general match in the http-matchers.lua file. The downside of this is that we will remove this script. If we don't want to remove the script we can modify it to make it register that match dynamically.
Well, a key feature of that script is that it runs by default and includes a piece of information which is quickly and easily determined (whether authentication is required at the root of the given web server). So we wouldn't want to remove this script until we have a way to replicate that behavior, I think. So the combined script would have to run by default, I guess.
* http-date: we can also convert this script to a simple general fingerprint or make the script register the fingerprint dynamically:

fingerprint {
  categories = {'discovery', 'safe'},
  probes = { path = '/', method = 'HEAD' },
  matches = {
    status_code = 200,
    header = { date = "(.+)" },
    output_handler = function(#header.date_1#)
      -- parse #header.date_1#
    end,
  },
}
Well, besides being default, http-date offers some nice features such as telling the user how much the remote time differs from local time. And we don't win much from eliminating this script since it is only 44 lines long (including documentation and empty lines).

I guess deciding when it is better to split or combine scripts is a very tough decision. We faced that last week with Gorjan's ip-geolocation script. At first he combined several geolocation providers into one script, but later split it into five scripts. Which is better? I don't know. Each approach has advantages and drawbacks. I guess a key is to identify the general factors we should use when deciding whether to split or combine scripts. Because if we have some folks busily combining scripts while others are busy splitting them up, we don't make much progress.

Thanks for your suggestions!

Cheers,
Fyodor

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
Current thread:
- [RFC] Improve NSE HTTP architecture. Djalal Harouni (Jun 14)
- Re: [RFC] Improve NSE HTTP architecture. Patrik Karlsson (Jun 15)
- Re: [RFC] Improve NSE HTTP architecture. Ron (Jun 16)
- Re: [RFC] Improve NSE HTTP architecture. Djalal Harouni (Jun 18)
- Re: [RFC] Improve NSE HTTP architecture. Djalal Harouni (Jun 18)
- Re: [RFC] Improve NSE HTTP architecture. Ron (Jun 16)
- Re: [RFC] Improve NSE HTTP architecture. Fyodor (Jun 16)
- Re: [RFC] Improve NSE HTTP architecture. Djalal Harouni (Jun 19)
- Re: [RFC] Improve NSE HTTP architecture. Patrick Donnelly (Jun 20)
- Re: [RFC] Improve NSE HTTP architecture. Djalal Harouni (Jun 20)
- Re: [RFC] Improve NSE HTTP architecture. Djalal Harouni (Jun 19)
- Re: [RFC] Improve NSE HTTP architecture. Patrik Karlsson (Jun 15)