Nmap Development mailing list archives
Re: Web App Scanner - GSoC 2009
From: João <3rd.box () gmail com>
Date: Tue, 31 Mar 2009 12:45:21 -0300
Thanks again for the feedback. I'm sure that all this discussion will help me a lot in writting the proposal and developing it. On Mon, Mar 30, 2009 at 3:52 AM, Fyodor <fyodor () insecure org> wrote:
On Sat, Mar 28, 2009 at 02:15:17AM -0300, João wrote:The idea is developing a Web app scanner. Before scanning a host and finding a web server running on it, it would be very interesting that you could have a way to discover which applications are running in this web server.Hi João! I agree that could be very useful! In fact, I wrote this in my Nmap book: "Nmap may also grow in its ability to handle web scanning. When Nmap was first developed, different services were often provided as separate daemons identified by the port number they listen on. Now, many new services simply run over HTTP and are identified by a URL path name rather than port number. Scanning for known URL paths is similar in many ways to port scanning (and to the SunRPC scanning which Nmap has also done for many years). Nmap already does some web scanning using the Nmap Scripting Engine (see Chapter 9, Nmap Scripting Engine), but it would be faster and more efficient if basic support was built into Nmap itself." --http://nmap.org/book/history-future.htmlI mean, we could scan for installations of wordpress, php-myadmin, wikis, web-repos, webmin, OSSIM server, webmail services, and many other applications.Yep!I really would appreciate some feedback.I think it is a very promising idea. If possible, it should be done as an NSE module which scripts can call. If desired performance levels can't be reached in LUA for some reason, it could be made into an NSE C module. And if even that isn't performant enough, I suppose it could be built into Nmap (like RPC scan and the like). But the first two options are preferred for maintainability reasons.
Right.
Also, there may be many NSE scripts which want to see the results of spidering a web site. There should probably be a way for each script to either see all the pages as they are downloaded, or perhaps the web tree should be saved somewhere (in the filesystem or a database or maybe a limited tree in RAM) so that scripts can look through them.
The problem about using RAM is the one mentioned... it sometimes won't be enough for bigger tests. Using databases is a nice solution, but it introduces dependencies that could be avoided. Supporting databases is a good option, but shouldn't be the only one. Using files seems to be the most simple and obvious solution. Maybe it won't be as efficient as using ram or databases, but considering the usual operations performed by the scripts on the files, I think it won't be a problem. Another good point of saving the results in files or database is having the data in persistent memory. This can be helpful to avoid downloading everything again in future analysis.
Some things which need to be considered and/or discussed: o Performance is obviously critical. Think about how you can make the best use of web technology such as pipelining, keepalive, head requests when you don't need page content, etc. At the same time, it should be configurable so you can say things like "don't request more than 1 URI per second" in order to avoid floodign a web server.
I'm not used yet to NSE scripting. For this reason I don't know yet how deep I can get only using NSE. If we can't use such resources (eg.: if we can't make multiple requests for pipelining) from the NSE scripts, spending sometime in writting such support is going to be very important. Some basic stuff will also be good to improve performance, such as avoiding image files downloads.
o Some scripts only check for URLs (such as app names), whereas others such as SQL injection detection scripts need the page content. You need to think about how to handle that. Maybe the library can have a list of interested scripts and pages to each of them when they are retrieved. Or maybe you need to store the whole web tree (with some limits so you don't download gigabytes by default or get stuck in infinite loops with dynamically generated pages) o You sent your ideas to the Umit list too. Note that if it is done in Nmap/NSE, Umit can make use of the functionality too.
I'm sure about it. Someway, both proposals that were originally the same (nmap and umit) are getting very different while I research and discuss it. My objective is developing the best work I can during GSoC (and after it), but for that I need to consider every option and the chances of being accepted.
o We already have some web code available (the url module). It probably makes sense to enhance this rather than start over.
Yes. I'm already trying it out. Hope I can submit some code using it before GSoC application deadline.
o High speed web authentication brute force cracking might be welcome too.
Groovy =].
o I love the idea of fingerprinting web applications, though you'll need to think about how to collect the DB of URLs, how to deal with URLs which can be relocated (in a different directory, say), how to deal with determining the right name for the "host" header, etc.
Yes. The basic idea is having external files with a list of web applications profiles and the usual paths/subdmains used. This file can be easily managed, and we can develop a simple script for easy updates. We'll need to deal with the different directories issue someway. First of all we can test some things that are not default, but are usual (such as placing phpmyadmin at phpmyadmin.domain.org instead of domain.org/phpmyadmin). Discovering what is usual will certainly be a problem at first, but with time we can improve this list. We can try to collect information from the downloaded files, like on the basic spidering, and check this informations (mainly links) for known files that could lead to a web app identification. Maybe checking files contents and not only its name or path could be helpful, but we would need a huge database of files. We should also try bruteforcing paths and files. Maybe a little bit o google can also be useful.
o I'm wondering if it might make sense to apply as some sort of NSE Web maven. Then you could have a list of concrete web-related NSE goals. These might take the form of improvements to the NSE infrastructure, module updates, and/or new scripts.
Hm. Sounds interesting. Having a list of goals is important for scheduling things right. If possible, could you provide more details?
Cheers, -F
Cheers, João _______________________________________________ Sent through the nmap-dev mailing list http://cgi.insecure.org/mailman/listinfo/nmap-dev Archived at http://SecLists.Org
Current thread:
- Web App Scanner - GSoC 2009 João (Mar 27)
- Re: Web App Scanner - GSoC 2009 Patrick Donnelly (Mar 27)
- Re: Web App Scanner - GSoC 2009 Fyodor (Mar 29)
- Re: Web App Scanner - GSoC 2009 João (Mar 31)
- <Possible follow-ups>
- Re: Web App Scanner - GSoC 2009 Rob Nicholls (Mar 28)
- Re: Web App Scanner - GSoC 2009 João (Mar 28)
- Re: Web App Scanner - GSoC 2009 Fyodor (Mar 30)