
Re: Web App Scanner - GSoC 2009


From: João <3rd.box () gmail com>
Date: Tue, 31 Mar 2009 12:45:21 -0300

Thanks again for the feedback. I'm sure that all this discussion will
help me a lot in writing the proposal and developing it.

On Mon, Mar 30, 2009 at 3:52 AM, Fyodor <fyodor () insecure org> wrote:
On Sat, Mar 28, 2009 at 02:15:17AM -0300, João wrote:

The idea is to develop a web app scanner. After scanning a host and
finding a web server running on it, it would be very interesting to
have a way to discover which applications are running on that web
server.

Hi João!  I agree that could be very useful!  In fact, I wrote this in
my Nmap book:

 "Nmap may also grow in its ability to handle web scanning. When Nmap
 was first developed, different services were often provided as
 separate daemons identified by the port number they listen on. Now,
 many new services simply run over HTTP and are identified by a URL
 path name rather than port number. Scanning for known URL paths is
 similar in many ways to port scanning (and to the SunRPC scanning
 which Nmap has also done for many years). Nmap already does some web
 scanning using the Nmap Scripting Engine (see Chapter 9, Nmap
 Scripting Engine), but it would be faster and more efficient if
 basic support was built into Nmap itself."
 --http://nmap.org/book/history-future.html

I mean, we could scan for installations of WordPress,
phpMyAdmin, wikis, web repositories, Webmin, OSSIM server, webmail services,
and many other applications.

Yep!

I really would appreciate some feedback.

I think it is a very promising idea.  If possible, it should be done
as an NSE module which scripts can call.  If desired performance
levels can't be reached in Lua for some reason, it could be made into
an NSE C module.  And if even that isn't performant enough, I suppose
it could be built into Nmap (like RPC scan and the like).  But the
first two options are preferred for maintainability reasons.

Right.

Also, there may be many NSE scripts which want to see the results of
spidering a web site.  There should probably be a way for each script
to either see all the pages as they are downloaded, or perhaps the web
tree should be saved somewhere (in the filesystem or a database or
maybe a limited tree in RAM) so that scripts can look through them.

The problem with using RAM is the one you mentioned: it sometimes won't
be enough for bigger tests. Using a database is a nice solution, but it
introduces dependencies that could be avoided. Supporting databases is
a good option, but it shouldn't be the only one. Using files seems to
be the simplest and most obvious solution. Maybe it won't be as
efficient as using RAM or a database, but considering the usual
operations the scripts perform on the files, I don't think it will be a
problem.

Another good point of saving the results in files or a database is
that the data persists. This can help avoid downloading everything
again in future analyses.
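
As a rough illustration of the file-based option, something like the
sketch below could be a starting point (the cache directory, the naive
filename scheme, and the plain io calls are just placeholders of mine;
directory creation and portability are left out):

-- Minimal sketch: persist downloaded pages under a cache directory so
-- scripts (or future scans) can reuse them instead of re-fetching.
-- The cache path and the naive filename scheme are placeholders only.
local cache_dir = "nmap-webcache"

-- Turn a URL into a flat, filesystem-safe name (a real version would hash).
local function url_to_filename(u)
  return cache_dir .. "/" .. u:gsub("[^%w%.%-]", "_")
end

-- Store one page body; returns true on success, nil plus an error otherwise.
local function cache_store(u, body)
  local f, err = io.open(url_to_filename(u), "w")
  if not f then return nil, err end
  f:write(body)
  f:close()
  return true
end

-- Retrieve a previously cached page, or nil if it was never fetched.
local function cache_fetch(u)
  local f = io.open(url_to_filename(u), "r")
  if not f then return nil end
  local body = f:read("*a")
  f:close()
  return body
end

Scripts running later could then call cache_fetch() before deciding to
hit the server again.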

Some things which need to be considered and/or discussed:

o Performance is obviously critical.  Think about how you can make the
 best use of web technology such as pipelining, keepalive, HEAD
 requests when you don't need page content, etc.  At the same time,
 it should be configurable so you can say things like "don't request
 more than 1 URI per second" in order to avoid flooding a web server.

I'm not yet used to NSE scripting, so I don't know yet how deep I can
get using NSE alone. If we can't use such features (e.g., if we can't
make multiple requests for pipelining) from NSE scripts, spending some
time writing that support is going to be very important.

Some basic measures will also help performance, such as avoiding
downloads of image files.
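
For example, the spider could skip static resources and fall back to
HEAD whenever a script only cares whether a path exists (a sketch only;
I'm assuming http.get/http.head take host, port and path like this, and
that stdnse.sleep is available for throttling, which may not match the
current library):

-- Sketch: skip static resources we don't need, use HEAD when the body is
-- not required, and optionally wait between requests to avoid flooding.
-- Assumes http.get/http.head and stdnse.sleep; names/signatures may differ.
local http   = require "http"
local stdnse = require "stdnse"

local skip_extensions = { jpg = true, jpeg = true, png = true, gif = true,
                          ico = true, css = true, swf = true }

local function should_skip(path)
  local ext = path:match("%.(%w+)$")
  return ext ~= nil and skip_extensions[ext:lower()] == true
end

-- want_body = false means a HEAD request is enough ("does /phpmyadmin/ exist?").
-- delay is the number of seconds to wait before each request, if any.
local function polite_fetch(host, port, path, want_body, delay)
  if should_skip(path) then return nil end
  if delay and delay > 0 then
    stdnse.sleep(delay)  -- crude "at most one request per delay seconds"
  end
  if want_body then
    return http.get(host, port, path)
  end
  return http.head(host, port, path)
end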

o Some scripts only check for URLs (such as app names), whereas others
 such as SQL injection detection scripts need the page content.  You
 need to think about how to handle that.  Maybe the library can have
 a list of interested scripts and pass pages to each of them when they are
 retrieved.  Or maybe you need to store the whole web tree (with some
 limits so you don't download gigabytes by default or get stuck in
 infinite loops with dynamically generated pages).
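
A small subscriber list inside the spidering library could cover the
first option, something like this (just my own sketch of the pattern in
plain Lua, not an existing NSE interface):

-- Sketch of the "list of interested scripts" idea: scripts register a
-- callback once, and the spider hands every retrieved page to each
-- callback as it arrives, so nothing has to keep the whole tree in RAM.
-- This is only an illustration of the pattern, not an existing NSE API.
local subscribers = {}

-- A script calls this to say "show me every page you download".
local function register(callback)
  subscribers[#subscribers + 1] = callback
end

-- The spider calls this after each successful download.
local function publish(page_url, status, body)
  for _, cb in ipairs(subscribers) do
    cb(page_url, status, body)
  end
end

-- Example subscriber: a script that only cares about which URLs exist.
register(function(page_url, status, body)
  if status == 200 then
    print("saw: " .. page_url)
  end
end)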

o You sent your ideas to the Umit list too.  Note that if it is done
 in Nmap/NSE, Umit can make use of the functionality too.

I'm sure about that. Somehow, both proposals, which were originally the
same (Nmap and Umit), are becoming quite different as I research and
discuss them. My objective is to do the best work I can during GSoC
(and after it), but for that I need to consider every option and the
chances of being accepted.

o We already have some web code available (the url module).  It
 probably makes sense to enhance this rather than start over.

Yes, I'm already trying it out. I hope I can submit some code using it
before the GSoC application deadline.
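
For instance, the url module should already cover most of the link
handling a spider needs (a small sketch, assuming url.parse and
url.absolute behave like the LuaSocket code the module is derived from):

-- Sketch: resolve a link found while spidering against the page it came
-- from, and keep it only if it stays on the same host.  Assumes url.parse
-- and url.absolute work as in the LuaSocket url code.
local url = require "url"

local function resolve_link(base_url, link)
  local absolute = url.absolute(base_url, link)
  if url.parse(absolute).host == url.parse(base_url).host then
    return absolute
  end
  return nil  -- off-site link; skip it for now
end

print(resolve_link("http://domain.org/wiki/index.php", "../phpmyadmin/"))
-- should print something like http://domain.org/phpmyadmin/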

o High speed web authentication brute force cracking might be welcome
 too.

Groovy =].
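
Something along these lines, I imagine (a very rough sketch; I'm
assuming http.get accepts a custom header table in its options argument
and that the base64 library exposes enc(), so the real API may differ):

-- Very rough sketch of HTTP Basic auth guessing against one protected path.
-- Assumes http.get(host, port, path, options) accepts a header table and
-- that base64.enc() exists; both are assumptions about the current API.
local http   = require "http"
local base64 = require "base64"

local function try_basic_auth(host, port, path, user, pass)
  local credentials = base64.enc(user .. ":" .. pass)
  local response = http.get(host, port, path, {
    header = { Authorization = "Basic " .. credentials }
  })
  -- Anything other than a 401 suggests the guess got past the auth check.
  return response and response.status and response.status ~= 401
end

-- In a real script the host/port passed to the action function would be
-- used; the literal values below are only for illustration.
local users     = { "admin", "root" }
local passwords = { "admin", "password", "123456" }
for _, u in ipairs(users) do
  for _, p in ipairs(passwords) do
    if try_basic_auth("domain.org", 80, "/webmin/", u, p) then
      print("possible credentials: " .. u .. "/" .. p)
    end
  end
end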

o I love the idea of fingerprinting web applications, though you'll
 need to think about how to collect the DB of URLs, how to deal with
 URLs which can be relocated (in a different directory, say), how to
 deal with determining the right name for the "host" header, etc.

Yes. The basic idea is to have external files with a list of web
application profiles and the usual paths/subdomains they use. These
files can be easily managed, and we can develop a simple script for
easy updates.

We'll need to deal with the different-directories issue somehow.

First of all, we can test some things that are not the default but are
common (such as placing phpMyAdmin at phpmyadmin.domain.org instead of
domain.org/phpmyadmin). Discovering what is common will certainly be a
problem at first, but over time we can improve this list.

We can also try to collect information from the downloaded files, as in
basic spidering, and check that information (mainly links) for known
files that could point to a particular web app. Maybe checking file
contents, and not only names or paths, could be helpful, but we would
need a huge database of files.

We should also try brute-forcing paths and files. Maybe a little bit of
Google can also be useful.
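
To make the fingerprint file idea concrete, each entry could boil down
to something like this (a sketch only; the table format, the example
entries, and the rule of treating 200 or 401 as a hit are my own
assumptions, not a settled design):

-- Sketch of the external fingerprint idea: each entry names a web app and
-- the paths (and optionally a subdomain) it is commonly reachable under.
-- Format, entries, and matching rules are placeholders, not a settled design.
local http = require "http"

local fingerprints = {
  { name = "phpMyAdmin", paths = { "/phpmyadmin/", "/phpMyAdmin/" },
    subdomain = "phpmyadmin" },
  { name = "WordPress",  paths = { "/wp-login.php", "/blog/wp-login.php" } },
  { name = "Webmin",     paths = { "/webmin/" } },
}

-- Probe one host/port against every fingerprint; a 200 or 401 reply to a
-- HEAD request is treated here as "something is answering at that path".
local function identify_apps(host, port)
  local found = {}
  for _, fp in ipairs(fingerprints) do
    for _, path in ipairs(fp.paths) do
      local r = http.head(host, port, path)
      if r and r.status and (r.status == 200 or r.status == 401) then
        found[#found + 1] = fp.name .. " (" .. path .. ")"
        break
      end
    end
  end
  return found
end

The subdomain field would only be used when the scan was started with a
hostname, so the library could also try things like
phpmyadmin.domain.org before falling back to path probes.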

o I'm wondering if it might make sense to apply as some sort of NSE
 Web maven.  Then you could have a list of concrete web-related NSE
 goals.  These might take the form of improvements to the NSE
 infrastructure, module updates, and/or new scripts.

Hm. Sounds interesting. Having a list of goals is important for
scheduling things right. If possible, could you provide more details?

Cheers,
-F


Cheers,
João

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

