Nmap Development mailing list archives

Re: Thoughts about http-spider.nse


From: Ryan Giobbi <ryan () tgbemail com>
Date: Fri, 22 Oct 2010 09:00:48 -0400

I think it'd be a great feature to allow the user to control where the
spider goes and whether it fills out forms.


Something like

--script-args=spider-domain=[0,1,2],spider-forms=[0,1],spider-depth=[0,1,2,3,4]

spider-domain options
0 follow only links that are on the original domain and protocol of
the first request. http://www.foo.com will spider to www.foo.com/foo
but not foo.com or www2.foo.com or https://www.foo.com
1 follow only links that are on the original domain. http://www.foo.com
will spider to www.foo.com/foo or https://www.foo.com
2 follow all links.
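
To make the three spider-domain modes concrete, here is a minimal Python
sketch of the scope check. The function name in_scope is hypothetical, and
the sketch assumes every link has already been resolved to an absolute URL;
neither is part of the proposal itself.

```python
from urllib.parse import urlsplit

def in_scope(start_url, link, mode):
    """Hypothetical check for the proposed spider-domain modes 0, 1 and 2.

    Assumes both URLs are absolute (relative links resolved beforehand).
    """
    if mode == 2:
        # Mode 2: follow all links.
        return True
    start, target = urlsplit(start_url), urlsplit(link)
    same_host = start.hostname == target.hostname
    if mode == 1:
        # Mode 1: same host name, any protocol.
        return same_host
    # Mode 0: same host name and same protocol as the first request.
    return same_host and start.scheme == target.scheme
```

With http://www.foo.com as the starting point, mode 0 accepts
http://www.foo.com/foo and rejects https://www.foo.com, while mode 1
accepts both and still rejects www2.foo.com.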

spider-forms options
0 don't fill out forms
1 fill out and submit forms

spider-depth options
0 don't spider beyond the first page
1 follow links one level deep
2 follow links two levels deep
3 follow links three levels deep
4 spider until out of links
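
The depth setting could be implemented as a depth-limited breadth-first
walk. The Python sketch below is only an illustration: crawl and get_links
are hypothetical names, get_links stands in for whatever fetches a page and
returns its links, and max_depth=None plays the role of option 4.

```python
from collections import deque

def crawl(start, get_links, max_depth=None):
    """Breadth-first walk from start; max_depth=None means no depth limit."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        page, depth = queue.popleft()
        visited.append(page)
        if max_depth is not None and depth >= max_depth:
            # Depth limit reached: record the page but do not follow its links.
            continue
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Tracking the seen set keeps the walk from looping when pages link back to
each other, which matters most for the unlimited case.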


The options for burp pro might be a good reference:
http://portswigger.net/burp/help/spider.html#engine


On Mon, Oct 18, 2010 at 11:22 PM, Ron <ron () skullsecurity net> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey all,

I've been putting some work this week into a spider script. Right now it has only very basic functionality
(it finds all the href='...' and src='...' attributes, parses them, and stores them in the registry), but I'm
hoping to get some comments.
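
For readers who want a feel for the extraction step, here is a minimal
Python sketch that collects href and src values with the standard library
HTML parser. It is not the script's actual code (which is an NSE script);
LinkExtractor and extract_links are hypothetical names.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the values of href and src attributes from any tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def extract_links(html):
    """Return every href/src value found in the given HTML string, in order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```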

First, at the moment, the script has no output of its own. I think that's a good thing: given the amount of
information that people may or may not want to see, having other scripts display the results might be a better
plan. This leads us to required dependencies, though -- right now, dependencies only affect the order in which
scripts run, but for this to work we need dependencies that actually turn on other scripts.

Second, I'm making heavy use of the registry to store information. I'm trying to give later scripts as much 
information as possible. So, I'm storing the same data in many different ways to make it easy to find exactly what 
you want. Here are some types of data, and some short examples (mostly from nmap.org). I've attached a full registry 
dump of scanning nmap.org at a depth of 2.

* All pages, and all pages with their full querystring
|       all_pages:
|         1: "/"
|         2: "/shared/css/insecdb.css"
|         3: "/book/man.html"
|         4: "/book/install.html"
|         5: "/download.html"

* All directories, and all files
|       directories:
|         1: "/"
|         2: "/book/"
|         3: "/presentations/BHDC10/"
|         4: "/nsedoc/"
|       files:
|         1: "/shared/css/insecdb.css"
|         2: "/book/man.html"
|         3: "/book/install.html"
|         4: "/download.html"

* All files indexed by extension
|       extensions:
|         html:
|           1: "/book/man.html"
|           2: "/book/install.html"
|           3: "/download.html"
|           4: "/changelog.html"
|           5: "/docs.html"
|           6: "/book/nse.html"
|           7: "/movies.html"
|           8: "/book/man-bugs.html"
|         css:
|           1: "/shared/css/insecdb.css"

* All pages that have arguments, as well as their arguments (can have multiple copies for pages we see linked with 
different arguments)
|       cgi_args:
|         /index.cfm:
|           1:
|             pageID: "12"
|           2:
|             pageID: "13"
|           3:
|             pageID: "249"
|           4:
|             pageID: "1"
|       cgi_querystring:
|         /index.cfm:
|           1: "pageID=12"
|           2: "pageID=13"
|           3: "pageID=249"
|           4: "pageID=1"
|           5: "pageID=2"
|       cgi_full_query:
|         1: "/index.cfm?pageID=12"
|         2: "/index.cfm?pageID=13"
|         3: "/index.cfm?pageID=249"
|         4: "/index.cfm?pageID=1"
|       cgi:
|         1: "/index.cfm"
|         2: "/display.cfm"
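
The four cgi tables above are different views of the same parsed URLs. A
minimal Python sketch of building them follows; index_cgi is a hypothetical
name, the table names are taken from the dump, and unlike the real script
this sketch makes no attempt to drop duplicate query strings.

```python
from urllib.parse import urlsplit, parse_qsl

def index_cgi(urls):
    """Build cgi, cgi_full_query, cgi_querystring and cgi_args style tables."""
    reg = {"cgi": [], "cgi_full_query": [], "cgi_querystring": {}, "cgi_args": {}}
    for url in urls:
        parts = urlsplit(url)
        if not parts.query:
            continue  # only pages that were linked with arguments are recorded
        if parts.path not in reg["cgi"]:
            reg["cgi"].append(parts.path)
        reg["cgi_full_query"].append(parts.path + "?" + parts.query)
        reg["cgi_querystring"].setdefault(parts.path, []).append(parts.query)
        # One dict of name -> value per occurrence of the page.
        reg["cgi_args"].setdefault(parts.path, []).append(dict(parse_qsl(parts.query)))
    return reg
```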

* For each page, all pages we've seen it link to (or be linked from)
|       links_to:
|         /docs.html:
|           1: "/shared/css/insecdb.css"
|           2: "http://g.adspeed.net/ad.php?do=clk&amp;zid=14678&amp;wd=728&amp;ht=90&amp;pair=as"
|           3: "http://nmap.org/"
|           4: "http://nmap.org/book/man.html"
|           5: "http://nmap.org/book/install.html"
|           6: "http://nmap.org/download.html"
|           7: "http://nmap.org/changelog.html"
|         /book/nse.html:
|           1: "/shared/css/insecdb.css"
|           2: "/book/toc.html"
|           3: "/book/osdetect-unidentified.html"
|           4: "/book/nse-usage.html"
|           5: "/book/preface.html"
|           6: "/book/intro.html"
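
The forward and reverse link tables can be filled in the same pass. The
Python sketch below assumes the spider yields (source, target) pairs as it
goes; build_link_index and the linked_from table name are hypothetical,
since only links_to appears in the dump above.

```python
from collections import defaultdict

def build_link_index(edges):
    """Return (links_to, linked_from) built from (source, target) pairs."""
    links_to = defaultdict(list)
    linked_from = defaultdict(list)
    for source, target in edges:
        if target not in links_to[source]:
            # First time this edge is seen: record it in both directions.
            links_to[source].append(target)
            linked_from[target].append(source)
    return dict(links_to), dict(linked_from)
```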

* All pages indexed by content-type
|       content-type:
|         text/html; charset=UTF-8:
|           1: "/"
|           2: "/book/man.html"
|           3: "/book/install.html"
|           4: "/download.html"
|           5: "/changelog.html"
|           6: "/book/"
|         text/css:
|           1: "/shared/css/insecdb.css"
|         text/html; charset=iso-8859-1:
|           1: "/favicon"
|           2: "/data/COPYING"
|           3: "/fb"
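
The dump keys this table by the raw Content-Type header, so the two
text/html charsets end up in separate buckets. The Python sketch below shows
both that behaviour and an optional normalized variant; the function name
index_by_content_type and the normalize flag are hypothetical and not
something the script is known to offer.

```python
from collections import defaultdict

def index_by_content_type(responses, normalize=False):
    """Group paths by Content-Type; responses is a list of (path, header) pairs."""
    index = defaultdict(list)
    for path, content_type in responses:
        key = content_type
        if normalize:
            # Drop parameters such as charset and lower-case the media type.
            key = content_type.split(";", 1)[0].strip().lower()
        index[key].append(path)
    return dict(index)
```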

Opinions would be great!

Ron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)

iEYEARECAAYFAky9DuIACgkQ2t2zxlt4g/RPRACfSfK2Kgh4zRLsjmTNu+LGDxn9
+F0AoLBkFZ2EzOnW+BXuSndp8zP0N1A3
=t0DG
-----END PGP SIGNATURE-----

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
