Nmap Development mailing list archives

Re: httpspider lib and hostnames with special characters


From: Patrik Karlsson <patrik () cqure net>
Date: Tue, 6 Mar 2012 16:51:16 +0100

On Mon, Mar 5, 2012 at 11:28 PM, David Fifield <david () bamsoftware com> wrote:

On Mon, Mar 05, 2012 at 08:19:28PM +0100, Patrik Karlsson wrote:
On Mon, Mar 5, 2012 at 6:04 PM, Djalal Harouni <tixxdz () opendz org> wrote:

On Mon, Mar 05, 2012 at 03:30:43PM +0100, Gutek wrote:
Thanks Djalal.
It sounds to me like a weakness in httpspider's efficiency. Let's
consider a practical example with h-online.com and an
httpspider-dependent script, say http-backup-finder.
A simple command like
nmap -v -Pn -p80 -n --script http-backup-finder www.h-online.com
(silently) won't work: debug output reveals that every link is
discarded, which may fool the user into thinking that no backup was
found:
- ----------
NSE: httpspider: Spidering limited to: maxdepth=3; maxpagecount=20;
withinhost=www.h-online.com
...
NSE: httpspider: Link is not within host:
http://www.h-online.com/nettools/tools/spam-list-query
NSE: httpspider: Link is not within host:
http://www.h-online.com/security/services/Reserved-IPv4-addresses-732899.html
NSE: httpspider: Link is not within host:
http://www.h-online.com/Contact-273335.html
NSE: httpspider: Link is not within host:
http://www.h-online.com/Privacy-Policy-of-h-online-com-273337.html
- -----------

Now, with a script arg to override this withinhost issue, it works as
intended:
nmap -v -Pn -p80 -n --script http-backup-finder --script-args http-backup-finder.withindomain=www.h-online.com -d2 www.h-online.com

- -----------
NSE: httpspider: Spidering limited to: maxdepth=3; maxpagecount=20;
withindomain=h-online.com
- -----------

As I understand it, a withindomain argument is mandatory when users
want to deal with hyphenated hostnames? If it's intended behavior and
not a bug, maybe this should be explicitly stated in the documentation?

Gutek, this is clearly a bug; this should work by default.

httpspider must handle it. From RFC 952:
<name> ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]

This was later updated in RFC 1123 to allow names to start with
digits.

Thanks.

--
tixxdz
http://opendz.org
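
The failure reported above can be reproduced outside Nmap. The sketch
below is only an illustration, not httpspider code: the two function
names are hypothetical, and the pattern is the "^%s://%s" host_match
form quoted later in this thread. In a Lua pattern an unescaped "-" is
a lazy repetition operator, so "h-online" no longer matches the literal
hostname, while a plain prefix comparison does.

```lua
-- Hypothetical helper: builds a Lua pattern from the hostname without
-- escaping it, the way the unpatched matching is described in the thread.
function within_host_pattern(url, proto, host)
  return url:match(("^%s://%s"):format(proto, host)) ~= nil
end

-- Hypothetical helper: compares the URL prefix as a plain string, so no
-- character in the hostname is treated as magic.
function within_host_plain(url, proto, host)
  local prefix = proto .. "://" .. host
  return url:sub(1, #prefix) == prefix
end

local url = "http://www.h-online.com/Contact-273335.html"
print(within_host_pattern(url, "http", "www.h-online.com"))
print(within_host_plain(url, "http", "www.h-online.com"))
```

The plain prefix check still accepts longer hostnames that merely start
with the same text, so it only illustrates the hyphen problem and is not
a complete fix.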


Does this patch look sane enough?

--- nselib/httpspider.lua       (revision 28198)
+++ nselib/httpspider.lua       (working copy)
@@ -92,6 +92,8 @@
                                        domain_match = ("^%s://.*%s/"):format(o.base_url:getProto(), o.base_url:getDomain() )
                                end
                        end
+                       host_match = Options.escape(host_match)
+                       domain_match = Options.escape(domain_match)
                        -- set up the appropriate matching functions
                        if ( o.withinhost ) then
                                o.withinhost = function(url) return string.match(tostring(url), host_match)     end
@@ -107,6 +109,12 @@
        addWhitelist = function(self, func) table.insert(self.whitelist, func) end,
        addBlacklist = function(self, func) table.insert(self.blacklist, func) end,

+       escape = function(str)
+               if ( str ) then
+                       return str:gsub("%-", "%%-"):gsub("%.", "%%.")
+               end
+       end,
+
 }

I don't like it. Doesn't gsub("%.", "%%.") break domain_match
"^%s://.*%s/" because it has a magic dot in it? There may be other weird
cases with magic characters that can appear (rightly or wrongly) in a
domain name.
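
The objection above can be checked with a few lines of Lua. This is a
sketch under assumptions: escape() copies the gsub calls from the patch
(with parentheses added so that only the string is returned), and
"h-online.com" stands in for the value of o.base_url:getDomain().
Escaping the finished pattern turns ".*" into "%.*", which matches only
a run of literal dots, so subdomains stop matching. Escaping just the
domain before formatting keeps the wildcard intact.

```lua
-- Same substitutions as the proposed patch; the outer parentheses drop
-- gsub's second return value (the replacement count).
function escape(str)
  if str then
    return (str:gsub("%-", "%%-"):gsub("%.", "%%."))
  end
end

-- Escaping the whole pattern, as the patch does.
domain_match = ("^%s://.*%s/"):format("http", "h-online.com")
escaped_whole = escape(domain_match)

-- Escaping only the domain component before building the pattern.
escaped_part = ("^%s://.*%s/"):format("http", escape("h-online.com"))

print(escaped_whole)
print(escaped_part)
print(("http://www.h-online.com/"):match(escaped_whole))
print(("http://www.h-online.com/"):match(escaped_part))
```

Even with the domain escaped separately, other magic characters
(such as "%", "[", "+" or "?") would need the same treatment, which is
part of the argument for plain string matching.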

How about parsing the URL into components, and then using plain string
matching? Something like this:

function match_domain(test, domain)
       return test == domain or endswith(test, "." .. domain)
end

function is_within_host(url)
       -- parse url
       return u.proto == o.base_url:getProto() and
              match_domain(u.authority, o.base_url:getDomain())
       -- plus port handling if appropriate
end
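
A runnable version of this idea might look like the sketch below. It is
not the fix that was committed: endswith(), parse_url() and
is_within_domain() are hypothetical helpers written for this example,
the URL parsing is deliberately minimal, and the base protocol and
domain are passed in as plain strings instead of being read from
o.base_url. Ports and userinfo are simply stripped before comparing.

```lua
-- Plain-string suffix test; no Lua pattern is built from the input.
function endswith(s, suffix)
  return suffix == "" or s:sub(-#suffix) == suffix
end

-- True if test is the domain itself or a subdomain of it.
function match_domain(test, domain)
  test, domain = test:lower(), domain:lower()
  return test == domain or endswith(test, "." .. domain)
end

-- Minimal URL split (assumption: scheme://authority/... form only).
function parse_url(url)
  local proto, authority = url:match("^(%a[%w+.%-]*)://([^/?#]*)")
  if not proto then return nil end
  -- Drop any userinfo ("user@") and a trailing ":port".
  local host = authority:gsub("^.*@", ""):gsub(":%d*$", "")
  return { proto = proto:lower(), host = host }
end

function is_within_domain(url, base_proto, base_domain)
  local u = parse_url(url)
  return u ~= nil and u.proto == base_proto
    and match_domain(u.host, base_domain)
end

print(is_within_domain("http://www.h-online.com/Contact-273335.html",
                       "http", "h-online.com"))
print(is_within_domain("http://notnmap.org/", "http", "nmap.org"))
```

Because the hostname is compared as data and never compiled into a
pattern, hyphens and dots need no escaping.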

As it stands, it looks like there can be some false matches, for example
"http://notnmap.org" would pass the domain_match ("^%s://.*%s/") for
"http://nmap.org", and "nmap.org.not.com" would pass the first
host_match ("^%s://%s") for "http://nmap.org".
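
Both false matches can be confirmed with the two pattern templates
quoted above. The snippet is a standalone illustration: the variable
names are made up for the example, and "http" and "nmap.org" are
substituted by hand for the values httpspider would take from the base
URL.

```lua
-- Patterns built the same way as the templates quoted in the message.
domain_match = ("^%s://.*%s/"):format("http", "nmap.org")
host_match   = ("^%s://%s"):format("http", "nmap.org")

-- ".*" happily absorbs the "not" prefix of an unrelated domain.
false_domain = ("http://notnmap.org/"):match(domain_match)

-- The host pattern has no terminator, so a longer hostname that merely
-- starts with "nmap.org" also matches.
false_host = ("http://nmap.org.not.com/"):match(host_match)

print(false_domain)
print(false_host)
```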

David Fifield


I just committed a more decent fix. I would greatly appreciate it if you
could find the time to give it some testing.

Cheers,
Patrik

-- 
Patrik Karlsson
http://www.cqure.net
http://twitter.com/nevdull77