Nmap Development mailing list archives
Re: httpspider lib and hostnames with special characters
From: David Fifield <david () bamsoftware com>
Date: Mon, 5 Mar 2012 14:28:02 -0800
On Mon, Mar 05, 2012 at 08:19:28PM +0100, Patrik Karlsson wrote:
> On Mon, Mar 5, 2012 at 6:04 PM, Djalal Harouni <tixxdz () opendz org> wrote:
> > On Mon, Mar 05, 2012 at 03:30:43PM +0100, Gutek wrote:
> > > Thanks Djalal. It sounds to me like a weakness in httpspider's
> > > efficiency. Let's consider a practical example with h-online.com and
> > > an httpspider-dependent script, let's say http-backup-finder.
> > > With a simple command like
> > >
> > > nmap -v -Pn -p80 -n --script http-backup-finder www.h-online.com
> > >
> > > it (silently) won't work: debug output reveals that every link is
> > > discarded, maybe fooling the user into thinking that no backup was
> > > found:
> > > - ----------
> > > NSE: httpspider: Spidering limited to: maxdepth=3; maxpagecount=20; withinhost=www.h-online.com
> > > ...
> > > NSE: httpspider: Link is not within host: http://www.h-online.com/nettools/tools/spam-list-query
> > > NSE: httpspider: Link is not within host: http://www.h-online.com/security/services/Reserved-IPv4-addresses-732899.html
> > > NSE: httpspider: Link is not within host: http://www.h-online.com/Contact-273335.html
> > > NSE: httpspider: Link is not within host: http://www.h-online.com/Privacy-Policy-of-h-online-com-273337.html
> > > - -----------
> > > Now, with a script arg to override this withinhost issue, it will
> > > work as intended:
> > >
> > > nmap -v -Pn -p80 -n --script http-backup-finder --script-args http-backup-finder.withindomain=www.h-online.com -d2 www.h-online.com
> > > - -----------
> > > NSE: httpspider: Spidering limited to: maxdepth=3; maxpagecount=20; withindomain=h-online.com
> > > - -----------
> > > As I understand it, a withindomain argument is mandatory when users
> > > want to deal with hyphenated hostnames? If it's intended behavior
> > > and not a bug, maybe this should be explicitly stated in the
> > > documentation?
> >
> > Gutek, this is clearly a bug; this should work by default. httpspider
> > must handle it. From RFC 952:
> >
> > <name> ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
> >
> > but it was updated later in RFC 1123 to allow names to start with
> > digits.
> >
> > Thanks.
> >
> > --
> > tixxdz
> > http://opendz.org
>
> Does this patch look sane enough?
>
> --- nselib/httpspider.lua (revision 28198)
> +++ nselib/httpspider.lua (working copy)
> @@ -92,6 +92,8 @@
>          domain_match = ("^%s://.*%s/"):format(o.base_url:getProto(), o.base_url:getDomain() )
>        end
>      end
> +    host_match = Options.escape(host_match)
> +    domain_match = Options.escape(domain_match)
>      -- set up the appropriate matching functions
>      if ( o.withinhost ) then
>        o.withinhost = function(url) return string.match(tostring(url), host_match) end
> @@ -107,6 +109,12 @@
>    addWhitelist = function(self, func) table.insert(self.whitelist, func) end,
>    addBlacklist = function(self, func) table.insert(self.blacklist, func) end,
> +  escape = function(str)
> +    if ( str ) then
> +      return str:gsub("%-", "%%-"):gsub("%.", "%%.")
> +    end
> +  end,
> +
>  }

I don't like it. Doesn't gsub("%.", "%%.") break domain_match
"^%s://.*%s/" because it has a magic dot in it? There may be other weird
cases with magic characters that can appear (rightly or wrongly) in a
domain name.

How about parsing the URL into components, and then using plain string
matching? Something like this:

function match_domain(test, domain)
	return test == domain or endswith(test, "." .. domain)
end

function is_within_host(url)
	-- parse url
	return u.proto == o.base_url:getProto() and match_domain(u.authority, o.base_url:getDomain())
	-- plus port handling if appropriate
end

As it stands, it looks like there can be some false matches, for example
"http://notnmap.org" would pass the domain_match ("^%s://.*%s/") for
"http://nmap.org", and "nmap.org.not.com" would pass the first
host_match ("^%s://%s") for "http://nmap.org".

David Fifield
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
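
The points raised in this message can be checked in plain Lua, outside NSE. What follows is only a rough sketch under stated assumptions, not the code that went into httpspider: parse_url, ends_with and is_within_domain are hypothetical helpers written for this illustration (a real script would more likely use the library's own URL parser), and escape merely copies the function from the quoted patch. The first half reproduces the pattern behaviour discussed above; the second half shows one possible shape of component-based plain string matching.

```lua
-- Part 1: behaviour of the pattern-based checks.
-- In Lua patterns "-" is a lazy repetition operator and "." matches any character.
local host_match = ("^%s://%s"):format("http", "www.h-online.com")
print(("http://www.h-online.com/"):match(host_match))          --> nil ("h-" means "zero or more h")

-- Missing end anchor: a longer hostname passes the host check.
print(("http://nmap.org.not.com/"):match("^http://nmap.org"))  --> http://nmap.org

-- ".*" in front of the domain: an unrelated name passes the domain check.
local domain_match = ("^%s://.*%s/"):format("http", "nmap.org")
print(("http://notnmap.org/"):match(domain_match))             --> http://notnmap.org/

-- escape() as in the quoted patch, applied after formatting: it also
-- escapes the dot of ".*", so subdomains stop matching.
local function escape(str)
  return (str:gsub("%-", "%%-"):gsub("%.", "%%."))
end
print(escape(domain_match))                                    --> ^http://%.*nmap%.org/
print(("http://www.nmap.org/"):match(escape(domain_match)))    --> nil

-- Part 2: parse first, then compare with plain string operations.

-- Hypothetical helper: true if s ends with suffix (no pattern matching involved).
local function ends_with(s, suffix)
  return suffix == "" or s:sub(-#suffix) == suffix
end

-- Same rule as match_domain in the message above.
local function match_domain(test, domain)
  return test == domain or ends_with(test, "." .. domain)
end

-- Hypothetical, deliberately minimal URL splitter: scheme and host only.
-- It drops userinfo and a numeric port; it does not handle IPv6 literals.
local function parse_url(url)
  local proto, authority = url:match("^(%a[%w+.-]*)://([^/?#]*)")
  if not proto then return nil end
  local host = authority:gsub("^.*@", "")   -- strip "user:pass@"
  host = host:gsub(":%d+$", "")             -- strip ":port"
  return { proto = proto:lower(), host = host:lower() }
end

-- Hypothetical check: is url inside base_domain, using the same scheme?
local function is_within_domain(url, base_proto, base_domain)
  local u = parse_url(url)
  return u ~= nil and u.proto == base_proto and match_domain(u.host, base_domain)
end

print(is_within_domain("http://www.h-online.com/Contact-273335.html", "http", "h-online.com")) --> true
print(is_within_domain("http://notnmap.org/", "http", "nmap.org"))                             --> false
print(is_within_domain("http://nmap.org.not.com/", "http", "nmap.org"))                        --> false
```

Because the comparison never treats the hostname as a pattern, hyphens, dots and any other magic characters need no escaping at all. Port handling is reduced here to discarding the port, which may or may not be what a spider wants.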