Nmap Development mailing list archives
Re: httpspider lib and hostnames with special characters
From: Patrik Karlsson <patrik () cqure net>
Date: Tue, 6 Mar 2012 16:51:16 +0100
On Mon, Mar 5, 2012 at 11:28 PM, David Fifield <david () bamsoftware com> wrote:

> On Mon, Mar 05, 2012 at 08:19:28PM +0100, Patrik Karlsson wrote:
> > On Mon, Mar 5, 2012 at 6:04 PM, Djalal Harouni <tixxdz () opendz org> wrote:
> > > On Mon, Mar 05, 2012 at 03:30:43PM +0100, Gutek wrote:
> > > > Thanks Djalal.
> > > >
> > > > It sounds to me like a weakness in httpspider's efficiency.
> > > > Let's consider a practical example with h-online.com and an
> > > > httpspider-dependent script, let's say http-backup-finder.
> > > >
> > > > With a simple command like
> > > >
> > > >   nmap -v -Pn -p80 -n --script http-backup-finder www.h-online.com
> > > >
> > > > it (silently) won't work, because debug output reveals that every
> > > > link will be discarded, maybe fooling the user into thinking that
> > > > no backup was found:
> > > >
> > > > - ----------
> > > > NSE: httpspider: Spidering limited to: maxdepth=3; maxpagecount=20; withinhost=www.h-online.com
> > > > ...
> > > > NSE: httpspider: Link is not within host: http://www.h-online.com/nettools/tools/spam-list-query
> > > > NSE: httpspider: Link is not within host: http://www.h-online.com/security/services/Reserved-IPv4-addresses-732899.html
> > > > NSE: httpspider: Link is not within host: http://www.h-online.com/Contact-273335.html
> > > > NSE: httpspider: Link is not within host: http://www.h-online.com/Privacy-Policy-of-h-online-com-273337.html
> > > > - -----------
> > > >
> > > > Now, with a script arg to override this withinhost issue, it will
> > > > work as intended:
> > > >
> > > >   nmap -v -Pn -p80 -n --script http-backup-finder --script-args http-backup-finder.withindomain=www.h-online.com -d2 www.h-online.com
> > > >
> > > > - -----------
> > > > NSE: httpspider: Spidering limited to: maxdepth=3; maxpagecount=20; withindomain=h-online.com
> > > > - -----------
> > > >
> > > > As I understand it, a withindomain argument is mandatory when
> > > > users want to deal with hyphenated hostnames? If it's intended
> > > > behavior and not a bug, maybe this should be explicitly stated in
> > > > the documentation?
> > >
> > > Gutek, this is clearly a bug; this should work by default.
> > >
> > > httpspider must handle it. From rfc952:
> > >
> > >   <name> ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
> > >
> > > but it was updated later in rfc1123 to allow names to start with
> > > digits.
> > >
> > > Thanks.
> > >
> > > --
> > > tixxdz
> > > http://opendz.org
> >
> > Does this patch look sane enough?
> >
> > --- nselib/httpspider.lua (revision 28198)
> > +++ nselib/httpspider.lua (working copy)
> > @@ -92,6 +92,8 @@
> >        domain_match = ("^%s://.*%s/"):format(o.base_url:getProto(), o.base_url:getDomain() )
> >      end
> >    end
> > +  host_match = Options.escape(host_match)
> > +  domain_match = Options.escape(domain_match)
> >    -- set up the appropriate matching functions
> >    if ( o.withinhost ) then
> >      o.withinhost = function(url) return string.match(tostring(url), host_match) end
> > @@ -107,6 +109,12 @@
> >    addWhitelist = function(self, func) table.insert(self.whitelist, func) end,
> >    addBlacklist = function(self, func) table.insert(self.blacklist, func) end,
> > +  escape = function(str)
> > +    if ( str ) then
> > +      return str:gsub("%-", "%%-"):gsub("%.", "%%.")
> > +    end
> > +  end,
> > +
> >  }
>
> I don't like it. Doesn't gsub("%.", "%%.") break domain_match
> "^%s://.*%s/" because it has a magic dot in it? There may be other weird
> cases with magic characters that can appear (rightly or wrongly) in a
> domain name.
>
> How about parsing the URL into components, and then using plain string
> matching? Something like this:
>
>   function match_domain(test, domain)
>     return test == domain or endswith(test, "." .. domain)
>   end
>
>   function is_within_host(url)
>     -- parse url
>     return u.proto == o.base_url:getProto() and match_domain(u.authority, o.base_url:getDomain())
>     -- plus port handling if appropriate
>   end
>
> As it stands, it looks like there can be some false matches. For example,
> "http://notnmap.org" would pass the domain_match ("^%s://.*%s/") for
> "http://nmap.org", and "nmap.org.not.com" would pass the first host_match
> ("^%s://%s") for "http://nmap.org".
>
> David Fifield
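The component-based approach suggested above can be sketched in standalone Lua. This is only an illustration under stated assumptions, not the code that was committed: parse_url, endswith, is_within_host and is_within_domain are hypothetical helpers written for this example, the parser handles only simple "proto://host[:port]/path" URLs, and the port is dropped rather than compared.

```lua
-- Hypothetical helpers for illustration; none of them are taken from
-- nselib/httpspider.lua.

-- True if s ends with suffix, using plain comparison (no Lua patterns).
local function endswith(s, suffix)
  return suffix == "" or (#suffix <= #s and s:sub(-#suffix) == suffix)
end

-- Split "proto://host[:port]/path" into proto and host.
-- Deliberately minimal: no userinfo, IPv6 literals, or relative URLs.
local function parse_url(url)
  local proto, authority = url:match("^(%a[%w+.-]*)://([^/?#]*)")
  if not proto then return nil end
  local host = authority:gsub(":%d+$", "")  -- drop an optional port
  return { proto = proto:lower(), host = host:lower() }
end

-- A host is within a domain if it equals the domain or is a subdomain of it.
local function match_domain(test, domain)
  return test == domain or endswith(test, "." .. domain)
end

local function is_within_host(url, base_proto, base_host)
  local u = parse_url(url)
  return u ~= nil and u.proto == base_proto and u.host == base_host
end

local function is_within_domain(url, base_proto, base_domain)
  local u = parse_url(url)
  return u ~= nil and u.proto == base_proto and match_domain(u.host, base_domain)
end

-- The hyphenated host from the report is accepted.
print(is_within_host("http://www.h-online.com/Contact-273335.html",
                     "http", "www.h-online.com"))                      --> true
-- The two false matches mentioned above are rejected.
print(is_within_domain("http://notnmap.org/", "http", "nmap.org"))      --> false
print(is_within_host("http://nmap.org.not.com/", "http", "nmap.org"))   --> false
```

Because the comparison never passes the host name to string.match, characters such as "-" and "." need no escaping at all.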
I just committed a more decent fix. I would greatly appreciate if you find
the time to give it some testing.

Cheers,
Patrik
--
Patrik Karlsson
http://www.cqure.net
http://twitter.com/nevdull77
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
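For anyone testing, the original failure can be reproduced in plain Lua without running Nmap. The sketch below assumes the pre-fix host pattern was built from the "^%s://%s" format string quoted in the thread; naive_host_pattern and escape_pattern are hypothetical names used only here, and the escaping shown is a common Lua idiom, not necessarily what the committed fix does.

```lua
-- Assumption: the pre-fix pattern was built roughly like this, based on the
-- "^%s://%s" format string quoted in the thread.
local function naive_host_pattern(proto, host)
  return ("^%s://%s"):format(proto, host)
end

-- Common idiom: prefix every punctuation character with "%" so that it is
-- matched literally. Hypothetical helper, not part of httpspider.
local function escape_pattern(s)
  return (s:gsub("%p", "%%%0"))
end

local url = "http://www.h-online.com/Contact-273335.html"
local pattern = naive_host_pattern("http", "www.h-online.com")

-- In a Lua pattern, "h-" means "as few 'h' characters as possible", so the
-- literal hyphen in the URL is never consumed and the match fails.
print(string.match(url, pattern))  --> nil

-- An unescaped "." matches any character, so an unrelated host slips through.
print(string.match("http://nmapxorg/", naive_host_pattern("http", "nmap.org")))
--> http://nmapxorg

-- With the host escaped, the hyphenated name matches as expected.
local escaped = naive_host_pattern("http", escape_pattern("www.h-online.com"))
print(string.match(url, escaped))  --> http://www.h-online.com
```

Hosts worth including in a test run are therefore those containing "-" or other pattern magic characters, plus look-alike names such as the "notnmap.org" and "nmap.org.not.com" cases raised earlier in the thread.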
Current thread:
- httpspider lib and hostnames with special characters Gutek (Mar 05)
- Re: httpspider lib and hostnames with special characters Djalal Harouni (Mar 05)
- Re: httpspider lib and hostnames with special characters Djalal Harouni (Mar 05)
- Re: httpspider lib and hostnames with special characters Gutek (Mar 05)
- Re: httpspider lib and hostnames with special characters Djalal Harouni (Mar 05)
- Re: httpspider lib and hostnames with special characters Patrik Karlsson (Mar 05)
- Re: httpspider lib and hostnames with special characters David Fifield (Mar 05)
- Re: httpspider lib and hostnames with special characters Patrik Karlsson (Mar 06)
- Re: httpspider lib and hostnames with special characters Gutek (Mar 06)
- Re: httpspider lib and hostnames with special characters Djalal Harouni (Mar 05)
- Re: httpspider lib and hostnames with special characters Djalal Harouni (Mar 05)