Nmap Development mailing list archives
Re: Proposed improvements on httpspider.lua
From: George Chatzisofroniou <sophron () latthi com>
Date: Mon, 24 Jun 2013 14:00:18 +0300
On Sun, Jun 23, 2013 at 04:59:42PM -0700, David Fifield wrote:
Even better, can't you use a design where you provide a callback function that is called for each resource? The function returns true if spidering should continue or false if not. A script that just wants to list all the external resources can make note of them in the callback, and return false for them. The default callback would just exclude other domains, the way it works now.
I don't think that a boolean callback function can do the job. There are two different operations that need approval:

- spidering - if this is true, the crawler will return the resource.
- scraping - if this is true, the crawler will scrape the resource for any links.

There are cases where we want to spider a link but don't want to scrape it. To make this work with a single callback function, the function would have to return a tuple of two boolean values, one for each operation. For example, [true, true] would mean both spidering and scraping are enabled for this resource. Alternatively, we could use two different callback functions (one for each operation), or provide the user with a method "urlqueue:add(links)" to scrape the links and add them to the URL queue manually, by choice. But doesn't this make things more complicated?

-- George Chatzisofroniou http://sophron.latthi.com
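[For illustration only - this is not the actual httpspider.lua API. A minimal sketch of the two-boolean callback idea, assuming hypothetical names like `crawl` and `default_filter`; in Lua a function can simply return two values, so no tuple table is needed:]

```lua
-- Hypothetical filter callback: returns two booleans, one per operation.
-- spider: should the crawler return this resource to the script?
-- scrape: should the crawler follow the links found in this resource?
local function default_filter(url, base_host)
  local spider = true                     -- report every resource we reach
  local scrape = (url.host == base_host)  -- but only follow same-host links
  return spider, scrape
end

-- Minimal crawl loop over an in-memory link graph (standing in for real
-- HTTP fetching), driven entirely by the filter callback.
local function crawl(start, links, filter, base_host)
  local queue, seen, results = { start }, {}, {}
  while #queue > 0 do
    local url = table.remove(queue, 1)
    if not seen[url.path] then
      seen[url.path] = true
      local spider, scrape = filter(url, base_host)
      if spider then
        results[#results + 1] = url.path
      end
      if scrape then
        for _, next_url in ipairs(links[url.path] or {}) do
          queue[#queue + 1] = next_url
        end
      end
    end
  end
  return results
end
```

[With this shape, an off-host link is still spidered (reported) but not scraped, which covers the "spider but don't scrape" case with a single callback rather than two.]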
_______________________________________________ Sent through the dev mailing list http://nmap.org/mailman/listinfo/dev Archived at http://seclists.org/nmap-dev/
Current thread:
- Proposed improvements on httpspider.lua George Chatzisofroniou (Jun 23)
- Re: Proposed improvements on httpspider.lua David Fifield (Jun 23)
- Re: Proposed improvements on httpspider.lua George Chatzisofroniou (Jun 24)
- Re: Proposed improvements on httpspider.lua David Fifield (Jun 23)