Nmap Development mailing list archives
Re: Proposed improvements on httpspider.lua
From: George Chatzisofroniou <sophron () latthi com>
Date: Sat, 13 Jul 2013 23:41:05 +0300
On Mon, Jun 24, 2013 at 02:00:18PM +0300, George Chatzisofroniou wrote:
> There are two different operations that need approval:
>
> spidering - If this is true, the crawler will return the resource.
> scraping - If this is true, the crawler will scrape the resource for any links.
Progress has been slow on this issue because Patrick and I were trying to design a callback API. Eventually, things got too complicated, so we took a step back and I implemented a single callback mechanism.

To achieve backwards compatibility, the default callbacks implement the current behaviour. The user may use, as always, the boolean options (withinhost and withindomain) to adjust the crawler's behaviour for spidering. For any advanced case, though, the user can replace the default behaviour of the withinhost and withindomain callbacks. New callbacks can easily be defined using some utility functions provided by the httpspider library.

For example, consider the following sample code:

    crawler.options.withinhost = function(url)
      if crawler:iswithinhost(url)
          and not crawler:isresource(url, "js")
          and not crawler:isresource(url, "css") then
        return true
      end
    end

In this example, we override the default withinhost method and allow spidering only on resources within the host that are not "js" or "css". We make use of two utility functions (iswithinhost and isresource).

There is also a doscraping callback that can be replaced to adjust the crawler's scraping behaviour.

A full working example is my http-referer-checker script [1].

Apart from this feature, I made some more changes, like fixing a syntax mistake, replacing annoying tabs with spaces and rewriting some parts.

I've attached the new upgraded httpspider. If you have developed a script that makes use of this library, please use the upgraded version and let me know if it still works as expected. I've done some tests, but I may have missed something.

[1]: https://svn.nmap.org/nmap-exp/sophron/nse-support/scripts/http-referer-checker.nse

--
George Chatzisofroniou
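
As a rough illustration of the callback mechanism described in the message, the sketch below overrides both the withinhost and doscraping callbacks and runs them outside Nmap. Several things here are assumptions, not taken from the attached library: the `crawler` table is a stand-in for the object an NSE script would get from httpspider, the two helper methods are simplified stubs written for this example, URLs are plain strings, the doscraping callback is assumed to take the URL as its only argument, and the "scrape only HTML pages" policy is hypothetical.

```lua
-- Stand-in crawler for illustration only. In an NSE script the object
-- would come from the httpspider library; URLs are plain strings here.
local crawler = { host = "example.com", options = {} }

-- Hypothetical stub: true if the URL's host equals the crawler's host.
function crawler:iswithinhost(url)
  local host = url:match("^%a+://([^/:]+)")
  return host == self.host
end

-- Hypothetical stub: true if the URL path ends in ".<ext>".
function crawler:isresource(url, ext)
  local path = url:match("^%a+://[^/]+([^?#]*)") or ""
  return path:sub(-(#ext + 1)) == "." .. ext
end

-- Spidering callback, same idea as the sample in the message:
-- stay within the host and skip JavaScript and CSS files.
crawler.options.withinhost = function(url)
  return crawler:iswithinhost(url)
    and not crawler:isresource(url, "js")
    and not crawler:isresource(url, "css")
end

-- Scraping callback (assumed signature, hypothetical policy):
-- only look for links inside HTML pages on the same host.
crawler.options.doscraping = function(url)
  return crawler:iswithinhost(url) and crawler:isresource(url, "html")
end

for _, url in ipairs({
  "http://example.com/index.html",
  "http://example.com/app.js",
  "http://other.org/index.html",
}) do
  print(url, crawler.options.withinhost(url), crawler.options.doscraping(url))
end
```

Unlike the sample in the message, these callbacks return an explicit false instead of nil when a URL is rejected; both values are falsy in Lua, so a caller that only tests the result for truth should see no difference.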
Attachment:
httpspider.lua