Nmap Development mailing list archives
Re: Web crawling library proposal
From: Patrick Donnelly <batrick () batbytes com>
Date: Wed, 19 Oct 2011 15:45:06 -0400
On Wed, Oct 19, 2011 at 3:25 AM, Paulino Calderon <paulino () calderonpale com> wrote:
Hi list, I'm attaching my working copies of the web crawling library and a few scripts that use it. It would be great if I can get some feedback.
For the library itself:

o I'm not convinced a Queue implementation is necessary. I'd prefer just using table.insert/table.remove until evidence is presented it is a performance block.
o Libraries should not use the registry. Provide an interface to access private data instead.
o is_url_absolute should anchor the pattern search to the beginning of the URI.
o Make get_sitemap return an iterator instead of a table of results.
o Does get_sitemap return the URI for every site that's been crawled? Shouldn't it return what we requested it to crawl?

It would appear if two scripts try to crawl at the same time, bad things happen with the global queue structures (among other things).

--
- Patrick Donnelly
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
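To illustrate two of the review points above, here is a minimal Lua sketch, not the code from the attached library. It assumes is_url_absolute takes a URI string and returns a boolean; the scheme pattern, is_url_absolute_unanchored, and new_queue are hypothetical names and choices made only for this example. It shows a pattern anchored with "^" and a per-crawl FIFO built on table.insert/table.remove, kept in a local table rather than in global state or the registry.

```lua
-- Hypothetical sketch: names follow the review comments, not the real library.

-- Anchored check: "^" ties the match to the start of the string, so a
-- scheme that only appears later in the URI (for example inside a query
-- string) does not make a relative URI look absolute.
local function is_url_absolute(uri)
  return uri:match("^%a[%w+.%-]*://") ~= nil
end

-- Unanchored variant, shown only for comparison: it matches "http://"
-- or "https://" anywhere in the string.
local function is_url_absolute_unanchored(uri)
  return uri:find("https?://") ~= nil
end

-- FIFO queue on a plain table. Each call returns an independent queue,
-- so two crawls running at the same time do not share pending URIs.
local function new_queue()
  local pending = {}
  return {
    push = function(uri) table.insert(pending, uri) end,
    -- table.remove(t, 1) shifts the remaining items down; it returns
    -- nil when the queue is empty.
    pop = function() return table.remove(pending, 1) end,
    size = function() return #pending end,
  }
end

local q = new_queue()
q.push("http://example.com/")
q.push("/redirect?to=http://example.com/")

while q.size() > 0 do
  local uri = q.pop()
  print(uri, is_url_absolute(uri), is_url_absolute_unanchored(uri))
end
```

Removing from the front of a table is linear in the queue length, which is the cost that would have to show up in measurements before a dedicated Queue structure is worth adding.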