PaulDotCom mailing list archives
Re: Looking for a good web spider
From: Dennis Lavrinenko <dennis.lavrinenko () gmail com>
Date: Sat, 25 Sep 2010 19:27:22 -0500
Although I've used it mostly for mirroring sites, I believe httrack has a --spider option.

On Sat, Sep 25, 2010 at 7:46 AM, Robin Wood <robin () digininja org> wrote:
On 25 September 2010 02:46, Adrian Crenshaw <irongeek () irongeek com> wrote:

Hi all,

I'm looking at some of the tools in BT4R1, and will be looking at what Samurai WTF has to offer once I finish downloading the latest version. I'm looking for some sort of spider that lets me do the following:

1. Follow every link on a page, even onto other domains, as long as the top-level domain name is the same (edu, com, cn, whatever).
2. For every page it visits, collect the file names of all resources.
3. Grab the headers so I can see the server version.
4. Grab the robots.txt if possible.

Any ideas on the best tool for the job, or do I need to roll my own?

If you want to roll your own, you can take my CeWL code and look at the spider. It does a full spider, checks whether you are on the same site or off it, and grabs all the documents. You should easily be able to modify it to do what you want.

Robin

_______________________________________________
Pauldotcom mailing list
Pauldotcom () mail pauldotcom com
http://mail.pauldotcom.com/cgi-bin/mailman/listinfo/pauldotcom
Main Web Site: http://pauldotcom.com
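For anyone who does end up rolling their own, the four requirements in Adrian's list could be sketched roughly as below. This is only an illustration in Python using the standard library, not CeWL's spider or httrack's behaviour. The names `crawl`, `tld`, `http_fetch` and `LinkParser` are made up for this example, the top-level domain check naively takes the last label of the hostname, and robots.txt is only fetched and stored, not obeyed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen
import posixpath


class LinkParser(HTMLParser):
    """Collect every href/src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def tld(url):
    """Last label of the hostname, e.g. 'edu' (naive, no public-suffix list)."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def http_fetch(url):
    """Default fetcher: returns (headers with lower-cased names, body text)."""
    with urlopen(url, timeout=10) as resp:
        headers = {k.lower(): v for k, v in resp.headers.items()}
        return headers, resp.read().decode("utf-8", "replace")


def crawl(start, fetch=http_fetch, max_pages=50):
    """Breadth-first crawl that stays within the start URL's top-level domain.

    Returns (report, robots): report maps each fetched URL to its Server
    header and the file names it links to; robots maps each origin to the
    text of its robots.txt, or None if it could not be fetched.
    """
    wanted = tld(start)
    queue, seen = deque([start]), {start}
    report, robots = {}, {}
    while queue and len(report) < max_pages:
        url = queue.popleft()
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        if origin not in robots:  # requirement 4: grab robots.txt once per host
            try:
                robots[origin] = fetch(origin + "/robots.txt")[1]
            except Exception:
                robots[origin] = None
        try:
            headers, body = fetch(url)
        except Exception:
            continue  # unreachable or broken link: skip it
        resources = set()
        if "html" in headers.get("content-type", "text/html"):
            parser = LinkParser()
            parser.feed(body)
            for link in parser.links:
                target = urldefrag(urljoin(url, link)).url
                if urlsplit(target).scheme not in ("http", "https"):
                    continue
                name = posixpath.basename(urlsplit(target).path)
                if name:  # requirement 2: file names of linked resources
                    resources.add(name)
                # requirement 1: follow links only while the TLD matches
                if tld(target) == wanted and target not in seen:
                    seen.add(target)
                    queue.append(target)
        # requirement 3: keep the Server header for each page visited
        report[url] = {"server": headers.get("server"),
                       "resources": sorted(resources)}
    return report, robots
```

Calling `crawl("http://example.edu/")` would use the real network fetcher. The `fetch` parameter is there so a stub can be passed in for testing, or a fetcher with rate limiting swapped in before pointing this at a real site.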