PaulDotCom mailing list archives

Re: Looking for a good web spider


From: Antonios Atlasis <antonios.atlasis () gmail com>
Date: Mon, 27 Sep 2010 15:51:47 +0300

There is also a Linux version of HTTrack...

2010/9/27 Xander Solis <xrsolis () gmail com>:
HTTrack is free, but it only runs on Windows.

On Sat, Sep 25, 2010 at 8:46 AM, Adrian Crenshaw <irongeek () irongeek com>
wrote:

Hi all,
    I'm looking at some of the tools in BT4R1, and will be looking at what
Samurai WTF has to offer once I finish downloading the latest version. I'm
looking for some sort of spider that lets me do the following:

1. Follow every link on a page, even onto other domains, as long as the
top level domain name is the same (edu, com, cn, whatever)
2. For every page it visits, collect the file names of all resources.
3. Capture the response headers so I can see the server version.
4. Grab the robots.txt if possible.

Any ideas on the best tool for the job, or do I need to roll my own?

Thanks,
Adrian
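
For anyone who does end up rolling their own, the four requirements above can be sketched with nothing but the Python standard library. This is a rough illustration, not a tool recommended in the thread: the names LinkParser, tld, file_name, parse_page, fetch and crawl are all made up for this example, "same top-level domain" is assumed to mean "same last label of the host name" (so co.uk-style suffixes are not handled), and robots.txt is only downloaded, not obeyed.

```python
from collections import deque
from html.parser import HTMLParser
import posixpath
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect pages to follow (a href) and resource URLs (src, link href)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "link" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))
        elif attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))


def tld(url):
    """Last label of the host name, e.g. 'edu' (assumption: no co.uk handling)."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def file_name(url):
    """Final path component of a URL; empty string for a bare directory."""
    return posixpath.basename(urlparse(url).path)


def parse_page(base_url, html):
    """Return (links, resources) found in an HTML string, as absolute URLs."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links, parser.resources


def fetch(url, timeout=10):
    """Return (headers dict, decoded body) for one URL."""
    req = Request(url, headers={"User-Agent": "example-spider/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return dict(resp.headers.items()), body


def crawl(start_url, max_pages=20):
    """Breadth-first crawl that stays within the start URL's top-level domain."""
    wanted = tld(start_url)
    queue, seen = deque([start_url]), {start_url}
    pages, robots = {}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            headers, body = fetch(url)
        except OSError:  # URLError and HTTPError are subclasses of OSError
            continue
        links, resources = parse_page(url, body)
        pages[url] = {
            "server": headers.get("Server"),
            "files": sorted({file_name(r) for r in resources if file_name(r)}),
        }
        # Requirement 4: fetch robots.txt once per host (it is not enforced here).
        host = urlparse(url).netloc
        if host not in robots:
            try:
                robots[host] = fetch(urljoin(url, "/robots.txt"))[1]
            except OSError:
                robots[host] = None
        # Requirement 1: follow links onto other domains with the same TLD.
        for link in links:
            link = link.split("#", 1)[0]
            if urlparse(link).scheme not in ("http", "https"):
                continue
            if link not in seen and tld(link) == wanted:
                seen.add(link)
                queue.append(link)
    return pages, robots
```

Calling crawl("http://example.edu/") would return one dictionary keyed by page URL (Server header plus resource file names) and one keyed by host (robots.txt text, or None if it could not be fetched). The max_pages cap matters, since following every same-TLD link would otherwise wander across a very large part of the web.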

_______________________________________________
Pauldotcom mailing list
Pauldotcom () mail pauldotcom com
http://mail.pauldotcom.com/cgi-bin/mailman/listinfo/pauldotcom
Main Web Site: http://pauldotcom.com



