PaulDotCom mailing list archives

Re: Looking for a good web spider


From: Jon Schipp <jonschipp () gmail com>
Date: Sat, 25 Sep 2010 00:21:19 -0400

I'm pretty sure you can do all this with wget (--spider, --save-headers).
Also, the list-urls script in BT (/pentest/enumeration/list-urls) can list
all the URLs from a single web page. Scripting time? I'm sure there is a
more efficient way of doing this.
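
Something like this might work as a starting point (untested; example.edu,
the depth, and the file names are just placeholders, and I believe -D does
suffix matching, so "edu" should keep it on .edu hosts, but worth verifying
on your wget version):

  # Spider recursively, spanning hosts but staying inside the .edu TLD,
  # dumping the server response headers for every URL into a log.
  wget --spider -r -l 3 -H -D edu -S -o spider.log http://www.example.edu/

  # Pull the visited URLs and Server: headers back out of the log.
  grep -E '^--|^  Server:' spider.log

  # robots.txt is just another fetch.
  wget -q -O robots.txt http://www.example.edu/robots.txt

  # Quick and dirty stand-in for list-urls on a single page:
  wget -q -O - http://www.example.edu/ | grep -Eo 'href="[^"]*"' | cut -d'"' -f2

Note that --spider doesn't save page bodies, so -S (log the headers) stands
in for --save-headers in the sketch above.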

On 09/24/2010 08:46 PM, Adrian Crenshaw wrote:
Hi all,
    I'm looking at some of the tools in BT4R1, and I'll see what Samurai
WTF has to offer once I finish downloading the latest version. I'm looking
for some sort of spider that lets me do the following:

1. Follow every link on a page, even onto other domains, as long as the
top-level domain is the same (edu, com, cn, whatever).
2. For every page it visits, collect the file names of all resources.
3. Collect the headers so I can see the server version.
4. Grab the robots.txt if possible.

Any ideas on the best tool for the job, or do I need to roll my own?

Thanks,
Adrian




-- 
- Jon
------------------------------------------------------------------
Do you OpenPGP? Search the MIT key server with string "jon schipp" @insightbb.com.

I prefer encrypted mail when dealing with sensitive data.

Fax & VMB: 206-426-1406

Dubois County Linux User Group - http://www.dclinux.org
BloomingLabs -  http://www.bloominglabs.org
ISSA-Kentuckiana  -  http://issa-kentuckiana.org
_______________________________________________
Pauldotcom mailing list
Pauldotcom () mail pauldotcom com
http://mail.pauldotcom.com/cgi-bin/mailman/listinfo/pauldotcom
Main Web Site: http://pauldotcom.com

