Full Disclosure mailing list archives

Re: Re: Re: Re: Links to Google's cache of 626 FrSIRT exploits


From: Valdis.Kletnieks () vt edu
Date: Thu, 23 Mar 2006 14:41:03 -0500

On Thu, 23 Mar 2006 15:15:00 GMT, Dave Korn said:

> difference?  robots.txt is enforced (or ignored) by the client.  If a server
> returns a 403 or doesn't, depending on what UserAgent you specified, then
> how could making the client ignore robots.txt somehow magically make the
> server not return a 403 when you try to fetch a page?

It *can*, however, make the client *issue* a request it would otherwise not have.

If the client had snarfed a robots.txt, and it said "don't snarf anything
under /dontlook/here/", and a link pointed there, it wouldn't follow the link.

If you tell it 'robots=off', then it *would* follow the link.

Remember - robots.txt *isn't* for the pages that would 403.  It's for the pages
that *would* work, that you don't want automatically snarfed by things like
wget and googlebots....
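
To make the client-side nature of this concrete, here is a minimal sketch in
Python using the standard library's urllib.robotparser. The function name
will_request, the agent string "example-crawler", the honor_robots flag, and
the example.com URLs are all made up for illustration; this is not how wget
itself is implemented, only the same decision in miniature.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, matching the example path above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dontlook/here/
"""


def will_request(parser, url, honor_robots=True, agent="example-crawler"):
    """Return True if the client would issue a request for url.

    The decision is made entirely on the client; the server is never asked.
    """
    if not honor_robots:
        # Analogous to 'robots=off': skip the check and follow the link.
        return True
    return parser.can_fetch(agent, url)


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = "http://example.com/dontlook/here/page.html"
allowed = "http://example.com/public/index.html"

print(will_request(parser, blocked))                      # honoring robots.txt
print(will_request(parser, blocked, honor_robots=False))  # ignoring it
print(will_request(parser, allowed))
```

Whether the server then answers 200 or 403 is a separate matter; the flag only
changes whether the request goes out at all.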


_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/
