Full Disclosure mailing list archives

Re: Google's robot.txt handling


From: Hurgel Bumpf <l0rd_lunatic () yahoo com>
Date: Tue, 11 Dec 2012 22:42:11 +0000 (GMT)

Hi guys,

thank you for your valuable feedback.

The question was raised of what prevents somebody from building a script to scan for robots.txt files manually. Seriously,
let's just call it common sense: the time and effort invested do not pay off very well.

This is why Google is so useful here: thousands of servers index the files in no time, for free.


Thanks,

Coman the Intensivecarian




________________________________
From: Jeffrey Walton <noloader () gmail com>
To: Mario Vilas <mvilas () gmail com>
CC: full-disclosure () lists grok org uk
Sent: 22:38 Tuesday, 11 December 2012
Subject: Re: [Full-disclosure] Google's robot.txt handling
 
On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas <mvilas () gmail com> wrote:
I think we can all agree this is not a vulnerability. Still, I have yet to
see an argument saying why what the OP is proposing is a bad idea. It may be
a good idea to stop indexing robots.txt to mitigate the faults of lazy or
incompetent admins (Google already does this for many specific search
queries) and there's not much point in indexing the robots.txt file for
legitimate uses anyway.
I kind of agree here. The information is valuable for the
reconnaissance phase of an attack, but it's not a vulnerability per
se. But what is to stop the attacker from fetching it himself/herself,
since it's at a known location for all sites? In this case, Google
would be removing aggregated search results (which means the attacker
would have to compile them himself/herself).

Google removed other interesting searches, such as social security
numbers and credit card numbers (or does not provide them to the
general public).

Jeff

On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson
<scott.ferguson.it.consulting () gmail com> wrote:

If I understand the OP correctly, he is not stating that listing something
in robots.txt would make it inaccessible, but rather that Google indexes
the robots.txt files themselves,

<snipped>

Well, um, yeah - I got that.

So you are what, proposing that moving an open door back a few
centimetres solves the (non) problem?

Take your proposal to its logical extension and stop all search engines
(especially the ones that don't respect robots.txt) from indexing
robots.txt. Now what do you do about Nutch or even some perl script that
anyone can whip up in 2 minutes?
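
As a rough illustration of how little such a script needs, here is a
minimal sketch in Python rather than perl. The function names
disallowed_paths and fetch_robots are made up for this example, and the
parsing is deliberately naive (it ignores User-agent grouping).

```python
import urllib.request


def disallowed_paths(robots_text):
    """Return the paths named on Disallow lines, in order of appearance."""
    paths = []
    for line in robots_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow line excludes nothing
                paths.append(value)
    return paths


def fetch_robots(base_url, timeout=10):
    """Fetch /robots.txt from a site; base_url is supplied by the caller."""
    url = base_url.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling disallowed_paths(fetch_robots("http://somesite.tld")) would list
whatever that site asks crawlers to stay away from, assuming the host
answers at all.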

Security through obscurity is fine when coupled with actual security -
but relying on it alone is just daft.

Expecting the world to change so bad habits have no consequence is
dangerously naive.

I suspect you're looking too hard at finding fault with Google - who are
complying with the robots.txt. Read the spec. - it's about not following
the listed directories, not about not listing the robots.txt.  Next
you'll want laws against bad weather and furniture with sharp corners.
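
To make that distinction concrete, here is a small sketch using Python's
standard urllib.robotparser module. The rules and the somesite.tld URLs
are invented for the example. It shows only what a well-behaved crawler
would decide for itself; nothing here stops a client that ignores the
file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /admin/
"""


def build_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = build_rules(EXAMPLE_RULES)

# A compliant crawler skips the listed directory...
print(rules.can_fetch("*", "http://somesite.tld/admin/index.html"))
# ...but the robots.txt file itself is not excluded by these rules.
print(rules.can_fetch("*", "http://somesite.tld/robots.txt"))
```

The check is purely advisory: can_fetch reports what the file requests,
and honouring it is up to each crawler.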

Don't put things you don't want seen in places that can be seen.



On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson <
scott.ferguson.it.consulting () gmail com> wrote:


     /From/: Hurgel Bumpf <l0rd_lunatic () yahoo com>
     /Date/: Mon, 10 Dec 2012 19:25:39 +0000 (GMT)

------------------------------------------------------------------------
     Hi list,


     I tried to contact Google, but as they didn't answer my email, I am
     forwarding this to FD.

     This "security" feature is not clearly a Google vulnerability, but it
     exposes website information that is not really intended to be public.

     Conan the bavarian

Your point eludes me - Google is indexing something which is publicly
available, e.g.: curl http://somesite.tld/robots.txt
So it seems the solution to the "question" you raise is, um,
nonsensical.

If you don't want something exposed on your web server *don't publish
references to it*.

The solution, which should be blindingly obvious, is don't create the
problem in the first place. Password-protect sensitive directories (htpasswd) -
then they don't have to be excluded from search engines (because listing
the inaccessible in robots.txt is redundant).  You must have missed the
first day of web school.

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/
