Full Disclosure mailing list archives

Re: Google's robots.txt handling


From: Philip Whitehouse <philip () whiuk com>
Date: Thu, 13 Dec 2012 12:52:27 +0000

I restate my email's second point.

Google is indexing robots.txt because (from all the examples I can see) robots.txt doesn't contain a line to disallow 
indexing of robots.txt.
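
As a rough illustration of that point, here is a minimal sketch using Python's standard urllib.robotparser. The host 
example.com, the agent name "ExampleBot", the /private/ rule and the helper is_crawlable are all made up for the 
example. It only shows how a rule-following parser reads the file; whether a given search engine then drops the file 
from its index is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, for illustration only.
TYPICAL = "User-agent: *\nDisallow: /private/\n"
SELF_EXCLUDING = TYPICAL + "Disallow: /robots.txt\n"


def is_crawlable(robots_text, path, agent="ExampleBot"):
    """Return True if the given rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    # No rule mentions robots.txt, so a compliant crawler treats it as fair game.
    print(is_crawlable(TYPICAL, "/robots.txt"))
    # With an explicit line, the file excludes itself.
    print(is_crawlable(SELF_EXCLUDING, "/robots.txt"))
```
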

It is possible that some web sites provide actual content in a file that happens to be called robots.txt (e.g. a 
website concerned with AI development).

Could Google do better by removing the file? Sure. But as webmasters haven't told them not to, even though they have 
provided other files not to index, Google is doing exactly what they were asked.

Maybe the Robots Exclusion Standard (R.E.S.) should state that a valid robots.txt should not be indexed.

Incidentally Bing shows the same behaviour - in fact the Google file is the 4th hit even without any of the file type 
classifiers.

Philip Whitehouse

On 13 Dec 2012, at 11:40, Mario Vilas <mvilas () gmail com> wrote:

That paragraph says pretty much the exact opposite of what you understood.

Also, could we please stop refuting points nobody even made in the first place? OP never claimed this to be a 
vulnerability, nor ever said robots.txt is a proper security mechanism to hide files in public web directories.

All OP said was the way robots.txt is indexed allows for some Google dorks to be made, and it may be a good idea to 
avoid that. Clearly it's not the discovery of the century, but it seems fairly reasonable to me... I don't get what 
all this fuss is about.

On Wed, Dec 12, 2012 at 12:18 PM, Christoph Gruber <list () guru at> wrote:
On 12.12.2012 at 00:23 "Lehman, Jim" <jim.lehman () interactivedata com> wrote:

It is possible to use whitelisting for robots.txt. Allow what you want Google to index and deny everything else. 
That way Google doesn't make you a Google dork target and someone browsing to your robots.txt file doesn't glean 
any sensitive files or folders. But this will not stop directory brute-forcing to discover your publicly exposed 
sensitive data, which probably should not be exposed to the web in the first place.
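
A minimal sketch of the whitelisting idea, checked with Python's standard urllib.robotparser. The paths /public/ and 
/index.html, the host example.com and the agent name "ExampleBot" are assumptions made up for the example. The Allow 
lines are placed before the catch-all Disallow so that parsers which apply the first matching rule behave as intended.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist-style robots.txt: name what may be crawled,
# then deny everything else. Nothing sensitive is listed by name.
WHITELIST_ROBOTS = """\
User-agent: *
Allow: /public/
Allow: /index.html
Disallow: /
"""


def allowed(path, robots_text=WHITELIST_ROBOTS, agent="ExampleBot"):
    """Return True if the rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for path in ("/public/page.html", "/index.html", "/admin/", "/robots.txt"):
        print(path, allowed(path))
```

Under these rules only the two whitelisted prefixes are crawlable. As noted above, this changes only what compliant 
crawlers fetch; it does nothing against someone guessing directory names directly.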

Maybe I misunderstood something, but do you really think that "sensitive" data can be hidden in "secret" directories 
on publicly reachable web servers?
--
Christoph Gruber
By not reading this email you don't agree you're not in any way affiliated with any government, police, ANTI- Piracy 
Group, RIAA, MPAA, or any other related group, and that means that you CANNOT read this email.
By reading you are not agreeing to these terms and you are violating code 431.322.12 of the Internet Privacy Act 
signed by Bill Clinton in 1995.
(which doesn't exist)

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/



-- 
“There's a reason we separate military and the police: one fights the enemy of the state, the other serves and 
protects the people. When the military becomes both, then the enemies of the state tend to become the people.”

