IDS mailing list archives

Re: robots.txt access rules


From: Krzysztof Zaraska <kzaraska () student uci agh edu pl>
Date: Thu, 22 Jan 2004 12:43:53 +0100 (CET)

On Wed, 21 Jan 2004, Federico Petronio wrote:

> Hi all...
>
> I have installed snort-inline and I am customizing rulesets.
>
> My question is about the rule sid:1852, which matches accesses to
> /robots.txt files. The goal of this rule is to prevent access to
> information about sensitive areas of the webserver (which could be used
> to gain knowledge about restricted areas, etc.), but if the file is not
> present, Google etc. will try to index those areas...
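
For reference, the rule in question is (from memory) roughly of this
shape; the exact options and revision may differ from the stock
web-misc.rules:

    alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS \
        (msg:"WEB-MISC robots.txt access"; flow:to_server,established; \
        uricontent:"/robots.txt"; nocase; \
        classtype:web-application-activity; sid:1852;)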

Interesting. From what I have heard, one of the goals of /robots.txt is to
prevent a spider from looping forever over the same dynamically generated
pages and exhausting resources on the server. Thus blocking /robots.txt can
lead to problems...
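
For example, a site with an unbounded dynamic URL space will typically
publish something along these lines (a made-up layout, just to illustrate
the idea):

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /calendar/

so that a compliant spider never wanders into the endless /calendar/ page
space. If the request for /robots.txt is dropped, the spider never sees
these rules.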

Not to mention that IIRC every well-behaved spider should read and respect
/robots.txt, so it will generate alerts in innocent cases (including
someone launching wget -r on the site).
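
A minimal sketch of what such a polite client does, using the Python
standard-library robots.txt parser; the site layout, URLs and bot name
below are made up:

    from urllib.robotparser import RobotFileParser

    # Hypothetical robots.txt rules; a real crawler fetches these from
    # http://<host>/robots.txt before requesting anything else.
    robots_lines = [
        "User-agent: *",
        "Disallow: /cgi-bin/",
        "Disallow: /search",
    ]

    rp = RobotFileParser()
    rp.parse(robots_lines)

    # A well-behaved spider checks every URL against the rules it read.
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/cgi-bin/report.cgi?p=1"):
        verdict = "allowed" if rp.can_fetch("ExampleBot", url) else "disallowed"
        print(url, "->", verdict)

The first request any of these clients makes is for /robots.txt itself,
which is exactly what sid:1852 alerts on.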

> The restricted areas should be RESTRICTED and not just "hidden", so...
> the rule makes no sense?

Ahem. Certain people believed that /robots.txt could be used for access
control, and learned the hard way that it was a bad idea:

http://www.theregister.co.uk/content/6/27230.html

// Krzysztof Zaraska * kzaraska (at) student.uci.agh.edu.pl
// http://mops.uci.agh.edu.pl/~kzaraska/ * http://www.prelude-ids.org/
// A dream will always triumph over reality, once it is given the chance.
//              -- Stanislaw Lem



