WebApp Sec mailing list archives

Re: Hit Throttling - Content Theft Prevention


From: "Kurt Seifried" <bt () seifried org>
Date: Wed, 19 Oct 2005 04:11:56 -0600

If there is anything worth harvesting from a site that is publicly
available and one blocks the harvester's IP address, the Google cache
is always an option.
Hidden links in white text: what happens if a legitimate spider such
as Google (again) comes across such a link? Should it be blocked as
well? Is that what you want to do?
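For reference, the Google-cache exposure mentioned above can be closed per page with the standard `noarchive` robots directive (a config sketch, not from the thread itself):

```html
<!-- Ask all crawlers not to keep a cached copy of this page -->
<meta name="robots" content="noarchive">
<!-- Or target Google's crawler specifically -->
<meta name="googlebot" content="noarchive">
```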

Google is easy to deal with since their crawler is well behaved (in my experience with a site that has over 100,000 pages, Google indexes the most timely and sanely, without bringing my servers to their knees). The Google cache can be disabled with a directive in the document. Google uses a well-known user agent that can be whitelisted (but wait... wouldn't this cause attackers to use the same user agent?), and they also come, for my site anyway, from a few well-defined class C's. Whitelisting legitimate crawlers isn't too hard (user agent string, network blocks, reverse DNS, etc.).
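Since the user agent string alone is trivially spoofable (the "but wait" above), the usual way to whitelist Googlebot is a reverse-then-forward DNS check: reverse-resolve the connecting IP, check the domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch in Python (function names are mine, not from the thread):

```python
import socket

def hostname_is_google(host: str) -> bool:
    # Genuine Googlebot reverse-DNS names end in googlebot.com or google.com
    return host.endswith(".googlebot.com") or host.endswith(".google.com")

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve
    the hostname and confirm it maps back to the same IP, so an
    attacker cannot simply fake the PTR record."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)           # reverse DNS
    except OSError:
        return False
    if not hostname_is_google(host):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward DNS
    except OSError:
        return False
    return ip in forward_ips
```

The same two-step check works for any crawler that publishes a stable reverse-DNS domain; the hostname suffix is the only part that changes per crawler.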

-Eoin

-Kurt
