WebApp Sec mailing list archives

Hit Throttling - Content Theft Prevention


From: Nik Cubrilovic <cubrilovic () gmail com>
Date: Wed, 19 Oct 2005 16:04:40 +1000

Hello,

I would like to hear advice from others about implementing some
application-layer firewall rules to prevent site scraping by bots. A
lot of companies base their business models on the content of a
website, so preventing somebody from simply crawling through and
downloading all of that information is important. At the moment, my
strategy has been to use firewall rules and/or Apache ACLs, but as
most of you well know, this simply turns into a game of whack-a-mole.
For one client we even built a web interface that lets them add IP
addresses or hostnames to block from the site for a set time period
(roughly sketched below), but the problem is much larger than that.
Also, with ACLs an untrained operator may well add a large ISP proxy
server to the blacklist, locking out a large slice of legitimate
clients at the same time as trying to stop bots scraping the site.
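
For what it's worth, the blocking logic behind that interface boils
down to something like this (a minimal sketch in Python; the names and
structure are illustrative, not our actual code):

    import time

    # address (IP or hostname) -> expiry timestamp (seconds since epoch)
    blocklist = {}

    def block(address, minutes):
        """Block an address for the given number of minutes."""
        blocklist[address] = time.time() + minutes * 60

    def is_blocked(address):
        """True if the address is still blocked; expired entries are dropped."""
        expiry = blocklist.get(address)
        if expiry is None:
            return False
        if time.time() > expiry:
            del blocklist[address]
            return False
        return True

In practice the entries live in a database rather than in-process
memory, but the idea is the same.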

The other strategy has been to implement some simple logic within the
web application itself, along the lines of 'this user's session has
hit the site 30 times in the last minute, so block them', plus a few
other rules like that (a rough sketch follows below). That seems to
work OK, but it sometimes can't be used to protect images or PDF
files, since I would have to stream the binary content through the web
app, which is not an option in many cases. So, in short, what other
options are there? To me, this sort of protection sounds like DoS
protection, just with much lower thresholds. I need a solution (and I
am sure many others do as well) that analyses web traffic at the
application layer and drops users who exceed common usage patterns for
that website. Any thoughts?
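
For reference, the current in-app check amounts to something like the
sketch below (simplified Python; the names and thresholds are
illustrative, and the real rules vary per site):

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60   # size of the sliding window
    MAX_HITS = 30         # requests allowed per session within the window

    # session id -> timestamps of recent requests
    hits = defaultdict(deque)

    def over_limit(session_id):
        """Record a request and report whether the session exceeded the limit."""
        now = time.time()
        recent = hits[session_id]
        recent.append(now)
        # Discard timestamps that have fallen out of the window.
        while recent and now - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        return len(recent) > MAX_HITS

The catch, as above, is that this only sees requests that actually
pass through the application, so static images and PDFs served
directly by Apache never get counted.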

Regards,
Nik

--
Nik Cubrilovic  -  <http://www.nik.com.au>

