WebApp Sec mailing list archives

Re: Combatting automated download of dynamic websites?


From: Eoin Keary <eoinkeary () gmail com>
Date: Wed, 31 Aug 2005 14:08:17 +0000

Shouldn't the webapp keep a session variable that records a user's
progress? An isSessionValid() method could then check whether the user
has gone through the correct steps, in order, before being allowed to
access the info.
Each session entry could be time-stamped in order to implement
throttle functionality.
Also, wouldn't limiting by IP cause issues for people behind firewalls?
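
Something like this, as a rough Python sketch (framework-agnostic; the
step names, the session dict and the 2-second threshold are just
placeholders, not anything from a real app):

  import time

  REQUIRED_STEPS = ["view_book", "view_details"]   # steps before "browse_contents"
  MIN_SECONDS_BETWEEN_REQUESTS = 2                 # crude throttle threshold

  def is_session_valid(session, requested_step):
      """Check that the user walked through the expected steps, in
      order, and is not hammering the server."""
      history = session.get("steps", [])           # list of (step, timestamp)

      # Throttle: compare against the timestamp of the last recorded step.
      if history:
          _, last_ts = history[-1]
          if time.time() - last_ts < MIN_SECONDS_BETWEEN_REQUESTS:
              return False

      # Order: all prerequisite steps must already be in the history.
      if requested_step in REQUIRED_STEPS:
          prerequisites = REQUIRED_STEPS[:REQUIRED_STEPS.index(requested_step)]
      else:
          prerequisites = REQUIRED_STEPS
      seen = [step for step, _ in history]
      if any(p not in seen for p in prerequisites):
          return False

      # Record the step and allow the request.
      history.append((requested_step, time.time()))
      session["steps"] = history
      return True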

Eoin



On 30/08/05, Javier Fernandez-Sanguino <jfernandez () germinus com> wrote:
Matthijs R. Koot wrote:

Thanks for your reply zeno! But actually, referer-based anti-leeching
won't do it for me and mod_throttle isn't suitable for Apache 2. I'm in
need of a throttling function based on something more advanced, like a
'request history stack' to check the order in which pages were
requested, probably within a certain time period, et cetera. Maybe it'd
be better to move such security measures into the actual web application
itself, but I'm still hoping someone knows of a service-based solution
(i.e. something like the aforementioned Apache module).

Several web-oriented proxy firewalls implement a "request history
stack" like the one you mention, to prevent an IP address from going
directly to a given resource without following the "flow" established
by the webapp programmer.
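
In essence they keep something like this per client or session; a rough
Python sketch (the class name, window size and history length are mine,
not taken from any particular product):

  import time
  from collections import deque

  class RequestHistory:
      def __init__(self, max_entries=50):
          self.entries = deque(maxlen=max_entries)   # (timestamp, path)

      def record(self, path):
          self.entries.append((time.time(), path))

      def requests_in_window(self, seconds=60):
          """How many requests fell inside the last N seconds."""
          cutoff = time.time() - seconds
          return sum(1 for ts, _ in self.entries if ts >= cutoff)

      def came_through(self, expected_sequence):
          """True if the most recent requests end with the expected
          page sequence."""
          recent = [path for _, path in self.entries][-len(expected_sequence):]
          return recent == list(expected_sequence)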

You could implement this by way of session handling: tie session
identifiers to the client (through IP or User-Agent) and then check,
application-side, whether the session is being handled as you would
normally expect. Don't rely on referer information; instead, attach the
session information to some kind of finite state machine that tells you
whether the user went through your defined procedure. In your Amazon
example: first look at the book, then at the book details, and only
then allow them to browse the contents.
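
As a hedged illustration of that finite state machine, in Python (the
page names come from your Amazon example; the rest is made up):

  # Each state lists the pages a user may legally request next.
  ALLOWED_TRANSITIONS = {
      "start":        {"book_page"},
      "book_page":    {"book_details"},
      "book_details": {"browse_contents"},
  }

  def advance(session, requested_page):
      """Allow the request only if it is a legal transition from the
      session's current state; otherwise leave the state untouched."""
      state = session.get("state", "start")
      if requested_page in ALLOWED_TRANSITIONS.get(state, set()):
          session["state"] = requested_page
          return True
      return False

  session = {}
  advance(session, "browse_contents")   # False: tried to jump straight to the content
  advance(session, "book_page")         # True: followed the defined flow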

Of course, a user can try to reuse his session ID and spoof the
identifiers (User-Agent) in an alternative download tool in order to
retrieve the content in the end, but these checks might still raise the
bar somewhat. I'm not aware of the capabilities of Teleport Pro or
similar software, but I would defeat such checks by implementing a
targeted web crawler with Perl's LWP::UserAgent.
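
To show how little the client-presented identifiers can be trusted,
here is roughly what such a crawler does, written with Python's
standard urllib instead of LWP::UserAgent (the URL and cookie value are
obviously placeholders):

  import urllib.request

  req = urllib.request.Request(
      "http://www.example.com/book/details",
      headers={
          "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1)",  # spoofed
          "Referer": "http://www.example.com/book",                  # spoofed
          "Cookie": "SESSIONID=replayed-session-id",                 # reused
      },
  )
  page = urllib.request.urlopen(req).read()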

If you want to stop even a determined (malicious?) user from
retrieving the content, then you will want to impose resource limits,
as suggested elsewhere in the thread. The problem is, you can only tie
those limits to the IP address (all other browser-presented information
is spoofable), and some IP addresses (dynamic ranges from ISPs) have
only one "client" behind them, while others (ISPs' transparent proxies,
company proxies) may have many. So either you monitor usage, investigate
deviations and tailor the limits for those IP addresses that are
legitimately more resource-intensive, or you risk blocking legitimate
users from accessing the content in the second situation (i.e. proxies
used by a large number of users).
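
A throwaway sketch of such a per-IP limit with tailored thresholds for
known shared proxies (all numbers and addresses are invented):

  import time
  from collections import defaultdict

  DEFAULT_LIMIT = 100                    # requests per window for a normal client IP
  WINDOW_SECONDS = 600
  PROXY_LIMITS = {"192.0.2.10": 2000}    # e.g. a known corporate proxy

  _hits = defaultdict(list)              # ip -> list of request timestamps

  def over_limit(ip):
      """Sliding-window count of requests from one IP, with a higher
      threshold for IPs known to front many users."""
      now = time.time()
      _hits[ip] = [t for t in _hits[ip] if now - t < WINDOW_SECONDS]
      _hits[ip].append(now)
      return len(_hits[ip]) > PROXY_LIMITS.get(ip, DEFAULT_LIMIT)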

My 2c.

Javier



