WebApp Sec mailing list archives

Re: Combatting automated download of dynamic websites?


From: Javier Fernandez-Sanguino <jfernandez () germinus com>
Date: Tue, 30 Aug 2005 13:24:53 +0200

Matthijs R. Koot wrote:

Thanks for your reply zeno! But actually, referer-based anti-leeching
won't do it for me, and mod_throttle isn't suitable for Apache 2. I need
a throttling function based on something more advanced, like a
'request history stack' that checks the order in which pages were
requested, probably within a certain time period, et cetera. Maybe it'd
be better to move such security measures into the web application
itself, but I'm still hoping someone knows of a service-based solution
(i.e. like the aforementioned Apache module).

Several web-oriented proxy firewalls implement a "request history stack" like the one you mention, to prevent a client IP address from going directly to a given resource without following the "flow" established by the webapp programmer.
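
Just to illustrate the idea (the page names are made up and a real proxy firewall tracks far more than this), such a check boils down to remembering the last page each client fetched and only serving a protected page right after one of its expected predecessors:

# Rough sketch of a per-client "request history stack".
# Page names are illustrative only.
ALLOWED_PREDECESSORS = {
    "/book/details":  {"/book"},
    "/book/contents": {"/book/details"},
}

last_page = {}  # client IP -> last page requested

def request_allowed(client_ip, page):
    previous = last_page.get(client_ip)
    required = ALLOWED_PREDECESSORS.get(page)
    # Pages with no defined flow are always reachable.
    allowed = required is None or previous in required
    if allowed:
        last_page[client_ip] = page
    return allowed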

You could implement this yourself by way of session handling: tie session identifiers to the client (through IP or User-Agent) and then check, at the application level, whether the session is being used as you would normally expect. Don't rely on referer information; instead, tie the session to some kind of finite state machine that tells you whether the user went through your defined procedure. In your Amazon example: the user must first look at the book, then at the book details, and only then is allowed to browse its contents.
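
A minimal sketch of that state machine, with made-up page paths and state names and no particular framework in mind:

# Tie each session to a finite state machine encoding the defined
# procedure: book -> book details -> contents.
# page requested -> (states it may be requested from, state afterwards)
TRANSITIONS = {
    "/book":          ({"start", "viewed_book", "viewed_details", "browsing"}, "viewed_book"),
    "/book/details":  ({"viewed_book"},               "viewed_details"),
    "/book/contents": ({"viewed_details", "browsing"}, "browsing"),
}

sessions = {}  # session id -> {"state": ..., "ip": ..., "agent": ...}

def check_request(session_id, client_ip, user_agent, page):
    sess = sessions.setdefault(
        session_id, {"state": "start", "ip": client_ip, "agent": user_agent})
    # Stick the session to the client: same IP and User-Agent only.
    if sess["ip"] != client_ip or sess["agent"] != user_agent:
        return False
    if page not in TRANSITIONS:
        return True   # pages outside the protected flow are not restricted
    allowed_from, next_state = TRANSITIONS[page]
    if sess["state"] not in allowed_from:
        return False  # the user skipped a step in the defined procedure
    sess["state"] = next_state
    return True

The "/book" entry accepts any state so the user can go back and start the flow again with another book; a real application would also expire idle sessions.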

Of course, a user can reuse his session ID and spoof the identifiers (User-Agent) in an alternative download tool and still retrieve the content in the end, but it might raise the bar somewhat. I'm not aware of the capabilities of Teleport Pro or similar software, but I would defeat those checks by implementing a targeted web crawler with Perl's LWP::UserAgent.
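
The spoofing itself is trivial; sketched here with Python's standard urllib rather than LWP::UserAgent, with a made-up host, cookie name and header values:

# Walk the expected flow with a reused session cookie and a
# browser-looking User-Agent so the server-side checks are satisfied.
import urllib.request

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 5.1)",  # pretend to be a browser
    "Cookie": "SESSIONID=abcdef0123456789",        # replay a valid session
}

def fetch(url):
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Request the pages in the order the state machine expects.
for page in ("/book", "/book/details", "/book/contents"):
    fetch("http://www.example.com" + page)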

If you want to stop even a determined (malicious?) user from retrieving the content, then you will want to impose resource limits, as suggested elsewhere in the thread. The problem is that you can only tie those limits to the IP address (all other browser-presented information is spoofable), and some IP addresses (dynamic ranges from ISPs) have only one "client" behind them while others (ISPs' transparent proxies, company proxies) might have many. So you either monitor usage, investigate deviations and raise the limits for those IP addresses that are legitimately more resource-intensive, or you risk blocking legitimate users from accessing the content in the latter situation (i.e. proxies used by a large number of users).
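
A crude sketch of such a per-IP limit, with made-up thresholds, window and proxy addresses, where known proxies get a higher allowance:

# Per-IP request limits over a sliding window; addresses known to front
# many users (e.g. a customer's proxy) get a higher limit.
import time
from collections import defaultdict, deque

WINDOW = 60          # seconds
DEFAULT_LIMIT = 30   # requests per window for an ordinary client address
PROXY_LIMIT = 300    # higher limit for known multi-user proxies
KNOWN_PROXIES = {"203.0.113.10", "203.0.113.11"}

recent = defaultdict(deque)  # client IP -> timestamps of recent requests

def over_limit(client_ip):
    now = time.time()
    times = recent[client_ip]
    while times and now - times[0] > WINDOW:
        times.popleft()
    times.append(now)
    limit = PROXY_LIMIT if client_ip in KNOWN_PROXIES else DEFAULT_LIMIT
    return len(times) > limit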

My 2c.

Javier

