funsec mailing list archives

Re: covert crawlers: I wonder how "nobody" came up with


From: "Dr. Neal Krawetz" <hf () hackerfactor com>
Date: Sun, 15 Jan 2006 08:25:10 -0700 (MST)

On Sun Jan 15 01:09:39 2006, Gadi Evron wrote:
Subject: [funsec] covert crawlers: I wonder how "nobody" came up with
this before?

http://www.wired.com/news/technology/0,70016-0.html?tw=rss.index

Actually, people have come up with it before.  :-)

Here are some readings on human vs. bot web crawling behavior:
  January 2005:  http://grid.ucy.ac.cy/Papers/comcomCrawl05.pdf
  August 2005:   http://www.myrtlebeachonline.com/mld/myrtlebeachonline/business/technology/12403171.htm
(I think there was a paper back in 2002-2003 on simulating human behavior
with a web crawler, but I couldn't find it in 3 minutes.  Does anyone
know the reference?)

I remember back in 1998, a few people began adding random delays and
depth-first queueing to their web bots.  This prevented automatic blocking
by bot detectors.
(My own web bot only supports random delays -- added around 1998.  But
it still uses breadth-first.)
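
To make the random-delay and queueing idea concrete, here is a minimal
sketch. It is not the code of any particular bot. The name crawl_order and
the in-memory "links" dictionary are hypothetical stand-ins for fetching
and parsing real pages, and the 1-5 second delay range is an assumption.

```python
import random
from collections import deque

def crawl_order(links, start, depth_first=False,
                min_delay=1.0, max_delay=5.0, rng=None):
    """Return a list of (url, delay) pairs in visit order.

    links is a hypothetical dict mapping a URL to the URLs it links to.
    A real bot would sleep for `delay` seconds before each fetch; the
    delays are only returned here so the sketch runs instantly.
    """
    rng = rng or random.Random()
    frontier = deque([start])
    seen = {start}
    plan = []
    while frontier:
        # Depth-first takes the newest entry, breadth-first the oldest.
        url = frontier.pop() if depth_first else frontier.popleft()
        # A random pause avoids the fixed request rhythm of a naive bot.
        plan.append((url, rng.uniform(min_delay, max_delay)))
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return plan

site = {"/": ["/a", "/b"], "/a": ["/a1"], "/b": ["/b1"]}
bfs = [url for url, _ in crawl_order(site, "/")]
dfs = [url for url, _ in crawl_order(site, "/", depth_first=True)]
```

The only difference between the two strategies is which end of the queue
the next URL comes from, so switching between them is a one-line change.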

Most web crawlers today support impersonation: pretending to be IE,
Netscape, etc.
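
As a rough illustration of impersonation, the sketch below sets the
User-Agent header on a request using Python's standard urllib. The
function name build_request and the two browser strings are illustrative
assumptions, not taken from any specific crawler. No request is sent.

```python
import random
import urllib.request

# Illustrative browser identification strings (assumed examples).
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",  # IE-style
    "Mozilla/4.79 [en] (Windows NT 5.0; U)",               # Netscape-style
]

def build_request(url, rng=None):
    """Build (but do not send) a request that claims to be a browser."""
    rng = rng or random.Random()
    agent = rng.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": agent})

req = build_request("http://example.com/")
# urllib stores header names capitalized as "User-agent".
claimed_agent = req.get_header("User-agent")
```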

And a few bots supported multiple proxies (multiple IP addresses) to prevent
single IP address blocks.  (Imagine tying TOR into a bot... hmmm...)

But the folks at SPI Dynamics seem to have taken the concept to the extreme.
(Defined personalities per thread... Good for them!)

                                        -Neal
--
Neal Krawetz, Ph.D.
Hacker Factor Solutions
http://www.hackerfactor.com/

_______________________________________________
Fun and Misc security discussion for OT posts.
https://linuxbox.org/cgi-bin/mailman/listinfo/funsec
Note: funsec is a public and open mailing list.

