PaulDotCom mailing list archives

Looking for a good web spider


From: Daniel Holiday <dehaul () gmail com>
Date: Mon, 27 Sep 2010 15:24:34 -0600

I once wrote a multi-threaded spider in C++ using libcurl.
Unfortunately I wrote the code in service of my present employer and
don't own it. :)

It was very fast: a single dual-core server could pull down at least
50 MB/s or so. We completely saturated the pipe of the small ISP we
were running the spider from, and we left the spider on for an entire
weekend, costing them a bunch of money in bandwidth overages.

It was awesome.

We had to add some bandwidth limiting code the following week.
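
For anyone rolling their own, here is a rough sketch of what that kind of
limiting could look like. This is not the code from that spider (I can't
share it); it is a generic token bucket shared across worker threads, and
the TokenBucket class, its method names, and its parameters are all made up
for illustration. It uses only the standard library. libcurl also has a
per-transfer cap, CURLOPT_MAX_RECV_SPEED_LARGE, which may be enough if you
don't need one global budget.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical shared throttle: every spider thread calls consume() with
// the number of bytes it just received. Once the budget is overdrawn, the
// caller sleeps long enough for the debt to be paid back.
class TokenBucket {
public:
    using Clock = std::chrono::steady_clock;

    TokenBucket(double bytes_per_sec, double burst_bytes)
        : rate_(bytes_per_sec), capacity_(burst_bytes),
          tokens_(burst_bytes), last_(Clock::now()) {
        assert(bytes_per_sec > 0.0);
    }

    // Charges `bytes` against the budget; blocks if the budget goes negative.
    void consume(std::size_t bytes) {
        std::chrono::duration<double> wait(0.0);
        {
            std::lock_guard<std::mutex> lock(mu_);
            refill();
            tokens_ -= static_cast<double>(bytes);
            if (tokens_ < 0.0) {
                // Time needed for the refill rate to cover the debt.
                wait = std::chrono::duration<double>(-tokens_ / rate_);
            }
        }
        // Sleep outside the lock so other threads are not held up by it.
        if (wait.count() > 0.0) {
            std::this_thread::sleep_for(wait);
        }
    }

private:
    // Adds tokens for the time elapsed since the last call, up to capacity.
    void refill() {
        auto now = Clock::now();
        double elapsed = std::chrono::duration<double>(now - last_).count();
        last_ = now;
        tokens_ = std::min(capacity_, tokens_ + elapsed * rate_);
    }

    std::mutex mu_;
    double rate_;      // bytes added per second
    double capacity_;  // maximum burst size in bytes
    double tokens_;    // current budget; negative means debt
    Clock::time_point last_;
};
```

One way to wire this in would be to call consume(size * nmemb) from the
function you register with CURLOPT_WRITEFUNCTION, so that every thread
draws from the same budget. Sleeping in the callback stalls that transfer,
which is the point, but pick a burst size large enough that short pauses
don't trip server-side timeouts.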

If you roll your own and need extreme performance, libcurl will serve you well.

dehaul
_______________________________________________
Pauldotcom mailing list
Pauldotcom () mail pauldotcom com
http://mail.pauldotcom.com/cgi-bin/mailman/listinfo/pauldotcom
Main Web Site: http://pauldotcom.com

