Interesting People mailing list archives

IP: Size of the web, and attempts to filter it


From: Dave Farber <farber () cis upenn edu>
Date: Wed, 26 Jan 2000 09:17:24 -0500



Date: Wed, 26 Jan 2000 08:39:36 -0500
From: Jamie McCarthy <jamie () mccarthy org>
Subject: Size of the web, and attempts to filter it
To: farber () cis upenn edu
X-Mailer: Mailsmith 1.1.4 (Bluto)

Hi Dave,

The Censorware Project today released a "dynamic essay" on the
size of the world-wide web.  Accurate reports on its size are few
and far between.  But using what we do know and applying a bit of
extrapolation, our Michael Sims has set up a webpage that gives a
daily estimate -- and uses it to put into context the concept of
"filtering the web."

http://censorware.org/web_size/

Here are some excerpts...

   So, as of today (these figures are dynamically generated on a
   daily basis), the web has roughly:

         1,570,000,000 pages;
    29,400,000,000,000 bytes of text;
           353,000,000 images; and
     5,880,000,000,000 bytes of image data.

   In just the last 24 hours, the web has added:

             3,180,000 new pages;
        59,700,000,000 new bytes of text;
               716,000 new images; and
        11,900,000,000 new bytes of image data.

   And of course, any web page can be changed or removed at any time.
   Changes may be minor, major, or total. According to Alexa, which
   is striving valiantly to create archive snapshots of major
   portions of the web, the average lifespan of a webpage is about 44
   days, which means that in the last 24 hours, about:

            35,600,000 pages changed; and
             8,020,000 images changed.
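
The "changed" figures above appear to follow from dividing the totals by the 44-day average lifespan. A minimal Python sketch of that arithmetic, assuming changes are spread evenly across the lifespan and that the same 44 days applies to images (the function and constant names are illustrative, not from the essay):

```python
# Totals quoted in the excerpt above.
TOTAL_PAGES = 1_570_000_000
TOTAL_IMAGES = 353_000_000
AVG_LIFESPAN_DAYS = 44  # Alexa's average webpage lifespan, per the excerpt


def changed_per_day(total: int, lifespan_days: int = AVG_LIFESPAN_DAYS) -> float:
    """Assumption: on any given day, 1/lifespan of all items change."""
    return total / lifespan_days


pages_changed = changed_per_day(TOTAL_PAGES)
images_changed = changed_per_day(TOTAL_IMAGES)
print(f"{pages_changed:,.0f} pages changed; {images_changed:,.0f} images changed")
```

Both results land within rounding distance of the 35,600,000 and 8,020,000 quoted above; the essay's own daily inputs are presumably slightly more precise than the rounded totals used here.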

   ...

   Okay, so we've established the target that censorware companies
   have to shoot at. Millions of pages being created and changing
   every single day. In fact, to keep up with the changes, you'd need
   to download about 873,000,000,000 bytes of information per day,
   which would mean you'd need a connection capable of downloading
   10,100,000 bytes per second.
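
The excerpt does not spell out how the daily download figure is built. One reading that reproduces it, sketched here in Python as an assumption rather than as the essay's stated method: take 1/44 of all existing text and image bytes as "changed," add the new bytes from the last 24 hours, and divide by the seconds in a day.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted earlier in the excerpt.
TEXT_BYTES = 29_400_000_000_000
IMAGE_BYTES = 5_880_000_000_000
NEW_TEXT_BYTES = 59_700_000_000
NEW_IMAGE_BYTES = 11_900_000_000
AVG_LIFESPAN_DAYS = 44

# Assumption: changed bytes per day = existing bytes / average lifespan.
changed_bytes = (TEXT_BYTES + IMAGE_BYTES) / AVG_LIFESPAN_DAYS
daily_bytes = changed_bytes + NEW_TEXT_BYTES + NEW_IMAGE_BYTES
bytes_per_second = daily_bytes / SECONDS_PER_DAY

print(f"{daily_bytes:,.0f} bytes/day, {bytes_per_second:,.0f} bytes/second")
```

Under these assumptions the daily total comes out near 873 billion bytes and the sustained rate near 10.1 million bytes per second, matching the quoted numbers to three significant figures.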

   ...

   ...you'd need just 20,200 reviewers working every day to keep up.
   If your company kept five-day work weeks, you'd need 28,300
   reviewers working Monday through Friday, no vacations, no
   holidays, no coffee breaks.

   Even at a measly seven dollars per hour, this is going to cost
   your company hundreds of millions of dollars per year
   ($396,000,000 just for straight salaries, if you're keeping track),
   just for personnel costs. To compare, N2H2, the company behind
   Bess, had only $3.1 million in total revenues for 1998, just a
   little bit short of $396,000,000. The very concept is a joke -
   the censorware companies employ anywhere from zero (several
   companies) to 2 full-time (Logon Data/makers of X-Stop) to 15
   full-time and 58 part-time (N2H2/makers of Bess) website reviewers,
   nowhere near enough to review all the pages that changed on any
   given day, let alone the rest of the web. None of them employs
   even one one-thousandth of the number of workers required.
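
The staffing and salary figures above can be checked the same way. A minimal Python sketch, where the 2,000 paid hours per reviewer per year (8 hours a day, 5 days a week, 50 weeks) is an assumption chosen because it reproduces the $396,000,000 figure; the excerpt does not state the hours it used:

```python
REVIEWERS_7_DAY = 20_200        # reviewers needed working every day (from the excerpt)
HOURLY_WAGE = 7                 # dollars per hour (from the excerpt)
N2H2_REVENUE_1998 = 3_100_000   # dollars (from the excerpt)
HOURS_PER_YEAR = 2_000          # ASSUMPTION: 8 h/day * 5 days/week * 50 weeks

# Same weekly workload squeezed into five days needs 7/5 as many people.
reviewers_5_day = REVIEWERS_7_DAY * 7 / 5
reviewers_rounded = round(reviewers_5_day, -2)  # the excerpt rounds to the nearest hundred

annual_salary = reviewers_rounded * HOURLY_WAGE * HOURS_PER_YEAR
revenue_ratio = annual_salary / N2H2_REVENUE_1998

print(f"{reviewers_rounded:,.0f} reviewers, ${annual_salary:,.0f} per year")
print(f"about {revenue_ratio:.0f} times N2H2's 1998 revenue")
```

With those inputs the salary bill is more than a hundred times N2H2's total 1998 revenue, which is the gap the excerpt is pointing at.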

   So what do they do?
--
        Jamie McCarthy
        jamie () mccarthy org
 http://jamie.mccarthy.org/


