Interesting People mailing list archives

HOW BIG IS THE INTERNET?

From: David Farber <farber () central cis upenn edu>
Date: Fri, 19 Aug 1994 14:55:47 -0400
From: aa () wired com (Andrew Anker)


Wired has posted to its Web site (http://www.wired.com/) the following
article, written by Donna L. Hoffman and Thomas P. Novak, Associate
Professors of Management at the Owen Graduate School of Management at
Vanderbilt University.


For the hyperlinked version, please check out our site.



---


Copyright (c) 1994 by Donna L. Hoffman and Thomas P. Novak


August 18, 1994


HOW BIG IS THE INTERNET?


Peter Lewis of the New York Times caused a stir on the Internet with his
August 10, front-page, article "Doubts are Raised on Actual Number of
Internet's Users." Lewis cast doubt upon the commonly cited number of 20 to
30 million Internet users, quoting John Quarterman as saying "Suppose there
were really only two million or three million." A deflation of market size
by a magnitude of ten is certainly cause for alarm. But are there grounds
to sound the alarm?


Quarterman's lower estimate is explained in a June 1994 FAQ in which he
makes the following points:


* the "best figures there are" are from *his* company's
January 1994 Internet Demographic Survey, rather than competitor Mark
Lottor's Internet Domain Survey,


* reachable hosts as determined from a survey should be
used as the baseline count of Internet hosts, rather than Lottor's estimate
of reachable hosts calculated from a sample of hosts in the Domain Name
System,


* the "real factor for users per Internet host" is about 3.5,
rather than 7.5 or even 10 users/host as is assumed by other researchers.


Let's look at each point in turn.




1) The best figures there are.


Quarterman's survey was sent to postmasters of nearly 5000 Internet
domains. Now, no offense to our local postmaster, but since he doesn't
respond to our emails, we can't imagine him taking the time to respond to
Quarterman's survey! Thus, we suspect that Vanderbilt University may not be
represented in the Internet Demographic Survey. Indeed, Quarterman's FAQ
notes that only 13% of received responses were useable. This is not very
encouraging, indicating that our postmaster would be in good company if he
did not respond. For such a high involvement product category, this
response rate is *way* too low, and introduces bias of unknown magnitude
and direction. The results are simply not projectable.


But, assuming for the moment that our postmaster *did* muster up the effort
to respond, we are concerned how he would have reacted to the survey.
Unfortunately, Quarterman's survey violates just about every rule of survey
design! For example, a basic rule of survey research is "don't ask people
questions they cannot answer." The Vanderbilt postmaster is a terrific guy
(even though he doesn't answer our emails), but we think he would have a
tough time with:


* total people in your organization: _________


* network users who send mail outside your domain: _________


* computers reachable with ICMP ECHO (ping) from the Internet: ________


* percentages of your users in the following age categories: (list of eight
age categories)


There is an expression for the requests above -- GIGO or "Garbage In,
Garbage Out." Frankly, we can't imagine postmasters, let alone anyone else,
answering these questions with anything better than wild guesses (unless of
course they've done their own surveys -- which, as far as we know, they
haven't).


Face it, these are *tough* questions that require serious legwork to
answer. We really must question the quality of the data received, and we
are certainly not convinced that in comparison to Lottor's estimates,
Quarterman's are the "best figures there are."




2) Reachable hosts.


Quarterman insists that Lottor's raw host numbers are too high because "a
lot of hosts on networks...are deliberately firewalled so you can't get
there from the Internet proper." Thus, only reachable (i.e., "pingable")
hosts should be used. Sound reasonable? Let's think about it.


A colleague across the hall has a Mac on the Internet, but he is not
pingable. A co-author of ours at the University of Pittsburgh who connects
to the Internet and uses Mosaic from his 486 machine is also not pingable.
Vanderbilt University has 100 Apple Remote Access users who are not
pingable, although they are using Mosaic and other Internet services from
home. The Owen Graduate School of Management has 400 full-time MBA
students, who are not pingable when using the Mac and Pentium machines in
our computer lab to access the Internet. Our guess is that there are a
whole lot of machines on the Internet which are not pingable - but which
are also not behind firewalls or serving as routers. If this is the case,
using the reachable hosts methodology will grossly underestimate the number
of people on the Internet.


The problem is that a focus on "reachable hosts" is biased toward server
applications rather than client applications. Surveys of Internet usage
need to focus on the end user. Unfortunately, neither Lottor's nor
Quarterman's survey focuses upon the end user *customer*.




3) Users per host.


Quarterman's FAQ claims the real factor for users per host is about 3.5.
This is apparently based upon the numbers from his Internet Demographic
Survey. As you might guess, we have some problem believing the numbers from
that survey, since it includes open-ended, excruciatingly-detailed
questions addressed to overburdened postmasters. Lottor, in reporting the
results of his January 1993 Internet Domain Survey, says that some people
have suggested 10 per host. Quarterman throws around other suggestions of 5
and 7.5. What's the "real number?" Face it - no one knows!


At this point, the amazing thing about the size of the Internet in our
minds is that *no one* really has a very good idea how large it is!
Approaches such as those taken by Lottor and Quarterman attempt to derive
the number of users by making two assumptions: 1) the number of hosts
("reachable" or not) and 2) the number of users per host.


Improvement in measurement methodology is needed to nail down both of these
numbers. For the number of hosts, we need a better definition of a valid
host than "pingable." For the number of users per host, we really need to
obtain *distributions* of users per host for various host segments (like
.edu and .com, for starters). Quite likely, the mean will be a very poor
measure of central tendency when hosts such as aol.com -- with a million
users having limited Internet access -- are lumped together with our
single-user workstations.


Current approaches to estimating the size of the Internet are akin to
estimating the number of people in the United States by sampling the number
of buildings, without regard to their function or contents. There is
another way to go  measure the usage of the Internet. A way that is
market-driven and customer-oriented. Rather than inferring the number of
users by counting and sampling machines, sample the *users*.


This opens up the question, "what is a user?" Our anecdotal evidence
suggests that users go through a progression of adoption stages, starting
with email, moving on to Usenet news groups and other text-based Internet
services, and graduating to hypermedia applications such as Mosaic. All of
these types of usage need to be tracked.


The Internet has evolved dramatically in size and economic importance.  It
is high time for the first Internet Users Sample Survey.  This survey
should include the larger group of individuals with any kind of network
access.  Note that we're not talking about a proprietary survey where
information is sold to those firms willing and able to pay, but a
large-scale global sample survey of the current market size of individuals
with network access. Such a survey should be conducted on a regular (at a
minimum, annual) basis. This information is critical for the development of
electronic commerce. It is foolhardy to base strategic business decisions
upon the numbers currently available.


Thus, Lewis' article is, indeed, cause for alarm. Not because there are
"only" two or three million users of the Internet, but because it is clear
that we don't really have a clue *how many* users there really are.



---


Donna L. Hoffman and Thomas P. Novak are Associate Professors of Management
at the Owen Graduate School of Management at Vanderbilt University, where
they research the marketing implications of commercializing the Internet.



---


-aa


Andrew Anker
Vice President and Chief Technology Officer, Wired Ventures Ltd
aa () wired com                            415/222-6333 (v)
http://www.wired.com/Staff/aa/          415/222-6369 (f)
Current thread:

HOW BIG IS THE INTERNET? David Farber (Aug 19)
- <Possible follow-ups>
- Re: HOW BIG IS THE INTERNET? David Farber (Aug 19)