nanog mailing list archives

Re: Limits of reliability or is 99.999999999% realistic


From: Robert Cooper <rcbc () ibnets com> (by way of Robert Cooper <rcbc () ibnets com>)
Date: Mon, 27 Nov 2000 14:48:42 -0500


At 08:24 PM 11/25/00 -0800, Sean Donelan <sean () donelan com> wrote:

But back to my question.  What is the real requirement?  Amazon.COM had
system problems on Friday, and their site was unusable for 30 minutes,
definitely not 99.999%.  But what did that really mean?  The FAA loses
its radar for several hours in various parts of the country.  What did
that really mean?  Essentially every system given as an example of "high-
availability, high-reliability" I've looked at, doesn't hold up under
close examination.

Is 99.999% just F.U.D. created by consultants?

Instead of pretending we can build systems which will never fail, should
we work on a realistic understanding of what can be delivered?

For some actual *data* on reliability in the US phone system take a look at: 

"Sources of Failure in the Public Switched Telephone Network" 
D. Richard Kuhn
IEEE Computer Magazine, April 1997, Vol. 30, No. 4, pp. 31-36.

OK, it's not the raw data but a summary of it. It's culled from reports that the phone companies have to file with the feds 
for every outage that affects more than 30,000 subscribers -- about the number supported by a central office. Measured 
in user-outage-minutes (duration of outage * number of subscribers affected), he shows availability in the range of 
99.999% for data from 1992 and 1993. 
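
To make the metric concrete, here is a rough Python sketch of how an availability figure falls out of user-outage-minutes. The function names, the sample outages and the 100 million subscriber base are my own placeholders for illustration, not numbers from Kuhn's paper:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def user_outage_minutes(outages):
    """Sum duration * subscribers over (duration_minutes, subscribers) pairs."""
    return sum(duration * subscribers for duration, subscribers in outages)


def availability(outages, total_subscribers, period_minutes=MINUTES_PER_YEAR):
    """Fraction of all possible subscriber-minutes that were not lost."""
    lost = user_outage_minutes(outages)
    possible = total_subscribers * period_minutes
    return 1.0 - lost / possible


# Hypothetical reportable outages: (duration in minutes, subscribers affected).
example_outages = [(120, 40_000), (45, 75_000)]

# Hypothetical subscriber base, assumed only for the sake of the example.
example_availability = availability(example_outages, total_subscribers=100_000_000)
print(f"{example_availability:.7%}")
```

Note that the result depends heavily on the denominator: the same outages look far worse against a small subscriber base than against a national one.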

More interesting is the summary of the sources of outages and how much they contribute to the total picture. For 
instance, if you were one of the average of 250,000 customers who lost service for an average of 500 minutes due to 
vandalism, you might not think 99.999% means very much. 
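
A quick back-of-the-envelope check of that point, as a Python sketch. It assumes the 500 minutes are measured against a single year for one affected customer, which is my framing rather than anything stated in the paper:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

customers = 250_000      # average customers affected, from the summary above
outage_minutes = 500     # average outage duration, from the summary above

# Contribution of one such event to the aggregate metric.
total_user_outage_minutes = customers * outage_minutes

# What one affected customer sees, assuming a one-year measurement window.
per_customer_availability = 1 - outage_minutes / MINUTES_PER_YEAR

# Downtime allowed per year at 99.999%.
five_nines_budget = MINUTES_PER_YEAR * (1 - 0.99999)

print(total_user_outage_minutes)            # user-outage-minutes for the event
print(f"{per_customer_availability:.3%}")   # roughly three nines, not five
print(f"{five_nines_budget:.2f} minutes")   # a bit over five minutes a year
```

So a single 500-minute outage uses up something like a century's worth of a five-nines downtime budget for the customers who were hit, even though the network-wide average barely moves.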

On the other hand, the data is somewhat conservative, since it counts the total number of subscribers affected, not the 
number of active or would-be active users during the outage. The data does include overloads, which in the PSTN manifest 
themselves through call admission control (e.g., a network busy signal). 

What data exists for the Internet?

[ Standard disclaimer ] 

Robert Cooper
Ironbridge Networks
55 Hayden Ave, Lexington MA 02421
www.ironbridgenetworks.com





