nanog mailing list archives
Re: HE.net, Fremont-2 outage?
From: "Robert Mathews (OSIA)" <mathews () hawaii edu>
Date: Thu, 05 Nov 2009 00:32:02 -0500
Alex Rubenstein wrote:
>> Yup. Related: "100% availability" is a marketing person's dream; it sounds good in theory but is unattainable in practice, and is a reliable sign of non-100%-reliability.
>
> You are confusing two different things. Availability != Reliability.
Pardon the interruption... The statement above draws a flagrant separation between the two terms without sufficient explanation. Note that in being available, a system meets one criterion for reliability. If one has the desire to delve into some of the nuanced operational perspective, see: http://ow.ly/zmQg (pdf) or http://ow.ly/zmTB (web friendly). The article is also available through the IEEE Portal at http://ow.ly/zn3a (in case either of the other links is unavailable).
> For instance, an airplane is designed to be 100% reliable, but much less available. To keep a 747 from not crashing (100% reliability) it needs significant downtime (not 100% available).
This explanation, aside from being unsatisfactory, is misleading. Operating times and maintenance times are very much separate quantities.
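To make the distinction concrete, here is a minimal sketch in Python. It is not taken from this thread or from the linked article; it assumes the common textbook definitions (reliability as the probability of surviving a mission under a constant failure rate, availability as uptime over total time), and the function names and example numbers are purely illustrative.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of completing a mission without failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    return math.exp(-mission_hours / mtbf_hours)


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability counting only failures and repairs."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def operational_availability(uptime_hours: float,
                             unscheduled_down_hours: float,
                             scheduled_down_hours: float) -> float:
    """Fraction of all elapsed time the system was actually usable.

    Scheduled maintenance is counted as downtime here, which is why it
    lowers this figure even though it does not enter the reliability model.
    """
    total = uptime_hours + unscheduled_down_hours + scheduled_down_hours
    return uptime_hours / total


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the quantities moving independently.
    print("reliability over a 10 h mission:", reliability(10, 100_000))
    print("inherent availability:", inherent_availability(100_000, 24))
    print("operational availability:", operational_availability(6_000, 10, 2_750))
```

In this sketch, operating time and maintenance time enter as separate inputs: adding scheduled maintenance hours changes only the operational availability figure, while the reliability figure depends only on the mission length and the failure rate.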
>> And even for those who follow best practices... You can inspect and maintain things until you're blue in the face. One day a contractor will drop a wrench into a PDU or UPS or whatever and spectacular things will happen.
>
> That's where policies, procedures and methods come in (read: SAS70)
For the operationally minded -- on one hand, there is an assumption here that 'accidents' are not preventable; on the other hand, there is at least an assumption being made here that SAS 70 is the curative for 'accidents.' To be brief, accounting for human behavior as an underlying contributor to accidents can be a backbreaking and immensely messy endeavor. In this respect, SAS 70 can only be assistive.

All the best,
Robert Mathews.