nanog mailing list archives

RE: HE.net, Fremont-2 outage?


From: Alex Rubenstein <alex () corp nac net>
Date: Wed, 4 Nov 2009 14:06:03 -0500

Yup.  Related: "100% availability" is a marketing person's dream; it
sounds good in theory but is unattainable in practice, and is a
reliable sign of non-100%-reliability.

You are confusing two different things.

Availability != Reliability.

For instance, an airplane is designed to be 100% reliable, but much less available. To keep a 747 from crashing 
(100% reliability), it needs significant downtime (not 100% available).



And even for those who follow best practices...  You can inspect and
maintain things until you're blue in the face.  One day a contractor
will drop a wrench into a PDU or UPS or whatever and spectacular things
will happen.  

That's where policies, procedures and methods come in (read: SAS70).


Or a battery develops a strange fault.

Get more than one string and more than one UPS, with monitoring. Batteries are NOT the Achilles heel everyone wants to 
make you believe they are.
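
As a rough illustration of why a second string or UPS helps, here is a minimal sketch of the usual parallel-redundancy arithmetic. It assumes the units fail independently and that any single unit can carry the load; a shared fault would break that assumption. The function name and the 99% figure are made up for the example, not measured values.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` redundant units, any one of which can carry the load.

    Assumes failures are independent (hypothetical simplification).
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    # The system is down only when every unit is down at the same time.
    return 1.0 - (1.0 - unit_availability) ** units


if __name__ == "__main__":
    # Illustrative only: one, two and three units at 99% each.
    for n in (1, 2, 3):
        print(n, parallel_availability(0.99, n))
```

Under those assumptions, two units at 99% each are both down only about 0.01% of the time. Monitoring matters because the figure only holds if a failed unit is noticed and repaired before the other one fails.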




"Question everything, assume nothing, discuss all, and resolve quickly."

-- Alex Rubenstein, AR97, K2AHR, alex () nac net, latency, Al Reuben --
--    Net Access Corporation, 800-NET-ME-36, http://www.nac.net   --

