nanog mailing list archives

RE: availability and resiliency


From: "Roeland M.J. Meyer" <rmeyer () MHSC com>
Date: Fri, 29 Sep 2000 15:59:01 -0700


Hosts meeting three nines, or better, typically have redundant power
supplies and integrated UPS, bootable RAID for the OS, redundant NICs,
and SMP CPU configurations.

um...is an smp cpu configuration really going to help your uptime?  or
are there operating systems or hardware out there that can say to
themselves "hmph!  cpu 2 seems not to be working correctly...i'd
better spin it down."

That is a natural function these days. Fail-safe to the "off" state.
Most SMP OSes can recognise when one CPU in an SMP set goes down.

just for fun a few years back i decided to check if the sun e4000 we
had had hot-swappable cpus (i figured it didn't, but why not try it?)
and i pulled one of the boards.  it didn't like it too much.

None of what I claimed is required to be hot-swappable to get 99.9%.
Enough online hot-spares will keep the system up long enough so that you
can replace the entire box. 99.9% allows over 8 hours of outage per year.
You should be able to swap out a host in less than 0.5 hours.
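
For anyone who wants to check the arithmetic, here is a rough sketch of
the downtime budget. It assumes a 365-day year, and the function names
are made up for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_budget_hours(availability_pct: float) -> float:
    """Hours of outage per year permitted at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def swaps_within_budget(availability_pct: float, swap_hours: float) -> int:
    """How many whole-box swaps of swap_hours each fit in the yearly budget."""
    return int(downtime_budget_hours(availability_pct) // swap_hours)


if __name__ == "__main__":
    print(f"{downtime_budget_hours(99.9):.2f} hours/year")  # 8.76 hours/year
    print(swaps_within_budget(99.9, 0.5), "half-hour swaps")  # 17 half-hour swaps
```

So at three nines, a half-hour box swap uses up only a small fraction of
the yearly allowance.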


