nanog mailing list archives

Re: Quick question.


From: "Alexei Roudnev" <alex () relcom net>
Date: Tue, 3 Aug 2004 22:54:39 -0700


No need.

Remove disk. Insert disk into spare. Start spare server. Allow techs to analyze
broken server next day.

1 minute. But in reality, 2-CPU servers are redundant against most CPU failures
(had a few cases). Anyway, CPU failure is not a major reason for server
failures (and never was).





On Sun, Aug 01, 2004 at 09:44:13AM -0700, Michel Py wrote:
In other words, I don't really care if the second processor reduces the
MTBF from 200k hours to 60k hours, but I do care if the second processor
reduces the time to restore service from 24 hours to 20 minutes (7.5
minutes for SNMP to fail the query twice, 1.5 minutes for the tech to
find out that either it's frozen or there's a BSOD, 6 minutes to have
someone go there and reset, 5 minutes to reboot).

With the right form factor (nice easy-to-open rackmount unit) it will take
just as little time to swap in an on-site cold-spare. That way you get the
nice MTBF and the short restore time. Also, if you have multiple similar
machines, you drastically reduce your spares inventory.
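
The trade-off discussed above can be put into numbers with a rough sketch. It assumes the usual steady-state approximation, availability = MTBF / (MTBF + time to restore), which nobody in the thread states explicitly. The MTBF and restore figures are the ones quoted above; the function and variable names are made up for illustration, and applying the 20-minute restore time to the cold-spare case is an assumption based on the claim that the swap takes "just as little time".

```python
def availability(mtbf_hours, restore_hours):
    """Steady-state availability, assuming MTBF / (MTBF + restore time)."""
    return mtbf_hours / (mtbf_hours + restore_hours)


def downtime_minutes_per_year(avail):
    """Expected downtime per year (8760 hours) for a given availability."""
    return (1.0 - avail) * 8760 * 60


# Restore-time breakdown quoted in the thread, in minutes:
# SNMP failing twice + tech diagnosis + walking over to reset + reboot.
restore_minutes = 7.5 + 1.5 + 6 + 5

# Single CPU: 200k hours MTBF, 24 hours to restore service.
single_slow = availability(200_000, 24)

# Dual CPU: 60k hours MTBF, restore in the 20 minutes computed above.
dual_fast = availability(60_000, restore_minutes / 60)

# Single CPU with an on-site cold spare: 200k hours MTBF and
# (by assumption) the same 20-minute restore time.
cold_spare = availability(200_000, restore_minutes / 60)

for label, a in [
    ("single CPU, 24 h restore", single_slow),
    ("dual CPU, 20 min restore", dual_fast),
    ("cold spare, 20 min restore", cold_spare),
]:
    print(f"{label}: availability {a:.7f}, "
          f"about {downtime_minutes_per_year(a):.1f} min downtime per year")
```

Under these assumptions the short restore time dominates the lower MTBF, and the cold spare does better on both counts. The model ignores everything else that can fail, so treat it only as a way to compare the figures quoted here.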

Insignificant in my experience, and does not balance what Alexei
mentioned yesterday: a duallie will keep the system up when a faulty
process hogs 100% CPU, because the second one is still available. That
also increases availability ratio.

These days you can achieve the same using hyper-threading for example,
and keep the long MTBF :)

-- 
Colm MacCárthaigh                        Public Key: colm+pgp () stdlib net

