nanog mailing list archives
Re: Data Center testing
From: Jack Bates <jbates () brightok net>
Date: Wed, 26 Aug 2009 09:22:12 -0500
James Hess wrote:
> Config checking can't say much about silent hardware failures. Unanticipated problems are likely to arise in failover systems, especially complicated ones. A failover system that has not been periodically verified may not work as designed.
I've seen 3-4 failover failures in the last year alone on the SONET transport gear. In almost every case, the backup cards were already dead when the primary either died or induced errors that caused the telco to switch to the backup card. I have no doubt that they haven't been testing. While it didn't affect most of my network, I have a few customers that aren't multihomed, and it wiped them out in the middle of the day for up to 3 hours.
> There can be other types of errors: Possibly there is a damaged patch cable, dying port, failing power supply, or other hardware on the warm spare that has silently degraded and its poor condition won't be detected (until it actually tries to take a heavy workload, blows a fuse, eats a transceiver, and everything just falls apart).
Lots of weird things to test for. I remember once rebooting a c5500 that had been cruising along for 3 years, and the bootup diag flagged half a linecard as bad, even though it had been running decently up until the reload. Over the years, I think I've seen or detected everything you mentioned, either during routine testing or in production "oh crap" events.
Jack
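
As a rough illustration of the periodic-verification point raised in this message, here is a minimal sketch of tracking when each standby component was last exercised and flagging the ones that are overdue. Everything in it is an assumption made for illustration: the `Standby` record, the `overdue` helper, the 90-day interval, and the inventory names and dates are all hypothetical and do not come from the thread.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Standby:
    name: str          # hypothetical label for a backup card, port, or power supply
    last_tested: date  # date of the last successful failover exercise


def overdue(standbys, today, max_age=timedelta(days=90)):
    """Return the names of standby components not exercised within max_age."""
    return [s.name for s in standbys if today - s.last_tested > max_age]


# Hypothetical inventory; the names and dates are made up for the example.
inventory = [
    Standby("sonet-backup-card", date(2009, 1, 10)),
    Standby("core-sw-standby-psu", date(2009, 8, 1)),
]

print(overdue(inventory, today=date(2009, 8, 26)))  # ['sonet-backup-card']
```

A list like this only says which spares are due for a test. It cannot detect a dead backup card by itself; the failover exercise still has to be run.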