nanog mailing list archives

Re: Data Center testing


From: Warren Kumari <warren () kumari net>
Date: Wed, 26 Aug 2009 17:39:57 -0400


On Aug 24, 2009, at 9:38 AM, Dan Snyder wrote:

We have done power tests before and had no problem. I guess I am looking for someone who does testing of the network equipment outside of just power tests. We had an outage due to a configuration mistake that became apparent when a switch failed.

So, one of the better ways to make sure that your failover system is working when you need it is just to do away with the concept of a failover system and make your "failover" system part of your "primary" system.
This means that your failover system is always passing traffic and you know that it is alive and well -- it also helps mitigate the pain when a device fails (you are sharing the load over both systems and so only half as much traffic gets disrupted). Scheduled maintenance is also simpler and less stressful as you already know that your other path is alive and well.

Your design and use case dictate exactly how you implement this, but in general it involves things like tuning your IGP so you are using all your links, staggering VLANs if you rely on them, running multiple VRRP groups per subnet, etc.
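As a rough illustration of the "multiple VRRP groups per subnet" idea, the Python sketch below alternates which of two routers is master for each VRRP group (VRRP elects the router with the highest priority as master). With two groups on one subnet, and half the hosts pointed at each group's virtual gateway address, both routers end up carrying traffic. The router names, group numbers and the 110/100 priority values are made-up assumptions, not a recommendation for any particular vendor's config.

```python
# Hypothetical sketch: alternate the VRRP master between routers group by group,
# so every router is passing traffic in normal operation.
# Router names, group IDs and priority values are illustrative assumptions.

def staggered_vrrp_plan(groups, routers=("rtr-a", "rtr-b"), high=110, low=100):
    """Return {group_id: {router: priority}} with the master rotating per group."""
    plan = {}
    for index, group in enumerate(groups):
        master = routers[index % len(routers)]
        plan[group] = {r: (high if r == master else low) for r in routers}
    return plan


def masters(plan):
    """Map each group to the router with the highest priority (the VRRP master)."""
    return {group: max(prios, key=prios.get) for group, prios in plan.items()}


if __name__ == "__main__":
    # Two groups on the same subnet: half the hosts use each virtual gateway.
    plan = staggered_vrrp_plan([1, 2])
    for group, prios in plan.items():
        print(group, prios)
    print(masters(plan))
```

If one router dies, the other still holds the backup priority for the failed router's groups and takes them over, so "failover" is just the normal VRRP behaviour on a box that was already busy.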

This does require a tiny bit more planning during the design phase, and it also requires that you check every now and then to make sure that you are actually using both devices (and didn't, for example, shift traffic to one device and then forget to shift it back :-)). It also requires that you keep capacity in mind: in a primary/failover scenario you might be able to run devices fairly close to capacity, but if you are sharing the load you need to keep each device under 50% (so that when you *do* have a failure, the remaining device can handle the full load). It's important to make this clear to the finance folks before going down this path :-)
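A minimal sketch of the "check every now and then" and "keep things under 50%" advice, assuming you can pull current load and capacity per device from your monitoring system. The device names, the units and the 20% minimum-share threshold are hypothetical. It flags a device that is carrying almost no traffic, and any single failure that would leave the survivors with more load than they can carry.

```python
# Minimal sketch, assuming per-device load and capacity come from monitoring
# (same units for both, e.g. Gbit/s). Names and min_share are hypothetical.

def check_shared_load(loads, capacities, min_share=0.2):
    """Return warnings for a set of devices that are supposed to share the load."""
    warnings = []
    total = sum(loads.values())

    # Is every device actually passing traffic (not forgotten after maintenance)?
    for device, load in loads.items():
        if total > 0 and load / total < min_share:
            warnings.append(f"{device} carries only {load / total:.0%} of traffic")

    # If any one device fails, can the remaining devices absorb the whole load?
    for failed in loads:
        spare = sum(capacities[d] for d in loads if d != failed)
        if total > spare:
            warnings.append(f"losing {failed} leaves {total} of load on {spare} of capacity")

    return warnings


if __name__ == "__main__":
    caps = {"sw1": 10, "sw2": 10}
    print(check_shared_load({"sw1": 4, "sw2": 4}, caps))  # balanced, under 50% each
    print(check_shared_load({"sw1": 7, "sw2": 6}, caps))  # too hot to survive a failure
    print(check_shared_load({"sw1": 9, "sw2": 0}, caps))  # sw2 forgotten after maintenance
```

With two equal devices this reduces to the 50% rule: if each has capacity C, the total load must stay at or below C, so on average each box runs at no more than half its capacity.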

W

It didn't cause a problem, however, when we did a power test for the whole data center.

-Dan


On Mon, Aug 24, 2009 at 9:31 AM, Ken Gilmour <ken.gilmour () gmail com> wrote:

I know Peer1 in Vancouver regularly sends out notifications of
"non-impacting" generator load testing, something like monthly. Also, InterXion
in Dublin, Ireland has occasionally sent me notifications that there
was a power outage of less than a minute, but that their backup
successfully took the load.

I only remember one complete outage at Peer1 a few years ago... I've never
seen any outage at InterXion Dublin.

Also, I don't remember ever having a power failure at AiNet (Deepak will
probably elaborate).

2009/8/24 Dan Snyder <sliplever () gmail com>:
Does anyone know of any data centers that regularly do failure testing
of their networking equipment? I mean testing to verify that everything
still fails over properly after changes have been made over time. Are
there any best-practice guides for doing this?

Thanks,
Dan
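One cheap thing to run after every change, before the real test of pulling a box in a maintenance window, is a check of the topology you *think* you have for single points of failure. The sketch below assumes a hypothetical list of links exported from your documentation or LLDP data. It only catches topological gaps (such as a single-homed server), not configuration mistakes like the one Dan hit, which still need an actual failover test.

```python
# Hypothetical sketch: find devices whose failure would disconnect endpoints.
# The link list below is a made-up example topology.
from collections import deque


def build_adjacency(links):
    """Build an undirected adjacency map from (a, b) link pairs."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def reachable(adjacency, start, failed):
    """Breadth-first search from start, treating the failed node as removed."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour != failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(adjacency, endpoints):
    """Return the network devices whose loss would split the endpoints apart."""
    spofs = []
    for failed in adjacency:
        if failed in endpoints:
            continue  # losing an endpoint itself is not a network failover problem
        if not set(endpoints) <= reachable(adjacency, endpoints[0], failed):
            spofs.append(failed)
    return spofs


if __name__ == "__main__":
    links = [("srv1", "sw1"), ("srv1", "sw2"), ("srv2", "sw1"), ("srv2", "sw2"),
             ("srv3", "sw1")]  # srv3 is single-homed
    adj = build_adjacency(links)
    print(single_points_of_failure(adj, ["srv1", "srv2", "srv3"]))
```

Rerunning this against the link list after each change is quick; anything it reports is a place where a switch failure will hurt regardless of how well the rest of the failover config is done.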



--
"Does Emacs have the Buddha nature? Why not? It has bloody well everything else!"



