nanog mailing list archives

Re: Facility wide DR/Continuity


From: Stefan <netfortius () gmail com>
Date: Wed, 3 Jun 2009 10:01:38 -0500

On Wed, Jun 3, 2009 at 7:09 AM, Drew Weaver <drew.weaver () thenap com> wrote:

Hi All,

I'm attempting to devise a method which will provide continuous operation
of certain resources in the event of a disaster at a single facility.

The types of resources that need to be available in the event of a disaster
are ecommerce applications and other business critical resources.

Some of the questions I keep running into are:

  Should the additional sites be connected to the primary site (and/or
  the Internet directly)?

  What is the best way to handle the routing? Obviously two devices cannot
  occupy the same IP address at the same time, so how do you provide that
  instant 'cut-over'? I could see using application balancers to do this
  but then what if the application balancers fail, etc?

Any advice from folks on list or off who have done similar work is greatly
appreciated.

Thanks,
-Drew




In an environment where a DR site is deemed critical, it is my experience
that critical business applications also have a test or development
environment associated with the production one. Looked at that way, a DR
site equipped with the test/devel systems, with one "instance" of production
always available, would only be challenging in terms of data sync. Various
SAN solutions would resolve that (SAN sync-ing over WAN/MAN/etc.).
Virtualization of critical systems may also add some benefits here: clone
the critical VMs at the DR site and, with the storage available there too,
you'll be able to bring those machines up in no time. Just make sure you
have some sort of L2 available between the sites, maybe EoS, or tunneling
over L3 connectivity. There is tons of info out there if you search for
virtual machine mobility and inter-site connectivity.

Voice has to be considered, too. For PSTN, make arrangements with your
provider to re-route (8xx) numbers in case of disaster. VoIP may add some
extra capabilities in terms of reachability over the Internet, in case your
DR site cannot accommodate everyone: C/S people, for example, who are
critical to interface with customers during a disaster (no information
means a bigger loss, through perception issues), have to be able to connect
even from home.

As far as an "immediate" switch from one site to the other goes, DNS is the
primary concern (unless some wise people have hardcoded IPs all over), but
there are other issues people tend to forget, at the core of some clients.
Take the Oracle "fat" client and its TNS names: I've seen those associated
with IPs instead of host names ... etc.
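
To make the TNS names point concrete, here is a rough Python sketch of how
one might audit a tnsnames.ora-style file for HOST entries that are literal
IPs rather than names. Everything in it is an assumption for illustration:
the function name hardcoded_hosts, the regex, and the sample entries
(ORDERS, BILLING, db1.example.com) are made up, and real files can be laid
out differently, so treat it as a starting point and not a complete parser.

```python
import ipaddress
import re

# Matches the "(HOST = value)" part of a tnsnames.ora-style ADDRESS entry.
HOST_RE = re.compile(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", re.IGNORECASE)


def hardcoded_hosts(tns_text):
    """Return HOST values that are literal IP addresses rather than names."""
    found = []
    for value in HOST_RE.findall(tns_text):
        try:
            ipaddress.ip_address(value)
        except ValueError:
            continue  # not an IP literal, so presumably a DNS name
        found.append(value)
    return found


# Hypothetical sample content, for illustration only.
SAMPLE = """
ORDERS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.2.3)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orders))
  )
BILLING =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db1.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = billing))
  )
"""

if __name__ == "__main__":
    # Entries listed here would keep pointing at the primary site after a
    # DNS-based cut-over, because they never consult DNS at all.
    print(hardcoded_hosts(SAMPLE))
```

Anything this flags is a client that a DNS change alone will not move to
the DR site, so it needs either a host name or its own failover plan.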

Disclaimer: the above = one of many aspects. Have seen DNS comments already,
so I won't repeat those aspects.

HTH,
-- 
***Stefan
http://twitter.com/netfortius

