nanog mailing list archives

Re: Redundant Data Center Architectures

From: Truman Boyes <truman () suspicious org>
Date: Thu, 29 Oct 2009 15:52:05 +1100


On 29/10/2009, at 8:39 AM, Stefan Fouant wrote:

-----Original Message-----
From: Darren Bolding [mailto:darren () bolding org]
Sent: Wednesday, October 28, 2009 4:57 PM
To: Roland Dobbins
Cc: NANOG list
Subject: Re: Redundant Data Center Architectures
Also, commercial solutions from F5 (their GTM product and their old
3-DNS product).
Using CDNs is also a way of handling this, but you need to be prepared
for all your traffic to come from their source IPs, or do creative
things with X-Forwarded-For etc.
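
As a rough illustration of the X-Forwarded-For handling mentioned above, here is a minimal Python sketch. The trusted range is a documentation prefix standing in for a CDN's published egress ranges, and `client_ip` is a hypothetical helper name; neither comes from the original mail. The idea is only to believe the header when the TCP peer is a known CDN address.

```python
import ipaddress
from typing import Optional

# Assumption: placeholder for the CDN's published egress ranges.
# 192.0.2.0/24 is a documentation prefix, not a real CDN network.
TRUSTED_PROXIES = [ipaddress.ip_network("192.0.2.0/24")]


def is_trusted(addr) -> bool:
    """True if the address belongs to one of the trusted proxy ranges."""
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(peer_ip: str, x_forwarded_for: Optional[str]) -> str:
    """Best guess at the real client address for one request."""
    # If the TCP peer is not the CDN, the header is client-supplied: ignore it.
    if not x_forwarded_for or not is_trusted(ipaddress.ip_address(peer_ip)):
        return peer_ip
    # Walk the chain right to left and return the first hop that is not
    # one of our trusted proxies.
    for hop in reversed([h.strip() for h in x_forwarded_for.split(",")]):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop trusting the header
        if not is_trusted(addr):
            return hop
    return peer_ip


print(client_ip("192.0.2.10", "203.0.113.7, 192.0.2.20"))
```

Anything that keys on source address (ACLs, rate limits, logging) would need to call something like this instead of reading the socket peer directly.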
Making an active/active datacenter design work (or preferably one with
enough sites such that more than one can be down without seriously
impacting service) is a serious challenge.  Lots of people will tell
you about (and sell you solutions for) parts of the puzzle.  My
experience has been that the best case is when the architecture of the
application/infrastructure has been designed with these challenges in
mind from the get-go.  I have seen that done on the network and server
side, but never on the software side; that has always required
significant effort when the time came.

The "drop in" solutions for this (active/active database replication,
middleware solutions, proxies) are always expensive in one way or
another and frequently have major deployment challenges.

The network side of this can frequently be the easiest to resolve, in
my experience.  If you are serving up content that does not require
synchronized data on the backend, then that will make your life much
easier, and GSLB, a CDN or similar may help a great deal.
Thanks everyone who has responded so far.
I should have clarified my intent a bit in the original email. I am definitely interested in architectures which support synchronized data between data center locations in as close to real time as possible. More specifically, I am interested in designs which support zero downtime during failures, or as close to zero downtime as possible. GSLB, Anycast, CDNs... those types of approaches certainly have their place, especially where the pull model is employed (DNS, Netflix, etc.). However, what types of solutions are being used for synchronized data and even network I/O on back-end systems? I've been looking at the VMware vSphere 4 Fault Tolerance stuff to synchronize the data storage and network I/O across distributed virtual machines, but I am still worried about the consequences of doing this stuff across WAN links with high degrees of latency, etc. From the thread I get the feeling that L2 interconnects (VPLS, PWs) are generally considered a bad thing. I gathered as much, as I figured there would be lots of unintended consequences with regards to designated router elections and other weirdness. Besides connecting sites via L3 VPNs, what other approaches are others using? Also, I would appreciate any comments on the synchronization items above.
Thanks,

--
Stefan Fouant

Layer 2 interconnects (whether they are VPLS / PWE3 / or other CCC-based models) are not bad in their own right, but I think it's important to realize that extending a (sub)network across large geographical regions because applications are not building intelligence about locality or presence is a move without intelligent engineering. I hear it all the time: just extend layer 2 between these two data centers so that we can have either (1) disaster recovery or (2) vmotion / heart beats / etc ...

The truth is we can do things better and smarter than just extending bridging domains across disparate geographical locations. Real time storage should ideally be local .. but there is no reason why it can't be "available" over the cloud to other networks. The key is to have a single namespace for all storage, to not be tied to a particular storage technology, but to simply be able to present the storage/disk/mount point to virtual machines.
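
To put a rough number on why real time storage wants to be local, here is a back-of-the-envelope Python sketch. It assumes about 5 microseconds per km one way in fiber and a 0.5 ms local write; both are illustrative figures, not measurements, and `sync_write_ceiling` is a made-up name. A synchronously replicated write cannot be acknowledged before at least one round trip to the remote site, so a single serialized writer is capped at roughly 1 / (local time + RTT) writes per second.

```python
# Assumption: ~5 microseconds per km, one way, for light in fiber.
FIBER_DELAY_US_PER_KM = 5.0


def sync_write_ceiling(distance_km: float, local_write_ms: float = 0.5) -> float:
    """Upper bound on serialized synchronous writes per second.

    Each write waits for the local commit plus one round trip to the
    remote copy. Real paths add switching, queuing and protocol overhead,
    so actual numbers will be lower.
    """
    rtt_ms = 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000
    return 1000 / (local_write_ms + rtt_ms)


for km in (0, 50, 500, 5000):
    print(f"{km:>5} km: at most ~{sync_write_ceiling(km):.0f} writes/s per writer")
```

Parallel writers raise aggregate throughput, but the per-write latency floor stays put.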

Extending layer 2 for iSCSI / SAN / and even FCoE is feasible. But, let's think about the technology in detail .. FCoE uses pause frames, and when there is significant geographical delay between sites, then FCoE is not the right technology. It works great locally .. and this should be just one technology to deliver storage locally in DCs.
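
The pause frame problem can be sized with some simple arithmetic. The Python sketch below is an approximation under stated assumptions (about 5 microseconds per km one way in fiber, a 10 Gb/s link, switch and frame-size overhead ignored); `pause_headroom_bytes` is a hypothetical name. After a receiver sends a pause, the sender keeps transmitting until the pause arrives, and everything already on the wire still has to land, so the receiver must absorb about one round trip of line-rate traffic to stay lossless.

```python
# Assumption: ~5 microseconds per km, one way, for light in fiber.
FIBER_DELAY_US_PER_KM = 5.0


def pause_headroom_bytes(distance_km: float, link_gbps: float) -> float:
    """Approximate buffer needed to stay lossless after sending a pause.

    Roughly one round trip of line-rate data is still inbound when the
    pause takes effect at the far end.
    """
    rtt_seconds = 2 * distance_km * FIBER_DELAY_US_PER_KM * 1e-6
    bytes_per_second = link_gbps * 1e9 / 8
    return rtt_seconds * bytes_per_second


for km in (0.1, 10, 100, 1000):
    mib = pause_headroom_bytes(km, link_gbps=10) / 2**20
    print(f"{km:>7} km at 10 Gb/s: ~{mib:.2f} MiB of headroom per paused queue")
```

Within a data center the figure is tiny; across hundreds of km it grows linearly with distance, which is the core of the objection above.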

Internally I would explore DNS (GSLB / anycast / etc) .. and even ideas like mobile IPv4 / IPv6 mobility before I started extending layer 2 domains across the world.
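
For what the DNS / GSLB approach boils down to, here is a toy Python sketch. The `Site` record, the addresses (documentation prefixes) and the latency figures are all made up for illustration, and real GSLB products weigh many more inputs. The point is that the answer handed to a client is chosen per query from health and locality, so no layer 2 domain has to span sites.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    address: str    # address returned in the DNS answer
    healthy: bool   # result of the latest health check
    rtt_ms: float   # estimated latency from the querying resolver


def pick_site(sites: List[Site]) -> Optional[Site]:
    """Return the lowest-latency healthy site, or None if all are down."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.rtt_ms)


sites = [
    Site("dc-east", "192.0.2.10", healthy=False, rtt_ms=12.0),
    Site("dc-west", "198.51.100.10", healthy=True, rtt_ms=48.0),
    Site("dc-eu", "203.0.113.10", healthy=True, rtt_ms=95.0),
]
best = pick_site(sites)
print(best.name if best else "no healthy site")
```

Failover speed here is bounded by the record TTL and by resolvers that ignore it, which is one reason anycast is often paired with it.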


Kind regards,
Truman
