nanog mailing list archives

Re: FYI Netflix is down


From: Cameron Byrne <cb.list6 () gmail com>
Date: Sat, 30 Jun 2012 06:12:27 -0700

On Jun 30, 2012 12:25 AM, "joel jaeggli" <joelja () bogus com> wrote:

On 6/30/12 12:11 AM, Tyler Haske wrote:

I am not a computer science guy but have been around a long time.  Data
centers and clouds are like software.  Once they reach a certain size, it's
impossible to keep the bugs out.  You can test and test your heart out and
something will slip by.  You can say the same thing about nuclear reactors,
Apollo moon missions, the NorthEast power grid, and most other technology
disasters.

How to run a datacenter 101. Have more than one location, preferably
far apart. It being Amazon I would expect more. :/

There are 7 regions in EC2: three in North America, two in Asia, one in
Europe, and one in South America.

The US east coast region, the one currently being impacted, is further
subdivided into 5 availability zones.

us-east-1d appears to be the only one currently being impacted.

Distributing your application is left as an exercise to the reader.



+1

Sorry to be the Monday morning quarterback, but the sites that went down
learned a valuable lesson in single point of failure analysis.  Even a highly
redundant and professionally run data center is a single point of failure.

Geo-redundancy is key. In fact, I would take distributed data centers over
RAID, UPS, or any other "fancy pants" © mechanisms any day.

And AWS East also seems to be cursed. I would run out of West for a
while. :-)

I would also look into clouds of clouds... Who knows. Amazon could have
an Enron moment, at which point a corporate entity with a tax ID is now a
single point of failure.

Pay your money, take your chances.

CB

