nanog mailing list archives

Re: FYI Netflix is down


From: Jon Lewis <jlewis () lewis org>
Date: Tue, 3 Jul 2012 13:13:39 -0400 (EDT)

On Mon, 2 Jul 2012, david raistrick wrote:

On Mon, 2 Jul 2012, James Downs wrote:

back-plane / control-plane was unable to cope with the requests. Netflix uses Amazon's ELB to balance the traffic and no back-plane meant they were unable to reconfigure it to route around the problem.

Someone needs to define back-plane/control-plane in this case. (and what wasn't working)

Amazon resources are controlled (from a consumer viewpoint) by API - that API is also used by Amazon's internal toolkits that support ELB (and RDS..). Those (HTTP-accessed) API interfaces were unavailable for a good portion of the outages.

It seems like if you're going to outsource your mission-critical infrastructure to "cloud", you should probably pick at least two unrelated cloud providers and, if at all possible, not outsource the systems that balance/direct traffic. And if you're really serious about it, have at least two of these set up at different facilities, such that if the primary goes offline, the secondary takes over. If a cloud provider fails, you redirect to another.
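
To make the "primary goes offline, secondary takes over" idea concrete, here is a rough sketch of the selection logic a self-hosted traffic director might run. It is only an illustration: the provider names ("cloud-a", "cloud-b") and the health-check callable are hypothetical, and a real setup would also need DNS or routing changes, which are not shown.

```python
from typing import Callable, Optional, Sequence


def pick_active(providers: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first provider, in priority order, that passes its health check.

    `providers` is ordered primary first. `is_healthy` is a caller-supplied
    probe (hypothetical here); it could be an HTTP check run from outside
    either provider, so it does not depend on a provider's own control plane.
    """
    for name in providers:
        try:
            if is_healthy(name):
                return name
        except Exception:
            # A probe that raises is treated the same as a failed probe.
            continue
    # Nothing passed its check; the caller decides what to do next.
    return None


if __name__ == "__main__":
    # Simulate the primary being down: traffic should go to the secondary.
    down = {"cloud-a"}
    print(pick_active(["cloud-a", "cloud-b"], lambda name: name not in down))
```

The point of keeping this logic outside both providers is that it still works when one provider's API is unreachable, which is the failure described above.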

----------------------------------------------------------------------
 Jon Lewis, MCP :)           |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________

