nanog mailing list archives
Re: Better description of what happened
From: Tom Beecher <beecher () beecher cc>
Date: Wed, 6 Oct 2021 12:48:25 -0400
I mean, at the end of the day they likely designed these systems to be able to handle one or more datacenters being disconnected from the world, and considered a scenario of ALL their datacenters being disconnected from the world so unlikely they chose not to solve for it. Works great, until it doesn't. I'm sure they'll learn from this and in the future have some better things in place to account for such a scenario.

On Wed, Oct 6, 2021 at 12:21 PM Bjørn Mork <bjorn () mork no> wrote:
> Tom Beecher <beecher () beecher cc> writes:
>
>> Even if the external announcements were not withdrawn, and the edge DNS servers could provide stale answers, the IPs those answers provided wouldn't have actually been reachable.
>
> Do we actually know this wrt the tools referred to in "the total loss of DNS broke many of the tools we’d normally use to investigate and resolve outages like this."? Those tools aren't necessarily located in any of the remote data centers, and some of them might even refer to resources outside the Facebook network.
>
> Not to mention that keeping the DNS service up would have prevented resolver overload in the rest of the world.
>
> Besides, the disconnected frontend servers are probably configured to display a "we have a slight technical issue. will be right back" notice in such situations. This is a much better user experience than the "facebook? never heard of it" message we got on Monday.
>
> Yes, it makes sense to keep your domains alive even if your network isn't. That's why the best practice is name servers in more than one AS.
>
> Bjørn
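Bjørn's closing point is that a zone's name servers should not all be announced from a single AS, so that one network's routing failure cannot take the domain's DNS offline along with everything else. A minimal Python sketch of that check, assuming you have already mapped each name server's address to the origin ASN announcing it (that lookup, e.g. from a BGP table, is not shown). The host names and the helper names `ns_origin_asns` and `has_as_diversity` are hypothetical; the ASNs come from the range reserved for documentation (RFC 5398).

```python
from collections import defaultdict


def ns_origin_asns(ns_to_asn: dict[str, int]) -> dict[int, list[str]]:
    """Group name servers by the origin AS announcing their address."""
    by_asn: dict[int, list[str]] = defaultdict(list)
    for ns, asn in sorted(ns_to_asn.items()):
        by_asn[asn].append(ns)
    return dict(by_asn)


def has_as_diversity(ns_to_asn: dict[str, int], minimum: int = 2) -> bool:
    """True if the name servers are announced from at least `minimum` distinct ASes."""
    return len(set(ns_to_asn.values())) >= minimum


# Hypothetical data: name server -> origin ASN of its address.
# ASNs 64496-64511 are reserved for documentation (RFC 5398).
single_as = {
    "ns1.example.com": 64500,
    "ns2.example.com": 64500,
    "ns3.example.com": 64500,
}
multi_as = {
    "ns1.example.com": 64500,
    "ns2.example.com": 64500,
    "ns.example.net": 64501,
}

if __name__ == "__main__":
    for label, zone in (("single_as", single_as), ("multi_as", multi_as)):
        # A zone like single_as goes dark if AS64500 withdraws its routes.
        print(label, has_as_diversity(zone), ns_origin_asns(zone))
```

Counting distinct origin ASes is only a first-order check: two ASes that share the same upstream or the same operator's automation can still fail together, so this sketch flags the obvious single-AS case rather than proving independence.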