nanog mailing list archives

Re: Facebook post-mortems...


From: Hank Nussbacher <hank () interall co il>
Date: Wed, 6 Oct 2021 07:51:52 +0300

On 05/10/2021 21:11, Randy Monroe via NANOG wrote:
Updated: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/

Let's try to break down this "engineering" blog posting:

- "During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network"

Can anyone guess what command FB issued that would cause them to withdraw all those prefixes?

- "it was not possible to access our data centers through our normal means because their networks were down, and second, the total loss of DNS broke many of the internal tools we’d normally use to investigate and resolve outages like this. Our primary and out-of-band network access was down..."

Does this mean that FB acknowledges that the loss of DNS broke their OOB access?

-Hank

