nanog mailing list archives

Re: massive facebook outage presently

From: Łukasz Bromirski <lukasz () bromirski net>
Date: Mon, 4 Oct 2021 22:27:57 +0200


Dual homing won’t help you if your automation template does "no router bgp X": at that point every session terminates
and all advertisements are suddenly withdrawn…

It won’t help you either if the change triggers some obscure bug in your BGP stack.
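The first failure mode (a template that renders "no router bgp X") is the kind of thing a pre-push check in the automation pipeline could catch before it reaches a router. Below is a minimal sketch, assuming IOS-style CLI and a hypothetical deny-list of commands that remove the whole BGP process; the function names and the ASN are illustrative only. It does nothing about the second failure mode, a bug triggered by an otherwise valid change.

```python
import re

# Hypothetical deny-list: commands that tear down the entire BGP process.
# IOS-style syntax, matching the "no router bgp X" example in the post.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"^\s*no\s+router\s+bgp\b", re.IGNORECASE),
]


def find_destructive_lines(rendered_change: str) -> list[str]:
    """Return every line of a rendered change that matches a deny-list pattern."""
    hits = []
    for line in rendered_change.splitlines():
        if any(p.search(line) for p in DESTRUCTIVE_PATTERNS):
            hits.append(line.strip())
    return hits


def safe_to_push(rendered_change: str) -> bool:
    """Refuse the push if the change would remove the BGP process entirely."""
    return not find_destructive_lines(rendered_change)


if __name__ == "__main__":
    # 65000 is a private-use ASN, chosen purely for illustration.
    change = "interface Loopback0\n description test\nno router bgp 65000\n"
    print(find_destructive_lines(change))  # ['no router bgp 65000']
    print(safe_to_push(change))            # False
```

A deny-list like this is deliberately blunt: it blocks the catastrophic case and forces a human to push that change by hand, rather than trying to judge whether a removal is intended.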

I bet FB tested the change on a smaller scale, everything was fine, and only then did they start rolling it out across the
wider network, at which point „something” broke. Or some bug needed a moment before it started cascading issues around the infra.
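The "test small, then roll wider" pattern described above is usually a staged (canary) rollout with a health gate between waves. Below is a minimal sketch, assuming hypothetical `apply_change` and `is_healthy` callbacks supplied by your own tooling (for example, a check that BGP sessions are still established); nothing here reflects how FB actually deploys. It also shows the limitation raised in the post: if a bug needs longer than `soak_seconds` to start cascading, every gate passes and the change still reaches the whole network.

```python
import time
from typing import Callable, Sequence


def staged_rollout(
    devices: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    wave_sizes: Sequence[int] = (1, 4, 16),
    soak_seconds: float = 0.0,
) -> tuple[list[str], bool]:
    """Push a change in growing waves, halting at the first unhealthy wave.

    Returns (devices that received the change, True if the rollout completed).
    """
    if not wave_sizes or min(wave_sizes) < 1:
        raise ValueError("wave_sizes must be non-empty positive integers")
    applied: list[str] = []
    remaining = list(devices)
    wave_index = 0
    while remaining:
        # Use the predefined wave sizes in order, then repeat the last one.
        size = wave_sizes[min(wave_index, len(wave_sizes) - 1)]
        wave, remaining = remaining[:size], remaining[size:]
        for device in wave:
            apply_change(device)
            applied.append(device)
        # Give slow-burning problems time to surface before judging the wave.
        time.sleep(soak_seconds)
        # Check every device changed so far, not only the latest wave,
        # because a failure can cascade back into earlier waves.
        if not all(is_healthy(d) for d in applied):
            return applied, False
        wave_index += 1
    return applied, True
```

For example, with ten hypothetical routers `pr0` to `pr9` where `pr3` goes unhealthy, the rollout applies the change to `pr0` alone, then to `pr1` through `pr4`, sees the failure, and stops with five devices changed instead of ten.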

-- 
./

On 4 Oct 2021, at 22:00, Michael Thomas <mike () mtcc com> wrote:




On 10/4/21 11:48 AM, Luke Guillory wrote:


I believe the original change was 'automatic' (as in configuration done via a web interface). However, now that the
connection to the outside world is down, remote access to those tools no longer exists, so the emergency
procedure is to gain physical access to the peering routers and do all the configuration locally.

Assuming that this is what actually happened, what should fb have done differently (beyond the obvious of not screwing
up the immediate issue)? This seems like a single point of failure. Should all of the BGP speakers have been
dual-homed or something like that? Or should they not have been mixing ops and production networks? Sorry if this
sounds dumb.

Mike
