nanog mailing list archives

RE: massive facebook outage presently


From: Luke Guillory <LGuillory () reservetele com>
Date: Mon, 4 Oct 2021 18:48:12 +0000

From what I believe was a FB employee on Reddit, account now deleted it seems.


As many of you know, DNS for FB services has been affected, and this is likely a symptom of the actual issue, which is 
that BGP peering with Facebook's peering routers has gone down, very likely due to a configuration change that went 
into effect shortly before the outages happened (started roughly 1540 UTC).



There are people now trying to gain access to the peering routers to implement fixes, but the people with physical 
access are separate from the people with knowledge of how to actually authenticate to the systems and the people who 
know what to actually do, so there is now a logistical challenge with getting all that knowledge unified.



Part of this is also due to lower staffing in data centers due to pandemic measures.



I believe the original change was 'automatic' (as in configuration done via a web interface). However, now that 
connection to the outside world is down, remote access to those tools doesn't exist anymore, so the emergency 
procedure is to gain physical access to the peering routers and do all the configuration locally.



https://twitter.com/jgrahamc/status/1445068309288951820 "About five minutes before Facebook's DNS stopped working we 
saw a large number of BGP changes (mostly route withdrawals) for Facebook's ASN."




From: NANOG <nanog-bounces+lguillory=reservetele.com () nanog org> On Behalf Of Baldur Norddahl
Sent: Monday, October 4, 2021 1:41 PM
To: NANOG <nanog () nanog org>
Subject: Re: massive facebook outage presently

I got a mail that Facebook was leaving NLIX. Maybe someone botched the script so they took down all BGP sessions 
instead of just NLIX and now they can't access the equipment to put it back... :-)


On Mon, Oct 4, 2021 at 20:31, Billy Croan <BCroan () unrealservers net> wrote:
I know what this is.....  They forgot to update the credit card on their godaddy account and the domain lapsed.  I 
guess it will be facebook.info when they get it back online.  The post mortem should be an interesting read.

On Mon, Oct 4, 2021 at 11:46 AM Jason Kuehl <jason.w.kuehl () gmail com> wrote:
Looks like they run their own nameservers, and I see that even the SOA records are missing.
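
For anyone wanting to reproduce the SOA check above without dig, here is a minimal sketch in Python using only the standard library. It builds a raw DNS query for the SOA record and reads the response code and answer count from the reply header. The function names, the transaction ID, and the 3-second timeout are illustrative assumptions, not anything used in this thread; a timeout, a non-zero rcode, or an answer count of zero would all be consistent with the symptom described.

```python
import socket
import struct

SOA_TYPE = 6   # RR type code for SOA (RFC 1035)
IN_CLASS = 1   # class IN


def build_soa_query(name, txid=0x1234):
    """Build a minimal DNS query packet asking for the SOA record of `name`."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", SOA_TYPE, IN_CLASS)


def parse_header(response):
    """Return (rcode, answer_count) from the 12-byte DNS response header."""
    _txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def query_soa(name, server, timeout=3.0):
    """Send the query over UDP to `server`; raises socket.timeout if no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_soa_query(name), (server, 53))
        data, _addr = sock.recvfrom(512)
    return parse_header(data)
```

Calling query_soa("facebook.com", resolver_ip) with a resolver of your choice returns the pair (rcode, answer_count); rcode 0 with at least one answer means the SOA was found, while rcode 2 is SERVFAIL and rcode 3 is NXDOMAIN.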

On Mon, Oct 4, 2021, 12:23 PM Mel Beckman <mel () beckman org> wrote:
Here’s a screenshot:

 -mel beckman


On Oct 4, 2021, at 9:06 AM, Eric Kuhnke <eric.kuhnke () gmail com> wrote:

https://downdetector.com/status/facebook/

Normally not worth mentioning random $service having an outage here, but this will undoubtedly generate a large volume 
of customer service calls.

Appears to be a failure in DNS resolution.



