nanog mailing list archives

Re: Facebook post-mortems...


From: Warren Kumari <warren () kumari net>
Date: Tue, 5 Oct 2021 14:07:46 -0400

On Tue, Oct 5, 2021 at 1:47 PM Miles Fidelman <mfidelman () meetinghouse net>
wrote:

jcurran () istaff org wrote:

Fairly abstract - Facebook Engineering -
https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr

Also, Cloudflare’s take on the outage -
https://blog.cloudflare.com/october-2021-facebook-outage/

FYI,
/John

This may be a dumb question, but does this suggest that Facebook publishes
rather short TTLs for their DNS records?  Otherwise, why would an internal
failure make them unreachable so quickly?


Looks like 60 seconds:

$  dig +norec star-mini.c10r.facebook.com. @d.ns.c10r.facebook.com.

; <<>> DiG 9.10.6 <<>> +norec star-mini.c10r.facebook.com. @d.ns.c10r.facebook.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25582
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;star-mini.c10r.facebook.com. IN A

;; ANSWER SECTION:
star-mini.c10r.facebook.com. 60 IN A 157.240.229.35

;; Query time: 42 msec
;; SERVER: 185.89.219.11#53(185.89.219.11)
;; WHEN: Tue Oct 05 14:01:06 EDT 2021
;; MSG SIZE  rcvd: 72
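
For anyone scripting this check: the TTL is the second field of the record
in the ANSWER section. A minimal Python sketch, assuming the answer line is
copied verbatim from the output above (parse_answer and RR_PATTERN are
illustrative names, not part of dig or any library):

```python
import re

# One resource record line, copied from the ANSWER section of the dig output.
ANSWER_LINE = "star-mini.c10r.facebook.com. 60 IN A 157.240.229.35"

# Fields in presentation format: owner name, TTL (seconds), class, type, rdata.
RR_PATTERN = re.compile(r"^(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.+)$")


def parse_answer(line):
    """Split one dig answer line into its fields (hypothetical helper)."""
    match = RR_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a resource record line: {line!r}")
    name, ttl, rrclass, rrtype, rdata = match.groups()
    return {
        "name": name,
        "ttl": int(ttl),
        "class": rrclass,
        "type": rrtype,
        "rdata": rdata,
    }


record = parse_answer(ANSWER_LINE)
print(f"{record['name']} TTL = {record['ttl']} seconds")
```

Note that the query went to the authoritative server with +norec, so the 60
is the published TTL rather than a counted-down value from a resolver cache.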



... and cue the "Bwahahhaha! If *I* ran Facebook I'd make the TTL be
[2 sec|30sec|5min|1h|6h+3sec|1day|6months|maxint32]" threads....

Choosing the TTL is a balancing act between stability, agility, load,
politeness, renewal latency, etc -- but I'm sure NANOG can boil it down to
"They did it wrong!..."

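To put rough numbers on the load-versus-agility side of that balance, here is
a back-of-envelope sketch in Python. It assumes a hypothetical population of
one million recursive resolvers, each re-fetching a popular name once per TTL
expiry; the resolver count and the function name are made up for illustration
and are not Facebook's figures.

```python
def authoritative_qps(resolvers, ttl_seconds):
    """Rough estimate: every resolver re-queries once each time its cache expires."""
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return resolvers / ttl_seconds


RESOLVERS = 1_000_000  # hypothetical number of busy recursive resolvers

CANDIDATE_TTLS = [
    ("2 sec", 2),
    ("30 sec", 30),
    ("60 sec", 60),
    ("5 min", 300),
    ("1 h", 3600),
    ("1 day", 86400),
]

for label, ttl in CANDIDATE_TTLS:
    qps = authoritative_qps(RESOLVERS, ttl)
    # A cached answer can outlive a change (or an outage) by up to one TTL.
    print(f"TTL {label:>6}: ~{qps:>9.1f} queries/s, cached answer lives up to {ttl} s")
```

Under these assumptions a short TTL buys fast steering at the cost of query
load, and it also means caches drain within about a minute once the
authoritative servers stop answering.
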
W


Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra

Theory is when you know everything but nothing works.
Practice is when everything works but no one knows why.
In our lab, theory and practice are combined:
nothing works and no one knows why.  ... unknown



-- 
The computing scientist’s main challenge is not to get confused by the
complexities of his own making.
  -- E. W. Dijkstra
