nanog mailing list archives
Re: [outages] Major Level3 (CenturyLink) Issues
From: Warren Kumari <warren () kumari net>
Date: Wed, 2 Sep 2020 15:24:21 -0400
On Wed, Sep 2, 2020 at 3:04 PM Vincent Bernat <bernat () luffy cx> wrote:
❦ 2 September 2020 16:35 +03, Saku Ytti:

I am not buying it. No normal implementation of BGP stays online, replying to heart beat and accepting updates from ebgp peers, yet after 5 hours failed to process withdrawal from customers.

I can imagine writing BGP implementation like this
a) own queue for keepalives, which i always serve first fully
b) own queue for update, which i serve second
c) own queue for withdraw, which i serve last

Or maybe, graceful restart configured without a timeout on IPv4/IPv6? The flowspec rule severed the BGP session abruptly, stale routes are kept due to graceful restart (except flowspec rules), BGP sessions are reestablished but the flowspec rule is handled before reaching EoR and we loop from there.
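To make the three-queue idea above concrete, here is a minimal sketch of a strict-priority scheduler, assuming a fixed per-tick processing budget. The class name, the message kinds, and the arrival rates are all hypothetical and chosen only for illustration; nothing here reflects how any real BGP implementation is built. It only shows that if keepalives and updates together fill the budget, the withdraw queue is never served while the session still looks healthy.

```python
from collections import deque

# Hypothetical message classes, listed in strict service order (highest first).
PRIORITY = ("keepalive", "update", "withdraw")


class StrictPriorityScheduler:
    """Toy scheduler: always drains higher-priority queues before lower ones."""

    def __init__(self):
        self.queues = {kind: deque() for kind in PRIORITY}
        self.served = {kind: 0 for kind in PRIORITY}

    def enqueue(self, kind, msg):
        self.queues[kind].append(msg)

    def serve(self, budget):
        """Process up to `budget` messages, highest-priority non-empty queue first."""
        for _ in range(budget):
            for kind in PRIORITY:
                if self.queues[kind]:
                    self.queues[kind].popleft()
                    self.served[kind] += 1
                    break
            else:
                break  # every queue is empty, nothing left to do this tick


def simulate(ticks=100, budget=10):
    """Assumed load: per tick, 1 keepalive, `budget` updates and 1 withdraw arrive."""
    sched = StrictPriorityScheduler()
    for t in range(ticks):
        sched.enqueue("keepalive", t)
        for i in range(budget):  # updates alone are enough to exhaust the budget
            sched.enqueue("update", (t, i))
        sched.enqueue("withdraw", t)
        sched.serve(budget)
    return sched


if __name__ == "__main__":
    s = simulate()
    print("served:", s.served)
    print("pending withdraws:", len(s.queues["withdraw"]))
```

Under these assumed rates every keepalive is answered on time, the update backlog grows slowly, and the withdraws simply pile up, which is the failure pattern being speculated about.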
... or all routes are fed into some magic route optimization box which is designed to keep things more stable and take advantage of Cisco's "step-10" to suck more traffic, or....

The root issue here is that the *public* RFO is incomplete / unclear. "Something something flowspec something, blocked flowspec, no more something" does indeed explain that something bad happened, but not what caused the lack of withdraws / cascading churn.

As with many interesting outages, I suspect that we will never get the full story, and "Something bad happened, we fixed it and now it's all better and will never happen ever again, trust us..." seems to be the new normal for public postmortems...

W
-- Make sure your code "does nothing" gracefully. - The Elements of Programming Style (Kernighan & Plauger)
-- I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants. ---maf