nanog mailing list archives

Re: [outages] Major Level3 (CenturyLink) Issues


From: Tom Beecher <beecher () beecher cc>
Date: Wed, 2 Sep 2020 09:57:46 -0400

Yeah. This actually would be a fascinating study to understand exactly what
happened. The volume of BGP messages flying around because of the session
churn must have been absolutely massive, especially in an internal
infrastructure as complex as 3356's.

I would say the scale of such an event has to be many orders of magnitude
beyond what anyone ever designed for, so it doesn't shock me at all that
unexpected behavior occurred. But that's why we're engineers; we want to
understand such things.

On Wed, Sep 2, 2020 at 9:37 AM Saku Ytti <saku () ytti fi> wrote:

On Wed, 2 Sep 2020 at 16:16, Baldur Norddahl <baldur.norddahl () gmail com>
wrote:

I am not buying it. No normal implementation of BGP stays online,
replying to heartbeats and accepting updates from eBGP peers, yet after 5
hours still fails to process withdrawals from customers.

I can imagine writing a BGP implementation like this:

 a) own queue for keepalives, which I always serve first, fully
 b) own queue for updates, which I serve second
 c) own queue for withdraws, which I serve last

Why might I think this makes sense? Perhaps I just received from RR2
the same prefix that RR1 is withdrawing; if I don't handle all my
updates first, I cause an outage that should not happen, because I
have already received the update telling me I don't need to withdraw
the route at all.

Is this the right way to do it? Maybe not, but it's easy to imagine
why it might seem like a good idea.
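
To make the failure mode concrete, here is a minimal Python sketch of
the strict-priority queueing scheme described above. It is purely
illustrative (the Msg type and the handle() stub are invented for the
example), not any vendor's actual implementation; the point is that
under sustained churn the withdraw queue can starve while the session
keeps answering keepalives and so never looks down.

    # Hypothetical sketch of strict-priority BGP message handling.
    # Illustrative only -- not any real implementation's code.
    from collections import deque
    from dataclasses import dataclass, field

    @dataclass
    class Msg:
        kind: str                    # "KEEPALIVE", "UPDATE", or "WITHDRAW"
        prefixes: list = field(default_factory=list)

    # One queue per message class, as in a)/b)/c) above.
    keepalives, updates, withdraws = deque(), deque(), deque()

    def enqueue(msg):
        # Classify each incoming message into its own queue.
        {"KEEPALIVE": keepalives,
         "UPDATE": updates,
         "WITHDRAW": withdraws}[msg.kind].append(msg)

    def handle(msg):
        pass  # placeholder for actual RIB/keepalive processing

    def process_one():
        # Strict priority: drain keepalives fully, then updates, and
        # touch withdraws only when both other queues are empty. Under
        # sustained session churn the first two queues never empty, so
        # withdraws can sit unprocessed for hours while the session
        # itself stays up.
        for queue in (keepalives, updates, withdraws):
            if queue:
                handle(queue.popleft())
                return
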

How well BGP works in common cases and how it behaves in pathologically
scaled and busy cases are very different things.

I know that even in stable states, commonly run vendors on commonly run
hardware can take 2+ hours to finish converging iBGP on initial turn-up.

--
  ++ytti

