nanog mailing list archives

Re: Any2 LAX


From: Bryan Holloway <bryan () shout net>
Date: Fri, 11 Jun 2021 20:18:24 +0200

This is what I got from those guys ...

--

CoreSite Incident Notification


Description: During a planned maintenance event to integrate new hardware into our MPLS core, an extreme dip in Any2 traffic was observed. After about 4 hours running in a degraded state, an emergency case was opened with the hardware vendor. After working with the hardware vendor to rule out any possible hardware or software bugs, the network engineering team located the source of the traffic loss. It was an errant configuration applied by the custom automation written to build LSPs in our MPLS network. A formal IR will be provided for this event.




On 6/11/21 8:03 PM, jim deleskie wrote:
Also saw a major traffic drop. There is a Root Cause to be issued early in the week I'm told.


-jim

On Fri, Jun 11, 2021 at 2:42 PM Siyuan Miao <aveline () misaka io <mailto:aveline () misaka io>> wrote:

    Yeah, it was down, but both RS were online and feeding us unreachable
    nexthops during the outage.

    On Sat, Jun 12, 2021 at 1:27 AM Seth Mattinen <sethm () rollernet us
    <mailto:sethm () rollernet us>> wrote:

        On 6/11/21 10:16 AM, Jon Lewis wrote:
         > On Fri, 11 Jun 2021, Seth Mattinen wrote:
         >
         >> Did Any2 LAX barf last night between about 1am and 8am Pacific time?
         >
         > More like 00:00-7:45 (Pacific time).
         >
         > Anyone know what broke, and why the IX was dead for nearly 8 hours?
         > This is our second recent issue with "an Any2 IX", having dealt with an
         > IX partition event at Any2 Denver just a few weeks ago.
         >


        What I saw was a lot of unreachable nexthops (I'm in LA2) on routes
        advertised through the route servers. Most of my direct BGP sessions
        were down, but a handful were still working including the route
        servers.

        For example, I was getting routes for AS29791 from the route servers,
        but nexthop 206.72.211.106 was dead to me. Not to pick on Internap,
        other than that a mutual customer called me directly at 1am and
        wanted to know why things were down.

        I killed the route server sessions and went back to sleep.

        Feels like LA1 and LA2 got split, but whatever path the route servers
        use to interconnect still worked, which was problematic.
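
The failure mode described above (route servers still advertising routes whose next hops could not be reached across the split fabric) can be illustrated with a rough sketch. This is not anything CoreSite or the posters ran; it only shows the idea of separating route-server-learned routes into usable and blackholed sets based on whether the next hop answers. The `Route` class, the `split_by_next_hop` helper, the prefixes, AS64500 and the next hop 206.72.211.10 are all hypothetical; only AS29791 and 206.72.211.106 come from the thread.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class Route:
    """A route learned from a route server (hypothetical structure)."""
    prefix: str
    next_hop: str
    origin_as: int


def split_by_next_hop(routes, reachable_next_hops):
    """Split routes into (usable, blackholed) by next-hop reachability.

    `reachable_next_hops` is assumed to come from some external probe
    (ARP/ND state, ping, BFD); how it is gathered is out of scope here.
    """
    reachable = {ip_address(h) for h in reachable_next_hops}
    usable, blackholed = [], []
    for route in routes:
        if ip_address(route.next_hop) in reachable:
            usable.append(route)
        else:
            blackholed.append(route)
    return usable, blackholed


# Illustrative data: documentation prefixes, one next hop from the thread.
routes = [
    Route("192.0.2.0/24", "206.72.211.106", 29791),   # next hop dead in the thread
    Route("198.51.100.0/24", "206.72.211.10", 64500),  # hypothetical live peer
]
usable, blackholed = split_by_next_hop(routes, ["206.72.211.10"])
```

With the data above, the AS29791 route lands in `blackholed`, which matches why shutting down the route server sessions was a reasonable workaround: the sessions stayed up, so nothing withdrew the routes automatically.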


