nanog mailing list archives

Re: CenturyLink RCA?


From: Saku Ytti <saku () ytti fi>
Date: Mon, 31 Dec 2018 17:06:35 +0200

Hey Steve,

I will continue to speculate, as that's all we have.

1.  Are you telling me that several line cards failed in multiple cities in the same way at the same time?  Don't 
think so unless the same software fault was propagated to all of them.  If the problem was that they needed to be 
reset, couldn't that be accomplished by simply reseating them?

L2 DCN/OOB, where the whole network shares a single broadcast domain.

2.  Do we believe that an OOB management card was able to generate so much traffic as to bring down the optical 
switching?  Very doubtful which means that the systems were actually broken due to trying to PROCESS the "invalid 
frames".  Seems like very poor control plane management if the system is attempting to process invalid data and 
bringing down the forwarding plane.

L2 loop. You will kill your JNPR/CSCO with enough trash on MGMT ETH.
However, it can be argued that an optical network should fail up in the
absence of a control plane, while an IP network has to fail down.

3.  In the cited document it was stated that the offending packet did not have source or destination information.  If 
so, how did it get propagated throughout the network?

BPDU

My guess at the time and my current opinion (which has no real factual basis, just years of experience) is that a bad 
software package was propagated through their network.

Lots of possible reasons. I choose to believe that what they've
communicated is what the writer of the communication thought happened,
but as they likely are not an SME, it's broken radio communication. A
BCAST storm on an L2 DCN would plausibly fit the very ambiguous reason
offered, and running an L2 DCN is something people actually are doing.
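
To make the L2 loop point concrete, here is a toy sketch in Python of how
a single broadcast frame behaves in a flat L2 domain with no STP and no
storm control. Everything in it is an assumption for illustration: the
flood() helper, the switch names and the topologies are hypothetical, each
tick is one hop, and links have unlimited capacity. It says nothing about
what the real network looked like.

```python
from collections import Counter


def flood(links, origin, ticks):
    """Count copies of one broadcast frame in flight after each tick.

    Assumptions (illustrative only): no STP, no storm control, no TTL,
    at most one link between any two switches. Each switch floods a
    received broadcast out of every port except the one it arrived on.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    # (from_switch, to_switch) -> number of copies currently on that hop
    in_flight = Counter((origin, n) for n in neighbours[origin])
    history = [sum(in_flight.values())]

    for _ in range(ticks):
        nxt = Counter()
        for (src, dst), copies in in_flight.items():
            for n in neighbours[dst]:
                if n != src:  # never flood back out of the ingress port
                    nxt[(dst, n)] += copies
        in_flight = nxt
        history.append(sum(in_flight.values()))
    return history


# Hypothetical topologies
line = [("A", "B"), ("B", "C")]                  # loop-free
ring = [("A", "B"), ("B", "C"), ("C", "A")]      # one loop
mesh = [("A", "B"), ("A", "C"), ("A", "D"),
        ("B", "C"), ("B", "D"), ("C", "D")]      # full mesh of four

print(flood(line, "A", 3))
print(flood(ring, "A", 4))
print(flood(mesh, "A", 4))
```

In the loop-free case the frame dies out, in the ring it circulates
forever, and in the mesh the number of copies doubles every hop. An L2
frame has no TTL, so nothing ever ages it out.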

-- 
  ++ytti

