nanog mailing list archives

Re: Famous operational issues


From: Pierre Emeriaud <petrus.lt () gmail com>
Date: Tue, 16 Feb 2021 23:52:01 +0100

On Tue, 16 Feb 2021 at 21:03, Job Snijders via NANOG
<nanog () nanog org> wrote:

https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment/

The experiment triggered a bug in some Cisco router models: affected
Ciscos would corrupt this specific BGP announcement ** ON OUTBOUND **.
Any peers of such Ciscos receiving this BGP update, would (according to
then current RFCs) consider the BGP UPDATE corrupted, and would
subsequently tear down the BGP sessions with the Ciscos. Because the
corruption was not detected by the Ciscos themselves, whenever the
sessions would come back online again they'd reannounce the corrupted
update, causing a session tear down. Bounce ... Bounce ... Bounce ... at
global scale in both IBGP and EBGP! :-)
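
The bounce loop described above can be sketched as a toy model. This is
only an illustration: the `Update` class, `simulate_session`, and the
`treat_as_withdraw` flag are hypothetical names, not any vendor's
implementation. The flag is loosely modelled on the treat-as-withdraw
idea later standardised in RFC 7606; whether that would have covered this
particular corruption is not stated in the thread.

```python
from dataclasses import dataclass


@dataclass
class Update:
    prefix: str
    corrupted: bool  # True if the (hypothetical) buggy sender mangles it on outbound


def simulate_session(updates, max_cycles=5, treat_as_withdraw=False):
    """Return how many times the receiving peer resets the session.

    Each cycle models one session establishment, after which the sender
    re-announces its whole table, including the corrupted update.
    """
    resets = 0
    for _ in range(max_cycles):
        survived = True
        for update in updates:
            if update.corrupted and not treat_as_withdraw:
                # Strict handling: a malformed UPDATE tears the session down.
                resets += 1
                survived = False
                break
            # Otherwise the update is accepted, or (if corrupted) the
            # route is dropped while the session stays up.
        if survived:
            break  # session stays up, no more bouncing
    return resets


table = [Update("192.0.2.0/24", False), Update("198.51.100.0/24", True)]
strict_resets = simulate_session(table)
lenient_resets = simulate_session(table, treat_as_withdraw=True)
```

With strict handling the session resets on every cycle until the model
gives up; with the lenient flag it never resets, at the cost of silently
dropping one route.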

In a similar fashion, a network I know had a massive outage when a
failing linecard corrupted IS-IS LSPs, triggering a flood of purges
and taking down the whole backbone.

This was pre-RFC 6232, so the purges carried no indication of which
system had originated them. You can guess that resolving the issue was
a real PITA.
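
For context, RFC 6232 adds a Purge Originator Identification (POI) TLV,
type 13, to purges: a one-byte count (1 or 2) followed by that many
system IDs. A minimal sketch of pulling it out of the TLV portion of a
purge, assuming 6-byte system IDs and that the caller has already
stripped the LSP header (the function names are mine, not from any
library):

```python
POI_TLV_TYPE = 13  # Purge Originator Identification TLV (RFC 6232)


def iter_tlvs(data: bytes):
    """Yield (type, value) pairs from a run of IS-IS TLVs."""
    i = 0
    while i + 2 <= len(data):
        tlv_type, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        yield tlv_type, value
        i += 2 + length


def purge_originators(tlv_bytes: bytes, id_len: int = 6):
    """Return the system IDs (hex strings) from the first POI TLV, or []."""
    for tlv_type, value in iter_tlvs(tlv_bytes):
        if tlv_type != POI_TLV_TYPE:
            continue
        if not value:
            raise ValueError("empty POI TLV")
        count, ids = value[0], value[1:]
        if count not in (1, 2) or len(ids) != count * id_len:
            raise ValueError("malformed POI TLV")
        return [ids[k * id_len : (k + 1) * id_len].hex() for k in range(count)]
    return []  # no POI TLV: the purge does not say who originated it


# A purge carrying a POI TLV with a single (made-up) system ID.
with_poi = bytes([13, 7, 1]) + bytes.fromhex("192168000001")
# A purge with some other TLV only, as before RFC 6232.
without_poi = bytes([10, 1, 1])

found = purge_originators(with_poi)
missing = purge_originators(without_poi)
```

The empty result in the second case is the situation described above:
nothing in the purge itself points back at the box that generated it.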

This kind of outage fuels my netops nightmares.

