nanog mailing list archives

Re: [nanog] Famous operational issues


From: Patrick Schultz <lists-nanog () schultz top>
Date: Sat, 12 Jun 2021 17:57:02 +0200

Opening the link currently gives me an HTTP 500 error, very fitting :)

On 12.06.2021 at 04:42, Dan Mahoney wrote:
I only just now found this thread, so I'm sorry I'm late to the party, but here, I put it on Medium.

https://gushi.medium.com/the-worst-day-ever-at-my-day-job-beff7f4170aa

On Mar 12, 2021, at 10:07 PM, Mark Tinka <mark@tinka.africa> wrote:

Hardly famous and not service-affecting in the end, but figured I'd share an incident from our side that occurred 
back in 2018.

While commissioning a new node in our Metro-E network, an IPv6 point-to-point address was mis-typed. Instead of 
ending in /126, it ended in /12. This happened in Johannesburg.
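
A minimal sketch of how the prefix length alone changes the network a router derives from an interface address. The address 2c0f:db8::1 is a hypothetical placeholder (the real point-to-point address is not given above); only the /126 and /12 lengths and the resulting 2c00::/12 come from the post.

```python
import ipaddress

# Hypothetical point-to-point address; the actual one is not stated in the post.
intended = ipaddress.ip_interface("2c0f:db8::1/126")
mistyped = ipaddress.ip_interface("2c0f:db8::1/12")

# The connected network is the address masked to the prefix length.
print(intended.network)  # 2c0f:db8::/126
print(mistyped.network)  # 2c00::/12

# How many times larger the mistyped network is than the intended one.
ratio = mistyped.network.num_addresses // intended.network.num_addresses
print(ratio == 2 ** 114)  # True
```

With a /12, everything after the first 12 bits is masked off, which is why the entry seen in the IGP no longer resembled the configured address.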

We actually came across this by chance while examining the IGP table of another router located in Slough, and found 
an entry for 2c00::/12 floating around. That definitely looked out of place, as we never carry parent blocks in our 
IGP.

Running the trace from Slough led us back to this one Metro-E device in Jo'burg.

It took everyone nearly an hour to figure out the typo, because for all the laser focus we had on the supposed link 
of the supposed box that was creating this problem, we all overlooked the fact that the /12 configured on the 
point-to-point link was actually supposed to have been a /126.

This never caused a service problem because we do not redistribute our IGP into BGP (not that anyone 
should). And even if we did, there are a ton of filters and BGP communities on all devices to ensure a route such as 
that would never have made it out of our AS.
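
The kind of export check described above can be sketched roughly as follows. This is not the actual policy: the allocation 2c0f:db8::/32, the 48-bit length limit, and the function name may_export are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical values; the real allocations and limits are not in the post.
ALLOCATIONS = [ipaddress.ip_network("2c0f:db8::/32")]
MAX_EXPORT_LEN = 48


def may_export(prefix):
    """Allow a prefix only if it sits inside an allocation and is not too long."""
    net = ipaddress.ip_network(prefix)
    return any(
        net.subnet_of(block) and net.prefixlen <= MAX_EXPORT_LEN
        for block in ALLOCATIONS
    )


print(may_export("2c00::/12"))        # False: covers far more than the allocation
print(may_export("2c0f:db8::/32"))    # True: the allocation itself
print(may_export("2c0f:db8::/126"))   # False: point-to-point links stay internal
```

Under these assumptions a stray /12 fails the first test outright, because it is a supernet of the allocation rather than a subnet of it.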

Also, the IGP contains the most specific paths to every node in our network, so the presence of the 2c00::/12 was 
mostly cosmetic. It would never have been used for routing decisions.
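
A small sketch of longest-prefix matching shows why the more specific entries win. The table entries other than 2c00::/12 are hypothetical, and longest_prefix_match is an illustrative helper, not any router's implementation.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return the most specific network in table that contains destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [net for net in table if dest in net]
    return max(candidates, key=lambda net: net.prefixlen, default=None)


# Hypothetical IGP table: the stray /12 plus two more specific entries.
table = [
    ipaddress.ip_network(p)
    for p in ("2c00::/12", "2c0f:db8:0:1::/64", "2c0f:db8::a/128")
]

print(longest_prefix_match(table, "2c0f:db8::a"))      # 2c0f:db8::a/128
print(longest_prefix_match(table, "2c0f:db8:0:1::5"))  # 2c0f:db8:0:1::/64
print(longest_prefix_match(table, "2c0f:ffff::1"))     # 2c00::/12
```

In this sketch the /12 is chosen only for a destination that has no more specific entry, which matches the point that internal nodes were all reached via their own, longer prefixes.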

Mark.
