nanog mailing list archives

Re: Centurylink having a bad morning?


From: Ben Cannon <ben () 6by7 net>
Date: Mon, 31 Aug 2020 13:41:31 -0700

We’re bailing out a customer in exactly this same boat as we speak.  There are so many.

Ms. Benjamin PD Cannon, ASCE
6x7 Networks & 6x7 Telecom, LLC 
CEO 
ben () 6by7 net
"The only fully end-to-end encrypted global telecommunications company in the world.”

FCC License KJ6FJJ



On Aug 31, 2020, at 12:52 PM, Eric Kuhnke <eric.kuhnke () gmail com> wrote:


There are a number of enterprise end-user customers of 3356 that have on-premises server rooms/hosting for their 
stuff. And they spend a lot of money every month for a 'redundant' metro Ethernet circuit that takes diverse fiber 
paths from their business park office building to the local clink/level3 POP. But all that last-mile redundancy and 
failover capability doesn't do much for them when 3356 breaks its network at the BGP level.



On Mon, Aug 31, 2020 at 9:36 AM Drew Weaver <drew.weaver () thenap com> wrote:
I also found the part where they mention that a lot of hosting companies only have one uplink to be quizzical, and 
the fact that he comes pretty close to implying that it's Centurylink’s customers' fault for not having multiple 
paths to Cloudflare that don’t touch Centurylink a bit puzzling. It could have just been poorly written.

From: NANOG <nanog-bounces+drew.weaver=thenap.com () nanog org> On Behalf Of Tom Beecher
Sent: Monday, August 31, 2020 9:26 AM
To: Hank Nussbacher <hank () interall co il>
Cc: NANOG <nanog () nanog org>
Subject: Re: Centurylink having a bad morning?

 

https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/

 

I definitely found Mr. Prince's writing about yesterday's events fascinating.

 

Verizon makes a mistake with BGP filters that allows a secondary mistake from leaked "optimizer" routes to 
propagate, and Mr. Prince takes every opportunity to lob large chunks of granite about how terrible they are. 

 

L3 allows an erroneous flowspec announcement to cause massive global connectivity issues, and Mr. Prince shrugs and 
says "Incidents happen." 

On Mon, Aug 31, 2020 at 1:15 AM Hank Nussbacher <hank () interall co il> wrote:

On 30/08/2020 20:08, Baldur Norddahl wrote:

 

https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/

 

Sounds like Flowspec blocking tcp/179 might be the cause.

 

But that is Cloudflare speculation.

 

Regards,
Hank

Caveat: The views expressed above are solely my own and do not express the views or opinions of my employer

 

An outage is what it is. I am not worried about outages. We have multiple transits to deal with that.

 

It is the continued announcement of prefixes after peers and customers have withdrawn them that is the huge problem 
here. That is killing all the effort and money I put into having redundancy. It is sabotage of my network after I 
cut the ties. I do not want to be a customer of an outfit whose system will do that. Luckily we do not currently have 
a contract, and now they will have to convince me it is safe for me to make a contract with them. If that is 
impossible, I guess I won't be getting a contract with them.

 

But I disagree that it would be impossible. They need to write a good report explaining exactly what went wrong and 
how they changed the design so that something like this cannot happen again. The basic design of BGP is such that this 
should not happen easily if at all. They did something unwise. Did they make a route reflector based on a database 
or something?

 

Regards,

 

Baldur

 

On Sun, Aug 30, 2020 at 5:13 PM Mike Bolitho <mikebolitho () gmail com> wrote:

Exactly. And asking that they somehow prove this won't happen again is impossible.

- Mike Bolitho

 

On Sun, Aug 30, 2020, 8:10 AM Drew Weaver <drew.weaver () thenap com> wrote:

I’m not defending them but I am sure it isn’t intentional.

 

From: NANOG <nanog-bounces+drew.weaver=thenap.com () nanog org> On Behalf Of Baldur Norddahl
Sent: Sunday, August 30, 2020 9:28 AM
To: nanog () nanog org
Subject: Re: Centurylink having a bad morning?

 

How is that acceptable behaviour? I shall remember never to make a contract with these guys until they can prove 
that they won't advertise my prefixes after I pull them. Under any circumstances. 

 

On Sun, Aug 30, 2020, 15:14, Joseph Jenkins <joe () breathe-underwater com> wrote:

Finally got through on their support line and spoke to Level 1. The only thing the tech could say was that it was an 
issue with BGP route reflectors and that it started at about 3 a.m. (Pacific). They were still trying to isolate the 
issue. I've tried failing over my circuits, but no go: the traffic just dies, as L3 won't stop advertising my routes.

 

On Sun, Aug 30, 2020 at 5:21 AM Drew Weaver via NANOG <nanog () nanog org> wrote:

Hello,

 

Woke up this morning to a bunch of reports of connectivity issues; had to shut down some Level3/CTL connections 
to get things back to normal.

 

As of right now their support portal won’t load: https://www.centurylink.com/business/login/

 

Just wondering what others are seeing.
