nanog mailing list archives
Re: Centurylink having a bad morning?
From: Tom Beecher <beecher () beecher cc>
Date: Mon, 31 Aug 2020 17:24:37 -0400
In this specific event, 3356 not withdrawing routes is certainly a head scratcher, and I'm sure for many it's the thing we're most looking forward to a definitive answer on.

However, if a network only has 3356 as their upstream, they are 100% at the mercy of 3356 at all times. Having a redundant AND diverse connection to a 2nd upstream ASN at least provides you some options.

In this case, for example, let's say at all times you did a +2 prepend to both 3356 and Acme. The 3356 event happens, and you shut down your session to them. Some percentage of your traffic that would have been faceplanting in/through 3356 now works via Acme. Then you notice the non-withdrawal issue. You can then remove 1 prepend toward Acme, or perhaps deagg strategically, to try and get more traffic away from the trouble.

A redundant path to a different upstream at least gives you some potential options to work around a problem like that, options you otherwise would not have. It wouldn't be perfect, but options > no options.

On Mon, Aug 31, 2020 at 5:08 PM Warren Kumari <warren () kumari net> wrote:
On Mon, Aug 31, 2020 at 4:36 PM Tom Beecher <beecher () beecher cc> wrote:

Hopefully those customers learned the difference between redundancy and diversity this weekend. :)

I'm unclear how either solves things for many customers... If they had CenturyLink and AcmeNetworkWidgets, and announce the same network through both -- and their connection to CL went down, *but CL continues to announce / doesn't withdraw* they are still stuck, yes? (Unless they can deaggregate that is...)

What am I missing?

W

On Mon, Aug 31, 2020 at 3:48 PM Eric Kuhnke <eric.kuhnke () gmail com> wrote:

There's a number of enterprise end user type customers of 3356 that have on-premises server rooms/hosting for their stuff. And they spend a lot of money every month for a 'redundant' metro ethernet circuit that takes diverse fiber paths from their business park office building to the local clink/level3 POP. But all that last mile redundancy and fail over ability doesn't do much for them when 3356 breaks its network at the BGP level.

On Mon, Aug 31, 2020 at 9:36 AM Drew Weaver <drew.weaver () thenap com> wrote:

I also found the part where they mention that a lot of hosting companies only have one uplink to be quizzical and also the fact that he goes pretty close to implying that it's Centurylink’s customers' fault for not having multiple paths to Cloudflare that don’t touch Centurylink a bit puzzling. It could have just been poorly written.

From: NANOG <nanog-bounces+drew.weaver=thenap.com () nanog org> On Behalf Of Tom Beecher
Sent: Monday, August 31, 2020 9:26 AM
To: Hank Nussbacher <hank () interall co il>
Cc: NANOG <nanog () nanog org>
Subject: Re: Centurylink having a bad morning?

https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/

I definitely found Mr. Prince's writing about yesterday's events fascinating.

Verizon makes a mistake with BGP filters that allows a secondary mistake from leaked "optimizer" routes to propagate, and Mr. Prince takes every opportunity to lob large chunks of granite about how terrible they are.

L3 allows an erroneous flowspec announcement to cause massive global connectivity issues, and Mr. Prince shrugs and says "Incidents happen."

On Mon, Aug 31, 2020 at 1:15 AM Hank Nussbacher <hank () interall co il> wrote:

On 30/08/2020 20:08, Baldur Norddahl wrote:

https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/

Sounds like Flowspec possibly blocking tcp/179 might be the cause. But that is Cloudflare speculation.

Regards,
Hank

Caveat: The views expressed above are solely my own and do not express the views or opinions of my employer

An outage is what it is. I am not worried about outages. We have multiple transits to deal with that.

It is the keep announcing prefixes after withdrawal from peers and customers that is the huge problem here. That is killing all the effort and money I put into having redundancy. It is sabotage of my network after I cut the ties. I do not want to be a customer at an outlet who has a system that will do that. Luckily we do not currently have a contract and now they will have to convince me it is safe for me to make a contract with them. If that is impossible I guess I won't be getting a contract with them.

But I disagree in that it would be impossible. They need to make a good report telling exactly what went wrong and how they changed the design, so something like this can not happen again. The basic design of BGP is such that this should not happen easily if at all. They did something unwise. Did they make a route reflector based on a database or something?

Regards,
Baldur

On Sun, Aug 30, 2020 at 5:13 PM Mike Bolitho <mikebolitho () gmail com> wrote:

Exactly. And asking that they somehow prove this won't happen again is impossible.

- Mike Bolitho

On Sun, Aug 30, 2020, 8:10 AM Drew Weaver <drew.weaver () thenap com> wrote:

I’m not defending them but I am sure it isn’t intentional.

From: NANOG <nanog-bounces+drew.weaver=thenap.com () nanog org> On Behalf Of Baldur Norddahl
Sent: Sunday, August 30, 2020 9:28 AM
To: nanog () nanog org
Subject: Re: Centurylink having a bad morning?

How is that acceptable behaviour? I shall remember never to make a contract with these guys until they can prove that they won't advertise my prefixes after I pull them. Under any circumstances.

On Sun, Aug 30, 2020 at 15:14, Joseph Jenkins <joe () breathe-underwater com> wrote:

Finally got through on their support line and spoke to level1. The only thing the tech could say was it was an issue with BGP route reflectors and it started about 3am (Pacific). They were still trying to isolate the issue. I've tried failing over my circuits and no go, the traffic just dies as L3 won't stop advertising my routes.

On Sun, Aug 30, 2020 at 5:21 AM Drew Weaver via NANOG <nanog () nanog org> wrote:

Hello,

Woke up this morning to a bunch of reports of issues with connectivity, had to shut down some Level3/CTL connections to get it to return to normal.

As of right now their support portal won’t load:

https://www.centurylink.com/business/login/

Just wondering what others are seeing.

--
I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants.
---maf
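
The two workarounds raised at the top of this message (removing a prepend, or deaggregating) can be illustrated with a toy route-selection model. This is only a sketch under stated assumptions, not how any real router or 3356 behaves: the customer ASN 64500, the second upstream "Acme" as AS 64499, the prefix 2001:db8::/32, and the `Route` and `best_route` names are all hypothetical, and the model compares only prefix length and AS-path length, ignoring local preference, MED, and every other BGP tie-breaker.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # announced prefix, e.g. "2001:db8::/32"
    as_path: tuple   # AS path as seen by the observer, nearest AS first


def best_route(routes, destination):
    """Pick a route the way a much-simplified router might:
    longest matching prefix first, then shortest AS path.
    Local preference, MED and other tie-breakers are deliberately omitted."""
    dest = ipaddress.ip_address(destination)
    matching = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not matching:
        return None
    return min(
        matching,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, len(r.as_path)),
    )


CUSTOMER = 64500  # hypothetical customer ASN (documentation range)
ACME = 64499      # hypothetical second upstream ASN (documentation range)
CTL = 3356        # the upstream that keeps announcing after the session is shut

# Stale route still announced by 3356, carrying the original +2 prepend.
stale_via_3356 = Route("2001:db8::/32", (CTL, CUSTOMER, CUSTOMER, CUSTOMER))

# Option 1: remove one prepend toward Acme so that path is shorter.
acme_one_prepend = Route("2001:db8::/32", (ACME, CUSTOMER, CUSTOMER))

# Option 2: deaggregate toward Acme; more-specifics win on prefix length
# even though the AS path is no shorter than the stale one.
acme_deagg = [
    Route("2001:db8::/33", (ACME, CUSTOMER, CUSTOMER, CUSTOMER)),
    Route("2001:db8:8000::/33", (ACME, CUSTOMER, CUSTOMER, CUSTOMER)),
]

shorter = best_route([stale_via_3356, acme_one_prepend], "2001:db8::1")
specific = best_route([stale_via_3356] + acme_deagg, "2001:db8:8000::1")

print("option 1 first-hop AS:", shorter.as_path[0])
print("option 2 chosen prefix:", specific.prefix)
```

Because the model leaves out local preference, it likely overstates option 1: a network that prefers routes learned from 3356 for policy reasons would keep using the stale route however short the Acme path is, which is why deaggregation comes up in the thread as the stronger lever. Deaggregation in turn only helps if the more-specific prefixes are short enough for other networks to accept.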