nanog mailing list archives

Re: Level3 worldwide emergency upgrade?


From: Dorian Kim <dorian () blackrose org>
Date: Thu, 7 Feb 2013 17:12:14 -0500

No one had hit the IS-IS bug before the IETF-enforced maintenance freeze because no one in their right mind would be 
running three-week-old code back then. I don't think things have changed that much. ;)

-dorian

On Feb 7, 2013, at 4:19 PM, Siegel, David wrote:

I remember being glued to my workstation for 10 straight hours due to an OSPF bug that took down the whole of net99's 
network.

I was pretty proud of our size at the time...about 30Mbps at peak.  Times are different and so are expectations.  :-)

Dave


-----Original Message-----
From: Brett Watson [mailto:brett () the-watsons org] 
Sent: Wednesday, February 06, 2013 6:07 PM
To: nanog () nanog org
Subject: Re: Level3 worldwide emergency upgrade?

Hell, we used to not have to bother notifying customers of anything; we just fixed the problem. Reminds me of a 
story I've probably shared in the past. 

1995, IETF in Dallas. The "big ISP" I worked for at the time got tripped up on a 24-day IS-IS timer bug (maybe all of 
them at the time did, I don't recall) where all adjacencies reset at once. That's like, entire network down. Working 
with our engineering team in the *terminal* lab, mind you, and with Ravi Chandra (then at Cisco), we reloaded the entire 
network of routers with new code from Cisco once they'd fixed the bug. I seem to remember this being my first 
exposure to Tony Li's infamous line, "... Confidence Level: boots in the lab."

Good times.

-b
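
The thread never says exactly what the 1995 defect was, but the "24-day" figure does line up with a signed 32-bit
millisecond uptime counter, which wraps at 2^31 ms, roughly 24.86 days. Purely as a hypothetical sketch of that class
of bug (not the actual IOS code, and every name and value below is made up for illustration), the C program below
shows how a scheduler that compares absolute timestamps stops considering hello timers due once the counter wraps
negative, so every adjacency times out within one hold-time of the same instant; the wrap-safe variant subtracts
first and compares the difference instead.

/*
 * Hypothetical sketch of a ~24.8-day timer-wrap failure -- NOT the real
 * 1995 defect, just an illustration of the class of bug described above.
 * A signed 32-bit millisecond uptime counter wraps at 2^31 ms (~24.86
 * days).  A check of the form "now >= deadline" then stays false forever,
 * hellos stop being sent, and all adjacencies drop at about the same time.
 */
#include <stdio.h>
#include <stdint.h>

/* Buggy: absolute-timestamp comparison breaks when `now` wraps negative. */
static int hello_due_buggy(int32_t now, int32_t deadline)
{
    return now >= deadline;
}

/* Wrap-safe: compute the difference in modular (unsigned) arithmetic and
 * compare the signed difference against zero. */
static int hello_due_safe(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

int main(void)
{
    /* Just short of 2^31 ms of uptime, with a hello deadline already due. */
    int32_t now      = INT32_MAX - 500;   /* ~24.86 days of uptime */
    int32_t deadline = now;               /* hello is due right now */

    printf("before wrap: buggy=%d safe=%d\n",
           hello_due_buggy(now, deadline),
           hello_due_safe((uint32_t)now, (uint32_t)deadline));

    /* One more second of uptime: the signed counter wraps negative
     * (implementation-defined conversion; wraps on two's-complement targets). */
    now = (int32_t)((uint32_t)now + 1000);

    /* The buggy check now reports "not due" forever, so hellos stop and
     * every neighbor tears its adjacency down within one hold-time. */
    printf("after wrap:  buggy=%d safe=%d\n",
           hello_due_buggy(now, deadline),
           hello_due_safe((uint32_t)now, (uint32_t)deadline));
    return 0;
}

The usual fix is exactly the second form: always subtract the timestamps first and compare the signed difference
against zero, never compare raw counter values directly.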


On Feb 6, 2013, at 5:41 PM, Brandt, Ralph wrote:

David, I am on an evening shift and am just now reading this thread.

I was almost tempted to write an explanation that would have had 
identical content to yours, based simply on Level3 doing something 
and keeping the information close.

Responsible vendors do not try to hide what is being done unless it is 
an OpSec issue, and I have never seen Level3 act with anything less than 
responsibility, so it had to be OpSec.

When that is the case, it is best if the rest of us sit quietly on the 
sidelines.

Ralph Brandt


-----Original Message-----
From: Siegel, David [mailto:David.Siegel () Level3 com]
Sent: Wednesday, February 06, 2013 12:01 PM
To: 'Ray Wong'; nanog () nanog org
Subject: RE: Level3 worldwide emergency upgrade?

Hi Ray,

This topic reminds me of yesterday's discussion at the conference 
around getting some BCOPs drafted.  It would be useful to confirm my 
own view of the BCOP around communicating security issues.  My 
understanding of the best practice is to limit knowledge distribution 
of security-related problems both before and after the patches are 
deployed.  You limit knowledge before the patch is deployed to prevent 
yourself from being exploited, but you also limit knowledge afterwards 
in order to limit potential damage to others (customers, 
competitors...the Internet at large).  You also do not want to 
announce that you will be deploying a security patch until you have a 
fix in hand and know when you will deploy it (typically the next 
available maintenance window, unless the cat is out of the bag and the danger is real and imminent).

As a service provider, you should stay on top of security alerts from 
your vendors so that you can make your own decision about what action 
is required.  I would not recommend relying on service provider 
maintenance bulletins or public operations mailing lists for obtaining 
this type of information.  There is some information that can cause 
more harm than good if it is distributed in the wrong way, and 
information relating to security vulnerabilities definitely falls into that category.

Dave

-----Original Message-----
From: Ray Wong [mailto:rayw () rayw net]
Sent: Wednesday, February 06, 2013 9:16 AM
To: nanog () nanog org
Subject: Re: Level3 worldwide emergency upgrade?



OK, having had that first cup of coffee, I can say perhaps the main 
reason I was wondering is that I've gotten used to Level3 always being on 
top of things (and admittedly, rarely communicating). They've reached 
the top by often being a black box of reliability, so it's (perhaps 
unrealistically) surprising to see them caught by surprise. Anything 
that pushes them into scramble mode causes me to lose a little sleep 
anyway. The alternative to what they did seems likely for at least a 
few providers who will NOT manage to fix things in time, so I may well 
be looking at longer outages from other providers, and I need to issue 
guidance to others on what to do if/when other links go down for 
periods long enough that all the cost-bounding monitoring alarms start 
to scream even louder.

I was also grumpy at myself for not having noticed advance 
communication, which I still don't seem to have, though since I 
outsourced my email to bigG, I've noticed I'm more likely to miss 
things. Perhaps giving up maintaining that massive set of procmail 
rules has cost me a bit of my edge.

Related, of course: just because you design/run your network to 
tolerate some issues doesn't mean you can also budget to keep 
everything under support contract. :) Knowing more about the exploit/fix 
might mean trying to find a way to get free upgrades to some kit to 
prevent more localized attacks on other types of gear as well, though 
in this case it's all about Juniper PR839412, so it's vendor-specific, it seems?

There are probably more reasons to wish for more info, too. There are 
still more of them (exploiters/attackers) than there are of us 
trying to keep things running smoothly and transparently, so anything 
that smells of "OMG new exploit found!" also triggers my desire to 
share information. The network bad guys share information far more 
quickly and effectively than we do, it often seems.

-R>








