nanog mailing list archives

Re: United Airlines is Down (!) due to network connectivity problems


From: Matthew Huff <mhuff () ox com>
Date: Wed, 8 Jul 2015 19:02:01 +0000

Traders on the floor are being told that it’s a software glitch from new software that was rolled out Tuesday night. 
Nothing official has been said.  The only thing I know for sure is that if the NYSE was hacked, they wouldn’t tell 
anyone the details for a long time, if ever.

The impact of the NYSE being down is much less significant than it used to be since most stocks are multiple-listed on 
other exchanges.

The lack of information through official channels is unusual though. In previous situations, there has been at least a 
little hand-holding. So far, nada. In fact, other than financial service providers' emails, there have been no emails so 
far today from the NYSE, including the announcement of resumption of service. According to the NYSE web page, trading 
will resume at 3:05pm EST today with primary specialist, and 3:10 for everyone.




On Jul 8, 2015, at 2:33 PM, Brett Frankenberger <rbf+nanog () panix com> wrote:

On Wed, Jul 08, 2015 at 01:55:43PM -0400, Valdis.Kletnieks () vt edu wrote:
On Wed, 08 Jul 2015 17:42:52 -0000, Matthew Huff said:

Given that the technical resources at the NYSE are significant and
the lengthy duration of the outage, I believe this is more serious
than is being reported.

My personal, totally zero-info suspicion:

Some chuckleheaded NOC banana-eater made a typo, and discovered an
entirely new class of wondrous BGP-wedgie style "We know how we got
here, but how do we get back?" network misbehaviors....

We don't know how long the underlying problem lasted, and how much of
the continued outage time is dealing with the logistics of restarting
trading mid-day.  Completely stopping and then restarting trading
mid-day is likely not a quick process even if the underlying technical
issue is immediately resolved.

(Such things have happened before - like the med school a few years ago that
extended their ethernet spanning tree one hop too far, and discovered that
merely removing the one hop too far wasn't sufficient to let it come back up...)

No, but picking a bridge in the center, giving it priority sufficient
for it to become root, and then configuring timers[1] that would
support a much larger than default diameter, possibly followed by some
reboots, probably would have.  

From what has been publicly stated, they likely took a much longer and
more complicated path to service restoration than was strictly
necessary.  (I have no non-public information on that event.  There may
be good reasons, technical or otherwise, why that wasn't the chosen
solution.)

    -- Brett

[1] You only have to configure them on the root; non-root bridges use
what root sends out, not what they have configured.
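
For a rough idea of what "timers that would support a much larger than
default diameter" means in numbers, here is a minimal sketch. It assumes
the timer formulas commonly quoted in vendor STP tuning guidance
(max_age = 4*hello + 2*diameter - 2, forward_delay = (4*hello +
3*diameter) / 2) and the 802.1D parameter ranges; neither comes from
this thread, and the function name stp_timers is made up for
illustration. Per the footnote above, the resulting values would be set
on the root bridge only.

```python
import math


def stp_timers(diameter: int, hello: int = 2) -> dict:
    """Estimate classic STP timers (in seconds) for a given bridge diameter.

    Assumption: uses the formulas commonly cited in vendor tuning guides,
    not anything stated in this thread.
    """
    if diameter < 2:
        raise ValueError("diameter must be at least 2 bridges")
    if not 1 <= hello <= 10:
        raise ValueError("hello time must be between 1 and 10 seconds")

    max_age = 4 * hello + 2 * diameter - 2
    # Round up so the forward delay is never shorter than the formula asks for.
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)

    # 802.1D limits: max age 6..40 s, forward delay 4..30 s.
    if not 6 <= max_age <= 40 or not 4 <= forward_delay <= 30:
        raise ValueError(
            f"diameter {diameter} needs timers outside the 802.1D ranges"
        )
    return {"hello": hello, "max_age": max_age, "forward_delay": forward_delay}


if __name__ == "__main__":
    for dia in (7, 10, 14):
        print(dia, stp_timers(dia))
```

With the default 2-second hello, a diameter of 7 gives back the familiar
defaults (max age 20, forward delay 15), which is a quick sanity check
on the formulas. Larger diameters buy reach at the cost of slower
reconvergence, so this is a way to size the trade-off, not a
recommendation.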

