nanog mailing list archives

Re: Never push the Big Red Button (New York City subway failure)


From: Tom Beecher <beecher () beecher cc>
Date: Wed, 15 Sep 2021 14:35:43 -0400


If the generators are "emergency power", and you need to switch back to
"utility power", obviously the way to do this must be the big red button,
clearly marked as "EMERGENCY POWER OFF", no?!

The owner of my previous company did the same thing to us many years ago
because there was a small smudge on the placard between POWER and OFF that
he interpreted as a dash.

He was never happy with the custom sign I hung afterward: REVENUE
REDUCTION SWITCH. But he never tried to be helpful again, so
mission accomplished.

On Fri, Sep 10, 2021 at 4:35 PM Warren Kumari <warren () kumari net> wrote:

On Fri, Sep 10, 2021 at 4:21 PM Baldur Norddahl <baldur.norddahl () gmail com>
wrote:

A nearby datacenter once lost power on a delay because someone hit the
switch to transfer from city power to generator power and then failed to
notice. The power went out the next day, when there was no fuel left.

:-)

A story, told to me by a friend...

The utility let them know that it was going to be doing some
maintenance work in the area. No impact was expected, but out of an abundance
of caution, they transferred over to generators. After the utility let them
know that the maintenance work was all finished, they wanted to switch back.
If the generators are "emergency power", and you need to switch back to
"utility power", obviously the way to do this must be the big red button,
clearly marked as "EMERGENCY POWER OFF", no?!

I suspect it is apocryphal, but it's still entertaining,
W

On Fri, Sep 10, 2021 at 9:24 PM Matthew Huff <mhuff () ox com> wrote:

Since we are telling power horror stories…

How about the call from the night operator that arrived at 10:00pm
asking “Is there any reason there is no power in the data center?”

Turns out someone had plugged a new high-end workgroup laser printer
into an outlet on the outside wall of the datacenter. That receptacle was wired
into the data center’s UPS, and the printer completely smoked the UPS. Luckily
the static transfer switch worked, but the three mainframes weren’t happy…

Or

Our building had a major ground fault issue that took years to find and
resolve. We got hit with lightning that caused the mainframe to fault and
recycle…and two minutes in, we got hit by lightning again. When the system
failed to start, we called IBM support. When we explained what happened,
there was a very long pause…then some mumbling off the phone, then the manager
got on the line and said someone would be flying out and would be onsite within
12 hours. We were down for 3 days and got fined $250,000 by the insurance
regulators because we couldn’t pay claims.

*Matthew Huff* | Director of Technical Operations | OTA Management LLC

*Office: 914-460-4039*

*mhuff () ox com | www.ox.com*

*...........................................................................................................................................*

*From:* Chris Kane <ccie14430 () gmail com>
*Sent:* Friday, September 10, 2021 3:16 PM
*To:* Christopher Morrow <morrowc.lists () gmail com>
*Cc:* Matthew Huff <mhuff () ox com>; nanog () nanog org
*Subject:* Re: Never push the Big Red Button (New York City subway failure)

True EPO story: a maintenance crew carrying new drywall into the data
center backed into the EPO button, which didn't have a cover on it. One of
the most eerie sounds in networking... a completely silent data center.

-chris

On Fri, Sep 10, 2021 at 2:48 PM Christopher Morrow <
morrowc.lists () gmail com> wrote:

On Fri, Sep 10, 2021 at 1:49 PM Matthew Huff <mhuff () ox com> wrote:

Reminds me of something that happened about 25 years ago when an
elementary school class visited the data center of the insurance company where I
worked. One of our operators strategically positioned himself between the
kids and the mainframe, leaned back, and hit its EPO button.

Or when your building engineering team cuts themselves a new key for the
'main breaker' for the facility... and tests it at 2pm on a Tuesday.

Or when that same team cuts a second key (gotta have 2 keys!) and tests
that key on the same 'main breaker' ... at 2pm on the following Tuesday.

<quadruple face palm>

not fakenews, a real story from a large building full of gov't employees
and computers and all manner of 'critical infrastructure' for the agency
occupying said building.


-----Original Message-----
From: NANOG <nanog-bounces+mhuff=ox.com () nanog org> On Behalf Of Sean
Donelan
Sent: Friday, September 10, 2021 12:38 PM
To: nanog () nanog org
Subject: Never push the Big Red Button (New York City subway failure)

NEW YORK CITY TRANSIT RAIL CONTROL CENTER POWER
OUTAGE ISSUE ON AUGUST 29, 2021
Key Findings
September 8, 2021

https://www.governor.ny.gov/sites/default/files/2021-09/WSP_Key_Findings_Summary-for_release.pdf

Key Findings
[...]

3. Based on the electrical equipment log readings and the manufacturer’s
official assessment, it was determined that the most likely cause of RCC
shutdown was the “Emergency Power Off” button being manually activated.

Secondary Findings

1. The “Emergency Power Off” button did not have a protective cover at
the time of the shutdown or the following WSP investigation.

[...]
Mitigation Steps

1. Set up the electrical equipment Control and Communication systems
properly to stay active so that personnel can monitor RCC electrical
system operations.

[...]

--

Chris Kane

--
The computing scientist’s main challenge is not to get confused by the
complexities of his own making.
  -- E. W. Dijkstra

