Interesting People mailing list archives

Microsoft server crash nearly causes 800-plane pile-up


From: David Farber <dave () farber net>
Date: Mon, 27 Sep 2004 20:00:02 -0400



Begin forwarded message:

From: Bruce R Koball <bkoball () well com>
Date: September 27, 2004 7:33:48 PM EDT
To: David Farber <dave () farber net>
Cc: ip () v2 listbox com
Subject: Microsoft server crash nearly causes 800-plane pile-up

Dave,

This is a week old, but I don't recall seeing it on IP.

-brk-


http://www.techworld.com/opsys/news/index.cfm?NewsID=2275


Microsoft server crash nearly causes 800-plane pile-up
Failure to restart system caused data overload.

By Matthew Broersma, Techworld

A major breakdown in Southern California's air traffic control system
last week was partly due to a "design anomaly" in the way Microsoft
Windows servers were integrated into the system, according to a report
in the Los Angeles Times.

The radio system shutdown, which lasted more than three hours, left 800
planes in the air without contact with air traffic control, and led to at
least five cases where planes came too close to one another, according
to comments by the Federal Aviation Administration reported in the LA
Times and The New York Times. Air traffic controllers were reduced to
using personal mobile phones to pass on warnings to controllers at other
facilities, and watched close calls without being able to alert pilots,
according to the LA Times report.

The failure was ultimately down to a combination of human error and a
design glitch in the Windows servers brought in over the past three
years to replace the radio system's original Unix servers, according to
the FAA.

The servers are timed to shut down after 49.7 days of use in order to
prevent a data overload, a union official told the LA Times. To avoid
this automatic shutdown, technicians are required to restart the system
manually every 30 days. An improperly trained employee failed to reset
the system, leading it to shut down without warning, the official said.
Backup systems failed because of a software failure, according to a
report in The New York Times.
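
The 49.7-day figure quoted above is the time it takes an unsigned 32-bit counter of milliseconds to overflow (2^32 ms is roughly 49.71 days). The article does not say that this is the mechanism behind the timed shutdown, so treat that link as an assumption. The sketch below only checks the arithmetic and shows how much slack the 30-day manual restart leaves; the function names are hypothetical and are not taken from the VSCS software.

```python
# Assumption (not stated in the article): uptime is tracked as an unsigned
# 32-bit count of milliseconds, which wraps to zero once it overflows.

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def days_until_wrap(bits: int = 32) -> float:
    """Days until an unsigned millisecond counter of the given width wraps."""
    return (2 ** bits) / MS_PER_DAY


def restart_margin_days(restart_interval_days: float = 30.0) -> float:
    """Slack between the scheduled manual restart and the counter limit."""
    return days_until_wrap() - restart_interval_days


if __name__ == "__main__":
    print(f"Counter wraps after {days_until_wrap():.1f} days")
    print(f"Margin with a 30-day restart: {restart_margin_days():.1f} days")
```

Under that assumption, a single missed 30-day restart is enough to reach the limit, since the next scheduled restart would fall at day 60, after the roughly 49.7-day mark.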

The contract for designing the system, called Voice Switching and
Control System (VSCS), was awarded to Harris Corporation in 1992 and the
system was installed in the late 1990s, initially using Unix servers,
according to Harris. In 2001, the company completed testing of the VSCS
Control Subsystem Upgrade (VCSU), which replaced the original servers
with off-the-shelf Dell hardware running Microsoft Windows 2000 Advanced
Server. The upgrade was installed in California last year, according to
the FAA.

Soon after installation, however, the FAA discovered that the system
design could lead to a radio system shutdown, and put the maintenance
procedure into place as a workaround, the LA Times said. The FAA
reportedly said it has been working on a permanent fix but has only
eliminated the problem in Seattle. The FAA is now planning to institute
a second workaround - an alert that will warn controllers well before
the software shuts down.

The shutdown is intended to keep the system from becoming overloaded
with data and potentially giving controllers wrong information about
flights, according to a software analyst cited by the LA Times.

Microsoft told Techworld it was aware of the reports but was not
immediately able to comment.
