Interesting People mailing list archives

more on United computer outage


From: David Farber <dave () farber net>
Date: Thu, 5 Jan 2006 16:37:03 -0500

Hell, we did better in the 50s with ESS telephone systems.

Begin forwarded message:

From: John Levine <johnl () iecc com>
Date: January 5, 2006 4:21:57 PM EST
To: dave () farber net
Subject: Re: [IP] more on United computer outage

A "processor" failure??!! djf

Yup, almost certainly processor as in CPU.

Airline systems like Galileo still run on tight clusters of IBM
mainframes.  These are basically database engines with phenomenal
transaction rates.  While it's not hard to do distributed searches in
parallel, updates are limited by locking, which works worse the more
computers you have contending for the locks.  So the core systems are
clusters of a few mainframes, each with a couple of dozen CPUs and
shared memory, cranking away on the transactions.
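
To make the search-versus-update asymmetry concrete, here is a toy
model, not a measurement of any real system.  The function names and
the linear lock_cost term are assumptions chosen only for
illustration: searches split evenly across nodes, while each update
pays a coordination cost that grows with the number of other nodes
contending for the lock.

```python
def search_time(work_items, nodes, per_item=1.0):
    """Time for a search that partitions cleanly across nodes."""
    # Each node scans only its own share, so time shrinks as nodes grow.
    return per_item * work_items / nodes


def update_time(updates, nodes, per_update=1.0, lock_cost=0.1):
    """Time for updates that must serialize on a shared lock."""
    # Assumption: every update pays an extra coordination cost for each
    # other node that may be contending for the same lock.
    return updates * (per_update + lock_cost * (nodes - 1))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, search_time(1000, n), update_time(1000, n))
```

Under these assumed costs, adding nodes keeps speeding up searches
while making updates slower, which is the argument for keeping the
core to a few tightly coupled machines.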

Modern mainframes are designed to be very, very reliable.  The CPUs
come in groups of maybe 16, with at least two of the 16 reserved as
spares, and extensive hardware checking so that if a CPU fails, one of
the spares takes over immediately.  They have facilities for doing hot
add and remove of equipment which work well enough that the system
uptime is measured in years.  It sounds to me like one of the CPUs
wedged in some way that the recovery hardware couldn't deal with, and
if the system is wedged, it's down.  This is a big embarrassment for
IBM since the main selling point for million-dollar mainframes is
reliability.
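
The sparing idea can be sketched in a few lines.  This is only an
illustration of the bookkeeping, assuming a group of 16 CPUs with 2
spares as in the rough figures above; the CpuGroup class is
hypothetical and says nothing about how the real hardware detects
faults or moves work.

```python
class CpuGroup:
    """Toy model of a CPU group with reserved spares (hypothetical)."""

    def __init__(self, total=16, spares=2):
        # CPUs 0..(total - spares - 1) start active; the rest are spares.
        self.active = set(range(total - spares))
        self.spares = list(range(total - spares, total))

    def fail(self, cpu):
        """Retire a failed active CPU.

        Returns the spare that took over, or None if no spare is left.
        """
        self.active.remove(cpu)
        if not self.spares:
            return None
        replacement = self.spares.pop(0)
        self.active.add(replacement)
        return replacement


if __name__ == "__main__":
    group = CpuGroup()
    print(group.fail(3))   # first spare steps in
    print(group.fail(0))   # second spare steps in
    print(group.fail(1))   # no spares left
```

The model covers only clean failures that the group notices.  A CPU
that wedges without being detected never reaches fail() at all, which
is the kind of case that could take the whole system down.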

I'll be interested to hear what reports, if any, we get about what the
problem was.

R's,
John


