Interesting People mailing list archives

"News media deja vu: IBM misquoted again -- From V. Pratt"


From: David Farber <farber () central cis upenn edu>
Date: Sun, 18 Dec 1994 06:24:21 -0500

From: pratt () sunburn stanford edu (Vaughan R. Pratt)


====================================================================
Readers of comp.sys.intel might be so good as to forward this message
to those who should see it but lack the time to cope with the
extraordinary volume of correspondence.  It corrects a seemingly slight
yet significant misrepresentation of the IBM study in the San Jose
Mercury News, which is read by many people in a position to understand
the technicalities of the error but who are in no position to read
comp.sys.intel.
====================================================================


In the business section of Wednesday's New York Times I was quoted as
having found the frequency of errors in the Pentium chip *to be*
significantly higher than what Intel has reported.  I corrected this
misattribution at the time in this forum, noting that *could be* was
the strongest form of this statement I would stand by.  See bug25 on
boole.stanford.edu:/pub/FDIV/individual.bugs for further details.


Today's San Jose Mercury News (12/17, page 13C) repeats this error,
this time for IBM, reporting that "IBM said that the typical
spreadsheet users *would* see an error every 24 days" (my italics).
Had IBM indeed said such a thing, one could understand the contrast
drawn here with Prof. William Kahan's complaint in the same article, "I
wish people would stop trying to estimate probabilities based on
incomplete information about what is a typical user."  I am in full
agreement with the sentiment of this wish.


As with the NYT article, I am quoted in the SJMN article as fully
agreeing with IBM's conclusion, which I certainly do.  I would have no
problem with being linked to IBM in this way were it not in the context
of these repeated misrepresentations of IBM's conclusion.  By way of
clarifying how it could be that I find myself in any agreement at all
with both IBM and Prof. Kahan when the papers appear to be reporting
them at odds with each other, I would like to draw attention once again
to what it was that IBM actually said in their Pentium study,
*especially* their conclusion, which clearly refutes the Mercury's
interpretation.


As I've said a number of times, both in this forum and to reporters, we
do not know where the great bulk of applications lie on the rate
spectrum.  An excellent quote in this context is Hennessy and
Patterson's "Fallacy: there is such a thing as a typical program"
("Computer Architecture" p.183).


The low-rate end of the Pentium's error-rate spectrum has been
estimated by Intel at one error in 9 billion divisions for random
double-precision data, an estimate not in dispute.  (The yet lower rate
of no errors at all is an even more welcome possibility for
applications doing divisions with suitably constrained data, e.g. small
integers.) There is no controversy here; the controversy arises when
Intel makes the considerably stronger statement that this is where one
can expect to find the typical applications.


IBM's study differs from Intel's in the following two key respects.


First, IBM demonstrates that the error-rate spectrum is a lot wider
than contemplated by Intel, by showing the existence of high-rate
scenarios of a rather more plausible kind than repeatedly performing
4195835/3145727.  Intel in contrast does not raise the question of
whether anywhere near such a high rate might be achieved in any
practical situation.
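
The operand pair cited above is commonly checked with a residual test.
The following is a minimal C sketch of that check, not code from either
study.  The names fdiv_residual and fdiv_looks_flawed are hypothetical,
and the sketch assumes IEEE double arithmetic, under which the residual
x - (x/y)*y for this pair is exactly zero.  A nonzero residual (widely
reported as 256 on affected chips) points to a flawed divider.

```c
#include <assert.h>

/* Hypothetical helper: residual x - (x/y)*y.  The volatile temporaries
   force each step to be rounded to double at run time and keep the
   compiler from folding or fusing the operations. */
double fdiv_residual(double x, double y)
{
    assert(y != 0.0);
    volatile double q = x / y;   /* the division under test */
    volatile double p = q * y;   /* multiply back */
    return x - p;
}

/* Returns 1 if the well-known operand pair leaves a nonzero residual,
   0 otherwise. */
int fdiv_looks_flawed(void)
{
    volatile double x = 4195835.0, y = 3145727.0;
    return fdiv_residual(x, y) != 0.0;
}
```

On a correctly dividing processor fdiv_looks_flawed() returns 0.  A
single pair like this only shows that the high-rate end of the spectrum
exists; it says nothing about how often other operands are affected.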


Second, IBM makes no claim as to where along this spectrum any given
application might happen to reside.  Intel in contrast takes a definite
stand on the low probability of the typical spreadsheet user
encountering the bug once, not just in his own lifetime but in several
hundred lifetimes.


The closest IBM comes to a claim about actually experiencing any given
rate is their statement that a user *could* make a mistake every
24 days.  Both this statement and its context make clear that it is a
purely hypothetical statement based on clearly stated assumptions
which, when met, fully and rigorously justify the claimed rate.
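
The arithmetic behind any such "one error every N days" figure is a
single division, which is why the assumptions carry all the weight.
The sketch below is illustrative only: the function names are
hypothetical, and the workload figures in the note that follows are
made-up inputs, not the assumptions used by IBM or Intel.

```c
#include <assert.h>

/* Expected days between errors, given an error rate of one in
   `one_in_n` divisions and a workload of `divisions_per_day`.
   Both inputs are assumptions supplied by the caller. */
double days_between_errors(double one_in_n, double divisions_per_day)
{
    assert(one_in_n > 0.0 && divisions_per_day > 0.0);
    return one_in_n / divisions_per_day;
}

/* Inverse question: what daily workload would produce one expected
   error every `days` days at the given rate? */
double divisions_per_day_for(double one_in_n, double days)
{
    assert(one_in_n > 0.0 && days > 0.0);
    return one_in_n / days;
}
```

With the random-data rate of one in 9 billion and a purely illustrative
workload of 1000 divisions a day, days_between_errors(9e9, 1000.0)
gives 9 million days.  At that same rate it would take 375 million
divisions a day to expect an error every 24 days.  Change either the
rate or the workload and the answer moves in direct proportion.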


No suggestion is made that these assumptions are actually satisfied in
any existing situations.  Readers must draw their own conclusions about
actual likelihoods of encountering such scenarios.  Evidently the
Mercury article has drawn its own conclusion, whether or not warranted,
and then drawn the additional unwarranted conclusion that IBM must have
done the same.


Even had IBM's reference to 24 days inadvertently led some readers
astray, the conclusion of the study should dispel all doubt as to the
basic message of the report.  The final two sentences of the brief
conclusion read as follows.  "In reality, a user could face either
significantly more errors or no errors at all.  If an error occurs, it
will first appear in the fifth or higher significant digit and it may
have no effect or it may have catastrophic effects."


Nowhere in its study does IBM make a claim substantially exceeding this
clear summary.


==========


A possible small objection to IBM's study is that it does not explore
very far how its results depend on some of the assumptions made in its
hypothetical scenarios, such as the assumption in some examples
that operands have two decimal digits.  While this is a reasonable
assumption for monetary amounts represented in floating point, one
would like some sense of whether this is an extreme case or whether
other precisions are equally afflicted.


A possible objection to my own studies of small bruised integers is
that it is not clear to the typical spreadsheet user what inference to
draw for typical calculations.  When one knows the incidence of both
integers and bruising in one's spreadsheet it is straightforward to
take my raw error rates for a steady stream of small bruised integers
and dilute them according to the proportion of small bruised integers
encountered in one's own daily use.  But thanks to roundoff when
displaying data to only 6 or 8 decimal digits, bruising tends to
disappear like Mr. Snuffleupagus or an electron when you try to look
directly at it.  Hence most users remain blissfully unaware of the
extent to which their data is subject to bruising.  The typical user's
reaction to the observation that the difference between 4.1 and 1.1 is
not exactly 3 but a slightly bruised 3 is one of incredulity; it took
me half an hour to convince a Byte reporter of the reality of this
phenomenon (we tracked every bit).  This makes the *practical*
significance of my tables quite unclear to most spreadsheet users.
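
The bruising of 4.1 - 1.1 can be made visible with a few lines of C.
This is a minimal sketch assuming IEEE 754 double arithmetic; the names
bruise and runtime_difference are hypothetical and are not taken from
the tables discussed here.

```c
#include <assert.h>

/* Hypothetical helper: signed distance of v from the nearest integer.
   Zero means v is an exact integer; a tiny nonzero value means v is a
   "bruised" integer.  Intended for |v| well below 2^53. */
double bruise(double v)
{
    assert(v > -1e15 && v < 1e15);
    long long nearest = (long long)(v >= 0.0 ? v + 0.5 : v - 0.5);
    return v - (double)nearest;
}

/* Difference computed at run time through volatile operands, so the
   subtraction is really performed in double precision. */
double runtime_difference(double a, double b)
{
    volatile double x = a, y = b;
    return x - y;
}
```

With IEEE doubles, runtime_difference(4.1, 1.1) is 3 minus 2^-51.
Printed with printf("%.17g") it shows as 2.9999999999999996, while a
display limited to 6 or 8 digits rounds it to 3, which is why the
bruise goes unnoticed.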


To address these objections I plan to write a short program based on
relatively unobjectionable features of these studies to permit people
to determine for themselves the rate, seriousness, and cumulative
effect of errors for the evaluation of simple expressions involving
division.  Unlike my study, there is no subtraction of small quantities
to force integers to be bruised, and unlike the IBM study the
experimenter gets to choose for herself the distribution of operands by
both size and precision.  Bruising still happens, but now only to the
same extent as in routine spreadsheet calculations, as a result of
arithmetic done with operands prior to their arrival at the division
operation.  My program limits this arithmetic to a single addition of
nonnegative reals, a particularly common operation performed throughout
typical spreadsheets.  The omission of both subtraction and negative
operands removes any concern about the possibility of mistaking the
implicit numerical instability of x/(ya-yb) for a large Pentium error
(and besides, what substance is there in negative numbers?:).


I will release the C program as source; it will be quite short and
hopefully reasonably readable, which should leave no doubt as to what
is going on behind the scenes when running experiments.


The limitation to a single addition for each operand of division
implies that the expression to be analyzed is of the form
(xa+xb)/(ya+yb) with xb and yb optional.  This severe restriction of
the language greatly increases the chances of sampling a fair
cross-section of the available expressions; when one complicates the
language with more constructs, the resulting combinatorial explosion in
variety of expressions makes it far harder to tell whether any given
set of expressions is a fair cross-section, and the results of any
feasible number of experiments then have a less clear significance.
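
To make the shape of such an experiment concrete, here is a rough C
sketch.  It is not the program described in this message: every name
in it (lcg_next, random_operand, eval_expr, count_suspect_quotients),
the random number generator, and the residual-based check are
assumptions chosen for illustration.  Operand size and precision are
set separately for numerator and denominator.

```c
#include <assert.h>
#include <stdint.h>

/* Small 64-bit linear congruential generator, used only so that runs
   are reproducible from a seed (an illustrative choice). */
static uint64_t lcg_next(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state >> 33;            /* keep the top 31 bits */
}

/* Random nonnegative operand with at most int_digits digits before
   the decimal point and frac_digits after it.  Requires between 1
   and 9 digits in total. */
double random_operand(uint64_t *state, int int_digits, int frac_digits)
{
    assert(int_digits >= 0 && frac_digits >= 0);
    assert(int_digits + frac_digits >= 1 && int_digits + frac_digits <= 9);
    uint64_t range = 1;
    double scale = 1.0;
    for (int i = 0; i < int_digits + frac_digits; i++) range *= 10;
    for (int i = 0; i < frac_digits; i++) scale *= 10.0;
    return (double)(lcg_next(state) % range) / scale;
}

/* The restricted expression form: one addition per division operand. */
double eval_expr(double xa, double xb, double ya, double yb)
{
    return (xa + xb) / (ya + yb);
}

/* Runs `trials` random expressions and counts quotients q whose
   residual n - q*d exceeds rel_tol * n in magnitude. */
long count_suspect_quotients(uint64_t seed, long trials,
                             int x_int, int x_frac,
                             int y_int, int y_frac, double rel_tol)
{
    uint64_t state = seed;
    long suspects = 0;
    for (long t = 0; t < trials; t++) {
        double xa = random_operand(&state, x_int, x_frac);
        double xb = random_operand(&state, x_int, x_frac);
        double ya = random_operand(&state, y_int, y_frac);
        double yb = random_operand(&state, y_int, y_frac);
        double n = xa + xb, d = ya + yb;
        if (d == 0.0) continue;     /* skip undefined quotients */
        double q = eval_expr(xa, xb, ya, yb);
        double residual = n - q * d;
        if (residual < 0.0) residual = -residual;
        if (residual > rel_tol * n) suspects++;
    }
    return suspects;
}
```

On a correctly dividing processor the count should be 0.  A tolerance
such as 1e-12 sits far above double-precision roundoff and far below an
error in the fifth significant digit.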


I will include with the program the results of a few experiments
illustrating the full range of error rates, from no errors at all (an
even lower rate than Intel's estimate) to rates higher than Intel's,
all obtainable merely by changing sizes and precisions independently
for the numerator operands and the denominator operands.  The rate
claimed by Intel for typical spreadsheet users is achieved for data
with generous limits on both size and precision of input data.


Besides the overall error rates and cumulative and average magnitudes,
the program will also report, for the much smaller population
consisting of those divisions having any nontrivial error (e.g. at
least one cent in a financial transaction), the minimum, mean, and
maximum error magnitudes, together with their standard deviation.  This
data tells the complaints office whether to expect a large number of
trivial complaints or a much smaller number of serious injuries.  In
the former case a complaints office is neither a worthwhile return on
the investment nor urgently needed; in the latter however it would be
market suicide not to set up an efficient complaints office geared
towards the most likely complaints.
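
The summary over nontrivial errors could be computed along the
following lines.  This is a sketch under stated assumptions rather
than the program's actual code: the struct and function names are
hypothetical, "nontrivial" is taken to mean a magnitude at or above a
caller-supplied threshold, and the standard deviation is the population
form (dividing by the count).

```c
#include <assert.h>
#include <stddef.h>

struct error_summary {
    size_t count;               /* number of nontrivial errors */
    double min, mean, max;      /* of their magnitudes */
    double stddev;              /* population standard deviation */
};

/* Square root by Newton's iteration, so the sketch needs no math
   library at link time.  Starts at or above the root and descends. */
static double newton_sqrt(double a)
{
    if (a <= 0.0) return 0.0;
    double x = a > 1.0 ? a : 1.0, prev;
    do {
        prev = x;
        x = 0.5 * (x + a / x);
    } while (x < prev);
    return prev;
}

/* Summarizes the magnitudes of all errors at or above `threshold`. */
struct error_summary summarize_errors(const double *errors, size_t n,
                                      double threshold)
{
    struct error_summary s = {0, 0.0, 0.0, 0.0, 0.0};
    double sum = 0.0, sq = 0.0;
    assert(errors != NULL || n == 0);

    /* First pass: count, min, max, and sum of qualifying magnitudes. */
    for (size_t i = 0; i < n; i++) {
        double m = errors[i] < 0.0 ? -errors[i] : errors[i];
        if (m < threshold) continue;
        if (s.count == 0 || m < s.min) s.min = m;
        if (s.count == 0 || m > s.max) s.max = m;
        sum += m;
        s.count++;
    }
    if (s.count == 0) return s;
    s.mean = sum / (double)s.count;

    /* Second pass: squared deviations from the mean. */
    for (size_t i = 0; i < n; i++) {
        double m = errors[i] < 0.0 ? -errors[i] : errors[i];
        if (m < threshold) continue;
        sq += (m - s.mean) * (m - s.mean);
    }
    s.stddev = newton_sqrt(sq / (double)s.count);
    return s;
}
```

For a financial run the threshold would be 0.01 (one cent).  A large
count with a small maximum suggests many trivial complaints; a small
count with a large maximum suggests the rarer serious injuries.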

--
Vaughan Pratt           http://boole.stanford.edu/boole.html
                        My every word is copyright, especially those
                        not in the dictionary.


