Interesting People mailing list archives
"News media deja vu: IBM misquoted again -- From V. Platt"
From: David Farber <farber () central cis upenn edu>
Date: Sun, 18 Dec 1994 06:24:21 -0500
From: pratt () sunburn stanford edu (Vaughan R. Pratt)

====================================================================
Readers of comp.sys.intel might be so good as to forward this message to those who should see it but lack the time to cope with the extraordinary volume of correspondence. It corrects a seemingly slight yet significant misrepresentation of the IBM study in the San Jose Mercury News, which is read by many people in a position to understand the technicalities of the error but who are in no position to read comp.sys.intel.
====================================================================

In the business section of Wednesday's New York Times I was quoted as having found the frequency of errors in the Pentium chip *to be* significantly higher than what Intel has reported. I corrected this misattribution at the time in this forum: *could be* was the strongest form of that statement I would stand by. See bug25 on boole.stanford.edu:/pub/FDIV/individual.bugs for further details.

Today's San Jose Mercury News (12/17, page 13C) repeats this error, this time for IBM, reporting that "IBM said that the typical spreadsheet users *would* see an error every 24 days" (my italics). Had IBM indeed said such a thing, one could understand the contrast drawn here with Prof. William Kahan's complaint in the same article: "I wish people would stop trying to estimate probabilities based on incomplete information about what is a typical user." I am in full agreement with the sentiment of this wish.

As with the NYT article, I am quoted in the SJMN article as fully agreeing with IBM's conclusion, which I certainly am. I would have no problem with being linked to IBM in this way were it not in the context of these repeated misrepresentations of IBM's conclusion.

By way of clarifying how it could be that I find myself in any agreement at all with both IBM and Prof. Kahan when the papers appear to be reporting them at odds with each other, I would like to draw attention once again to what it was that IBM actually said in their Pentium study, *especially* their conclusion, which clearly refutes the Mercury's interpretation.

As I've said a number of times, both in this forum and to reporters, we do not know where the great bulk of applications lie on the rate spectrum. An excellent quote in this context is Hennessy and Patterson's "Fallacy: there is such a thing as a typical program" ("Computer Architecture", p. 183).

The low-rate end of the Pentium's error-rate spectrum has been estimated by Intel at one error in 9 billion divisions for random double-precision data, an estimate not in dispute. (The yet lower rate of no errors at all is an even more welcome possibility for applications doing divisions with suitably constrained data, e.g. small integers.) There is no controversy here; the controversy arises when Intel makes the considerably stronger statement that this is where one can expect to find the typical application.

IBM's study differs from Intel's in two key respects. First, IBM demonstrates that the error-rate spectrum is much wider than contemplated by Intel, by exhibiting high-rate scenarios of a rather more plausible kind than repeatedly performing 4195835/3145727. Intel, in contrast, does not raise the question of whether anywhere near such a high rate might be achieved in any practical situation. Second, IBM makes no claim as to where along this spectrum any given application might happen to reside. Intel, in contrast, takes a definite stand on the low probability of the typical spreadsheet user encountering the bug even once, not just in his own lifetime but in several hundred lifetimes.

The closest IBM comes to a statement about actually experiencing any given rate is the statement that a user *could* make a mistake every 24 days.
Both this statement and its context make clear that it is a purely hypothetical statement based on clearly stated assumptions which, when met, fully and rigorously justify the claimed rate. No suggestion is made that these assumptions are actually satisfied in any existing situation. Readers must draw their own conclusions about the actual likelihood of encountering such scenarios. Evidently the Mercury article has drawn its own conclusion, warranted or not, and then drawn the additional unwarranted conclusion that IBM must have done the same.

Even had IBM's reference to 24 days inadvertently led some readers astray, the conclusion of the study should dispel all doubt as to the basic message of the report. The final two sentences of the brief conclusion read as follows: "In reality, a user could face either significantly more errors or no errors at all. If an error occurs, it will first appear in the fifth or higher significant digit and it may have no effect or it may have catastrophic effects." Nowhere in its study does IBM make a claim substantially exceeding this clear summary.

==========

A possible small objection to IBM's study is that it does not explore very far the range of dependencies on some of the assumptions made in their hypothetical scenarios, such as the assumption in some examples that operands have two decimal digits. While this is a reasonable assumption for monetary amounts represented in floating point, one would like some sense of whether this is an extreme case or whether other precisions are equally afflicted.

A possible objection to my own studies of small bruised integers is that it is not clear to the typical spreadsheet user what inference to draw for typical calculations.
When one knows the incidence of both integers and bruising in one's spreadsheet, it is straightforward to take my raw error rates for a steady stream of small bruised integers and dilute them according to the proportion of small bruised integers encountered in one's own daily use. But thanks to roundoff when displaying data to only 6 or 8 decimal digits, bruising tends to disappear like Mr. Snuffleupagus or an electron when you try to look at it directly. Hence most users remain blissfully unaware of the extent to which their data is subject to bruising. The typical user's reaction to the observation that the difference between 4.1 and 1.1 is not exactly 3 but a slightly bruised 3 is one of incredulity; it took me half an hour to convince a Byte reporter of the reality of this phenomenon (we tracked every bit). This makes the *practical* significance of my tables quite unclear to most spreadsheet users.

To address these objections I plan to write a short program, based on relatively unobjectionable features of these studies, to permit people to determine for themselves the rate, seriousness, and cumulative effect of errors in the evaluation of simple expressions involving division. Unlike my study, there is no subtraction of small quantities to force integers to be bruised, and unlike the IBM study the experimenter gets to choose for herself the distribution of operands by both size and precision. Bruising still happens, but now only to the same extent as in routine spreadsheet calculations, as a result of arithmetic done with operands prior to their arrival at the division operation. My program limits this arithmetic to a single addition of nonnegative reals, a particularly common operation performed throughout typical spreadsheets.
The omission of both subtraction and negative operands removes any concern about the possibility of mistaking the implicit numerical instability of x/(ya-yb) for a large Pentium error (and besides, what substance is there in negative numbers? :). I will release the C program as source; it will be quite short and hopefully reasonably readable, which should leave no doubt as to what is going on behind the scenes when running experiments.

The limitation to a single addition for each operand of division implies that the expression to be analyzed is of the form (xa+xb)/(ya+yb), with xb and yb optional. This severe restriction of the language greatly increases the chances of sampling a fair cross-section of the available expressions; when one complicates the language with more constructs, the resulting combinatorial explosion in the variety of expressions makes it far harder to tell whether any given set of expressions is a fair cross-section, and the results of any feasible number of experiments then have less clear significance.

I will include with the program the results of a few experiments illustrating the full range of error rates, from no errors at all (an even lower rate than Intel's estimate) to rates higher than Intel's, all obtainable merely by changing sizes and precisions independently for the numerator operands and the denominator operands. The rate claimed by Intel for typical spreadsheet users is achieved for data with generous limits on both size and precision of input data.

Besides the overall error rates and cumulative and average magnitudes, the program will also report, for the much smaller population consisting of those divisions having any nontrivial error (e.g. at least one cent in a financial transaction), the minimum, mean, and maximum error magnitudes, together with their standard deviation. This data tells the complaints office whether to expect a large number of trivial complaints or a much smaller number of serious injuries.
In the former case a complaints office is neither a worthwhile return on the investment nor urgently needed; in the latter, however, it would be market suicide not to set up an efficient complaints office geared towards the most likely complaints.

--
Vaughan Pratt
http://boole.stanford.edu/boole.html
My every word is copyright, especially those not in the dictionary.