Interesting People mailing list archives

IP: Pentium II Math Bug?

From: David Farber <farber () cis upenn edu>
Date: Fri, 09 May 1997 17:36:22 -0400
I can not vouch in any way for this one. djf


Pentium II Math Bug?




------------------------------------------------------------------------




It would appear that there may be a bug in the floating point unit of the
new Pentium II Processor, as well as the current Pentium Pro Processor. Is
it real? Is it serious? It appears to be real. The observed behavior
contradicts the IEEE Floating Point Specifications, and Intel's printed
documentation. However, I'm not a numerical analyst, and therefore I'm not
qualified to comment on its seriousness or its implications. Instead, I'll
present the facts herein, and leave the determination to you.


The Facts


I received email from "Dan" who asked if I could reproduce what he thought
was a bug in the Pentium Pro processor. I wrote an assembly language
program to investigate the problem. I also ran the test on a Pentium II
processor that I had recently bought at Fry's Electronics, an Intel Pentium
Processor (P54C), an Intel Pentium Processor with MMX Technology (P55C), and
an AMD K6. Sure enough, I came to the same conclusion as Dan: it looks like
a bug to me.


What do we call this bug?


These days, astronomers name new stars and comets by combining the
discoverer's name and some number. Why should microprocessor bugs be any
different? In this case, "Dan" is the discoverer of the bug, and 04-11
(1997) is the date on which I got my first email about it. So I've named
the bug "Dan-0411" after its discoverer and the date he first reported it
to me.


What is the bug, and what does it affect?


The bug relates to operations that convert floating point numbers into
integer numbers. Floating point numbers are stored inside the
microprocessor in an 80-bit format. Integer numbers are stored in two
different sizes: a short integer is stored in 16 bits, and a long integer
is stored in 32 bits. It is often desirable to store the 80-bit floating
point numbers as integer numbers. Sometimes the converted number won't fit
into the smaller integer format. This is when the bug occurs.


The host software is supposed to be warned by the microprocessor when such
a floating point conversion error occurs; a specific error flag is supposed
to be set in a floating point status register. If the microprocessor fails
to set this flag, it would not be in compliance with the IEEE Floating
Point Standards which mandate such behavior. For the Dan-0411 bug, the
Pentium II and Pentium Pro processors fail to set this error flag in many
cases.
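
As a rough illustration of the behavior described above, here is a minimal
Python sketch of what a float-to-integer store is expected to do. It assumes
masked exceptions and round-to-nearest; the function name fist_store and the
flag names in the returned set are hypothetical and are not taken from
FISTBUG.ASM or from any Intel document.

```python
from fractions import Fraction

def fist_store(value, bits):
    """Model of the EXPECTED result of storing `value` as a signed integer.

    Returns (stored_bit_pattern, flags). Hypothetical helper for illustration.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    rounded = round(Fraction(value))      # round to nearest, ties to even
    flags = set()
    if rounded < lo or rounded > hi:
        # Does not fit: store the most negative integer and flag Invalid (IE).
        flags.add("IE")
        return 1 << (bits - 1), flags
    if rounded != value:
        flags.add("PE")                   # the stored result is inexact
    return rounded & ((1 << bits) - 1), flags

for operand in (1000, Fraction(5, 2), -32768, -(2**111)):
    stored, flags = fist_store(operand, 16)
    print(operand, "->", format(stored, "04x"), sorted(flags))
```

Note that -32768 and an out-of-range operand both store the pattern 8000 in
this model. Only the IE flag tells the two cases apart, which is why a missing
flag matters to the host software.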


When storing 16-bit integers, the chance of randomly hitting the bug is
2^47/2^80 or 1 in 8,589,934,592 (1 in 8.6 billion). When storing 32-bit
integers, the chance is 2^31/2^80 or 1 in 562,949,953,421,312 (1 in 562,950
billion). That's approximately 140,739,635,839,000 different floating point
numbers that result in the incorrect behavior. The Pentium, Pentium with
MMX Technology, and AMD K6 microprocessors do not appear to have this
problem.
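
The arithmetic behind these odds can be checked with a few lines of Python.
This is only a sketch: it takes the reported counts of failing mantissas
(2^47 for 16-bit stores, 2^31 for 32-bit stores) as given, and it assumes
every one of the 2^80 possible 80-bit patterns is equally likely, which real
programs will not match. The constant names are made up for this example.

```python
from fractions import Fraction

TOTAL_PATTERNS = 2**80   # every possible 80-bit pattern (assumed equally likely)
FAIL_16 = 2**47          # reported failing mantissas for 16-bit stores
FAIL_32 = 2**31          # reported failing mantissas for 32-bit stores

odds_16 = Fraction(FAIL_16, TOTAL_PATTERNS)
odds_32 = Fraction(FAIL_32, TOTAL_PATTERNS)

print(f"16-bit store: 1 in {odds_16.denominator:,}")
print(f"32-bit store: 1 in {odds_32.denominator:,}")
print(f"failing operands in total: {FAIL_16 + FAIL_32:,}")
```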


It might be interesting to note that a launch failure of the Ariane 5
rocket, which happened less than a minute into the launch, was traced to
behavior around an overflow condition (in this case, it was software, not
hardware, that was the problem). One of the computers on board had a
floating point to integer conversion that overflowed, but because the
overflow was not handled by the software the computer did a dump of its
memory. Unfortunately, this memory dump was interpreted by the rocket as
instructions to its rocket nozzles. Result--boom!


There is a stuffy but complete description of this story (which is actually
quite interesting) at http://www.math.ufl.edu/~cws/3114/ariane-siam.html


Why wasn't this bug detected before?


I'm not exactly sure why this bug wasn't detected sooner, but there are a
few clues that could help provide an explanation. There appears to be a bug
in a popular floating point test program. If Intel relied on this program,
its bug may have inadvertently allowed the Dan-0411 bug to slip by
undetected. Professor William Kahan of Berkeley has written a suite of
floating point test programs in the FORTRAN programming language. (Please
refer to Dr. Kahan's home page at http://http.cs.berkeley.edu/~wkahan.)
These programs are commonly used to test the Float-to-Integer Store
instructions (FIST and FISTP). FORTRAN compilers may have differences in
how they handle bit-wise expressions. These compiler differences could make
this test behave differently as well. Technically, it looks like Dr. Kahan
intended the AND in his original FORTRAN source code to be a bit-wise AND
rather than a logical AND; this is a potential non-portability issue, as
I'm not sure how AND is defined by the FORTRAN standard when it is applied
to integers. This "non-portable" code was discovered when Dan tried to convert
Dr. Kahan's FORTRAN source code to the C programming language -- which has
separate bit-wise and logical AND operators. Dan recognized Dr. Kahan's
original intent and used the proper bit-wise AND operator in his C source
code. This is when the bug appeared in the chip. So in the end, either a
bug in the test software, or in a FORTRAN compiler, may have hidden a bug
in the chip.
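
To make the distinction concrete, here is a small Python sketch of how a
bit-wise AND and a logical AND could give different answers for the same
status word. It is a hypothetical illustration only: the function names are
invented, and the "logical" version shows one possible misreading, not the
documented behavior of any particular FORTRAN compiler.

```python
INVALID = 0x01      # IE bit of the FPU status word
PRECISION = 0x20    # PE bit of the FPU status word

def failed_bitwise(kflag):
    # Intended test: report a failure unless the IE bit itself is set.
    return (kflag & INVALID) != INVALID

def failed_logical(kflag):
    # Hypothetical misreading: both operands are treated as truth values,
    # so any non-zero status word compares equal to the non-zero mask.
    return (bool(kflag) and bool(INVALID)) != bool(INVALID)

buggy_status = PRECISION   # PE set, IE clear
good_status = INVALID      # IE set

print(failed_bitwise(buggy_status), failed_logical(buggy_status))
print(failed_bitwise(good_status), failed_logical(good_status))
```

Under these assumptions the bit-wise test flags the status word with only PE
set as a failure, while the logical reading lets it pass.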


That's the end of the non-technical discussion. For further technical
details, continue reading.




------------------------------------------------------------------------




How did I get involved?


"Dan," who wants his full name to remain anonymous, sent me the following
email on April 11, 1997 (reprinted with permission):






Robert,




There seems to be a bug in the FIST[P] m16int and FIST[P] m32int
instructions for the P6 (Pentium Pro).  Some (perhaps all) values
in the following ranges fail to set the IE (Invalid operation Exception)
flag as required for integer overflow.


FIST[P] m32int: [ c05e8000000000000001, c05e8000000080000000 ] (~-2^95)
FIST[P] m16int: [ c06e8000000000000001, c06e8000800000000000 ] (~-2^111)


(Number of failing mantissas = 2^31 + 2^47)




Example on P6 (Pentium Pro):
  fcw = 0x37f
  FIST[P] m16int c06e8000000000000001 -> 8000 (stored in memory)
  FPU status word:  B C3 TOP C2 C1 C0 ES SF PE UE OE ZE DE IE
                    0  0 000  0  0  0  0  0  1  0  0  0  0  0
  ***FAIL***




Example on P5 (Pentium):
  fcw = 0x37f
  FIST[P] m16int c06e8000000000000001 -> 8000 (stored in memory)
  FPU status word:  B C3 TOP C2 C1 C0 ES SF PE UE OE ZE DE IE
                    0  0 000  0  0  0  0  0  0  0  0  0  0  1




Prof. William Kahan at U.C. Berkeley wrote the following FORTRAN programs
to test floating-point to integer conversions:


  http://HTTP.CS.Berkeley.EDU/~wkahan/tests/fistest2.lst
  http://HTTP.CS.Berkeley.EDU/~wkahan/tests/fistest4.lst




The following line in the "fistest" programs is non-portable FORTRAN
and could prevent the P6 bug from being detected:


199                         Li = ((kflag.AND.Invalid) .NE. Invalid) .OR. Li


-- Dan
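
The operands in Dan's email can be picked apart with a short Python sketch.
It assumes each operand is an 80-bit extended-real pattern written as 20 hex
digits (1 sign bit, 15 exponent bits with a bias of 16383, and a 64-bit
mantissa). The helper names decode_extended and extended_to_int are made up
for this example.

```python
def decode_extended(hex_digits):
    """Split a 20-hex-digit extended-real pattern into its three fields."""
    raw = int(hex_digits, 16)
    sign = raw >> 79
    exponent = ((raw >> 64) & 0x7FFF) - 16383    # remove the exponent bias
    mantissa = raw & 0xFFFFFFFFFFFFFFFF          # 64 bits, explicit integer bit
    return sign, exponent, mantissa

def extended_to_int(hex_digits):
    """Exact integer value; only valid when the unbiased exponent is >= 63."""
    sign, exponent, mantissa = decode_extended(hex_digits)
    value = mantissa << (exponent - 63)
    return -value if sign else value

for operand in ("c05e8000000000000001", "c06e8000000000000001"):
    sign, exponent, mantissa = decode_extended(operand)
    print(operand, "sign", sign, "exponent", exponent, "mantissa", hex(mantissa))

# Size of the failing m32int range, counted in mantissas.
first = decode_extended("c05e8000000000000001")[2]
last = decode_extended("c05e8000000080000000")[2]
print(last - first + 1 == 2**31)
```

Both operands are negative numbers whose magnitude is far beyond what a 16-bit
or 32-bit integer can hold, so an overflow on store is the expected outcome.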




Dan wanted to make sure that there wasn't a bug in his C source code, or
his C compiler. That's when he contacted me. Dan wanted me to write
assembly language source code on his behalf. Assembly language makes it
possible to test the floating point hardware directly and to query its
response directly, without the possible influence of compiler bugs and
such.


Normally I don't get involved in debugging other people's problems or
writing source code on their behalf. But Dan was persistent. Within a day
or two, Dan had come up with some very concrete examples of the bug and
instructions which I could use as guidelines for reproducing it. I still
wasn't convinced that I wanted to be involved (not being a floating point
expert). But after 10 days or so, I finally became convinced, and that's
when I wrote the first piece of assembly language source code to detect the
Dan-0411 bug.


The Nature of the Bug


This bug occurs when a large negative floating point number is stored to
memory in an integer format. Under normal operation, when a floating point
number is too large to fit in the integer format, the most negative integer
(8000h for a 16-bit store) is stored in memory, and the FPU Status Word
indicates that an Invalid operand Exception (IE) occurred (FSW.IE = 1).
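
For readers who want to inspect a status word themselves, here is a minimal
Python sketch that names the bits of a 16-bit FPU status word, in the same
order as the field list in Dan's email (B, C3, TOP, C2, C1, C0, ES, SF, PE,
UE, OE, ZE, DE, IE). The helper decode_fsw is hypothetical, and the two sample
values are simply the bit rows from Dan's examples packed into hex.

```python
# Bit positions of the single-bit fields, least significant bit first.
FLAG_BITS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
    "SF": 6, "ES": 7, "C0": 8, "C1": 9, "C2": 10, "C3": 14, "B": 15,
}

def decode_fsw(fsw):
    """Return the names of the flags that are set, plus the 3-bit TOP field."""
    flags = [name for name, bit in FLAG_BITS.items() if fsw & (1 << bit)]
    top = (fsw >> 11) & 0x7
    return flags, top

print(decode_fsw(0x0020))   # bit row reported for the Pentium Pro: PE only
print(decode_fsw(0x0001))   # bit row reported for the Pentium: IE only
```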


Stores that overflow a "real number" format are supposed to behave
differently from stores that overflow an "integer number" format. A store
that overflows a real format sets the overflow flag (FSW.OE = 1); a store
that overflows an integer format sets the Invalid operand Exception flag
(FSW.IE = 1) instead. With the Dan-0411 bug, the Invalid operand Exception
flag (FSW.IE) is not set; only the Precision Exception flag is set
(FSW.PE = 1). The Pentium Pro Family Developer's Manual, Volume 2, section
7.8.4 makes this difference quite clear:






The FPU reports a floating-point numeric overflow exception (#O) whenever
the rounded result of an arithmetic instruction exceeds the largest
allowable finite value that will fit into the real format of the
destination operand. For example, if the destination format is
extended-real (80 bits), overflow occurs when the rounded result falls
outside the unbiased range of -1.0 * 2^16384 to 1.0 * 2^16384 (exclusive).
Numeric overflow can occur on arithmetic operations where the result is
stored in an FPU data register. It can also occur on store-real operations
(with the FST and FSTP instructions), where a within-range value in a data
register is stored in memory in a single- or double-real format. The
overflow threshold range for the single-real format is -1.0 * 2^128 to
1.0 * 2^128; the range for the double-real format is -1.0 * 2^1024 to
1.0 * 2^1024.






That explains how float-to-real overflows are supposed to be handled. But
the Pentium Pro manual explicitly distinguishes float-to-real overflows
from float-to-integer overflows. In fact, the very next paragraph in the
Pentium Pro manual describes the behavior for the exact conditions exposed
by Dan-0411.






The numeric overflow exception cannot occur when overflow occurs when
storing values in an integer or BCD integer format. Instead, the
invalid-arithmetic-operand exception is signaled.






As I said, this is the precise condition which is not being met by the
Pentium Pro and Pentium II microprocessors. The programs that demonstrate
Dan-0411 will set up these conditions and test whether or not the proper
error condition codes are set by the microprocessor.


Is this already a known bug?


Part of the process of disclosing this bug was ensuring that it hadn't
already been reported in any of Intel's errata documents. Because Intel
provides electronic versions of its errata for the Pentium and Pentium Pro
microprocessors, it's very easy to perform an electronic search to see if
this bug has been previously reported. Using this technique, I could not
find any documentation disclosing the Dan-0411 bug on either the Pentium or
Pentium Pro microprocessors.


The Source Code & Programs


I have provided one source code file and two executable programs. Both
programs are executable versions of the stand-alone assembly language
source code. The first program, FISTBUG.EXE, demonstrates the bug in a very
simple manner. All that appears on the screen is the simple message:


*** Dan-0411 bug found. ***


- or -


Dan-0411 not found.


The second program, FISTBUGV.EXE, runs exactly the same tests as the first,
but is much more verbose. This program shows the microprocessor stepping
information and itemized results. Each operand under test is printed to the
screen, along with pass/fail status for four different testing methods.


The Results


I ran this test on various Pentia and other microprocessors. For the
purposes of this article, I will show the results for the
Intel 486, Pentium (P54C), Pentium with MMX Technology (P55C), AMD K6,
Pentium Pro, and Pentium II microprocessors. These results demonstrate that
the bug is only present on the Pentium Pro and Pentium II microprocessors.
All other processors I tested did not demonstrate the Dan-0411 bug.


Conclusion


After reading this, I'm sure that many people will work vigorously to
verify or refute my test results. For this reason, I've provided the source
code along with executable binaries that can be run in DOS or Windows.
Since I'm not a numerical analyst, you should draw your own conclusions or
rely on the conclusions of a qualified expert as to the significance of the
Dan-0411 bug. One thing I can say conclusively: the Pentium Pro and Pentium
II processors behave differently than their predecessors.


Send your feedback.


Tell me how significant you think this bug is. Send me your feedback. Your
feedback will be posted publicly -- here. This might help me understand the
significance of this bug and how it might affect your life. Please send
your feedback to fistbug () x86 org.






------------------------------------------------------------------------




View results of FISTBUG


ftp://ftp.x86.org/source/fistbug/fistbug.res


Source Code Availability


View source code for FISTBUG.EXE and FISTBUGV.EXE
ftp://ftp.x86.org/source/fistbug/fistbug.asm
ftp://ftp.x86.org/source/fistbug/makefile


Executable Programs


Download FISTBUG.EXE and FISTBUGV.EXE binary executables.
ftp://ftp.x86.org/source/fistbug/fistbug.exe
ftp://ftp.x86.org/source/fistbug/fistbugv.exe
ftp://ftp.x86.org/source/fistbug/Dan0411x.ZIP


The Entire FISTBUG Archive


Download FISTBUG.ZIP archive. Archive contains source code, binary
executables, and my results.
ftp://ftp.x86.org/dloads/FISTBUG.ZIP




------------------------------------------------------------------------






© 1991-1997 x86 Monthly Digest and Robert Collins. PGP key available.


Make no mistake!
This web site is proud to provide superior information and service without
any affiliation to Intel Corporation.


"Intel Secrets", "What Intel doesn't want you to know" and anything with a
dropped e in it, are phrases that infuriate Intel Corporation.


Pentium, Intel, and the letter "I" are registered trademarks of Intel
Corporation. 386, 486, 586, P6, all other letters, and all other numbers
are not!
All other trademarks are those of their respective companies. See
Trademarks and Disclaimers for more info.


Robert Collins works somewhere in the United States of America. Robert may
be reached via email or telephone.