Date: Mon, 26 Mar 2001 20:22:34 -0800 From: Alexander Driker Organization: ST To: wkahan@cs.berkeley.edu Subject: x87 question Dr. Kahan: I am observing something that I believe is incongruous, perhaps you can validate (or refute) my thoughts. I am getting different results when a single multiply is done in larger (extended or double) precision internally and then stored (converted) as single precision versus the same operation when done in single precision internally. Only one arithmetic operation is performed and I believe that all of these alternatives should match. I tried this on SPARC and it works as I expected. I am attaching a simple "C" program (compilable using BCC and SPARC gcc) which illustrates these cases. Thank you, Alex Driker =================================================== filename="multst.c" #define BCC #include #ifdef BCC #include #endif #define SINGLE 0 #define DOUBLE 1 #define EXTEND 2 #define RESERV 3 //======================================== // //======================================== #ifdef BCC void set_precision(int prec) { switch(prec) { case SINGLE : _control87( (0<<8), 0x0300); break; case DOUBLE : _control87( (2<<8), 0x0300); break; case EXTEND : _control87( (3<<8), 0x0300); break; case RESERV : _control87( (1<<8), 0x0300); break; default: printf("Set_precision: No such precision\n"); } printf(" cw = %x\n", _control87( 0, 0)); } #endif //======================================== // //======================================== #ifdef BCC void main() #else int main() #endif { float opa, opb, res; double opad, opbd, resd; unsigned int *pinta, *pintb, *pintr; unsigned int *pintad, *pintbd, *pintrd; pinta = (unsigned int*)&opa; pintb = (unsigned int*)&opb; pintr = (unsigned int*)&res; pintad = (unsigned int*)&opad; pintbd = (unsigned int*)&opbd; pintrd = (unsigned int*)&resd; // First set of operands *pinta = 0x197e02f5; *pintb = 0x26810000; #ifdef BCC set_precision(EXTEND); res = opa * opb; printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr); #endif #ifdef BCC set_precision(DOUBLE); #endif opad = (double)opa; opbd = (double)opb; res = (float)(opad * opbd); printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr); #ifdef BCC set_precision(SINGLE); #endif res = opa * opb; printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr); // Second set of operands *pinta = 0xa100000d; *pintb = 0x9e800008; #ifdef BCC set_precision(EXTEND); res = opa * opb; printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr); #endif #ifdef BCC set_precision(DOUBLE); #endif opad = (double)opa; opbd = (double)opb; res = (float)(opad * opbd); printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr); #ifdef BCC set_precision(SINGLE); #endif res = opa * opb; printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr); } =========================================================================== Examples submitted by Alex Driker, 26 March 2001 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Operand in Hex Hex Significand Dec. Exponent a1: 197e02f5 1.fc05ea -77 b1: 26810000 1.020000 -50 p1 = exact product 1.fffdf5d4 -127 [p1] to 24 sig. bits 1.fffdf6 -127 store [p1] as float 0.fffefb -126 [p1]: 00fffefb p1 denormalized 0.fffefaea -126 {p1} rounded to float 0.fffefa -126 {p1}: 00fffefa Summary: If the product is computed exactlly and then rounded to 24 sig. bits as if exponent range were unlimiteded, the result [p1] can be denormalized and stored as a float. This is what happens when x87's precision control is set to round to 24 sig. bits in the floating-point stack registers. But if the product is computed exactly and then denormalized to stay within float's exponent range, and then rounded as a float to (now) 23 sig. bits, the result {p1} must differ from [p1] in its last bit stored. {p1} is what SPARCs and MIPs get. To get {p1} from an x87, leave precision control at 53 or 64 sig. bits so that rounding occurs at a FST (store) operation interposed after every arithmetic operation. Java does this, but BCC does not. I prefer BCC's way. Operand in Hex Hex Significand Dec. Exponent a2: a100000d -1.00001a -61 b2: 9e800008 -1.000010 -66 p2 = exact product 1.00002a0001a -127 [p2] to 24 sig. bits 1.00002a -127 store [p2] as float 0.800015 -126 [p2]: 00800015 p2 denormalized 0.8000150000d -126 {p2} rounded to float 0.800016 -126 {p2}: 00800016 Exercise for the Diligent Student: Find operands a3 and b3 for which SPARC's once-rounded {p3} is closer to the exact product p3 than is the x87's twice-rounded [p3] , albeit by less than one unit in the last place. Discussion: IEEE 754 requires that any implementor of all three floating-point formats, namely single, double, and double-extended, provide precision-control modes that allow results destined for that last and widest format to be rounded to the narrower format's precision if the programmer so desires. This allows that implementation to mimic those implementations lacking the third format by getting the same results unless over/underflow occurs. IEEE 754 allows the implementor to choose whether to mimic the narrower exponent range too when rounding to a narrower precision in a wider destination register. The Motorola 68881, designed in 1981, chose to restrict exponent range to match precision when narrowed by precision control. This allowed the 68881 easily to mimic less well endowed machines perfectly. The Intel 8087, designed in 1977 before IEEE 754 was promulgated, chose to keep the wider destination registers' exponent range. My motive for this choice was to provide Intel's customers with valid results when the mimicked machine got only error-messages or invalid results because of intermediate over/underflow in an expression that the Intel machine could evaluate as the programmer intended in its wider floating-point registers. Things did not work out as I planned. Most compilers on the x86/x87 supported the third floating-point format either grudgingly or not at all (Microsoft did both) while using it surreptitiously to evaluate only expressions deemed simple enough by the compiler. Part of the trouble can be blamed upon a design error perpetrated in Israel that transformed the handling of floating-point register spill on the x87 from painless to awful. Thus, what should have been an advantage for users of portable computer software on x87s turned into several nasty nuisances for software developers and testers. Most of the nuisances arise from the compilers' (mis)use of the x87's ill-supported third format, and would go away if programmers could specify what they desire (and get it) as C 9X provides. But one nuisance persists, and it snagged Alex Driker: Intel's x87 precision control mimics imperfectly the underflowed intermediate results obtained by less well endowed machines unless extra stores and loads (among other things) are insinuated into the instruction stream. The imperfection, the difference between one rounding and two, is tinier than the tiniest nonzero number; but it is still a difference with which software developers and testers have to contend. Back in 1977 this nuisance seemed inconsequential to me compared with the prospect of getting intended results instead of spurious over/underflows in intermediate expressions. If only I had known then what I know now! The Intel/Hewlett-Packard Itanium allows programmers to mimic less well endowed machines either the x87's way or the 68881's way. I expect the latter to become the only way in the long run, perhaps after I'm dead. Prof. W. Kahan