George Porter
The Role of Proof in Ensuring Software and Hardware Reliability
History of Modern Science (HIS 322M)
Dr. Hunt
April 3, 2000
May 6, 1997—"Flaw Reported in New Intel Chip" headlines the Technology/Cybertimes section of the New York Times. "The Intel Corp. will not formally introduce its Pentium II microprocessor until Wednesday, but the World Wide Web is already buzzing with reports of a bug that causes the new chip to make errors in some complex mathematical calculations" ("Flaw" Technology/Cybertimes). The article goes on to describe a bug in Intel’s latest chip that remained unfixed, even as the chips were on their way to market. The Pentium II bug was not the first to attract large-scale attention from major media sources. A few years earlier, on December 20, 1994, Intel recalled its popular Pentium processor due to an "FDIV" bug discovered the previous June by computational number theorist Thomas R. Nicely at Lynchburg College in Virginia (Pentium FDIV Bug FAQ). Far from remaining in the realm of mathematical specialists, bugs such as the FDIV bug attract the attention of the general population, if the public’s reaction to the FDIV bug is any indication. On January 17, 1995, Intel announced an outlay of $475 million to cover the recall of its Pentium chip. Thus Intel spent nearly half a billion dollars to fix a problem that even its discoverer, Thomas Nicely, admitted would not effect most users (Pentium FDIV Bug FAQ). No longer is the reliability of our software and hardware systems the concern of specialists in the field of computer science—technology’s growing involvement in daily life raises concerns among users as to its safety and reliability.
Can bugs such as Intel’s FDIV bug be avoided? In Donald MacKenzie’s Knowing Machines: Essays on Technical Change, an article describes the British Ministry of Defense’s VIPER project. The VIPER was an ambitious chip design in the 1980s—a microprocessor that was supposed to be guaranteed free from errors. Unlike companies like Intel, which test their microprocessors by subjecting them to numerous program runs, the VIPER team sought to prove mathematically that their chip met their specifications, and thus was free from errors. While originally well received, the VIPER chip ultimately met with failure since the group was unable to complete the proof needed to ensure the correctness of the chip. Does this leave the reliability of our software and hardware to chance? Certainly not. The use of rigorous mathematical proofs to guarantee that software and hardware perform as expected is not only possible, but in previous years has been successfully applied to several chips similar to the VIPER. Only by relying on the rigor of mathematics during the design and construction of chips and programs can we ensure that our technology remains reliable.
The period during and after WWII brought about several new computing machines—from the Colossus in Britain spearheaded by Alan Turing to American computers like Eckert and Mauchly’s ENIAC. The mathematician John Von Neumann proposed a computing device called the EDVAC, and consequently, modern computers are called Von Neumann architectures due to his work. In the early days of computing, computer engineers wrote programs at a very low level—as long strings of 1’s and 0’s. These small programs were written alongside mathematical justifications as to their correctness (Boyer). Von Neumann, in a series of papers on programming for computing devices, outlines programs to add and subtract "double precision" or fractional numbers. He describes programs for determining the square root of a number, and dividing polynomial expressions. In each of these cases, rigorous mathematical arguments as to their correctness precede the given computer program ("Planning," pg. 106, 124, 145). For people with a mathematical background like Von Neumann, the notion of justifying their programs seemed natural. As computers progressed and more elaborate, higher-level languages developed, a wider group of developers began writing computer programs. Larger computers could handle longer and more complicated programs, and therefore it became harder to determine if a given program produced the correct results. Since the WWII, more sophisticated proof strategies were developed. These strategies rely on the notion that theorems about a program (i.e. that it generates certain outputs given well-formed inputs) derive from given axioms and valid rules of inference.
One sets out to prove software properties in the context of an exact specification of the systems’ desired behavior. Settling on the desired behavior of a piece of hardware or software is not as obvious as would seem. Chapter seven of MacKenzie’s Knowing Machines chronicles the standardization of the IEEE 754 floating point arithmetic system, which is arithmetic performed on fractional numbers often used in scientific calculations. Multiplication and division of these floating point numbers seems straightforward, but due to computers limited ability to store information, certain difficulties arise in specifying precisely what behavior the "correct" actions should be. Several different behaviors were suggested, each by groups that held an interest in the eventual outcome of the standard. Due to computers limited precision, various schemes were suggested to handle the imprecision and the Kahan, Coonen, and Stone model was eventually accepted into the 754 standard ("Negotiating," pg. 172). Most modern computers and software programs use the IEEE 754 standard to represent fractional numbers. So in what sense is a microprocessor that supports the 754 standard "correct" in its operations on fractional numbers? Both of the proposed systems of representing numbers with imprecise precison—the DEC and the K, C, and S model—were supported by entrenched technology. For example, Intel was on the verge of releasing a chip based on the K, C, and S model ("Negotiating," pg. 171). It seems that a standard that was influenced by individual concerns and was eventually passed by majority vote would cast doubts as to what correct mathematical operations really are. Some critics of the notion of software proofs point to this as a serious flaw in the application of proof to software and hardware. I will return to their concerns shortly.
The previously mentioned VIPER project attempted to create a fully verified microprocessor based on mathematical reasoning, and although it was never successfully proven correct, other processors have recently met with better success. One of the more recent success stories of using formal models to describe a system, and then proving properties of that system correct is the floating point unit of the AMD K5. Austin-based Advanced Micro Devices (AMD) manufactures a microprocessor compatible with the Intel Pentium line. Not wanting to repeat the FDIV bug of Intel (and suffer the enormous financial cost), they sought a proof of the correctness of part of their chip. They worked closely with the developers of an automatic theorem-proving program called ACL2. Their ACL2 proof was based on an earlier theorem-prover written by J Moore and Robert Boyer, and included additional work by Tom Lynch, Matt Kaufmann, and others. Based on the above-mentioned IEEE 754 standard, the group created a formal model of how the chip should implement the standard for multiplication and division. They then formulated a property they wished to assert about the chip:
If p and d are double extended precision floating-point numbers (d /= 0) and mode is a rounding mode specifying a rounding style and target format of precision n not exceeding 64, then the result delivered by the K5 microcode is p/d rounded according to mode (AMD5k86).
Using the automatic theorem prover ACL2, they successfully proved that the design of the AMD K5 microprocessor behaves as specified above. The statement says that division behaves as expected according to the IEEE 754 specification. No bugs have been found in the portion of the chip that was proved correct with the theorem prover that differ from the given specification. Note that the proof only connects the behavior of the AMD K5 to the specification as to how it should implement the IEEE standard. This connection has led some to criticize the use of proofs in ensuring reliable hardware and software by pointing out that proofs only connect software and hardware with formalized specifications, and that those specifications might not model what the designer intended.
Models of hardware and software, like the model used to verify the AMD K5 microprocessor, are similar to mathematical models used by physical scientists to describe natural phenomenon. For example, Galileo’s discovery that objects fall at uniform speed can be modeled with formulas such as F=ma. Such equations model quantities such as mass and the acceleration due to gravity g. Some might say that the model "F=ma" does not properly describe falling bodies—for example it discounts wind resistance. While that is true, mathematical models of physical events nevertheless remain very useful. When modeling computer programs, the designer of the model must precisely formulate the exact design of the system. The program can only be verified against behavior that is included in the model. In the case of the AMD K5, the behavior specified included the operations that performed multiplications and divisions according to prescribed rules related to rounding results and other details specified in the 754 specification. As mentioned before, some critics point out problems related to the use of formal models to verify software reliability. Brian Cantwell Smith in his article Limits of Correctness in Computers says,
What is called a proof of correctness is really a proof of the compatibility or consistency between two formal objects of an extremely similar sort: program and specification. As a community, we computer scientists should call this relative consistency, and drop the word ‘correctness’ completely ("Limits," pg. 287).
Smith’s article implies that software cannot be verified correct because it is impossible to formulate formal models that perfectly describe the real world. It is true one can only verify a program against some unambiguous model—a model that may be imperfect. In my opinion, what Smith fails to consider are the benefits of choosing one specification of the desired behavior, even if that model is not perfect or immediately obvious, as seen in the IEEE 754 specification. The several different suggestions during the formation of the 754 specification as to the behavior of floating point arithmetic might have been roughly equivalent—no one clearly better than the others. However, the fact that a chip such as the AMD K5 can be guaranteed to carry out that specification given any input is highly desirable. The benefits gained from knowing that a software or hardware system will perform predictably and consistently given any well-formed input (as described in the specification) allows developers to rely on the results, even if they differ from "real" arithmetic. Contrast this situation with the more common one—software or hardware based on English language descriptions of the desired behavior. Such systems are often based on conventions or "common sense" interpretations that become disastrous when implemented by large groups of engineers each with different backgrounds and training. The recent loss of a NASA satellite due to confusion as to the use of British verses metric measurements points out how such a "simple" miscommunication can have disastrous results. Given the above, it seems that the employment of specifications and rigorous mathematical reasoning about programs or computing hardware would gain wide acceptance. But indeed the adoption of such rigorous standards is not the general case.
University programs in computer science are the ideal place to introduce future programmers and computer scientists to the use of rigorous mathematical reasoning about software and hardware. Many new breakthroughs in automatic theorem proving programs originate at universities. For example, the development of ACL2 is focused at the University of Texas at Austin, and the LCF-LSM system used to write specifications of the VIPER chip was based at Cambridge ("Fangs," pg. 161). Based on my experiences of the University of Texas computer science curriculum, I have found that the use of rigorous mathematical reasoning to justify the correctness of computer programming projects is not often applied. This lack of a rigorous foundation has led at least one computer science professor, Robert Boyer, to draft The Rigor Resolution on Undergraduate Education. This document, proposed to the department in September of 1995, met with little acceptance. In the introduction, Boyer comments that "the students receive a vast but sloppy and unrigorous knowledge of many programming language constructs, architectures, and systems" and that
… this policy of rapid, sloppy exposure to many aspects of computing, all poorly learned, is spreading a thesis that pollutes not only our students’ minds but also the entire future of computing—that there are no real truths in computing, no science, only layers of partial, vague rules of thumb among which trial and error is preeminent (Rigor, 2).
The document suggests that each course should set about to demonstrate some specific set of truths related to its subject, so that when a student graduates from the department he will know with confidence that he fully understands each subject. The document points out several objections that some might make towards this new way of teaching computer science. Most center on the need to teach students the latest industry technology. Boyer points out that in addition to the students, the department has two additional constituencies: industry and parents of students (Boyer Interview). "… we cannot simultaneously educate students who will become scientists as we educate students to serve the immediate needs of industry (Rigor, 8)." It is ironic that industry seeks to get students through the program as quickly as possible, due to the current shortage of qualified programmers, yet by focusing on a "useful" curriculum heavy in industry practices, students graduate with little ability to demonstrate that their programs or hardware devices work reliably.
Given that the use of rigorous mathematical arguments of software programs has not reached a wide acceptance, some might suggest that such reasoning is impossible, or at least infeasible for large programs. Smith says in his article that "the current state of the verification art extends only to programs of up to several hundred [lines] (Limits, pg. 277)." He goes on to say, "It is estimated that the systems proposed in the Strategic Defense Initiative, in contrast, will require at least 10,000,000 lines of code (Limits, pg. 277)." Such arguments ignore a basic tenet of large software systems—that they are built up with smaller, simpler pieces. A ten million line program is not written in one sitting, it is made up of thousands of smaller pieces, each of which can be proved correct. These layers of abstraction make proofs of large-scale programs possible. Furthermore, the success received in proving the floating point unit of the AMD K5 gives confidence that in the future additional systems will receive the same attention and care. If the nearly half a billion dollars spent by one company alone—Intel—to repair a bug in one of their chips is any indication, then it seems clear that industry cannot afford to leave their systems to chance.
Works Cited
AMD5k86 Floating-Point Division. University of Texas at Austin Computer Science Department. 2 April 2000 <http://www.cs.utexas.edu/users/moore/best-ideas/fdiv/index.html>.
Boyer, Robert S., Ph.D. Personal Interview. 28 March, 2000.
"Flaw Reported in New Intel Chip." The New York Times 6 May 1997: Technology/Cybertimes.
Goldstine, H., and John Von Neumann. "Planning and Coding Problems." John Von Neumann Collected Works, Vol. V. Ed. A. H. Taub, Exeter, Great Britain: Pergamon Press, Ltd., 1961. 80-152.
MacKenzie, Donald. "The Fangs of the VIPER." Knowing Machines: Essays on Technical Change. Cambridge, MA: The MIT Press, 1996. 159-164.
- - -. "Negotiating Arithmetic, Constructing Proof." Knowing Machines: Essays on Technical Change. Cambridge, MA: The MIT Press, 1996. 165-183.
Pentium FDIV Bug FAQ. Lynchburg College. 9 June 1999 <http://lasi.lynchburg.edu/Nicely_T/public/pentbug/pentbug.htm>.
The Rigor Resolution on Undergraduate Education. Robert S. Boyer, Ph.D., Professor of Computer Sciences. University of Texas at Austin. September, 1995 <http://www.cs.utexas.edu/users/boyer/rigor-iv.ps>.
Smith, Brian Cantwell. "Limits of Correctness in Computers." Program Verification. Ed. Colburn, Timothy, James Fetzer, and Terry Rankin. AA Dordrecht: The Netherlands, 1993.