Epic failures: 11 infamous software bugs
By Matt Lake
September 9, 2010 06:00 AM ET
Forty seconds of Ariane-5
The European Space Agency (ESA) has also suffered embarrassment on the software front. The inaugural flight of its fifth-generation Ariane launcher bested NASA's Mariner 1 score for unmanned spacecraft disaster: It took only 40 seconds to blow up.
On June 4, 1996, after the kind of dramatic vertical blastoff you'd expect from a high-profile European vehicle, cameras on the ground barely had time to focus on the Ariane-5 as it turned around and began to fall apart, before it completely exploded.
The Ariane Flight 501 disaster began with a loss of guidance and attitude information 30 seconds after liftoff. Once it veered completely off course, it automatically self-destructed.
The problem was that software in Ariane-5's inertial reference system converted a 64-bit floating-point value into a 16-bit signed integer. The value was too large to fit in a 16-bit signed integer, which caused an arithmetic overflow. In the ESA's case, a software handler that could have dealt with the problem had been disabled, so nothing stood in the way of the cascade of system failures that led to the destruction.
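To make the failure mode concrete, here is a minimal Python sketch of a float-to-16-bit conversion with and without protection. It is not the flight code: the function names, the sample values, and the choice to truncate rather than round are all assumptions made for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # range of a 16-bit signed integer


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit.

    Hypothetical helper: an unhandled error here plays the role of the
    overflow that nothing was in place to catch.
    """
    truncated = int(value)  # assumption: truncate toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in 16 signed bits")
    return truncated


def to_int16_saturating(value: float) -> int:
    """One possible protective strategy: clamp to the representable range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    print(to_int16_checked(1234.9))      # fits, so the conversion succeeds
    print(to_int16_saturating(40000.0))  # too large, so it is clamped
    try:
        to_int16_checked(40000.0)        # too large, so it raises
    except OverflowError as err:
        print("conversion failed:", err)
```

Whether to clamp, raise, or fall back to a default is a design decision for each variable; the point of the sketch is only that the out-of-range case has to be handled somewhere.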
Some bugs are noisy: They cause explosions that destroy machines. Others are subtler in their destructiveness: They cause severe embarrassment that turns companies' good names to "Mud" and sometimes threatens the bottom line.
Pentium chips fail math
In 1994, an entire line of CPUs by market leader Intel simply couldn't do their math. The Pentium floating-point flaw ensured that no matter what software you used, your results stood a chance of being inaccurate beyond the eighth decimal place. The problem lay in a faulty math coprocessor, also known as a floating-point unit. The result was a small possibility of tiny errors in hardcore calculations, but it was a costly PR debacle for Intel.
How did the first generation of Pentiums go wrong? Intel's laudable idea was to triple the execution speed of floating-point division by ditching the previous-generation 486 processor's clunky shift-and-subtract algorithm and substituting a lookup-table approach in the Pentium. So far, so smart. The lookup table held 1,066 entries, downloaded into the programmable logic array of the chip. But only 1,061 entries made it onto the first-generation Pentiums; five got lost on the way.
When the floating-point unit accessed any of the empty cells, it would get a zero response instead of the real answer. A zero response from one cell didn't make the final answer zero. Instead, a few obscure calculations returned slightly wrong results, typically off around the tenth decimal digit, so the error slipped past quality control and into production.
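A toy model shows why a zero in one cell yields a wrong answer rather than an answer of zero. The Python sketch below is not the Pentium's divider: the base-4 digit scheme, the table layout, and the names `build_table` and `table_divide` are all assumptions chosen to keep the example small. It divides one digit at a time, reading each quotient digit from a precomputed table, and treats a missing cell as 0.

```python
from fractions import Fraction

RADIX = 4  # toy choice: each step produces one base-4 quotient digit


def build_table(max_divisor: int) -> dict:
    """Precompute the next quotient digit for every (remainder, divisor) pair."""
    return {
        (r, d): (r * RADIX) // d
        for d in range(1, max_divisor + 1)
        for r in range(d)
    }


def table_divide(x: int, d: int, table: dict, steps: int = 12) -> Fraction:
    """Compute x / d digit by digit; a missing table cell reads as 0."""
    quotient = Fraction(x // d)
    r = x % d
    for k in range(1, steps + 1):
        digit = table.get((r, d), 0)  # empty cell gives a zero response
        quotient += Fraction(digit, RADIX ** k)
        r = r * RADIX - digit * d     # remainder carried to the next step
    return quotient


if __name__ == "__main__":
    full = build_table(7)
    holed = dict(full)
    del holed[(2, 7)]  # simulate one entry that never made it into the table

    print(float(table_divide(1, 7, full)))   # close to 1/7
    print(float(table_divide(1, 7, holed)))  # 0.125: wrong, but not zero
```

In this toy, the digits produced before the empty cell is reached are still correct, so the result is merely off. The toy's error is far larger than the tenth-digit errors described above, and it hits the hole on every run, whereas the real flaw surfaced only for rare operand combinations.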
What did that mean for the lay user? Not much. With this kind of bug, there's a 1-in-360-billion chance that miscalculations could reach as high as the fourth decimal place. More likely, at odds of 9 billion to 1 against, was that any errors would happen in the ninth or tenth decimal digit.
But wouldn't you know it? A Virginia-based math professor named Thomas Nicely needed that level of accuracy, found he wasn't getting it and figured out why.
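A simple way to look for this kind of error is to divide, multiply back, and see what is left over. The Python sketch below does that with a pair of operands that circulated widely at the time, 4195835 and 3145727; on affected chips the leftover reportedly came out as 256 instead of 0. The operands are not from this article, and the function names, the `divide` parameter, and the tolerance are assumptions made for illustration.

```python
import operator


def division_residual(x: float, y: float, divide=operator.truediv) -> float:
    """Return x - (x / y) * y, which should be essentially zero."""
    return x - divide(x, y) * y


def division_looks_flawed(
    x: float = 4195835.0,
    y: float = 3145727.0,
    divide=operator.truediv,
    tolerance: float = 1.0,  # assumption: far above normal rounding noise
) -> bool:
    """Flag a divider whose residual is too large to be rounding error."""
    return abs(division_residual(x, y, divide)) > tolerance


if __name__ == "__main__":
    print(division_looks_flawed())  # False on a correctly working divider

    # Hypothetical faulty divider: quotient slightly too small.
    def faulty(a, b):
        return a / b - 1e-4

    print(division_looks_flawed(divide=faulty))  # True
```

The `divide` parameter exists only so the check can be exercised against a deliberately wrong divider; in ordinary use the default is the machine's own division.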
In October 1994, Nicely alerted Intel, then others, to the problem. Intel came back with a response only marginally less tactful than "Oh, that thing? Yeah, we noticed that back in June."
Thus began an inexorable slide into PR hell and a costly mop-up bill. In January 1995, Intel announced a pretax charge of $475 million against earnings, most of which apparently stemmed from replacing flawed processors.
The bottom line in this arithmetic mess is this: In lookup-table and money calculations, 1,066 – 5 = –$475,000,000. Any way you look at it, that's bad math.