Epic failures: 11 infamous software bugs

Celebrate 'Debugging Day' by remembering these monster problems from the past

1 2 3 4 5 6 7 Page 4
Page 4 of 7

Call waiting ... and waiting ... and waiting

On Jan. 15, 1990, around 60,000 AT&T long-distance customers tried to place long-distance calls as usual -- and got nothing. Behind the scenes, the company's 4ESS long-distance switches, all 114 of them, kept rebooting in sequence. AT&T assumed it was being hacked, and for nine hours, the company and law enforcement tried to work out what was happening. In the end, AT&T uncovered the culprit: an obscure fault in its new software.

Here's how the switches were supposed to work: If one switch gets congested, it sends a "do not disturb" message to the next switch, which picks up its traffic. The second switch resets itself to keep from disturbing the first switch. Switch 2 checks back on Switch 1, and if it detects activity, it does another reset to reflect that Switch 1 is back online. So far, so simple.

The month before the crash, AT&T tweaked the code to speed up the process. The trouble was, things were too fast. The first server to overload sent two messages, one of which hit the second server just as it was resetting. The second server assumed that there was a fault in its CCS7 internal logic and reset itself. It put up its own "do not disturb" sign and passed the problem on to a third switch.

The third switch also got overwhelmed and reset itself, and so the problem cascaded through the whole system. All 114 switches in the system kept resetting themselves, until engineers reduced the message load on the whole system and the wave of resets finally broke.

In the meantime, AT&T lost an estimated $60 million in long-distance charges from calls that didn't go through. The company took a further financial hit a few weeks later when it knocked a third off its regular long-distance rates on Valentine's Day to make amends with customers.

Windows Genuine Disadvantage

Introduced in 2006, Windows Genuine Advantage was never a popular initiative with Microsoft's customers. Consumers had trouble seeing the advantages: It did nothing to help the security or stability of a legitimate Windows installation. All it did was help Microsoft root out software piracy.

In that task, it was as vigilant as, well, a vigilante. In fact, in late-August 2007, it found piracy everywhere it looked -- even among thousands of legitimate Windows customers.

On Friday, Aug. 24, someone on the WGA team accidentally installed bug-filled preproduction software on the WGA servers. The team quickly rolled back to a tested release of the software, but they didn't check that their fix actually addressed the problem. It didn't. So for 19 hours, until around 3 p.m. the following day, the server flagged thousands of WGA clients across the globe as illegal.

Windows XP customers were told they were running pirated software. Windows Vista customers were slapped harder: They had features turned off, including the eye candy Aero theme and support for ReadyBoost virtual RAM drives.

The first official response to complaints didn't help much: Disgruntled patrons were advised to try to revalidate on Tuesday. But even when the problem was fixed, mid-Saturday afternoon, Vista clients still had to revalidate their Windows installations before they could ReadyBoost their way back into Aero.

OK, so this was a relatively mild issue in engineering terms, and strictly speaking, it was caused by human error. But the error in question was deploying buggy, untested software, and when you factor in the number of people affected, the level of anger induced and the knock-on effect of bad publicity, it was more severe than it seems at first glance.

1 2 3 4 5 6 7 Page 4
Page 4 of 7
IT buyer’s guide to business laptops
Shop Tech Products at Amazon