Why Software Testing Can't Spare You From IT Disasters

By Paul Rubens
March 12, 2014 08:43 AM ET

CIO - On the day of Facebook's IPO, a concurrency bug that lay hidden in the code used by Nasdaq suddenly reared its ugly head. A race condition prevented the delivery of order confirmations, so those orders were resubmitted repeatedly.

UBS, which backed the Facebook IPO, reportedly lost $350 million. The bug cost Nasdaq $10 million in SEC fines and more than $40 million in compensation claims, not to mention immeasurable reputational damage.

So why was this bug not discovered during testing? In fact, how did it never manifest itself at all before that fateful day in 2012?

Race Conditions Present Concurrency Time Bomb

The answer is that some bugs that occur in concurrent software, race conditions among them, can't be reliably detected by testing. Ten tests wouldn't be enough. Nor would 100, or even 1,000.
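
A rough back-of-the-envelope calculation shows why more test runs barely help. The sketch below assumes, purely for illustration, that the bug fires independently on one run in a million; that figure and the function name chance_all_tests_pass are hypothetical and do not come from the Nasdaq incident.

```python
def chance_all_tests_pass(failure_rate, test_runs):
    """Probability that every one of `test_runs` independent runs misses the bug."""
    return (1.0 - failure_rate) ** test_runs


# Assumed per-run failure rate, chosen only to illustrate the arithmetic.
ONE_IN_A_MILLION = 1e-6

for runs in (10, 100, 1_000):
    hidden = chance_all_tests_pass(ONE_IN_A_MILLION, runs)
    print(f"{runs:>5} runs: {hidden:.4%} chance the bug stays hidden")
```

Under that assumption, even 1,000 clean test runs leave roughly a 99.9 percent chance that the bug was never triggered at all.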

A concurrent application with a race condition is like a time bomb in your organization waiting to explode. It may chug along perfectly for years before a particular set of circumstances causes it to fail spectacularly.

Here's the problem in a nutshell. To get high performance and low latency, application code runs on two or more processor cores, with multiple streams of instructions running at the same time. One stream may be writing data to memory, and another stream may be reading it.

Usually, the write will occur before the read. But just occasionally, the stream that's responsible for the write won't get to that point in its execution in time. The other stream will get its read in first. That's a race condition: The relative speed at which each stream executes affects the result.
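
The write-versus-read race described above can be sketched in a few lines of Python. This is a minimal illustration, not the Nasdaq code: the function run_once, the value 42 and the artificial writer_delay are all hypothetical. With synchronized=False the reading stream takes whatever happens to be in memory; with synchronized=True it waits on a threading.Event until the write has happened.

```python
import threading
import time


def run_once(writer_delay, synchronized):
    """Return what the reading stream observed in the shared location."""
    shared = {"value": None}      # stands in for the shared memory location
    observed = []
    written = threading.Event()   # only consulted when synchronized=True

    def writer():
        time.sleep(writer_delay)  # simulates the writing stream being held up
        shared["value"] = 42
        written.set()             # signal that the write is complete

    def reader():
        if synchronized:
            written.wait()        # block until the write has happened
        observed.append(shared["value"])

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return observed[0]


if __name__ == "__main__":
    print("unsynchronized:", run_once(0.2, synchronized=False))
    print("synchronized:  ", run_once(0.2, synchronized=True))
```

When the writer is delayed, the unsynchronized reader will normally see the stale None rather than 42, while the synchronized reader sees 42 however long the writer takes. The fix works because the outcome no longer depends on which stream is faster.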

Concurrent Apps Don't Let You Dictate What Runs When or Where

On a single-core processor that runs each stream to completion in turn, that can't happen. On a multicore processor, where streams are running concurrently, surely the outcome will always be the same? Surely the same stream will always win the race? Unfortunately, that's not always the case. Concurrent applications display non-deterministic behavior. They don't always yield the same results.

To understand why, bear in mind that a developer doesn't have control over all parts of the environment in which an application will run. Execution is determined by a low-level scheduler that decides which bit of a program runs when. The coder doesn't have access to this. Don't forget, too, that hardware does things such as prefetch data and instructions and move information to and from caches.
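
One way to picture the scheduler's role is to list every order in which it could legally interleave two streams. The sketch below is a simplified model under stated assumptions, not a description of any real scheduler: the step names ("prep-1", "read", "write") and the helpers schedules and reader_sees are hypothetical, and each step is treated as indivisible.

```python
from itertools import combinations


def schedules(writer_steps, reader_steps):
    """Yield every interleaving that preserves each stream's own step order."""
    total = len(writer_steps) + len(reader_steps)
    for writer_slots in combinations(range(total), len(writer_steps)):
        slots = set(writer_slots)
        w, r = iter(writer_steps), iter(reader_steps)
        yield [next(w) if i in slots else next(r) for i in range(total)]


def reader_sees(schedule):
    """Replay one schedule and return the value the reader picked up."""
    memory = None   # nothing has been written yet
    seen = None
    for step in schedule:
        if step == "write":
            memory = 42
        elif step == "read":
            seen = memory
    return seen


# Hypothetical streams: the writer does one write; the reader does some
# preparation and then reads the shared location.
WRITER = ["write"]
READER = ["prep-1", "prep-2", "prep-3", "read"]

outcomes = [reader_sees(s) for s in schedules(WRITER, READER)]
stale = outcomes.count(None)
print(f"{stale} of {len(outcomes)} possible schedules deliver stale data")
```

In this toy model only one of the five possible schedules lets the read slip in ahead of the write. Which schedule actually runs is the scheduler's choice, not the developer's, so a test suite can pass repeatedly without ever exercising the bad one.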

Originally published on www.cio.com.
This story is reprinted from CIO.com, an online resource for information executives. Story Copyright CXO Media Inc., 2012. All rights reserved.