A Dose of Reality

Nothing gives you a warts-and-all experience like testing your disaster recovery plan in the real world.

If you want to really test your disaster recovery plan, you have to get out from behind your desk and step out into the real world. Because in the real world, the backup site lost your tapes, your emergency phone numbers are out of date, and you forgot to order Chinese food for the folks working around the clock at your off-site data center.

"Unless it's tested, it's just a document," says Joyce Repsher, product manager for business continuity services at Electronic Data Systems Corp., an IT outsourcing and services provider in Plano, Texas.

How often should you test? Several experts suggest real-world testing of an organization's most critical systems at least once a year. In the wake of Sept. 11 and with new regulations holding executives responsible for keeping corporate data secure, organizations are doing more testing than they did 10 years ago, says Repsher. An exclusive Computerworld online survey of 224 IT managers supports that assertion, indicating that 71% had tested their disaster recovery plans in the past year.

Desktop disaster recovery testing involves going through a checklist of who should do what in case of a disaster. Such walk-throughs are a necessary first step and can help you catch changes such as a new version of an application that will trigger other changes in the plan. They can also identify the most important applications, says Repsher, "before moving to the expense of a more realistic recovery test."

Companies do desktop tests at different intervals. Fluor Fernald Inc., which is handling the cleanup of a government nuclear site in Fernald, Ohio, does both desktop and physical tests of its disaster response plans every three years "or anytime there's a significant change in our hardware configuration," says Jan Arnett, manager of systems and administration at the division of engineering giant Fluor Corp.

What's Critical?

Determining which systems need a live test is also critical. Fluor Fernald schedules live tests on only about 25 of its most critical applications and then tests only one server running a representative sample of these applications, says Arnett. "We feel if we can bring one server up, we can bring 10 servers up," he says, especially since the company uses standard Intel-based servers and networking equipment.

The most common form of live testing is parallel testing, says Todd Pekats, national director of storage alliances at IT services provider CompuCom Systems Inc. in Dallas. Parallel testing recovers a separate set of critical applications at a disaster recovery site without interrupting the flow of regular business. Costly and rarely done, the most realistic test is a full switch of critical systems during working hours to standby equipment, which Pekats says is appropriate only for the most critical applications.

Businesses that are growing or changing quickly should test their disaster recovery plans more often, says Al Decker, executive director of security and privacy services at EDS. He cites one firm that has grown eightfold since 1999, when its disaster plan called for the recovery of critical systems in 24 hours. Today, just mounting the tapes required for those systems would take four to 10 days, he says.

Deciding how realistic to make the test "is a balance between the amount of protection you want" and the cost in money, staff time and disruption, says Repsher. As an organization's disaster recovery program matures, the tests of its recovery plans should become more challenging, adds Dan Bailey, senior manager at risk consulting firm Protiviti Inc. in Dallas. While the more realistic exercises provide more lessons about what needs improvement, he says, an organization just starting out with a rudimentary plan probably can't handle a very challenging drill.

Never assume that everything will go as planned. That applies to everything from having enough food and desks at a recovery site to keeping contact numbers up to date. Communications problems are common, but they're easily prevented by having every staff member place a test call to everyone on their contact list, says Kevin Chenoweth, a disaster recovery administrator at Vanderbilt University Medical Center in Nashville.

Also, never assume that the data on your backup tapes is current or that your recovery hardware can handle your production databases. Arnett found subtle differences in the drivers and network configuration cards on his replacement servers that forced him to load an older version of his Oracle database software to recover his data.

After each test, Chenoweth or his staffers review the results with the affected business units and develop specific plans (with timelines) for fixing problems.

Finally, Chenoweth says, thank everyone for their help, especially if the test kept them away from home. "If you've got a good relationship, they're more likely to be responsive" to the firm's disaster recovery needs, he says.

Scheier is a Computerworld contributing writer in Boylston, Mass. He can be reached at rscheier@charter.net. Additional reporting by Mitch Betts.


Survey Snapshot

When was your company’s disaster recovery plan last tested?

Less than a month ago 6%
One to three months ago 24%
Four to six months ago 18%
More than six months ago 16%
One year ago 7%
More than one year ago 10%
Don’t know 19%
Base: Online survey of 224 IT professionals at organizations that have a disaster recovery plan

Source: Computerworld, Framingham, Mass., February 2004
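
The 71% figure cited in the article can be checked against the snapshot. The short Python sketch below is only an illustration: it assumes that "tested in the past year" covers every answer up to and including "One year ago," which is a reading of the table and not something the survey states. The variable names are made up for the example.

```python
# Survey Snapshot percentages, copied from the table above.
snapshot = {
    "Less than a month ago": 6,
    "One to three months ago": 24,
    "Four to six months ago": 18,
    "More than six months ago": 16,
    "One year ago": 7,
    "More than one year ago": 10,
    "Don't know": 19,
}

# Assumption: these five answers are the ones that count as
# "tested in the past year" for the figure quoted in the article.
within_past_year = [
    "Less than a month ago",
    "One to three months ago",
    "Four to six months ago",
    "More than six months ago",
    "One year ago",
]

tested_past_year = sum(snapshot[answer] for answer in within_past_year)
total = sum(snapshot.values())

print(f"Tested in the past year: {tested_past_year}%")  # prints 71%
print(f"All answers combined: {total}%")                # prints 100%
```

Under that assumption the five answers add up to 71%, matching the article, and all seven answers account for 100% of respondents.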

Special Report

Preparing For The Worst

Copyright © 2004 IDG Communications, Inc.
