Rising From Disaster

These tips from users with well-worn recovery plans will help keep your business running during the most common disasters.

One key to keeping your business on its feet in a disaster is anticipating the sometimes cascading effects a catastrophe can have on your IT operation.

Take Miami-Dade County, for example. When a hurricane hit southern Florida in 1992, the county's data center lost power after its diesel generators overheated. High winds had broken water mains and lowered the water table, so the well water the generators relied on for cooling ran out. IT managers later had air-cooled generators installed.

One of the problems with disaster recovery, experts say, is that although most companies have plans for common scenarios (weather-related emergencies, headquarters lockouts and massive power outages), those plans aren't regularly tested or communicated to end users. In a recent survey of 283 Computerworld readers, 81% of respondents said their organizations have disaster recovery plans. But 71% of respondents at companies with plans said the plans hadn't been exercised in 2003.

It takes forethought to avoid a business shutdown during a disaster. Experts and users agree that there are steps you can take to increase your chances of coming through the most common disasters unscathed.

Weather-Related Emergencies

"If you look at why facilities fail [during weather disasters], it's all pretty predictable. They call it an act of God, and I call it an act of stupidity," says Ken Brill, executive director of The Uptime Institute in Santa Fe, N.M.

Hurricanes threaten Miami-Dade County's data center every year from June through November, yet IT managers still struggle with getting everyone to understand the importance of disaster planning. "The challenge we always have is to make sure the staff is completely involved and we have participation," says Ruben Lopez, director of the enterprise technology services department for the county.

Miami-Dade County gives itself a 56-hour window each year to test its disaster recovery plan by cutting over to its alternate data center and restoring data. It uses the exercise to find deficiencies, which it later corrects. "Business continuity and disaster recovery preparedness is all about figuring out what your deficiencies are and how you're going to fix them. It's not about how to get an A+ on paper," says Joe Torres, disaster recovery coordinator for Miami-Dade County.

Torres points out that a disaster recovery exercise tests the plan, not the people, "because you can't depend on the people being available." He adds, "You're going to give them a book with instructions, and they need to be able to follow that." One step Miami-Dade is considering in that direction is call-tree software that could help employees contact key managers in an emergency.
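A call tree spreads the work of an emergency phone chain: each person calls a short list of others, who in turn call their own lists. The article doesn't describe any particular product, so the following is only a hypothetical sketch of the idea. `notify_order` walks an assumed tree breadth-first, and when someone can't be reached, the caller takes over that person's list, echoing Torres's point that a plan can't depend on specific people being available.

```python
from collections import deque


def notify_order(tree, root, unavailable):
    """Return the order in which people are reached by a call tree.

    tree maps each person to the people they are responsible for calling
    (assumed to be a tree, with no cycles). If a callee is unreachable,
    the caller phones that person's callees directly (skip-level fallback).
    """
    reached = [root]
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        pending = deque(tree.get(caller, []))
        while pending:
            callee = pending.popleft()
            if callee in unavailable:
                # Caller inherits the unreachable person's call list.
                pending.extend(tree.get(callee, []))
            else:
                reached.append(callee)
                queue.append(callee)
    return reached
```

With a hypothetical tree of `{"director": ["mgr_a", "mgr_b"], "mgr_a": ["ops1", "ops2"], "mgr_b": ["dba1"]}` and `mgr_a` unreachable, the director calls `ops1` and `ops2` directly, so no one below the missing manager is dropped from the chain.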

Walter Hatten, senior vice president and technical services manager at Hancock Bank in Gulfport, Miss., has focused on consolidating his server farm and creating a redundant communications network for an area of the country that gets hit or brushed by a hurricane every three and a half years. The 100-branch bank, with headquarters on the Gulf of Mexico, is consolidating 500 servers onto a Linux-based mainframe to reduce recovery time in a disaster.

"Just the sheer magnitude of rebuilding 500 servers puts us at risk for not being able to do it quickly enough," says Hatten, who chose Linux for its open standards and scalability. He says the mainframe will speed data recovery, cutting the time needed to restore data from days to hours.

Headquarters Lockouts

Maria Herrera is chief technology officer at Patton Boggs LLP, a Washington-based law firm with 400 attorneys specializing in international trade law. Because of the firm's proximity to the U.S. Capitol building, one constant concern is a building lockout brought on by terrorist threats, she says.

Herrera has set up duplicate operating environments in several remote offices and has contracted with two disaster recovery vendors: SunGard Data Systems Inc. in Wayne, Pa., for server recovery and workstation services, and AmeriVault Corp. in Waltham, Mass., for data backup.

In January, AmeriVault installed its CentralControl interface on desktops and an agent on each of Patton Boggs' servers. After completing an initial full backup of all data, AmeriVault now performs daily incremental backups of deltas, or changes, to disaster recovery centers in Waltham and Philadelphia. In an emergency, data restores can be performed remotely, even from home, by administrators using a point-and-click function on a Web portal provided by AmeriVault, or data can be shipped on tape for large restores.
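The article doesn't say how AmeriVault detects deltas, so the following is a generic illustration only, not its CentralControl software. It is a minimal sketch of file-level incremental backup, assuming local source and destination directories and a manifest of content hashes kept between runs. `incremental_backup` and `file_digest` are hypothetical names. An empty manifest makes the first run behave like the initial full backup; later runs copy only new or changed files.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_backup(source, dest, manifest):
    """Copy only files that are new or changed since the previous run.

    manifest maps relative paths to the digest recorded last time.
    Returns (updated_manifest, list_of_copied_relative_paths).
    Deleted source files are not handled in this sketch.
    """
    source, dest = Path(source), Path(dest)
    new_manifest = {}
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source).as_posix()
        digest = file_digest(path)
        new_manifest[rel] = digest
        if manifest.get(rel) != digest:
            # New or modified file: this is the "delta" to ship off-site.
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(rel)
    return new_manifest, copied
```

Hashing file contents rather than trusting modification times costs extra reads, but it avoids missing changes when timestamps are unreliable; production backup tools typically work at a finer, block-level granularity.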

"Every month or couple of months, we access several documents and download them from AmeriVault to test the system," says Herrera. During full testing, she spends 16 hours recovering full data sets. "We're able to restore everything within the firm in about 10 hours," she says.

Herrera also suggests involving all IT personnel in the disaster recovery testing process, because in an emergency, you never know who might be available to help. She has trained employees in all four satellite offices around the country on disaster recovery procedures.

SunGard also has several facilities where IT personnel and lawyers can meet to continue work in the event of a headquarters lockout, Herrera says.

Officials at Mizuho Capital Markets Corp., a subsidiary of the world's second-largest financial services firm, Mizuho Financial Group Inc. in Tokyo, say that some of the most effective disaster recovery tools are the simplest.

For example, when a protest kept employees from entering the firm's Times Square headquarters late last year, IT managers passed out laminated business cards with a directory of managers' home phone numbers.

Doug Lilly, a senior telecommunications technologist at the Delaware Department of Technology and Information, says his agency has three data centers that support about 20,000 state employees. The department uses EMC Corp.'s Symmetrix Remote Data Facility to replicate data among the data centers. It also uses backup software from Oceanport, N.J.-based CommVault Systems Inc. as a central management tool.

"If this site were bombed ... we'd have servers running to replace them, but we'd still have to restore data from tapes," Lilly says. "CommVault's software transfers between 60GB and 65GB of data per hour. It would be a few hours before we got people up online."

Lilly's IT team also keeps a copy of disaster recovery procedures at home. "Team leaders notify everyone, and we carry cell phones and BlackBerries that are on redundant networks," he says. "It's a pretty unified messaging platform ... that ties data, voice, fax and video into one application. They can get hold of us anytime, anywhere."
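Lilly's "few hours" estimate is a matter of dividing data volume by transfer rate. The article doesn't say how much data Delaware would need to restore, so the following minimal sketch takes the volume as an input and uses the 60GB to 65GB per hour he cites as default rates. `restore_hours` is a hypothetical helper, and it counts raw transfer time only.

```python
def restore_hours(data_gb, slow_rate_gb_per_h=60.0, fast_rate_gb_per_h=65.0):
    """Return (best_case, worst_case) hours to transfer data_gb.

    Default rates are the 60GB-65GB per hour cited for CommVault's software.
    Tape handling, verification and application restarts are not included.
    """
    if data_gb < 0:
        raise ValueError("data_gb must be non-negative")
    if slow_rate_gb_per_h <= 0 or fast_rate_gb_per_h <= 0:
        raise ValueError("rates must be positive")
    return data_gb / fast_rate_gb_per_h, data_gb / slow_rate_gb_per_h


if __name__ == "__main__":
    # Hypothetical 200GB data set; not a figure from the article.
    best, worst = restore_hours(200)
    print(f"{best:.1f} to {worst:.1f} hours")
```

For that assumed 200GB, the transfer alone takes roughly 3.1 to 3.3 hours, in line with "a few hours" before users are back online.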

Massive Power Outages

Edward Koplin, an engineer at Jack Dale Associates PC, an engineering firm in Baltimore, says a lack of disaster testing is the No. 1 cause of data center failures during a blackout. Koplin suggests that companies test their diesel generators often and at full load for as long as they're expected to be in use during a blackout.

The Uptime Institute's Brill adds to that advice: Always prepare for a blackout with at least two more generators than needed, and test them by literally pulling the plug. "I would test it for as long as I expected it to work under load. I'd do that at least every two or three years. And I would run it in the summer," Brill says.
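Brill's sizing rule reduces to simple arithmetic: work out how many generator units the load requires, then add two spares. The following minimal sketch assumes a hypothetical 1,800 kW critical load and 500 kW units; neither figure comes from the article, and `generators_required` is an illustrative name.

```python
import math


def generators_required(load_kw, unit_capacity_kw, spares=2):
    """Units needed to carry the load, plus spare units (N+2 by default)."""
    if load_kw <= 0 or unit_capacity_kw <= 0:
        raise ValueError("load and unit capacity must be positive")
    # Round up: a partial unit of load still needs a whole generator.
    needed = math.ceil(load_kw / unit_capacity_kw)
    return needed + spares
```

With those assumed figures, four units carry the load and the rule calls for six. Two spares let one unit fail during an outage while another is out for maintenance.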

Jim Rittas, a security administrator responsible for networking at Mizuho, says the company can now perform full data restores after blackouts or other disasters in an hour instead of two days because it now mirrors its data to a New Jersey office that's also an active work site. "The other thing we did was diversify our Internet connections. Internet connections now flow in and out of New York and New Jersey, where before we had only one, in New York," Rittas says.

Needham, Mass.-based research firm TowerGroup recommends turning parts of disaster recovery or business continuity data centers into profit centers by adopting an active/active operations model. Traditionally, companies have set up an active primary data center and an unmanned backup site. An active/active model eliminates the need for IT staffers to relocate in a disaster because they're permanently stationed at the disaster recovery site, which also runs active business applications.

Integrating disaster recovery IT assets and personnel into operations budgets across geographically dispersed data centers will also help blur the line between disaster recovery and operations spending.

It's best to have a complete copy of your data in an alternate site at all times, "not just some of it," says Wayne Schletter, associate director of global technology at Mizuho Capital Markets. "You don't want to be piecing things together after something happens. You just want to be ready to go."

Survey Snapshots

Does your organization have a disaster recovery plan? (Yes: 81%)
BASE: Online survey of 283 IT professionals

Was your company's disaster recovery plan exercised in 2003? (No: 71%)
BASE: Online survey of 227 IT professionals at organizations that have a disaster recovery plan

Source: Computerworld, Framingham, Mass., February 2004

Special Report

Preparing For The Worst


Copyright © 2004 IDG Communications, Inc.
