Preparing for big -- and small -- IT emergencies

Verizon Wireless did an admirable job restoring telecommunications services after the Sept. 11 attacks, but the job could have been easier if backup data for the West Street facility that serves the New York Stock Exchange had been stored off-site.

"That was the single point of failure," says Dennis Elwell, executive director of business recovery and continuity services at Verizon Enterprise Solutions Group. Although Verizon's business recovery plan enabled it to restore 90% of the affected telecommunications service areas in less than a month, there are lessons to be learned from the experience.

The events of Sept. 11 have raised awareness about the need for disaster recovery plans -- not just to bounce back from outages caused by massive natural or man-made disasters, but also from day-to-day events such as software corruption and human error. What follows is advice from IT executives about whom to involve in creating a disaster recovery plan and what elements it should include.

The Michigan Department of Environmental Quality (DEQ) developed its disaster recovery plan during a nine-month period in 2000. Mike Hatfield, the DEQ's security officer, headed up the project with help from an outside consultant.

The DEQ employs 1,450 people, 63 of whom work in IT. The majority of its systems are kept at a data center in downtown Lansing, and each of the organization's 18 sites has one or more servers.

The first stage of creating the disaster recovery plan was risk assessment. Hatfield surveyed the DEQ's 10 departments to assign priorities for their systems. Most said they needed to have their systems restored within 72 hours, but one said 12 hours, and several requested a 24-hour recovery period. Hatfield was then able to design appropriate network recovery procedures.
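The prioritization step can be pictured as a simple sort by requested recovery window. The following is only a minimal sketch, not the DEQ's actual method: the department labels, system names and the `recovery_order` helper are hypothetical, and only the 12-, 24- and 72-hour windows come from the survey described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRequirement:
    department: str  # hypothetical label, not a real DEQ department
    system: str      # hypothetical system name
    rto_hours: int   # requested recovery window, in hours


def recovery_order(requirements):
    """Sort so the tightest recovery window comes first (ties by department)."""
    return sorted(requirements, key=lambda r: (r.rto_hours, r.department))


# Illustrative entries only; the windows mirror those mentioned in the survey.
requirements = [
    SystemRequirement("Dept A", "records database", 72),
    SystemRequirement("Dept B", "e-mail", 12),
    SystemRequirement("Dept C", "file shares", 24),
]

for r in recovery_order(requirements):
    print(f"{r.rto_hours:>3}h  {r.department}: {r.system}")
```

Ordering by recovery window is one straightforward way to decide which network recovery procedures to design and rehearse first.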

Chad Zemer, operations manager of the DEQ's Office of Automation Coordination, who provided all of the telecommunications, network and server information included in the plan, says to "make sure the business processes are at the forefront of a [disaster recovery] plan."

Jim Metzler, vice president of consultancy Ashton Metzler & Associates in Sanibel, Fla., suggests giving businesses options and prices for different levels of business continuity. Departments can decide how much risk they're willing to take and pay accordingly. For example, would organizations be willing to spend millions for a hot-standby site that's ready to take over for the data center when disaster strikes?

The DEQ opted for an off-site recovery service supplied by LiveVault Corp. in Marlboro, Mass. In December 2001, the DEQ began backing up its critical file data, databases, e-mail and Internet/intranet software to LiveVault's servers, which currently hold 400GB of the DEQ's data. The initial replication to LiveVault is done once, via a secured T3 connection over the Internet. After that, changes to the primary files are propagated to the backup system within 20 minutes.
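A 20-minute update window like this one can be monitored with a simple lag check. The sketch below is an assumption-laden illustration, not LiveVault's interface: the `stale_files` function, the file paths and the timestamps are all hypothetical, and the only figure taken from the article is the 20-minute limit.

```python
from datetime import datetime, timedelta

MAX_LAG = timedelta(minutes=20)  # update window cited in the article


def stale_files(last_modified, last_replicated, now, max_lag=MAX_LAG):
    """Return paths whose latest change has waited longer than max_lag for a backup copy."""
    stale = []
    for path, modified in last_modified.items():
        replicated = last_replicated.get(path)
        up_to_date = replicated is not None and replicated >= modified
        if not up_to_date and now - modified > max_lag:
            stale.append(path)
    return sorted(stale)


# Hypothetical timestamps for illustration only.
now = datetime(2002, 1, 15, 12, 0)
last_modified = {
    "mail/postoffice.db": now - timedelta(minutes=45),
    "files/report.doc": now - timedelta(minutes=5),
    "db/records.dat": now - timedelta(minutes=90),
}
last_replicated = {
    "mail/postoffice.db": now - timedelta(minutes=50),  # backup predates the change
    "db/records.dat": now - timedelta(minutes=80),      # backup is newer than the change
}

print(stale_files(last_modified, last_replicated, now))
```

A file changed only five minutes ago is not flagged, because it is still inside the allowed window; only changes that have gone unreplicated for more than 20 minutes are reported.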

The DEQ also has access to the state government's IT operations center, located 10 miles away, which acts as a hot-standby facility. The agency can restore systems in an emergency by retrieving saved copies from LiveVault over a secure Internet connection.

Zemer's team has used the service a few times to recover databases and the Novell GroupWise post office. Accidental deletions are the most common reason for data recovery, but on occasion the DEQ has also suffered from corrupt files or problems upgrading software.

Scot Nattrass, director of operations at Oncology Therapeutic Network (OTN) in South San Francisco, Calif., advises that companies test their plans and make sure their personnel know how to complete their responsibilities in an emergency. "Technical departments spend time making sure data is backed up or that people know how to access it, but training and testing can get overlooked," he says.

OTN, a subsidiary of Bristol-Myers Squibb Co., is a drug distributor that employs about 200 people. The firm uses Network Appliance Inc.'s SnapMirror replication tool to make real-time backups of its customer relationship and enterprise resource planning applications in Bristol-Myers Squibb's New Jersey data center. What's more, OTN established a backup call center about 100 miles away in the more seismically stable area of Sacramento.

Workers have access to management's disaster planning manual, and employees have been trained about what to do if an emergency arises. Customer service and order-processing staff have been given an emergency phone number to call for updates, and wallet-size reference cards provide phone numbers and other contact information.

At the DEQ, the CIO keeps a list of names and contact details for every IT staff member, and teams of people have been identified to perform certain functions in case of an emergency. Copies of the disaster recovery plan are kept on- and off-site.

When you assemble your plan, Metzler recommends, consider as many contingencies as possible. For instance, would you know how to get staff to your building, or how to tell them to stand by, in case of a disaster? How would you contact staff if there were a natural disaster and all the wired and wireless telephone networks were down? "There is no magic elixir. You should have a backup plan [for the real plan] -- what types of situations would there be for the [main] plan not to work?" Metzler says.

Some questions may be difficult to answer, but as Nattrass says, "The plan will never be complete. It is a living document." Test and update the plan as business progresses.

This story, "Preparing for big -- and small -- IT emergencies," was originally published by Network World.

Copyright © 2002 IDG Communications, Inc.
