A matter of minutes

Emergencies in IT often arise outside the realm of technology, and they're unpredictable, so developing a rapid-response disaster recovery plan for every contingency is impossible.

Nevertheless, rapid-response disaster recovery plans are joining security recovery and business continuity planning as staples in a chief technology officer's repertoire against potential threats to data and operations. The goal of an IT rapid-response plan is to provide a framework in which the chief technology officer can quickly react, respond and steer a predetermined course of action to minimize losses when an event occurs.

The key to that success, says Alan Lloyd Paris, a partner at Capco, a financial IT consultancy in New York, is to consider intangible elements rather than a strict set of rules to act upon at a moment's notice.

"The idea is to plan around a particular set of outcomes, as opposed to planning for any particular emergency," Paris says. "You can't plan for everything, so you have to develop a plan that's flexible and that takes a look at a tiered set of problems."

For instance, Paris says, rather than planning recovery based on certain external threats -- such as a bomb, or a chemical or biological attack -- use a simple triple-tiered approach: Plan what to do when building access is denied, what to do when a certain floor that's needed to transact business is closed and how to recover from a particular system outage.
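The outcome-based approach Paris describes can be pictured as a small lookup: many possible causes collapse into a few outcomes, and each outcome has one plan. The following is only a minimal sketch of that idea. The three outcomes come from Paris's description, but the cause names, the playbook steps and the function name are hypothetical placeholders, not anything Capco or the article prescribes.

```python
from enum import Enum


class Outcome(Enum):
    """The three tiers Paris describes, keyed by outcome rather than cause."""
    BUILDING_ACCESS_DENIED = 1
    FLOOR_CLOSED = 2
    SYSTEM_OUTAGE = 3


# Hypothetical playbook: these steps are illustrative placeholders only.
PLAYBOOK = {
    Outcome.BUILDING_ACCESS_DENIED: ["notify staff", "activate alternate site"],
    Outcome.FLOOR_CLOSED: ["relocate affected desks", "reroute phones"],
    Outcome.SYSTEM_OUTAGE: ["fail over to backup system", "verify data"],
}

# Hypothetical mapping: different causes lead to the same few outcomes.
CAUSE_TO_OUTCOME = {
    "bomb threat": Outcome.BUILDING_ACCESS_DENIED,
    "chemical attack": Outcome.BUILDING_ACCESS_DENIED,
    "fire on trading floor": Outcome.FLOOR_CLOSED,
    "server failure": Outcome.SYSTEM_OUTAGE,
}


def response_steps(cause: str) -> list[str]:
    """Return the playbook steps for the outcome that a cause maps to."""
    outcome = CAUSE_TO_OUTCOME[cause]
    return PLAYBOOK[outcome]
```

In this sketch, a bomb threat and a chemical attack return the same steps, because both lead to the same outcome: the building cannot be entered. Covering a new kind of emergency means adding one entry to the cause mapping, not writing a new plan.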

The importance of rapid-response tactics was pushed to the forefront Sept. 11. The sheer mass destruction caused by the terrorist attacks on the World Trade Center and the Pentagon forever altered perceptions of the complex web of variables that might be affected by a major disruption.

Tom Moogan, director of global general services for Citigroup Inc.'s corporate and investment banking businesses in New York, says that prior to Sept. 11, his company's assumption was that a single power grid failure was the only realistic disaster for which to plan. But times have changed.

Citigroup lost 472 file servers and 4,300 workstations when 7 World Trade Center was destroyed, and the financial firm had to evacuate 16,500 of its employees in the aftermath of the attacks, Moogan says. Citigroup lost 1.3 million square feet of property.

Moogan says that 800 of the 2,550 disaster recovery plans he manages were implemented during the disaster, and the company's foreign exchange desk was transferred to London.

Although all Citigroup employees were up and running the next day, Moogan says, implementing the rapid-response measures provided valuable insight for future disaster planning. "The issues we faced were significant problems with counterparties. Some of the disaster recovery plans of the firms that we trade with and close with were not as robust as we had assumed," he says. "Therefore, we had significant issues on our balance sheet we had to resolve."

Damage to the Verizon Communications building on Wall Street hampered New York's voice infrastructure. Citigroup's decision to invest in BlackBerry wireless devices from Research In Motion Ltd. in Waterloo, Ontario, proved critical, as the devices became the primary mode of communication on the day of the attack.

Wireless, Local and Prepared

Analysts say more recovery efforts were managed on BlackBerrys in the days following the terrorist attacks than on any other computing device. This shift signals a crucial relationship between effective rapid response and availability of wireless technology devices and wireless LANs.

"What firms really need to do is have a central site for employees, whether a Web site or a call-in line. ... A lot of these firms found that they couldn't reach people," Capco's Paris says.

Establishing personal familiarity and solid relationships with local authorities and rescue organizations could also pay dividends, should immediate assistance ever be required.

"I think critical [rapid] response worked where firms had already made contact with the mayor's office and the police and fire departments. Sept. 12 was not the time to make your first contacts [with them]," Paris notes.

To ease the response that must be carried out after an unforeseen event, Moogan says, Citigroup performs crisis management planning for less-common disaster recovery areas such as application development, transportation and security protocol settings.

The CTO also plays a pivotal role in setting the rapid-response bar and provides an example of how to best deal with high-pressure situations while exploring the safest and quickest route for technological asset retention.

"People should take the track of learning from [those] who responded well during this crisis, regardless of the level of position in the organization. [If it happens again], you'll need to rely on these people," Moogan says. "People who stay have certain things they have to get done -- see who can respond well in a crisis and be innovative. A lot of people fall back on routine during a crisis, when routine is not what is required."

Rapid Product Ramp-up

Rapid-response preparation for some businesses may result in the ramp-up of products or applications weeks or months ahead of schedule.

Graham Albutt, president of the business technology group at Reuters in New York, says employees in Geneva worked through the night on Sept. 11 to produce a new product, Reuters Market Monitor. The Web-based trading-floor tool, provided free to financial and trading institutions at the time, gave customers real-time quotes, news and, more important, a baseline communications application during the Sept. 11 power blackout.

Reuters also moved an unspecified number of products out of the prototype phase and delivered them to customers in the days following the disaster to improve access to data capabilities, Albutt adds.

Core Recovery

For some IT managers, disaster recovery is part and parcel of the organization's core mission, and enterprise CTOs can learn from those managers' recovery plans.

Doug Bolton, an information systems analyst at the San Diego Fire Department, says his team must be ready for rapid-response execution. All emergency 911 calls are rerouted to his organization in the event of a disaster.

For example, the department has its primary Stratus FT server -- which, by design, uses only 30% of capacity -- connected to two separate power grids, one at its central headquarters and another at its backup data center about 20 minutes away.

If the building is still standing, the location will be used as central headquarters for disaster recovery and crisis planning by the city's fire, police and civil service departments. An urban search-and-rescue team is also equipped with a radio and paging system designed to keep the department's system afloat and to transmit information online.

Bolton says laptops used by the fire department are equipped with applications written to handle typical emergency calls as well as disaster recovery measures, if necessary. Also, wireless connectivity and communication devices that generate their own power are crucial at a disaster site, he says.

"We're responsible to the citizens of San Diego. We cannot predict when they are not going to be feeling well or having a traffic accident," Bolton says. "It's not acceptable to give excuses. If it means we have to grab some equipment and we'll run to get to this call -- it is something we need to do."

Eugene Grygo contributed to this story.

This story, "A matter of minutes," was originally published by InfoWorld.

Copyright © 2002 IDG Communications, Inc.

  