Put Your IT Eggs In Different Baskets

The Sept. 11, 2001, terrorist attacks on the U.S. fundamentally changed the way some IT managers think about disaster recovery.

"It's no longer a matter of planning what to do should fire or flooding prevent access to buildings," says Bob Fucito, vice president of crisis management and business continuity at investment banking firm BNP Paribas. Today, businesses have to prepare for the ultimate security risk: what to do when people and buildings are intentionally targeted and destroyed.

Fucito should know. His duties include managing disaster recovery for Paris-based BNP Paribas' North American operations. And he says he's thankful that his company's executives supported the creation of a disaster recovery plan that emphasizes distribution of IT resources - two years before the Sept. 11 attacks. The company had to evacuate its New York City building after the attacks, but Fucito says having two separate data centers and a contract with a hot-site recovery provider put BNP Paribas in a better position to continue doing business.

BNP Paribas isn't alone in thinking that having IT resources in one building or on a single network isn't a good idea. Other major organizations, such as The Boeing Co., United Air Lines Inc., the Chicago Board of Trade and the U.S. Postal Service, try to mitigate the risk to IT resources by distributing data, applications and network infrastructure. They also have redundant communications links at the ready.

All of those organizations have the same goal: to quickly recover or even seamlessly continue doing business when disaster strikes. But they have different ways to accomplish it. Here are four approaches that major companies are using to stay prepared.

1. Redundancy and multiple routes: UAL Loyalty Services Inc. in Schaumburg, Ill., an online customer service unit of United Air Lines parent UAL Corp., is installing duplicate systems at two company-owned and -operated data centers. Both are in the Chicago area, says Igor Rafalovsky, director of networking and security, but the facilities are geographically separated.

A metropolitan-area network capable of gigabit speeds, known as a GigaMAN, connects the two centers, Rafalovsky says. Moreover, each data center is connected over T3 lines running to separate Private Network Access Points (P-NAPs), which are Internet backbone connection points owned and operated by Internap Network Services Corp. in Seattle.

And even at the P-NAPs, traffic going to and from the two UAL data centers runs across multiple Internet backbones from different providers, such as Sprint Corp., WorldCom Inc. and others. A P-NAP may have up to six or eight backbone providers online and available at any given time.

Both UAL data centers host Web servers, applications and databases. Disk storage is synchronized in real time over the GigaMAN, and both data centers are online all the time. "In the case of a catastrophic failure of one data center, the other one just picks up the traffic, in many cases without interruption . . . or manual intervention," Rafalovsky says.
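The active-active arrangement Rafalovsky describes can be pictured with a small sketch. This is only an illustration under stated assumptions, not UAL's actual setup: the `DataCenter` class, the `route_requests` function and the names `dc_a` and `dc_b` are all hypothetical, and real traffic steering would happen in network gear rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    # Hypothetical model of one site; "healthy" stands in for a real health check.
    name: str
    healthy: bool = True


def route_requests(request_ids, centers):
    """Spread requests round-robin across whichever centers are healthy."""
    live = [c for c in centers if c.healthy]
    if not live:
        raise RuntimeError("no data center available")
    return {rid: live[i % len(live)].name for i, rid in enumerate(request_ids)}


centers = [DataCenter("dc_a"), DataCenter("dc_b")]

# Both sites online: traffic is shared.
normal = route_requests(range(4), centers)

# One site fails: the survivor picks up all traffic with no manual step.
centers[0].healthy = False
after_failure = route_requests(range(4), centers)
```

Because both sites already carry live traffic and hold synchronized data, losing one changes only where requests land, which is why the cutover can happen without intervention.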

2. Outsourced hot sites: When BNP Paribas IT employees evacuated their building in New York in response to the terrorist attacks, they moved to the company's other data center in New Jersey to continue operations. Even so, Fucito says his firm also has a contract with New York-based SchlumbergerSema to provide off-site hot sites.

Hot sites duplicate the mission-critical parts of a company's IT systems in secure buildings miles away from the primary sites. IT workers can go to hot sites to initiate recovery or simply resume work.

John Kersley, SchlumbergerSema's vice president of business recovery, describes how it works: A corporate customer configures its own data centers to automatically mirror data and applications to the appropriate hot-site recovery center (or centers). That company's IT employees are assigned physical positions (desks and workstations) at a specific center and instructed on how to get there if there's a crisis. When the company's workers are in place at the recovery center, it becomes a matter of patching the data through to the off-site desktops.
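The sequence Kersley outlines (mirror continuously, pre-assign seats, then patch data through to whoever arrives) might be sketched roughly as follows. Everything here is a simplified assumption made for illustration: the `MirroredSite` class, the `activate_hot_site` function and the sample worker and desk names are hypothetical and do not describe SchlumbergerSema's systems.

```python
class MirroredSite:
    """Hypothetical primary data center that mirrors every write to a hot site."""

    def __init__(self):
        self.primary = {}
        self.hot_site = {}

    def write(self, key, value):
        self.primary[key] = value
        self.hot_site[key] = value  # automatic mirror to the recovery center


def activate_hot_site(site, seat_plan, arrived):
    """Give each arrived worker's pre-assigned desk a view of the mirrored data."""
    return {
        seat_plan[worker]: dict(site.hot_site)
        for worker in arrived
        if worker in seat_plan
    }


site = MirroredSite()
site.write("trade-1001", "settled")
site.write("trade-1002", "pending")

# Seats are assigned ahead of time, before any crisis.
seat_plan = {"alice": "desk-12", "bob": "desk-13"}

# Disaster: the primary site is lost, and only one worker has arrived so far.
site.primary.clear()
desks = activate_hot_site(site, seat_plan, arrived=["alice"])
```

The point of the sketch is the ordering: the mirroring and the seat plan exist before the crisis, so recovery is reduced to connecting people to data that is already in place.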

Hot sites are especially appealing to financial services organizations like BNP Paribas and the Board of Trade Clearing Corp., the clearinghouse for the Chicago Board of Trade, which has a hot-site contract with SunGard Data Systems Inc. in Wayne, Pa.

The concept also has value for major retailers. For example, Leeds, England-based ASDA Group Ltd. - a chain of food and clothing superstores owned by Wal-Mart Stores Inc. in Bentonville, Ark. - has an agreement with SchlumbergerSema to send select members of its IT staff to a global business recovery center if a disaster closes ASDA's own IT facilities.

3. Blend of internal and external redundancy: SunGard and SchlumbergerSema say the trend is toward using hot sites for disaster recovery. But Damian Walch, vice president of consulting at T-Systems Inc. in Lisle, Ill., sees the trend heading in the opposite direction.

"Companies are looking at internalizing their disaster recovery systems and moving away from hot-site providers," Walch says. However, he acknowledges that the hot-site idea won't go away anytime soon and that disaster recovery strategies often involve a blend of approaches.

In fact, extremely large and diverse organizations, particularly those using mainframes in addition to PC servers, foster redundancy through a mix of multiple in-house data centers and mirrored hot sites.

Chicago-based Boeing, for example, has to consider the specific needs of business units and the communication challenges that come with having a multitude of far-flung locations.

"Distributed hot-site contracts tend to be more expensive with mainframe environments. We try to consolidate and centralize IT but also avoid the risk of too many megacenters . . . by having geographic separation [of IT facilities]," says Steve Guzek, Boeing's program manager for disaster recovery.

Guzek maintains that focusing on networks is the key to eliminating single points of failure.

4. Satellite backup: Bob Otto, vice president of IT at the U.S. Postal Service (USPS) in Washington, says he could see the smoke from his office after the aircraft struck the Pentagon on Sept. 11. "We then evacuated our computer center of our Washington facility and set it up for remote management from our Raleigh [N.C.] disaster center and immediately instructed our data centers in California and Minnesota to begin backing up to Raleigh," Otto says.

Then Otto's group learned that the New York attacks had knocked out the frame-relay links connecting facilities in New York to the postal service's wide-area network. So the USPS pointed its VSAT satellite system toward New York, and the city's post offices were almost immediately back on the network.

It was all part of the plan, says Larry Wills, manager of distributed computing for the USPS. While frame-relay land lines are the primary network connection to thousands of post offices across the U.S., the USPS has 11,000 VSAT installations nationwide, Wills says. The VSAT services are provided by SpaceNet Inc. in McLean, Va.

Generally, the switch-over is automatic: When frame relay goes down, a satellite connection takes over. Wills says post offices generally don't even know when it has happened.
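The automatic switch-over Wills describes amounts to a primary/backup link selection rule. The sketch below is a minimal illustration of that rule, assuming simple up/down status flags; the `active_link` function and the link labels are hypothetical and are not drawn from the USPS or SpaceNet systems.

```python
def active_link(frame_relay_up, vsat_up):
    """Prefer the frame-relay land line; fall back to the VSAT satellite link."""
    if frame_relay_up:
        return "frame_relay"
    if vsat_up:
        return "vsat"
    return None  # both paths down: the office is off the network


# Hypothetical status readings over time: (frame_relay_up, vsat_up)
timeline = [(True, True), (False, True), (False, False), (True, True)]
chosen = [active_link(fr, sat) for fr, sat in timeline]
```

Since the rule always returns to frame relay once it is back, the office falls back to the land line as soon as it recovers, and staff at the post office never have to act on the change.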

Cope is a Computerworld contributing writer. He can be reached at jc@jamescope.com.

Special Report

The Security Action Plan

Copyright © 2002 IDG Communications, Inc.
