Data Resurrection

New technologies, enlightened administration and just plain old-

Feb. 26, 1993. Brokerage workers at the bombed-out World Trade Center in New York carried wastebaskets stuffed with order tickets down 90 flights of smoke-filled stairs. Only the previous day's transactions had been entered and backed up in the company's computer systems, so without those tickets, a day's business would have joined the terrorists' casualty list.

That's how precious up-to-the-minute data can be in a fast-moving operation, says Jim Manias, a vice president at Advanced Systems Concepts Inc. in Hoboken, N.J.

In a time when customer call records are worth their weight in gold, data recovery is no longer a matter of following a regular backup regimen and occasionally grabbing an off-the-shelf utility to recover files from a trashed hard drive. Today's data managers emphasize redesigning storage systems to make data restoration faster, more reliable and more complete. Along with this comes extra planning, often extra service and additional personnel and infrastructure costs.

These days, data recovery systems are typically grouped under a new functional heading: business continuity. Their purpose: Make a copy of mission-critical data available at the speed needed to avoid business losses and at a cost commensurate with the data's value. Analysts say an effective recovery plan must examine business processes to identify the data that's minimally necessary for staying operational, how long that data can be unavailable without affecting customers and which applications are needed to access the data conveniently.
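The planning rule described above can be sketched as a simple selection: among the available recovery options, pick the cheapest one that restores data within the downtime the business can tolerate. The sketch below is only an illustration. The option names, recovery times and costs are hypothetical placeholders, not figures from any vendor or analyst.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecoveryOption:
    name: str
    recovery_hours: float  # time needed to make the data available again
    annual_cost: float     # yearly cost of keeping the option in place


def cheapest_adequate(options: List[RecoveryOption],
                      max_downtime_hours: float) -> Optional[RecoveryOption]:
    """Return the lowest-cost option that recovers within the tolerated downtime."""
    adequate = [o for o in options if o.recovery_hours <= max_downtime_hours]
    if not adequate:
        return None  # no option is fast enough; the plan needs rethinking
    return min(adequate, key=lambda o: o.annual_cost)


# Hypothetical figures, for illustration only.
OPTIONS = [
    RecoveryOption("hot standby", recovery_hours=0.01, annual_cost=500_000),
    RecoveryOption("replicated copy", recovery_hours=1.0, annual_cost=120_000),
    RecoveryOption("nightly tape", recovery_hours=8.0, annual_cost=30_000),
]

choice = cheapest_adequate(OPTIONS, max_downtime_hours=2.0)
print(choice.name if choice else "no adequate option")
```

In practice the tolerated downtime would be set per data set, since the point of the analysis is that not all data deserves the most expensive protection.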

The demand for fail-safe data recovery appears to be largely a response to the increase in around-the-clock commerce and the sheer amount of data being generated. According to "The Cost of Lost Data," a 1999 study conducted by Pepperdine University professor David Smith for enterprise storage vendor Legato Systems Inc., U.S. companies spent $11.8 billion to recover data during the previous year. In any given year, 6% of PCs will suffer serious data loss, usually because of human error, hardware or software failures or viruses, according to the study.

While the risk may be growing, not enough companies have business-continuity plans and procedures in place, according to surveys commissioned by Comdisco Inc., a vendor of such plans. According to its most recent Vulnerability Index released in November, 33% of 200 large organizations and government agencies said they lack disaster plans, down from 45% two years earlier. Comdisco found Internet-dependent companies especially vulnerable to data loss and system downtime.

The best, but most expensive, option is to run a mirror site that contains copies of applications and data, perhaps located at the other end of a leased line miles away from the main site, where a natural or man-made disaster is unlikely to strike simultaneously. Mirror sites are becoming more popular with high-volume e-commerce sites that can't risk even a few minutes of botched transactions and dead Web links. They can take over in seconds when equipment goes down.

Cheaper alternatives include shadowing, or replication, software like Remote Shadow from Advanced Systems Concepts and add-on software sold by enterprise database and storage-area network vendors. Shadowing captures drives' disk-write operations and sends them over remote links to drives at a second site. Another option, server clustering, either disperses the processing load so if one server fails, another can take over, or keeps mirrored servers running in parallel, making switchover nearly instantaneous. Traditional backup and restoration systems require a much longer turnaround, though vendors like EMC Corp. sell hardware and software that boost tape's effective transfer rate, driving recovery times down to a few hours for even large databases.
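The idea behind shadowing can be shown with a toy model: every write to the primary drive is also queued and later replayed, in order, on a drive at the second site. This is a minimal sketch of the general technique and not how Remote Shadow or any other product is implemented. The class names are hypothetical, and an in-memory queue stands in for the remote link.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class Disk:
    """A toy disk: a mapping from block number to block contents."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def write(self, block_no: int, data: bytes) -> None:
        self.blocks[block_no] = data


class ShadowedDisk:
    """Applies each write locally and queues a copy for the remote site."""

    def __init__(self, primary: Disk, remote: Disk) -> None:
        self.primary = primary
        self.remote = remote
        # Stand-in for the remote link: writes wait here until sent.
        self.pending: Deque[Tuple[int, bytes]] = deque()

    def write(self, block_no: int, data: bytes) -> None:
        self.primary.write(block_no, data)
        self.pending.append((block_no, data))

    def flush(self) -> int:
        """Replay queued writes on the remote disk in their original order."""
        sent = 0
        while self.pending:
            block_no, data = self.pending.popleft()
            self.remote.write(block_no, data)
            sent += 1
        return sent


primary, remote = Disk(), Disk()
shadow = ShadowedDisk(primary, remote)
shadow.write(0, b"order ticket 1")
shadow.write(7, b"order ticket 2")
shadow.flush()
print(remote.blocks == primary.blocks)
```

Whatever is still sitting in the queue when the primary site fails is the data at risk, which is why the speed of the remote link matters.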

Generally, the cheapest recovery is achievable with traditional data recovery tools like Symantec Corp.'s Norton Utilities and PowerQuest Corp.'s Lost and Found. They remain important lifesavers in many companies. Still, recovering data with such utilities can be slow, tedious and frequently unsuccessful, so many companies outsource the job to specialists like Ontrack Data International Inc., Data Recovery Labs and DriveSavers Data Recovery.

Some outsource the entire process, from planning to hardware installation to recovery. Three main vendors compete in this market: Comdisco Continuity Services, SunGard Recovery Services Inc. and IBM Business Continuity and Recovery Services. All offer yet another continuity option: mobile recovery trucks that can bring your data, and the hardware and applications needed to access it, to your door.

Outsourcing has taken yet another turn in recent months toward network storage centers that keep backups handy at the end of a high-speed data link. These "storage utilities," or storage service providers, were pioneered by Storage Networks Inc., says Rick Miller, an analyst at Cahners In-Stat Group in Newton, Mass. "You're pretty much guaranteed to never lose a single byte of data," Miller says. "Because bandwidth is becoming more economical, it's feasible for smaller companies to have a high-speed connection to a data center."

Storage service providers can help cut management and maintenance costs, which account for nearly 50% of the average company's storage outlay, by spreading personnel and resources over multiple customers' data, Miller says.

Overcoming earth, wind and water to keep data safe

Word came early in the evening on May 3, 1999: An F5-level tornado -- the deadliest kind, with winds exceeding 260 miles per hour -- had hit Oklahoma City. Kevin McDonald, director of information services at Tontitown, Ark.-based PAM Transportation Services Inc., feared the worst. Oklahoma City housed PAM's truck terminal and the dispatch center for the company's subsidiary, Choctaw Express Inc. McDonald had to keep that center up and running.

So he sent a damage-assessment team from Arkansas. The team arrived to find Oklahoma City a mass of debris and devastation. Although the twister barely kissed the dispatch center grounds, coming within only 200 yards of the metal-frame building, it had wreaked havoc.

Most of the contents, including some PCs, had been sucked out of the building. A diesel truck was hurled a half-mile away. Trailers exploded from the air-pressure changes. Windows were blown out of the office, and rain had drenched much of the electronic equipment. Fortunately, no employees were hurt: They had waited out the storm huddled in the long, narrow, four-ft.-deep grease pits used to service trucks.

The team wrapped up the communication equipment to protect it from rain. McDonald called SunGard Recovery Services Inc. in Wayne, Pa., and formally declared a disaster under the terms of PAM's service agreement.

SunGard contacted its Metro Recovery unit in Atlanta and sent a truck loaded with basic equipment configurations previously specified by PAM on the nearly 900-mile journey to Oklahoma City.

PAM called Little Rock, Ark.-based Alltel Corp. and purchased a duplicate phone system. A local interconnect company shipped it to Oklahoma City. In the meantime, incoming phone calls were routed to PAM headquarters.

When the maintenance director used a generator to restore power, McDonald discovered he was luckier than he originally thought: The office's frame-relay link, router and phone system still worked. But that didn't mean everything was back to normal. "The building was unusable," McDonald says. "There was no way anyone could work in there."

SunGard's Metro Recovery people arrived around 4 p.m. the day after the tornado, and "they basically picked up everything," McDonald says.

By 6 p.m., a bit more than 24 hours after everything was blown to pieces, the system was fully restored, and the dispatch center was back online.

Except for some initial confusion, McDonald says PAM's workflow was never seriously disrupted. That's largely because within an hour of the strike, PAM's in-vehicle satellite messaging and tracking system notified drivers that it would handle dispatching while Oklahoma City was down.

One factor, McDonald says, must be added into the disaster preparedness equation: Don't overlook the human element. When visualizing recovery scenarios, realize that employees may have overwhelming personal obligations to help family and friends during catastrophes, and that will limit the ability to staff a backup site internally.

Sharon Savings Bank wasn't as lucky last September. One of its bank buildings in Darby, Pa., sits next to a creek that was flooded by Hurricane Floyd's torrential rains. "It kind of took us by surprise," recalls network administrator Shirley Martin. "We didn't have much time to get outside ourselves."

The next day, workers found eight feet of water in the building. Twenty-two PCs were smashed to the floor by raging floodwaters. A nearby administrative building was also out of commission, so workers in Martin's building had to set up shop in a nearby mortgage office.

The most critical data was safe: The bank's main database was kept off-site at an Electronic Data Systems Corp. division in Florida. But important documents, policies and account balances created in Microsoft Office and specialized applications were on the hard drives of the lost PCs.

Paper copies and tape backups (the latter stored in a bank vault) weren't viable restoration sources, so Martin asked her local maintenance contractor to remove four soggy hard drives and gauge the odds of data recovery. The flood had deposited caustic substances on the drives, so the consultant recommended sending them to Ontrack Data International, where technicians in "clean rooms" could safely remove the drives' magnetic platters and use special instruments to read the remaining data.

Over the next two weeks, Ontrack shipped back CD-ROMs containing nearly all the original data. "I'd say it saved us about three or four months' worth of overtime work," Martin says.

Certifiable customers

The company: Verisign Inc., a Mountain View, Calif., supplier of online digital certificates, with 400 employees.

The data problem: Verisign servers must be available around the clock to handle requests for certificate authentication from customers who need such approvals to offer secure transactions at their Web sites. Corporate customers such as Ford Motor Co. and Hewlett-Packard Co. buy groups of certificates for internal security.

Reliability is mission-critical. "I think we realized it was a requirement of doing business," says John Ferguson, Verisign's director of production services. "Companies are outsourcing a part of their IT business to us," so strong assurance of around-the-clock availability "is critical to getting them to sign a contract," he says. In fact, it's specified in service-level agreements.

The solution: A "hot site" at an undisclosed East Coast location maintained by Comdisco Continuity Services provides the duplicated data and systems Verisign would need to stay online in case a disaster hit Mountain View. An Advanced Recovery Site (ARS) -- actually a 215-sq.-ft. caged area at Comdisco's site -- stores relevant data and what Ferguson calls "long lead-time" services: Internet service provider connections and links to merchants that would be hard to quickly restore. "It's a scaled-down, more consolidated view of our services," he says. An Oracle8.15 utility writes database transaction logs to the ARS, and NSI Software's Double-Take replicates only the data that has changed, saving on network bandwidth costs.
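Replicating only the data that has changed saves bandwidth because unchanged blocks never cross the link. One common way to find the changes, sketched below, is to compare per-block digests of the old and new copies and send only the blocks that differ. This is an illustration of the general idea under assumed names and a 4 KB block size. It is not a description of how Double-Take or the Oracle utility works.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Return the blocks of new that are absent from or differ in old."""
    old_digests = block_digests(old, block_size)
    changes: Dict[int, bytes] = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_digests) or digest != old_digests[index]:
            changes[index] = block
    return changes


def apply_changes(replica: bytes, changes: Dict[int, bytes], new_length: int,
                  block_size: int = BLOCK_SIZE) -> bytes:
    """Bring the replica up to date using only the changed blocks."""
    # Resize first: truncate if the data shrank, zero-pad if it grew.
    buf = bytearray(replica[:new_length].ljust(new_length, b"\0"))
    for index, block in changes.items():
        start = index * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


old = b"a" * 8192 + b"b" * 4096
new = b"a" * 4096 + b"X" * 4096 + b"b" * 4096
delta = changed_blocks(old, new)
print(sorted(delta), apply_changes(old, delta, len(new)) == new)
```

Here only one of the three blocks would be sent to the remote site, which is the source of the bandwidth savings.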

Staff at a nearby Verisign office were trained to perform the company's elaborate "key ceremonies" and other security safeguards. Comdisco also maintains a site that could take over Verisign's customer-service functions. "It's not an instant recovery," Ferguson says. "There is an element of manual changeover."

The results: After a monumental effort to set up the admittedly complex operation, Verisign hasn't had to use the ARS. "But I think we can sleep at night," Ferguson says.

Deloitte fights drive crashes in big notebook fleet

The company: Deloitte & Touche, a Big Five accounting firm based in New York.

The data problem: Senior PC LAN Technician Gino Ahn manages data recovery services for the firm's 3,500-plus notebook PCs, many of which hold hard-to-replace accounting information collected at client sites and entered in customized auditing software with complex links to Microsoft Excel spreadsheets. About every three weeks, a laptop (usually a standard-issue Toshiba Tecra 8000) has a data recovery problem that Ahn is called on to solve. "It's usually the hard drive that goes bad," says Ahn, adding that desktop drives fail at a much slower rate of approximately one per year.

The solution: The company has a service contract with Ontrack Data International. Ontrack charges $500 to $1,500 to recover data from drives shipped to its laboratories. Estimates cost approximately $100.

"Before they proceed with any recovery, they get back to us with costs and a list of the data that can be recovered," Ahn says.

Rescued files are returned within days on CD-ROMs shipped via overnight mail. Deloitte's information technology staff must then reintegrate the files, which depend heavily on a special index file and linked libraries for their operation.

The results: "There are some occasions where the data can't be recovered," Ahn says, but at least two-thirds of the time, the paying department opts for a full recovery effort, and 80% or more of the data is typically recovered.

Ahn says that in addition to saving the labor costs of recreating the data, a recovery often rescues an accountant's sanity. One recently needed a two-day turnaround to get data back in time for the weekend, when he planned to work feverishly to meet a deadline. Ahn says the $4,000 bill was worth every penny.

Essex is a freelance writer in Antrim, N.H.

Copyright © 2000 IDG Communications, Inc.
