9/11: Top lessons learned for disaster recovery

A renewed focus on the workforce is the biggest change over the past decade

In the decade since the Sept. 11, 2001 terrorist attacks, physical security, human contingency planning and an evolution in technological capabilities have improved the odds that business can carry on during -- and after -- a disaster.

While some rules were imposed by the federal government, corporations have in general been doubling down on their own disaster recovery capabilities.

Internal cloud architectures, or virtualization, as well as the ability to run multiple live data centers with active failover, have decreased the time between system failures and data recovery points.

But perhaps the single biggest change to emerge in the post-9/11 world -- prompted in part by later natural disasters such as Hurricane Katrina -- has been a new focus on keeping workers working when corporate systems go down.

Workforce resilience

In the years since 9/11, corporations have been forced to consider more flexible work environments that allow employees to work remotely during a disaster through the use of virtual private networks (VPNs) or other means, such as hand-held devices like smart phones.

Gartner analyst Roberta Witty believes the most important 9/11 lesson may seem altruistic, but it's really about survival of the fittest: Companies have to care for their workforce.

Gartner uses the phrase "Workforce Resilience" to cover best practices that ensure workers have access to Internet services, power for mobile devices, use of VPNs and that call trees and mass notification services are in place.

"One thing that came out of that event was the long lapse of time between when it happened and when information could be distributed," Witty said. "So, being able to tell [workers] about the event became important. You want to communicate with them every hour, whether it's new information or not."

Emergency notification service companies such as Everbridge, SunGard, Omnilert and Federal Signal have seen a tremendous uptick in automated call tree services.

Municipal services have also stepped up their capabilities to help businesses and workers function during a disaster. When millions in the Northeast were left without power during Hurricane Irene last month, municipal offices that had power set up Internet cafes and mobile charging stations as a public service. By coordinating with a local Office of Emergency Management, companies can quickly find out how state and city officials are dealing with a disaster.

"Nothing gets done if you don't have the group of people who are on your disaster recovery teams ... able to come to your aid," Witty said. "FEMA has done a great job since 9/11 and Katrina at mobilizing state agencies down to the local level."

Public services and corporate disaster recovery teams have stepped up their use of social media, such as Facebook and Twitter, to keep employees informed and communicate with key players. Many companies have even created the position of social media officer to manage online communications and ensure corporate sites remain updated.

"It's also about controlling the rumors," Witty said.

In addition, some companies now consider having cots, flashlights, food and water on hand for employees who stay in the office and have a remote recovery site in operation to make sure they can restore critical systems as quickly as possible.

Risk management

Even in the aftermath of 9/11, IT managers said they had to fight for money to implement disaster recovery plans and technology.

What began with 9/11, but evolved through numerous cases of fraud and rogue trading, was the concept that risk management needed to be a part of disaster recovery planning.

"Chief risk officers who used to never ... look at the IT side of things, do talk more about IT risk as becoming part of something they need to incorporate as part of an enterprise risk management capability," said Rodney Nelsestuen, a senior research director at industry consultancy Tower Group.

The U.S. government saw to it that in the years following the terrorist attacks, the financial industry spent hundreds of millions of dollars upgrading internal systems to comply with the Patriot Act. That law required financial services companies to beef up their ability to flag suspicious transactions and customers.

"The fact that there's evil out there -- 9/11 drove that, but I don't think people were looking internally to that," Nelsestuen said.

According to Tower Group, after 9/11 about 39% of IT budgets went to integrating back-end systems; 34% was spent on new software; and 24% was used to upgrade IT infrastructures, such as server, network and storage systems. Another 2% was spent on outsourcing services with operators of customer databases, such as Regulatory DataCorp International LLC (RDC) in New York.

Firms such as Merrill Lynch, whose headquarters was located right next to ground zero and which lost its primary data center for six weeks, performed a gap analysis to determine what was missing and what might be needed to respond to another disaster.

Analysts today say regular gap analysis is still a key component to disaster preparedness.

Cantor Fitzgerald LP, a bond-trading firm located in the World Trade Center, lost 658 employees and its primary data center on Sept. 11. It was the worst-case scenario of what could happen.

"They were one of the major bond traders on the globe. We had not imagined the scope of that disaster," Nelsestuen said.

What was remarkable about the recovery effort with Cantor Fitzgerald was that its competitors jumped in to lend a hand and took over its bond trades so that the firm could continue operations as it recovered from the devastation.

"There was no one who planned that: 'If we have a disaster, will you do our processing and credit it to us?'" Nelsestuen said. "But, those are the kinds of things that came out of that level of disaster.... People [had] to start thinking about the human contingency that we'd never thought about before."

RPO and RTO

Business disasters are classified in three categories by Tower Group: natural, such as hurricanes and earthquakes; technological failures; and human, either on purpose or by accident. But no matter what causes a disaster, the nature of how best to recover is constantly being reexamined, Nelsestuen said.

"Companies are asking: 'How can we change our technology infrastructure to make it more recoverable and dynamic?' When failure occurs, your data is still preserved up to that point," he said.

Disaster recovery and business continuity today are often thought of in terms of recovery point objectives (RPO) and recovery time objectives (RTO). In other words, how much data is a company willing to lose if its systems go down, and how long can it afford to wait before those systems are restored?

For example, a company that synchronously replicates all backups to separate data centers that are actively up and running 24/7 has created an architecture with a tight RPO and RTO. A firm that allows data to be replicated off site asynchronously, or backed up only to tape, expects to lose some of the data being transmitted at the time of failure and assumes it will take longer to restore systems.
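The trade-off between those two setups can be put into rough numbers. The sketch below is only an illustration under stated assumptions: the `ReplicationPlan` class, the `meets_objectives` function and every figure in it (replication intervals, restore times, the 15-minute RPO and 60-minute RTO targets) are hypothetical and do not come from Tower Group or any company mentioned here. It treats the worst-case data loss as the gap between two replication runs, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPlan:
    """Hypothetical description of how a firm protects its data."""
    name: str
    replication_interval_min: float  # 0 means synchronous replication
    restore_time_min: float          # estimated time to bring systems back


def meets_objectives(plan: ReplicationPlan, rpo_min: float, rto_min: float) -> bool:
    """Return True if the plan stays within both objectives.

    Simplifying assumption: the worst-case data loss equals the time
    between two replication runs.
    """
    worst_case_loss_min = plan.replication_interval_min
    return worst_case_loss_min <= rpo_min and plan.restore_time_min <= rto_min


# Illustrative plans; all numbers are made up for the example.
sync_active = ReplicationPlan("synchronous, active data centers", 0, 5)
nightly_tape = ReplicationPlan("nightly tape backup", 24 * 60, 48 * 60)

# Hypothetical targets: lose at most 15 minutes of data, be back within an hour.
for plan in (sync_active, nightly_tape):
    verdict = "meets" if meets_objectives(plan, rpo_min=15, rto_min=60) else "misses"
    print(f"{plan.name}: {verdict} the objectives")
```

Under those assumed targets, the synchronous setup passes and the tape-only setup does not, which mirrors the contrast drawn above.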

"The whole concept before was we have a production data center and then we have the disaster recovery site and that will take 24 to 72 hours to set up and get going," Nelsestuen said. "Now they're looking at making internal backups between the two. There are many institutions running data in multiple data centers throughout the day now."

Virtualization has allowed firms to be more dynamic in their recoveries because of self-healing systems and automated failover capabilities; when one server or data center goes down, another with the same data can come up almost instantly.

"It's a lot more dynamic now with the ability to...install backups and roll it back to any point in time," Nelsestuen said. "I've even seen some institutions look at creating a paper trail, so that if all else fails -- get out a slide rule and piece of paper."

Geographic distances were rarely considered prior to 9/11. Most companies were comfortable replicating data intercampus or to a facility within a few miles of a primary data center. A few firms, such as Nasdaq, actually replicated data out of state. Even so, some still get it wrong, Nelsestuen said.

"I know a company that has data centers in Florida and Galveston, Texas, which means a single hurricane could take both of the sites down," he said.

The cloud

Cloud services, or application and storage service providers, are nothing new. Even before 9/11, companies such as Storage Networks were offering to store business data in an offsite facility that could be accessed remotely in times of disaster.

Today, a combination of public and private cloud services offers a more robust protection scheme where the most critical business data -- that which is needed to keep revenue coming in -- is replicated to a service provider or stored in a corporate cloud accessible from any location.

Public clouds are particularly advantageous for small- to medium-sized businesses because the services offer enterprise-class disaster recovery capabilities at a cost that's affordable. But experts warn companies not to hog the bandwidth. The more data they want to recover, the more it'll cost. So they should store only what's needed to get the business running again -- not up to full speed.

Another bit of advice: When choosing a cloud service provider, companies should make sure the provider is on a different power grid.

"A business may think they're pretty well covered because they're replicating data to an offsite data center miles away, but it may be on same power grid as their office building," said Al Berman, executive director of the Disaster Recovery Institute International (DRI) in New York City. If that power grid goes down, the company and its offsite data center could both be affected at the same time.
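Berman's warning, like the earlier example of two data centers that a single hurricane could take down, comes down to checking whether a primary and a backup site share a single point of failure. A minimal sketch of such a check follows. It is an assumption-laden illustration, not a tool used by DRI or anyone quoted here: the `Site` class, the `shared_risks` function, and the grid and hazard-zone labels are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Site:
    """Hypothetical record for an office or data center."""
    name: str
    power_grid: str   # label for the utility grid feeding the site
    hazard_zone: str  # label for the regional hazard the site is exposed to


def shared_risks(sites):
    """List every pair of sites that shares a power grid or a hazard zone."""
    findings = []
    for a, b in combinations(sites, 2):
        if a.power_grid == b.power_grid:
            findings.append((a.name, b.name, "power_grid", a.power_grid))
        if a.hazard_zone == b.hazard_zone:
            findings.append((a.name, b.name, "hazard_zone", a.hazard_zone))
    return findings


# Illustrative sites; all labels are invented for the example.
sites = [
    Site("office", power_grid="grid-A", hazard_zone="coastal"),
    Site("offsite", power_grid="grid-A", hazard_zone="inland"),
    Site("cloud-provider", power_grid="grid-B", hazard_zone="mountain"),
]

for first, second, kind, value in shared_risks(sites):
    print(f"{first} and {second} share {kind} {value}")
```

In this made-up inventory, the office and its offsite data center are flagged for sharing a grid, while the cloud provider is independent of both.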

Nelsestuen believes cloud services are over-hyped, particularly in the financial services industry. While Tower Group estimates that spending on cloud services will grow to $27 billion by 2015, that's only 5% or 6% of all IT spending in financial services.

"There's still so many issues associated with security and operational aspects of that," he said. "There is a huge effort to try to create internal clouds. They're virtualizing their platforms, and the hardware and the networks, so they have a continuous backup. But that's all internal."

In the end, said Witty, 9/11 taught companies lessons that are still being put into practice.

"There were huge gaps in what companies were able to do before 9/11," she said. "9/11 showed us there weren't business continuity programs [in place]. It showed us that managing your people and being able to track your people is important, [as is] being able to care for them, and ... having the employee assistance programs in place.

"Working with police and other government agencies -- all that became extremely important [too]."

Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed. His e-mail address is lmearian@computerworld.com.
