As American Eagle Outfitters learned in July, even if you do everything right to ensure you have disaster recovery and business continuity plans in place, Murphy's Law sometimes takes over. And problems can be compounded if you rely on an outsourcer for disaster recovery services.
The multibillion-dollar clothing retailer suffered an eight-day Web site outage because its Oracle backup utility failed -- and then an IBM disaster recovery site wasn't up and running as it should have been, according to a report from StorefrontBacktalk.com.
IBM did not respond to requests for comment on the outage. American Eagle did not dispute StorefrontBacktalk.com's basic account of what happened, though a spokeswoman said a few details about the incident were incorrect.
According to Evan Schuman of StorefrontBacktalk.com, which monitors retail Web sites, the outage began with a series of hardware failures.
Schuman, citing an unnamed IT source at American Eagle, said a storage drive failed at an IBM off-site hosting facility, and a secondary backup drive then failed as well. Once the drives were replaced, the company tried to restore about 400GB of data from backup, but the Oracle backup utility failed, possibly because of data corruption. Finally, American Eagle Outfitters tried to restore its data from its disaster recovery site, only to discover that the site wasn't ready: staff could not get the logs up and running there.
"I know they were supposed to have completed it with Oracle Data Guard, but apparently it must have fallen off the priority list in the past few months," the source told Schuman.
In an e-mail response to questions from Computerworld, a spokeswoman for American Eagle Outfitters said StorefrontBacktalk.com was "off track" in saying the retailer should have directed Web traffic to its mobile Web site, because the mobile site was also down.
"Second, despite the slant of some reports, we worked closely with IBM in the spirit of partnership to resolve the issue as quickly as possible for our customers," she said.
In a follow-up interview, Schuman said only StorefrontBacktalk.com's initial story had stated that American Eagle Outfitters should have directed traffic to the mobile site, and that the story was quickly revised to acknowledge that the company could not use its mobile site as a backup.
The key point, Schuman said, is that American Eagle Outfitters did not use a parallel architecture, with separate product databases for its online and mobile Web sites. As a result, when one site went down, the company could not redirect customers to the other to complete their purchases.
The outage raises a question for companies wondering about their disaster recovery plans: Should IT staffers be assigned to periodically audit a service provider and perform recovery drills? Experts say yes.
"You should never give up ownership or responsibility or governance of what's going on with your data to your service provider partners," said Roberta Witty, an analyst at research firm Gartner Inc. "It's still your data. It requires a fair amount of due diligence to make sure third-party service providers have all the processes and procedures in place to ensure they meet your recovery needs."
And as service providers increase their use of cloud computing to support their hosting facilities, ensuring that they deliver the data backup and disaster recovery services they've promised will only become more difficult, Witty said.
"You don't know necessarily where your data will be. There is no definitive architecture. And, your data moves around," she said.
Schuman pointed out that American Eagle Outfitters appeared to have done everything right when it came to ensuring business continuity: it had backups of its backup -- and an off-site disaster recovery facility.
"The problem is nothing is foolproof," he said. "You have two conflicting interests here, which are at the core of IT today: cutting costs and increasing efficiency, and maintaining security in case the unlikely happens."
Schuman and others suggest one possible solution: using in-house IT personnel to monitor service providers to ensure logs are being maintained and backups are being performed as expected.