Planning for a Metro-Area Armageddon
Regional data center fail-over sites just won't cut it in a post-9/11 world.
Computerworld - The classic model of security covers three areas: integrity, confidentiality and availability. Historically, my focus has been on the first two, and my organization has always had a separate disaster recovery team. Now my security group has responsibility for that area as well.
We haven't been focusing on resilience, but rather on recovery from real disasters. A system is resilient if it continues to function well after losing a disk or a network connection. Our applications are designed to work like that, and the general IT support teams ensure that our systems continue to run in the face of these common failures.
Our responsibility kicks in for events that are much less likely but whose impact would be more difficult to deal with, such as a tornado or an outbreak of infection among the support staff.
The people who were previously responsible for this delightful set of problems have laid excellent groundwork. We have two data centers with applications spread across them so that, in theory, either one can handle the full load in the event that the other goes down.
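The "either one can handle the full load" claim is the kind of thing that can be sanity-checked on paper before a live test. A minimal sketch, assuming load and capacity can be expressed in a single common unit; the site names, the figures and the function `can_absorb_failover` are hypothetical and not taken from our environment:

```python
def can_absorb_failover(peak_load, capacity):
    """For each site, report whether it could carry the combined peak load alone.

    peak_load: dict of site name -> peak load currently served by that site
    capacity:  dict of site name -> maximum load that site can serve
    Both use the same (arbitrary, assumed) unit.
    """
    total = sum(peak_load.values())
    return {site: capacity[site] >= total for site in capacity}


# Hypothetical figures, for illustration only.
peak_load = {"site_a": 55, "site_b": 40}
capacity = {"site_a": 100, "site_b": 90}

result = can_absorb_failover(peak_load, capacity)
for site, ok in result.items():
    print(site, "can take the full load" if ok else "would be overloaded")
```

A check like this only covers raw capacity. It says nothing about a critical service being missed, which is why the live test still matters.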
Theory Tested
Until we took over, however, that theory had never been tested. We had never actually tried to move all of our applications to a single site. It appears as though everyone was worried that running everything from one place would cause an overload or that some critical service might be missed.
In the past few weeks, we finally bit the bullet and moved everything we do to one data center. That took a month of hard work.
Each weekend, I worked to get another application team to move its application to the other site. Then I spent each Sunday doing quality assurance checks, ready for either a frantic back-out on Sunday evening or enough proof that everything would be fine on Monday morning.
It was surprising to me how few applications had a decent quality assurance plan. How support staffers know whether things are working properly seems to boil down to the volume of user complaints they receive. At least we've started them down the road to effective monitoring.
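A quality assurance plan does not have to be elaborate to beat counting complaints. The sketch below is one possible shape for a Sunday check run, not what our teams actually use: each application supplies a few named checks, and any failure points toward a back-out. The check names and the functions `run_checks` and `decide` are hypothetical.

```python
def run_checks(checks):
    """Run each named check and return the names of those that failed.

    checks: dict of check name -> zero-argument callable returning True/False.
    A check that raises an exception is counted as a failure.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    return failures


def decide(failures):
    """Any failed check means the move should be backed out."""
    return "back out" if failures else "stay"


# Hypothetical checks; real ones would query the application itself.
checks = {
    "login page responds": lambda: True,
    "nightly batch completed": lambda: True,
    "link to second site is up": lambda: False,
}

failed = run_checks(checks)
print(decide(failed), failed)
```

The same checks can be rerun on a schedule after the move, which is a first step toward monitoring that does not depend on users noticing a problem.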
We had a few busy Sundays as we uncovered unknowns and backed out the affected moves. One Monday at 7 a.m., when we found that a fiber wasn't connected at the second site, we did the fastest back-out I've ever been involved with.
At the end of all the nail-biting moments, we ran all of our applications from a single site for an entire week. We did this


