Planning for a Metro-Area Armageddon
Regional data center fail-over sites just won't cut it in a post-9/11 world.
Computerworld - The classic model of security covers three areas: integrity, confidentiality and availability. Historically, my focus has been on the first two, and my organization has always had a separate disaster recovery team. Now my security group has responsibility for that area as well.
We haven't been focusing on resilience but on recovery from real disasters. A system is resilient if it continues to function well after losing a disk or a network connection. Our applications are designed to behave that way, and the general IT support teams ensure that our systems keep running in the face of these common failures.
Our responsibility kicks in for events that are much less likely but whose impact would be far more difficult to deal with, such as a tornado or an outbreak of infectious disease among the support staff.
The people who were previously responsible for this delightful set of problems have laid excellent groundwork. We have two data centers with applications spread across them so that, in theory, either one can handle the full load in the event that the other goes down.
Theory Tested
Until we took over, however, that theory had never been tested. We had never actually tried to move all of our applications to a single site. It appears that everyone had been worried that running everything from one place would cause an overload or that some critical service would be missed.
In the past few weeks, we finally bit the bullet and moved everything we do to one data center. That took a month of hard work.
Each weekend, I worked with another application team to move its systems to the other site. Then I spent each Sunday doing quality assurance checks, ready either to perform a frantic back-out on Sunday evening or to confirm that everything would be fine on Monday morning.
I was surprised by how few applications had a decent quality assurance plan. Support staffers seemed to judge whether things were working properly mainly by the volume of user complaints they received. At least we've started them down the road to effective monitoring.
We had a few busy Sundays as we uncovered problems nobody had anticipated and backed out the affected moves. One Monday at 7 a.m., when we found that a fiber link wasn't connected at the second site, we did the fastest back-out I've ever been involved with.
At the end of all the nail-biting moments, we ran all of our applications from a single site for an entire week. We did this