Do-it-yourself disaster recovery

While most network executives are looking at server virtualization to reduce hardware costs, the technology could also offer a budgetary bonus: less-expensive disaster recovery. With disaster-facility contracts easily costing upward of $30,000 per month, killing off that budget line item is tempting.

"One of the hardest parts budget-wise [in IT] is disaster recovery and its incredible price tag. Traditionally, you had to duplicate everything you've got in one data center to another and then pray that you never have to use it," says Jason Brougham, enterprise network manager for American Medical Response (AMR Inc.), a Greenwood Village, Colo.-based ambulance service company with 18,000 employees and 255 locations nationwide. "The only way you can afford to build true disaster recovery is to run hot to hot, with both data centers active all the time on servers using virtualization."

Companies with virtualized servers and storage-area networks (SANs) in disparate data centers already have most of the pieces in place to take on in-house disaster recovery: They have a potential backup location in a faraway spot (one that likely won't be affected by the disaster). They have network connections between the two sites. Their virtualization and load-balancing software would let one server or SAN take over for another almost instantly if a short-term failure occurs (from routine maintenance to a few hours of blackout).

Network executives easily can make the common-sense leap to full-fledged in-house disaster recovery. If servers float away in a storm or are otherwise permanently damaged, one data center can become the backup for another. Even if you don't bring disaster recovery completely in-house, virtualization can help save money on the facility contract: A few virtualized servers can do the work of many physical servers.

"The pieces of hardware become less critical in a virtualized environment -- if there are 400 servers, with virtualization you could conceivably do disaster recovery on 20 servers. That might be reaching, but that's the idea," says Vivian Knoerle, principal consultant for Intellinet, a virtualization and disaster recovery systems integrator in Atlanta. "If you do still use a disaster recovery facility for hosting, the expense and hardware requirement can be less, because the number of physical servers can be far less."

Such is the case for insurance company Mutual of Enumclaw, based in Enumclaw, Wash., with 16 offices in Washington, Idaho, Oregon and Utah.

"We approach disaster recovery like a life insurance policy. We don't want to have too much, but we want enough," says John Weeks, IT director for Mutual of Enumclaw, which uses virtualization software from VMware, an EMC Corp. company. "The virtual capability simplifies our recovery efforts."

Critical insurance-related processing runs on the mainframe, so Weeks currently contracts with a disaster-recovery facility for the mainframe. But the company relies on Intel-based IBM xSeries servers for other applications such as Citrix, which it runs via a virtualized server farm. With VMware, Mutual of Enumclaw has reduced the number of physical servers it uses by about 35%. (Weeks has also begun rolling out virtualized IBM blade servers for the server farm. A dual-blade box hosts up to three virtual servers while the quad blade hosts as many as five, he says.)

This translates into a lot less hardware required for disaster recovery. Before implementing VMware, the company contracted with its disaster recovery facility to maintain a similar PC server environment -- one backup server for every production server. "We have simplified that [disaster] model by going to a virtual model," Weeks says. "VMware is hardware-agnostic, and we can restore systems without identical or near-identical hardware. This creates flexibility and expands our options regardless of what site we recover to, either our own site with older hardware or a new site with all new hardware." Mutual of Enumclaw also reduced its network and support requirements for disaster recovery, Weeks says.

Still, like all things IT, turning virtualized remote data centers into disaster recovery backups for one another won't be a cakewalk. Technology issues abound, with server configuration management/inventory control, data synchronization and WAN bandwidth among the greatest challenges. And you can't overlook the need to address processes, personnel and practice.

More to lose

Because each virtualized server is the equivalent of many physical servers, if even one of them goes up in flames, so too does much of your IT infrastructure. Rebuilding it quickly means knowing exactly what you've lost.

Tools are available that let you take an image or snapshot of an entire virtual server for firing up on another physical machine (as in VMotion or UltraBac). But missing are tools to keep track of exactly how those virtual servers are configured, what software is loaded on each, what tweaks might be needed to ensure all applications perform nicely together and so on.
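One way to close that gap is to keep a small configuration manifest alongside each snapshot, so a restore knows exactly what was on the virtual machine and where it ran. The sketch below is illustrative only -- the field names, VM names and file layout are assumptions, and real tooling would pull this data from the hypervisor and a software-inventory agent rather than hand-entered records.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class VMManifest:
    """Configuration record kept alongside each virtual-machine snapshot.

    All fields are illustrative; a real manifest would be populated
    automatically from the hypervisor and an inventory agent."""
    vm_name: str
    physical_host: str
    os_version: str
    installed_software: list = field(default_factory=list)
    config_tweaks: list = field(default_factory=list)  # e.g. kernel params, registry edits

def write_manifest(manifest, path):
    """Serialize the manifest next to the snapshot image."""
    with open(path, "w") as f:
        json.dump(asdict(manifest), f, indent=2)

def read_manifest(path):
    """Load a manifest back so a recovery team knows what to rebuild."""
    with open(path) as f:
        return VMManifest(**json.load(f))

# Example: record what was running on one (hypothetical) virtual server.
m = VMManifest(
    vm_name="citrix-vm-01",
    physical_host="blade-3",
    os_version="Windows 2000 SP4",
    installed_software=["Citrix MetaFrame", "backup agent"],
    config_tweaks=["pagefile moved to D:"],
)
write_manifest(m, "citrix-vm-01.manifest.json")
restored = read_manifest("citrix-vm-01.manifest.json")
print(restored.physical_host)
```

The point is less the file format than the discipline: every snapshot carries a machine-readable record of its software and tweaks, so the recovery site isn't reconstructing that knowledge from memory.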

"Configuration management and change management of virtualized machines is a whole new ball of wax," Intellinet's Knoerle warns. "You need to keep very good track of what's on each server, and the configuration management tools we have today don't support virtualized machines."

Plus, for any disaster recovery operation, "you need to keep track of configuration on an operations-group level," she says. Virtualization will ease that process -- you likely will have your most critical applications automatically fail over to other virtual servers. But restoring less critical applications could get ugly. Backup media labeled by the physical server won't be good enough. You'll need to know exactly which virtual machines and applications were running on each physical server, and which processes should be prioritized.

"Do some kind of classification -- as simple as maybe putting applications inside labels such as mission-critical, business-critical, operational. That's how you'll determine your recovery objectives, and that will determine the infrastructure you need and the plan," says Stephanie Balaouras, a senior analyst at The Yankee Group.
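Balaouras' classification scheme can be made concrete as a simple restore-ordering rule: tag each application with a tier, then recover the highest tier first. The tiers below match her labels, but the priority numbers, recovery-time targets and application names are hypothetical -- the real targets come from a business-impact analysis, not from code.

```python
# Hypothetical tier definitions; real recovery-time targets would come
# from a business-impact analysis, not from code.
TIERS = {
    "mission-critical": {"priority": 1, "rto_hours": 1},
    "business-critical": {"priority": 2, "rto_hours": 8},
    "operational": {"priority": 3, "rto_hours": 72},
}

def recovery_order(apps):
    """Sort applications so the highest-priority tier is restored first."""
    return sorted(apps, key=lambda app: TIERS[app["tier"]]["priority"])

apps = [
    {"name": "intranet wiki", "tier": "operational"},
    {"name": "claims processing", "tier": "mission-critical"},
    {"name": "email", "tier": "business-critical"},
]
for app in recovery_order(apps):
    print(app["name"], "-", app["tier"])
```

Once every application carries a tier label, the same table drives both the recovery plan and the infrastructure sizing Balaouras describes.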

American Medical Response's Brougham, who has overseen in-house disaster recovery efforts for several companies, underscores that an IT inventory assessment of all resources, virtualized and not, is necessary. Most companies do a poor job of inventory management, particularly on servers, he says, because they rarely implement server-level inventory management tools. With a small number of virtualized servers now representing a large number of physical servers, in all likelihood your inventory assessment will uncover that "you've got 40 more apps than you really need. Or you'll find out you need 40 more apps," he says.

On the bright side, if you haven't yet standardized on equipment across your data centers, you're in for some relief. The virtualized servers won't care what hardware they are placed on, and older equipment can be used. This also differs from the days when data centers had to be exactly the same to perform as backup sites.

SANs and synchronization

You will need to analyze the data on your SAN in a similar fashion. Mutual of Enumclaw plans to expand its SAN but will continue using existing storage for testing and disaster recovery, Weeks says. It will add 3TB EMC AX100 Serial Advanced Technology Attachment SAN devices with built-in switches. These switches, also available in stand-alone versions from vendors such as Brocade Communications Systems Inc. and McData Corp., let the SAN move data from one device to another for disaster recovery, he explains.

Fail-over should be the easy part. Knowing where your most critical data is, and how to make sure it is the first to come back online, will be the hard part. This is part and parcel with categorizing your data using information life-cycle management techniques, which analysts recommend implementing as part of your in-house disaster recovery efforts. "The most important step is data classification," Balaouras says.

You'll want to look at technologies for synchronizing data between main and fail-over sites, too. Every disaster recovery plan uses a recovery point objective to keep data loss within acceptable boundaries, says Belinda Wilson, worldwide executive director of business continuity services for HP. This will help you pick your synchronization method. But with virtualization technologies, synchronization can occur at many levels -- at the application, the database and the SAN, for instance. Mixing and matching among synchronization techniques and ensuring full data synchronization are issues, as is determining which source is the last word should a bad sync occur.
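The trade-off Wilson describes can be sketched as a mapping from the recovery point objective (how many minutes of data you can afford to lose) to a replication style. The thresholds and method names below are illustrative assumptions, not vendor guidance.

```python
def pick_sync_method(rpo_minutes):
    """Map a recovery point objective (acceptable data loss, in minutes)
    to a replication style. Thresholds here are illustrative only."""
    if rpo_minutes == 0:
        return "synchronous SAN mirroring"      # no data loss tolerated
    if rpo_minutes <= 15:
        return "asynchronous replication"       # e.g. 15-minute snapshots
    if rpo_minutes <= 24 * 60:
        return "nightly batch synchronization"
    return "offsite tape / periodic bulk copy"

print(pick_sync_method(0))
print(pick_sync_method(15))
print(pick_sync_method(480))
```

The tighter the objective, the more expensive the synchronization -- which is why the application classification done earlier matters: only the top tier should pay for synchronous mirroring.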

The fun of requirements

Somewhere near the six-month mark, you should have the building blocks for in-house disaster recovery figured out: configuration/change management, inventory assessment, application and data classification, SAN fail-over and data synchronization. Now the real fun begins: planning technical requirements for your new virtualized data center/disaster recovery infrastructure.

Your analysis should cover what systems employees use most, what systems the business most relies on and your technical needs, Brougham says. "What's the load on the network if I suddenly take this database out-of-building? What's the performance hit on the application if I take it out-of-building? Is it even possible to centralize these systems 250 miles away?"

The answers to these questions will determine your design, from a once-nightly, several-hour-long database-synchronization process to mirrored systems that take snapshots of each other every 15 minutes, for instance.

While you likely already have network connections between your virtualized sites, you'll have to look at them in a new light. Brougham suggests using Multiprotocol Label Switching (MPLS) for disaster recovery because it offers far more capacity than frame relay, can be had at T1 prices and can be meshed. MPLS automatically shifts IP traffic among a variety of routes, which is just the kind of fail-over you'll want. With any-to-any site connections, you can maintain higher usage thresholds while still letting your links absorb the shared fail-over traffic. He compares it to a company using two T1 lines for its data centers, each operating at a 60% utilization rate. Disaster strikes, one data center must fail over to the other, and now all the company's traffic overloads the surviving link.
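Brougham's two-T1 example comes down to simple headroom arithmetic: can the surviving links absorb the failed link's traffic? The sketch below checks that condition; the link counts and utilization figures beyond his two-T1 example are assumptions for illustration.

```python
def survives_failover(utilizations, failed_index):
    """Check whether the remaining links can absorb a failed link's load.

    `utilizations` holds each link's load as a fraction of one T1's
    capacity; the test is whether the survivors' spare capacity covers
    the failed link's traffic."""
    failed_load = utilizations[failed_index]
    survivors = [u for i, u in enumerate(utilizations) if i != failed_index]
    spare = sum(1.0 - u for u in survivors)
    return spare >= failed_load

# Brougham's example: two T1s, each running at 60% utilization.
# The survivor has only 40% headroom and can't absorb another 60%.
print(survives_failover([0.60, 0.60], failed_index=0))

# In a meshed, any-to-any design (link count assumed for illustration),
# more links share the failed link's traffic, so 60% utilization is safe.
print(survives_failover([0.60, 0.60, 0.60, 0.60], failed_index=0))
```

This is why meshing lets you run links hotter: the fail-over burden is split across every remaining path instead of landing on a single partner circuit.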

But, Brougham warns, "Watch out. You can create your own disaster with a fully meshed network; virus propagation can kill you." So you'll have to think through how to increase security when building a meshed WAN for disaster recovery.

The people factor

Like all IT projects, half the battle is won by technology, the other half by process and people. Draw up detailed procedures and practice them. You don't want the live disaster to be the first time your staff implements the plan. They might need a new mind-set when bringing up a backup data center via server virtualization, too. They might be accustomed to replicating applications, not to the fast fail-over of a tightly coupled, virtualized environment. Unexpected issues might arise, such as deciding when they must reconfigure the DNS server to point to the backup data center.

"From a pure disaster recovery aspect, the first things that go wrong are the ones that you don't test enough. If you've got 120 applications that have to go onto a server in the disaster recovery site, and you get 119 of them tested, it's always the one you missed that blows up," Brougham says. He advises hiring a software-testing consultant to evaluate how applications will play together when ported to the back-up virtualized boxes.

Include dry runs in your testing schedule -- which will, of course, require time.

Despite the hard work involved, in-house disaster recovery makes a lot of budgetary sense when comparing the rarity of disasters with the cost of maintaining idle backup equipment. And it almost goes without saying that these days you can't simply ignore disaster recovery and hope lightning never strikes. Your virtualized data center could be the godsend you never knew you needed.

Helpful tools for the do-it-yourself disaster recovery enterprise

Brocade SilkWorm SAN routers: Manage fail-over between remotely located and/or differing types of SANs.

Hewlett-Packard Co. Virtual Server Environment for HP-UX 11i: Aimed at servers and blade servers; is one of the many tools HP offers for virtualizing servers.

HP Workload Manager: Available for many operating systems, dynamically allocates CPU, memory and disk resources to meet application service-level objectives.

McData Eclipse or IPS families of multiprotocol SAN switches: Manage fail-over between remotely located and/or differing types of SANs.

Microsoft Corp.'s Virtual Server 2005 (evaluation release available now, full release expected late 2004): Supports virtualization of Windows 2000 servers.

Solaris 10: Provides a variety of virtualization services for Sun Microsystems Inc. boxes running Solaris applications, such as clustering (doesn't support other operating systems).

Solaris 10 N1 Grid Containers: Partitions Sun boxes so that they can run multiple instances of Solaris, even older and new versions simultaneously.
