Opinion: How to recover from virtualization disasters

Be prepared for single machine failures, SAN failures and VM failures

To many people, disaster recovery means little more than a hot site, but there is far more to it. Exactly what is involved depends on how much money you can commit to the problem.

Fully redundant hot sites cost quite a bit in hardware, software and licensing. At best, they should be exact duplicates of your current environment; at worst, they should be able to run your most important virtual machines.

However, this is not the only aspect of DR that should be considered. Disasters come in all sizes, from the small-scale application failure to the catastrophic natural disaster. Both of these are fairly well understood.

But what about the middle-of-the-road business-continuity and disaster issues? I'm talking about those that are somewhere in between the extremes, but are specific to virtualization infrastructures: single machine failures, SAN failures, VM failures, etc.

For these there are a few tools, mostly from VMware, that will help. VMware High Availability tops the list. But any VM-to-VM clustering service will also work to solve these issues.

To help with storage server issues, there are also LeftHand Networks' VSA and Xtravirt's XVS products. These products use software to mirror data between systems using each machine's local disk, so if one system fails, the data is not lost. These technologies add redundancy to the software stack and can replace redundant SANs in smaller shops.
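To make the mirroring idea concrete, here is a minimal sketch in Python. It is not how VSA or XVS is implemented; the Node and MirroredVolume classes are hypothetical names invented for illustration, and each node's local disk is modelled as a plain dictionary. It assumes synchronous mirroring, in which every write goes to all online nodes before it is acknowledged.

```python
class Node:
    """Hypothetical host whose local disk is modelled as a dict of blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def write(self, block_id, data):
        if not self.online:
            raise OSError(f"{self.name} is offline")
        self.blocks[block_id] = data

    def read(self, block_id):
        if not self.online:
            raise OSError(f"{self.name} is offline")
        return self.blocks[block_id]


class MirroredVolume:
    """Illustrative software mirror spanning the local disks of several nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def write(self, block_id, data):
        # Synchronous mirror: copy the block to every node that is online.
        written = 0
        for node in self.nodes:
            if node.online:
                node.write(block_id, data)
                written += 1
        if written == 0:
            raise OSError("no online node available for write")
        return written  # number of copies actually stored

    def read(self, block_id):
        # Serve the read from the first surviving node that holds the block.
        for node in self.nodes:
            if node.online and block_id in node.blocks:
                return node.read(block_id)
        raise OSError(f"block {block_id} is unavailable on all nodes")


host_a = Node("host-a")
host_b = Node("host-b")
volume = MirroredVolume([host_a, host_b])

copies = volume.write(0, b"vm-disk-block")
host_a.online = False          # simulate a single machine failure
survivor_data = volume.read(0)  # still served from host-b
```

The sketch deliberately leaves out what real products must handle, such as resynchronizing a node that comes back online after missing writes.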

Even good backup tools add to this concept of redundancy by including replication features (Vizioncore vReplicator and Veeam Backup). These allow you to replicate VMs from storage device to storage device and place VMs in locations where they are ready to power on at a moment's notice, which is another good way to keep things running if your SAN or NAS fails.

VMware Site Recovery Manager (SRM) works with various SAN and NAS devices to allow the SAN or NAS's own mirroring software to work better with virtualization.

As we put more VMs on a system, we need to consider adding more redundancy to that system. There are already some hardware solutions, like RAID Blade and RAID memory technologies; the software storage technologies mentioned earlier build on existing RAID-level redundancy and extend it across multiple systems. We can also deploy redundant switching fabrics.

While hot sites are the end goal for natural disasters, don't forget to plan for the middling disasters by using these or other tools to increase your local redundancy.

Virtualization expert Edward L. Haletky is the author of "VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers," Pearson Education (2008). He recently left Hewlett-Packard, where he worked in the virtualization, Linux and high-performance technical computing teams. Haletky owns AstroArch Consulting, providing virtualization, security and network consulting and development. Haletky is also a champion and moderator for the VMware discussion forums, providing answers to security and configuration questions.

This story, "Opinion: How to recover from virtualization disasters," was originally published by CIO.

Copyright © 2008 IDG Communications, Inc.
