What to consider in a data recovery time objective

There can be lots of moving parts that enter into recoverability for a complex application

In data recovery discussions, senior IT executives often repeat the same refrain: it is difficult to get an accurate picture of their organization's ability to recover key applications and transact business after a significant outage. They want confidence that critical data is recoverable, and they are looking for metrics that demonstrate that recoverability.

The reality is that IT infrastructures have become complex enough that it's a challenge to peel back the layers of abstraction and aggregate the various components that, in combination, determine the recoverability of an application. Instead, we tend to assess the health of each component individually, in the hope that overall recoverability is equal to the sum of the parts.

This isn't necessarily the case. First, there can be lots of moving parts that enter into recoverability for a complex application, and accounting for them all is difficult. More importantly, the synchronization among these elements presents a significant roadblock to successful recovery.

This consistency issue is usually well understood at the system level, but it's often overlooked when dealing with cross-platform business functions that consist of multiple application components. When the underlying databases are copied or backed up at different times, reconciling the discrepancies among them can add hours or even days to recovery.
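As a rough illustration of the point above, the gap between copy times can be measured directly. The sketch below is a minimal example under stated assumptions: the database names and timestamps are hypothetical, and `recovery_point_skew` is an illustrative helper, not part of any backup product.

```python
from datetime import datetime, timedelta


def recovery_point_skew(last_copies):
    """Return the gap between the oldest and newest copy in a set.

    last_copies maps a component name to the time of its most recent
    usable copy. A large gap means the restored components will
    disagree with one another and need reconciliation.
    """
    times = list(last_copies.values())
    return max(times) - min(times)


# Hypothetical copy times for three databases behind one business function.
last_copies = {
    "orders_db": datetime(2007, 3, 1, 23, 0),
    "inventory_db": datetime(2007, 3, 1, 2, 0),
    "billing_db": datetime(2007, 3, 1, 18, 30),
}

skew = recovery_point_skew(last_copies)
print(f"Recovery points differ by up to {skew}")
```

A skew near zero doesn't guarantee consistency, but a skew of many hours is a clear sign that reconciliation work should be expected during recovery.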

Compounding the problem, interdependent systems may in some situations be prioritized differently in terms of criticality, leading to entirely different protection profiles being applied. For example, a database may be deemed high priority and therefore fully replicated to support a recovery time objective (RTO) of under four hours, but an associated front-end Web-based application component might be assigned to a tape-based recovery tier. While this may be entirely acceptable for operational recovery (such as restoring a single volume or server), in a disaster recovery scenario the likely result would be a running database that users can't connect to. (So much for the four-hour RTO.)
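The arithmetic behind that example can be sketched in a few lines. This is a simplified model, assuming all components are recovered in parallel, so the business function is usable only when its slowest component is back. The 48-hour figure for the tape tier is a hypothetical value chosen for illustration, and the function names are not drawn from any particular tool.

```python
def effective_rto_hours(component_rto_hours):
    """Effective RTO of a business function, assuming parallel recovery:
    the function is back only when its slowest component is back."""
    return max(component_rto_hours.values())


def components_missing_target(component_rto_hours, target_hours):
    """Names of components whose own RTO exceeds the business target."""
    return sorted(
        name
        for name, rto in component_rto_hours.items()
        if rto > target_hours
    )


# Replicated database vs. a front end on a tape-based tier
# (48 hours is an assumed figure, not a measured one).
business_function = {"database": 4, "web_front_end": 48}

print(effective_rto_hours(business_function))
print(components_missing_target(business_function, target_hours=4))
```

If components must instead be recovered one after another, the effective figure is closer to the sum of the individual RTOs than to the maximum, which makes the mismatch worse.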

While there are some emerging technologies that can help to some degree, addressing this problem today remains largely a policy and process exercise. One of the key challenges is organizational. The identification of interdependencies requires cross-functional cooperation. Driving this requires someone of appropriate authority with overall responsibility for recovery at the business-function level. With all of the focus devoted to component recovery, that person is often nonexistent.

Jim Damoulakis is chief technology officer of GlassHouse Technologies Inc., a leading provider of independent storage services. He can be reached at jimd@glasshouse.com.

Copyright © 2007 IDG Communications, Inc.
