Is tape really dead?

With all the industry buzz around disk-based data protection and deduplication, many companies are rethinking their current investments in tape infrastructure. The problem is that many have spent years perfecting that tried-and-true method of data protection and have invested substantial sums in people, training, and equipment along the way.

So the question is: do you scrap everything you have invested and move to a different approach? If so, how do you do it without being disruptive? These are difficult decisions many IT administrators face today, and making the wrong decision can have a major impact on your business, or your career.

Disk-based data protection is definitely making inroads in replacing tape-based backup solutions, but from what I am seeing in the field, especially in larger companies, tape is not going away anytime soon.

Tape is simply being moved downstream to the archive tier rather than the backup tier of the storage infrastructure. As an example, even though a virtual tape library can speed the backup process and deduplicate data to store it more efficiently, it still takes power to keep the disks spinning. MAID fixes that issue somewhat, but many companies still like the mobility and cost effectiveness of tape for long-term data archives.

What companies don't like about tape is all the handling, shipping, storage costs, and risk associated with it. (There are ways around these issues, though.)

The benefits of using disk versus tape to improve data protection and recovery are too numerous for IT administrators to ignore. Everyone is aware of the limitations of tape solutions:

Tape                     Disk
----                     ----
Sequential access        Random access
Relatively slow          Fast
Shipped offsite          Electronically vaulted
Once-a-day process       Periodic or continuous
High operational touch   Automated
No dedupe                Dedupe
Inexpensive media        More expensive media

There are a few innovative technologies currently being used to replace tape as a backup medium, and they are all based on disk.

Business continuance volumes: BCV copies are full copies of data, typically created and stored within the storage arrays. The BCV is created by splitting off a full mirror copy of the data and then mounting that copy for backup to the tape infrastructure. This process can occur at the primary site or, if used in conjunction with array-based replication, at the remote site. The problem with BCVs is cost: each copy requires as much storage as the original data.

Snapshot copy: Snapshot functionality is similar to BCV copies, except a snapshot is a thin representation of the data rather than a full copy. Changes to the original data are stored in a snapshot pool (copy on write), or the changes are simply written someplace else (WAFL). In either case, pointers and metadata are used along with the original volume to reconstruct how the data looked at the time each snapshot was taken. Snapshots take up much less space than full BCV copies and can occur more frequently, so more recovery points are available.
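The copy-on-write mechanism described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the class and method names are my own, and real snapshot pools operate on fixed-size disk blocks rather than Python dictionaries.

```python
# Copy-on-write snapshot sketch: a snapshot stores only blocks that
# change after it is taken; unchanged blocks resolve to the live volume.

class Volume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block number -> data (live volume)
        self.snapshots = []          # each snapshot: {block: preserved data}

    def take_snapshot(self):
        self.snapshots.append({})    # empty pool: nothing has diverged yet

    def write(self, block, data):
        # Copy-on-write: preserve the old block in every snapshot that
        # has not yet saved its own copy of this block.
        for snap in self.snapshots:
            if block not in snap:
                snap[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, index, block):
        # A snapshot is pointers plus preserved blocks: check each pool
        # from this point-in-time forward, else fall through to the
        # live volume (the block never changed).
        for snap in self.snapshots[index:]:
            if block in snap:
                return snap[block]
        return self.blocks.get(block)

vol = Volume({0: "A", 1: "B"})
vol.take_snapshot()               # point-in-time copy, consuming no space
vol.write(0, "A2")                # old "A" is copied into the pool first
print(vol.read_snapshot(0, 0))    # "A"  -- the data as of the snapshot
print(vol.blocks[0])              # "A2" -- the live data
```

Note that the snapshot consumed storage only for the one block that changed, which is why snapshots can be taken far more frequently than full BCV copies.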

Disk-based NAS target: Disk solutions that look like a simple NAS share usually include data deduplication, so data being dumped to the share is stored as efficiently as possible. These solutions can also typically replicate data offsite for DR. Such NAS shares make a perfect repository for database administrators to use as dump targets. The only problem with this approach is that the data needs to be "rehydrated" on the other side before it can be used again.
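The deduplication and "rehydration" steps above can be sketched as follows. This is a toy illustration under simplified assumptions (a 4-byte chunk size, in-memory dictionaries); real systems chunk at kilobyte-to-megabyte granularity and persist the chunk store on disk.

```python
# Block-level deduplication sketch: store each unique chunk once, keyed
# by its hash; a file becomes a list of chunk references. "Rehydration"
# is reassembling the chunks back into the original byte stream.
import hashlib

CHUNK = 4        # toy chunk size; real systems use much larger chunks

store = {}       # chunk hash -> chunk bytes (each unique chunk stored once)
catalog = {}     # file name -> ordered list of chunk hashes

def dedupe_write(name, data):
    hashes = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)   # duplicate chunks are not stored again
        hashes.append(h)
    catalog[name] = hashes

def rehydrate(name):
    # Recovery must reassemble ("rehydrate") the full data before use.
    return b"".join(store[h] for h in catalog[name])

dedupe_write("dump1", b"AAAABBBBAAAA")   # first and last chunks identical
print(len(store))                        # 2 unique chunks stored, not 3
print(rehydrate("dump1"))                # b'AAAABBBBAAAA'
```

The rehydrate step is the cost the article mentions: recovery cannot read the deduplicated form directly, so the full stream must be rebuilt first.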

Virtual tape: This is the simplest way to replace physical tape, since all it does is make disk look like tape, providing the benefits of disk without changing the current process. Some virtual tape solutions can also be used as a simple disk target by the backup software, so that no barcodes are required and the storage can be reutilized at will by simply deleting the data when the retention period expires. Virtual tape solutions can integrate with the existing backup software and physical tape infrastructure, so backup and recovery can be enhanced without ripping out and replacing everything.

Continuous data protection: CDP provides the most robust service levels for backup and recovery. Implementing CDP can actually eliminate the entire backup process if desired. CDP is a continuous protection methodology, meaning every write is protected and there is no data loss. Continuous protection can be combined with continuous replication to provide robust DR service levels, and snapshots can be included along with CDP to provide recovery not only to the last write, but also to a thin repository holding days' worth of recovery points. Some CDP solutions can even integrate with existing backup software to provide the best of all worlds.
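The core CDP idea, journaling every write so the volume can be reconstructed as of any point in time, can be sketched briefly. The names here are illustrative, not a specific vendor's API, and a real CDP journal is time-stamped and bounded by a rotation policy.

```python
# CDP journaling sketch: every write is appended to a sequenced journal
# before it lands on the volume, so any past state can be replayed.
import itertools

journal = []                  # (sequence, block, data) for every write
seq = itertools.count()

def cdp_write(volume, block, data):
    journal.append((next(seq), block, data))   # protect the write first
    volume[block] = data

def recover(to_seq):
    # Replay the journal up to (and including) the chosen write --
    # recovery to any point, including the very last write (no data loss).
    vol = {}
    for s, block, data in journal:
        if s > to_seq:
            break
        vol[block] = data
    return vol

vol = {}
cdp_write(vol, 0, "v1")       # seq 0
cdp_write(vol, 0, "v2")       # seq 1
cdp_write(vol, 1, "x")        # seq 2
print(recover(0))             # {0: 'v1'}         -- state after first write
print(recover(2))             # {0: 'v2', 1: 'x'} -- up to the last write
```

Because the journal grows with every write, real deployments rotate it against periodic snapshots, which is exactly the combination described in the steps below.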

Data flow for backup and recovery in the 21st-century data center would look something like this:

  1. CDP is used to protect every write so no data is lost.
  2. A snapshot is taken every four hours, and the CDP journal is rotated every four hours as new snapshots are taken. If the solution can provide 255 snapshots, then over 42 days' worth of data would be available locally for recovery.
  3. The data is encrypted and electronically vaulted offsite for DR. Once offsite, the snapshots are mounted to the existing backup software for long-term archive to tape once a month.
  4. At the offsite location, data is globally deduplicated as an optimized, searchable archive.
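The retention arithmetic in step 2 is worth checking: 255 snapshots at a four-hour interval cover just over 42 days.

```python
# Retention window for the snapshot schedule in step 2:
# 255 snapshots taken every 4 hours, expressed in days.
snapshots = 255
interval_hours = 4
days = snapshots * interval_hours / 24
print(days)   # 42.5 -- "over 42 days" of local recovery points
```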

Using a methodology like the one I have described would enable companies to provide an extremely robust SLA for application recovery from disk, provide a cost-effective means for DR replication, and still keep tape for long-term archives.

Tape is not dead, and it will have a place in many datacenters for a long time to come. It will simply not be used for day-to-day data recovery.

Christopher Poelker is the author of Storage Area Networks for Dummies, and he is currently the vice president of Enterprise Solutions at FalconStor Software.

Copyright © 2009 IDG Communications, Inc.
