Demystifying de-duplication

Source de-duplication is said to be "more disruptive" than previous technologies

By Ray Lucchesi
February 21, 2007 12:00 PM ET

Computerworld - Data de-duplication has emerged as a key technology in the effort to reduce the amount of data backed up each day, which in many enterprises is growing at more than 100% every year.

For example, John Thomas, IT manager at Atlanta-based law firm Troutman Sanders LLP, was able to use data de-duplication technology to reduce the amount of data streamed from more than a dozen remote offices and thereby cut his backup window from 11 hours to 50 minutes. Thomas says the compression ratios for his backups run as high as 55:1.

Vendors have taken different approaches to the technology, resulting in multiple distinct products that users should become familiar with in order to choose the flavor that best suits their environments.

Data de-duplication uses commonality factoring to reduce the amount of data either at the backup server or at the target storage device. As a result of the enormous compression ratios achieved by data de-duplication technology, disk is becoming more attractive as a viable, online alternative to traditional tape-based backup. For example, people working at remote and branch offices need instant access to all the data and applications available at their company's headquarters. So IT shops typically set up remote mini data centers, with application servers, block and file data storage, backup tape and report printers, sacrificing administrative control. With data de-dupe technology, backups can be performed over the WAN using spare nighttime bandwidth, eliminating the need for tape at remote sites.
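To make the idea of commonality factoring concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the fixed 4 KB chunk size, the use of SHA-256 and the `DedupeStore` class are all assumptions made for the example, and commercial products may chunk and index data quite differently.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; chosen only for illustration


class DedupeStore:
    """Toy content-addressed store that keeps each unique chunk once."""

    def __init__(self):
        self.chunks = {}        # digest -> chunk bytes (the unique data)
        self.logical_bytes = 0  # total bytes written by backups before de-dupe

    def write(self, data: bytes) -> list:
        """Store a backup and return its 'recipe', the ordered chunk digests."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild a backup from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def ratio(self) -> float:
        """Logical bytes written divided by physical bytes stored."""
        return self.logical_bytes / self.physical_bytes()


store = DedupeStore()
monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
tuesday = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # only the last chunk changed
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)
print(store.ratio())  # 24576 logical bytes / 16384 physical bytes = 1.5
```

In this toy case the second backup adds only one new chunk. Ratios as high as the 55:1 Thomas cites would depend on how much data repeats from one backup to the next.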

Greg Schulz, senior analyst at The StorageIO Group, says de-duplication technology mainly resides in the backup space, complementing traditional tape libraries with the purpose of lowering costs and reducing data.

The main benefit of de-dupe technology is that your virtual tape library doesn't fill up, and you're "not seeing your backup targets fill up as fast as it normally would," Schulz says.

De-dupe can be done at the target of a backup stream (the storage array or tape drive) or at the source of the data being backed up (the application server). Traditionally, de-dupe products have been used as a target for backup data, but Schulz says there is "a growing emphasis de-duping back on the server."

Target de-dupe products are generally used as part of a final repository for backup data. Most backup software today supports tape volumes, files or raw disk as targets. Target de-dupe products mimic a tape library and support virtual tape libraries (VTL), or they can act as a network-attached storage file server supporting Network File System (NFS) or Common Internet File System (CIFS) files. Target de-dupe technology can also work on raw disk supporting Internet SCSI or Fibre Channel logical unit numbers (LUN). Prominent target-based de-dupe products are sold by Data Domain Inc., Diligent Technologies Corp., ExaGrid Systems Inc., FalconStor Software Inc., Quantum Corp. and Sepaton Inc.

Today, de-duping data at the target is the leading method, but de-duping data at the source, where the data is coming from, is even more disruptive, and the benefits are far greater, Schulz says.

Source de-dupe products replace backup software used in a client/server configuration, where remote clients de-dupe data being backed up and transmit only unique data to the central server. This reduces bandwidth requirements considerably, according to Schulz. Some prominent source-based de-dupe products include Asigra Inc.'s Televaulting for Enterprises, EMC Corp.'s Avamar, Network Appliance Inc.'s SnapVault and Symantec Corp.'s NetBackup PureDisk.
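The bandwidth saving in the client/server arrangement described above can be sketched in a few lines of Python. This is a hypothetical exchange, not the protocol of any product named here: the `CentralServer` class, its `missing` and `upload` methods, the fixed 4 KB chunks and the SHA-256 digests are all assumptions for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size, for illustration only


def split_chunks(data: bytes) -> dict:
    """Map SHA-256 digest -> chunk for every fixed-size chunk in data."""
    chunks = {}
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        chunks[hashlib.sha256(chunk).hexdigest()] = chunk
    return chunks


class CentralServer:
    """Hypothetical central backup server holding unique chunks."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests) -> set:
        """Tell the client which digests the server has never seen."""
        return {d for d in digests if d not in self.chunks}

    def upload(self, new_chunks: dict) -> None:
        self.chunks.update(new_chunks)


def client_backup(data: bytes, server: CentralServer) -> int:
    """Back up data and return the chunk payload bytes sent over the WAN."""
    local = split_chunks(data)              # de-dupe work happens on the client
    wanted = server.missing(local.keys())   # only small digests cross the wire
    payload = {d: local[d] for d in wanted}
    server.upload(payload)                  # send unique chunks only
    return sum(len(c) for c in payload.values())


server = CentralServer()
first = client_backup(b"A" * 4096 + b"B" * 4096, server)
second = client_backup(b"A" * 4096 + b"B" * 4096 + b"C" * 4096, server)
print(first, second)  # 8192 4096
```

The second backup is 12 KB of data, but only the one chunk the server has not seen is transmitted. The sketch ignores the recipe a real client would also send so the server can rebuild each backup.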

In band or out of band

Another characteristic that distinguishes target de-dupe products is when data de-duplication processing occurs. De-duplication takes time, because the product must compute and find commonality in the data being backed up. To minimize the effect on backup performance, some vendors de-dupe data in the background. These de-dupe products buffer the backup stream to disk and then reduce its size via de-duplication after the fact. ExaGrid, FalconStor and Quantum provide target de-dupe products that do background data de-duplication.
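The trade-off with background de-duplication is disk space: the backup lands at full size first and shrinks only later. The following Python sketch shows that behavior under stated assumptions. The `PostProcessTarget` class, the 4 KB chunks and the SHA-256 hashing are hypothetical and are not taken from the ExaGrid, FalconStor or Quantum products.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size, for illustration only


class PostProcessTarget:
    """Toy backup target that lands data at full size, then de-dupes later."""

    def __init__(self):
        self.landing = []   # raw backup streams awaiting de-duplication
        self.chunks = {}    # digest -> unique chunk bytes
        self.recipes = []   # one ordered list of digests per processed backup

    def ingest(self, stream: bytes) -> None:
        """Backup-window path: no hashing, just buffer the stream."""
        self.landing.append(stream)

    def dedupe_pass(self) -> None:
        """Background path: reduce buffered streams after the backup ends."""
        while self.landing:
            stream = self.landing.pop(0)
            recipe = []
            for start in range(0, len(stream), CHUNK_SIZE):
                chunk = stream[start:start + CHUNK_SIZE]
                digest = hashlib.sha256(chunk).hexdigest()
                self.chunks.setdefault(digest, chunk)
                recipe.append(digest)
            self.recipes.append(recipe)

    def disk_used(self) -> int:
        """Bytes held in the landing area plus bytes of unique chunks."""
        landed = sum(len(s) for s in self.landing)
        stored = sum(len(c) for c in self.chunks.values())
        return landed + stored


target = PostProcessTarget()
target.ingest(b"A" * 8192)
target.ingest(b"A" * 8192)
before = target.disk_used()   # both streams sit on disk at full size
target.dedupe_pass()
after = target.disk_used()    # one unique 4 KB chunk remains
print(before, after)  # 16384 4096
```

An in-band design would do the hashing inside `ingest` instead, which saves the landing space but puts the computation in the backup path.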


