Opinion: Data de-dup offers new storage management possibilities

The 20:1 compression rates once touted by vendors seem to have been greatly exaggerated

By Jim Damoulakis
May 24, 2007 12:00 PM ET

Computerworld - Last week's announcement (see "NetApp Set to Launch De-duplication Tool") by Network Appliance of its A-SIS data de-duplication technology for primary storage opens new storage management possibilities and raises the bar for its competitors, but it also brings up some interesting questions.

Most data de-duplication efforts have focused on secondary data applications, particularly backup. Companies like Data Domain have been very successful in this area, and in addition to achieving impressive de-duplication numbers, they have been ratcheting up performance capabilities with each new generation of products. So it only seems natural to consider applying de-dup technology more broadly.
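To make the mechanism concrete, here is a minimal sketch in Python (all names hypothetical; real products chunk, index, and scale far more cleverly) of hash-based de-duplication across two simulated nightly backups:

```python
import hashlib

def dedup_store(chunks, store=None):
    """Keep one copy of each unique chunk, keyed by its SHA-256 digest.

    Returns the shared chunk store plus the ordered list of digests
    needed to reconstruct the original stream.
    """
    store = {} if store is None else store
    recipe = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store only the first occurrence
        recipe.append(digest)
    return store, recipe

# Two simulated nightly backups of 100 4KB blocks; the second run
# changes only a single block, as is typical of backup workloads.
backup1 = [bytes([i]) * 4096 for i in range(100)]
backup2 = list(backup1)
backup2[42] = b"\xff" * 4096

store, _ = dedup_store(backup1)
store, _ = dedup_store(backup2, store)

logical = (len(backup1) + len(backup2)) * 4096   # bytes presented
physical = sum(len(c) for c in store.values())   # bytes actually kept
print(f"de-dup ratio: {logical / physical:.1f}:1")  # roughly 2:1 after two runs
```

Each additional backup of mostly unchanged data adds almost nothing to the physical store, which is how ratios can climb toward 20:1 over many generations.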

Interestingly, the announcement appears to make an effort to lower expectations about the rate of data reduction likely to be realized on primary storage. It suggests that the 20:1 or greater savings quoted by backup de-dup vendors are largely the result of multiple copies of unchanged data accumulating over time, and it warns that the reduction will be significantly lower in non-backup scenarios, perhaps 40% or less. That caution is borne out by the users quoted in the article referenced above.
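For comparison's sake (a hypothetical back-of-the-envelope conversion, not from the announcement), the two figures can be put on the same scale:

```python
def pct_saved(ratio):
    """Capacity saved, in percent, by an N:1 reduction ratio."""
    return 100 * (1 - 1 / ratio)

def ratio(pct):
    """N:1 reduction ratio implied by saving pct percent of capacity."""
    return 1 / (1 - pct / 100)

print(f"20:1 ratio -> {pct_saved(20):.0f}% saved")  # 95% saved
print(f"40% saved  -> {ratio(40):.2f}:1 ratio")     # about 1.67:1
```

Seen that way, a 40% savings is closer to a 1.7:1 ratio, roughly an order of magnitude below the backup figures.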

Certainly the extent of data reduction, as with any compression method, depends heavily on the characteristics of the data involved. But based on these data points, the levels being achieved appear to be at or below those attained through more traditional compression techniques such as gzip, OS file-system-based compression, and the hardware compression built into tape drives.
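Anyone curious where their own data falls can get a quick estimate with stock tools; a rough sketch using Python's standard gzip module:

```python
import gzip
import os

def gzip_ratio(data: bytes) -> float:
    """Compression ratio achieved by gzip at its default level."""
    return len(data) / len(gzip.compress(data))

# Repetitive records compress well; random data barely compresses.
records = b"2007-05-24,NORTH,WIDGET-7,qty=12,status=shipped\n" * 2000
noise = os.urandom(100_000)
print(f"records: {gzip_ratio(records):5.1f}:1")
print(f"noise:   {gzip_ratio(noise):5.2f}:1")  # just about 1:1
```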

This raises the question of why storage vendors, in general, have not leveraged traditional compression techniques for at least some categories of application. Anyone who has enabled file-based compression on a laptop knows that it introduces overhead and hurts performance, but hardware-based compression, such as that already incorporated into tape drives and some VTLs, could minimize or even eliminate that overhead in a storage system. One company, StorWiz, offers a network-based appliance that compresses data in-stream on its way to the storage system and claims to actually increase performance, because less data is ultimately written to disk. Otherwise, compression options for storage remain scarce.
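The in-stream idea itself is simple to sketch in software (a stand-in for what an appliance would do transparently in hardware or on the wire):

```python
import gzip
import shutil

def write_compressed(src_path: str, dst_path: str) -> None:
    """Compress a stream on its way to disk, so fewer bytes are
    physically written; if the data compresses 2:1, disk I/O halves."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Whether the reduced write volume translates into higher end-to-end performance depends on the compression engine keeping pace with the incoming stream, which is presumably what dedicated hardware is there to guarantee.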

Let me emphasize that data de-duplication and compression are not mutually exclusive. Traditional compression cannot match the very high reduction ratios that de-duplication achieves in some circumstances, but given the current emphasis on shrinking the storage footprint, along with operational power and cooling costs, it makes sense to use every tool in our arsenal.

Jim Damoulakis is chief technology officer of GlassHouse Technologies Inc., a leading provider of independent storage services. He can be reached at jimd@glasshouse.com.
