There's a very old IT problem that's gaining renewed attention lately: keeping too many copies of data. The analyst firm IDC has quantified the problem and come up with some rather startling statistics:
- More than 60% of all enterprise disk capacity worldwide is filled with copy data
- By 2016, spending on storage for copy data will approach $50 billion and copy data capacity will exceed 315 million terabytes
- In the next 12 months, [IT departments] expect increased use of data copies for app development and testing, regulatory compliance, multi-user access and long-term archival
If copy data really fills more than 60 percent of our storage capacity, the big questions are: how does this happen? And what can we do about it?
How it happens is self-evident. We have too many copies of data because we make too many copies! If, like me, you're of a certain age, the minute you hear the word "copies" you think of Rob Schneider's Saturday Night Live character Richard, the guy sitting in the copy room. One after another, people walk in to make copies and Richard calls out "makin' copies!"
Well, your IT department works the same way. One after another, different data consumers walk in and request copies of data. They need it for test and development work, for running reports and data analytics, for regulatory searches and so on. And the way it's normally provided for them is by -- you guessed it -- makin' copies! Before you know it, you've copied a database four or five times over, and sometimes many more (IDC's research shows some organizations with over 100 copies of some data sets).
Because these copies end up scattered across different groups in your organization and are not centrally managed, they tend to get lost in the mix. Many of them sit around for long periods, taking up space to no purpose.
What's the cure?
IDC is using the term Copy Data Management to describe an approach to the problem. It begins with centralizing your data as part of your backup solution, and then using that backup data as a source for copies. But the key is doing it in a way that doesn't require actually creating copies.
This isn't a new problem, or even a new solution. Disk vendors have tried to address it over the years by offering zero-footprint snapshot clones: read/write addressable volumes that are created "virtually" and can be mounted and used by servers. The goal is to provide access to data quickly, without the need to create file-by-file copies.
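To make the idea concrete, here is a minimal sketch of the copy-on-write principle behind snapshot clones. It's purely illustrative (real zero-footprint clones are implemented inside the storage array or filesystem, not in application code, and the `BaseVolume` and `SnapshotClone` names are mine), but it shows how a clone can present a full read/write data set while storing only the blocks that change:

```python
# Illustrative sketch only: real snapshot clones live in the array or
# filesystem layer; this just models the copy-on-write behavior.

class BaseVolume:
    """The original data set: a mapping of block number -> block contents."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def read(self, block_no):
        return self.blocks[block_no]


class SnapshotClone:
    """A read/write 'virtual' copy that stores only the blocks written to it."""
    def __init__(self, base):
        self.base = base
        self.delta = {}  # only changed blocks consume new space

    def read(self, block_no):
        # Reads fall through to the shared base unless this clone overwrote the block.
        if block_no in self.delta:
            return self.delta[block_no]
        return self.base.read(block_no)

    def write(self, block_no, data):
        self.delta[block_no] = data

    def extra_blocks_used(self):
        return len(self.delta)


if __name__ == "__main__":
    prod = BaseVolume({i: f"block-{i}" for i in range(1000)})  # the "production" data
    dev = SnapshotClone(prod)                                  # instant, zero-footprint copy

    print(dev.extra_blocks_used())    # 0 -- the clone costs nothing until written to
    dev.write(7, "test data")
    print(dev.read(7), dev.read(8))   # test data block-8
    print(dev.extra_blocks_used())    # 1 -- only the changed block is stored
```

The point of the model is the last line: the clone is fully usable from the moment it's created, but its storage footprint is only the data someone actually changes.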
While doing things at the disk array level can work, the limitation is that it's not centralized. If you've got three disk vendors on your floor (hardly uncommon), you'll have three different toolkits. That's not the most efficient way to manage things, and it's very hard to get a high-level view of what's going on when everything sits in a separate stovepipe. In addition, what about data that's not on your snapshot-enabled SAN storage? What about boot drive data? What about all that direct attached storage you've got in use?
If you could centralize your snapshot capability and move all the data to a common target -- something you have to do anyway because you need to keep a backup -- and then deliver fast, efficient snapshot clones from that core data set, well, now you've got something.
This would make it easy to deliver instances of data to your demanding data consumers, and most importantly, it nearly eliminates additional storage consumption. Yes, any new data that gets written to these clone volumes has to be stored. But that is usually minimal compared to the original data set, and usually temporary. When the task is done, you can simply delete it.
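As a back-of-the-envelope illustration (the numbers here are hypothetical, not from IDC), compare five full copies of a 10 TB database with five snapshot clones that each rewrite about 5 percent of the data:

```python
# Hypothetical numbers for illustration only.
db_tb = 10            # size of the source database, in TB
copies = 5            # copies requested for dev/test, reporting, compliance, etc.
change_rate = 0.05    # fraction of the data each clone actually rewrites

full_copies_tb = copies * db_tb                 # 50 TB of duplicate storage
clone_deltas_tb = copies * db_tb * change_rate  # 2.5 TB of new writes

print(f"Full copies: {full_copies_tb} TB; snapshot clones: {clone_deltas_tb} TB")
```

Under those assumptions, 50 TB of duplicate data shrinks to 2.5 TB of deltas, and even that space comes back when the clones are deleted.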
There are some vendors already tackling this problem, and likely there are more to come. Meanwhile, it's time you took a close look at your own IT department to see if you're makin' copies all the time, and begin evaluating just how much storage and budget you could save if you stop.