When to shred: Purging data saves money, cuts legal risk

E-discovery ranges from $1 million to $3 million per terabyte of data

Archiving on the rise

Partly because of increased data retention activity, companies are increasingly implementing disk-based archiving tiers in their storage architectures. This is a better place to retain data than tape backup systems, Babineau says, because the data is indexed, searchable and stored in single-instance format, all of which makes it easier to find what you need during e-discovery.

According to Robert Stevenson, managing director of storage research at The InfoPro Inc. in New York, archiving tiers have seen a 54% annual growth rate among users surveyed vs. 20% for Tier 1 monolithic storage and 40% growth for Tier 2 modular storage. Tier 1 tends to include high-performance storage platforms, with integrated capabilities for replication, disaster recovery and minimum downtime, he says. Tier 2 includes modular systems with lower cache and disk capabilities, lower cost per terabyte and an emphasis on ease of use, Stevenson adds.

And in the past three years, e-mail archiving has grown, with 48% of survey respondents saying they use it today vs. 39% two and a half years ago. Database archiving is also up, with 36% using it vs. 21% two and a half years ago.

At East Carolina, Zimmer has reduced primary storage costs by 40% to 50% by moving data to the Centera devices.

Another reason for archiving growth is that companies are relying less on backup tapes for retention and more on disk-based storage. "Discovery is a difficult task, and if you have multiple copies in the backup environment, it's extremely expensive to retrieve, index, search and take it through the preproduction process of culling and narrowing down results," Merryman says. "It can turn discovery into a multimillion-dollar project."

Zimmer says that before East Carolina used a Centera disk array, the university relied on tape backups for data retention. But since backups collect data in daily snapshots, he says, there was always the potential for data to be missing. For instance, if the relevant information wasn't on the server the day the snapshot was taken, a user wouldn't be able to produce it. And even if the data could be found on tape, he says, the cost would be extremely high to restore it, especially if you needed to go back a year or more.

"You could potentially be working on gathering that information for a week or two, just to get to a certain piece of e-mail to restore to tape for the test lab to extract," he says. In fact, while researching the return on investment of Enterprise Vault, Zimmer estimated that it would take 80 man-hours to recover all the e-mail generated by one employee for one year if it had to be restored from every monthly backup tape. With the archive system, it takes just 15 to 20 minutes, and the employee is guaranteed to get every piece of e-mail, he says.
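To put Zimmer's two figures side by side, here is a minimal arithmetic sketch in Python. It uses only the numbers quoted above (80 man-hours from tape, 15 to 20 minutes from the archive); the function and constant names are illustrative and do not come from any product.

```python
# Figures quoted in the article; the names are illustrative only.
TAPE_RESTORE_HOURS = 80      # one employee's e-mail for one year, from monthly tapes
ARCHIVE_MINUTES_LOW = 15     # fastest quoted archive retrieval
ARCHIVE_MINUTES_HIGH = 20    # slowest quoted archive retrieval


def speedup(tape_hours: float, archive_minutes: float) -> float:
    """Return how many times faster the archive is than a tape restore."""
    return (tape_hours * 60) / archive_minutes


best_case = speedup(TAPE_RESTORE_HOURS, ARCHIVE_MINUTES_LOW)
worst_case = speedup(TAPE_RESTORE_HOURS, ARCHIVE_MINUTES_HIGH)

print(f"Archive retrieval is roughly {worst_case:.0f}x to {best_case:.0f}x "
      f"faster than restoring from tape")
```

By this rough comparison, the archive is a few hundred times faster, and that is before counting the risk that a tape snapshot missed a message altogether.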

The urge to purge

Seemingly the simplest way to reduce data volumes is to delete the data you don't need, but that is much easier said than done. Outside of e-mail, according to Merryman, the status quo is to do nothing. "Most legacy applications have never purged data, and new applications are rarely designed to accommodate purging," he says.

Deleting production data is also complicated, he says. The legal, compliance and operational risks are often ambiguous, and few organizations have a process for handling a web of data retention requirements.

"If you look at legacy data outside the application world, a lot of people have no idea what it is, but they're scared of getting rid of it," he says. At one large bank in New York, Merryman says, he ran across hundreds of file extensions that no one knew about, as well as data inaccessible by currently maintained applications or interfaces.

The important thing is to start setting purging policies now rather than trying to apply them to old data. "If you address high-risk, high-volume applications and databases, you'll address 90% of the risk," he says. "If you target all 700 applications in your environment, you'll never get it done."
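One way to picture Merryman's advice is a simple ranking exercise. The sketch below is hypothetical throughout: the application names, the 1-to-5 risk scores, the volumes and the risk-times-volume weighting are all assumptions made for illustration, not figures or a method from Merryman. It ranks applications by that weighting and stops once the shortlist covers a target share of the total.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    risk: int          # hypothetical 1-5 legal/compliance risk score
    volume_tb: float   # hypothetical data volume in terabytes


def purge_shortlist(apps, coverage=0.9):
    """Pick the highest risk-times-volume apps until `coverage` of the total is reached."""
    ranked = sorted(apps, key=lambda a: a.risk * a.volume_tb, reverse=True)
    total = sum(a.risk * a.volume_tb for a in ranked)
    shortlist, covered = [], 0.0
    for app in ranked:
        # Stop as soon as the apps already chosen meet the coverage target.
        if total == 0 or covered / total >= coverage:
            break
        shortlist.append(app)
        covered += app.risk * app.volume_tb
    return shortlist


# Made-up inventory for illustration only.
apps = [
    App("email", risk=5, volume_tb=40),
    App("crm", risk=4, volume_tb=20),
    App("hr", risk=2, volume_tb=5),
    App("wiki", risk=1, volume_tb=3),
    App("legacy", risk=1, volume_tb=2),
]

shortlist = purge_shortlist(apps)
print([a.name for a in shortlist])
```

With this made-up inventory, two of the five applications account for more than 90% of the weighted total, which is the shape of the argument: a short list of purge policies can cover most of the exposure.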

In fact, in a tiered storage environment, Merryman says, the business case is much better when you purge data rather than simply archiving it on lower cost disk. "The cost of perpetually managing and refreshing huge amounts of data that's never been culled or purged is extremely high," he says. "So if you come up with a strategy to tier 70% of your data to cheap storage, and then you factor in the cost of managing, backing up and protecting it for disaster recovery, it's expensive."

Unfortunately, he says, most companies that develop tiering strategies figure they'll purge at some time in the future. "But that's the problem with purge," he says. "It's always 'later,' like cleaning out the basement."

Another difficulty with purging is the lack of a guarantee that you've deleted all instances of the data set. You might think you deleted all your old e-mail, but it may be stored on tape from two years ago, so it still exists. "Some companies figure if you can't delete it consistently, don't delete it at all because it's probably somewhere that no one knows about," Babineau says.

Still, he says, "if you invest in technology that helps you retain data, why not invest in technology that helps expire data when you don't need it anymore?"

For instance, all archiving systems have a "delete" function, Merryman says, but no single product can purge data across all data types, such as messaging, unstructured and structured data. A fairly mature base of e-mail archiving is available from the likes of Symantec, Computer Associates International and EMC, as well as smaller companies such as Mimosa and Zantaz. File archiving systems vary widely, from EMC (Legato's hierarchical storage management product) to enterprise search vendors such as Kazeon and Abrevity. And in the database world, archiving vendors include OuterBay and PeopleSoft.

Merryman's advice: First identify vendors with proven technologies, and then look at emerging vendors. Second, he says, see if the vendors support or plan to support SNIA Archiving Standards being developed by the 100-Year Archive Task Force. "This body of standards is young," he says, "but it's the only industrywide effort to standardize archiving methods."

Copyright © 2008 IDG Communications, Inc.
