When to shred: Purging data saves money, cuts legal risk

E-discovery costs range from $1 million to $3 million per terabyte of data

By Mary Brandel
September 18, 2008 12:00 PM ET

Computerworld - A funny thing happened on East Carolina University's journey to creating a data-retention strategy. As part of a compliance project launched one and a half years ago, Brent Zimmer, systems specialist at the university, was working with attorneys and archivists to determine which data was most important to keep and for how long. But it soon became clear that it was just as important to identify which data should be thrown away.

Zimmer was aware of the importance of being able to quickly produce required information during litigation, "but the thing we never thought about was keeping data too long," he says. The risk lies in data you would not have had to produce had it been purged: As long as it is kept, it is discoverable and could be used as evidence against you.

Like many organizations, East Carolina had its share of data to purge. "We never made anyone throw away anything unless they ran out of space on their quota," Zimmer says. Some users, he says, had e-mail dating back to 1996.

East Carolina is not unusual; many organizations hang on to more data than they need, for much longer than they should, according to John Merryman, services director at GlassHouse Technologies Inc., a storage services provider in Framingham, Mass. One reason is fear. "Companies are really sensitive because there's a perceived underhandedness to purging data," he says. "People might wonder, 'Why aren't you keeping all your records?'"

Another is the low cost of storage. Organizations have historically preferred to buy more disks rather than spend time and resources sorting through what they do and don't need. "Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes," says Kevin Beaver, founder of Principle Logic LLC in Acworth, Ga.

But thanks to e-discovery risk and burgeoning data volumes -- compound annual growth rates of 20% to 50% for some companies -- the tide is starting to turn, according to Merryman. The average cost companies incur for electronic data discovery ranges from $1 million to $3 million per terabyte of data, according to GlassHouse. While you need to pay attention to retaining data, at the same time, "all indications are that you need to be keeping less," Merryman says.

A recent report from Gartner Inc. concurs. It states that the current explosion of data is outpacing the decline in storage prices, even before the resource costs for maintaining data are taken into account. Estimating that the average employee might generate 10GB per year, at a cost of $5 per gigabyte to back it up, Gartner says a 5,000-worker company would face annual costs of $1.25 million for five years of storage.
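The Gartner figure can be reproduced with simple arithmetic. The sketch below is one reading of the estimate, not Gartner's own model: it assumes the $5 per gigabyte is charged annually on everything retained, that nothing is purged, and that per-employee volume stays flat. The function and constant names are illustrative.

```python
# Figures quoted from the Gartner estimate above.
GB_PER_EMPLOYEE_PER_YEAR = 10
BACKUP_COST_PER_GB = 5      # U.S. dollars
EMPLOYEES = 5_000
YEARS_RETAINED = 5


def annual_backup_cost(employees, gb_per_year, cost_per_gb, years_retained):
    """Annual cost of backing up all data accumulated over `years_retained` years.

    Assumption (not from the report): nothing is purged and the volume
    generated per employee does not grow from year to year.
    """
    retained_gb = employees * gb_per_year * years_retained
    return retained_gb * cost_per_gb


cost = annual_backup_cost(
    EMPLOYEES, GB_PER_EMPLOYEE_PER_YEAR, BACKUP_COST_PER_GB, YEARS_RETAINED
)
print(f"${cost:,}")  # prints $1,250,000
```

Under the same assumptions, keeping only one year of data would cut the annual figure to $250,000, which is the cost argument for purging in a single number.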

And considering that many companies maintain multiple copies of data, thanks to test data, operational data and disaster recovery copies, not to mention backups, "there's an explosion of data in most companies," Merryman says.

Aside from the costs, keeping all those records indefinitely is a gold mine for attorneys looking for evidence, he adds.

Getting policy straight

The "2007 Litigation Trends Survey Findings" report by Fulbright & Jaworski LLP, which surveyed 253 U.S. and 50 U.K. corporate counsels, reported the following findings:

  • The number of lawsuits filed against companies appears to be down from last year, returning to levels similar to 2005. However, suits with $20 million or more at stake are on the rise. All of the respondents from small and midsize companies reported at least one lawsuit of that magnitude in the past year. Twenty percent of the largest companies surveyed had 21 to 50 lawsuits of that size.
  • Almost 40% of the largest companies surveyed spent $5 million or more annually on litigation, excluding settlements and awards.
  • In the records-retention area, 31% of all the companies in the survey now log or retain instant messages, and 40% retain voice mail.

One way to address this problem is to set retention policies that reduce exposure to legal problems. But don't try to boil the ocean, Merryman advises. Instead, create policies from the application or business level down, rather than looking across the whole data landscape and letting policy bubble up. Also, create black-and-white rules that are easy to deal with.

For instance, roll all data types -- such as e-mail, application and file data -- into 10 to 30 categories of big-picture policies rather than hundreds of granular ones. "You need broader rules like 'Accounting data needs to be retained six years,' not 'This annual report needs to be retained [for] five years,'" he says.

According to research from Enterprise Strategy Group Inc. in Milford, Mass., the average required retention period for files, e-mails and databases is on the rise. Most companies retain data for four to 10 years, says Brian Babineau, a senior analyst at ESG.

East Carolina University started with the low-hanging fruit, setting retention and purging policies for e-mail, medical records and security video. It archived that data on a new system based on Symantec Corp.'s Enterprise Vault storage management software and EMC Corp.'s Centera content-addressed storage (CAS) array. E-mails from the chancellor or dean are saved for seven years, Zimmer says, while faculty and staff e-mail gets purged after three years.

Meanwhile, security video is archived for 30 days -- a good thing, since university police collect a terabyte per day. Patient records from the medical school need to be kept for 20 years after the patient is deceased, but East Carolina now uses EMC Rainfinity to take that data off primary storage and archive it to the Centera device so it's out of the backup environment.
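Black-and-white rules like these are simple enough to write down as a lookup table. The following is a minimal sketch, not East Carolina's actual configuration or anything from Enterprise Vault: the category names and the `is_purgeable` helper are hypothetical, and a year is approximated as 365 days. The retention periods themselves are the ones Zimmer describes.

```python
from datetime import date, timedelta

# Retention periods reported in the article, in days.
# Category names are hypothetical; a year is approximated as 365 days.
RETENTION_DAYS = {
    "email_chancellor_dean": 7 * 365,
    "email_faculty_staff": 3 * 365,
    "security_video": 30,
    # For patient records the clock starts at the patient's death.
    "patient_record": 20 * 365,
}


def is_purgeable(category, clock_start, today):
    """Return True if an item has outlived its retention period.

    clock_start is the date the retention clock began: the creation date
    for e-mail and video, the date of death for patient records. None
    means the clock has not started, so the item must be kept.
    """
    if clock_start is None:
        return False
    return today - clock_start > timedelta(days=RETENTION_DAYS[category])


today = date(2008, 9, 18)
print(is_purgeable("security_video", date(2008, 8, 1), today))        # True
print(is_purgeable("email_faculty_staff", date(2006, 9, 18), today))  # False
print(is_purgeable("patient_record", None, today))                    # False
```

Keeping the table to a handful of broad categories mirrors Merryman's advice: a rule is only enforceable if someone can tell at a glance which row applies.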

Beyond that, the job will get more difficult, Zimmer acknowledges. "There's a lot of other stuff that we don't know the retention [requirements] for, so that will be more tricky," he says.

The key to reducing data volumes, Gartner says, is a process called "content valuation," which involves examining factors such as authorship authority, usage patterns, nature of content and business purpose. According to Gartner, there are many ways to approach content valuation, including electronic records management, content management, enterprise search to identify what's a record and what's not, legal preservation software and policy management.


