When to shred: Purging data saves money, cuts legal risk

E-discovery costs range from $1 million to $3 million per terabyte of data

By Mary Brandel
September 18, 2008 12:00 PM ET

Computerworld - A funny thing happened on East Carolina University's journey to creating a data-retention strategy. As part of a compliance project launched one and a half years ago, Brent Zimmer, systems specialist at the university, was working with attorneys and archivists to determine which data was most important to keep and for how long. But it soon became clear that it was just as important to identify which data should be thrown away.

Zimmer was aware of the importance of being able to quickly produce required information during litigation, "but the thing we never thought about was keeping data too long," he says. The risk lies in keeping data that you are not required to keep: as long as it exists and is discoverable, it could be used as evidence against you.

Like many organizations, East Carolina had its share of data to purge. "We never made anyone throw away anything unless they ran out of space on their quota," Zimmer says. Some users, he says, had e-mail dating back to 1996.

East Carolina is not unusual; many organizations hang on to more data than they need, for much longer than they should, according to John Merryman, services director at GlassHouse Technologies Inc., a storage services provider in Framingham, Mass. One reason is fear. "Companies are really sensitive because there's a perceived underhandedness to purging data," he says. "People might wonder, 'Why aren't you keeping all your records?'"

Another is the low cost of storage. Organizations have historically preferred to buy more disks rather than spend time and resources sorting through what they do and don't need. "Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes," says Kevin Beaver, founder of Principle Logic LLC in Acworth, Ga.

But thanks to e-discovery risk and burgeoning data volumes -- a compound annual growth rate of 20% to 50% for some companies -- the tide is starting to turn, according to Merryman. The average cost companies incur for electronic data discovery ranges from $1 million to $3 million per terabyte of data, according to GlassHouse. While you need to pay attention to retaining data, at the same time, "all indications are that you need to be keeping less," Merryman says.
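To get a feel for what those growth rates and per-terabyte costs imply, here is a rough Python sketch. The 10TB starting volume, the five-year horizon and the number of terabytes pulled into a case are hypothetical inputs chosen for illustration; only the 20% to 50% growth range and the $1 million to $3 million per-terabyte range come from the figures cited above.

```python
def project_volume(start_tb, annual_growth, years):
    """Data volume in TB after compounding `annual_growth` for `years` years."""
    return start_tb * (1 + annual_growth) ** years


def discovery_cost_range(tb_in_scope, low_per_tb=1_000_000, high_per_tb=3_000_000):
    """Low and high cost estimates for the terabytes actually swept into discovery."""
    return tb_in_scope * low_per_tb, tb_in_scope * high_per_tb


START_TB = 10  # hypothetical starting volume, not a figure from the article
YEARS = 5      # hypothetical planning horizon

for growth in (0.20, 0.50):
    volume = project_volume(START_TB, growth, YEARS)
    print(f"{growth:.0%} growth: {START_TB}TB becomes {volume:.1f}TB in {YEARS} years")

# Hypothetical case: 2TB of the stored data falls within the scope of a lawsuit.
low, high = discovery_cost_range(2)
print(f"Discovery on 2TB: ${low:,.0f} to ${high:,.0f}")
```

The point of the sketch is only that both inputs compound: the more data that survives each year, the more of it can end up within the scope of discovery.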

A recent report from Gartner Inc. concurs. It states that the current explosion of data is outpacing the decline in storage prices, even before the resource costs for maintaining data are taken into account. Estimating that the average employee might generate 10GB per year, at a cost of $5 per gigabyte to back it up, Gartner says a 5,000-worker company would face annual costs of $1.25 million for five years of storage.
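The $1.25 million figure can be reproduced with simple arithmetic under one reading of the estimate: each employee's data accumulates for five years, and the whole accumulated volume is backed up at $5 per gigabyte. That reading is an assumption, since the calculation itself is not spelled out here. A minimal Python sketch, with illustrative function and parameter names:

```python
def backup_cost(employees, gb_per_employee_per_year, cost_per_gb, years_retained):
    """Cost of backing up everything accumulated over `years_retained` years."""
    total_gb = employees * gb_per_employee_per_year * years_retained
    return total_gb * cost_per_gb


# Inputs from the Gartner estimate: 5,000 workers, 10GB each per year, $5 per GB.
one_year = backup_cost(5_000, 10, 5, years_retained=1)
five_years = backup_cost(5_000, 10, 5, years_retained=5)

print(f"One year of data:   ${one_year:,.0f}")
print(f"Five years of data: ${five_years:,.0f}")
```

Under this reading, a single year of data costs $250,000 to back up, and the bill reaches $1.25 million once five years of data are being retained.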

And considering that many companies maintain multiple copies of data, thanks to test data, operational data and disaster recovery copies, not to mention backups, "there's an explosion of data in most companies," Merryman says.

Aside from the costs, keeping all those records indefinitely is a gold mine for attorneys looking for evidence, he adds.

Getting policy straight

The "2007 Litigation Trends Survey Findings" report by Fulbright & Jaworski LLP, based on responses from 253 U.S. and 50 U.K. corporate counsel, included the following findings:

  • The number of lawsuits filed against companies appears to be down from last year, returning to levels similar to 2005. However, suits with $20 million or more at stake are on the rise. All of the respondents from small and midsize companies reported at least one lawsuit of that magnitude in the past year. Twenty percent of the largest companies surveyed had 21 to 50 lawsuits of that size.
  • Almost 40% of the largest companies surveyed spent $5 million or more annually on litigation, excluding settlements and awards.
  • In the records-retention area, 31% of all the companies in the survey now log or retain instant messages, and 40% retain voice mail.

One way to address this problem is to set retention policies that reduce exposure to legal problems. But don't try to boil the ocean, Merryman advises. Instead, create policies from the application or business level down, rather than looking across the whole data landscape and letting policy bubble up. Also, create black-and-white rules that are easy to deal with.

For instance, roll all data types -- such as e-mail, application and file data -- into 10 to 30 categories of big-picture policies rather than hundreds of granular ones. "You need broader rules like 'Accounting data needs to be retained six years,' not 'This annual report needs to be retained [for] five years,'" he says.

According to research from Enterprise Strategy Group Inc. in Milford, Mass., the average required retention period for files, e-mails and databases is on the rise. Most companies retain data for four to 10 years, says Brian Babineau, a senior analyst at ESG.

East Carolina University started with the low-hanging fruit, setting retention and purging policies for e-mail, medical records and security video. It archived that data on a new system based on Symantec Corp.'s Enterprise Vault storage management software and EMC Corp.'s Centera content-addressed storage (CAS) array. E-mails from the chancellor or dean are saved for seven years, Zimmer says, while faculty and staff e-mail gets purged after three years.

Meanwhile, security video is archived for 30 days -- a good thing, since university police collect a terabyte per day. Patient records from the medical school need to be kept for 20 years after the patient is deceased, but East Carolina now uses EMC Rainfinity to take that data off primary storage and archive it to the Centera device so it's out of the backup environment.
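Black-and-white rules of this kind are simple to express as a lookup table. The Python sketch below encodes the four retention periods Zimmer describes; the category names, the `is_purgeable` function and the 365-day year are illustrative assumptions, not a description of how Enterprise Vault or Centera actually implement retention.

```python
from datetime import date, timedelta

# Retention periods reported for East Carolina University, in days.
# A year is approximated as 365 days (an assumption made for simplicity).
RETENTION_DAYS = {
    "executive_email": 7 * 365,  # chancellor or dean
    "staff_email": 3 * 365,      # faculty and staff
    "security_video": 30,
    "patient_record": 20 * 365,  # counted from the patient's date of death
}


def is_purgeable(category, reference_date, today):
    """Return True if an item has outlived its retention period.

    `reference_date` is the creation date for most categories and the date
    of death for patient records. None means the clock has not started
    (for example, a living patient), so the item is never purgeable.
    An unknown category raises KeyError rather than guessing a policy.
    """
    if reference_date is None:
        return False
    return today - reference_date > timedelta(days=RETENTION_DAYS[category])


today = date(2008, 9, 18)
print(is_purgeable("staff_email", date(1996, 1, 1), today))    # e-mail from 1996
print(is_purgeable("security_video", date(2008, 8, 1), today))
print(is_purgeable("patient_record", None, today))             # living patient
```

Refusing to guess a policy for an unknown category mirrors Zimmer's point that data without a known retention requirement is the harder problem.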

Beyond that, the job will get more difficult, Zimmer acknowledges. "There's a lot of other stuff that we don't know the retention [requirements] for, so that will be more tricky," he says.

The key to reducing data volumes, Gartner says, is a process called "content valuation," which involves examining factors such as authorship authority, usage patterns, nature of content and business purpose. According to Gartner, there are many ways to approach content valuation, including electronic records management, content management, enterprise search to identify what's a record and what's not, legal preservation software and policy management.


