Why IT should start throwing data away

It can be a storage nightmare: Given expanding regulatory requirements and the key role that electronic records now play in lawsuits, some enterprises are saving every bit of data they have, just to be safe. As a gauge of storage demand, IDC says the total amount of disk storage shipped last year grew 40.5 percent from 2007.

Sure, storage media are getting less expensive. The cost of a gigabyte of disk storage fell more than 27 percent from 2007 to 2008, according to IDC. But with the storage requirements of average enterprises rapidly growing, keeping it all forever can create long-term management challenges and lead to headaches when something needs to be found. Analysts, attorneys, and vendors say enterprises are better off getting rid of some data -- but doing it judiciously. For IT departments, that means planning, carefully executing, and not going it alone.

"More companies are sensitive to the fact that we can't just keep throwing storage at the issue," says IDC analyst Rick Villars. Letting fast-growing data stores build up year after year isn't sustainable, he says. Companies that save everything are often those that aren't sure what needs to be saved or deleted, he adds -- a sign that they're setting themselves up for trouble.

On top of the price of disks, tapes, networks, and management, piling up too much data can come back to haunt a company in the event of a lawsuit. It may cost $1 million to find and compile the data requested in "e-discovery," the process of collecting electronic data as evidence, says Andrew Cohen, vice president of e-discovery and compliance at EMC. Those who've studied this dilemma recommend a variety of steps to make sure that a company is neither saving too much nor endangering itself by improperly deleting information. There are technologies out there to help, but it also takes human input, they note.

The data-storage problem
Every organization needs to save information for its own purposes, such as institutional memory, transaction lookup and analysis, and so on. Plus, regulations such as the Sarbanes-Oxley Act and Health Insurance Portability and Accountability Act (HIPAA) require enterprises to save certain kinds of content for a prescribed period. And more such regulations are in the works, notes Enterprise Strategy Group analyst Brian Babineau. "If you're in business, you're going to be regulated somehow," he says.

But storing information sets you up for a risk: E-discovery requests in lawsuits expose a company's data and data management to close scrutiny. The more you store, the more they can ask for, increasing the odds of damaging findings. And if your information storage and deletion policies aren't rigorous and consistent, you can lose the benefit of the doubt in court as to why some information is not available.

In some cases, regulations tell you exactly what you must do. But often they specify only the minimum or, as with the Federal Rules of Civil Procedure, they offer guidelines for making storage decisions without prescribing them.

Some companies that have had to comply with data regulations for many years have departments that specialize in records management, which began with filed pieces of paper. Because the laws tend to be complicated and vary between countries, these experts often have a thick manual of requirements for different types of data, which can't be easily translated into a set of procedures by the IT department.

Start with the storage analysis, not the storage technology or procedures
The first step is to determine what data has to be kept, and only then how to retain it. IT can play a key role here, using data analysis software, Babineau says. Companies such as Exterro, Vivisimo, Autonomy, and Digital Reef sell software that helps identify what data a company has and how its employees typically use that data. That information can help in deciding what should be collected and retained, both for legal and corporate purposes.

IT departments can and should fine-tune their retention policies to minimize how much they have to store, says Forrester Research analyst Andrew Reichman. For example, it's possible to segment the user population into categories such as executives, back-office employees, and people who deal with the company's intellectual property, and treat their e-mail differently. But this has to be done with a precise, consistent policy to be legally defensible, he says.
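To make the idea of a precise, consistent policy concrete, here is a minimal Python sketch of retention rules keyed by user category. The category names, the retention periods, and the `Email` record are all hypothetical, chosen only for illustration; real values would come from the records-management and legal teams.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per user category, in days.
RETENTION_DAYS = {
    "executive": 7 * 365,
    "intellectual_property": 5 * 365,
    "back_office": 365,
}


@dataclass(frozen=True)
class Email:
    owner_category: str  # one of the keys in RETENTION_DAYS
    sent_on: date


def is_expired(msg: Email, today: date) -> bool:
    """Return True if the message is older than its category's retention period."""
    # An unknown category raises KeyError instead of being silently purged.
    days = RETENTION_DAYS[msg.owner_category]
    return today - msg.sent_on > timedelta(days=days)
```

Because every message is checked against the same table, the answer to "how long do you keep this person's e-mail?" is a lookup rather than a judgment call.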

"The more you can separate the small piece of data that really is sensitive from the rest of the data that's not sensitive, the costs are going to be way lower," Reichman says.

But in the end, technology can't write policies for you. "There has to be a meeting of minds between IT and the compliance officers," Babineau says.

This process can be difficult because the two groups essentially speak different languages. IT administrators tend to look at a bit of data based on what application or department it's associated with, while records managers may think about it in terms of intellectual property or other concepts. Getting them to work together is an organizational challenge that some have solved by merging the groups, while other companies have the records experts simply advise the IT team, IDC's Villars says.

Backup and archiving should not be confused
Once the teams have come together to craft policies, it's a good idea to separate backup from archiving, ESG's Babineau says. The purpose of backup is to make a copy of everything, so the business can get back on its feet after an unexpected loss of data. "Somehow we've morphed that into 'It's a great way to save data,'" he says.

A backup system doesn't have the granular control needed to save some types of information for a short time and others for longer, he says. For example, if a certain business record needs to be saved for seven years, the wrong place to save it is on a backup tape with 55,000 other files.
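The arithmetic behind that example can be sketched in a few lines of Python. The assumption here, which is not from the article, is that each of the 55,000 other files would otherwise need to be kept for only one year; the point is simply that a tape must be kept as long as its longest-lived file.

```python
def tape_retention_years(file_retention_years):
    """A tape can only be recycled once every file on it has expired."""
    return max(file_retention_years)


# One record that must be kept seven years, plus 55,000 other files.
# The one-year figure for the other files is a hypothetical assumption.
files = [7] + [1] * 55_000

tape_years = tape_retention_years(files)

# Extra "file-years" stored only because everything shares one tape.
excess_file_years = sum(tape_years - years for years in files)

print(tape_years)         # 7
print(excess_file_years)  # 330000
```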

"If you want to save the business record for seven years, you have to save the 55,000 files for seven years as well," Babineau says. And using separate backup processes for data types with different retention requirements is expensive and complicated, he adds.

Archiving should be used only to selectively retain information for specific periods of time. Separated from backup and done with specialized tools, the process runs more smoothly and prevents saving too much or too little information, Babineau says. IBM, Symantec, and others offer separate applications that analyze data for backup and archiving based on policies, and CommVault offers a single tool that can separate candidates for backup and archiving.

Addressing the e-discovery challenge is tough
The prospect of e-discovery can present a harder problem to solve. E-discovery involves one party in a lawsuit seeking electronic records from the other about any number of things that it believes are relevant to the case. Typically the biggest part of an e-discovery request is e-mail involving employees who may be connected to the case, but the request may also include word-processing documents, source code, or other types of data, says Wendy Curtis, special counsel for e-discovery at Orrick, Herrington & Sutcliffe.

As soon as a company can reasonably anticipate it will be sued, it has an obligation to hold on to any records involving people or projects related to the allegations of the suit. That means all purging of that data has to stop, even if it's routine and automatic, Curtis says.

Why? Because if the plaintiff seeks some information through e-discovery and learns it was purged after the date when it shouldn't have been, that can lead to complications that needlessly increase the cost of litigation or even hurt the company's case, Curtis says.

Fear of this kind of scenario is what drives some organizations to save all their data. But that's not necessarily a good idea, say Curtis and others. If everything is saved, finding the relevant pieces becomes harder and more expensive.

Generally, courts won't frown on deletion of records if it was done according to a well-established policy and schedule -- and if the company had no reason to think it would be sued over something related to those records. "The law and courts recognize a safe harbor for the destruction of records according to well-established policies so long as the company was not involved in or anticipating litigation," Curtis says.

However, that policy must be precise and followed to the letter, she adds. If the IT department is asked how often it purges someone's old e-mail, "generalizations are not enough," she says. "It's got to be exact." In companies that tend to be sued frequently, such as financial services firms, IT may need to work out its retention policies in cooperation with the company's attorneys, she says.

E-discovery is where the best-laid plans for holding on to data, complying with regulations, and gradually purging it can all be dashed, says EMC's Cohen. For example, a company may have a three-year purging cycle to comply with laws that call for three years of retention. If a lawsuit comes along and some documents have to be put on legal hold -- which lasts indefinitely -- a company has to be able to separate those from the rest so that it can continue with its purging policy. "If you can't segregate what's on legal hold, you'll never get to three years," Cohen says.
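Cohen's point about segregation can be illustrated with a short Python sketch of a purge pass that skips anything on legal hold. The record format, the `held_ids` set, and the function name are hypothetical; the three-year window comes from his example.

```python
from datetime import date, timedelta

# Three-year purging cycle from the example above.
RETENTION = timedelta(days=3 * 365)


def select_for_purge(records, held_ids, today):
    """Return IDs of records past retention that are NOT on legal hold.

    records:  iterable of (record_id, created_on) pairs (hypothetical format)
    held_ids: set of record IDs currently under legal hold
    """
    return [
        record_id
        for record_id, created_on in records
        if record_id not in held_ids and today - created_on > RETENTION
    ]
```

Without a reliable `held_ids` set, the only safe choice is to purge nothing, which is exactly the outcome Cohen warns about.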

All major archiving vendors offer tools that promise to handle such segregation so that you can keep purging information not subject to e-discovery. They are not infallible, but most can check their own work, says ESG's Babineau. For example, an administrator can run the analysis more than once and the software will deliver a report on any discrepancies among the results.

How to think about the storage hardware and software
Once businesses have figured out their backup, retention, and e-discovery strategies and policies, they can begin investigating products, says Ovum analyst Tim Stammers. In making those product assessments, they should first find out how fast their own storage needs are actually growing. In addition, they should keep in mind that storage prices are falling, so it makes little sense to buy more than they currently need.

Also, they should look at a strategy involving data deduplication, which eliminates common elements of documents that have many copies and can dramatically cut the amount of capacity they need, Stammers notes. And they should consider alternatives to in-house storage, such as cloud storage, that may be more economical. Although disks will get less expensive over time, in-house staff to manage them won't, Stammers points out.
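As a rough illustration of how deduplication saves capacity, the Python sketch below stores each distinct piece of content once and keeps only a reference for every additional copy. It is a simplification that works on whole items; commercial products often deduplicate at the level of smaller blocks, and the function name and data layout here are assumptions for illustration.

```python
import hashlib


def dedupe(blobs):
    """Store each distinct byte string once, keyed by its SHA-256 digest.

    Returns (store, refs): store maps digest -> content, and refs holds
    one digest per input item so every original can still be retrieved.
    """
    store = {}
    refs = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)  # keep only the first copy
        refs.append(digest)
    return store, refs


# Three items arrive, but two are identical copies of the same report.
store, refs = dedupe([b"quarterly report", b"quarterly report", b"memo"])
print(len(refs), len(store))  # 3 2
```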

Companies that haven't dug through those kinds of questions or examined their retention policies may make the wrong choices and end up investing in more storage than they need, Babineau says.


Copyright © 2009 IDG Communications, Inc.
