Data hoarding: The consequences go far beyond compliance risk

What’s the oldest file on your computer? A scan of my hard drive turned up a handful from the late 1990s as well as a 2GB trove of interview notes and call recordings from a book I wrote in 2006.

The chance that I’ll ever use this information is next to zero, but I keep it around because, well, I can. The relentless decline in the cost of storage has made it cheaper to retain information than to throw it away. That plays well to the human propensity for keeping stuff around, a hoarding instinct that has made the US self-storage industry a $40 billion business.

Hoarding isn’t such a good idea when it comes to data, however. If I were working for a corporation and some California residents I interviewed in 2006 exercised their legal right to be forgotten, my company could be on the hook for my pack-rat behavior.

“Human beings don’t like to delete stuff,” said Bill Tolson, vice president of global compliance and e-discovery at Archive360, a data migration and management company.

Organizational ROT

The result is that, by some estimates, as much as 80% of the information businesses and their employees have is outdated or useless. Information governance professionals have a term for this: ROT (redundant, obsolete, trivial).

There’s a myth that companies that aren’t subject to industry-specific regulations like FINRA or HIPAA are immune from liability for keeping old data on hand, but nearly every organization is regulated these days. Under the General Data Protection Regulation (GDPR) in Europe, similar legislation in California and Virginia, and privacy restrictions being enacted in more than 120 countries around the world, keeping data longer than it’s needed is a risk to any organization.

Regulation is just one of several reasons to clean out your hard drive. Even the best-defended corporate database can’t protect against a malware attack on a home PC or information unintentionally left in the open on a cloud server. The more data a company collects, the bigger the attack surface.

“Why spend money to protect data you don’t need and why keep it someplace a hacker can take advantage of?” said Sue Trombley, managing director of thought leadership at data and records management giant Iron Mountain. Ransomware doesn’t distinguish between good and bad data, and no one wants to pay to recover something that shouldn’t have been there in the first place.

Costs can be deceptive

Then there’s cost.

“Storage is cheap but the people to manage it aren’t cheap,” said Trombley. Data needs to be protected and backed up and the cost mounts with volume. And if the information is ever subject to a legal proceeding, costs can skyrocket. The cost of simply collecting data to meet a legal discovery request “can exceed $500 per [gigabyte] even before attorneys review the data,” said John Roman, president of IT risk management firm FoxPointe Solutions.

In an oft-cited 2002 analysis of electronic discovery costs covering nine cases, DuPont reported that half of the more than 75 million pages of documents that were reviewed were past the company’s required retention period, resulting in nearly $12 million in unnecessary review fees. It’s safe to say the figure would be much higher today.

Other costs are harder to estimate, such as the impact of poor business decisions based on outdated information, confusion caused by conflicting information, or time spent sifting through useless data looking for something of value. “If the average employee spends two hours per week looking for information, what does that contribute to the overall cost?” asks Tolson. “What revenue could they have generated instead?”

Despite compelling arguments for throwing away unnecessary data, few organizations restrict the use of personal storage devices or cloud file shares. “They don’t think about it,” Tolson said. “It’s at the bottom of the list of things they may address someday.”

AI to the rescue?

Technology offers a partial solution. Data catalog software automates the process of discovering and categorizing data across an organization. Most data catalog vendors also offer discovery features that can find data on corporate servers, individual PCs, and cloud storage. Many even flag or automatically delete old records based on company policies.
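To make the flagging step concrete, here is a minimal sketch of the simplest form such a check could take: walk a folder and list files whose last-modified date falls outside a retention window. It is not how any particular data catalog product works. The function name `find_stale_files`, the `StaleFile` record, and the use of last-modified time as a stand-in for "old" are all assumptions made for illustration, and the sketch only reports files rather than deleting them.

```python
import time
from dataclasses import dataclass
from pathlib import Path

SECONDS_PER_DAY = 86_400


@dataclass
class StaleFile:
    """Hypothetical record describing one file past its retention window."""
    path: Path
    age_days: float
    size_bytes: int


def find_stale_files(root, retention_days, now=None):
    """Return files under `root` not modified within `retention_days`.

    Assumption: last-modified time approximates a record's age. A real
    retention schedule would usually key off record type and business
    event dates instead.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    stale = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_mtime < cutoff:
            age_days = (now - info.st_mtime) / SECONDS_PER_DAY
            stale.append(StaleFile(path, age_days, info.st_size))
    # Oldest files first, so the most overdue candidates surface at the top.
    return sorted(stale, key=lambda f: f.age_days, reverse=True)


if __name__ == "__main__":
    # Example: report anything in the current folder untouched for ~7 years.
    for item in find_stale_files(".", retention_days=7 * 365):
        print(f"{item.age_days:8.0f} days  {item.size_bytes:>12} bytes  {item.path}")
```

Reporting instead of deleting is a deliberate choice here: a file that looks stale may still be under a legal hold, so a human or a policy engine should make the final call.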

A more lasting solution is to implement data governance standards that define how users should manage data responsibly, including the use of meta-tags, limits on making copies, and record-retention schedules. Thanks to the wake-up call of privacy regulations, “large organizations have become savvy about records retention,” said Trombley.

In the long term, Tolson believes technology will find a solution. “You have to change the company culture to actively manage old data and put policies in place to cull it when it’s no longer needed,” he said. “An artificial intelligence system should be able to do this transparently.”

As long as it doesn’t touch those old audio files on my PC.


Copyright © 2021 IDG Communications, Inc.
