Content addressed storage systems may be at risk
The MD5 hashing algorithm sometimes used has a security flaw
January 28, 2005 12:00 PM ET Computerworld -
Security experts are warning about a flawed hashing algorithm, MD5, used by some vendors for digital signatures to store data securely on increasingly popular content addressed storage systems. The warnings come as more companies unveil CAS systems to meet the need for disk-based backup of fixed data such as e-mails and medical images.
"It really is time for [the industry] to stop using MD5," said Dan Kaminsky, a security consultant at Avaya Inc. in Basking Ridge, N.J. "MD5 has been a deprecated hashing algorithm for almost a decade. The U.S. government agreed."
According to Kaminsky, MD5 has been decertified for secure operations by the National Institute of Standards and Technology since at least 1998. "The industry has clung to the algorithm, partially out of inertia, partially out of scarcity of computer power," he said.
There are currently three major vendors of CAS storage: EMC Corp., Permabit Inc. in Cambridge, Mass., and Archivas Inc. in Waltham, Mass. Both EMC and Archivas use the MD5 hashing algorithm; Permabit does not.
Just this week, Storage Technology Corp. announced that it would OEM Permabit's technology for e-mail archiving. And Sun Microsystems Inc. is developing its own CAS system, called Honeycomb, which is in testing with several beta users; Sun plans to release it toward the end of the year.
Sun wouldn't say which algorithm it will use to store data.
Kaminsky published a report last month on the MD5 algorithm pointing out that an attack could be used to create two files with the same MD5 hash, one with "safe" data and one with "malicious" data. When both of those files are saved to the same system, a so-called collision can result, leading to data loss or dissemination of bad data, Kaminsky said.
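To illustrate why a collision matters for a system that indexes files by their hash, here is a minimal Python sketch. It is not any vendor's implementation. The `weak_hash` function is a hypothetical stand-in for a broken algorithm: it keeps only the first byte of an MD5 digest so that a collision can be found by brute force in a few hundred tries. Producing a collision on the full MD5 digest requires the kind of attack the report describes. The names `find_collision` and `store_by_hash` are likewise assumptions made for this example.

```python
import hashlib


def weak_hash(data: bytes) -> str:
    # Hypothetical broken hash: only the first byte (2 hex digits) of MD5.
    return hashlib.md5(data).hexdigest()[:2]


def find_collision():
    # Brute-force two different inputs with the same weak hash.
    # With only 256 possible values, this ends within 257 tries.
    seen = {}
    i = 0
    while True:
        data = f"file-{i}".encode()
        digest = weak_hash(data)
        if digest in seen:
            return seen[digest], data
        seen[digest] = data
        i += 1


def store_by_hash(index: dict, data: bytes, hash_fn) -> str:
    key = hash_fn(data)
    # A store that trusts the hash treats an existing key as
    # "this content is already stored" and keeps the first copy.
    index.setdefault(key, data)
    return key


safe, malicious = find_collision()
index = {}
key_safe = store_by_hash(index, safe, weak_hash)
key_malicious = store_by_hash(index, malicious, weak_hash)

# Both files map to one key, so only one of them is actually kept.
print(key_safe == key_malicious, len(index))
```

In this sketch the second file is silently dropped, which corresponds to the data-loss case. If the colliding file were stored first, anyone later retrieving by that hash would receive it instead of the expected data.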
CAS systems store metadata and data along with management policies to create an object that is quickly retrievable, no matter where it's stored on a disk subsystem. CAS also uses write once, read many (WORM) capability to ensure that once data is stored it cannot be overwritten, which satisfies several regulatory requirements. Hashing is a way to create a shorter fixed-length key or index that represents the original data stored in a device. A multidigit number, for example, could be a hash representing a person's longer name or a specific document.
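The idea of addressing an object by a hash of its content, combined with write-once behavior, can be sketched in a few lines of Python. This is a toy in-memory model, not how any of the products named here work internally. The `ContentStore` class, its `put` and `get` methods, the `retention_years` metadata field and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        self._objects = {}  # address -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        # The address is derived from the content, not from a location.
        address = hashlib.sha256(data).hexdigest()
        # Write once: an existing object is never overwritten.
        if address not in self._objects:
            self._objects[address] = (data, dict(metadata))
        return address

    def get(self, address: str) -> bytes:
        data, _metadata = self._objects[address]
        # Re-hash on read to confirm the content still matches its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data


store = ContentStore()
addr = store.put(b"scan-0001 image bytes", {"retention_years": 7})
# Storing identical content again returns the same address
# and leaves the original object and its metadata untouched.
same = store.put(b"scan-0001 image bytes", {"retention_years": 1})
print(addr == same)
```

Because the address doubles as an integrity check, the scheme is only as trustworthy as the hash function behind it, which is why the choice of algorithm matters for these systems.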
The vast majority of CAS is being purchased by the financial services and medical industries to store data regulated by the U.S. Securities and Exchange Commission under Rule 17a-4 and Health Insurance Portability and Accountability Act regulations. EMC's $200,000 Centera CAS array uses