Vendor disk failure rates: Myth or metric?
Disk problems account for 20% to 55% of storage subsystem failures
Computerworld - The statistics of mean time between failures (MTBF) and average failure rate (AFR) have gotten lots of attention lately in the storage world, especially with the release of three much-discussed studies devoted to the topic in the last year. And for good reason: Vendor-stated MTBFs have risen into the 1 million-to-1.5 million-hour range, equaling roughly 114 to 171 years, a lifespan that no one is seeing in the real world.
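The hours-to-years conversion, and the annual failure rate an MTBF implies, are simple arithmetic. The sketch below is a minimal illustration, assuming continuous 24x7 operation (8,760 hours a year) and the constant-failure-rate (exponential) model commonly used to turn an MTBF into an AFR, AFR = 1 - exp(-8760 / MTBF). The function names are hypothetical helpers, not any vendor's method.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: drive runs 24x7 for a 365-day year


def mtbf_to_years(mtbf_hours: float) -> float:
    """Express an MTBF figure as years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def mtbf_to_afr(mtbf_hours: float) -> float:
    """Annualized failure rate implied by an MTBF, assuming a constant
    failure rate (exponential lifetimes): AFR = 1 - exp(-8760 / MTBF)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


if __name__ == "__main__":
    for mtbf in (1_000_000, 1_500_000):
        print(f"{mtbf:>9,} h = {mtbf_to_years(mtbf):5.1f} years, "
              f"implied AFR = {mtbf_to_afr(mtbf):.2%}")
```

For the quoted range this gives about 114 and 171 years, and implied AFRs of roughly 0.6% to 0.9% a year. Those are the figures the field studies compare against observed replacement rates.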
The three studies are:
- Google Inc.'s "Failure Trends in a Large Disk Drive Population"
- Carnegie Mellon University's "Disk Failures in the Real World"
- University of Illinois' "Are Disks the Dominant Contributor for Storage Failures?"
Indeed, "How do these numbers help a person who wants to evaluate drives?" asks Steve Smith, a former EMC Corp. employee and an independent management consultant in Bellevue, Wash. "I don't think they can."
Even storage system maker NetApp Inc. acknowledges, in a response to an open letter on the StorageMojo blog, that real-world failure rates are several times higher than vendors report. "Most experienced storage array customers have learned to equate the accuracy of quoted drive-failure specs to the miles-per-gallon estimates reported by car manufacturers," the company says. "It's a classic case of 'Your mileage may vary' -- and often will -- if you deploy these disks in anything but the mildest of evaluation/demo lab environments."
Study results
The upshot of the recent studies can be summarized this way: Users and vendors live in very different worlds when it comes to disk reliability and failure rates.

Consider that MTBF is a figure that's reached through stress-testing and statistical extrapolation, Harris says. "When the vendor specs a 300,000-hour MTBF -- which is common for consumer-level SATA drives -- they're saying that for a large population of drives, half will fail in the first 300,000 hours of operation," he says on his blog. "MTBF, therefore, says nothing about how long any particular drive will last." In other words, MTBF does a very poor job of communicating what the actual failure profile looks like, he says.
It's like giving the average height of women in the U.S. without showing the measurements used to derive that average, Smith says. "MTBF became the standard because it was perceived as a simpler answer to the question of reliability than showing the data of how they arrived at it," he says. "It's an honest-to-God simplification."
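One way to see what a single average hides is to model drive lifetimes with a Weibull distribution. This is our choice for illustration, and the studies above are not the source of these parameters. In the sketch below, three hypothetical shape parameters are each scaled to the same 300,000-hour mean, so all three "drives" share one MTBF. The script then prints how many fail in the first year and how many have failed by the MTBF mark. The shape values (0.7, 1.0, 3.0) are assumptions chosen only to contrast early-failure, constant-rate and wear-out profiles.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: continuous 24x7 operation


def weibull_scale_for_mean(mean_hours: float, shape: float) -> float:
    """Scale that gives a Weibull lifetime distribution the requested mean,
    using mean = scale * Gamma(1 + 1/shape)."""
    return mean_hours / math.gamma(1.0 + 1.0 / shape)


def fraction_failed(t_hours: float, scale: float, shape: float) -> float:
    """Weibull CDF: expected fraction of a large population failed by t."""
    return 1.0 - math.exp(-((t_hours / scale) ** shape))


MTBF = 300_000  # the consumer SATA figure quoted above

# shape < 1: failure rate falls with age (early failures dominate)
# shape = 1: constant failure rate (the exponential model behind simple MTBF math)
# shape > 1: failure rate rises with age (wear-out)
for shape in (0.7, 1.0, 3.0):
    scale = weibull_scale_for_mean(MTBF, shape)
    year_one = fraction_failed(HOURS_PER_YEAR, scale, shape)
    by_mtbf = fraction_failed(MTBF, scale, shape)
    print(f"shape {shape}: {year_one:.2%} fail in year one, "
          f"{by_mtbf:.1%} fail by {MTBF:,} hours")
```

All three profiles report the same MTBF, yet their first-year failure fractions differ by orders of magnitude. Even the share failed by the MTBF mark is not fixed. Under the constant-rate model it is 1 - 1/e, about 63%, and half the population is gone by MTBF × ln 2, roughly 208,000 hours for a 300,000-hour rating.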