Vendor disk failure rates: Myth or metric?

Disk problems contribute to 20% to 55% of storage subsystem failures

April 4, 2008 12:00 PM ET


Computerworld - The statistics of mean time between failures (MTBF) and annualized failure rate (AFR) have gotten lots of attention lately in the storage world, especially with the release of three much-discussed studies devoted to the topic in the last year. And for good reason: Vendor-stated MTBFs have risen into the 1 million-to-1.5 million-hour range, equaling 114 to 171 years, a lifespan that no one is seeing in the real world.

Three studies over the past year on MTBF include the following:

"MTBF is a term that's in growing disrepute inside the industry because people don't understand what the numbers mean," says Robin Harris, an analyst at Data Mobility Group who also runs the StorageMojo blog. "Your average consumer and a lot of server administrators don't really get why vendors say a disk has a 1 million-hour MTBF, and yet it doesn't last that long."

Indeed, "how do these numbers help a person who wants to evaluate drives?" says Steve Smith, a former EMC Corp. employee and an independent management consultant in Bellevue, Wash. "I don't think they can."

Even storage system maker NetApp Inc. acknowledges in a response to an open letter on the StorageMojo blog that failure rates are several times higher than reported. "Most experienced storage array customers have learned to equate the accuracy of quoted drive-failure specs to the miles-per-gallon estimates reported by car manufacturers," the company says. "It's a classic case of 'Your mileage may vary' -- and often will -- if you deploy these disks in anything but the mildest of evaluation/demo lab environments."

Study results

The upshot of the recent studies can be summarized this way: Users and vendors live in very different worlds when it comes to disk reliability and failure rates.

Consider that MTBF is a figure that's reached through stress-testing and statistical extrapolation, Harris says. "When the vendor specs a 300,000-hour MTBF -- which is common for consumer-level SATA drives -- they're saying that for a large population of drives, half will fail in the first 300,000 hours of operation," he says on his blog. "MTBF, therefore, says nothing about how long any particular drive will last." In other words, MTBF does a very poor job communicating what the actual failure profile looks like, he says.
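One way to read an MTBF figure as a population statistic rather than a single drive's lifespan is to convert it into a yearly failure rate. The sketch below is illustrative only: it assumes a constant failure rate (an exponential model) and drives running 24 hours a day, neither of which comes from the vendors or studies cited here, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365; leap days ignored (assumption)


def mtbf_to_years(mtbf_hours):
    """Express an MTBF given in hours as years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours):
    """Fraction of a large drive population expected to fail in one year.

    Assumes a constant failure rate (exponential model), which is a
    simplification and not a claim about how real drives behave.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(mtbf_hours, fleet_size):
    """Expected number of failed drives per year in a fleet of this size."""
    return fleet_size * annualized_failure_rate(mtbf_hours)


if __name__ == "__main__":
    for mtbf in (300_000, 1_000_000, 1_500_000):
        print(
            f"MTBF {mtbf:>9,} h = {mtbf_to_years(mtbf):6.1f} years, "
            f"AFR {annualized_failure_rate(mtbf):.2%}, "
            f"failures per 1,000 drives per year: "
            f"{expected_failures_per_year(mtbf, 1000):.1f}"
        )
```

Under these assumptions, a 1 million-hour MTBF works out to just under 1% of drives failing per year, or roughly nine failures a year in a fleet of 1,000 drives. That is a statement about the fleet, not a prediction that any one drive will run for 114 years.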

It's like providing the average woman's height in the U.S. but without showing the numbers used to derive that average, Smith says. "MTBF became the standard because it was perceived as a simpler answer to the question of reliability than showing the data of how they arrived at it," Smith says. "It's an honest-to-God simplification."


