QuickStudy: Mean time between failures (MTBF)
Computerworld - It's a cruel world out there in the data center. Nothing lasts forever, especially not mechanical devices with fast-moving parts, such as disk drives and printers. It would be very useful if we could predict when something might break or, at the very least, determine which of two similar products would be less likely to break in a given period. The answer is MTBF, short for mean time between failures, and the closely related MTTF, short for mean time to failure. Both are measures of reliability that are defined statistically as the number of hours a component, assembly or system will operate before it fails.
MTTF and MTBF are sometimes used interchangeably, but they are in fact different. MTTF refers to the average (the mean, in arithmetic terms) time until a component fails, can't be repaired and must therefore be replaced, or until the operation of a product, process or design is disrupted. MTBF is properly used only for components that can be repaired and returned to service. This introduces a couple of related abbreviations occasionally encountered: MTTR (mean time to repair) and, less common, MTTD (mean time to diagnose). With those notions in mind, we could say that MTBF = MTTF + MTTD + MTTR.
Calculating MTBF
MTBF sounds simple: the total time measured divided by the total number of failures observed. For example, let's wring out a new generation of 2.5-in. SCSI enterprise hard drives. We run 15,400 initial units for 1,000 hours each (thus our tests take a little less than six weeks), and we find 11 failures. The MTBF is (15,400 x 1,000) hours/11, or 1.4 million hours. (This is not a hypothetical MTBF; it represents current drive technology in 2005.)
What does this calculation really mean? An MTBF of 1.4 million hours, determined in six weeks of testing, certainly doesn't say we can expect an individual drive to operate for 159 years before failing. MTBF is a statistical measure, and as such, it can't predict anything for a single unit. We can use that MTBF rating more accurately, however, to calculate that if we have 1,000 such drives operating continuously in a data center, we can expect one to fail every 58 days or so, for a total of perhaps 19 failures in three years.
The MTBF figure for a product can be derived from laboratory testing, actual field failure data or prediction models such as MIL-HDBK-217 (the Military Handbook for Reliability Prediction of Electronic Equipment, published by the U.S. Department of Defense).
MIL-HDBK-217 contains failure-rate models for various parts used in electronic systems, such as integrated circuits, transistors, diodes, resistors, capacitors, relays, switches and connectors. These failure-rate models are based on a large amount of field data that was analyzed and simplified by the Reliability Analysis Center and Rome Laboratory at Griffiss Air Force Base in Rome, N.Y. (Instructions for downloading MIL-HDBK-217 are at www.t-cubed.com/faq_217.htm.)
Kay is a Computerworld contributing writer in Worcester, Mass. You can contact him at russkay@charter.net.
See additional Computerworld QuickStudies
Read more about Hardware in Computerworld's Hardware Topic Center.



- Excel 2010 Cheat Sheet
- Register for this Computerworld Insider Cheat Sheet and gain access to hundreds of premium content articles, guides, product reviews and more.
- The Laptop Dilemma: How to Maximize Productivity and Lower the Burden on IT
- Download Now
- Overcome Top 7 Admin Challenges of Active Directory
- As Active Directory's role in the enterprise has drastically increased, so has the need to secure the data. Gain insight on creating repeatable,...
- Insiders Can Ruin Your Company. Take Action.
- Did you know that 80 percent of threats to an organization come from the inside? The threat from insiders is often overlooked in...
- Top Solutions and Tools to Prevent Devastating Malware
- Custom malware frequently goes undetected. According to Forrester Research, the best way to reduce risk of breach is to deploy file integrity monitoring...
- Streamline Compliance and Increase ROI
- Streamline, simplify, and automate compliance related activities; especially those that impact multiple business units. This white paper from NetIQ, outlines solutions that will... All Hardware White Papers
- Optimizing Networks for the Cloud
- Join guest speaker, Rohit Mehra, IDC Director of Enterprise Communications Infrastructure, to explore current trends, discuss best practices for optimizing Data Center and...
- Apps QuickStart Series Part 2: Designing and Deploying SQL Server on VMware vSphere
- Download this webcast to learn about the design considerations for virtualizing SQL workloads, performance and scalability information and high-availability options, as well as...
- Apps QuickStart Series Part 1: Designing and Deploying Exchange 2010 on VMware vSphere
- Download this webcast to learn the virtual hardware design considerations for Exchange 2010, deployment using the building block approach, options for high-availability and...
- Customer Spotlight: How IPC The Hospitalist Company Implemented Oracle on VMware
- Have you been looking to hear about customer's experiences with the new VMware vCenter Site Recovery Manager product? View this webcast to learn...
- Virtualize Business-Critical Applications with Confidence
- Virtualizing business-critical applications has become a key focus for organizations as they move along their virtualization journey. With the launch of VMware vSphere®... All Hardware Webcasts