Computerworld - It's a cruel world out there in the data center. Nothing lasts forever, especially not mechanical devices with fast-moving parts, such as disk drives and printers. It would be very useful if we could predict when something might break or, at the very least, determine which of two similar products would be less likely to break in a given period. The answer is MTBF, short for mean time between failures, and the closely related MTTF, short for mean time to failure. Both are measures of reliability that are defined statistically as the number of hours a component, assembly or system will operate before it fails.
MTBF sounds simple: the total time measured divided by the total number of failures observed. For example, let's wring out a new generation of 2.5-in. SCSI enterprise hard drives. We run 15,400 initial units for 1,000 hours each (thus our tests take a little less than six weeks), and we find 11 failures. The MTBF is (15,400 x 1,000) hours/11, or 1.4 million hours. (This is not a hypothetical MTBF; it represents current drive technology in 2005.)
What does this calculation really mean? An MTBF of 1.4 million hours, determined in six weeks of testing, certainly doesn't say we can expect an individual drive to operate for 159 years before failing. MTBF is a statistical measure, and as such, it can't predict anything for a single unit. We can use that MTBF rating more accurately, however, to calculate that if we have 1,000 such drives operating continuously in a data center, we can expect one to fail every 58 days or so, for a total of perhaps 19 failures in three years.
The MTBF figure for a product can be derived from laboratory testing, actual field failure data or prediction models such as MIL-HDBK-217 (the Military Handbook for Reliability Prediction of Electronic Equipment, published by the U.S. Department of Defense).
MIL-HDBK-217 contains failure-rate models for various parts used in electronic systems, such as integrated circuits, transistors, diodes, resistors, capacitors, relays, switches and connectors. These failure-rate models are based on a large amount of field data that was analyzed and simplified by the Reliability Analysis Center and Rome Laboratory at Griffiss Air Force Base in Rome, N.Y. (Instructions for downloading MIL-HDBK-217 are at www.t-cubed.com/faq_217.htm.)
Kay is a Computerworld contributing writer in Worcester, Mass. You can contact him at email@example.com.
See additional Computerworld QuickStudies
Read more about Hardware in Computerworld's Hardware Topic Center.
- Why Projects Fail CIOs are expected to deliver more projects that transform business, and do so on time, on budget and with limited resources.
- The New Business Case for Video Conferencing: 7 Real-World Benefits Beyond Cost-Savings This whitepaper provides insight into the value of video conferencing in today's business environment, and how organizations are using visual collaboration to find...
- Gartner Magic Quadrant for Client Management Tools The client management tool market is maturing and evolving to adapt to consumerization, desktop virtualization, and an ongoing need to improve efficiency.
- Audit Ready and Asset Optimized: The Solid Promise of an Intelligent Software Asset Management Solution In this paper Frost & Sullivan examines the benefits of enterprise-grade Software Asset Management solutions, and how these solutions serve as the convergence...
- Redefine Your IT Operations: Remote Office IT Has Never Been Simpler Join us to see why PC Pro named Dell PowerEdge VRTX the "2013 Server of the Year." PowerEdge VRTX may be just what...
- LIVE EVENT: 5/7, The End of Data Protection As We Know It. Introducing a Next Generation Data Protection Architecture. Traditional backup is going away, but where does this leave end-users? All Hardware White Papers | Webcasts