Hard disk drive reliability and MTBF / AFR

Seagate no longer uses the industry-standard "Mean Time Between Failures" (MTBF) to quantify disk drive average failure rates. MTBF has proven useful in the past, but it is flawed.

To address issues of reliability, Seagate is changing to another standard: "Annualized Failure Rate" (AFR).

MTBF is a statistical term relating to reliability as expressed in power-on hours (p.o.h.) and is often a specification associated with hard drive mechanisms.
It was originally developed for the military and can be calculated in several different ways, each yielding substantially different results. It is common to see MTBF ratings between 300,000 and 1,200,000 hours for hard disk drive mechanisms, which might lead one to conclude that the specification promises roughly 34 to 137 years of continuous operation. This is not the case! The specification is based on a large (statistically significant) number of drives running continuously at a test site, with data extrapolated according to various known statistical models to yield the results.
The MTBF is estimated from the error rate observed over a few weeks or months, so it does not represent how long your individual drive, or any individual product, is likely to last. Nor is the MTBF a warranty; it represents the relative reliability of a family of products. A higher MTBF merely suggests a generally more reliable and robust family of mechanisms (depending upon the consistency of the statistical models used). Historically, the field MTBF, which includes all returns regardless of cause, has been 50-60% of the projected MTBF.
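
As a rough illustration of the arithmetic behind that warning, the sketch below divides a rated MTBF by 8,760 power-on hours per year to get the naive "years of continuous operation" reading, and applies the 50-60% field factor mentioned above. The function names are hypothetical and exist only for this example.

```python
HOURS_PER_YEAR = 8760  # 24 hours x 365 days of continuous operation


def naive_years(mtbf_hours):
    """Misleading 'lifetime' reading: rated MTBF divided by hours per year."""
    return mtbf_hours / HOURS_PER_YEAR


def field_mtbf_range(projected_mtbf_hours, low=0.50, high=0.60):
    """Field MTBF range, using the historical 50-60% factor from the text."""
    return projected_mtbf_hours * low, projected_mtbf_hours * high


for rating in (300_000, 1_200_000):
    low, high = field_mtbf_range(rating)
    print(f"{rating:>9,} h rated: naive reading {naive_years(rating):.0f} years, "
          f"field MTBF about {low:,.0f} to {high:,.0f} h")
```

The naive readings come out at about 34 and 137 years. Those figures describe a population statistic, not the expected life of any single drive.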

Seagate's new standard is AFR. AFR is similar to MTBF and differs only in units. While MTBF is the probable average number of service hours between failures, AFR is the probable percentage of units that fail per year, based on the manufacturer's total number of installed units of similar type. AFR is an estimate of the percentage of products that will fail in the field due to a supplier cause in one year. Seagate has transitioned from average measures to percentage measures.
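
The relationship between the two figures can be sketched as follows. This is a minimal illustration that assumes a constant failure rate (the exponential model commonly used for this conversion) and 8,760 power-on hours per year. It is not necessarily the exact method Seagate uses, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: the drive is powered on all year


def mtbf_to_afr(mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """AFR as a fraction, assuming a constant (exponential) failure rate."""
    return 1.0 - math.exp(-hours_per_year / mtbf_hours)


def afr_to_mtbf(afr, hours_per_year=HOURS_PER_YEAR):
    """Inverse conversion under the same constant-failure-rate assumption."""
    return -hours_per_year / math.log(1.0 - afr)


afr = mtbf_to_afr(1_200_000)
print(f"MTBF 1,200,000 h -> AFR {afr * 100:.2f}%")
print(f"Linear approximation: {HOURS_PER_YEAR / 1_200_000 * 100:.2f}%")
```

For an MTBF of 1,200,000 hours, both the exponential form and the simpler hours-per-year divided by MTBF approximation round to 0.73%, which matches the Product Manual excerpt quoted later in this article.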

MTBF quantifies the probability of failure for a product. However, when a product is first introduced, this rate is often a predicted number; only after a substantial amount of testing or extensive use in the field can a manufacturer provide demonstrated or actual MTBF measurements. AFR will better allow service plans and spare unit strategies to be set.
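
To illustrate how an AFR can feed a spare-unit plan, the sketch below estimates the expected number of failures per year in a fleet and picks a spare count that covers the year with a chosen probability. It assumes failures are independent and follow a Poisson distribution, a simplification introduced here and not something this article specifies. The fleet size and function names are hypothetical.

```python
import math


def expected_failures(fleet_size, afr):
    """Expected failures per year: fleet size times AFR (as a fraction)."""
    return fleet_size * afr


def spares_needed(fleet_size, afr, coverage=0.95):
    """Smallest spare count s with P(failures <= s) >= coverage (Poisson model)."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    mean = expected_failures(fleet_size, afr)
    term = math.exp(-mean)  # P(exactly 0 failures)
    cumulative = term
    spares = 0
    while cumulative < coverage:
        spares += 1
        term *= mean / spares  # P(k) = P(k - 1) * mean / k
        cumulative += term
    return spares


fleet, afr = 1000, 0.0073  # hypothetical fleet of 1,000 drives at 0.73% AFR
print(f"Expected failures per year: {expected_failures(fleet, afr):.1f}")
print(f"Spares for 95% coverage: {spares_needed(fleet, afr)}")
```

The expected-failure figure is a fleet average of about 7.3 drives per year in this example. The spare count sits above that average because it has to absorb year-to-year variation.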

Hard drive reliability is closely related to temperature. Drives are designed to operate at an ambient temperature of 86°F (30°C). Temperatures above 122°F (50°C) or below 41°F (5°C) decrease reliability. Directed airflow of up to 150 linear feet/min. is recommended for high-speed drives.

The failure rate does not include drives returned with "no trouble found", excessive shock failures, or handling damage.
Find specifications for reliability, operational shock and vibration by searching for your drive's Product Manual.

Here is an example excerpt from a Product Manual, in this case for the Barracuda ES.2 Near-Line Serial ATA drive:

The product shall achieve an Annualized Failure Rate - AFR - of 0.73% (Mean Time Between Failures - MTBF - of 1.2 Million hrs) when operated in an environment that ensures the HDA case temperatures do not exceed 40°C. Operation at case temperatures outside the specifications in Section 2.9 may increase the product Annualized Failure Rate (decrease MTBF). AFR and MTBF are population statistics that are not relevant to individual units.
AFR and MTBF specifications are based on the following assumptions for business critical storage system environments:

  • 8,760 power-on-hours per year.
  • 250 average motor start/stop cycles per year.
  • Operations at nominal voltages.
  • Systems will provide adequate cooling to ensure the case temperatures do not exceed 40°C. Temperatures outside the specifications in Section 2.9 will increase the product AFR and decrease MTBF.