Backblaze Drive Stats for Q3 2023
At the end of Q3 2023, Backblaze was monitoring 263,992 hard disk drives (HDDs) and solid state drives (SSDs) in our data centers around the world. Of that number, 4,459 are boot drives, with 3,242 being SSDs and 1,217 being HDDs. The failure rates for the SSDs are analyzed in the SSD Edition: 2023 Drive Stats review.
That leaves us with 259,533 HDDs that we’ll focus on in this report. We’ll review the quarterly and lifetime failure rates of the data drives as of the end of Q3 2023. Along the way, we’ll share our observations and insights on the data presented, and, for the first time ever, we’ll reveal the drive failure rates broken down by data center.
Q3 2023 Hard Drive Failure Rates
At the end of Q3 2023, we were managing 259,533 hard drives used to store data. For our review, we removed 449 drives from consideration because they were used for testing purposes, or were drive models which did not have at least 60 drives. This leaves us with 259,084 hard drives grouped into 32 different models.
The table below reviews the annualized failure rate (AFR) for these drive models for the Q3 2023 time period.
Notes and Observations on the Q3 2023 Drive Stats
- The 22TB drives are here: At the bottom of the list you’ll see the WDC 22TB drives (model: WUH722222ALE6L4). A Backblaze Vault of 1,200 drives (plus four) is now operational. The 1,200 drives were installed on September 29, so they only have one day of service each in this report, but zero failures so far.
- The old get bolder: At the other end of the time-in-service spectrum are the 6TB Seagate drives (model: ST6000DX000) with a median of 101 months in operation. This cohort had zero failures in Q3 2023 with 883 drives and a lifetime AFR of 0.88%.
- Zero failures: In Q3, six different drive models managed to have zero drive failures during the quarter. But only the 6TB Seagate, noted above, had over 50,000 drive days, our minimum standard for ensuring we have enough data to make the AFR plausible.
- One failure: There were four drive models with one failure during Q3. After applying the 50,000 drive day metric, two drives stood out:
- WDC 16TB (model: WUH721816ALE6L0) with a 0.15% AFR.
- Toshiba 14TB (model: MG07ACA14TEY) with a 0.63% AFR.
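The AFR figures and the 50,000 drive day threshold discussed above can be sketched in a few lines of Python. This is a minimal illustration, assuming the common definition of AFR as failures per drive-year of service expressed as a percentage; this report does not spell out the formula, and the function names and the example numbers are hypothetical.

```python
# Minimal sketch: annualized failure rate (AFR) and the drive-day threshold.
# Assumption: AFR = failures / (drive_days / 365) * 100. Names are illustrative.

MIN_DRIVE_DAYS = 50_000  # minimum standard mentioned in the notes above


def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return the AFR as a percentage of drives failing per drive-year."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


def has_enough_data(drive_days: int, minimum: int = MIN_DRIVE_DAYS) -> bool:
    """True when a model has accumulated more than the minimum drive days."""
    return drive_days > minimum


# Hypothetical model: one failure over 100,000 drive days in the quarter.
example_afr = annualized_failure_rate(failures=1, drive_days=100_000)
print(f"AFR: {example_afr:.3f}%  enough data: {has_enough_data(100_000)}")
```

A model with zero failures always computes to a 0% AFR, which is why the drive day threshold matters: 1,200 drives with one day of service each have only 1,200 drive days, far too few to say anything about reliability.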
The Quarterly AFR Drops
In Q3 2023, the quarterly AFR for all drives was 1.47%. That was down from 2.2% in Q2 and also down from 1.65% a year ago. The quarterly AFR is based on just the data in that quarter, so it can often fluctuate from quarter to quarter.
In our Q2 2023 report, we suspected the 2.2% for the quarter was due to the overall aging of the drive fleet, and in particular we pointed a finger at specific 8TB, 10TB, and 12TB drive models as potential culprits driving the increase. That prediction fell flat in Q3, as nearly two-thirds of drive models experienced a decreased AFR quarter over quarter from Q2 and any increases were minimal. This included our suspect 8TB, 10TB, and 12TB drive models.
It seems Q2 was an anomaly, but there was one big difference in Q3: we retired 4,585 aging 4TB drives. The average age of the retired drives was just over eight years, and while that was a start, there are another 28,963 4TB drives to go. To facilitate the continuous retirement of aging drives and make the data migration process easy and safe, we use CVT, our superior in-house data migration software, which we’ll cover at another time.
A Hot Summer and the Drive Stats Data
As anyone should in our business, Backblaze continuously monitors our systems and drives. So, it was of little surprise to us when the folks at NASA confirmed the summer of 2023 as Earth’s hottest on record. The effects of this record-breaking summer showed up in our monitoring systems in the form of drive temperature alerts. A given drive in a storage server can heat up for many reasons: it is failing; a fan in the storage server has failed; other components are producing additional heat; the air flow is somehow restricted; and so on. Add in the fact that the ambient temperature within a data center often increases during the summer months, and you can get more temperature alerts.
In reviewing the temperature data for our drives in Q3, we noticed that a small number of drives exceeded the maximum manufacturer’s temperature for at least one day. The maximum temperature for most drives is 60°C, except for the 12TB, 14TB, and 16TB Toshiba drives, which have a maximum temperature of 55°C. Of the 259,533 data drives in operation in Q3, there were 354 individual drives (about 0.14%) that exceeded their maximum manufacturer temperature. Of those, only two drives failed, leaving 352 drives which were still operational as of the end of Q3.
While temperature fluctuation is part of running data centers and temperature alerts like these aren’t unheard of, our data center teams are looking into the root causes to ensure we’re prepared for the inevitability of increasingly hot summers to come.
Will the Temperature Alerts Affect Drive Stats?
The two drives which exceeded their maximum temperature and failed in Q3 have been removed from the Q3 AFR calculations. Both drives were 4TB Seagate drives (model: ST4000DM000). Given that the remaining 352 drives which exceeded their temperature maximum did not fail in Q3, we have left them in the Drive Stats calculations for Q3, as they did not increase the computed failure rates.
Beginning in Q4, we will remove the 352 drives from the regular Drive Stats AFR calculations and create a separate cohort of drives to track that we’ll name Hot Drives. This will allow us to track the drives which exceeded their maximum temperature and compare their failure rates to those drives which operated within the manufacturer’s specifications. While there are a limited number of drives in the Hot Drives cohort, it could give us some insight into whether exposure to high temperatures causes drives to fail more often. This heightened level of monitoring will identify any increase in drive failures so that they can be detected and dealt with expeditiously.
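To make the Hot Drives split concrete, here is a minimal Python sketch of how drives could be sorted into the two cohorts using the temperature limits quoted above. It is an illustration only: the record fields (`manufacturer`, `size_tb`, `peak_temp_c`) and function names are hypothetical, and treating 60°C as the limit for every non-Toshiba drive is a simplifying assumption, since the text says only that 60°C applies to most drives.

```python
# Sketch: split drives into a "Hot Drives" cohort and a regular cohort.
# Assumptions: record layout and names are hypothetical; 60 C is used as the
# default limit even though the report says it applies to "most" drives.

DEFAULT_MAX_TEMP_C = 60
TOSHIBA_MAX_TEMP_C = 55
TOSHIBA_LOWER_LIMIT_SIZES_TB = {12, 14, 16}


def max_temp_c(manufacturer: str, size_tb: int) -> int:
    """Return the manufacturer's maximum temperature for a drive, in Celsius."""
    if manufacturer.lower() == "toshiba" and size_tb in TOSHIBA_LOWER_LIMIT_SIZES_TB:
        return TOSHIBA_MAX_TEMP_C
    return DEFAULT_MAX_TEMP_C


def split_hot_drives(drives):
    """Return (hot, regular) lists of serial numbers.

    Each drive is a dict with serial_number, manufacturer, size_tb, and
    peak_temp_c (the highest temperature observed during the quarter).
    """
    hot, regular = [], []
    for drive in drives:
        limit = max_temp_c(drive["manufacturer"], drive["size_tb"])
        cohort = hot if drive["peak_temp_c"] > limit else regular
        cohort.append(drive["serial_number"])
    return hot, regular


# Made-up records for illustration.
sample = [
    {"serial_number": "A1", "manufacturer": "Toshiba", "size_tb": 14, "peak_temp_c": 57},
    {"serial_number": "B2", "manufacturer": "Seagate", "size_tb": 4, "peak_temp_c": 57},
    {"serial_number": "C3", "manufacturer": "Seagate", "size_tb": 4, "peak_temp_c": 61},
]
hot_drives, regular_drives = split_hot_drives(sample)
print(hot_drives, regular_drives)
```

Once the cohorts are separated, each can be given its own drive days and failure counts so the two failure rates can be compared over time.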
New Drive Stats Data Fields in Q3
In Q2 2023, we introduced three new data fields that we started populating in the Drive Stats data we publish: vault_id, pod_id, and is_legacy_format. In Q3, we’re adding three more fields into each drive record as follows:
- datacenter: The Backblaze data center where the drive is installed, currently one of these values: ams5, iad1, phx1, sac0, and sac2.
- cluster_id: The name of a given collection of storage servers logically grouped together to optimize system performance. Note: At this time the cluster_id is not always correct; we are working on fixing that.
- pod_slot_num: The physical location of a drive within a storage server. The specific slot differs based on the storage server type and capacity: Backblaze (45 drives), Backblaze (60 drives), Dell (26 drives), or Supermicro (60 drives). We’ll dig into these differences in another post.
With these additions, the new schema beginning in Q3 2023 is:
- date
- serial_number
- model
- capacity_bytes
- failure
- datacenter (Q3)
- cluster_id (Q3)
- vault_id (Q2)
- pod_id (Q2)
- pod_slot_num (Q3)
- is_legacy_format (Q2)
- smart_1_normalized
- smart_1_raw
- The remaining SMART value pairs (as reported by each drive model)
Beginning in Q3, these data fields have been added to the publicly available Drive Stats files that we publish each quarter.
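For readers who want to work with the new fields, the sketch below shows one way to read a Drive Stats CSV file with Python's standard `csv` module and confirm the Q3 2023 columns are present. It is a hedged example: the helper names are hypothetical, the sample row is made up, and it assumes each file carries a header row using the field names listed above.

```python
# Sketch: read Drive Stats rows and check for the Q3 2023 columns.
# Assumptions: helper names are hypothetical; the sample row is invented.
import csv
import io

Q3_2023_LEADING_COLUMNS = [
    "date", "serial_number", "model", "capacity_bytes", "failure",
    "datacenter", "cluster_id", "vault_id", "pod_id", "pod_slot_num",
    "is_legacy_format",
]


def missing_columns(fieldnames):
    """Return the expected columns that are absent from a CSV header."""
    present = set(fieldnames or [])
    return [name for name in Q3_2023_LEADING_COLUMNS if name not in present]


def read_drive_rows(csv_file):
    """Yield one dict per row, with failure as an int and blank datacenter as None."""
    reader = csv.DictReader(csv_file)
    missing = missing_columns(reader.fieldnames)
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    for row in reader:
        row["failure"] = int(row["failure"])
        row["datacenter"] = row["datacenter"] or None
        yield row


# Made-up sample standing in for a real daily file.
sample_csv = (
    ",".join(Q3_2023_LEADING_COLUMNS + ["smart_1_normalized", "smart_1_raw"]) + "\n"
    "2023-09-30,SERIAL0001,EXAMPLEMODEL,4000787030016,0,sac0,0,1000,0,7,False,100,0\n"
    "2023-09-30,SERIAL0002,EXAMPLEMODEL,4000787030016,1,,0,1001,3,12,False,100,0\n"
)
rows = list(read_drive_rows(io.StringIO(sample_csv)))
print(rows[0]["datacenter"], rows[1]["datacenter"], rows[1]["failure"])
```

Mapping a blank datacenter to `None` keeps the null group discussed in the data center breakdown explicit rather than silently merging it with a named site.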
Failure Rates by Data Center
Now that we have the data center for each drive, we can compute the AFRs for the drives in each data center. Below you’ll find the AFR for each of five data centers for Q3 2023.
Notes and Observations
- Null?: The drives which reported a null or blank value for their data center are grouped in four Backblaze vaults. David, the Senior Infrastructure Software Engineer for Drive Stats, described the process of how we gather all the parts of the Drive Stats data each day. The TL;DR is that vaults can be too busy to respond at the moment we ask, and since the data center field is nice-to-have data, we get a blank field. We can go back a day or two to find the data center value, which we will do in the future when we report this data.
- sac0?: sac0 has the highest AFR of all of the data centers, but it also has the oldest drives, nearly twice as old, on average, as the next closest data center, sac2. As discussed previously, drive failures do seem to follow the “bathtub curve”, although recently we’ve seen the curve start out flatter. Regardless, as drive models age, they do generally fail more often. Another factor could be that sac0, and to a lesser extent sac2, has some of the oldest Storage Pods, including a handful of 45-drive units. We are in the process of using CVT to replace these older servers while migrating from 4TB to 16TB and larger drives.
- iad1: The iad data center is the foundation of our eastern region and has been growing rapidly since coming online about a year ago. The growth is a mix of new data and customers using our cloud replication capability to automatically make a copy of their data in another region.
- Q3 Data: This chart is for Q3 data only and includes all the data drives, including those with fewer than 60 drives per model. As we track this data over the coming quarters, we hope to get some insight into whether different data centers really have different drive failure rates, and, if so, why.
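The per-data-center breakdown described above can be approximated with a short grouping routine. This is a sketch under stated assumptions, not the calculation used for the chart: it assumes each input row represents one drive on one day (so counting rows gives drive days), uses AFR = failures / (drive days / 365) × 100, and groups blank data center values under a hypothetical `(null)` label.

```python
# Sketch: AFR per data center from daily drive rows.
# Assumptions: one row = one drive-day; AFR = failures / (drive_days / 365) * 100;
# the "(null)" label and function name are illustrative.
from collections import defaultdict


def afr_by_datacenter(rows):
    """Return {datacenter: AFR percent} from an iterable of row dicts."""
    totals = defaultdict(lambda: [0, 0])  # datacenter -> [drive_days, failures]
    for row in rows:
        key = row.get("datacenter") or "(null)"
        totals[key][0] += 1
        totals[key][1] += int(row["failure"])
    return {
        datacenter: failures / (drive_days / 365) * 100
        for datacenter, (drive_days, failures) in totals.items()
    }


# Made-up data: 365 drive days with one failure in sac0, 730 clean drive days
# in iad1, and 10 drive days with no data center recorded.
example_rows = (
    [{"datacenter": "sac0", "failure": 0}] * 364
    + [{"datacenter": "sac0", "failure": 1}]
    + [{"datacenter": "iad1", "failure": 0}] * 730
    + [{"datacenter": "", "failure": 0}] * 10
)
print(afr_by_datacenter(example_rows))
```

Keeping the null group as its own bucket mirrors the chart, where drives with a blank data center value are reported separately rather than dropped.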
Lifetime Hard Drive Failure Rates
As of September 30, 2023, we were tracking 259,084 hard drives used to store customer data. For our lifetime analysis, we collect the number of drive days and the number of drive failures for each drive, beginning from the time a drive was placed into production in one of our data centers. We group these drives by model, then sum up the drive days and failures for each model over their lifetime. That chart is below.
One of the most important columns on this chart is the confidence interval, which is the difference between the low and high AFR confidence levels calculated at 95%. The lower the value, the more certain we are of the AFR stated. We like a confidence interval to be 0.5% or less. When the confidence interval is higher, that is not necessarily bad; it just means we either need more data or the data is somewhat inconsistent.
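As a rough illustration of how such an interval width behaves, the sketch below estimates low and high AFR bounds at 95% and reports their difference. The method is an assumption on our part for illustration: it treats the failure count as a Poisson count and uses a normal approximation, which is not necessarily the method behind the chart and is unreliable for models with only a handful of failures. The function name and example numbers are hypothetical.

```python
# Sketch: approximate width of a 95% confidence interval around an AFR.
# Assumptions: failures ~ Poisson, normal approximation (poor for small counts);
# this is NOT confirmed to be the method used for the published chart.
from math import sqrt
from statistics import NormalDist


def afr_confidence_width(failures: int, drive_days: int, level: float = 0.95):
    """Return (low_afr, high_afr, width), all as percentages."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    spread = z * sqrt(failures)
    low_count = max(0.0, failures - spread)
    high_count = failures + spread
    to_afr = 365 / drive_days * 100  # converts a failure count to AFR percent
    low_afr, high_afr = low_count * to_afr, high_count * to_afr
    return low_afr, high_afr, high_afr - low_afr


# Hypothetical model: 400 failures over 10 million drive days.
low, high, width = afr_confidence_width(400, 10_000_000)
print(f"low={low:.2f}% high={high:.2f}% width={width:.2f}%")
```

The width shrinks as drive days accumulate, which matches the observation above that a wide interval usually means more data is needed.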
The table below contains just those drive models which have a confidence interval of less than 0.5%. We’ve sorted the list by drive size and then by AFR.
The 4TB, 6TB, 8TB, and some of the 12TB drive models are no longer in production. The HGST 12TB models in particular can still be found, but they have been relabeled as Western Digital and given alternate model numbers. Whether they have materially changed internally is not known, at least to us.
One final note about the lifetime AFR data: you might have noticed the AFR for all of the drives hasn’t changed much from quarter to quarter. It has vacillated between 1.39% and 1.45% for the last two years. Basically, we have a lot of drives with a lot of time-in-service, so it is hard to move the needle up or down. While the lifetime stats for individual drive models can be very useful, the lifetime AFR for all drives will probably get less and less interesting as we add more and more drives. Of course, a few hundred thousand drives that never fail could arrive, so we will continue to calculate and present the lifetime AFR.
The Hard Drive Stats Data
The complete data set used to create the information used in this review is available on our Hard Drive Stats Data webpage. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.
Good luck and let us know if you find anything interesting.