The Drive Stats of Backblaze Storage Pods
Since 2009, Backblaze has written extensively about the data storage servers we created and deployed, which we call Backblaze Storage Pods. We not only wrote about our Storage Pods, we open sourced the design, published a parts list, and even provided instructions on how to build one. Many people did. Of the six Storage Pod versions we produced, four of them are still in operation in our data centers today. Over the past few years, we began using storage servers from Dell and, more recently, Supermicro, as they have proven to be economically and operationally viable in our environment.
Since 2013, we have also written extensively about our Drive Stats, sharing reports on the failure rates of the HDDs and SSDs in our fleet of storage servers. We have examined drive failure rates by manufacturer, size, age, and so on, but we have never analyzed the drive failure rates of the storage servers themselves, until now. Let's take a look at the Drive Stats for our fleet of storage servers and see what we can learn.
Storage Pods, Storage Servers, and Backblaze Vaults
Let's start with a few definitions:
- Storage Server: A storage server is our generic name for a server from any manufacturer that we use to store customer data. We use storage servers from Backblaze, Dell, and Supermicro.
- Storage Pod: A Storage Pod is the name we gave to the storage servers Backblaze designed and had built for our data centers. The first Backblaze Storage Pod version was announced in September 2009. Subsequent versions are 2.0, 3.0, 4.0, 4.5, 5.0, 6.0, and 6.1. All but 6.1 were announced publicly.
- Backblaze Vault: A Backblaze Vault is 20 storage servers grouped together for the purpose of data storage. Uploaded data arrives at a given storage server within a Backblaze Vault and is encoded into 20 parts, with each part being either a data blob or parity. Each of the 20 parts (shards) is then stored on one of the 20 storage servers.
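The idea of spreading one upload across 20 servers as data and parity shards can be illustrated with a deliberately simplified sketch. The erasure code Backblaze uses and its data-to-parity split are not described here, so this example assumes 19 data shards plus a single XOR parity shard. That stand-in can rebuild only one lost shard; a production scheme would use a stronger code (for example, Reed-Solomon) to tolerate more failures. The names `encode_shards`, `recover_shard`, and `SHARDS_PER_VAULT` are hypothetical.

```python
from functools import reduce
from typing import List, Optional

SHARDS_PER_VAULT = 20                # one shard per storage server in a Vault
DATA_SHARDS = SHARDS_PER_VAULT - 1   # assumption: 19 data shards + 1 XOR parity shard


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_shards(data: bytes) -> List[bytes]:
    """Split data into DATA_SHARDS equal chunks and append one XOR parity shard."""
    shard_len = -(-len(data) // DATA_SHARDS)               # ceiling division
    padded = data.ljust(shard_len * DATA_SHARDS, b"\0")    # zero-pad the final shard
    data_shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(DATA_SHARDS)]
    parity = reduce(xor_bytes, data_shards)
    return data_shards + [parity]                          # shard i goes to server i


def recover_shard(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a single missing shard (marked None) by XOR-ing all the others."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity shard can rebuild only one missing shard")
    rebuilt = list(shards)
    if missing:
        rebuilt[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return rebuilt


if __name__ == "__main__":
    payload = b"customer file contents " * 10
    shards = encode_shards(payload)
    damaged = list(shards)
    damaged[7] = None                      # simulate losing server 7's shard
    restored = recover_shard(damaged)
    original = b"".join(restored[:DATA_SHARDS])[:len(payload)]
    print(original == payload)
```

Because the parity shard is the XOR of every data shard, XOR-ing the 19 surviving shards reproduces whichever one is missing, including the parity shard itself.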
As you review the charts and tables here, there are a few things to know about Backblaze Vaults.
- There are currently six cohorts of storage servers in operation: Supermicro, Dell, Backblaze 3.0, Backblaze 5.0, Backblaze 6.0, and Backblaze 6.1.
- A given Vault is always made up of one of the six cohorts of storage servers noted above. For example, Vault 1016 is made up of 20 Backblaze 5.0 Storage Pods and Vault 1176 is made up of 20 Supermicro servers.
- A given Vault is made up of storage servers that contain the same number of drives, as follows:
  - Dell servers: 26 drives.
  - Backblaze 3.0 and Backblaze 5.0 servers: 45 drives.
  - Backblaze 6.0, Backblaze 6.1, and Supermicro servers: 60 drives.
- All of the hard drives in a Backblaze Vault will be logically the same size; for example, all 16TB drives.
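Because every Vault is 20 servers from a single cohort, and all drives in a Vault are the same logical size, the drive count and raw capacity of a Vault follow directly from the list above. A small sketch; the helper names are hypothetical, and "raw" means before any parity overhead is subtracted.

```python
SERVERS_PER_VAULT = 20

# Drives per storage server, by cohort, as listed above.
DRIVES_PER_SERVER = {
    "Dell": 26,
    "Backblaze 3.0": 45,
    "Backblaze 5.0": 45,
    "Backblaze 6.0": 60,
    "Backblaze 6.1": 60,
    "Supermicro": 60,
}


def drives_per_vault(cohort: str) -> int:
    """Every server in a Vault has the same drive count, so multiply by 20."""
    return SERVERS_PER_VAULT * DRIVES_PER_SERVER[cohort]


def raw_vault_capacity_tb(cohort: str, drive_size_tb: int) -> int:
    """Raw capacity in TB, relying on all drives in a Vault being the same size."""
    return drives_per_vault(cohort) * drive_size_tb


if __name__ == "__main__":
    for cohort in DRIVES_PER_SERVER:
        print(f"{cohort}: {drives_per_vault(cohort)} drives per Vault")
    # A 60-drive Vault of 16TB drives: 1,200 drives, 19,200 TB raw.
    print(raw_vault_capacity_tb("Supermicro", 16))
```

So a Dell Vault holds 520 drives, a Backblaze 3.0 or 5.0 Vault holds 900, and a 60-drive-server Vault holds 1,200.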
Drive Stats by Backblaze Vault Cohort
With the background out of the way, let's get started. As of the end of Q3 2023, there were a total of 241 Backblaze Vaults divided into the six cohorts, as shown in the chart below. The chart includes the server cohort, the number of Vaults in the cohort, and the percentage of the total number of Vaults that cohort represents.
Vaults consisting of Backblaze servers still make up 68% of the Vaults in use today (shaded from orange to red), although that number is dropping as older Vaults are replaced with newer server models, typically the Supermicro systems.
The table below shows the Drive Stats for the different Vault cohorts identified above for Q3 2023.
The Avg Age (months) column is the average age of the drives, not the average age of the Vaults. The two may seem related, but that's not entirely the case. It's true the Backblaze 3.0 Vaults were deployed first, followed in order by the 5.0 and 6.0 Vaults, but that's where things get messy. There was some overlap between the Dell and Backblaze 6.1 deployments, as the Dell systems were deployed in our central European data center while the 6.1 Vaults continued to be deployed in the U.S. In addition, some migrations from the Backblaze 3.0 Vaults were initially made to 6.1 Vaults while we were also deploying new drives in the Supermicro Vaults.
The AFR for each of the server versions does not seem to follow any pattern or correlate with the average age of the drives. This was unexpected because, in general, drives begin to fail more often once they pass about four years of age. That would mean Vaults with older drives, especially those whose drives average over four years (48 months) old, should have a higher failure rate. But, as we can see, the Backblaze 5.0 Vaults defy that expectation.
To see if we can determine what's going on, let's expand on the previous table and dig into the different drive sizes in each Vault cohort, as shown in the table below.
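The AFR values compared throughout this post are annualized from drive days rather than from a simple count of drives, so drives that were in service for only part of the quarter are weighted accordingly: AFR = failures / (drive days / 365) × 100. A minimal helper for that formula follows; the function name is hypothetical and the numbers in the usage example are made up, not taken from the tables.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


if __name__ == "__main__":
    # Hypothetical: 1,200 drives in service for all 92 days of Q3, with 5 failures.
    print(f"{annualized_failure_rate(5, 1200 * 92):.2f}%")  # 1.65%
```

Annualizing this way lets a quarter's results be compared directly with a yearly rate, and with cohorts that ran different numbers of drives.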
Observations for Each Vault Cohort
- Backblaze 3.0: Obviously these Vaults have the oldest drives and, given their AFR is nearly twice the average for all of the drives (1.53%), it would make sense to migrate off of these servers. Of course, the 6TB drives seem to be the exception, but at some point they will most likely “hit the wall” and start failing.
- Backblaze 5.0: There are two Backblaze 5.0 drive sizes (4TB and 8TB), and the AFR for each is well below the average AFR for all of the drives (1.53%). The average age of both drive sizes is nearly seven years or more. Compared to the Backblaze 6.0 Vaults, it would seem that migrating the 5.0 Vaults could wait, but there is an operational consideration here. The Backblaze 5.0 servers each contain 45 drives, and from the perspective of data density per system, they should be migrated to 60-drive servers sooner rather than later to optimize data center rack space.
- Backblaze 6.0: These Vaults as a group don't seem to make any of the five different drive sizes happy. Only the AFR of the 4TB drives (1.42%) is below the average AFR for all of the drives, and only slightly. The rest of the drive groups are well above the average.
- Backblaze 6.1: The 6.1 servers are similar to the 6.0 servers, but with an upgraded CPU and faster NIC cards. Is that why their annualized failure rates are so much lower than those of the 6.0 systems? Maybe, but the drives in the 6.1 systems are also much younger, about half the age of those in the 6.0 systems, so we don't have the full picture yet.
- Dell: The 14TB drives in the Dell Vaults seem to be a problem, with a 5.46% AFR. Much of that is driven by two particular Dell Vaults with a high AFR, over 8% for Q3. This appears to be related to their location in the data center. All 40 of the Dell servers that make up these two Vaults were relocated to the top of 52U racks, and it appears that initially they did not like their new location. Recent data indicates they are doing much better, and we'll publish that data soon. We'll need to see what happens over the next few quarters. That said, if you remove these two Vaults from the Dell tally, the AFR is a respectable 0.99% for the remaining Vaults.
- Supermicro: This server cohort is mostly 16TB drives, which are doing very well with an AFR of 0.62%. The one 14TB Vault is worth our attention with an AFR of 1.95%, and the 22TB Vault is too new for any analysis.
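Removing the two relocated Vaults from the Dell tally means recomputing the AFR from the remaining Vaults' combined failures and drive days. A pooled rate weights each Vault by its drive days, which matters when Vaults accumulate different amounts of service time in a quarter. The sketch below shows that pooling; the class, function, Vault IDs, and counts are all hypothetical and are not Backblaze data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass
class VaultQuarter:
    vault_id: int     # hypothetical Vault identifier
    failures: int     # drive failures in the quarter
    drive_days: int   # sum of days each drive in the Vault was in service


def pooled_afr(vaults: Iterable[VaultQuarter], exclude: FrozenSet[int] = frozenset()) -> float:
    """AFR (%) for a group of Vaults, pooling failures and drive days before annualizing."""
    kept = [v for v in vaults if v.vault_id not in exclude]
    failures = sum(v.failures for v in kept)
    drive_days = sum(v.drive_days for v in kept)
    if drive_days == 0:
        raise ValueError("no drive days left after exclusions")
    return failures / (drive_days / 365) * 100


if __name__ == "__main__":
    days = 520 * 92  # hypothetical: 520 drives in service for all of Q3
    vaults = [VaultQuarter(1, 10, days), VaultQuarter(2, 9, days),
              VaultQuarter(3, 2, days), VaultQuarter(4, 1, days)]
    print(f"all Vaults: {pooled_afr(vaults):.2f}%")                     # 4.20%
    print(f"without 1, 2: {pooled_afr(vaults, frozenset({1, 2})):.2f}%")  # 1.14%
```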
Drive Stats by Drive Size and Vault Cohort
Another way to look at the data is to take the previous table and re-sort it by drive size. Before we do that, let's establish the AFR for the different drive sizes aggregated over all Vaults.
As we can see, in Q3 the 6TB and 22TB Vaults had zero failures (AFR = 0%). Also, the 10TB Vault is indeed only one Vault, so there are no other 10TB Vaults to compare it to. Given this, for readability, we'll remove the 6TB, 10TB, and 22TB Vaults from the next table, which compares how each drive size has fared in each of the six different Vault cohorts.
Currently we are migrating the 4TB drive Vaults to larger Vaults, replacing them with drives of 16TB and above. The migrations are performed using an in-house system, which we'll expand upon in a future post. The specific order of migrations is based on the failure rates and durability of the existing 4TB Vaults, with an eye towards removing the Backblaze 3.0 systems first, as they are nearly 10 years old in some cases and many of their non-drive replacement parts are no longer available. Whether we give away, destroy, or recycle the retired Backblaze 3.0 Storage Pods (sans drives) is still being debated.
For the 8TB drive Vaults, the Backblaze 5.0 Vaults are up first for migration when the time comes. Yes, their AFR is lower than that of the Backblaze 6.0 Vaults, but remember: the 5.0 Vaults are built from 45-drive units, which are not as efficient density-wise as the 60-drive systems.
Speaking of systems with fewer than 60 drives, the Dell servers hold 26 drives. These 26 drives are in a 2U chassis, versus a 4U chassis for all of the other servers. The Dell servers aren't quite as dense as the 60-drive units, but their 2U form factor gives us some flexibility in filling racks, especially when you add utility servers (1U or 2U) and networking gear to the mix. That's one of the reasons the two Dell Vaults we noted earlier were moved to the top of the 52U racks. FYI, these two Vaults hold 14TB drives and are two of the four 14TB Dell Vaults making up the 5.46% AFR. The AFR for the Dell Vaults with 12TB and 16TB drives is 0.76% and 0.92%, respectively. As noted earlier, we expect the AFR for the 14TB Dell Vaults to drop over the coming months.
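The filtering and re-sorting described above can be sketched as a small table transformation: drop any drive size with zero failures across all cohorts or with only one Vault in total, then order the remaining rows by drive size and cohort. The `Row` type, the function name, and the sample rows (including the placeholder cohort labels) are hypothetical illustrations, not the actual table data.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Row(NamedTuple):
    cohort: str
    size_tb: int
    vaults: int
    failures: int


def comparable_rows(rows: List[Row]) -> List[Row]:
    """Keep drive sizes with at least one failure and more than one Vault; sort by size, cohort."""
    vaults_by_size = defaultdict(int)
    failures_by_size = defaultdict(int)
    for r in rows:
        vaults_by_size[r.size_tb] += r.vaults
        failures_by_size[r.size_tb] += r.failures
    keep = {size for size in vaults_by_size
            if failures_by_size[size] > 0 and vaults_by_size[size] > 1}
    return sorted((r for r in rows if r.size_tb in keep),
                  key=lambda r: (r.size_tb, r.cohort))


if __name__ == "__main__":
    rows = [Row("Cohort A", 6, 2, 0), Row("Cohort B", 22, 1, 0),
            Row("Cohort C", 10, 1, 3), Row("Cohort B", 16, 5, 4),
            Row("Cohort D", 16, 2, 1), Row("Cohort D", 14, 4, 9)]
    for r in comparable_rows(rows):
        print(r.size_tb, r.cohort)
```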
What Have We Discovered?
Our goal today was to see what we could learn about the drive failure rates of the storage servers we use in our data centers. All of our storage servers are grouped into operational systems we call Backblaze Vaults. There are six different cohorts of storage servers, and each Vault is composed of a single type of storage server; hence there are six types of Vaults.
As we dug into the data, we found that the different cohorts of Vaults had different annualized failure rates. What we didn't find was a correlation between the age of the drives used in the servers and the annualized failure rates of the different Vault cohorts. For example, the Backblaze 5.0 Vaults have a much lower AFR of 0.99% versus the Backblaze 6.0 Vault AFR of 2.14%, even though the drives in the 5.0 Vaults are nearly twice as old on average as the drives in the 6.0 Vaults.
This suggests that while our initial foray into the annualized failure rates of the different Vault cohorts is a good first step, there is more to do here.
Where Do We Go From Here?
In general, all of the Vaults in a given cohort were manufactured to the same specifications, used the same parts, and were assembled using the same processes. One obvious difference is that different drive models are used in each Vault cohort. For example, the 16TB Vaults are composed of seven different drive models. Do some drive models work better in one Vault cohort than in another? Over the next couple of quarters we'll dig into the data and let you know what we find. Hopefully it will add to our understanding of the annualized failure rates of the different Vault cohorts. Stay tuned.