
Backblaze Hard Drive Stats Q1 2020 - gnabgib
https://www.backblaze.com/blog/backblaze-hard-drive-stats-q1-2020/
======
Paul-ish
I always love the Backblaze hard drive stats, even if I'm not in the market.

> During this quarter 4 (four) drive models, from 3 (three) manufacturers, had
> 0 (zero) drive failures. None of the Toshiba 4TB and Seagate 16TB drives
> failed in Q1, but both drives had less than 10,000 drive days during the
> quarter. As a consequence, the AFR can range widely from a small change in
> drive failures. For example, if just one Seagate 16TB drive had failed, the
> AFR would be 7.25% for the quarter. Similarly, the Toshiba 4TB drive AFR
> would be 4.05% with just one failure in the quarter.

Backblaze should consider adding interval estimates in addition to point
estimates. It might help the reader understand the uncertainty of the point
estimate.
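A minimal sketch of what an interval estimate could look like. It assumes failures within a quarter behave like a Poisson process and uses AFR = failures / (drive days / 365), which reproduces the quoted figures. It computes the exact (Garwood) Poisson interval by bisection in plain Python. The drive-day count in the usage lines is not from the report. It is back-calculated from the quoted "one failure would mean 7.25%" figure.

```python
import math


def afr_percent(failures, drive_days):
    """Annualized failure rate: failures per drive-year, as a percentage."""
    return 100.0 * failures / (drive_days / 365.0)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def bisect_root(f, lo, hi, iters=200):
    """Find a root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if (f(lo) > 0) == (f(mid) > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def poisson_ci(k, conf=0.95):
    """Exact (Garwood) confidence interval for a Poisson mean, given k observed events."""
    tail = (1.0 - conf) / 2.0
    hi_bound = 10.0 * k + 50.0  # comfortably above the upper limit for small k
    upper = bisect_root(lambda lam: poisson_cdf(k, lam) - tail, 0.0, hi_bound)
    if k == 0:
        lower = 0.0
    else:
        lower = bisect_root(lambda lam: (1.0 - poisson_cdf(k - 1, lam)) - tail, 0.0, hi_bound)
    return lower, upper


def afr_interval(failures, drive_days, conf=0.95):
    """Interval estimate for AFR (%) from the failures observed over drive_days."""
    drive_years = drive_days / 365.0
    lo, hi = poisson_ci(failures, conf)
    return 100.0 * lo / drive_years, 100.0 * hi / drive_years


# Drive days inferred from "one failure -> 7.25% AFR" (an inference, not report data).
seagate_16tb_days = 365.0 * 100.0 / 7.25
print(afr_percent(0, seagate_16tb_days))   # point estimate: 0.0
print(afr_interval(0, seagate_16tb_days))  # 95% interval: roughly (0.0, 26.7)
```

With so few drive days, a quarter with zero failures is still compatible with an AFR above 25%. That is the uncertainty the point estimate hides.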

~~~
tidywell
More important than interval estimates would be survival curves to know how
the drives fail as they age, which could give you a sense of when they should
be replaced.

------
UI_at_80x24
I look forward to these reports with the same zeal and enthusiasm as I used to
reserve for comic strips in Sunday newspapers.

This data is significant and appreciated. Thank you so much, and please keep
up the good work.

~~~
sixothree
While I don't _quite_ share your enthusiasm, I sure do share your
appreciation. Nobody else at their level appears to be sharing this data. This
is not something they had any obligation to share with us.

It certainly is useful and interesting.

~~~
DizzyDoo
Every time I read through their update articles I'm reminded that Backblaze is
a service I could possibly buy! I only buy a hard drive once every few years,
but I enjoy seeing the data. The time it takes to write up and share what they
are no doubt tracking anyway is probably quite worthwhile; I'd guess this is
an effective piece of marketing. I'd be quite interested to see how many new
sign-ups they get each time they post these stats.

~~~
brianwski
Disclaimer: I work at Backblaze.

> I'd guess this is an effective piece of marketing.

Yes, it really has been good to us. :-)

It is data we would collect internally for our own decision making and
tracking even if we didn't release it, and it is only a small amount of work
for us (mainly for Andy, the author of that blog post) to format it up every 3
months and write some observations down. Since it isn't what we "sell", it
would have just gone to waste being hidden. And it results in our name getting
"out there" and inevitably every quarter somebody asks "hey, what do these
people do that requires 130,000 hard drives anyway?" and then we get a little
bump in customers. Cost vs. benefit has been WELL worth it to us.

Along the way (unrelated to making money) it makes us happy that people get
some use out of it. When we find out somebody did their PhD thesis on the raw
SMART numbers of the drives or something, we enjoy hearing about it.

Until my dying day I will never understand why Google Storage and Amazon S3
and Microsoft Azure don't release their drive failure statistics. I just don't
get it. Those companies have SO MANY DRIVES and they employ people with PhDs
in statistics, they MUST have the same info internally but at 100x the scale.
I don't get why they don't release the numbers?! But hey, if they want to
continue to give us this marketing gift of exclusivity, I'll take it. Maybe
they like us and are just doing us a solid.

~~~
RyJones
A long time ago, WANOPS put a couple racks of machines in the parking lot of
the (former) Building 11 on Microsoft Campus.

We had a lower failure rate across the board, for all components. I think the
only protection that was in place was some blue tarps and a chain link fence.

I can't find a public reference, sorry.

------
Scramblejams
Another Backblaze stats release, another time I wish someone on the inside at
Seagate would drop in and tell us what the deal is with their drives.

I have a lot of old Hitachis that still work, but every last one of my
Seagates died years ago. Yes, they've gotten better since then, but they're
still reliably trounced by Hitachi.

Where are the failures coming from? What did they cheap out on? Has the
C-suite decided it's not financially worth increasing reliability? What are
the internal feelings on being the outlier, year in and year out, in these
stats? Does anyone care?

~~~
aaronax
They will get sued if they say why. Hence they will never admit why.

------
greendave
Interesting data. Overall surprised at how few failures they saw.

From an interview last year, it sounds like they don't use SMR drives[1]. I
would be interested if there is any good source for failure rates of those.

[1] [https://www.backblaze.com/blog/how-backblaze-buys-hard-drives/](https://www.backblaze.com/blog/how-backblaze-buys-hard-drives/)

~~~
atYevP
Yev from Backblaze here -> Yeah, we've tested them but found they didn't play
nice so we aren't deploying them in droves. If you find a good place for data
on SMRs send it our way, would be fun to read up on it!

~~~
scottlamb
Did you try host-managed SMR? Seems more promising, although significantly
more effort.

~~~
atYevP
Not sure on that one actually. I think one of the main things for us was the
rewrite performance, since we're constantly moving/copying/deleting/changing
data we need that to be fairly performant - but not sure about the host-
managed drives!

------
paypalcust83
I recently bought 4x WD HGST WUH721414ALE6L4 512e 14TB. The 4Kn's were the
same price but on lengthy backorder/dropship from WD, so that wasn't going to
work. Also, I absolutely refuse to buy the WD Gold (WD141KRYZ) that is
effectively the same product but at a much higher price ($480 vs. $346).
Marketing people can take a long walk from a short pier.

~~~
kohtatsu
I'm (unfortunately) boycotting WD and their subsidiaries:
[https://news.ycombinator.com/item?id=22935563](https://news.ycombinator.com/item?id=22935563)

~~~
greggyb
Who are you buying from? Seagate also had undisclosed SMR, and I believe
Toshiba did as well.

~~~
fomine3
Seagate and Toshiba are sane compared to WD, because they never used SMR in
their NAS drives.

------
ksec
Still wondering what happened to those HDD roadmaps: HAMR and MAMR, and the
40TB drives promised by 2023. As far as I am aware, even the 18TB and 20TB
drives coming in late 2020 / early 2021 are still CMR or SMR.

If it weren't for helium-sealed designs fitting more platters into each HDD, we
would have had zero capacity improvement in the past 4-5 years.

~~~
StillBored
I'm expecting a resurgence of Quantum Bigfoot-style drives.

So, while the idea was ridiculed in the press, the advent of SSDs and the use
case for hard drives these days actually mean it makes a lot of sense, given
that the platter area (and therefore the capacity) increases with the square of
the diameter. Combine that with slower spin rates, which also allow higher
density, and you get a device which is far more generally useful for bulk
storage than SMR.

~~~
dylan604
I miss my full-height Micropolis drives (said nobody, ever)

~~~
StillBored
If you consider a 2x area increase going from 3.5" to 5.25", and another 4x
going from a 3.5" HH to a 5.25" FH, you're talking about an 8x capacity
increase. That is ballpark over 100TB in a 5.25" drive bay. 5 of those standing
vertically would fit in a 19" rack for ~600TB in the front panel. Of course a 3U is
exactly 5.25", so you would have to modify the original form factor a bit to
make it fit or use 4U and waste a half inch.

Bottom line, assuming 20TB 3.5" drives, you have doubled the front capacity of
a 3U case.
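Here is the scaling argument as a back-of-envelope sketch. It assumes capacity grows with platter area (the square of the diameter) times the number of platters. Treating the bay names 3.5" and 5.25" as platter diameters is a simplification, since real platters are somewhat smaller than the bay. The 20 TB baseline and the 4x platter factor are the commenter's figures, and the result is an idealized upper bound, not a prediction.

```python
def scaled_capacity_tb(base_tb, old_diameter_in, new_diameter_in, platter_factor):
    """Idealized scaling: capacity ~ platter area (diameter squared) x platter count."""
    area_factor = (new_diameter_in / old_diameter_in) ** 2
    return base_tb * area_factor * platter_factor


# The commenter's inputs: a 20 TB 3.5" drive, ~4x the platters in a full-height 5.25" drive.
print((5.25 / 3.5) ** 2)                         # area factor: 2.25
print(scaled_capacity_tb(20.0, 3.5, 5.25, 4.0))  # 180.0 TB, idealized
print(20.0 * 2.0 * 4.0)                          # 160.0 TB with the comment's rounded 2x area
```

Either result clears the "ballpark over 100TB" estimate.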

OTOH, if you really wanted to play games consider how much capacity could be
stored on a 19" platter like the old IBM mainframe disk units.

~~~
bigiain
> or use 4U and waste a half inch.

... or use 4U so you can use the half inch clearance to try and help cool them
enough...

:-)

------
StillBored
I'm thankful for these reports as well, despite them being trailing edge/etc.
For a while they mirrored some problems we were having at work with a
particular vendor (and helped to justify switching products a year or two into
the nightmare).

What I really wish is that they would make an effort to go beyond just
reporting their experiences and see what they are doing as also providing a
service in the form of model reliability data. AKA toss a few pods of WD's/etc
in there even if they are slightly more expensive/whatever. If the data were
broken out by production location and manufacture date, it's likely it would be
something that they could sell on the open market for a small fee. I know I
would have gotten the company I worked for to pay such a fee for a somewhat
scientific look at the failure rates of certain models/etc.

AKA, pay a bit more for a broader set of drives, summarize the data for free,
and then get people to pay for the detail data. If you work for a company
buying a thousand or so drives a year, avoiding a 10% AFR is going to be worth
a lot of money. AKA too small to have a good view of the state of things, but
big enough that buying 1k bad drives and fighting daily RAID rebuilds for the
next three years is a real nightmare. Think of it as a bit of insurance, or at
least validation of a problem when things start to go south.

~~~
brianwski
Disclaimer: I work at Backblaze.

> AKA toss a few pods of WD's/etc in there even if they are slightly more
> expensive/whatever.

If you see "low numbers" of one particular drive model in the stats, that is
usually us trying out that drive model because at some point in the future the
price might drop enough to make purchasing it in bulk worthwhile, or the price
is ALREADY worth it but we're being careful in the rollout to make sure that
drive model performs in our particular application - nobody else's
application, just for us.

> make an effort to go beyond just reporting their experiences

We are VERY careful to report what we are seeing in our datacenter for our
particular application, and no more. This isn't a scientific study, we aren't
Consumer Reports, we're just publishing data we would collect whether or not
we released it. We have a core business we're super happy focusing on; somebody
else is WELCOME to sell drive testing and drive predictions and we won't
compete with them or get in their way. Heck, we would totally subscribe to
that service!

------
tgsovlerkhgsel
When there were still stats being published for WD, they were always
significantly worse than HGST, even long after WD bought HGST.

I'm wondering why they keep maintaining two product lines that are
sufficiently different to result in one basically being consistently the best,
the other being consistently the worst, in terms of reliability.

------
jfjrbfbf
I actually first heard of the company through those reports. Migrated from S3
to B2; not the same feature set, but good enough for me.

For the features they share, I find B2 to be way simpler and cheaper.

Thanks

~~~
atYevP
Yev from Backblaze here -> Nice! Welcome aboard :)

Edit -> what are the feature's your missing most? We're collecting feedback
at: b2feedback@backblaze.com if you want to send some notes over!

~~~
atYevP
Edit -> My grammar was terrible in the above sentence. Yikes.

------
mikepurvis
It's surprising to me that they use actual boot drives and don't just boot
their machines off of USB sticks, via PXE, or from something like those fancy
Dell dual-redundant internal SD card modules.

But perhaps the boot drives are just leftovers that are too small to be used
any more for storage?

~~~
atYevP
Yev here -> The boot drives are helpful for log storage/collection so the
extra capacity is nice - plus they're not "large" drives so the costs
associated are pretty minimal!

~~~
eximius
Is the output of what you'd normally write to these boot drives significant
enough to impact the network? I'm surprised you can't just stream it elsewhere
to consolidate the cost.

~~~
atYevP
Interesting question! I don't know but I'll ask the dev team.

Edit -> I asked around, and here's what's up -> we do currently stream some
logs/data off the device, but we also like it to be written to disk. There's
something about having multiple copies that we like ;-)
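The "stream it off the box but also keep a local copy" pattern can be sketched with Python's standard `logging` module: one logger with two handlers. This is a generic illustration, not Backblaze's actual setup. The logger name, file path, and syslog collector address (`localhost:514`, which `SysLogHandler` reaches over UDP by default) are hypothetical.

```python
import logging
import logging.handlers


def make_node_logger(name, local_path, collector=("localhost", 514)):
    """Send every record both to a local file and to a remote syslog collector."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    # Local copy on the boot drive: still there if the network is down.
    local = logging.FileHandler(local_path)
    local.setFormatter(fmt)
    logger.addHandler(local)

    # Remote copy: syslog over UDP to a central collector (hypothetical address).
    remote = logging.handlers.SysLogHandler(address=collector)
    remote.setFormatter(fmt)
    logger.addHandler(remote)
    return logger
```

Usage would be along the lines of `make_node_logger("storage-pod", "/var/log/pod.log").info("...")`. Because each record is written by both handlers independently, losing the network only loses the remote copy.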

~~~
eximius
I guess that's fair. If anything happens to the network, you don't want to
lose your logs in the meantime. :)

~~~
pnutjam
Local logs also make troubleshooting much easier, especially if you have to
bring a system up offline for forensic analysis.

------
philjohn
Once again Seagate is putting on a poor show, reliability-wise - although, with
only three companies really making hard drives these days I guess Backblaze
doesn't have much choice but to also include their drives in their service.

~~~
pkaye
Why doesn't Backblaze then switch to more models from the other companies? It
almost seems like the more of a particular model they have, the higher the
associated failure rates.

~~~
simcop2387
My understanding is that though they're less reliable, it's cheaper to buy
more of them and expect some to fail instead.

~~~
brianwski
Disclaimer: I work at Backblaze.

> though they're less reliable, it's cheaper to buy more of them

Exactly. Each month our buyers go out and get bids for more drives. The cost
is input into a little spreadsheet, and the SPREADSHEET tells us which drive
to buy. It isn't about picking the most reliable drive, we have a software
layer and redundancy for that! It is about picking up the least expensive
drive.

The spreadsheet isn't complicated. If drive model X fails 1% less but costs 2%
more then we don't buy it that month. It might change the next month. If you
want to win our business, just look at the failure rates and underbid the
competitors by the failure rate of your drive.

There are some other things in the spreadsheet I should mention. If a drive is
twice the density (let's say a 16 TByte drive vs an 8 TByte drive) it is still
the same physical size and we pay for physical space rental in datacenters.
And another thing is that drives that are twice as dense use approximately
the same amount of electricity, and power costs are an ENORMOUS amount of our
overall cost of operation. So the spreadsheet will choose a more dense drive
even if it is slightly more expensive per TByte just because of the other
savings it implies. Again, the spreadsheet isn't complicated, but the
spreadsheet tells us which drive to buy.
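Here is a toy version of that kind of comparison. It rests on a deliberately simple model: expected replacements are AFR × years of service, plus a fixed per-slot yearly cost for rack space and power that does not depend on capacity. "Fails 1% less" is read as one percentage point of AFR. The `Bid` type, prices, AFRs, and the $30/slot/year overhead are all made-up values, not Backblaze's spreadsheet or numbers.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    model: str
    price_usd: float    # purchase price per drive
    capacity_tb: float
    afr_percent: float  # annualized failure rate


def cost_per_tb(bid, years=4.0, slot_cost_per_year=30.0):
    """Rough total cost of ownership per TB over the service period."""
    expected_replacements = bid.afr_percent / 100.0 * years
    hardware = bid.price_usd * (1.0 + expected_replacements)
    # Space rental and electricity are paid per drive slot, so denser drives
    # spread them over more TB.
    overhead = slot_cost_per_year * years
    return (hardware + overhead) / bid.capacity_tb


def cheapest(bids, **kwargs):
    return min(bids, key=lambda b: cost_per_tb(b, **kwargs))


# "Fails 1% less but costs 2% more": the less reliable drive still wins.
a = Bid("A", price_usd=100.0, capacity_tb=8.0, afr_percent=2.0)
b = Bid("B", price_usd=102.0, capacity_tb=8.0, afr_percent=1.0)
print(cheapest([a, b], years=1.0, slot_cost_per_year=0.0).model)  # A

# A denser drive can win despite a higher price per raw TB, thanks to per-slot costs.
small = Bid("8TB", price_usd=150.0, capacity_tb=8.0, afr_percent=1.0)
big = Bid("16TB", price_usd=320.0, capacity_tb=16.0, afr_percent=1.5)
print(cheapest([small, big]).model)  # 16TB
```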

If you are only going to purchase one drive, and you aren't going to back it
up, you should sort by reliability of that drive. You are also insane ->
always backup your drives!! And as long as your drive is backed up and you
trust the backup, then who cares if the failure rate is 1% or 2%?

~~~
bigiain
> And as long as your drive is backed up and you trust the backup, then who
> cares if the failure rate is 1% or 2%?

I would choose to pay a premium to reduce the chance of me needing to restore
from backup as often. Not a big premium, but one that makes sense at the one
or two drive purchase scale that might make a whole lot less sense at a one or
two thousand drive purchase.

I'd pay an extra $20 or $30 on a ~$300 drive to drop from a 2% to 1% failure
rate... (As it turns out, for home I pretty much buy drives in pairs and
mirror them for all my not-in-a-laptop storage - so I guess I pay a 100%
premium for reliability...)
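That trade-off can be expressed as a break-even number. A reliability premium pays off if one failure costs you (restore time, hassle) more than premium ÷ (AFR reduction × years). This is a naive expected-value sketch: it ignores mirroring, and the service life is an assumption you pick. It uses a $25 premium, the middle of the comment's $20 to $30, and the 2% to 1% drop.

```python
def break_even_failure_cost(premium_usd, afr_before_percent, afr_after_percent, years=1.0):
    """Failure cost above which paying the premium for the more reliable drive is worth it."""
    avoided_failures = (afr_before_percent - afr_after_percent) / 100.0 * years
    return premium_usd / avoided_failures


print(break_even_failure_cost(25.0, 2.0, 1.0, years=1.0))  # about 2500
print(break_even_failure_cost(25.0, 2.0, 1.0, years=5.0))  # about 500
```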

------
siscia
Little off-topic, but does the new S3 integration work for you guys?

I tried to connect with s3cmd but it wasn't working. Unfortunately their
support was not very helpful either...

~~~
atYevP
Yev here -> we're working on it! We've heard a lot from folks using s3cmd and
I believe we have some changes coming shortly. If you want to chat with the PM
team directly, we're grabbing feedback at: b2feedback@backblaze.com!

~~~
bigiain
Where should I keep an eye out to know when you've resolved this? Will you (or
Gleb) announce this on your blog? Or is there somewhere else I should look?

~~~
atYevP
Hey! Best thing to do would be to write to that email address and ask for an
update when there is one! We're trying to get to everyone there! :)

------
loeg
I love these. It'd be quite nice if they made some attempt to publish
confidence intervals.

~~~
andy4blaze
Andy at Backblaze here: Good to know. We used to do this with the lifetime
stats, but that got lost somewhere along the way. I'll look at getting them
back. Thanks.

~~~
tidywell
Can you publish the survival (Kaplan-Meier) curves? I think it's more useful
to analyze hard drive failure rate as a function of drive-age and/or lifetime-
hours and not how many hours the drive was used this year. The annual failure
rate of a drive is different between a new drive and a 10-year old drive.
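Here is a minimal Kaplan-Meier estimator in plain Python. It assumes the data has already been reduced to one record per drive: the drive's age in days when it either failed or was last seen still working (censored). The example ages are invented. They only show how censored drives leave the at-risk set without counting as failures.

```python
from collections import Counter


def kaplan_meier(ages_days, failed):
    """Return [(age, survival_probability)] at each age where at least one drive failed.

    ages_days[i] is drive i's age at failure (failed[i] True) or censoring (failed[i] False).
    """
    failures = Counter()
    censored = Counter()
    for age, did_fail in zip(ages_days, failed):
        if did_fail:
            failures[age] += 1
        else:
            censored[age] += 1

    at_risk = len(ages_days)
    survival = 1.0
    curve = []
    for age in sorted(set(ages_days)):
        d = failures[age]
        if d:
            # Probability of surviving past this age, given survival up to it.
            survival *= 1.0 - d / at_risk
            curve.append((age, survival))
        # Drives that failed or were censored at this age leave the risk set afterwards.
        at_risk -= d + censored[age]
    return curve


# Five hypothetical drives: ages in days, and whether each one failed.
print(kaplan_meier([100, 400, 400, 900, 1200], [True, False, True, True, False]))
# approximately [(100, 0.8), (400, 0.6), (900, 0.3)]
```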

------
cinquemb
Unrelated, but does anyone use B2 in SEA? How is the download speed? I'm in the
market to move away from CloudFront + S3 (I use cdn77 in front of CloudFront
now, but have like 10% cache misses in a key area every day that would
theoretically hit B2)

~~~
cinquemb
For anyone that comes across this: they only have 4 data centers, and the
first one outside of the US was in Europe, as of the end of August 2019[0].

[0] [https://www.backblaze.com/blog/announcing-our-first-european-data-center/](https://www.backblaze.com/blog/announcing-our-first-european-data-center/)

------
martinald
I wonder how long it will be before Backblaze moves to SSDs. Obviously a lot
more expensive right now, but with the lower power consumption/heat and
ability to cram way more in, might be sooner than we think?

~~~
andy4blaze
Andy at Backblaze here. We use SSDs in our core servers and more recently in
boot drives, as they both need the speed. To store data, in our case we don't
need the speed, so it's not worth it yet. Given the amount of data growth, most
predictions have HDDs still with about 50% of the storage market in 2025.

~~~
magicalhippo
Seems SSD has potential for higher density though, no?

Though the crossover point is still some time away, a decade or so?

~~~
jeffbee
In terms of volumetric density, flash is already much higher. A 4TB m.2 2280
SSD is ~70 times smaller than a 16TB 3.5" HDD.
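The volume comparison is easy to check, and the same numbers give the volumetric density ratio (TB per cm³). The envelope dimensions are nominal, and the 3.5 mm M.2 thickness is an assumption, because card thickness depends on the components on one or both sides.

```python
# Nominal envelopes in millimetres (width, length, height). Assumptions:
# 3.5" HDD: 101.6 x 147 x 26.1 mm; M.2 2280 card: 22 x 80 mm, ~3.5 mm thick with components.
HDD_35_MM = (101.6, 147.0, 26.1)
M2_2280_MM = (22.0, 80.0, 3.5)


def volume_cm3(dims_mm):
    w, l, h = dims_mm
    return w * l * h / 1000.0


def tb_per_cm3(capacity_tb, dims_mm):
    return capacity_tb / volume_cm3(dims_mm)


size_ratio = volume_cm3(HDD_35_MM) / volume_cm3(M2_2280_MM)
density_ratio = tb_per_cm3(4.0, M2_2280_MM) / tb_per_cm3(16.0, HDD_35_MM)
print(round(size_ratio), round(density_ratio))
```

With these assumptions the HDD takes about 63x the volume, and the 4TB SSD stores about 16x more per cm³. A thinner card (about 2.4 mm) pushes the volume ratio past 90x, so "~70 times" sits inside that range.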

~~~
magicalhippo
Yes, I was imprecise, I meant at roughly the same cost.

Just seems that the spinning rust guys have to jump through some pretty
extreme hoops to get density up, while the flash guys "just" have to reduce
cost, which gives them more angles of attack (i.e. either further density
increase or process optimization).

------
b3lvedere
I see Seagate still tops the list of failures after all these years. I
wonder which company would actually come out as the worst if we put all
those stats together.

~~~
mleonhard
One company will always be worst. Using the worst drives often makes sense for
a business. It all depends on price, availability, and deployment plans.

Seagate's drive longevity has improved a lot. Compare today's report to
Backblaze's 2015Q1 report:

[https://www.backblaze.com/blog/hard-drive-reliability-q1-2015/](https://www.backblaze.com/blog/hard-drive-reliability-q1-2015/)

------
Havoc
Is there an SSD version of this sort of report?

~~~
CamperBob2
And if so, would the TL;DR version of it still be, "Buy Intel?"

~~~
yjftsjthsd-h
For a while, Samsung was ahead.

