
Hard Drive Reliability Review for 2015 - chmars
https://www.backblaze.com/blog/hard-drive-reliability-q4-2015/
======
crispyambulance
This is interesting...

"A relevant observation from our Operations team on the Seagate drives is that
they generally signal their impending failure via their SMART stats. Since we
monitor several SMART stats, we are often warned of trouble before a pending
failure and can take appropriate action. Drive failures from the other
manufacturers appear to be less predictable via SMART stats."

~10 years ago, I remember Google Research put out a highly cited paper wherein
they found that SMART stats were not a particularly strong indicator of
impending drive failure (50% of drives had no SMART indication of a problem
before failure).
[http://research.google.com/pubs/pub32774.html](http://research.google.com/pubs/pub32774.html)

Has this now changed (at least for Seagate)?

Reliability/longevity is nice but a signal of impending failure is far more
valuable from an operations point of view.

~~~
budmang
From our experience, we've found 5 SMART stats that are useful in predicting
failure: [https://www.backblaze.com/blog/hard-drive-smart-stats/](https://www.backblaze.com/blog/hard-drive-smart-stats/)

Many SMART stats aren't particularly useful in predicting failure as they
simply correlate to the age of the drive in some fashion.
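As a rough sketch of how such a check might look, the attribute IDs below are the five named in the linked post; treating any nonzero raw value as a warning sign is a simplification of the analysis there:

```python
# The five SMART attributes the linked post identifies as useful failure
# predictors. Flagging any nonzero raw value is a deliberate
# simplification of the analysis in the post.
CRITICAL_ATTRS = {
    5:   "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    188: "Command Timeout",
    197: "Current Pending Sector Count",
    198: "Uncorrectable Sector Count",
}

def warning_signs(raw_values):
    """raw_values: dict of SMART attribute ID -> raw value for one drive.

    Returns the names of the critical attributes with a nonzero raw value.
    """
    return [name for attr_id, name in CRITICAL_ATTRS.items()
            if raw_values.get(attr_id, 0) > 0]

# A drive reporting 8 pending sectors trips exactly one warning:
print(warning_signs({5: 0, 187: 0, 197: 8}))
# ['Current Pending Sector Count']
```

On a real system the raw values would come from something like smartctl; here they are just passed in as a dict.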

Also, here is our data on every single SMART stat for all of the drives we
have: [https://www.backblaze.com/blog-smart-stats-2014-8.html](https://www.backblaze.com/blog-smart-stats-2014-8.html)

Gleb (CEO, Backblaze)

~~~
samstave
Gleb,

First, thanks for all your company's sharing of such data, as well as the
open pod platform. Kudos.

Second, can you do a writeup specifically and only about SSDs?

Thanks

~~~
geostyx
As far as I know, they don't really use SSDs because of the higher cost/GB.
So they probably don't have much to say about them :/

~~~
ekianjo
Is there any good use case for having SSDs in a data center, if you didn't
care about cost?

~~~
hrrsn
They're used plenty in DCs when speed is required.

~~~
jen729w
Yep, I work for a very large government department here in AU and we have a
tiny bit of SSD in our DC for the stuff that really needs it.

It's probably not 5% of our total storage, though.

------
roddux
HGST (_Hitachi Global Storage Technologies_) is again topping the charts
for drive reliability! They must be doing something right.

Also, the fact that Backblaze publishes most of its data online is very
cool.

~~~
currysausage
HGST, formerly Hitachi Global Storage Technologies, part of WD as of 2012. The
cool thing about them was that their consumer Deskstars were at least as
reliable as enterprise disks by other manufacturers. I still have 12-year-old
PCs here with HGST 80 and 160 GB drives that were subject to daily use and a
lot of inappropriate handling. The Deskstars don't mind.

Very unfortunately, HGST has apparently scaled back Deskstar sales and
development significantly since the acquisition. I guess it has to do with WD
selling off some of HGST's 3.5" assets to Toshiba in order to appease
competition authorities. See also
[https://news.ycombinator.com/item?id=10057519](https://news.ycombinator.com/item?id=10057519)

~~~
frik
Yet:

"In May 2012, WD divested to Toshiba assets that enabled Toshiba to
manufacture and sell 3.5-inch hard drives for the desktop and consumer
electronics markets to address the requirements of regulatory agencies."

[https://en.wikipedia.org/wiki/HGST](https://en.wikipedia.org/wiki/HGST)

~~~
currysausage
Added link to an older comment of mine that addresses the HGST/Toshiba thing.
To me, it looks like newer Toshiba 3.5" models are based on Fujitsu tech (if
the enclosure design is any indication). Also, Toshiba might abandon their HDD
business completely. [1]

[1] [http://seekingalpha.com/article/3827746-seagate-western-digital-soon-true-duopoly-hdds](http://seekingalpha.com/article/3827746-seagate-western-digital-soon-true-duopoly-hdds)

~~~
rasz_pl
Fujitsu :( I worked at a Fujitsu wholesale distributor around 1999, and EVERY
SINGLE drive sold between 1999 and 2001 (PB15/PB16) died within 3 years.
Those were great drives: cheap, silent, fast, and they smelled great fresh
from the factory due to pine sap rosin.

Allegedly the Cirrus Logic controller had a manufacturing defect and died
from heat. Personally, I always suspected that very peculiar, strong-smelling
rosin flux. The PCB was drenched in it, and this type of flux is usually
highly activated and requires cleaning; otherwise the acid eats away the
solder joints and copper, especially in humid and hot environments.

------
gradstudent
I don't really understand their methodology for computing failure rate. The
page says they calculate the rate on a per annum basis as:

([# drives] * [# failures]) / [operating time across all drives]

Wat? The numerator and denominator seem unrelated. What is being measured
here?

To me, it would make more sense to look at time to failure. Together with data
on the age of the drive and the proportion of failures each year one could
create an empirical distribution to characterise the likelihood of failure in
each year of service. That would give a concrete basis from which to compare
failure rates across different models.

~~~
snaily
Are you referring to the "(100*drive-failures)/(drive-hours/24/365)"? There's
no multiplication of total # of drives and # of failures in there.

It's all just a scaling: you have a number of broken drives in a corner of the
datacenter in the wire bucket that says "broke during 2015", you count them,
divide by total hours of that type of disk running (since they may have been
brought in commission at different points), and then scale it so you get it in
percent-per-year, not likelihood-per-hour.

It smells of someone explaining code, rather than illustrating an important
engineering formula, but there's nothing wrong with the rescaling calculation
per se.
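The rescaling described above can be written out directly; a minimal sketch (the function name is mine, the formula is the one quoted in this thread):

```python
def annualized_failure_rate(failures, drive_hours):
    """Annualized failure rate in percent, per the quoted formula.

    drive_hours is the total powered-on time summed across all drives of
    the model, so drives commissioned partway through the year count
    proportionally rather than as full drive-years.
    """
    drive_years = drive_hours / 24 / 365
    return 100 * failures / drive_years

# 100 drives running the whole year (876,000 drive-hours), 5 failures:
print(annualized_failure_rate(5, 100 * 24 * 365))  # 5.0
```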

~~~
gradstudent
> Are you referring to the "(100 * drive-failures) / (drive-hours/24/365)"?
> There's no multiplication of total # of drives and # of failures in there.

Perhaps the problem is the specific example given. 100 is the size of the
drive fleet and also the multiplier required to convert to percentages. Let's
assume you are right and the 100 in the equation is not #drives.

Even so, I find the approach questionable. If the point is to calculate the
proportion of failures, then that (overly simplistic) calculation is:

[# failures] / [# drives] = 5 / 100 = 5% failure rate.

But this isn't what's calculated. Instead the author calculates the proportion
of drive-years per annum affected by failure. For the 100 drives in the
example the cumulative number of operational hours given in 2015 is 750K hours
(out of a possible 876K hours, had the drives been operating 100% of the
time).

That's a problem, because 750K / 876K = 85.6% of total time, and

5 / 0.856 = 5.84% "failure rate", which seems to me an overstatement.

The problem gets worse as the number of operational hours decreases. Imagine
for a moment the 100 drives only operated 50% of the time in 2015. We have:

5 / ((876K * 0.5) / 876K) = 5 / 0.5 = 10% "failure rate". This despite only
5% of the drives having failed.

Wat?
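The two calculations being contrasted here can be compared numerically; a sketch of both (the function names are mine):

```python
def naive_rate(failures, n_drives):
    """Simple proportion: failures as a percentage of the fleet."""
    return 100 * failures / n_drives

def annualized_rate(failures, drive_hours):
    """Backblaze-style rate: failures per drive-year of actual operation."""
    return 100 * failures / (drive_hours / 24 / 365)

n_drives, failures = 100, 5
full_year = n_drives * 24 * 365  # 876,000 drive-hours at 100% uptime

# At full uptime the two measures agree:
print(naive_rate(failures, n_drives))                # 5.0
print(annualized_rate(failures, full_year))          # 5.0

# At 50% uptime the annualized rate doubles: the same 5 failures occurred
# in only 50 drive-years of operation, which is the quantity the
# denominator measures.
print(annualized_rate(failures, full_year * 0.5))    # 10.0
```

Whether that doubling is an "overstatement" or the point of the metric is exactly the disagreement in this subthread: the denominator is drive-years of exposure, not drives.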

------
SixSigma
A useful additional metric is the age of the drive at failure.

This would determine whether the failure rate is constant over the life of
the drive (meaning random failure) or age related (infant mortality or old
age).

25 drives that fail after 1 week plus 25 that fail after 50 weeks is
different from 50 drives that fail at one per week.
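Given per-drive ages at failure, that distinction is straightforward to check; a sketch, assuming failure ages measured in weeks:

```python
from collections import Counter

def failure_age_histogram(ages_weeks, bucket_weeks=10):
    """Bucket drive ages at failure, to show whether failures cluster
    early (infant mortality), late (wear-out), or spread out evenly
    (a roughly constant hazard rate)."""
    return Counter((age // bucket_weeks) * bucket_weeks
                   for age in ages_weeks)

# 25 drives failing in week 1 plus 25 failing in week 50 ...
bimodal = [1] * 25 + [50] * 25
# ... versus 50 drives failing one per week:
uniform = list(range(50))

print(failure_age_histogram(bimodal))   # Counter({0: 25, 50: 25})
print(failure_age_histogram(uniform))   # all five buckets hold 10
```

A real reliability analysis would fit a hazard model instead of eyeballing buckets, but the shape of the histogram already separates the two scenarios above.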

~~~
dsp1234
Luckily, they publish the operational and SMART status data for all of their
drives[0]. This means that you can do this additional analysis (and more).
Which is awesome.

[0] - [https://www.backblaze.com/hard-drive-test-data.html](https://www.backblaze.com/hard-drive-test-data.html)

~~~
SixSigma
Brilliant, thanks for letting me know. I'm studying Logistics Engineering and
reliability analysis is part of my degree. This will make a great case study
:)

------
tshannon
"...give or take a Petabyte or two"

As one does.

~~~
atYevP
Yev from Backblaze -> when you have 200 of 'em you lose count :P

------
akulbe
Color me skeptical. I bought into this, at first. After reading some other
stuff, not so much.

Like this, for example: [http://www.tweaktown.com/articles/6028/dispelling-backblaze-s-hdd-reliability-myth-the-real-story-covered/index.html](http://www.tweaktown.com/articles/6028/dispelling-backblaze-s-hdd-reliability-myth-the-real-story-covered/index.html)

~~~
gareim
I read the same thing a year ago and I came away actually upset at the
tweaktown article. Off the top of my head, I remember some of the complaints
being that the drives were subject to abnormal amounts of heat and that the
drives were consumer-level drives.

I remember a study Google did on harddrive reliability and it seemed to show
that heat had little to no effect on it. I also don't regard consumer-level as
being a bad thing. As a consumer, I kind of want to know which drives are
built for abuse better. All drives fail; which drives fail more and at what
cost?

~~~
ngoede
The tweaktown article did talk about temperature. I think you were right to
feel they were being silly with that. Temperature MAY correlate with failure,
but Backblaze found it did not do so within the ranges they actually see in
their environment, and that is something they would appear to have more than
enough data to compute.

~~~
hga
Google's study some time ago found that temperature either didn't correlate
with failures or, in the ranges they ran their machines, had an inverse
correlation with failure. It would appear to be one problem disk manufacturers
have largely surmounted.

------
Ezhik
Are Backblaze the company that bought all the hard drives in the Bay Area
during the 2011 crisis?

~~~
DanBC
Possibly - there's a blog post where they trawled consumer stores to buy
drives and drives in enclosures, and then removed the enclosures.

[https://www.backblaze.com/blog/farming-hard-drives-2-years-and-1m-later/](https://www.backblaze.com/blog/farming-hard-drives-2-years-and-1m-later/)

And the discussion on HN:

[https://news.ycombinator.com/item?id=6801334](https://news.ycombinator.com/item?id=6801334)

[https://news.ycombinator.com/item?id=4631027](https://news.ycombinator.com/item?id=4631027)

~~~
bigiain
Not just the Bay Area either. They coined the phrase "hard drive shucking",
'cause they were buying consumer external USB drives, then digging the drive
out and throwing away the enclosures.

That story is linked down near the end of the article:

[https://www.backblaze.com/blog/backblaze_drive_farming/](https://www.backblaze.com/blog/backblaze_drive_farming/)

------
slowhands
Good data, but I wish they had rendered these tables as HTML. Not fun typing
these out myself to search.

~~~
brianwski
You can download the raw data from [https://www.backblaze.com/hard-drive-test-data.html](https://www.backblaze.com/hard-drive-test-data.html)

------
cableshaft
I've had a lot of bad luck with Western Digital hard drives lately. Nice to
see some data back that up. I didn't know HGST existed, though.

~~~
sithadmin
HGST and its cousin HDS don't get nearly the recognition they deserve in the
North American Enterprise storage market. Their products have, in my
experience, always offered phenomenal value and rock-solid reliability at very
reasonable prices. HDS arrays in particular are pretty great at outperforming
'big name' storage vendors at far lower prices.

~~~
kijin
I think they changed brands too many times to keep their enterprise reputation
intact. Few people even remember that HGST used to be Hitachi which used to be
IBM.

On the other hand, those who do remember IBM hard drives probably remember
them as the Deathstar, so HGST might not want to be associated with their old
home so much.

------
gist
What would be really helpful is if they could simply put some Amazon links in
this report to the drives with the best reliability according to their tests.

~~~
Someone1234
Then people would accuse them of shilling/being biased.

~~~
gist
People will always be sour and accuse you of things.

But actually I don't see how this makes them biased in any way. All drives
essentially sell for the same amount (and Amazon pays a percentage of that),
so if you trust the info as being accurate (and why wouldn't it be?), how
could it be biased, given there is such little latitude in pricing?

And who is going to accuse them anyway? People who read HN? If so, so what?

The data presented is a nice shortcut answering the question of "which drive
should I buy" without having to read all of the charts and most importantly
think.

Lastly, you don't have to buy from Amazon just because they give you a link,
but it does make it easier to see a price and compare to whatever vendor you
might typically use (or they could provide several links to different
vendors).

------
pkaye
I wish there were similar statistics publicly available for SSDs. From these
failure rates, hard drives don't look as reliable as one would imagine.

~~~
budmang
I actually am blown away by the reliability of hard drives. We've found that
after 4 years, nearly 80% of hard drives are still working, and the median
life is about 6 years: [https://www.backblaze.com/blog/how-long-do-disk-drives-last/](https://www.backblaze.com/blog/how-long-do-disk-drives-last/)

Considering a 4 TB hard drive has to track 32,000,000,000,000 individual
bits, allowing each one to be read and written repeatedly, on platters that
are spinning 120x per second, spaced a hair's width from their heads...I
think it's actually incredible.

As for SSDs, we keep wishing that we could switch to them, but they're still
10x more expensive on a $/TB basis. That may change in the next few years, and
if it does, we'll look forward to sharing data on SSD usage at scale as well.

Gleb (CEO, Backblaze)

~~~
pkaye
I guess my point about hard drives is that most people never back them up and
kind of always expect them to hold up for 5-10 years. They have years of
photos, videos and documents stored on them. Then there are friends savvy
enough to set up a RAID system, and invariably the RAID hardware fails before
the drives do and they can't get a replacement.

Thanks again for sharing the drive reliability statistics.

~~~
atYevP
Well those folks should have Backblaze ;-)

------
sandworm101
Is it possible for this data to ever be useful? Given the time necessary to
acquire the data, and the rate at which improvements are made to drives,
can't we assume that drives purchased today probably won't operate in exactly
the same manner as drives purchased a year ago?

I don't mean to insult, just to ponder the relevance of such long-term studies
on tech that changes so quickly.

~~~
baruch
When the data is consistent for several years you should already figure out
that Seagate is not going to improve so fast, when they do improve in
BackBlaze data you can start buying them again.

Large companies may buy Seagate due to the price advantage and the fact that
their storage systems can better handle the drive failure rate.

~~~
kirian
The Seagate drives do seem to be improving in reliability, though. The higher
capacity Seagate drives, which I presume are newer models, have better
failure rate numbers than the lower capacity drives. The 4 and 6TB drives
seem to have reasonable failure rates compared to the other manufacturers:
only HGST is better than Seagate at 4TB, and the Seagate 6TB drive has a
lower failure rate than the HGST 8TB. For >4TB drives the Seagate 6TB has the
lowest failure rate.

6TB: 1.89%
4TB: 2.19%/2.99% (depending on model)
3TB: 5.1%/28.34% (depending on model)
2TB: 10.1%
1.5TB: 10.16%/23.86% (depending on model)

------
eps
It'd be interesting and quite helpful to see the failure rate vs. drive age,
per manufacturer.

For example, for less reliable manufacturers there might be an "if you get
past the first N weeks, you are fine" pattern, or a failure cliff exactly 1
week past the warranty period, or something equally entertaining.

------
Quequau
Just in time for me.

I've got 5 Western Digital drives that have failed out of an original
purchase of 6. Now I'm wondering if it's really worth trying to go through
the RMA process (I need to figure out exactly how old they are and how long
the warranty is) or if I should just give up on Western Digital and go with a
different manufacturer... though I am not looking forward to spending that
amount of money all at once.

~~~
aembleton
The best performing brand seems to be HGST, which is a subsidiary of Western
Digital.

------
jamesblonde
Great stuff. Does anybody have any stats for drives' Bit Error Rates (BER) /
maximum unrecoverable read errors (URE) / non-recoverable read error rates?
By my understanding, manufacturer-quoted BERs for commodity drives, often 1
error in 10^14 bits read, tend to be more like 1 in 10^15 or better in
practice.
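For a sense of scale, the quoted rate can be turned into a probability of hitting at least one URE over a full-drive read; a sketch, treating each bit as an independent trial (real error processes are burstier than that):

```python
import math

def p_at_least_one_ure(bytes_read, ber_per_bit=1e-14):
    """Probability of at least one unrecoverable read error over a read
    of `bytes_read`, treating every bit as an independent trial at the
    quoted bit error rate."""
    bits = bytes_read * 8
    # 1 - (1 - ber)^bits, computed stably for tiny ber and huge bit counts
    return -math.expm1(bits * math.log1p(-ber_per_bit))

# Reading a 4 TB drive end to end at the commonly quoted 10^-14 rate:
print(round(p_at_least_one_ure(4e12), 3))         # 0.274
# At 10^-15 the same full read is far safer:
print(round(p_at_least_one_ure(4e12, 1e-15), 3))  # 0.031
```

This is why the quoted-vs-actual BER question above matters for things like RAID rebuilds, where the whole array is read end to end.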

------
zanny
This information is super useful. I have an ST3000DM001 and only trust it
because its SMART stats are still all in the green (and of course I have
local and cloud backups of anything important).

I've had it for four years now and there are no warnings of any kind yet, so I
guess I got one from a good batch.

~~~
m3rc
On the basis of buying a single personal hard drive, this data is interesting
but wouldn't have much impact on your purchasing. As usual, the advice is to
have multiple backups of everything.

------
ck2
Always look forward to this report, thanks for sharing the data.

Amazing that the 4TB Hitachi drives, with twice the platters of the 2TB, fail
less.

(And I will never buy Seagate again for home PCs or servers; even before this
report I could have told you they are unreliable.)

~~~
dsr_
That's the wrong message to take away. The right message is: every
manufacturer goes through periods of good and bad disks. Don't depend on any
drive to be perfect.

The Seagate 3TB were awful, but their 4TB seem to be just fine.

~~~
voltagex_
I've had two Seagate drives "fail" recently after <2 months - the drive is
fine but the (OEM) USB3 enclosure is dead. No idea who manufactures those for
Seagate but I'm not impressed.

~~~
hinkley
Similar experience with WD. I bought a Western Digital USB drive. It failed,
and I RMA'd it for a new one. That one failed too. A year later I bought a
bigger one, and it failed after another year.

I cracked open the enclosures and the drives are just fine. I still use them
for backups with no errors.

------
dghughes
Total of 213,355 terabytes, or 213 petabytes; that's quite a bit.

~~~
jmnicolas
Honestly not that much: to feel comfortable at home I would need 20 TB of
storage (of course it's only Linux ISOs ;-). A bit more than 10 thousand
people like me and they would have to reorder more drives.

~~~
dghughes
At the datahoarder subreddit according to his flair one of the mods claims to
have 1.4PB.

------
mozumder
It would be more interesting to find out reliability figures for
high-throughput data-center models of hard drives instead of backup drive
models with low access rates.

------
64bitbrain
Are there similar results or survey for SSDs?

------
pbreit
Sort of off-topic and apologies for the commercial nature, but can you really
get a 2 TB thumb drive for $17? Do they work?

[https://www.wish.com/search/2%20tb%20thumb#cid=5683434cce922...](https://www.wish.com/search/2%20tb%20thumb#cid=5683434cce922026a8eae933)

~~~
pipeep
It's a common scam to sell flash drives with modified firmware that causes
them to report a larger size than the underlying flash chips provide. Usually
once you write past that point, the writes will wrap around or drop the data
entirely. It's likely that the $17 flash drive you linked to is a scam.

Beyond that, flash drives tend to have low write durability and horrible
performance on large writes (because of poorly implemented garbage
collection).
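That wraparound behavior can be demonstrated with a small simulation (the class and the sizes here are illustrative, not a real device test; tools like f3 and h2testw run the same marker idea against actual hardware):

```python
class FakeFlash:
    """Simulates a scam drive: advertises `claimed` bytes, but writes
    wrap around onto only `real` bytes of actual storage."""
    def __init__(self, claimed, real):
        self.claimed, self.real = claimed, real
        self.store = bytearray(real)

    def write(self, offset, data):
        for i, b in enumerate(data):
            self.store[(offset + i) % self.real] = b

    def read(self, offset, length):
        return bytes(self.store[(offset + i) % self.real]
                     for i in range(length))

def measured_capacity(drive, step):
    """Write a unique 8-byte marker every `step` bytes across the claimed
    capacity, then read them all back. On an honest drive every marker
    survives; on a wrapping fake, only the markers in the last real-sized
    window do, so (surviving markers * step) approximates the real size."""
    offsets = range(0, drive.claimed, step)
    for off in offsets:
        drive.write(off, off.to_bytes(8, "big"))
    good = sum(1 for off in offsets
               if drive.read(off, 8) == off.to_bytes(8, "big"))
    return good * step

# A "2 MB" stick with only 16 KB of real flash (sizes scaled down for speed):
print(measured_capacity(FakeFlash(claimed=2_000_000, real=16_000), step=4_000))
# 16000: the real capacity, not the claimed 2,000,000
```

On real hardware the same idea writes markers to the raw block device, which is essentially what f3write/f3read do.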

------
contingencies
I have 2x ST4000DM and 1x ST4000VX drives in my desktop, plus one 4TB Seagate
'surveillance' drive as a USB luggable, though the OS X machine it is
currently connected to doesn't want to give me the specifics (neither
right-click Info nor DiskUtil).

