
Hard Drive Stats for Q2 2016 - ehPReth
https://www.backblaze.com/blog/hard-drive-failure-rates-q2-2016/
======
Veratyr
Something important that I don't think many people bring up is the cost of
these drives in a RAID, taking reliability into account.

When you're using this as the sole drive in a desktop machine or something
like that, paying the extra for an HGST is pretty straightforward, but I see
too many people passing over Seagate or WD when they intend to put them in a
RAID, for "reliability"'s sake, without (it seems to me) much thought into
whether it's really worth it.

Sorting by $/GB for Hitachi, Seagate and WD on drives >= 4TB
([http://pcpartpicker.com/products/internal-hard-drive/#m=19,3...](http://pcpartpicker.com/products/internal-hard-drive/#m=19,34,38&sort=a7&page=1&S=4000000,10000000)), you get:

- 4TB WD: $129
- 4TB Seagate: $114
- 4TB Hitachi (HGST): $158

So in order to get the reliability of an HGST, you're paying a 38% premium
over the Seagate. Is the extra reliability helpful in a RAID?

Take, say, a RAID6 array aiming for 16TB of usable storage (six 4TB drives).
Built from HGST's longest running drive (0.11% failure rate), the array would
have a roughly 0.000016% chance of failing in a given year (0.0011^3 * 6 * 5 *
4; chance of first drive failing = 6x(single), second = 5x(single), etc.),
assuming nobody replaced a failed disk, since you need 3 drives to fail.

Now an array of Seagate's longest running drive (2.66% failure rate) of the
same size would have a chance of failure of ~0.225853% in a given year.

But, the cost of the disks in the Hitachi array would be 6x158 = $948 while
the price of the disks in the Seagate array would be 6x114 = $684. For the
price of the Hitachi disks you can get 8 Seagate disks.

So what happens if you just add the extra Seagate disks to the array as extra
parity? Now you need 5 drive failures, giving an equation that looks like ((8
* 7 * 6 * 5 * 4 * (0.0266^5)) * 100), and a chance of failure of 0.008949%,
still way higher than the Hitachi.

In the end buying Hitachi/HGST seems like the right choice anyway but I
thought it interesting since I hadn't seen anyone else look at things this
way.

If anyone has any problems with my math, please feel free to point them out;
my stats background is pretty limited.
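
The figures in this comment can be reproduced with a short sketch. `ordered_estimate` mirrors the formula as written in the comment. `binomial_tail` is not from the comment: it is an alternative added here for comparison, which counts each set of failed drives once rather than once per ordering. Both assume independent failures, a constant annual failure rate, and no replacement of failed disks. The function and variable names are made up for illustration.

```python
import math


def ordered_estimate(n_drives, failures_needed, afr):
    """Formula as written in the comment: afr^k * n * (n-1) * ... * (n-k+1)."""
    return afr ** failures_needed * math.perm(n_drives, failures_needed)


def binomial_tail(n_drives, failures_needed, afr):
    """Probability that at least k of n independent drives fail in a year."""
    return sum(
        math.comb(n_drives, i) * afr ** i * (1 - afr) ** (n_drives - i)
        for i in range(failures_needed, n_drives + 1)
    )


# (label, drives in array, failures needed to lose the array, annual failure rate)
scenarios = [
    ("HGST, 6 drives, 3 failures", 6, 3, 0.0011),
    ("Seagate, 6 drives, 3 failures", 6, 3, 0.0266),
    ("Seagate, 8 drives, 5 failures", 8, 5, 0.0266),
]

for label, n, k, afr in scenarios:
    ordered = ordered_estimate(n, k, afr) * 100
    binom = binomial_tail(n, k, afr) * 100
    print(f"{label}: ordered {ordered:.6f}%, binomial {binom:.6f}%")
```

The ordered figures match the percentages quoted above. The binomial figures come out smaller in all three scenarios, though the ranking of the arrays stays the same.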

~~~
haasn
I think a better way to look at it is not the chance of total data loss, but
the expected running cost over time. Even with just RAID6 it's hard to imagine
a scenario in which you would be unable to replace one of the failed drives
and resilver before getting two further failures, so the naive reliability
calculations are extremely biased.

So if we assume that we can safely operate at some fixed level of redundancy
(say 3 copies) across all hard drive vendors, the only question is (how many
drives you need to replace per year) × (the price of those drives).
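
A rough sketch of that running-cost view, using the prices and failure rates quoted earlier in the thread. It assumes a constant failure rate, replacement at the original purchase price, and no labor or rebuild cost. The 5-year horizon and the function names are hypothetical.

```python
def expected_annual_replacement_cost(n_drives, afr, price):
    """(Expected drives replaced per year) x (price of those drives)."""
    return n_drives * afr * price


def total_cost(n_drives, afr, price, years):
    """Up-front purchase plus expected replacements over the given horizon."""
    purchase = n_drives * price
    return purchase + years * expected_annual_replacement_cost(n_drives, afr, price)


YEARS = 5  # hypothetical horizon, not from the thread

# (vendor, annual failure rate, price per 4TB drive) as quoted in the thread
for vendor, afr, price in [("HGST", 0.0011, 158), ("Seagate", 0.0266, 114)]:
    yearly = expected_annual_replacement_cost(6, afr, price)
    overall = total_cost(6, afr, price, YEARS)
    print(f"{vendor}: ${yearly:.2f}/year in replacements, ${overall:.2f} over {YEARS} years")
```

Under these assumptions the expected replacement spend per year is small next to the purchase price for either vendor, so the up-front price difference dominates.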

~~~
aidenn0
RAID6 is unusable for many types of "always on" data, in particular databases
backing high volume sites; the performance drops so much during a rebuild that
for many uses it might as well be offline.

~~~
throwaway7767
Then you use mirrors, you can apply a similar calculation there. Lower price
per disk means you can put more disks in each mirror and compensate for lower
per-disk reliability.

Obviously the type of redundancy used will depend on your specific application
requirements, including performance.

------
JoshGlazebrook
Looks like Hitachi (HGST) is still leading in terms of reliability. I actually
have a 400GB Hitachi drive that came in my Dell Dimension 9100 a little over
eleven years ago. It has a few bad sectors but it's still chugging along just
fine.

Looks like it's this one:
[http://www.newegg.com/Product/Product.aspx?Item=N82E16822145...](http://www.newegg.com/Product/Product.aspx?Item=N82E16822145067)

------
cm3
I used to shy away from Seagate drives, and even though the stats here are for
3.5", it doesn't look like WD drives are more reliable than the Seagate. I was
planning to get a 4TB 2.5" portable USB drive, and right now it's just Seagate
that offers one without scary features. WD has one but not the Elements, just
the MyPassport which includes the risky AES feature, and I don't want to
accidentally lock or encrypt the disk because of that and then need a Windows
tool.

Would anyone advise against getting a
[http://www.seagate.com/consumer/backup/expansion-portable/?s...](http://www.seagate.com/consumer/backup/expansion-portable/?sku=STEA4000400) drive?

------
slededit
I do wish they'd spend more time on their client. I choose them specifically
because of how they publish results like these, but when your install can get
corrupted like this [1], it doesn't inspire confidence.

[1] [http://imgur.com/a/3XmEL](http://imgur.com/a/3XmEL)

~~~
rickyc091
Yeah, their software was really buggy so I ended up changing to CrashPlan.
I've been using them for years without any regrets.

~~~
swinglock
The CrashPlan client is horrific but at least it runs on Linux unlike
Backblaze. I was hoping Backblaze was better, too bad.

~~~
click170
This really kills me about Backblaze. I'd love to support them, I'd prefer no
other backup vendor. That's why this is so hurtful:

[https://help.backblaze.com/hc/en-us/articles/217664628-Is-Ba...](https://help.backblaze.com/hc/en-us/articles/217664628-Is-Backblaze-going-to-offer-Linux-support-)

Spoiler alert, the answer is a surprisingly firm no. Not even an optimistic
"we're working on it!".

Doesn't even have to be "Linux support", just open an API, we'll do the rest!
Use quotas for API users if the concern is around abuse, but please help us
Linux folks use your service! _Please!_

~~~
brianwski
Brian from Backblaze here.

We support Linux now with our command line tool found here:
[https://www.backblaze.com/b2/docs/quick_command_line.html](https://www.backblaze.com/b2/docs/quick_command_line.html)
There is no GUI, but the command line tool can "sync" your files with an
arbitrary rollback time (the Mac/Windows clients are hard-coded to 30 days).

For good or bad, syncing to B2 charges "per GByte". If you have less than 1
TByte this will be cheaper than the $5/month for the Mac or Windows client.
But if you have more than 1 TByte on Linux, it will cost you about
$5/TByte/Month to back it up.
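
The break-even described above can be sketched as follows. The rates are the approximate figures given in this comment, not an official price list, and the function names are made up for illustration.

```python
B2_PRICE_PER_TB_MONTH = 5.00  # "about $5/TByte/Month", per the comment
FLAT_CLIENT_PRICE = 5.00      # flat $5/month Mac or Windows client, per the comment


def b2_monthly_cost(terabytes):
    """Approximate monthly B2 cost for the given amount of stored data."""
    return terabytes * B2_PRICE_PER_TB_MONTH


def cheaper_option(terabytes):
    """Compare B2 against the flat-rate client for a given data size."""
    cost = b2_monthly_cost(terabytes)
    if cost < FLAT_CLIENT_PRICE:
        return "B2"
    if cost > FLAT_CLIENT_PRICE:
        return "flat-rate client"
    return "equal"


for size_tb in (0.25, 1, 4):
    print(f"{size_tb} TB: ${b2_monthly_cost(size_tb):.2f}/month on B2, cheaper: {cheaper_option(size_tb)}")
```

On Linux the flat-rate client is not an option, so above 1 TByte the comparison only shows how much extra the per-GByte pricing costs.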

------
advisedwang
It looks to me like this analysis is ignoring the age of drives. Specifically it
might be comparing the current failure rate of years old drives against the
failure rate of months old drives. This is likely to make newer disks look
much better.

I'd love to see some attempts at calculating mean time to failure (which is
admittedly difficult to estimate until you have had a disk long enough to see
a whole generation of them fail).

~~~
jxcl
In case anybody missed it, Backblaze publishes their raw data, so it should be
possible to do this analysis:

[https://www.backblaze.com/b2/hard-drive-test-data.html](https://www.backblaze.com/b2/hard-drive-test-data.html)
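
One way such an age-aware analysis might start is sketched below. It assumes the raw data has one row per drive per day, a `failure` column that is 1 on the day a drive fails, and a `smart_9_raw` column holding power-on hours. Those column names are assumptions about the published files and should be checked against the actual schema. The failure rate here is failures per drive-year, grouped by whole years of drive age.

```python
from collections import defaultdict

HOURS_PER_YEAR = 24 * 365


def afr_by_age(rows):
    """Annualized failure rate per age bucket (whole years of power-on time).

    rows: iterable of dicts, one per drive per day, with the assumed
    'smart_9_raw' (power-on hours) and 'failure' (0 or 1) fields.
    """
    drive_days = defaultdict(int)
    failures = defaultdict(int)
    for row in rows:
        hours = row.get("smart_9_raw")
        if hours in (None, ""):
            continue  # skip rows without a power-on-hours reading
        age_years = int(float(hours) // HOURS_PER_YEAR)
        drive_days[age_years] += 1
        failures[age_years] += int(row["failure"])
    # failures divided by drive-years observed in each bucket
    return {
        age: failures[age] / (drive_days[age] / 365)
        for age in sorted(drive_days)
    }


# Tiny synthetic example: one failure in 365 drive-days of young drives,
# and no failures in 730 drive-days of one-year-old drives.
sample = (
    [{"smart_9_raw": "100", "failure": "0"}] * 364
    + [{"smart_9_raw": "100", "failure": "1"}]
    + [{"smart_9_raw": "10000", "failure": "0"}] * 730
)
print(afr_by_age(sample))
```

With real data the rows could come from `csv.DictReader` over the downloaded daily files, and grouping by model as well as age would let old and new drives be compared on equal footing.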

------
tkinom
I'd love to see similar stats for SSDs.

Maybe DigitalOcean can publish their SSD failure rates if any.

~~~
retrack
This study from Google is interesting already:

[http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f...](http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf)

~~~
honkhonkpants
The problem with the Google information is their rig is full-custom
everything: silicon, firmware, and interface. It can't possibly influence your
buying decisions.

------
donatj
I love and look forward to these. I bought the drives for my NAS based on the
previous one.

------
asher_
I love these writeups from Backblaze.

Does anyone know why some of the confidence intervals around the AFR are
asymmetric?

------
DKnoll
Can somebody tell Seagate what SSHD means.

~~~
kirian
This is the acronym Seagate uses for solid state hybrid drives -
[http://www.seagate.com/gb/en/solutions/solid-state-hybrid/](http://www.seagate.com/gb/en/solutions/solid-state-hybrid/)

I don't think Backblaze uses any of these so likely they've just pulled a
stock photo of a hard drive from somewhere.

~~~
DKnoll
I know what they mean, but the initialism has another well known use so it's
annoying.

Not to mention that a device being solid state by definition precludes it from
being a hard disk. Logically it just makes no sense to me.

 _shakes fist_ get off my lawn!

~~~
jimminy
To clarify what dknoll is talking about:

'sshd' is the reference command to the 'ssh daemon'.

To dknoll, SSHD for the Solid State Hybrid Disk, means that it has components
of both drive systems, spinning platters and a functional amount of NAND for
frequently used applications.

You could use it purely for its SSD assuming the drive controller continues
to function. Though that is a ridiculous proposal.

~~~
DKnoll
I understand what the device does and why they named it as they did. It's an
awkward use of language, that's all I'm pointing out. I think 'Solid State
Hybrid Disk' is a syntactically incorrect name as it suggests the entire device
is solid state when it is not. Hybrid State Disk perhaps?

~~~
jimminy
I read it as "Solid State Hybrid" -Disk.

Which, given that conventional drives at the time it was announced were
primarily HDDs, makes it clearer.

You have an HDD that is a hybrid with solid state components. Logically clean
cut, imo.

~~~
DKnoll
I'm being super pedantic here, but see how you said hybrid before solid state
in your third sentence when explaining it? That's basically what I just
suggested would be a clearer name.

~~~
jimminy
Likewise, I'm being pedantic.

Hybrid disk would be the generic term.

There has been work on liquid-state storage.

So Solid State becomes the adjective modifier for providing the classification
of the type of Hybrid Disk.

You could in theory have a Liquid State Hybrid Disk, or a Liquid Solid State
Hybrid Drive.

