
A Study of SSD Reliability in Large Scale Enterprise Storage Deployments - zdw
https://www.usenix.org/conference/fast20/presentation/maneas
======
PhantomGremlin
I didn't read the paper, just the slides. The study, which looked at
enterprise drives, found them to be quite reliable:

 _The average ARR across the entire population is 0.22%, but rates vary
widely (0.07 - 1.2%)!_

ARR: annual replacement rate.
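
To make that number concrete: ARR is commonly computed as the number of replaced drives divided by the drive-years in service. Here is a minimal sketch, assuming that definition. The function name and the fleet numbers are made up for illustration and are not taken from the slides.

```python
def annual_replacement_rate(replacements: int, drive_days: float) -> float:
    """Return ARR as a percentage: replacements per drive-year in service."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return 100.0 * replacements / drive_years


# Hypothetical fleet: 10,000 drives, each in service for a full year, 22 replaced.
arr = annual_replacement_rate(replacements=22, drive_days=10_000 * 365)
print(f"ARR = {arr:.2f}%")  # prints "ARR = 0.22%"
```

Counting drive-days rather than drives keeps the rate comparable when drives enter or leave service partway through the year.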

My anecdotal experience with SSDs is that they are quite prone to
catastrophic failures. But that is for consumer-grade drives, not enterprise
ones.

~~~
emmelaich
Predictable catastrophic failures, or some other kind?

There are stats you can get from the SSD which will indicate the end of life.
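
As a rough illustration of reading such stats, here is a sketch that pulls wear indicators out of the JSON that `smartctl -j -a <device>` prints for an NVMe drive. The key names are what I understand smartmontools to emit for NVMe and should be treated as an assumption. SATA drives expose vendor-specific attributes instead and would need different handling. The function name and the sample values are hypothetical.

```python
import json


def nvme_wear_summary(smartctl_json: str) -> dict:
    """Extract wear indicators from smartctl JSON output for an NVMe drive."""
    data = json.loads(smartctl_json)
    # Assumed key name for the NVMe health log in smartctl's JSON output.
    log = data.get("nvme_smart_health_information_log", {})
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    return {
        # Vendor's estimate of consumed endurance, in percent (can exceed 100).
        "percentage_used": log.get("percentage_used"),
        "available_spare": spare,
        # True only when both values are present and spare has dropped too low.
        "spare_below_threshold": (
            spare is not None and threshold is not None and spare < threshold
        ),
        # Non-zero means the drive itself is flagging a problem.
        "critical_warning": log.get("critical_warning"),
    }


# Made-up sample in place of real smartctl output.
sample = json.dumps({
    "nvme_smart_health_information_log": {
        "critical_warning": 0,
        "available_spare": 100,
        "available_spare_threshold": 10,
        "percentage_used": 3,
    }
})
print(nvme_wear_summary(sample))
```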

~~~
jplayer01
I worked as a technician for a while recently.

> There are stats you can get from the SSD which will indicate the end of
> life.

I've seen too many consumer SSDs fail completely from one day to the next,
without warning and with no chance of data recovery. When an HDD _starts_ to
fail, there are at least signs you can act on (files not opening or saving,
Windows behaving weirdly), and you usually have plenty of time before it dies
to start data recovery and get a decent amount (or even most) of the data off
it. Even once it has failed, data recovery is still possible. SSDs are just
dead black boxes. That's why, whenever I upgraded a customer to an SSD, I
always put a big emphasis on making backups to an external drive, a cloud
service or, well, anything.

Out of all the failed SSDs I've seen, only a single one managed to put itself
into read-only (RO) mode so the data could be recovered. They also often
report the weirdest stats: some will say everything is okay even though you're
getting garbage or nothing out of them and can't write to them.

When they work, they're great, but the failure states are just obscene.

~~~
baruch
I've worked with plenty of HDDs and SSDs (as a software engineer for
enterprise storage products, working on the hw-sw integration). Both HDDs and
SSDs tend to fail quite rapidly. Sometimes you get several days or weeks of
low-frequency errors or high latencies, but more often than not you get an
almost instantaneous failure.

Most of the HDDs I worked with were 7200RPM enterprise drives (SATA & SAS) and
the SSDs were enterprise quality too (SAS & NVMe).

------
Mave83
We at croit.io use and recommend SSDs on a daily basis in our Ceph-based
software-defined storage clusters. So far, SSDs have been much better than
HDDs thanks to good indicators like wear level. Even if a drive fails, any
enterprise-class storage system should be able to heal itself.

I personally don't know anyone who would go back to HDDs after having SSDs in
operation. They are far superior and much more productive.

------
mkj
Anyone got a guess which manufacturer is which?

