
Endurance experiment kills six consumer SSDs over 18 months - geoffgasior
http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead
======
ars
I'm impressed with how long they lasted.

I am NOT impressed with the behavior at end of life. In fact it makes me wary
of using any SSD.

A bad SSD should go read only, and stay that way. Not self-brick, and make the
data unrecoverable.

~~~
acqq
It sounds easy but it's not: imagine you're the software controlling the SSD.
Before you do a write you don't know if the write will succeed. Once you've
written, exactly the information that gives the pointers to the valid data
could be the part that is destroyed. Then the "raw" data you can access can be
"out of order" but that would still be better than nothing. I can imagine that
there would have to be some special "recovery mode" which would allow the
user to rescue the data blocks even with the uncertain order, in case they are
willing to piece some of them together. But almost certainly there aren't many
people willing to pay for that.

"Read only" after some fixed number of writes would be safer. But then the
complaints would be "why can't they just allow me to write as much as I can, I
have the backup somewhere anyway." Which is also a valid wish. So it would be
best if the user could select the mode.

~~~
TheLoneWolfling
And what about the drive that went into read-only mode, and then on reboot
bricked itself? By design?

I can easily see someone having their computer lock up / etc, restarting, and
losing their data. As restarting is often one of the first debugging steps for
so many things.

~~~
acqq
You know, the drives have their own CPU and RAM, and execute their own
software. The software keeps a copy of some table needed to do proper
_reads_ in RAM and tries to write updates to it to the flash in response to a
normal data write. The flash fails and the software gets notified about that.
The software can serve reads from the info in RAM as long as the info remains
in RAM, so it enters read-only mode. After the reboot there are only the bad
bits on the flash; the RAM content is fully wiped away by the reboot.
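
To make the failure sequence concrete, here is a toy sketch of it. Everything
in it (the `ToyDrive` class, its fields, the way a worn-out flash is simulated)
is made up for illustration and is not how any real firmware is written; it
only models "map lives in RAM, a failed map update leaves nothing valid on
flash".

```python
class ToyDrive:
    """Toy model only: real SSD firmware is far more involved."""

    def __init__(self):
        self.flash_data = {}         # physical page -> data, survives reboot
        self.flash_map = {}          # map copy persisted on flash (None = corrupt)
        self.ram_map = {}            # logical block -> physical page, lost on reboot
        self.flash_worn_out = False  # flip to True to simulate end of life
        self.read_only = False
        self.bricked = False

    def write(self, logical, data):
        if self.bricked or self.read_only:
            raise IOError("writes are no longer accepted")
        physical = len(self.flash_data)
        self.flash_data[physical] = data
        self.ram_map[logical] = physical
        if self.flash_worn_out:
            # Assumption: the failed map update leaves garbage on flash.
            self.flash_map = None
            self.read_only = True
        else:
            self.flash_map = dict(self.ram_map)

    def read(self, logical):
        if self.bricked:
            raise IOError("drive is bricked")
        return self.flash_data[self.ram_map[logical]]

    def reboot(self):
        self.ram_map = {}            # RAM does not survive a power cycle
        if self.flash_map is None:
            self.bricked = True      # nothing valid to reload the map from
        else:
            self.ram_map = dict(self.flash_map)
        return not self.bricked


drive = ToyDrive()
drive.write(0, b"photos")
drive.flash_worn_out = True
drive.write(1, b"taxes")             # map update fails, drive goes read-only
still_readable = drive.read(0) == b"photos" and drive.read(1) == b"taxes"
survived_reboot = drive.reboot()
```

In this model the drive keeps answering reads until `reboot()` is called, and
after that nothing is reachable, which is the read-only-then-brick behavior
being discussed.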

~~~
TheLoneWolfling
So have a small area of flash that's only used to dump the RAM to at EOL. I,
for one, would much rather have a little bit less space available, with a
better chance of not bricking at EOL, than the current situation.

~~~
acqq
Looks like a good idea to me; the question is how small the area would be. I
think I've read somewhere that writing that data takes some 40 seconds on some
Samsung drive. Given write speeds around 400 MB/s, that is approx 16 GB
reserved just so the user can still read a failed drive, one which notified
the user long before that its declared write count was already spent.
Somebody who actually has better (industry insider) info is welcome to correct
me.
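
The 16 GB estimate is just time multiplied by throughput. A quick check, using
the two half-remembered figures from this comment (neither is a vendor spec,
and the variable names are only for illustration):

```python
# Both inputs are recollections from the comment, not datasheet values.
dump_seconds = 40
write_speed_mb_per_s = 400

reserved_mb = dump_seconds * write_speed_mb_per_s
reserved_gb = reserved_mb / 1000  # decimal units, as drive vendors count them

print(f"reserved area: about {reserved_gb:.0f} GB")
```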

------
passive
That title seriously misrepresents the findings. These drives lasted orders of
magnitude longer than they were rated for. For 99% of people on the internet,
you could write all your daily traffic to any one of these drives for the rest
of your life (assuming it stayed constant).

------
wmf
This headline is misleading. This experiment put something like 18 _years_
worth of wear or more on those SSDs.

~~~
shenanigoat
Headline is accurate. A cursory glance at the article gives you a clear
picture of what the drives were put through. I'm happy to know that SSDs can
handle as much r/w as they can.

~~~
Goronmon
I would argue while the headline is _technically_ accurate, it's easily
misunderstood.

------
S_A_P
Really impressive showing from all drives involved. I would have liked to see
at least one of the drives fail to read only, but it's nice to know that at the
rate I use SSD sectors I should be ok for a pretty good while. While there is
no great way to test this other than wait and see, I wonder if age AND total
writes together make an appreciable difference vs. marathon testing like this.

------
ChuckMcM
Interesting report. What I got from this is that in production you should swap
out an SSD as soon as it starts reallocating sectors.

------
gtwy
Disappointed that the Crucial MX100 / MX200 weren't included in this. We've
been using Crucial exclusively for just over 5 years with customer laptops. We
swap the hard drives out for Crucial SSDs when setting them up for the first
time. In 5 years, with average use, we've only ever had a single failure...
out of literally hundreds of laptops we set up per year.

------
rjbwork
I bought a 1TB Samsung 840 Pro for my new rig in December. Glad to see my
research paid off in making the best choice!

~~~
acqq
Apparently that 840 Pro from the article endured more than 2PB. However the
new 1TB 850 Pro is declared to endure just 300 TB written:

[http://www.samsung.com/global/business/semiconductor/minisit...](http://www.samsung.com/global/business/semiconductor/minisite/SSD/downloads/document/Samsung_SSD_850_PRO_Data_Sheet_rev_2_0.pdf)

~~~
MertsA
The 840 Pro that lasted more than 2 PBW was rated for 73TBW.

>With twice the endurance of the previous model, the 850 PRO will keep
working as long as you do. Samsung's V-NAND technology is built to handle 300
Terabytes Written (TBW)* which equates to a 40 GB daily read/write workload
over a 10-year period. Plus, it comes with the industry's top-level ten-year
limited warranty.

>* 840 PRO: 73 TBW < 850 PRO: 150 TBW

>** 850 PRO 120/250 GB: 150 TBW, 500 GB/1 TB (1,024 GB): 300 TBW

~~~
acqq
The drive has 400 MB/s on the interface. That means that it's possible to
deliver 34 TB per day to the disk and the rated 300 TB are reached in less
than 9 days. But the scariest effect is "write multiplication" (usually
called write amplification): delivering the data in small chunks through the
400 MB/s interface can result in much more data actually being rewritten on
the flash (as the atomic size of the writes on the flash is quite big).
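
The arithmetic behind those numbers, as a rough sketch. The 400 MB/s and
300 TB figures are the ones quoted in this comment; the function
`days_with_amplification` and its factor are hypothetical, and treating the
300 TB as a budget of flash-level writes is an assumption made only to show
how the write multiplication effect (commonly called write amplification)
would shorten the time.

```python
SECONDS_PER_DAY = 24 * 60 * 60

interface_mb_per_s = 400   # figure from the comment
rated_tbw = 300            # rated terabytes written, from the datasheet quote

# Sustained writes at full interface speed, in decimal terabytes per day.
tb_per_day = interface_mb_per_s * SECONDS_PER_DAY / 1_000_000
days_to_rated = rated_tbw / tb_per_day


def days_with_amplification(write_amplification):
    """Days until the budget is spent if each host byte costs
    `write_amplification` bytes of flash writes (hypothetical factor)."""
    return days_to_rated / write_amplification


print(f"{tb_per_day:.2f} TB/day, rated figure reached in {days_to_rated:.1f} days")
print(f"with a factor of 4: {days_with_amplification(4):.1f} days")
```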

Of course an average consumer can't produce that. A lot of the data I write to
any medium at home is never overwritten. But not every use is the use of an
average consumer; knowing the actual limits is important.

~~~
sfilipov
That's why enterprise SSDs are rated in full drive writes per day rather
than GB per day. If you are going to write at 400 MB/s for 24 hours a day then
you need an enterprise SSD and not a consumer one.

~~~
acqq
So Samsung 850 PRO is not an enterprise SSD?

~~~
wmf
No, it's not:
[http://www.samsung.com/global/business/semiconductor/minisit...](http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/index.html)

(Also, drive writes per day and GB per day are the same metric.)
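
To spell out why the two are interchangeable: once you know the capacity, one
converts into the other by a single multiplication or division. A small
sketch (the function names are mine, and the 1000 GB capacity and 10-year
period are the round numbers from this thread, not a spec lookup):

```python
def gb_per_day_to_dwpd(gb_per_day, capacity_gb):
    """Drive writes per day implied by a GB-per-day workload."""
    return gb_per_day / capacity_gb


def dwpd_to_gb_per_day(dwpd, capacity_gb):
    """GB per day implied by a drive-writes-per-day rating."""
    return dwpd * capacity_gb


def tbw_to_dwpd(tbw, capacity_gb, years):
    """Spread a total-terabytes-written rating evenly over `years`."""
    days = years * 365
    return tbw * 1000 / (capacity_gb * days)


print(gb_per_day_to_dwpd(40, 1000))          # 40 GB/day on a 1 TB drive
print(round(tbw_to_dwpd(300, 1000, 10), 3))  # 300 TBW over 10 years on 1 TB
```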

------
mrmondo
Very impressive - I've been following the story since the start with quite a
vested interest - I'm currently building mixed tier SSD only SANs, one tier
consisting of high end PCIe NVMe SSDs, the other consisting of Sandisk Extreme
Pro III 'consumer - available' drives.

------
CaseFlatline
I vote for Weird Al to sing the song at the end

[http://techreport.com/review/27909/the-ssd-endurance-experim...](http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead/4)

------
helper
Inspired by some of their earlier articles, we did a similar test in-house on
an SSD model we were planning on deploying into production. The goal was to
make sure we knew exactly what to look for when our SSDs were on their last
legs. We actually bought a smaller capacity version of the same model so the
test wouldn't take as long.

I feel a lot better about that production hardware now that I have seen
firsthand what SMART reports as the drive is running out of spare sectors for
reallocation.

------
chisleu
I would love to see how this stacks up to enterprise SSDs. It is my
understanding that they will last years under constant full-bus-speed
rewriting.

~~~
trhway
>I would love to see how this stacks up to enterprise SSDs.

The difference is the number of writes an individual flash cell can take.
Consumer flash is multi-level cell (MLC): 1000-3000 writes (thus 240GB disks
start to fail around 600-700TB written), whereas enterprise SSD flash is
single-level cell (SLC): up to 100000 writes.
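
The back-of-envelope version of that: capacity times cycles per cell gives a
ceiling on total writes. This is a sketch under generous assumptions (perfect
wear leveling, no spare area, and a `write_amplification` parameter that I
default to 1.0); the function name is made up for the example.

```python
def ideal_endurance_tb(capacity_gb, pe_cycles, write_amplification=1.0):
    """Upper bound on host terabytes written before cells hit their
    cycle limit, assuming every cell wears evenly."""
    return capacity_gb * pe_cycles / write_amplification / 1000


for label, cycles in (("MLC low", 1000), ("MLC high", 3000), ("SLC", 100_000)):
    print(f"{label}: {ideal_endurance_tb(240, cycles):.0f} TB")
```

For a 240GB drive, 3000 cycles works out to 720 TB, which is in the same
neighborhood as the 600-700TB mentioned above.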

~~~
chisleu
I'm talking eMLC. I don't see many SLCs on the market anymore.

------
PaulHoule
840 pro rocks!

------
digi_owl
Not sure if this would be a problem, if one could easily check the status of a
drive without having to grab specialized tools.

------
ocdtrekkie
The great experiment is finally over.

Very proud to exclusively use 840 Pro SSDs in my towers.

------
twotwotwo
2.4 PB of writes to the 840 Pro is pretty nice.

------
higherpurpose
Disappointing he didn't test the new Samsung 850 SSD. The old 840 is known to
not have such a high endurance so I'm automatically excluding it from my list
of choices.

~~~
AceJohnny2
Did you even read the article? The 840 Pro was the top performer, reliability-
wise.

~~~
sp332
Maybe they got lucky. They only tested one of each of these drives, and there
was already a case (with the Kingston drives) where nearly identical models
got wildly different lifetimes.

 _The low number of reallocated sectors suggests that the NAND deserves much
of the credit. Like all semiconductors, flash memory chips produced by the
same process—and even cut from the same wafer—can have slightly different
characteristics. Just like some CPUs are particularly comfortable at higher
clock speeds and voltages, some NAND is especially resistant to write-induced
wear.

The second HyperX got lucky, in other words._

So it's definitely possible that the 840 Pro has reliability problems even
though one of them won this test.

~~~
sevensor
Thank you for this! Yours is the most astute comment I've seen here. As
someone who's worked in flash memory manufacturing, I know how much variation
is inherent in the process. It gets evened out somewhat by binning, but
there's inevitable variation in dielectric thickness, aspect ratios, corner
profiles, dopant diffusion, grain structure of polysilicon, and on and on,
even discounting the density of particulate defects and watermarks.

Even if you don't know that, basic stats should tell you that testing a single
unit from each manufacturer is going to be meaningless at best, misleading at
worst.

~~~
vacri
_testing a single unit from each manufacturer_

Is less expensive than testing a dozen drives of each model, and better than
testing no drives at all. It's not meaningless at all; it just has to be read
with the right caveats.

