
An SSD Endurance Experiment: They're All Dead (2015) - philngo
http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead
======
martey
> " _Intel doesn't have confidence in the drive at that point, so the 335
> Series is designed to shift into read-only mode and then to brick itself
> when the power is cycled._"

I don't understand why Intel wouldn't just configure these drives to go into
read-only mode permanently. If I realized my hard drive had become read-only
and didn't suspect hard drive failure, my first inclination would be to reboot
my computer, not immediately back up all data.

~~~
cuchulain
The article is wrong on this point, and on Intel's intentions, as far as I can
tell. Intel has a "Supernova" feature
(http://itpeernetwork.intel.com/data-integrity-in-solid-state-drives-what-supernovas-mean-to-you/)
which will cause some drive
models to brick themselves if certain conditions are met - errors in the
control path, for example, which basically mean you cannot trust the drive at
all. The supernova feature is only claimed for enterprise drives, and the 335
series is not an enterprise drive.

I have a lot of experience with long-running Intel SSDs of various models,
including pushing them to the same kinds of extremes that the SSD endurance
experiment did, and I have never observed them to self-brick simply because
they reached their flash endurance point.

What I have observed is a number of firmware bugs (or possibly just the
supernova feature) that caused the drive to brick on power cycle, even for
drives in perfect health.

I liked the SSD endurance articles because they went a long way toward
allaying fears about SSDs, but I think it's a shame they've left this point in.

~~~
rsmoz
> I have a lot of experience with long-running Intel SSDs of various models

Hey, could I get your help selecting an Intel SSD model? I'm overwhelmed by
the number of SKUs.

~~~
cuchulain
Sure. Can you see the email in my profile?

------
cjensen
Continuous Integration systems can really burn through SSD endurance. If you
have a large, compiled code base which rebuilds on every checkin, you will be
creating and deleting object code constantly. Use smartmontools or HDD
Guardian to keep an eye on endurance.

Our code base creates around half a gig of compilation product on every build.
We used up the endurance on a consumer-level Micron SSD in about a year. No
data loss occurred.
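As a rough back-of-envelope, you can estimate how long a drive's rated endurance lasts under a given build load. The 72 TBW rating and 200 builds/day below are made-up illustrative figures, not numbers from this thread:

```python
# Rough sketch: days until a CI machine exhausts a drive's rated
# endurance (TBW = terabytes written). All inputs are hypothetical.
def days_until_worn_out(tbw_rating_tb, gb_per_build, builds_per_day):
    total_writes_gb = tbw_rating_tb * 1000
    gb_per_day = gb_per_build * builds_per_day
    return total_writes_gb / gb_per_day

# 0.5 GB per build (as above), 200 builds/day, 72 TBW consumer drive
print(days_until_worn_out(72, 0.5, 200))  # 720.0 days, about two years
```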

~~~
peller
If it's not too big, could you just use tmpfs for compilation, and copy the
final results to persistent storage if needed?

Just food for thought; your point remains valid.

~~~
toomuchtodo
DevOps here: this is the solution. It's also stupid fast if everything is done
in RAM (just make sure you're not swapping out to disk).

~~~
cgag
People keep saying this, but compiling seems super CPU-bound, so I'm confused.

~~~
nitrogen
Disk seeks when looking up the next file to compile used to be a huge
bottleneck.

~~~
hvidgaard
At least for C#, it is entirely CPU-bound. Faster CPUs and more cores will
increase the IO, but not enough to matter.

But the point about SSD endurance still stands; a ramdisk solves that problem.

------
tonyplee
Just wondering how this compares to an HDD. (Here's my calculation based on
some assumptions; feel free to correct it if you see any errors.)

2.5PB = 2500TB = 2,500,000 GB

2,500,000 GB / (80MB/s typical HDD speed) = 31,250,000 seconds = 8,680 hours
= 361 days.

It would take an HDD 361 days to write 2.5PB at 80MB/s.

I wonder how many HDDs can survive 361 days of 80MB/s non-stop?
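A quick sanity check of the arithmetic above, using decimal units throughout (1 PB = 10^15 bytes):

```python
# Verify the time-to-write estimate: 2.5 PB at a steady 80 MB/s.
total_bytes = 2.5e15           # 2.5 PB
rate = 80e6                    # 80 MB/s
seconds = total_bytes / rate
print(seconds)                 # 31250000.0 seconds
print(seconds / 3600)          # ~8,680 hours
print(seconds / 86400)         # ~361.7 days
```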

~~~
RandomBK
> I wonder how many HDDs can survive 361 days of 80MB/s non-stop?

I wonder how many _consumer_ HDDs can survive that load. I would be shocked if
datacenter-grade drives fail after only 361 days of continuous load.

~~~
userbinator
80MB/s sequential reads or writes is probably something consumer HDDs can
survive for several years. The platters are always spinning, the only
difference is that now the drive is continuously reading or writing what's
under the head. It's the random accesses (and associated seeks) which stress
them.

There are various comparisons out there which conclude "datacenter-grade" is
largely a marketing/warranty thing; the drives themselves may be nearly
identical in design.

------
WalterBright
I modified some programs of mine that generate a lot of files to read the old
version of the file first, compare it with the new version in the buffer, and
only write out the new file if it is actually different. This cuts way down on
the write cycles to the SSD. It's faster, too!
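A minimal sketch of that compare-before-write pattern (my own illustration, not WalterBright's actual code):

```python
import os

def write_if_changed(path, new_bytes):
    """Write new_bytes to path only if the contents actually differ,
    sparing the SSD an unnecessary erase/program cycle."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            if f.read() == new_bytes:
                return False  # identical contents: skip the write entirely
    with open(path, "wb") as f:
        f.write(new_bytes)
    return True
```

Reads are cheap and don't wear flash, so trading a read for a skipped write is almost always a win on SSDs.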

------
xenadu02
Why do the SSDs all brick themselves when this happens? It seems like a huge
mis-feature; HDDs are almost never recoverable when they fail, but if a flash
drive can no longer reallocate blocks, it could just go into read-only mode.

~~~
wtallis
In principle, going into read-only mode should work and it should take a while
for read disturb errors to corrupt the data. But there's a trade-off that if
you're trying to keep servicing writes as long as possible (and retiring bad
blocks as they wear out), the risk rises that an earlier-than-expected
unrecoverable error will corrupt the critical data structures that keep track
of the mapping between logical and physical addresses. Playing it safe means
quitting early and thus giving your drive an endurance rating that suggests it
is _less_ reliable than the competition.

And it's no surprise that the aspects of SSD firmware that by nature get the
least real-world testing and are the most tricky to design would be quite
buggy in practice. Even ZFS doesn't try to avoid catastrophic data loss in the
face of unreliable RAM.

------
gerdesj
It's a fun article, but it would have been outrageously good if, say, 30
randomly sourced examples of each SSD had been tested. OK, a bit expensive.

"Over the past 18 months, we've watched modern SSDs easily write far more data
than most consumers will ever need."

They tested six SSDs and got "...far more data than most consumers..." -
that's the takeaway.

~~~
digikata
Would you be willing to pay for a subscription to a quarterly report for that
info?

~~~
reitanqild
For me?

Kind of yes.

Only I have subscription overload. Every newspaper and their dog wants to sell
me _subscriptions_, but I generally don't read newspapers daily.

I'd love to have access to this data through some Spotify-for-text service or
Blendle or something, though.

I guess I'm not alone in wanting to pay researchers, bloggers, journalists,
etc. based on what I read, not based on a monthly subscription to every
company I ever want to read something from.

~~~
digikata
I keep my recurring subscriptions to a minimum too, so I understand. But,
funding the procurement of statistically significant numbers of multiple
models of SSD drives, running them through to end-of-life characterization,
and keeping that all updated as new models come out is a higher spending
profile than your typical blogger. It seems more like a business research
report or recurring lab test type of service.

------
helper
I loved this series. It inspired us to do similar experiments with SSDs as we
were spec'ing out new servers. I highly recommend doing this so you get a feel
for what SMART looks like for your specific SSDs. It's nice to be able to
monitor that and have some idea of when your SSDs are going to die, especially
if most of your drives are aging together.

------
userbinator
If you divide the data written at the point where reallocated sectors _start_
appearing by the drive's capacity, you can figure out the actual average
endurance of the flash in P/E cycles. That results in:

     400 Samsung 840 Series
    2344 Samsung 840 Pro
    2400 Kingston HyperX 3K
    2800 Intel 335 Series
    4400 Corsair Neutron GTX
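The figures are just data written divided by drive capacity. As a sketch (the 600 TB and 250 GB inputs below are illustrative round numbers, not the experiment's actual measurements):

```python
# Estimate average P/E cycles: total data written before reallocated
# sectors began appearing, divided by the drive's capacity.
def pe_cycles(data_written_tb, capacity_gb):
    return data_written_tb * 1000 / capacity_gb

# e.g. a hypothetical 250 GB drive that wrote 600 TB before reallocations
print(round(pe_cycles(600, 250)))  # 2400
```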

------
paullth
It's nice when a tech article ends with a song.

------
aurizon
WTF, this is a repost of a two-year-old dead article.

