
Analysis of SSD Reliability during power outages - yapcguy
http://lkcl.net/reports/ssd_analysis.html
======
JohnTHaller
The author of this study submitted it to Slashdot with the following summary:
"After the reports on SSD reliability and after experiencing a costly 50%
failure rate on over 200 remote-deployed OCZ Vertex SSDs, a degree of paranoia
set in where I work. I was asked to carry out SSD analysis with some very
specific criteria: budget below £100, size greater than 16Gbytes and Power-
loss protection mandatory. This was almost an impossible task: after months of
searching the shortlist was very short indeed. There was only one drive that
survived the torturing: the Intel S3500. After more than 6,500 power-cycles
over several days of heavy sustained random writes, not a single byte of data
was lost. Crucial M4: failed. Toshiba THNSNH060GCS: failed. Innodisk 3MP SATA
Slim: failed. OCZ: failed hard. Only the end-of-lifed Intel 320 and its newer
replacement, the S3500, survived unscathed. The conclusion: if you care about
data even when power could be unreliable, only buy Intel SSDs."

Source: [http://hardware.slashdot.org/story/13/12/27/208249/power-
los...](http://hardware.slashdot.org/story/13/12/27/208249/power-loss-
protected-ssds-tested-only-intel-s3500-passes)

~~~
colechristensen
Did he do any control studies with spinning disks?

I have my doubts that many of them would survive thousands of power cycles per
day.

~~~
AmVess
There's probably not much of anything that would survive that many power
cycles in a day.

The test is pretty pointless.

~~~
rlpb
What if I have thousands of disks deployed, and I suffer a single power loss?
What are the chances that at least one disk will contain corrupt data after
such an event?

Instead of buying thousands of disks to test this scenario, one reasonable
shortcut might be to repeatedly test a single disk.

Even a spinning disk can handle a few thousand power cycles in its lifetime,
surely?
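
The fleet-scale risk in the question above can be estimated with the complement rule. A quick sketch, where the per-disk corruption probability is an assumed illustrative figure, not a measured one:

```python
# Probability that at least one of N disks corrupts data during a single
# power-loss event, for an assumed per-disk corruption probability p.
def fleet_corruption_risk(n_disks: int, p_per_disk: float) -> float:
    return 1.0 - (1.0 - p_per_disk) ** n_disks

# Even a "rare" 0.1% per-disk chance becomes near-certain at fleet scale:
print(f"{fleet_corruption_risk(1000, 0.001):.1%}")  # ~63.2%
print(f"{fleet_corruption_risk(5000, 0.001):.1%}")  # ~99.3%
```

This is also why repeatedly cycling a single disk is a sensible proxy: the per-event probability is what matters, and repetition estimates it.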

~~~
AmVess
On power up, there's a pretty big spike in power usage as the controller gets
its shit together. Normally, this is not an issue, since the SSD housing
simply absorbs the heat and then goes about its business. The same holds true
for spindle drives.

What this guy did was test the drives in a manner they weren't designed for.
Sure, I can drop Corvettes off a 20 story building, then bitch about the
results, but that wouldn't change the fact that my test was flawed from the
outset.

All he did was subject something to an environment it wasn't designed for.

Sure, I can drop a Corvette from a 20 story building, but there's nothing to
be gained when the crumple zone is packed into the tail lights.

~~~
zAy0LfpBZLC8mAC
Huh? Where would that spike in power usage come from?

~~~
rational_indian
It comes from the decoupling/filter capacitors used in the DC power circuit.
When the power is turned on after a sufficiently long time these capacitors
are all uncharged and appear as "shorts" to the power source thus drawing
large amounts of current. This initial surge current drops off as the
capacitors get charged.
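
A back-of-envelope sketch of the energy involved, using assumed ballpark values for a 2.5" SATA drive (not from any datasheet), shows the inrush spike charges storage rather than producing meaningful sustained heat:

```python
# Energy needed to charge a drive's decoupling capacitors at power-up.
# C and V are assumed ballpark figures, not datasheet values.
C = 500e-6  # total decoupling capacitance, farads (assumed)
V = 5.0     # SATA 5 V supply rail

energy_joules = 0.5 * C * V**2
print(f"Stored energy per power-up: {energy_joules * 1000:.2f} mJ")  # 6.25 mJ
# A few millijoules dissipated once per power cycle is negligible as heat,
# even at the roughly 0.1 Hz cycling rate used in the article's test.
```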

~~~
zAy0LfpBZLC8mAC
Filter capacitors producing excessive heat due to inrush current? That would
be bad filter capacitors indeed ;-)

The claim was that an SSD somehow converted more electric energy into heat
immediately after power-up which would damage the SSD, so real consumption,
not just a current peak that goes into storage for later consumption. Normal-
ESR electrolytics might have a heat problem when used at a few kHz in
switching applications, but certainly not at 0.1 Hz.

~~~
CamperBob2
It's long been standard practice with tantalum filter capacitors to feed them
through an inductor or at least a resistor to prevent inrush current failures.
That, and/or you derate the crap out of them when you design the board. Newer
drives are probably using multilayer ceramics that can put up with just about
any abuse including inrush.

Executive summary: powerup stress is not an issue unless the drive was
designed by a moron.

------
strlen
What I find rather annoying is that manufacturers/marketers rarely state
whether or not an SSD has a supercapacitor. Here are my findings:

\- OCZ Vertex 4 SSD does (but this is not advertised)

\- OCZ Vertex 3 does not, Vertex 3 Pro does

\- Ditto for Vertex 2 / Vertex 2 Pro

\- OCZ Deneva(2) R does, other OCZ Deneva/Deneva 2 do not -- and in the case
of the Deneva 2 R this is advertised

\- Intel 320 does (but is only sata II), but this is not advertised at all

\- Intel 520/530 does not

\- Intel 330/335: unclear

\- Intel S3700/S3500: does and this is advertised

Background: I was looking for an SSD to use for a ZFS intent log ("ZIL" \--
ZFS's write ahead log) -- my requirements were a sandforce controller (or
equivalent) and toggle NAND (so that I could use the same disk for both ZFS
cache and the ZIL), and a supercap. This was surprisingly hard to find.

What I'd like to see is:

1) Data on which SSDs have supercap

2) Data on which SSDs actually honour cache flush requests (then a UPS +
forcing a cache flush upon power failure + redundant power supplies/multiple
replicas in a distributed system would suffice).

3) Best yet: have an API to check whether or not an SSD has a supercap,
whether or not the supercap still holds charge, and what its policy is for
honouring cache flush requests. Let the OS decide based on a policy I set.
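
A sketch of what that API could look like; to be clear, no such standard interface exists today, and every name below is hypothetical:

```python
# Hypothetical query API for SSD power-loss capabilities. None of these
# types or calls exist; they only illustrate the wished-for interface.
from dataclasses import dataclass
from enum import Enum

class FlushPolicy(Enum):
    HONOR_ALWAYS = "honor_always"    # FLUSH CACHE is always durable
    HONOR_IF_CHARGED = "if_charged"  # durable only while supercap holds charge
    IGNORED = "ignored"              # flush is acked but not durable

@dataclass
class PowerLossCapabilities:
    has_supercap: bool
    supercap_charged: bool
    flush_policy: FlushPolicy

def query_power_loss_caps(device: str) -> PowerLossCapabilities:
    """Hypothetical: the OS would fill this in from a standardized
    identify/SMART field instead of guessing from marketing pages."""
    raise NotImplementedError("no such standard interface exists")
```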

If you are a consumer, the practical recommendation is not very different from
the practical recommendation I'd give to anyone using spinning disks:
RAID1/1+0/RAIDZ(-2) (with SSDs coming from different batches so that they do
not wear out at the same time), UPS (for power outages), backups (against
yourself and against power supply failures).

For production: obviously use a UPS, put the WAL of your database on an SSD
with a supercap, make sure that your database fsync()'s the WAL at a
reasonable interval (on every transaction is probably unreasonable, but so is
once an hour), _and_ use a distributed system that replicates the WAL. If
using a distributed system with (semi-)synchronous WAL replication is not an
option and losing incremental data is not acceptable, use redundant power
supplies.
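
The "fsync at a reasonable interval" idea amounts to group commit. A minimal sketch, assuming a simple append-only log file (the batch size and record format are illustrative):

```python
# Group commit for a write-ahead log: append records, then let one
# fsync() cover the whole batch since the last flush.
import os

class BatchedWAL:
    def __init__(self, path: str, batch_size: int = 32):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.batch_size = batch_size
        self.pending = 0

    def append(self, record: bytes) -> None:
        os.write(self.fd, record + b"\n")
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        os.fsync(self.fd)  # one disk flush covers the whole pending batch
        self.pending = 0
```

The trade-off is exactly the one above: records appended since the last flush are lost on power failure, so the batch size bounds the loss window.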

~~~
AnthonyMouse
> If using a distributed system with (semi-)synchronous WAL replication is not
> an option and losing incremental data is not acceptable, use redundant power
> supplies.

Also be sure to plug each redundant power supply into a separate UPS, and
avoid equipment that has redundant power supplies on a single feed. UPS
failures are a bear.

~~~
nexox
And also make sure no UPS is ever loaded over ~40%, so when one of the pair
fails the other can handle the full load. And always set them up in isolated
pairs, with no equipment plugged into a pair member and a non-member, to
prevent cascading failures.
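
The ~40% figure follows from simple arithmetic: after one UPS in a pair fails, the survivor carries both loads. With an assumed example rating:

```python
# Why each UPS in a redundant pair should run at ~40%: the survivor
# must absorb the whole load. The rating is an assumed example figure.
ups_capacity_w = 3000  # per-UPS rating (assumed)
load_fraction = 0.40   # steady-state load on each UPS

total_load_w = 2 * ups_capacity_w * load_fraction
survivor_utilization = total_load_w / ups_capacity_w
print(f"Survivor load after one failure: {survivor_utilization:.0%}")  # 80%
# At 50% each, the survivor would sit at exactly 100%, leaving no margin
# for inrush, battery aging, or derating.
```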

------
Mithaldu
Terrible title, terrible science, terrible conclusion.

Spoiler: He only tested these five drives, only Intel survived, so if they are
your candidates, apply his conclusion:

    
    
        Crucial M4
        Toshiba THNSNH060GCS 60gb
        Innodisk 3MP Sata Slim
        OCZ Vertex 32gb
        Intel 320 and S3500
    

Notably missing is Samsung, Intel's closest rival, as well as other models and
probably other makers I'm not aware of.

~~~
jbri
The author mentions that he only evaluated drives that have some form of power
loss protection - doing some quick searching around, I couldn't find any
Samsung drives in the given price range that claimed to have that.

Did I miss an appropriate Samsung drive in my quick searches? Or is there
reason to suspect that a Samsung drive that doesn't claim to have power loss
protection would nonetheless handle this case better than the non-Intel drives
that did make that claim? Because if not, then I don't think not evaluating
Samsung drives compromises the results in any way.

~~~
vegardx
The Samsung SM843T has power loss protection, among other things, and is
priced really competitively to the Intel S3500.

------
rlpb
Why must SSDs have "power loss protection" in the form of a battery or
supercapacitors to finish writes? Can they not simply cache writes but not
falsely claim to the OS that they aren't fully written yet, and do some
internal housekeeping (journal-like) for recovery? Does SATA/SCSI support the
concept of "sync", or would allowing the disk to fall behind without deceiving
the OS kill performance in some way?
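
SATA does support the concept: the FLUSH CACHE command (and FUA writes) is what the OS issues when an application calls fsync(). A minimal sketch of what that looks like from userspace, assuming POSIX:

```python
# What "sync" looks like from the application side: fsync() asks the
# kernel to push data out of the page cache and issue a cache flush to
# the drive. Whether the drive actually makes it durable before acking
# is the question this thread is about.
import os

def durable_write(path: str, data: bytes) -> None:
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # returns only after the kernel has requested a flush
    finally:
        os.close(fd)
```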

~~~
gvb
Inside the SSD are flash memory chips. It takes a relatively long time
(typically 10 ms) to tunnel the charge that changes a '1' bit (the erased
state) to a '0' bit. If the power goes away during that programming time, the
result is indeterminate. Not just bad, but bad in a potentially unknown way.

The worst case, which I've experienced with direct writing to flash chips, is
that a totally unexpected flash location is corrupted. My guess was that the
CPU started the write cycle just as power was lost, the CPU glitched the
address lines as it was losing its brains, and the flash corrupted a random
location.

Very, very bad.

If a power loss causes the flash to scribble on the wrong SSD location (e.g.
the tables that keep track of good and bad blocks), the SSD "dies".

~~~
zAy0LfpBZLC8mAC
That's still no reason why you would need "power loss protection", in the
sense of energy storage. What is needed is proper brown-out detection and
properly set up reset circuitry so that the write gets aborted before lines
start glitching (that is to say: energy needs to be dumped so it can not cause
any damage once the CPU starts losing its brains).

That the storage location where a write was in progress is indeterminate
afterwards shouldn't matter - between the time that some software initiates a
write() and the time an fsync() on the same file returns, there is no
guarantee what the written location will contain after a power failure, and if
your software relies on the value in any way whatsoever, your software is
broken.

~~~
gvb
Yes there is. The amount of hold-up time is specified in the flash
manufacturer's data sheets.

The problem is that the flash has an internal state machine that performs the
charge tunneling as an iterative process: it tunnels some charge, checks the
level on the floating gate, and repeats as necessary. If the power to the
flash chip goes away or glitches during this internal programming process, the
flash write fails in indeterminate and sometimes very unexpected ways.
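
That iterative program-verify loop can be sketched as follows; this is a toy model with illustrative numbers, since the real state machine lives inside the flash die:

```python
# Toy model of the flash program-verify loop: tunnel a pulse of charge,
# re-read the cell, repeat until the target level is reached. All
# numbers are illustrative.
def program_cell(target_level: float, pulse: float = 0.2,
                 max_pulses: int = 20) -> float:
    level = 0.0  # cell starts in the erased state
    for _ in range(max_pulses):
        if level >= target_level:  # verify step
            return level           # programmed successfully
        level += pulse             # tunnel another increment of charge
    raise RuntimeError("program failed")

# Power loss mid-loop leaves `level` somewhere between erased and
# programmed, which is the indeterminate state described above.
```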

~~~
zAy0LfpBZLC8mAC
Well, yeah, of course, you need some limit on the speed at which the power
supply voltage drops, I was talking about the "flushing the cache" kind of
"power loss protection", not claiming that pulling a circuit into the reset
state could happen without latency ;-)

So, yes, you of course have to have some low pass in the power supply rail to
make sure that power drops no faster than you can handle shutting down the
circuit in an orderly fashion - all I am saying is that there is no need to
guarantee that a read of a region where a yet-unacknowledged write was
happening when the power supply failed returns non-random data, so it is
perfectly fine to interrupt the programming process and leave cells where user
data is stored in an indeterminate state. It's not OK to glitch address lines
while programming is still going on, of course :-)

~~~
nexox
The "super capacitors" (almost nobody uses actual super capacitors after early
models discovered that super capacitor lifetime at server temperature was
inadequate) are just a low-pass filter - they usually only keep the drive
online for a couple dozen milliseconds after main power goes down.

Most reasonable SSDs do not write cache at all, but thanks to the wear-
leveling issues, they need to have a sector-mapping table to keep track of
where each sector actually lives. That table takes many, many updates, and
since it's usually stored in some form of a tree, it's expensive to save to
media which does not support directly overwriting data (IE NAND, which
requires a relatively long erase operation to become writable.) This table is
typically what is lost during power events, and it is not usually written out
when you sync a write.

So what happens is you write, sync, get an ack, lose power, reboot, and
magically that sync'd data is either corrupt, or, even worse, it's regained
the value it had before your last write, with no indication that there is a
problem. This can cause some extremely interesting bugs.
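
Tests for this failure mode typically tag every record so that acked-but-reverted data is detectable after recovery. A minimal sketch (the record format is illustrative, not the article's actual tool):

```python
# Detecting silent revert/corruption after a power cut: each record
# carries a sequence number and a truncated SHA-256 over both fields.
# After recovery, every record the host got an ack for must verify.
import hashlib
import struct

def make_record(seq: int, payload: bytes) -> bytes:
    digest = hashlib.sha256(struct.pack("<Q", seq) + payload).digest()[:8]
    return struct.pack("<Q", seq) + digest + payload

def check_record(seq: int, payload: bytes, record: bytes) -> bool:
    # A False here on an acknowledged record is exactly the "sync'd data
    # regained its old value" failure described above.
    return record == make_record(seq, payload)
```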

------
PhantomGremlin
In general, Intel is a "class act".

A few years ago my employer worked with Intel on a joint IC project (not
Flash). My overall impression was that Intel engineers were meticulous and
smart. This internal culture probably applies to many different Intel
divisions. So I'm not surprised that Intel SSDs are reliable.

~~~
zerohp
I had an old Intel G1 SSD die during a power outage that went a little too
long for my cheap UPS. Of course one event in uncontrolled conditions isn't
meaningful. I still bought another Intel SSD because every Intel device I've
owned was top notch.

------
colin_mccabe
It is good to see someone actually testing the power-loss protection claims
made by manufacturers.

However, uninterruptible power supplies are usually a better investment than
power-loss resilient storage media. The problem is, even if your SSD or hard
drive behaves perfectly during a power-loss scenario, your server software may
not. Almost every database, filesystem, etc. includes some amount of buffering
in memory, because sending every write directly to disk is a performance
killer.

Also, the best-case scenario with power-loss resilient media is that your
system shuts down cleanly. With a UPS, you can keep the system up until diesel
generators kick in, a much better endgame for everybody.

I once asked someone who had worked in the hard drive business what a hard
drive would do when power was lost. "Try to park the drive head immediately
before it crashes on to the platter," was the immediate response. Trying to
flush the cache contents wasn't even remotely on his mind. In practice, losing
power while writing to a hard disk does often corrupt sectors-- even sectors
that weren't being written to during the power loss incident.

It's good to see that (some) SSDs are at least trying to flush the cache, but
you really have to ask yourself: can you really trust the manufacturer's
claims? And if you can trust them, can you trust your specific software
configuration under this unusual scenario? I think it's just too long a
frontier to guard with too few sheriffs. Dude, you're getting a UPS.

~~~
miahi
The problem with SSDs, as others have mentioned here, is that writing during
power loss can have more serious effects: it can overwrite firmware bits or
mapping bits. In the first case the whole SSD is dead; in the second, much
more data than what is currently being written is lost.

~~~
colin_mccabe
My impression was that even cheap SSDs should have ultracapacitors or small
batteries that allow them to survive a power loss event without being bricked.
Of course, the stuff in the cache is lost at that point, but that's no worse
than the situation with a hard drive. Also, as I mentioned, "much more data
than the one currently written" can be lost when power fails in a hard drive.
So the situation is really no different, unless the manufacturer screwed up.

~~~
nexox
Spinning disks have enough rotational momentum to keep spinning (which keeps
the heads floating) for long enough to park the heads via a weak spring, with
zero electricity. A head crash doesn't corrupt a few sectors so much as cause
catastrphic damage - that disk would likely never read another sector again.
Properly-functionioning spinning disks haven't had issues with random data
loss on power failure for at least a decade now.

And your impression of cheap SSDs is dead, flat wrong. They're cheap - every
unnecessary part is left off to save money. And we've all (all of us who pay
attention) known for years that SSDs (even some with power fail protection)
will lose data (even bits which it has reported to have sync'd) on power loss.

A UPS is not enough; if you need to have your data, you need multiple layers
of backup, and an SSD must have some method of writing out volatile data
(mostly internal metadata, not cache) before it shuts down.

~~~
colin_mccabe
_Properly-functioning spinning disks haven't had issues with random data loss
on power failure for at least a decade now._

Source?

 _And your impression of cheap SSDs is dead, flat wrong. They're cheap - every
unnecessary part is left off to save money. And we've all (all of us who pay
attention) known for years that SSDs (even some with power fail protection)
will lose data (even bits which it has reported to have sync'd) on power
loss._

I think you misread what I wrote. I wrote that I would expect cheap SSDs to
"survive a power loss event without being bricked." I did not write that they
would retain all data, which seems to be what you are arguing against.

I have heard rumors that some cheap SSDs do not honor the SATA SYNC command.
Unfortunately I do not have a reliable source for this theory, do you?

 _A UPS is not enough; if you need to have your data, you need multiple layers
of backup, and an SSD must have some method of writing out volatile data
(mostly internal metadata, not cache) before it shuts down._

I don't think anyone is arguing that a UPS is a replacement for backups.

------
CamperBob2
It's long been conventional wisdom that you'd have to be crazy to buy anything
but Intel when it comes to SSDs. This study isn't too surprising, in that
regard.

~~~
joenathan
I've never had any issues with Samsung or Kingston SSDs. The 840 EVO series
and Pro series are the best performance for the money you can get.

~~~
userbinator
Kingston used to rebrand Intel SSDs (same controller, just a smaller amount of
flash) for the "value market", so it's not surprising they were good.

------
herf
The Crucial M4 had no power-loss protection. The newer model (M500) does and
would be a more interesting test.

------
maerF0x0
As many have pointed out: There are many applications where power cycling is
extremely rare and also not a big deal.

1\. Laptops have a built-in UPS in case they're unplugged

2\. Servers should have a UPS in case they're unplugged or for small outages

3\. Desktop drives shouldn't be trusted as the only copy. Though I imagine the
data corruption would propagate to backups?

~~~
mike_esspe
Laptops are vulnerable to this problem, if you don't know about it.

On my notebook I've lost a partition with Crucial M4. OS hanged, I did a hard
reset and after reboot discovered data loss.

~~~
userbinator
I got an Intel SSD (G1 80GB) in my laptop to do OS development (drivers),
which incidentally is a use case that requires _lots_ of hard resets. Never
had any problems with it, and it's been through dozens if not hundreds of hard
reset cycles. In fact I got the SSD specifically because I didn't want to be
spinning up/down a disk that much, and laptops tend not to have reset buttons
(they really need one, IMHO.)

~~~
emn13
Those tend to be very safe hard resets from a drive perspective. First of all,
you're not losing power, so even though there's a reset, the drive firmware
maintained power. Secondly, I'm guessing you see far more hard locks and
manual resets than random, sudden reboots - and if that's the case, then the
drive firmware probably didn't even notice. By the time you press reset, an
eternity has passed and any ongoing activities have long finished.

I can imagine a software-fault causing drive-level problems if the drive has a
large cache and a broken fsync, or if the bios does some kind of unsafe hard
drive reset _very_ quickly after starting.

In any case, it's probably more likely to be file-system level reliability
you'd need in the face of driver instability.

------
_mikz
Would be nice to see this test for Samsung drives, which should be closer to
Intel's quality.

------
arh68
Where are all the dates in this article? When were these drives
purchased/manufactured?

Some (all?) of the 320 drives (pre-2012) had a bug that basically bricked the
drive after power loss. See more on google at '8mb bug' and this Intel thread
[1]. The existence of this bug, the well-known reliability reputation of
Intel, and the sheer size of this sampling number (N=500?) make the
distinction in time important. Were all these drives more recent, or is the
Intel failure rate, even with buggy firmware, still ~.5% ?

> _However, given that deployment of over 500 Intel 320 SSDs has been carried
> out and only 3 failures observed over several years, it would be reasonable
> to conclude that Intel S3500s could be trusted long-term as well_

[1]
[https://communities.intel.com/message/133499](https://communities.intel.com/message/133499)
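
For reference, the arithmetic behind the quoted deployment numbers (the time span is an assumption, since the report only says "several years"):

```python
# 3 failures across ~500 Intel 320s, per the quote above.
failures, fleet, years = 3, 500, 3  # "several years" assumed as 3

lifetime_rate = failures / fleet     # 0.6% over the whole period
annualized = lifetime_rate / years
print(f"{lifetime_rate:.1%} over the period, ~{annualized:.2%}/yr")
```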

~~~
nexox
That bug was not as prominent as you make it sound - I was unable to reproduce
the issue (or any other issue) across tens of thousands of reboots on the
buggy firmware (running Linux and a quality SAS HBA.) The circumstances to
produce a corruption were much more rare than simply "power loss," and many
users with "safe" platforms could easily expect 0.5% AFR.

Even with "risky" OS and controller combinations, there was an element of
probability involved, so most (probably almost all) power loss events would
not hit the bug.

Plus the SSD320 was difficult to obtain back then, and reasonable operators
upgraded to the firmware version with this bug fixed, so only a small
percentage of the units were ever even vulnerable.

------
slyall
I really hate when people are sloppy with notation.

20GB = Twenty Gigabytes

NOT

20gb = Twenty gram bits

and:

20MB/s = Twenty Megabytes per second

NOT

20mbytes/sec = Twenty milli-bytes per second (If you got your B's and b's
correct you wouldn't need to write out "bytes")

~~~
illicium
If you want to be really pedantic, differentiate SI giga/megabytes as GB/MB
and "binary" gibi/mebibytes as GiB/MiB

~~~
optimiz3
This is actually a major annoyance in some fields - Xilinx, for example, likes
to use GB to represent 1024^3 for storage on FPGAs, while HDD manufacturers
like to use 1000^3. IMHO [ZYEPGMK]?iB is the way to go to end this nonsense.
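
The gap the thread is arguing about, in numbers:

```python
# Decimal (SI) vs binary (IEC) prefixes for the same nominal size.
GB = 1000**3   # SI gigabyte, as drive marketing uses it
GiB = 1024**3  # IEC gibibyte, as most OS tools report

print(GiB / GB)           # 1.073741824, a ~7.4% gap at the giga scale
drive_bytes = 60 * GB     # a "60 GB" drive as marketed
print(drive_bytes / GiB)  # ~55.88 GiB as an OS might report it
```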

~~~
6cxs2hd6
I'm old enough to remember what MB and GB meant before sleazy marketers
started to redefine them. They should have been sued for deceptive
advertising. Instead the computer press of the time was spineless, because
guess who paid for the ads.

So I refuse to be a wuss and use MiB and GiB.

(Also, get off my lawn.)

~~~
dragonwriter
Since mega- and giga- have well-established meanings as prefixes to measures,
and the original common usages of MB and GB were inconsistent with those
meanings, I prefer MiB and GiB for those uses, _even though_ it took marketers
using the correct versions for devious reasons to get terms popularized that
distinguished the base-2 prefixes from the close-but-not-the-same base-10
prefixes.

------
ghshephard
I read through the report a couple times, but couldn't figure out how many
drives he had in each test batch. I'm a little concerned that he just took a
single drive, and hoped it would effectively represent the entire model.

Clearly there will be some variability - and he may have gotten a good, or bad
drive - and a larger population of SSDs might behave quite differently in
terms of reliability.

------
mandeepj
My brand new Crucial M4 SSD stopped working after a few days. To be precise,
the system would show the drive for about 2 minutes after each reboot; after
that the drive would not show up anywhere at all, whether in Windows Explorer,
Device Manager, or anywhere else. I contacted their support team but they
never replied.

~~~
sitkack
Crucial reportedly has better support than that. My M4 would crash after 21
minutes (exactly-ish) of uptime after over a year of continuous service
(100+GB/day); the latest firmware fixed the problem and increased speeds. The
USB firmware updater was a godsend compared to the vile, vile hoops I had to
jump through to upgrade my OCZ drive.

I currently have a Samsung 840 1TB and it has been rock solid; it replaced a
250GB Intel 320. Afaik Samsung is the only manufacturer that owns the whole
supply chain, flash + controller.

~~~
kalleboo
Toshiba also do both NAND and controllers.

------
MatthewElvey
I have a Viking VRFS21100GBCNS which supposedly features Super Capacitor power
failure protection. Manufacturer PDF -
[https://docs.google.com/file/d/0B9JXvW974L4JUkxJMGJIUjNtRDg/...](https://docs.google.com/file/d/0B9JXvW974L4JUkxJMGJIUjNtRDg/edit?usp=sharing)

------
rythie
I've found that the Intel 320 drives are much than the Samsung 830/840,
Crucial M500 when setup for sync writes on Linux (as a NFS server) - with the
Intel drives being about 4x faster than the others in my testing.

------
ricardobeat
Would be interesting to know how Samsung SSDs, very popular now and used in
Apple's line, fare on that test.

