

How SSD power faults scramble your data (2013) - daw___
http://www.zdnet.com/how-ssd-power-faults-scramble-your-data-7000011979/

======
oakwhiz
Luckily, there are filesystems like ZFS to [help] counter the myriad of
bizarre failure modes in today's storage devices...

~~~
baruch
ZFS would help to handle some of the failure types of SSDs. I doubt that it
can help in the case of multiple disk failures due to a power failure.

As for the original report, if you rely on your storage devices to survive a
power outage, you should be testing them rigorously for this feature. It is a
feature that is easy to get wrong and when it fails on you it is quite nearly
the end of the world as this is a correlated failure of multiple devices at
the same time with no time for any rebuild operation.

I've seen various devices, even high-end ones, that had flaws in their
power-hold circuitry. If you rely on it, test it. The same is true for HDDs
too. If you rely on your UPS to carry you through a power outage, be sure to
test that one too. And test all of these again as they age; age degrades the
ability of both batteries and capacitors to hold power.
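
A rough sketch of what such a test could look like, assuming a Python harness on the machine whose power gets cut. The names `write_records` and `verify` and the 4 KiB record size are made up for illustration, and the file is assumed to start out empty. The idea: write records whose content can be recomputed from a sequence number, fsync each one, and after the power cut check that everything that was acknowledged is still there and unchanged.

```python
import os

RECORD_SIZE = 4096  # assumed record size, chosen only for illustration


def make_record(seq: int) -> bytes:
    # The payload is derived from the sequence number alone, so the
    # verifier can recompute exactly what should be on disk.
    return seq.to_bytes(8, "little") * (RECORD_SIZE // 8)


def write_records(path: str, count: int) -> int:
    """Append `count` records to a fresh file, fsyncing each one.

    Returns the last sequence number that was acknowledged as durable.
    """
    last_acked = -1
    with open(path, "ab") as f:
        for seq in range(count):
            f.write(make_record(seq))
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS and the drive to make it durable
            last_acked = seq      # a real test would report this to another host
    return last_acked


def verify(path: str, last_acked: int) -> list[int]:
    """Return the acknowledged sequence numbers that are missing or altered."""
    bad = []
    with open(path, "rb") as f:
        for seq in range(last_acked + 1):
            if f.read(RECORD_SIZE) != make_record(seq):
                bad.append(seq)
    return bad
```

In a real run the last acknowledged sequence number has to be recorded somewhere that survives the power cut (another host, a serial console), otherwise there is nothing to verify against. Any non-empty result from `verify` means the device acknowledged a flush it did not honor.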

~~~
chongli
>ZFS would help to handle some of the failure types of SSDs. I doubt that it
can help in the case of multiple disk failures due to a power failure.

It can if there's enough redundancy. If your pool is a stripe of mirrors, then
you can mitigate this effect by putting each half of a mirror on separate
power supplies.
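
A minimal sketch of checking a layout for this, assuming you keep your own map of which device hangs off which power supply. The function name, device names, and PSU labels are all hypothetical; nothing here queries ZFS or the hardware.

```python
def mirrors_on_one_psu(mirrors, psu_of):
    """Return every mirror whose member devices all share one power supply."""
    return [m for m in mirrors if len({psu_of[dev] for dev in m}) == 1]


# Hypothetical pool: a stripe of two 2-way mirrors.
pool = [("sda", "sdb"), ("sdc", "sdd")]
psu_of = {"sda": "psu0", "sdb": "psu1", "sdc": "psu0", "sdd": "psu0"}

print(mirrors_on_one_psu(pool, psu_of))  # [('sdc', 'sdd')]
```

Here the second mirror would lose both halves if `psu0` dropped out, so it gets flagged.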

~~~
rodgerd
With enough redundant devices any RAID scheme will survive a large number of
drive losses. ZFS is not magic.

~~~
chongli
Sure, but ZFS can survive intermittent drive power failure on both halves of
a mirror. As long as one good copy of a block exists, ZFS will repair the
mirror by resilvering its counterpart. This is the advantage of checksumming
every block on disk and using copy-on-write.
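
To make the idea concrete, here is a toy model of a checksummed two-way mirror. It is not ZFS's actual code: the `Mirror` class and its dict-backed "disks" are invented for illustration, and SHA-256 is just a convenient checksum. The one detail borrowed from ZFS is that the checksum is stored apart from the block it covers (ZFS keeps it in the parent block pointer), so a bad block cannot vouch for itself.

```python
import hashlib


class Mirror:
    """Toy two-way mirror; each 'disk' is a dict of block number -> bytes."""

    def __init__(self):
        self.disks = [{}, {}]
        self.checksums = {}  # kept separately from the data blocks

    def write(self, blkno: int, data: bytes) -> None:
        self.checksums[blkno] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[blkno] = data

    def read(self, blkno: int) -> bytes:
        expected = self.checksums[blkno]
        good = None
        bad_disks = []
        for disk in self.disks:
            data = disk.get(blkno)
            if data is not None and hashlib.sha256(data).digest() == expected:
                good = data
            else:
                bad_disks.append(disk)
        if good is None:
            # No copy matches its checksum: the error is detected, not repaired.
            raise IOError(f"block {blkno}: no copy matches its checksum")
        for disk in bad_disks:
            disk[blkno] = good  # rewrite the bad copy from the good one
        return good
```

If one copy is damaged, `read` returns the good data and rewrites the bad copy. If every copy is damaged, the read fails loudly instead of returning garbage.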

~~~
sargun
Yeah, but as opposed to data corruption, your data will throw an error when
you try to read it. It's just a different failure mode. In production, SSDs
will often accept a write, acknowledge a sync/flush (claiming the data has
reached durable storage), and then, when read back, not return the same
data that was previously written to them.

Mirroring is also part of ZFS, but it's not ZFS magic. My preference is the
Dynamo / Hadoop / layered approach in which the application is aware of
multiple underlying storage devices, and is written in a way that's defensible
to underlying device failure.

~~~
chongli
_Yeah, but as opposed to data corruption, your data will throw an error when
you try to read it._

No, the data will be repaired transparently and you won't notice a thing.
Errors will be logged, of course, but you'll know that the data is intact.

 _My preference is the Dynamo / Hadoop / layered approach in which the
application is aware of multiple underlying storage devices, and is written in
a way that's defensible to underlying device failure._

And ZFS should be the underlying file system upon which these applications are
built. If you don't have protection against silent data corruption at the
lowest level then you aren't going to have any idea when it happens.

~~~
sargun
Only if you mirror the data. If you only have one copy of the data, it won't
magically repair itself.

------
NatW
This study performed an automated cut-off of DC power directly at the drive,
which seems reasonable to me and is more practical for testing. I personally
like that it tests the drive as a more "self-contained" unit.

Power faults often occur on the AC side, however, and only later reach the DC
side. That is impractical to test because there are too many
configurations/variables and it's much harder to test accurately, but it would
be interesting in theory to see whether AC power faults at a system power
supply have a different effect on the data than DC power faults.

------
raws
[http://hardware.slashdot.org/story/13/12/27/208249/power-los...](http://hardware.slashdot.org/story/13/12/27/208249/power-loss-protected-ssds-tested-only-intel-s3500-passes)

------
ilaksh
Isn't this the definition of FUD?
[http://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt](http://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt)

> Fear, uncertainty and doubt (FUD) is a tactic used in sales, marketing,
> public relations, politics and propaganda.

>FUD is generally a strategic attempt to influence perception by disseminating
negative and dubious or false information. An individual firm, for example,
might use FUD to invite unfavorable opinions and speculation about a
competitor's product; to increase the general estimation of switching costs
among current customers; or to maintain leverage over a current business
partner who could potentially become a rival.

>The term originated to describe disinformation tactics in the computer
hardware industry but has since been used more broadly.
FUD is a manifestation of the appeal to fear.

>About Robin Harris

>Robin Harris has been a computer buff for over 35 years and selling and
marketing data storage for over 30 years in companies large and small.

>Robin Harris is a president of TechnoQWAN, a consulting and analyst firm in
Sedona, Arizona. He also writes StorageMojo.com, a blog which accepts
advertising from companies in the storage industry, and has a 30 year history
with IT vendors. He has many industry contacts, many of whom are friends and
all of whom he has opinions about.

The poor performance of mechanical disks is the number one performance
bottleneck for most IT systems. I believe that FUD like this is the main
reason SSD costs are still relatively high.

~~~
JohnTHaller
The findings in this article match the findings of "Analysis of SSD
Reliability during power outages" that we discussed 2 weeks ago:

[https://news.ycombinator.com/item?id=6973179](https://news.ycombinator.com/item?id=6973179)

[http://lkcl.net/reports/ssd_analysis.html](http://lkcl.net/reports/ssd_analysis.html)

~~~
ilaksh
You didn't answer the question. Isn't this the definition of FUD? Maybe that
previous report was the same thing.

~~~
JohnTHaller
It's only FUD if it's manufactured. SSDs not being able to properly handle
power outages is a problem for many of us. And even if you have a battery
backup between your box and the mains, the UPS could still fail. It happened
to me (hard crashes of a server) a few times within a couple months at
RackSpace years back when they had issues in a wing of their main TX data
center.

