
SSDs: A gift and a curse - mirceasoaica
https://laur.ie/blog/2015/06/ssds-a-gift-and-a-curse/
======
mrb
Of course SSD firmware is buggy. You know why? Because any half-decent
electrical and computer engineering team can slap a NAND flash controller and
some flash chips on a PCB, take the controller manufacturer's reference
firmware implementation, tweak the dozens of knobs provided by the reference
implementation (ignore FLUSH commands, change the amount of reserved sectors,
disable this, enable that, etc), change the device ID strings to "Company
Foobar SSD 2000", and release the product on the market, with minimal QA and
testing. And that's exactly what happened in the last 5-7 years with dozens
and dozens of companies around the world designing SSDs.

But with traditional HDDs, the amount of engineering and domain-specific
knowledge to manufacture and assemble the platters, moving heads, casing, etc,
is such that there are only a handful of companies around the world who can do
this (Seagate, WDC, Hitachi, etc). They have decades of experience doing that,
so the firmware in HDDs happens to be very robust, as these companies have
seen everything that can and will go wrong in an HDD.

So it boils down to this: which would you trust more, an HDD firmware code
base that is 20 years old, or an SSD firmware code base that is 4 years old?

Combine this with the fact SSD firmware is much more complex (a flash
translation layer must minimize write amplification, do wear leveling, spread
writes on multiple chips, etc), and you are guaranteed that many SSDs on the
market are going to be very buggy.
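For a rough sense of why write amplification matters (the numbers below are
made-up illustrations, not measurements from any real drive or vendor
algorithm): write amplification is just NAND writes divided by host writes,
and it balloons when small random host writes force the FTL to
garbage-collect mostly-full erase blocks.

    # Toy write-amplification estimate (sh/awk); all figures are assumptions.
    awk 'BEGIN {
      host_writes_gb = 100    # what the OS thinks it wrote
      gc_copies_gb   = 180    # valid pages recopied by garbage collection
      nand_writes_gb = host_writes_gb + gc_copies_gb
      printf "write amplification = %.1f\n", nand_writes_gb / host_writes_gb
    }'                        # -> 2.8: the flash wears ~3x faster than
                              #    host writes alone would suggest

Every extra unit of amplification is wear the user never asked for, which is
why the FTL heuristics are both complex and easy to get wrong.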

~~~
zokier
That is a nice theory, except for the fact that firmware bugs have hit the
big players equally badly, if not worse. Intel and Samsung come easily to
mind, and SandForce was notorious for not getting their own firmware to work
(on their own controllers).

~~~
scott_karana
Since when has Intel been hit equally?

~~~
zokier
The "8MB bug"
([https://communities.intel.com/thread/24205](https://communities.intel.com/thread/24205))
was the most notorious, but I also remember some bricked Intel SSDs.

~~~
drzaiusapelord
Ignoring that, Intel is rock solid. The larger issue isn't incompetents making
SSDs, but SSDs being sold solely on performance. Intel has always been a step
or two behind in the performance game but manages to deliver some really
reliable SSDs. I have yet to see a consumer Intel drive fail, which is
incredible, as that kind of reliability is unheard of outside of the super
expensive PCIe cards.

I think we're a few more years away from shaking out this industry and
leaving behind all the growing pains. No idea who will be left standing, but
I suspect Intel will probably be around.

~~~
pja
I managed to brick an Intel 320 last week. Going to see if they'll replace
it under warranty - those devices had their warranty extended to 5 years, so
it should still be covered.

(Apparently the drive _really_ didn’t like being issued with an 'hdparm
--security-set-pass X /dev/sdb' command. I was intending to do a secure wipe,
for which the password setting is a prerequisite, but the drive never came
back from setting the password. After that it just returned junk data to any
command, including 'hdparm -I /dev/sdb' & failed the power-on BIOS SMART
test.)
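For reference, the usual ATA secure-erase sequence is sketched below; the
device name and password are placeholders, and as my experience above shows,
buggy firmware can die at the very first step.

    hdparm -I /dev/sdX      # check the Security section says "not frozen"
    hdparm --user-master u --security-set-pass X /dev/sdX  # set a temporary password
    hdparm --user-master u --security-erase X /dev/sdX     # issue SECURITY ERASE UNIT
    hdparm -I /dev/sdX      # security should read "not enabled" again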

------
yellowapple
Parts of this sound more like hardware RAID controller issues than SSD issues,
which is why I typically avoid hardware RAID in production environments unless
there's a specific reason for it. RAID controllers tend to be buggy pieces of
shit, usually implementing some RAID method that's more-or-less proprietary
and unique even between different RAID cards from the same vendor (meaning
that if your RAID controller fails, you might as well kiss your data goodbye,
since the replacement - 9 times out of 10 - won't be able to make sense of its
predecessor's RAID setup).

Also, RAID6 is a bad idea, almost as bad as RAID5. There have been numerous
studies and reports [0] indicating that both are _very_ susceptible to subtle
bit errors ("cosmic rays"), and this is made even worse when SSDs are
involved. If you need absolute data integrity, RAID1 is your only option; if
you need a balance between integrity, performance, and capacity, go with
RAID10, which is still leaps-and-bounds better than RAID5/6.

[0]:
[http://www.miracleas.com/BAARF/Why_RAID5_is_bad_news.pdf](http://www.miracleas.com/BAARF/Why_RAID5_is_bad_news.pdf)
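To put a rough number on that susceptibility, here's a back-of-the-envelope
sketch (mine, not from the paper above) using the commonly quoted consumer
URE rating of 1 error per 1e14 bits read:

    # Odds of hitting at least one unrecoverable read error (URE) while
    # re-reading ~10 TB of surviving data to rebuild a degraded RAID5 array.
    awk 'BEGIN {
      bits = 10e12 * 8    # ~10 TB, in bits
      rate = 1e-14        # assumed consumer URE rate
      printf "P(>=1 URE during rebuild) ~ %.0f%%\n", (1 - exp(-bits * rate)) * 100
    }'                    # -> ~55%, a coin flip per rebuild

With a mirror you only have to re-read one disk's worth of data, which is
part of why RAID1/10 rebuilds are so much less hairy.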

~~~
JoeAltmaier
To expand: in my 20-year experience, no small-computer RAID 5/6 controller I
have had experience with has ever saved any data, ever. All failures are hard
and non-recoverable. Which raises the question: why use them at all? I mirror
only.

~~~
Laforet
RAID is there to meet the guaranteed uptime in your SLA. At current storage
densities, RAID5/6 is probably more fragile than a single disk, because UREs
are very likely during a rebuild. Nonetheless, having a degraded array is
probably better than having an offline system, and it will buy you some time
to migrate. Mirrors are ideal, but it is hard to justify the upfront cost.

~~~
JoeAltmaier
For small home machines, the storage cost is minimal. All my home machines are
mirrored, for around $100 each. It's saved me several times.

~~~
cerberusss
I've never understood this. At home, you don't need uptime, thus you're much
better off with a real offline backup instead of a mirror.

~~~
yellowapple
You don't _need_ uptime, but it's nice to have. Repairing a degraded mirror
takes a lot less time than restoring from a backup in the vast majority of
cases.
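With Linux software RAID, for instance, the repair is a couple of commands
and the machine stays up the whole time (a sketch; device names are
placeholders):

    mdadm --manage /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1  # drop the dead member
    mdadm --manage /dev/md0 --add /dev/sdc1                      # add the replacement
    cat /proc/mdstat                    # the mirror resyncs in the background

A restore from backup, by contrast, means downtime for the whole copy.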

------
Rafert
I am surprised they haven't mentioned Crucial SSDs. With cheap drives like
the MX100 having features like power-loss protection and Opal 2.0 support, I
preferred these over the slightly faster Samsung products at the time.

~~~
rosser
Turns out, the Crucial drives' "power loss protection" doesn't actually
preserve data "in-flight" at the moment of power loss. It just prevents data
"at rest" from being corrupted.

This appears to apply to _all_ the consumer-grade Crucial drives.

See [1], money quote:

"In the MX100 review, I was still under the impression that there was full
power-loss protection in the drive, but my impression was wrong. The client-
level implementation only guarantees that data-at-rest is protected, meaning
that any in-flight data will be lost, including the user data in the DRAM
buffer. In other words the M500, M550 and MX100 do not have power-loss
protection -- what they have is circuitry that protects against corruption of
existing data in the case of a power-loss."

[1]
[http://www.anandtech.com/show/8528/micron-m600-128gb-256gb-1...](http://www.anandtech.com/show/8528/micron-m600-128gb-256gb-1tb-ssd-review-nda-placeholder)

~~~
userbinator
That is true of ordinary HDDs too. Until data is written to the flash or
platter itself, it hasn't been actually written.

People don't expect files they haven't saved to be written and magically come
back later if the power goes out, but they _do_ expect that their drives will
power back up with all the data that was last written intact.
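That expectation is exactly what the flush/barrier machinery is for: the OS
can ask for data to be forced down to the media, and a drive that ignores or
lies about FLUSH breaks the contract. A sketch of the knobs involved on Linux
(device name is a placeholder):

    dd if=data.img of=/dev/sdX bs=1M conv=fsync  # fsync() the target when done
    sync                                         # flush the kernel's dirty pages
    hdparm -F /dev/sdX     # tell the drive to flush its own write cache
    hdparm -W 0 /dev/sdX   # or disable the volatile write cache entirely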

~~~
rosser
Yes, that's true.

However, Crucial _explicitly_ marketed their SSDs as having, quote, "power
loss protection", pointing out the array of capacitors on the drive's PCB, and
strongly implying that this feature included in-flight writes surviving a
power loss (i.e., that the caps had enough capacity to enable flushing the
DRAM buffer to the NAND media, much like the non-SandForce-based Intel SSDs —
e.g., the 320 series from a few years back, or the current DC S3500 and S3700
drives).

That, it turns out, _isn't_ true.

And that's a problem, because I and many others bought these drives on the
basis of Crucial's implying they were power-loss durable. When enough people
started reporting to Crucial support that their drives didn't, in point of
actual fact, offer this feature, their marketing literature changed so as not
to imply they did, and forum posts and reviews, like my previously linked
AnandTech article, started pointing out that they didn't.

------
rdl
The new flash DIMMs (which bypass the PCIe bridge and ATA layer entirely,
since they plug directly into the memory controller) are really interesting.
Not a commodity yet, but this seems like a case where simpler -> better ->
cheaper.

------
ColinDabritz
What a wonderfully informative article. I really appreciated all the specific
scenarios and cases.

------
justinsb
I am looking forward to the day when all SSDs ship with minimal firmware and
offload all the complex work to (main-CPU) software.

~~~
AYBABTME
Wouldn't that make performance a lot worse, for both CPU and disks? Offloading
the job to something closer to the flash chips and fully dedicated to
servicing them sounds like a better (and the current) path. Same for high
speed network links.

~~~
rwmj
This is appropriate: [http://www.catb.org/jargon/html/W/wheel-of-reincarnation.htm...](http://www.catb.org/jargon/html/W/wheel-of-reincarnation.html)

~~~
AYBABTME
I always have that in the back of my mind =]. However, I don't think that's
the current trend right now, except where special hardware is integrated into
the chip - and even then the work isn't left to the CPU itself.

I'm not well versed in the area, so I'm just guessing here.

------
insaneirish
Moral of the story:

* Don't use hardware RAID controllers.

* Don't buy hardware from people who are going to change SKUs out from under
you, or worse, change what's actually delivered for a given SKU.

~~~
willejs
Totally agree with you. I was just about to write the exact same thing. I've
had problems with Dell hardware, hardware RAID controllers, and hundreds of
SSDs. The most stable systems I've worked on use LSI SAS controllers, perhaps
software RAID, and Intel/Crucial SSDs in fault-tolerant distributed systems.
I therefore also wonder whether Etsy's tendency to use non-distributed
datastores, with less effective or more complex fault tolerance, makes SSD
failure a bigger deal. Failure is inevitable, and should be expected at every
level.

------
mavhc
All these drives were on hardware RAID cards, it seems. Is it feasible to do
without them?

~~~
Freaky
Preferable, I'd say. The last thing you want between your OS and your disks is
a buggy unknowable black box you can only talk to with some crappy binary blob
management tools.
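Concretely, the Linux alternative is mdadm, where the on-disk metadata format
is documented and any machine can assemble the array (a sketch; devices and
RAID level are just examples):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mkfs.ext4 /dev/md0           # then use it like any block device
    mdadm --detail /dev/md0      # array state, readable without vendor blobs
    mdadm --examine /dev/sda1    # metadata lives on the members themselves

If the host dies, you can plug the disks into any Linux box and reassemble.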

------
ksec
Am I missing something? Most of the issues seem to be hardware-RAID-related.

------
Lagged2Death
_On the upside, [the ridiculously expensive HP SSDs] do have fancy detailed
stats (like wear levelling) exposed via the controller and ILO, and none have
failed yet almost 3 years on (in fact, they’re all showing 99% health). You
get what you pay for, luckily._

Call me a huge cynic if you must, but given the other problems observed, I
think there's a _really_ simple explanation for perfectly uniform "99% health"
after _three years of service_ that doesn't involve "you get what you pay
for."

~~~
wtallis
An SSD can be 85% of the way through its lifespan and still be operating with
the same performance and reliability expectations it had when it was almost
new, so the drive can reasonably be said to still be healthy. It's only when
it has to start retiring bad blocks and expending the spare area that the
drive is operating in a degraded mode, and it's not until you hit that point
that the drive can start making accurate predictions about how much more use
it can take until it dies.
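Those wear and spare-area counters are visible from the host via SMART;
attribute names and numbers vary by vendor, so treat the ones below as
examples rather than universal:

    smartctl -A /dev/sdX    # dump the vendor attribute table
    smartctl -A /dev/sdX | grep -i -e wear -e reserv -e reallocat
    # e.g. Intel drives expose 233 Media_Wearout_Indicator (counts down
    # from 100) and 232 Available_Reservd_Space for the spare area.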

~~~
baruch
Not quite.

An SSD at 85% wearout will take longer to program and erase its blocks, and
it will require more refreshes (data retention suffers), so there will be
considerably more background operations.

In fact, SSD vendors do quite a bit to make the disk perform its best early
on, when they absolutely can, and let it start to break down further down the
road, after the benchmark is over. After all, users mostly test a drive for
only a month or two and never reach a real wearout condition during the test;
then they buy in bulk and run the drives for a few years, but by that time
the SSD model is already outdated anyway (SSD models run on roughly an
18-month cycle).

------
deelowe
Isn't running RAID1/5/6 on SSDs silly, because they'll all die at the same
time? And hardware RAID on top of that? Why?

SSDs have a fairly consistent failure curve (excluding firmware bugs and other
random events) for a given model, so they'll wear evenly in a RAID setup. This
means they'll all die at around the same time, as writes/reads are distributed
fairly evenly across the disks. Given the size of today's drives, you may not
complete a rebuild before losing another disk.

Has this been shown not to be true within the past few years? I don't run
redundant RAID on SSDs. It's either RAID0 or JBOD.

~~~
GeorgeBeech
Not necessarily. We've been running Intel SSDs in production at Stack
Exchange for 4+ years, and just recently had our first 2.5" drive die.

That said, most of the drives in this article are consumer drives. The problem
with consumer drives is that they don't have capacitors. And since your writes
are cached by the drive before they go to the NAND, if you lose power all of
your drives will be corrupted in the exact same way at the exact same time.

If you don't care about the data, go ahead and use them. If you do, pay the
extra for enterprise drives. They really aren't _that_ much more expensive
these days.

~~~
deelowe
Interesting. Do you have more info on what you mean by "Not necessarily?" From
what I've seen during reliability studies on SSDs, they have a fairly tight
failure curve. This is very dissimilar from hard disks where there's much more
variance from drive to drive.

I'm genuinely interested.

~~~
GeorgeBeech
Nothing that would pass deep scrutiny. Just our experience running SSDs in
almost every server at Stack. We've only had one mass failure of drives. That
was when 5/8 Samsung drives died around the same time in our packet capture
box. The remaining 3 are still alive, although we don't really use them.

We have only had two Intel drives die on us. I'm interested (well,
academically, not professionally) in whether they will die at the same time
or keep dropping off one at a time.

We tend to retire the machines or the drives in them before they fail.

------
AHHspiders
I bought two consumer Intel SSDs - 80 GB, I think. They both lasted less
than a year before they stopped POSTing or allowing a system boot, in
separate machines.

The 850 Pro is OK, but it's slowed down a lot lately. Might be an OS thing,
though I doubt it.

All in all, I keep a redundant backup on old-school HDDs too, since the
failure rate of SSDs isn't so great in my experience so far.

Anyone try one of the newer M.2 drives yet? Or I think I mean the PCIe types?

------
abecedarius
So there are bugs in drive firmware. How about security bugs? Should we expect
quality drives to have had a security audit?

~~~
cmdrfred
If you are doing your security at the hard drive layer, haven't you already
lost?

~~~
abecedarius
So another reason for whole-disk encryption is to defend against your own
drive firmware? I guess that may make sense nowadays, though it hadn't
occurred to me.
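For what it's worth, the standard Linux route is dm-crypt/LUKS, which keeps
the keys and the crypto on the host, so the drive only ever sees ciphertext
(a sketch; device name is a placeholder):

    cryptsetup luksFormat /dev/sdX        # encrypt on the host, not in the firmware
    cryptsetup open /dev/sdX securedisk   # map the decrypted view
    mkfs.ext4 /dev/mapper/securedisk      # filesystem on top of the mapping

Drive-level "self-encrypting" features, by contrast, put all the same trust
back into the firmware this thread is complaining about.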

