
When Solid State Drives Are Not That Solid - Shipow
https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/
======
ploxiln
Originally, TRIM was an un-queued command: all writes had to be flushed, then
the TRIM executed, then writes could continue. This was bad for performance
with automatic on-file-delete TRIM, so everyone wanted a TRIM command that
could be put in the command queue along with writes. Many new drives have
this.

It turns out that Samsung 8XX SSDs advertise they support queued trim but it's
buggy. The old TRIM command works fine.

[https://lkml.org/lkml/2015/6/10/642](https://lkml.org/lkml/2015/6/10/642)
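A rough way to see what TRIM support a drive advertises; a sketch in which the
sample text stands in for live output of hdparm -I (a real device name like
/dev/sda would be an assumption):

```shell
# Two capability lines as printed by `hdparm -I` for a TRIM-capable drive;
# on a live system you would run: sudo hdparm -I /dev/sda | grep -i trim
sample='   *    Data Set Management TRIM supported (limit 8 blocks)
   *    Deterministic read ZEROs after TRIM'

# Count the TRIM-related capability lines in the sample
printf '%s\n' "$sample" | grep -c 'TRIM'
# prints: 2
```

Whether *queued* TRIM is supported is a separate capability bit which, as far
as I know, older hdparm versions don't surface at all, which is part of why
these bugs were hard to spot.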

There are in fact lots of "quirks lists" and "blacklists" in the kernel, and
virtually all computers require some workarounds in the Linux kernel for some
buggy hardware they have. Pretty amazing when you think about it.

EDIT: another closely related example is MacBook Pro SSDs and NCQ, aka native
command queuing. They claim they support it, but on many it's buggy. It gets
better, though: the Linux kernel only started trying to use this functionality
by default relatively recently.

[https://bugzilla.kernel.org/show_bug.cgi?id=60731](https://bugzilla.kernel.org/show_bug.cgi?id=60731)

These sorts of things are, as you can see, very confusing and frustrating to
track down, identify, and find a general fix for.

EDIT2: now that I've actually read the kernel bugzilla entry further, it has
more recently come to light that the actual problem with recent MacBook Pro
SSDs is MSI (a more efficient type of interrupt)

~~~
digi_owl
In essence, the Linux kernel puts on display what on Windows is hidden by
proprietary device drivers.

~~~
_yosefk
The thing is, almost all hardware accessed through drivers has tons of bugs;
it's nowhere near as "bug-free" as things like CPUs or DRAM, which cannot hide
their bugs behind drivers. The best one can hope for is a piece of hardware
plus an accompanying driver that knows how to hide that hardware's issues.

So another way of putting what you said would be "on Linux there's no working
driver for that piece of hardware, unlike on Windows where the 'proprietor'
went to the trouble of supplying such a driver."

~~~
jjawssd
If you think CPUs do not come with a shit-ton of hardware bugs, YOU ARE GRAVELY
MISTAKEN.

Google up the Intel errata for the i7

The list goes on and on.

~~~
digi_owl
Heh, I still recall my early encounters with Linux and reading the boot-up
messages.

One of them contained a line related to having found a CPU bug and having put
a workaround in place.

I am not entirely sure, but I think it may have been the F00F bug.

[https://en.wikipedia.org/wiki/Pentium_F00F_bug](https://en.wikipedia.org/wiki/Pentium_F00F_bug)

~~~
aidos
Ha. Reminded me of the Pentium Floating Point Bug from the 90s. First (only?)
time a CPU bug has been an international press story?

[https://en.wikipedia.org/wiki/Pentium_FDIV_bug](https://en.wikipedia.org/wiki/Pentium_FDIV_bug)

------
ChuckMcM
Nice debugging story. When I was at NetApp, there were lots of times when drive
firmware for the 'less used' options would fail. On the Fibre Channel drives,
the 'write zeros' command, which was supposed to zero a drive, was notorious in
its inability to achieve something that simple. When Google looked at disk
encryption technology (I don't know if they finally deployed it), it worked
differently disk to disk and firmware rev to firmware rev. I think it was Brian
Pawlowski at NetApp who said, "You can count on two things working right in a
hard drive: read, write, and seek." The joke being that you needed all three of
them to work for reliable disk operation.

------
teraflop
Here's an Ubuntu bug tracker entry for what sounds like the same problem:
[https://bugs.launchpad.net/ubuntu/+source/fstrim/+bug/144900...](https://bugs.launchpad.net/ubuntu/+source/fstrim/+bug/1449005)

Linux 4.0.5 includes a patch that blacklists queued TRIM for the buggy drives.
Windows and OS X apparently don't support queued TRIM at all, so they're
unaffected.

~~~
adamsurak
The drives on which we detected the issue were still using un-queued TRIM. I
reached out to one of the kernel I/O developers for help, and he confirmed
that it is not related.

~~~
asayler
But isn't the blacklist you link to in the article specifically for queued
TRIM? E.g.
[https://github.com/torvalds/linux/commit/9a9324d](https://github.com/torvalds/linux/commit/9a9324d).
So either that blacklist has nothing to do with this issue (in which case it
probably shouldn't be linked from the article), or it does, and we're talking
about issues with queued TRIM.

~~~
lolgay5
The article very clearly states that the issue had nothing to do with queued
or unqueued TRIM:

 _Our affected drives did not match any pattern so they were implicitly
allowed full operation._

See the list:

    
    
      SAMSUNG MZ7WD480HCGM-00003
      SAMSUNG MZ7GE480HMHP-00003
      SAMSUNG MZ7GE240HMGR-00003

------
jlebar
To me, this sort of thing brings home the value of not running your own
machines. Sure, Amazon's/Google's clouds have quirks, but it's far less likely
that you're going to have to debug faulty hardware in this way. It sounds like
a team of more than one person worked on this at least part-time for weeks --
how much is that worth? It's not just the cost of hiring extra people to do
the work; often small companies simply can't hire enough good people -- when
you do find them, do you want to squander them twiddling servers?

~~~
madez
There is no cloud - just other people's computers.

Many use cases simply require the job to be done on your own computers for
security and privacy reasons. Yes, Amazon's and Google's services are in some
ways less secure than your own computer, because they are hosted by companies
subject to a government that doesn't value privacy, not even that of its own
citizens. That means said government can, just to give a concrete example, NSL
the companies into giving up _all_ they have about you, and you wouldn't even
know.

When the government puts national security above fundamental human rights
there is something dangerously wrong.

~~~
derefr
Thinking about individual computers will lead you astray. There are, rather,
_sets_ of machines (from single boxes to entire data-centers) that are managed
by a given sysadmin staff. The more machines they manage, the more likely it
is that problems will have institutionalized and operationalized solutions.

A cloud is just a sysadmin staff with a Sufficiently Large Deployment to have
ironed out all the kinks in their hardware.

~~~
vidarh
> A cloud is just a sysadmin staff with a Sufficiently Large Deployment to
> have ironed out all the kinks in their hardware.

By that definition, I don't think there are any clouds.

~~~
derefr
True, by the literal definition. I continue to interpret "cloud" as "that
mysterious part in the middle of the diagram which is a clean encapsulation of
Somebody Else's Problem that never bothers you"; obviously, there are no
_true_ "clouds" (and there cannot be) by that definition.

But people can try, and they can get close; and one can say that something is
a cloud _to the degree_ that it manages to fulfill the "amorphous shape in
your diagram you don't have to worry about" promise. So there are some
80%-clouds, some 95%-clouds, some 99.995%-clouds, and so on.

The point I was trying to make is that the degree to which a cloud achieves
that promise is correlated to the size (and longevity, and homogeneity) of the
deployment. The more man-years have gone into taking care of a given server
type at a given DC, the more institutional knowledge is ready-at-hand to solve
a problem on _your_ machine of that type, and so the fewer _issues_ become
_emergencies_ that break out of the "cloud" abstraction to require your
attention.

And it was a reply to the parent precisely because a security problem is just
such an "emergency" that represents a failure of institutional knowledge: I
would much sooner trust AWS's KMS to not leak my private keys than I would
trust a machine I was running myself to not leak my private keys. I'm a much
worse sysadmin than AWS!

------
MrBuddyCasino
Not directly related to TRIM, but AeroSpike has a nice test suite for SSDs,
probing for IOPS and latency:
[https://github.com/aerospike/act](https://github.com/aerospike/act)

They share their test results for both physical and cloud-based storage, I
figured this would be of interest:

[http://www.aerospike.com/docs/operations/plan/ssd/ssd_certif...](http://www.aerospike.com/docs/operations/plan/ssd/ssd_certification.html)

------
madez
It feels like Samsung used the Linux community here as a free testbed.

Samsung knew that only Linux supported queued trim, so releasing it without
proper testing is just externalizing the disproportionately increased cost of
testing to the Linux community.

~~~
caf
Is the loss of reputation really worth less than the value of the externalised
testing?

~~~
pjc50
Loss of reputation isn't a real thing in this industry. Pretty much all hard
drive manufacturers have had high-profile "bad" models, for example.

~~~
vidarh
It's a real enough thing that the IBM DeathStar incident [1] [2] was a large
factor in making IBM exit the hard-disk market (sold off to Hitachi)

[1]
[https://en.wikipedia.org/wiki/HGST_Deskstar](https://en.wikipedia.org/wiki/HGST_Deskstar)

[2]
[http://www.astro.ufl.edu/~ken/crash/index.html](http://www.astro.ufl.edu/~ken/crash/index.html)

~~~
pjc50
But they're still in business making hard drives, under the DeskStar name.
Seagate had a round of failures at one point. I'm sure there are people out
there who've sworn off WD as well.

~~~
vidarh
Who? If you're referring to IBM, then they're not. IBM sold off their entire
hard-disk division to Hitachi (which a few years ago sold it off to WD).

If you're referring to Hitachi, then they did continue it, yes, but they
bought it in a fire sale, and their name was not attached to the original
affair, so they presumably did not see it as particularly risky.

------
cabirum
Strange; the Samsung 840/850 EVO/Pro are considered [1][2] among the best
consumer SSDs. The issues the article mentions do not exist on Windows, where
the SSDs are very reliable. I suspect it's not only Samsung's fault. Are we
sure Linux's handling of TRIM operations is absolutely correct?

[1] [http://techreport.com/review/27062/the-ssd-endurance-
experim...](http://techreport.com/review/27062/the-ssd-endurance-experiment-
only-two-remain-after-1-5pb)

[2] [http://www.anandtech.com/show/8216/samsung-
ssd-850-pro-128gb...](http://www.anandtech.com/show/8216/samsung-
ssd-850-pro-128gb-256gb-1tb-review-enter-the-3d-era/13)

~~~
drzaiusapelord
Personally, I find Samsung has an "it boots? Fine, then ship" mentality for
pretty much all things: their buggy phones, buggy SSDs, buggy TVs, etc. I
wouldn't recommend them, even though they do well on SSD speed tests (which
are often gamed by on-board RAM caching).

~~~
scott_karana
The 840 Pro exceeded 2.4PB of writes before failing in TechReport's endurance
experiment, which ran over 18 months:
[http://techreport.com/review/27909/the-ssd-endurance-
experim...](http://techreport.com/review/27909/the-ssd-endurance-experiment-
theyre-all-dead/)

Even if Samsung has _some_ systemic problems, it's more subtle than just
schlocky marketing, or targeted benchmarking.

------
sandGorgon
I have this running as a weekly cron job on my Ubuntu ThinkPad with a Samsung
840 Pro. Should I turn it off?

    
    
      #!/bin/sh
      # call fstrim-all to trim all mounted file systems which support it
      set -e
      
      # This only runs on Intel and Samsung SSDs by default, as some SSDs with faulty
      # firmware may encounter data loss problems when running fstrim under high I/O
      # load (e. g.  https://launchpad.net/bugs/1259829). You can append the
      # --no-model-check option here to disable the vendor check and run fstrim on
      # all SSD drives.
      exec fstrim-all

~~~
teraflop
Probably, unless you're running a kernel that was released within the last
couple of weeks and includes this patch:
[https://github.com/torvalds/linux/commit/9a9324d](https://github.com/torvalds/linux/commit/9a9324d)
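A quick way to check whether a running kernel already carries that blacklist;
a sketch with the version hardcoded for illustration (on a live system you
would substitute the output of uname -r):

```shell
# Hardcoded for illustration; a live system would use:
#   have=$(uname -r | cut -d- -f1)
have=4.1.0
need=4.0.5   # first mainline release carrying the queued-TRIM blacklist patch

# sort -V orders version strings numerically; if $need sorts first,
# the running kernel is at least $need
if [ "$(printf '%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]; then
  echo "kernel >= $need: blacklist patch included"
else
  echo "kernel < $need: consider disabling TRIM"
fi
# prints: kernel >= 4.0.5: blacklist patch included
```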

~~~
sandGorgon
wow - thanks. did NOT know about this.

------
notacoward
Pretty disappointing to see some of those Samsung drives on the list, because
in some of the other tests/surveys I've seen they seemed to be among the
better choices. _Sigh_ I guess Sturgeon's Law applies to SSDs too.

------
Aardwolf
"Samsung SSD 850 PRO 512GB recently blacklisted as 850 Pro and later in
8-series blacklist"

That's what I have in my home computer, with ArchLinux.

Do you think this problem is something particular to the servers of the
article's author, or should it be interpreted as:

Linux + Samsung 850 = you will lose your data?

Thanks...

~~~
adamsurak
Unless you run the latest kernel, I would disable TRIM.

------
cft
Using SAS SSD drives in a server is a bad idea for many reasons. One should
use PCIe cards that sit directly on the PCIe bus, such as FusionIO or SanDisk.
They have been tested and retested (e.g. by Facebook), without the unnecessary
added complexity of the SAS/SATA protocols. The I/O performance is also about
20x.

~~~
baruch
I don't think that testing by Facebook is going to help you unless you are
using the exact same model as they are and are assured of running their exact
firmware. At work we use SAS SSDs in large quantities, and the firmware we use
is customized for us (based on the mainline one). Do not assume that a bug
fixed in our firmware was necessarily fixed in the normal one. One would think
it would be, but it is possible that it wasn't ported to the mainline
firmware.

------
andmarios
Been there, done that. :|

Sometime around the end of 2013, I started getting frequently lost data and
corrupted filesystems upon reboot. After much searching, and about 4-6 months
into the issue, I found out that the culprit was the queued TRIM commands
issued by the Linux kernel to my Crucial M500 mSATA disk. The Linux kernel
already had a quirks list with many drives, including some of the M500
variants, just not mine.

I added my model, compiled the kernel, and the nightmare ended. I proceeded to
submit a bug report and a patch. The patch got accepted (yay!), and the bug
report turned out to be very useful for other people with the same problem but
a different disk, as I included the dmesg output specific to the issue. This
meant they could now google the errors and get a helpful result.

Such is the nature of free software; you are allowed to fix your computer
yourself. :)

------
mrmondo
I've worked a lot on interesting SSD deployments and experiments over the past
12 months. Quite honestly, I wouldn't go anywhere near Samsung products,
regardless of their 'PRO' labelling or otherwise.

We have had great success with both Sandisk Extreme Pro SATA and Intel DC NVMe
series drives, we've also recently deployed a number of Crucial 'Micron' M600
1TB SATA drives that are performing very well and so far haven't given us any
issues.

~~~
u02sgb
I've done similar over the last three years and had good luck with the Crucial
drives. However if you take a look at the Linux Kernel patch they link to
(search for "don't properly handle queued TRIM"):
[https://github.com/torvalds/linux/blob/e64f638483a21105c7ce3...](https://github.com/torvalds/linux/blob/e64f638483a21105c7ce330d543fa1f1c35b5bc7/drivers/ata/libata-
core.c#L4109-L4286)

There are Crucial SSDs on the list. I'm going to be keeping a closer eye on
them now.

~~~
mrmondo
Yeah, I saw that - although that's the older, now-discontinued series, which
has a different controller and doesn't show the same consistent performance as
the newer M600 drives.

------
suprjami
What a wonderful story. I wish everyone was this diligent at troubleshooting.
Then again, that would put me out of a job.

------
douglasheriot
Wow, that sucks. Another reason to use ZFS – you’d notice the corrupted files
a lot sooner.

~~~
ThatPlayer
Or Btrfs on Linux.

~~~
rleigh
In theory, yes. Unfortunately, every time my Btrfs filesystems have
encountered a hardware glitch, it has happily trashed the filesystem beyond
recovery (including both drives in a RAID1 mirror, one of which was perfectly
OK). I use ZFS now, and while some of its features are comparable with Btrfs,
the implementation quality, documentation, feature completeness, and tool
quality set it well above where Btrfs is at.

~~~
naranja
I fully second that: I'm using Btrfs for / and ZFS for /srv. So many
filesystems trashed beyond recovery on Btrfs; so much joy, stability, and ease
of tooling with ZFS.

I'm seriously considering migrating / to ZFS now.

------
microcolonel
I've had issues with these Samsung 8xx drives; unfortunately, they all failed
at once. I gave up on their RMA/warranty process because I was bounced back
and forth between the same two phone numbers a few times. Each side said the
other was in charge of the process (Samsung bought the SSD division from
Seagate... or was it Seagate that bought the HDD division from Samsung? To
this day I have no clue).

------
bbcbasic
I have a Samsung SSD 850 PRO 512GB in my Windows PC. And I have TRIM enabled
in Windows:

    
    
     > fsutil.exe behavior query DisableDeleteNotify
         DisableDeleteNotify = 0
    

Should I be worried?

~~~
ploxiln
Released versions of Windows do not use queued trim.

(That's why serious bugs like this can happen ;)

~~~
bad_user
The problem exposed in the article is about un-queued trim.

------
lvs
Can someone clarify the article's claim that these Samsung drives are really
"broken" as such? We have a few of these on 3.13 and 3.16 kernels with ext4,
and no problems. It seems there must be something unique to their application
that exposes these TRIM failures.

~~~
ploxiln
Do you have the "discard" mount option enabled? Do you have a cron job that
runs the "fstrim" command? It's possible your systems are not running trim. Or
maybe your ext4 filesystems have little activity and you haven't had enough
corruption to notice yet :)

Also, some Samsung 800 series drives only gained this bug in a recent firmware
update (840 EVO specifically).
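To make those first two checks concrete; a sketch in which a sample line
stands in for a real /proc/mounts entry:

```shell
# Sample /proc/mounts entry; on a live system read the real file, e.g.:
#   grep ' / ' /proc/mounts
line='/dev/sda1 / ext4 rw,relatime,discard 0 0'

# Field 4 holds the comma-separated mount options
opts=$(printf '%s\n' "$line" | awk '{print $4}')
case ",$opts," in
  *,discard,*) echo "online TRIM (discard) enabled" ;;
  *)           echo "online TRIM not enabled" ;;
esac
# prints: online TRIM (discard) enabled
```

The cron side is just whether something like /etc/cron.weekly/fstrim (the
Ubuntu path quoted elsewhere in the thread) exists and is executable.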

~~~
guns
The 840 EVO joined the club with firmware EXT0DB6Q, which itself is a nasty
little hack around a fundamental design problem with the tightly packed NAND
cells.

Linux 4.0.5 ships with the patch linked above, but for a while you had to roll
with a kernel built from source.

EDIT: The blatant file corruption issues only manifested after updating to
firmware EXT0DB6Q.

~~~
Figs
Is there a list of firmware versions with release dates somewhere? I can't
seem to find a changelog.

------
Aardwolf
I'm so sick of this TRIM. Constant configuration needed because of it,
constant care like "this thing you'd better not do on SSDs", and then problems
like this.

Do you think there'll ever be SSDs that don't need it?

~~~
yardie
I remember when Apple started incorporating SSDs into their computers and
didn't support TRIM. Windows users were telling Mac users their Macs were
practically obsolete because they couldn't do this one thing that was enabled
for Windows. Of course, Mac users fed that back to Apple, and Apple replied,
for years, that you don't need it.

Eventually, they relented and enabled it on their SSDs. I'm pretty sure
marketing and engineering butted heads over this one stupid bullet point.

~~~
drzaiusapelord
Except without TRIM you'll fill all your blocks and kill the performance of
your fancy $1500 Apple, with the SSD performing a dozen operations to make
space for a write instead of one operation on a properly TRIM'd drive.

Apple didn't do this because of "Windows users whining" but because they knew
they didn't want an angry mob of customers wondering why their drive is 10x
slower than it was on day one.

Arguably, idle GC was "good enough" for some use cases, but probably not for
drives that aren't sitting idle all the time and are on many hours a day. Even
then, Apple probably didn't want to tell its customers to "let it sit out
overnight" to regain performance when supporting plain-jane TRIM was a trivial
addition.

On-board GC + OS-driven TRIM is considered the optimal solution for SSDs.

------
kbar13
if one machine failed and failover kicked in correctly, why was the engineer
paged?

~~~
jimrandomh
Because it's hard to make an automatic monitoring system that reliably
distinguishes between "a failure occurred but everything is fine" and "a
failure occurred and now everything is on fire".

------
stream_fusion
I have one of the affected drives mentioned in the article in my development
laptop - the Samsung SSD 850 PRO 512GB.

It's one of the most expensive SSD drives available on the market, so it was
disconcerting to find dmesg -T showing TRIM errors when the drive was mounted
with the discard option. Research on mailing lists indicated that the driver
devs believe it's a Samsung firmware issue.

Disabling TRIM in fstab stopped the error messages. However, it's difficult to
get good information about whether drive performance or longevity may be
impacted without TRIM support.
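For reference, disabling online TRIM here just means dropping the discard
option from the relevant /etc/fstab line; a sketch in which the UUID and
option set are placeholders:

```shell
# /etc/fstab fragment (illustrative; UUID and options are placeholders)
# before:
#   UUID=0123-4567  /  ext4  defaults,noatime,discard  0  1
# after (discard removed; remount or reboot for it to take effect):
#   UUID=0123-4567  /  ext4  defaults,noatime          0  1
```

Periodic trimming via an fstrim cron job can be kept or dropped independently
of the mount option.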

~~~
hvidgaard
TRIM is really only a helpful hint for when the drive is near full, so the GC
can preemptively erase blocks and retain good write speed. Without TRIM, the
firmware must wait until it gets a write for a particular block before it
knows the old contents can be erased.

If your drive has a reasonable amount of unprovisioned space, it can simply
work around the missing TRIM commands. This is theory, however; I do not know
whether any given firmware actually does this. It is exactly what makes some
drives better than others when working without TRIM.
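As a back-of-the-envelope sketch of what "leaving unprovisioned space" amounts
to (the sector count is an example figure for a nominal 512GB drive, not a
spec for any particular model, and 10% is a commonly suggested rule of thumb,
not an official number):

```shell
# Example figure for a nominal 512GB drive; real drives vary slightly
sectors=1000215216

# Reserve ~10% as never-partitioned space the firmware can use freely
reserve=$((sectors / 10))
usable=$((sectors - reserve))

echo "partition up to sector $usable, leave $reserve sectors unused"
# prints: partition up to sector 900193695, leave 100021521 sectors unused
```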

~~~
stream_fusion
Thanks. I'll probably end up leaving some space unprovisioned. It's
frustrating precisely because of the uncertainty about future performance,
especially given the price premium for pro/enterprise-level hardware.

~~~
hvidgaard
You can research whether the firmware understands MBR and GPT; if it only
understands one, then you have to use that. Alternatively, use Samsung's own
software (I think it's called Magician, but I can't remember exactly); it will
make sure you have the unprovisioned space set up correctly.

------
anigbrowl
Interesting! I sometimes work with SSDs as storage media for cameras (where
Sandisk is the most popular brand by a mile) and I seriously doubt any camera
firmware is doing drive maintenance. From what I know of digital imaging
technicians, neither are they - if a drive starts acting up in any way, the
usual policy is to just take it out of service immediately, recover anything
that was on it, dump it, and buy a replacement.

------
sengork
Given how many Samsung drives are listed in their findings, I can only
attribute this to the fact that Samsung makes its own SSD controllers.

------
Figs
How do you disable TRIM on common distros? Under Ubuntu, is it just preventing
/etc/cron.weekly/fstrim from running, or is there more to it? What about
CentOS, etc?
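On Ubuntu of that era, that was essentially it: stop the weekly job and make
sure nothing mounts with the discard option. A sketch (the cron path is the
stock Ubuntu one and is an assumption for other distros):

```shell
# Stop the weekly job from running (stock Ubuntu path of that era)
sudo chmod -x /etc/cron.weekly/fstrim

# Check that no filesystem is mounted with the "discard" option, which
# issues TRIM continuously rather than weekly; remove it from /etc/fstab
# if present, then remount
grep discard /etc/fstab /proc/mounts
```

As far as I know, CentOS of that era didn't ship a periodic fstrim job by
default, so there the fstab discard option is usually the only thing to check.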

------
frik
What SSDs do cloud hosters like DigitalOcean, Linode, Rackspace, Vultr, etc.
use?

I would assume some sites trade storage speed for more space (HDDs instead of
SSDs).

------
Supersaiyan_IV
Undoubtedly the same issue happened to me, on a 500GB 840 EVO with NTFS.

The SSD zeroed out part of the disk during runtime; as I watched it happen,
music was playing from that drive. It was mounted from Ubuntu MATE 15.04,
playing a music library through Audacious. Suddenly the music glitched and I/O
errors began appearing. Rebooted to a DISK READ ERROR (the MBR was on the
EVO). Ran chkdsk from USB, and it showed a ridiculous number of orphaned files
for about an hour. Once it finished, the _most frequently accessed_ files had
disappeared: the Downloads folder, the Documents folder, some system files. Of
course, some of the files could have been recovered had I not run chkdsk right
off the bat, but nonetheless it's an approximate measure of the failure's
impact.

I first became suspicious of the 840 EVO when sorting old files by date became
fantastically slow. If you have a feeling this has happened to you recently,
buckle up for a shitstorm.

TL;DR: Avoid the 840 EVO.

~~~
Supersaiyan_IV
To the downvoters: this occurred a week after upgrading to Samsung's EXT0DB6Q
firmware, meaning the mentioned read delays should have been nonexistent.

Not to mention that this disk has only had 5TB written to it.

