
Real world SSD wearout - MBCook
https://blog.okmeter.io/real-world-ssd-wearout-a3396a35c663
======
ScoJoh
Place I work got hit HARD by this. A year and a half before I started at this
company, they set up 4 SSDs in a RAID 5 config.

One day while working on things I noticed a message on our server indicating a
PDR1001 error (Predictive Failure), so we ordered a replacement. The new SSD
arrived, we popped it in, and the RAID started to rebuild.... Lo and behold,
during that operation Drive 1 threw the same error and the whole thing came
crashing down...

We ended up losing the whole array. I had NO idea we had SSDs in the system.
I had no idea that no one was monitoring their life.... The moment I saw we
had this issue I saw the writing on the wall. 4 SSDs in a RAID 5, all
installed at the same time... means all the SSDs reach approximately the same
critical end of life.

All I could do was shake my head at the whole thing... Pretty sure those in
charge who set up the array still don't understand why this situation was
100% avoidable...
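A minimal sketch of the check that was missing, assuming you can read a wear-used percentage for each drive and track how fast it climbs (the drive names and numbers are hypothetical): project each drive's days until 100% rated wear, and flag the array when more drives than it can lose are due within the same window.

```python
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    wear_used_pct: float     # share of rated endurance already consumed, 0-100
    wear_pct_per_day: float  # observed daily increase in wear_used_pct

def days_until_worn(drive: Drive) -> float:
    """Linear projection of days until the drive hits 100% rated wear."""
    if drive.wear_pct_per_day <= 0:
        return float("inf")
    return max(0.0, (100.0 - drive.wear_used_pct) / drive.wear_pct_per_day)

def correlated_wearout(drives: list[Drive], window_days: float,
                       tolerated_failures: int = 1) -> bool:
    """True if more drives than the array can lose (1 for RAID 5) are
    projected to wear out within `window_days` of the first one."""
    etas = sorted(days_until_worn(d) for d in drives)
    clustered = [eta for eta in etas if eta - etas[0] <= window_days]
    return len(clustered) > tolerated_failures
```

Four drives installed together under the same load, e.g. `[Drive(f"disk{i}", 92.0, 0.05) for i in range(4)]`, all project to about 160 days and trip the check. The same drives installed at staggered times would not.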

~~~
arminiusreturns
Besides the SSDs, who in their right mind uses RAID 5 anymore? It's been dead
to me for years... since the first time I had to rebuild a RAID 5 of 4 TB disks
and did the math on the time window for cascading failures. Also, not to
nitpick, but you shouldn't wait to order a replacement that has to be shipped
before you can rebuild; you should have spares ready to go at all times for all
RAID systems. That might have been the difference between a cascade and a
normal rebuild.
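The cascade math can be sketched roughly as follows. It assumes the unrecoverable-read-error (URE) rate of 1 per 10^14 bits often quoted on consumer HDD datasheets and treats errors as independent, which is a simplification. `rebuild_mb_per_s` is a hypothetical throughput. For a 4-disk RAID 5 of 4 TB drives, the three survivors must be read in full (about 9.6 x 10^13 bits), so the chance of a clean read is roughly e^-0.96, or about 38%.

```python
import math

def p_rebuild_without_ure(surviving_disks: int, disk_tb: float,
                          ure_per_bits: float = 1e14) -> float:
    """Probability of reading every surviving disk in full without an
    unrecoverable read error, assuming independent errors at 1 per N bits."""
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # (1 - 1/N) ** bits_read, computed stably via log1p
    return math.exp(bits_read * math.log1p(-1.0 / ure_per_bits))

def rebuild_hours(disk_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower bound on the rebuild window: the new disk must be written in full."""
    return disk_tb * 1e12 / (rebuild_mb_per_s * 1e6) / 3600
```

At 100 MB/s, `rebuild_hours(4, 100)` comes to about 11 hours of exposure, and a rebuild competing with production load can take longer.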

~~~
simcop2387
Yeah, any time I'm using a RAID setup (as opposed to a cluster or another more
intelligent system like ZFS) I insist that there be at least one hot spare and
a cold one ready. And never let anyone convince you a RAID is a backup.

~~~
alexsb92
As someone looking to back up a few terabytes of photos and videos, which I do
have on a RAID, what constitutes an actual backup?

~~~
Max_aaa
An actual backup would be a copy of said photos and videos on:

- An external drive (which is only connected to copy files).
- An external cloud service.
- Another computer that you have.

You should probably do a couple of these.

I personally back up home computers (using borg) to a home server. That server
has a 2.5-inch 2 TB external HDD connected to it (2 other 2.5-inch external
drives are kept outside of the house). A backup of important files from the
NAS (including the computer backups) gets copied over to the external drive
nightly. The drives get rotated weekly.

The really important stuff is also backed up offsite on a daily basis.

------
S_A_P
This is a useful datapoint, but I feel the author could learn a thing or two
from the Backblaze disk report. It looks like there are a few use cases that
cause high wear on SSDs. I don't see a lot of concrete numbers here. To me, it
is more useful to say something like: these SSDs will last 2 years given that
their sustained write throughput is X GB/sec on average.

From the SSD torture tests I've seen, the average SSD must write many
petabytes of data before getting anywhere near "weared out".
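The kind of number being asked for can be computed from a datasheet endurance rating. Here is a minimal sketch, assuming a constant average host write rate and a hypothetical 600 TBW drive:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_of_life(endurance_tbw: float, avg_write_mb_per_s: float) -> float:
    """Years until the rated endurance (terabytes written) is used up at a
    constant average host write rate."""
    bytes_per_year = avg_write_mb_per_s * 1e6 * SECONDS_PER_YEAR
    return endurance_tbw * 1e12 / bytes_per_year
```

`years_of_life(600, 10)` gives about 1.9 years at a sustained 10 MB/s. Halving the write rate doubles the estimate.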

~~~
nomel
Some care has to be taken when saying a "petabyte of data" with flash media.
You don't have to write a petabyte of data to reach a petabyte of SSD writes.
The minimum block size that flash can write is an "erase block", which is
usually a few megabytes.

This means that the minimum SSD write will be at least a few megabytes,
including those few hundred bytes to that log file. Likewise, every flushed
write will be rounded up, in size, to the nearest erase block.

So, to write a few petabytes to the drive, you only need a few gigabytes of
data written and flushed one byte at a time, since your actual NAND writes are
amplified by a few million.

There's cache and many smart algorithms at play to minimize the number of
erase block writes, but once you cause a flush (close the file, etc), you're
performing a write to flash.

This is also why any embedded system with logging enabled has a very real
maximum operating life. This is fun to discover the first time, when all of
your customers start saying your product stopped working within the same
couple of months.
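Here is a small model of the rounding described above, for illustration only. It applies the pessimistic assumption that every flush is rounded up to a whole `unit_bytes` with no coalescing. The 4 MiB and 16 KiB unit sizes are assumptions, and, as the reply below points out, real drives program pages rather than whole erase blocks.

```python
def nand_bytes_written(flush_sizes: list[int], unit_bytes: int) -> int:
    """Bytes physically programmed if each flushed write is rounded up to a
    whole number of `unit_bytes` (pessimistic: nothing is coalesced)."""
    return sum(-(-size // unit_bytes) * unit_bytes for size in flush_sizes)

def amplification(flush_sizes: list[int], unit_bytes: int) -> float:
    """Ratio of NAND bytes programmed to host bytes written."""
    host_bytes = sum(flush_sizes)
    if host_bytes == 0:
        return 0.0
    return nand_bytes_written(flush_sizes, unit_bytes) / host_bytes
```

With 1-byte flushes and a 4 MiB unit, `amplification([1], 4 * 1024 * 1024)` is 4,194,304, which is where a figure of "a few million" comes from. 200-byte log lines against a 16 KiB page give about 82x.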

~~~
xenofanes
Some clarifications: the minimum SSD write is much smaller than the erase
block. The erase block is the minimum block size that can be erased. For SSDs,
there are 2 key concepts:

1. Read granularity < write granularity < erase granularity.
2. Cells that have been written to must be erased before being written again.

Most SSD vendors will buffer write contents to collect a `write granularity`
worth of data to avoid wasting bytes writing padding. This can be hard to do
on drives without capacitors to supply backup power in the event of power
loss.
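The buffering idea can be sketched with a toy model, assuming a hypothetical 16 KiB page. Real controllers also juggle mapping tables, garbage collection, and power-loss protection. The model counts how many pages get programmed when small writes are collected versus forced out one by one.

```python
class PageBuffer:
    """Toy model of a controller write buffer: small host writes are
    collected until a full page can be programmed at once."""

    def __init__(self, page_bytes: int):
        self.page_bytes = page_bytes
        self.pending = 0           # bytes buffered but not yet programmed
        self.pages_programmed = 0

    def write(self, nbytes: int) -> None:
        self.pending += nbytes
        full_pages, self.pending = divmod(self.pending, self.page_bytes)
        self.pages_programmed += full_pages

    def flush(self) -> None:
        """Forced flush: a partially filled page is programmed, padding included."""
        if self.pending:
            self.pages_programmed += 1
            self.pending = 0
```

One thousand 200-byte writes fill 12 pages and leave 3,392 bytes pending, so a single final flush programs 13 pages in total. Flushing after every write programs 1,000.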

~~~
nomel
I stand corrected. Our application was a circular buffer, so all non-cached
writes trigger an erase. I suppose this isn't a normal use case, unless your
drive is 100% full, so maybe tens of gigs rather than gigs, to write a
petabyte. ;)

------
otterpro
Is the SMART attribute “media wearout indicator” only available on enterprise-
grade SSD and not on consumer-grade SSD? I checked my Samsung SSD as well as
NVMe drives and I didn't find it on CrystalDiskInfo.

EDIT: It looks like the "wear leveling count" attribute is available on Intel
(#233) and some high-end Samsungs (#177), as well as other SSDs, but it's not
on any of my SSDs.

~~~
vthriller
If you're using smartmontools, try running `update-smart-drivedb` first: those
fancy attribute names don't come from the drives themselves but are always
interpreted from the (model, firmware, attribute id) tuple by the software
you're using to query the disks.

Or you can look in [0] directly if you just want to know which drives
Media_Wearout_Indicator is currently defined for.

[0]
[https://raw.githubusercontent.com/mirror/smartmontools/maste...](https://raw.githubusercontent.com/mirror/smartmontools/master/drivedb.h)
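To illustrate why the tuple matters, here is a hypothetical lookup in the spirit of a drive database. The model names, firmware patterns, and slot assignments are made up, not taken from drivedb.h. The same attribute ID can mean different things, or nothing known, depending on model and firmware.

```python
import re

# Hypothetical entries: attribute meaning is keyed on model and firmware
# patterns, not on the attribute ID alone.
DRIVE_DB = [
    # (model regex, firmware regex, {attribute id: name})
    (r"^ExampleCorp SSD 5\d\d", r".*", {233: "Media_Wearout_Indicator"}),
    (r"^OtherVendor X1", r"^1\.", {177: "Wear_Leveling_Count"}),
    (r"^OtherVendor X1", r"^2\.", {230: "Media_Wearout_Indicator"}),
]

def attribute_name(model: str, firmware: str, attr_id: int) -> str:
    """Return the first matching name for this (model, firmware, id) tuple."""
    for model_re, fw_re, names in DRIVE_DB:
        if re.search(model_re, model) and re.search(fw_re, firmware) and attr_id in names:
            return names[attr_id]
    return f"Unknown_Attribute_{attr_id}"
```

Here, attribute 177 on an `OtherVendor X1` has a name under 1.x firmware and is unknown under 2.x. This is why updating the database can change what a tool reports for the very same drive.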

------
NikolaNovak
Is there a recommended generalized tool (non-vendor specific) that can be
reliably used to determine your SSD's health and outlook?

(Unfortunately I have a mix of SSDs in use, even in single personal computer,
and all the vendor individual software gets overly complex and bloated...)

~~~
yjftsjthsd-h
On Unix-likes, I believe smartmontools works?

~~~
rincebrain
smartmontools also works great on Windows, OS X, ...
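One vendor-neutral approach is to ask smartctl for JSON and pull out whatever wear fields are present. This sketch assumes smartmontools 7.0 or later (which added `-j`/`--json`). It also assumes that the wear data lives either in the NVMe `percentage_used` field or in one of the SATA attribute IDs mentioned in this thread (177, 230, 233). As noted elsewhere in the thread, what those IDs mean depends on the drive.

```python
import json
import subprocess

# SATA attribute IDs mentioned in this thread as wear indicators on some drives;
# their meaning is model- and firmware-specific.
WEAR_ATTRIBUTE_IDS = {177, 230, 233}

def read_smart_json(device: str) -> dict:
    """Run smartctl with JSON output (smartmontools 7.0+). smartctl's exit
    status is a bit mask, so a nonzero code is not treated as fatal here."""
    proc = subprocess.run(["smartctl", "-j", "-a", device],
                          capture_output=True, text=True)
    return json.loads(proc.stdout)

def wear_summary(report: dict) -> dict:
    """Extract NVMe 'percentage_used' if present, else the normalized values
    of any SATA attributes in WEAR_ATTRIBUTE_IDS."""
    nvme = report.get("nvme_smart_health_information_log", {})
    if "percentage_used" in nvme:
        return {"percentage_used": nvme["percentage_used"]}
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {f"{row['id']}:{row['name']}": row["value"]
            for row in table if row["id"] in WEAR_ATTRIBUTE_IDS}
```

On a machine with smartctl installed, `wear_summary(read_smart_json("/dev/sda"))` gives a quick per-drive view. Querying drives usually needs root or administrator rights.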

------
spydum
So I'm confused. I get that the wearout level doesn't go lower than 1%, but
that doesn't mean the drive failed, right?

~~~
mjb
That's very dependent on the drive vendor, model, and even firmware version.
In many cases, drives can go on working for a very long time after the SMART
metrics show it has failed. Whether you want to take the "risk" depends a lot
on how critical this particular device is in your infrastructure.

~~~
tenebrisalietum
I have a 16GB SSD that used to be in a POS terminal before I bought it
secondhand. The SSD Wear Leveling Count attribute shows as failed. Still works.

I have chosen to use it as part of a RAID1 for an OS install on a home server.
Waiting to see how long it will last.

~~~
WrtCdEvrydy
It's kinda like the expiration date for foods, an estimate.

------
cjensen
Worth pointing out: good manufacturers like Micron/Crucial publish endurance
figures for their drives. Endurance is the number of bytes you can write
before the drive is officially worn out. Endurance numbers can vary by two
orders of magnitude.

So if you have a write-intensive app like a Jenkins server building a C++
project, you might want to pony up for a server-class drive with better
endurance.

~~~
NullPrefix
>building a C++ project

Build in a RAM disk and then copy whatever you need to permanent storage?

~~~
gruez
not everyone has 64GB of RAM.

~~~
NullPrefix
64GB is cheaper than enterprise SSD.

------
tru_pablo
As the author, I would greatly appreciate any suggestions on what kind of
stats you would want us to gather and share.

~~~
labarna
Just FYI, "to wear (out)" has irregular past tense and past participle forms:
"wore" and "worn". So:

1) These disks, because of constant throughput, wore out.

2) These disks, because of constant throughput, are worn out.

~~~
tru_pablo
thnx

------
brokentone
This article seems to have a trove of interesting data, but struggles to
generalize many conclusions out of it.

~~~
wyldfire
I think that's because it's a thinly veiled advertisement.

------
Fnoord
The advice to consider disabling swap applies equally to (micro)SD cards and
USB sticks. Just make sure you have enough RAM. Also, consider mounting
directories such as /var/log as tmpfs (if you don't need the logs after a
reboot) or using a remote syslog.
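On the application side, Python's standard library can ship logs to a remote syslog collector instead of writing them to a local file. This is a minimal sketch: the logger name and the collector address are placeholders. System-wide forwarding would instead be configured in the syslog daemon itself.

```python
import logging
import logging.handlers

def make_remote_logger(name: str, host: str, port: int = 514) -> logging.Logger:
    """Return a logger that sends records to a remote syslog collector over
    UDP instead of appending to a file on local flash storage."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.handlers.SysLogHandler(address=(host, port))
        handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also pass records to root-logger handlers
    return logger
```

UDP syslog is fire-and-forget, so records sent while the collector is unreachable are simply lost. That trade-off may be acceptable for a device whose flash you are trying to spare.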

------
julienfr112
What really happens when the level is low? Less capacity? Data loss?
Decreased performance?

~~~
pkaye
I've worked on SSD firmware before. Obviously the implementations may vary, but
typically there is no hard failure. These are just projections based on
datasheet endurance specs. What you need to look at additionally is how many
defect blocks there are (another SMART attribute), as that number will start
to increase rapidly as more and more flash blocks go bad.

When the level is low, capacity shouldn't decrease, as there are spare blocks.
There should be no data loss except from catastrophic block failures (some
SSDs have internal RAID-like redundancy). There might be a slight performance
decrease as the error correction algorithms become more active.
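Watching for that rapid increase could look something like this sketch. Which SMART attribute holds the defect count varies by drive. The window and `factor` threshold are hypothetical values to tune, not vendor guidance.

```python
def defect_growth_alert(samples: list[tuple[float, int]], recent: int = 3,
                        factor: float = 3.0) -> bool:
    """samples: chronological (day, defect_block_count) pairs with distinct days.
    Flags when the per-day growth over the last `recent` intervals exceeds
    `factor` times the per-day growth over the earlier history."""
    if len(samples) < recent + 2:
        return False  # not enough history to compare against

    def rate(a: tuple[float, int], b: tuple[float, int]) -> float:
        (d0, c0), (d1, c1) = a, b
        return (c1 - c0) / (d1 - d0)

    split = samples[-recent - 1]
    baseline = rate(samples[0], split)
    latest = rate(split, samples[-1])
    if baseline <= 0:
        return latest > 0  # any growth after a flat history is worth a look
    return latest > factor * baseline
```

A slow, steady climb stays quiet, while a count that suddenly grows several times faster than its history trips the alert.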

~~~
komali2
Those are firmware level error corrections? Or like, driver?

~~~
pkaye
It's in the controller: a combination of hardware and firmware. A large
percentage of errors are handled by the hardware, but a small number require
firmware intervention.

------
apankrat
"Media wearout indicator" is a vendor-specific attribute.

In fact, there are NO standard (or, as the article coyly calls them, "basic")
SMART attributes at all. A lot of vendors put certain attributes into the same
slots, granted, but there's literally no industry-wide spec for any of the
attributes. To interpret an attribute accurately you have to go not by vendor,
and not even by device, but by firmware revision.

The wearout indicator often sits at slot 233, but sometimes it will be in 230,
and sometimes in some other slot entirely. Moreover, in some drives (e.g. OCZ
or ADATA) 233 is not documented at all, and its normalized value will grow
from 0.

So this:

> _we implemented collection not of all the attributes, but only basic and not
> vendor-specific ones_

is misleading and inaccurate.

~~~
jacob019
I'd like to request a request for comments.

~~~
jlgaddis
You don't request one, you write one.

------
true_tuna
I did a similar thing and found even lower failure rates (across thousands of
drives in a high-write database environment). One thing Intel recommended was
to overprovision the drives: setting the max LBA at 80% of drive capacity
preserves optimal wear leveling. I also used a different drive layout, etc.
The long and short of it is that NAND wearout is kinda like quicksand: you
grow up expecting it to be a way bigger problem than it actually is in your
adult life.
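Working out the reduced size is simple arithmetic. This sketch assumes 512-byte logical sectors, a hypothetical 480 GB drive, and that the hidden area is left unused (for example, freshly secure-erased). Actually applying the limit, for example with hdparm's `-N` option (which sets the visible sector count), can make data beyond the new limit inaccessible, so check the vendor's guidance first.

```python
def visible_sectors(capacity_bytes: int, percent: int = 80,
                    sector_bytes: int = 512) -> int:
    """Logical sectors to leave addressable so only `percent` of the drive is
    used; the hidden remainder acts as extra spare area for wear leveling."""
    total_sectors = capacity_bytes // sector_bytes
    return total_sectors * percent // 100  # integer math avoids float rounding
```

For 480,000,000,000 bytes this gives 750,000,000 of the 937,500,000 sectors.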

------
gameswithgo
The redis case is interesting. If the 1 minute interval was somewhat arbitrary
you could almost double the life of the drive by just bumping it to 2 minutes.
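Rough numbers, assuming the dataset changes often enough that every interval triggers a full rewrite of a hypothetical 2 GB dump and a hypothetical 600 TBW drive:

```python
SECONDS_PER_DAY = 86_400

def snapshot_bytes_per_day(dump_bytes: int, interval_s: float) -> float:
    """Bytes written per day if a full dump file is rewritten every interval."""
    return dump_bytes * SECONDS_PER_DAY / interval_s

def days_of_endurance(endurance_tbw: float, bytes_per_day: float) -> float:
    """Days until a rated endurance (in TB written) is used up by this load alone."""
    return endurance_tbw * 1e12 / bytes_per_day
```

At 60 s that is 2.88 TB/day (about 208 days of a 600 TBW rating); at 120 s it is 1.44 TB/day. It's only "almost" double the life because the drive's other writes don't change.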

~~~
krallja
Yes, depending on their use case for this frequent dump file, they may be
happier using AOF or much-less-frequent saves instead. I believe a dump file
is belt-and-suspenders secondary backup for almost all users.

~~~
PeanutNore
I just can't trust a man who doesn't trust his suspenders

~~~
pohl
Maybe it's the belt they don't trust.

------
cmurf
It does very much depend on the "SSD" in question; e.g. SD cards and eMMC
don't tolerate nearly as many writes as a SATA SSD or NVMe drive, and the
former definitely dislike having power cut. I've had a few name-brand SD cards
go permanently read-only as a result: neat that I can still read my data, but
not so neat that I can't erase them before sending them back under warranty.

------
nickjj
It's interesting how long SSDs tend to last on development boxes.

I've had a 256GB Crucial SSD in my primary dev box, and it has 3.6 years of
power-on time across only 94 power cycles. I run a ton of Dockerized apps and
other things.

The "wear leveling count" is at 163, which according to Crucial means the
drive is at about 95% health.
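Here is the arithmetic behind a figure like that, as a sketch. It assumes the attribute's raw value is the average erase count per block and that the drive is rated for 3,000 program/erase cycles. The 3,000 figure is an assumption; check your model's specification.

```python
def remaining_life_pct(avg_block_erases: float, rated_pe_cycles: float = 3000) -> float:
    """Percent of rated P/E cycles left; 3000 is an assumed rating, not a spec."""
    return max(0.0, 100.0 * (1.0 - avg_block_erases / rated_pe_cycles))
```

Under that assumption, a count of 163 works out to about 94.6% remaining, consistent with "about 95%".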

------
rl3
This is why Optane storage is amazing. It wears out about ten times slower,
and has incredible 4K random read performance on top of that.

It almost feels like Intel is squandering a massive advantage by not pushing
it harder. I know Optane DIMMs are coming soon, but an M.2 form factor would
be nice as well.

~~~
Rafuino
What would you use it for yourself? (disclosure: I work at Intel on Optane-
related topics.)

~~~
rl3
The fast 4K random read is handy for delta backups to hard disk; I can read
half a million files in about a minute, determine what changed, and send
whatever small changes off to disk.

Another great use case is web/npm projects with a million dependencies, all
flat files. Working with those is usually miserable otherwise.

In the future I intend to crunch time series data with it.
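One common way to do the "determine what changed" step is to compare file size and modification time between two scans. This is an assumption about the approach, not necessarily what rl3 uses. The scan is dominated by small metadata reads, which is where fast random reads help.

```python
import os

def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Map relative path -> (size, mtime_ns) for every file entry under root
    (symlinks are recorded, not followed)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            st = os.stat(path, follow_symlinks=False)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state

def changed_files(old: dict, new: dict) -> tuple[list[str], list[str]]:
    """Return (added_or_modified, deleted) paths between two snapshots."""
    added_or_modified = sorted(p for p, meta in new.items() if old.get(p) != meta)
    deleted = sorted(p for p in old if p not in new)
    return added_or_modified, deleted
```

Only the paths in the first list need to be read and sent to the backup disk.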

------
Rafuino
For those of you deploying lots of SSDs for work, how often are you having to
replace them due to wear out? Many have 5 years warranties with multiple PB
written (or drive writes per day), so I wonder how things are in practice.
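Here is how to convert between the two ways endurance is usually quoted, as a sketch. It ignores leap days, and the 1.92 TB drive is hypothetical.

```python
def tbw_from_dwpd(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Terabytes written implied by a drive-writes-per-day rating."""
    return dwpd * capacity_tb * warranty_years * 365

def dwpd_from_tbw(tbw_tb: float, capacity_tb: float, warranty_years: float) -> float:
    """Drive writes per day implied by a terabytes-written rating."""
    return tbw_tb / (capacity_tb * warranty_years * 365)
```

1 DWPD on a 1.92 TB drive over 5 years is 3,504 TB, about 3.5 PB written.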

~~~
true_tuna
Wearout is very uncommon. The biggest thing to do is make sure you don't fill
them up. Full drives are unable to do optimal wear leveling.

~~~
Rafuino
Would you do anything different with your storage software or SSD procurement
if you didn't have to worry about filling up SSDs or wear leveling? Wondering
how big of a pain point it is (disclosure: I work on new SSD tech at Intel).

------
ksec
I accidentally discovered something similar as well: I saw that disk writes on
my MacBook had reached 30 TB over the past few weeks, which seemed abnormal,
since most of my work is browsing and coding in an editor. It turned out I had
too many tabs open in Safari, and my Mac was constantly out of memory and
swapping. I was writing anywhere from 100 GB to a few hundred GB per day.

If anyone has an easy solution to this, it would be much appreciated. Right
now I am just checking every few days to see if I did something silly that
causes lots of paging. There was a time I didn't think I needed 32GB of
memory; now I think there is a real need.
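A simple budget check can catch this kind of runaway paging. The sketch below assumes you periodically sample a cumulative bytes-written counter. On many systems, `disk_io_counters().write_bytes` from the third-party psutil package is one such counter. The 50 GB/day budget is an arbitrary example.

```python
SECONDS_PER_DAY = 86_400

def gb_written_per_day(samples: list[tuple[float, int]]) -> float:
    """samples: chronological (unix_time_s, cumulative_bytes_written) readings.
    Assumes the counter did not reset (e.g. at reboot) between the two ends."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    return (b1 - b0) / 1e9 / ((t1 - t0) / SECONDS_PER_DAY)

def over_budget(samples: list[tuple[float, int]], daily_budget_gb: float = 50.0) -> bool:
    """True if the average write rate across the samples exceeds the budget."""
    return gb_written_per_day(samples) > daily_budget_gb
```

Run from a scheduler, this can raise an alert on the first heavy-swap day instead of several weeks later.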

~~~
rasz
>If anyone has an easy solution to this, it would be much appreciated.

- Use a different browser; Vivaldi supports suspending and lazy tab loading.

- Add more RAM; in your Apple case, that probably means buying a new laptop.

------
jarym
I read the related article about Postgres SELECTs sometimes causing writes.
I've used Postgres for years and never knew this!

Anyway, I spend most of my time in the software world and occasionally have to
set up hardware (or, more commonly, diagnose hardware issues). Is there a good
guide on how to set up things like Postgres or Redis, with recommendations on
swap/cache setup, RAID-level selection, etc.?

------
api
We had one SSD die in production, but it was on a system whose write load was
enormous... like 50-60 MB/sec sustained _for over one year_.

~~~
whitepoplar
Do you remember what model it was?

~~~
api
Intel, I think, but I'm not positive. It was a higher-grade SSD. It could have
been an isolated incident, but it was definitely abused.

