Most SSDs provide metrics that are quite readable for estimating the lifetime left in the drive. Most NVMe drives expose a 'Percentage Used' line in the NVMe log (just use smartctl to read it). With SATA drives it is a bit more hit or miss, but the couple of drives I have lying around at the moment record wear leveling in Wear_Leveling_Count, which provides both the raw wear leveling count and the lifetime left as the normalized VALUE.
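If you want to script that check, here is a minimal sketch that shells out to smartctl and pulls the wear indicator from its JSON output. It assumes smartmontools 7.0+ (for --json), root privileges, and that /dev/nvme0 is the right device; the exact JSON field names can vary between smartmontools versions.

```python
# Pull the NVMe wear indicator via smartctl's JSON output.
# Assumes smartmontools 7.0+ and a device path of /dev/nvme0 (example only).
import json
import subprocess

def nvme_percentage_used(device: str = "/dev/nvme0") -> int:
    out = subprocess.run(
        ["smartctl", "-A", "--json", device],
        capture_output=True, text=True, check=True,
    ).stdout
    health = json.loads(out)["nvme_smart_health_information_log"]
    return health["percentage_used"]

if __name__ == "__main__":
    print(f"Percentage used: {nvme_percentage_used()}%")
```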
Hard Disk Sentinel is great for helping with this. The developer has been at it for years, figured out all kinds of quirks of different disks and controllers.
One thing I've always wondered about is how the ECC is handled on a modern SSD with regards to data retention.
SSDs often have a data retention spec that basically defines how long you have until your bits start flipping, and it usually falls off a cliff w.r.t. temperature, which can make SSDs non-ideal for offline backups.
However, I've read that reading from the SSD periodically allows it to detect these errors. Some say that even keeping it just powered on is enough.
My question is: do SSDs run some sort of internal scrub while they're powered on? I don't think so, based on some power-consumption tests I've done.
Also, if they do detect an ECC error, will they actually re-write the block in question, or just correct it and return a successful IO, while leaving the compromised data still on the media?
IIRC, it periodically scans the block-level metadata during lulls. Reads don't take much power, but for some larger drives something like 20% of activity went towards maintenance, and the guaranteed retention period without power was only something like 2-3 months.
Did the question about Apple’s new laptops and their apparently high writes in general usage get resolved? And was it incorrect/misleading reporting, or somehow not a problem (life of SSD still likely to be ‘long enough’) or is it still potentially the case that the machines will be ‘bricked’ due to the non-replaceable SSD dying early?
I don’t mind having to have the SSD replaced by Apple if the cost is reasonable, just as I do with batteries, but would be good to know what to expect.
While I don't like it being that way, I can definitely see the benefits of non-upgradeable RAM. The performance of the SOC on my "lowly", entry-level M1 Air is out of this world.
But an SSD that's glued on to the motherboard has 0 benefits that I can think of and basically only serves to give any computer a hard coded expiration date. And thinness is not an excuse. There are computers as thin as the Air that have removable storage drives.
> But an SSD that's glued on to the motherboard has 0 benefits
It's because, since the T2 chip and continuing with Apple Silicon, they're not SSDs in the NVMe sense. They're an Apple-specific technology derived from the Anobit acquisition that only looks like an NVMe device to the upper layers.
And the only viable mitigation strategies involve giving Apple more money.
Spec'ing a larger onboard SSD to spread out the writes for hopefully longer endurance should be effective.
And, perhaps, opting for more RAM to reduce VM writes to disk? I'm not sure if that's effective. Perhaps the resulting sleep file will be larger as well, resulting in more writes overall. I'm sure somebody can chime in on that?
At least the newer models have SD card slots; you can use SD cards for semi-durable storage. They are obviously unsuitable for some things, but fine for others.
"Thin" is also no longer a word I'd use to describe the current M1 Macbook Pros; they're significantly thicker and heavier than their Intel counterparts of old (bigger battery required to achieve that legendary battery life?)
I have both a 2019 Intel and 2022 M1, and the M1 is more than 0.5KG heavier and more than 0.7mm thicker.
I was surprised to realize they'd made the newer machines significantly thicker and heavier.
Both machines have their pros and cons, neither is perfect or terrible.
All specs are available here [0]. Even if you compare different sizes the 14" M1 is still just ~230g heavier than the 13" Intel. At the same size (13") the differences are what I put above. "More than 0.5kg" means almost half the weight of the laptop. You need a 16" M1 Pro to get there.
At 16" it would still put them at a barely noticeable 100g (at 2kg) and 0.6mm difference. Even after Apple rid itself of Ive's obsession with thinness they would never go for "significantly thicker and heavier" than the previous model in a portable device.
Can't really think of many (any?) laptops that have this mix of thin, light, high performance, and long battery life.
I agree, I'd prefer longer battery life, and better performance in a slight weight and thickness tradeoff.
Sadly, I've tried PC laptops (Surface tablet, Surface laptops and Asus laptops) and the performance (speed, thermals and battery life) are still nowhere near the Apple M1 hardware. I wish there was better price/performance from apple, but until the PC world has a strong contender, I don't see that happening.
At least Intel has strong competition from AMD now :)
> While I don't like it being that way, I can definitely see the benefits of non-upgradeable RAM. The performance of the SOC on my "lowly", entry-level M1 Air is out of this world.
It's not even clear that this has a performance benefit on typical workloads. At release the M1 was non-trivially faster than contemporaneous PC mobiles, but its competition was also using TSMC 7nm and DDR4.
Now we've seen Zen3+ mobiles on 6nm with DDR5 and upgradeable memory and they're about the same speed for nearly everything despite the M1 being on 5nm, which basically proves they didn't need to solder the RAM.
Less physical distance should mean lower latency, plus the timings can be optimized for one specific set of memory chips rather than for the diversity of the module industry, with its different clock speeds and latencies.
Is there any actual evidence that this marginally lower latency makes any real life difference?
And why would the laptops suddenly ship with a variety of modules if they're replaceable? You can still ship it with the same modules in every laptop and get those benefits. And if someone upgrades it, that's still an improvement over no upgrade path, so this makes no sense to me.
> Is there any actual evidence that this marginally lower latency makes any real life difference?
Yes, compare the specs of DDR4 and DDR5 to LPDDR4X and LPDDR5X. The latter are significantly higher performance.
This is also the reason that Dell recently introduced CAMM memory modules -- it is an attempt to address the packaging bottleneck that is limiting the speed of DIMMs currently.
The amount of time that it takes a signal to go down a wire has been relevant for DRAM for a while now. If you look at the traces on the board of any relatively modern computer, you'll see some that take circuitous routes, for the purpose of having the signals arrive at the CPU at the same time. You can see this even on relatively low performance devices like a Raspberry Pi. https://www.cnx-software.com/wp-content/uploads/2019/06/Rasp...
> If you look at the traces on the board of any relatively modern computer, you'll see some that take circuitous routes, for the purpose of having the signals arrive at the CPU at the same time.
Having all the signals arrive at the same time isn't the same thing as having the signals arrive soon.
High bandwidth, low latency unified memory is a central component of the M series architecture and a key reason those chips perform so well at their power profile.
I'm not sure what "evidence" I could provide that would convince you since we don't have high latency Apple chips to benchmark against. However, there's a reason VRAM is soldered onto GPUs.
Soldered RAM doesn't really help much with latency; most of the DRAM latency occurs within the chip itself. Where soldering RAM does help is in reaching higher clock speeds at lower power (in phones and laptops), or reaching clock speeds that are impossible to push through a DIMM connector (GPUs).
That higher frequency helps phones save on pin count by using a narrow memory bus, and allows laptops to have lots of memory bandwidth to feed the integrated graphics when using typical laptop/desktop bus widths.
The 'iPhone slowdown thing' seems to be the 'McDonald's frivolous lawsuit' of the computer world. As in, misunderstood mainly for ideological reasons. What Apple did in response to aging batteries was perfectly sensible technically; what they failed to do was communicate it properly to the user.
> What Apple did in response to aging batteries was perfectly sensible technically
Yep, and what would be even more sensible is allowing users to 'cap' the charge level the way Samsung has started to with a "charge only to 85%" ability. Don't just respond to aging batteries, allow steps to reduce aging.
> But, that goes against the upgrade sales process.
Have you considered your own bias? Apple automatically caps battery charging at 80% but it is algorithmically controlled instead of a manual toggle. I would like the manual target, too, but if your narrative were accurate they would not have implemented the feature to begin with.
Correct me if I'm wrong, but it doesn't cap anything. It just waits an arbitrary amount of time before charging from 80% to 100%. Simply charging it in that range causes long-term battery wear.
> Yep, and what would be even more sensible is allowing users to 'cap' the charge level the way Samsung has started to with a "charge only to 85%" ability.
You can kinda do this with AOSP, but it's obnoxiously convoluted. You enable the adaptive charging feature, then set a silent alarm for 9:59, then set an alarm on something other than your phone so you can unplug your phone at 7:00 or so. (edit: or I guess you could get a smart outlet to cut power to your phone at 7:00)
The obsession with thinner phones and undersized batteries to allow that was the real issue, in my opinion. Looking at a comparison list[1] the 6s has about half the battery capacity of the iPhone 13, and is the smallest battery of the entire iPhone lineup for its screen / face size.
I would call that tangential. Properly functioning batteries that are undersized are a battery-life problem. A malfunctioning battery that cannot deliver enough current for the CPU at full power is a different problem. It could be argued that the smaller battery would degrade faster due to more cycles, but I don't recall whether there was general dissatisfaction with battery life on the 6S or not. I expect the battery on the 13 to be bigger because I assume (with no research, admittedly) that the SOC and screen both take a good bit more power than the equivalents on a 6S.
> Properly functioning batteries that are undersized are a battery life problem. A malfunctioning battery that cannot deliver enough current for the CPU at full power is a different problem.
These are the same problem. Nominal voltage lowers as batteries age, as does the ability to maintain a voltage under load. The batteries were "fine" for the first year. A larger battery would have been able to keep voltage under load at an acceptable level even as it aged.
> I expect the battery on the 13 to be bigger because I assume (with no research, admittedly) that the SOC and screen both take a good bit more power than the equivalents on a 6S.
And yet the 6 had a larger battery (same screen size). Personally I find the 6S (and 6) to be very thin. They could have easily been a couple mm larger without the consumer noticing (Apple could have made the camera lens flush!).
This is kind of a weird diss against a phone manufacturer that regularly releases OS updates for devices that are over 5 years old. Are there others that even come close? Google abandoned their first-party Pixel 3 about 3-4 years in. I'm not sure Samsung has done any better.
The SSD is soldered onto the motherboard on all new apple laptops and has been for years, there is no replacing it without the whole board. Which can be 70% of the cost of a new laptop.
Whatever ram and SSD config you buy when new is how it will be forever.
This is the exact reason I am currently very uncomfortable with the idea of a new Macbook Pro for my own business use.
On the one hand it absolutely makes life a lot easier, and all the software I need runs great on it. On the other, I would be buying into a device I simply cannot upgrade or maintain myself. This makes me extremely uncomfortable.
I don't want to run Windows as a daily driver since it is really jarring for my personal workflow. Yet Linux lacks quite a few of the essential pieces of software I need outside of development, e.g. Krisp.AI, Reincubate Camo, etc.
For as often as I've had to upgrade a machine, or had a hardware failure, I'd just choose whatever works best for my daily workflow. An inconvenient fix that takes my computer out of service for a day while I run over to the Apple Store that only happens every few years at most is just not comparable to something that puts a drag on my workflow every single day.
I was a Windows user and thought the same thing for a while about switching to Linux. I was heavily dependent on Adobe products to create PDF forms. Then I realized I could use a CRM solution to handle the form creation for me. I wrote about my experience in switching to Linux, and so far it is working well for me: https://www.scottrlarson.com/publications/publication-transi...
Have you replaced SSDs because of failure, or to upgrade? I personally have never seen an internal SSD fail. My 2018 MacBook Pro has had zero issues, and I still use a Samsung Evo 840 SSD from 2013 in one of my PCs.
I'm not agreeing with Apple soldering the SSD to the logic board. But they do seem to be significantly more reliable than hard drives.
Anecdotally, I did have a Samsung T5 go tits up on me not long ago. But that's an external drive. Not from physical abuse, either, it spent its life sitting on my desk.
I've never had any kind of problem with internal SSDs.
Usually to upgrade capacity, but I have had one personal SSD fail (I've used... 30? 35? in the last decade). It was an internal drive, 60 GB OCZ bought in 2012, failed in 2015.
Similar here - a 256GB 830 Pro lasted damn close to a decade; it was in my desktop, then my son's, and finally my daughter's before it died. Very high writes as well - it was a great SSD.
Even though I don't use any external drives, I occasionally flirt with the idea of buying an SSD to collect music.
But every time I get a little bit paranoid about the idea of "what if it suddenly dies and I lose everything?" and then I start thinking about backups, at which point I'm like "nah, screw this".
Are there any recommendations for particular manufacturers to look for? (Seagate, Samsung, Toshiba etc)
Also, what's the average lifetime of a consumer-grade SSD these days? I always assume they'll die in 5 years, but that's totally out of my head and not from experience.
Music collection is typically “write once, read forever”. It’s hard to estimate the lifetime of such read-only SSDs, but it’s measured in decades, if it’s regularly powered on and read entirely. If it’s not powered for years, you’re going to risk data loss due to thermal noise flipping bits randomly beyond the built-in ECC capabilities. If you keep it powered on, you still need to regularly scrub it (i.e. read all data) to force the firmware to fix and write back flipped bits. Some firmwares may do this periodically by themselves, but it’s a black box, so you must do it yourself to be sure.
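A rough sketch of what such a manual scrub could look like on Linux: just read the whole block device so every sector goes through the controller's ECC path. Whether the firmware then rewrites weak blocks is, as noted above, firmware-dependent, but unreadable sectors will at least show up as I/O errors. The device path is only an example and this needs root.

```python
# Read every byte of the device so the controller has to run its ECC over all
# stored data. Needs root; /dev/sda is only an example device path.
import sys

def scrub(device: str, chunk: int = 8 * 1024 * 1024) -> None:
    total = 0
    with open(device, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)   # an unreadable region raises OSError here
            if not data:
                break
            total += len(data)
    print(f"Read {total / 1e9:.1f} GB from {device} without I/O errors")

if __name__ == "__main__":
    scrub(sys.argv[1] if len(sys.argv) > 1 else "/dev/sda")
```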
I wouldn't recommend relying on a single drive for long-term storage of anything that's worth more than ~$1000 - neither with SSDs nor HDDs. Freak accidents can happen with any component. Your PSU can go rogue and fry your SSD, etc.
The best option is most likely a single local drive with continuous mirroring to a cloud service. The drawback is the ongoing cost of the cloud, and the possibility of partial data loss, because cloud mirroring won't be instantaneous when saving bulk data.
A local RAID1 array is cost-efficient, but doesn’t save you from black-swan events like floods or house fires.
I had a SSD fail not long ago. It was used regularly, but one day I noticed that reading a large file failed. I went into panic mode and stopped using it and attempted to copy everything to a new drive, but copying many of the large files failed. These large files consisted of things like music which had not been edited for years. This sounds like neither of your scenarios - not thermal bit flipping, and not overwriting. How common are failures of this sort?
Frankly, we have no idea. If the manufacturers know, they're not telling us. I've seen dozens of failed SSDs and their failure modes are completely different from rotating disks and almost always lead to complete or significant data loss with little to no warning.
What numbers we have show that failure rates are low, but honestly I would give it another decade or two before using SSD for persistent data storage (of important data without backups to rotating disks). I don't think we're there yet.
I think you're missing that there is no such thing as 'not edited' on a SSD. Data gets moved around by firmware all the time in order to ensure wear leveling is applied equally all over the drive. This is the exact kind of failure you should expect on an SSD.
While backing up to a cloud is good, and gets the data offsite, bear in mind the chance of losing data there.
The primary issue is not the 10 to the minus X failure rate they quote, but the much more likely chance that you will lose access to the data for some reason. For example, the account is hacked, someone closes the account, or the account just gets deleted/restricted by the provider.
But I’d say it’s unlikely that your local copy and the cloud backup would get destroyed at the same time.
Also, when I said “cloud”, I meant a proper cloud service with an SLA like S3 Glacier, not Google Drive which gets wiped if your Google account is disabled for uploading a YouTube video with background music.
For music collection and playback (low-bandwidth low-parallelism sequential reads), a mechanical drive would probably be substantially cheaper than SSD for large capacities without a substantial performance hit (though I don't know if you even need terabytes to store music). Anyway I have a 256GB Samsung 850 PRO SSD from a 2016 laptop, which I've used on and off and swapped between devices, and hasn't failed yet.
Not sure what OP's specific use case is, but for me, the one big downside of spinning drives is noise. And even if the drive itself may be fairly quiet, you have to make sure its vibrations don't propagate to furniture where they can get amplified.
Good point, especially for storing music files. I'd use an SSD every time for < 2TB of data. Unfortunately, I'm a data hoarder, so it's still spinning rust for me.
I suppose it depends on the drive. I have a cheap external WD "passport" drive I use for backups. I sometimes sit it on a wooden end-table at my parents' house, and you can absolutely hear the vibrations.
That's very dependent on Amazon actually shipping you the drive, which between my recent experience and tales from a few friends in the industry, is much more of a coin toss these days.
> Also, what's the average lifetime of a consumer-grade SSD these days? I always assume they'll die in 5 years, but that's totally out of my head and not from experience.
Some have a warranty that long, but the fact that you won't reach the TBW rating doesn't mean it won't lose data.
The JESD218 spec only guarantees a year of retention unpowered (for enterprise; 3 months for consumer), soooo good fucking luck. Many manufacturers don't even state the guaranteed retention in the datasheet.
Also, once you finish setting up your backups, make a calendar entry to test restores at least every year (preferably more often).
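A sketch of what an automated restore test could look like: restore a sample with whatever backup tool you use, then hash-compare it against the originals. The paths here are placeholders, not anything prescribed above.

```python
# Restore check sketch: hash every file in the original tree and compare with
# the restored copy. "/data" and "/tmp/restore-test" are placeholder paths.
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def verify_restore(original: Path, restored: Path) -> bool:
    ok = True
    for src in original.rglob("*"):
        if not src.is_file():
            continue
        dst = restored / src.relative_to(original)
        if not dst.is_file() or file_hash(src) != file_hash(dst):
            print(f"MISMATCH: {src}")
            ok = False
    return ok

if __name__ == "__main__":
    print("restore OK" if verify_restore(Path("/data"), Path("/tmp/restore-test")) else "restore FAILED")
```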
> The JESD218 spec only guarantees a year of retention unpowered (for enterprise; 3 months for consumer), soooo good fucking luck.
That retention spec is for a drive at the end of its write endurance. A drive that's less worn-out will have longer retention, more or less by definition (reduced retention is the main problem caused by the wear of writing lots of data). Also, IIRC it's one year for consumer and only 3 months for enterprise (albeit at a higher storage temperature).
I write this from a laptop with a Plextor M5Pro from 2015 which was forgotten on a balcony (above/below zero temp swings, humidity and all) for three straight years. It all comes down to manufacturing tolerances, and damn the standards. Even the battery still somewhat works.
For me they've been rock solid, and I've yet to see one die. I replace them much earlier than that because they just get too small to be useful.
I've got a 240 GB Intel SSD with 49254 hours on it. That's 5.6 years worth of constant work. Still kicking. Not doing anything heavy though. The only reason it's still doing anything is because it's in a firewall. It does run Prometheus though, so it's not completely idle.
My daily computer (tower pc) is coming up on 9 years. The OS drive is a 250gb Samsung SSD. I’d estimate that this computer on average has been running about 8 hours per day during its lifetime. But this is very much YMMV. (New computer is on its way, finally.)
Also, SSD only for music collection seems expensive. I’d get a 2TB (or whatever size seems reasonable) spinner, and put the rest of the funds towards backups (like Backblaze).
> "what if it suddenly dies and I loose everything?"
Can you tell the difference in "what if {HDD,SDD} suddenly dies and I loose everything?"?
> Also, what's the average lifetime of a consumer-grade SSD these days?
Years.
Actually, with all that TLC/QLC/xLC you can get even less endurance than from some older drives, but they are cheap and you can buy much more capacity than you need, so you will have plenty of spare resource: if you have 250GB of music, buy 1TB and have 4x 'over-provisioning'; if you have 1TB, buy 2TB or more.
And as others said, a music collection is basically WORM, so in 2042 you are more likely to have the problem of finding an ancient USB 3 controller to attach your ancient USB->SATA/NVMe external drive to your shiny new holodeck with USB23423 than to have the drive die on you.
> "what if it suddenly dies and I loose everything?"
This is not a hypothetical; all drives do fail at some point - it's just a matter of when.
The only way to be safe is external backups, not even RAID (which, in unlucky cases, can suffer from serial failure).
If one really doesn't want to spend any money, an option is to periodically back up to an external drive, although cloud backups are relatively cheap and automated nowadays. But that option is also subject to localized mass failure, e.g. burglaries.
> But every time I get a little bit paranoid about the idea of "what if it suddenly dies and I lose everything?" and then I start thinking about backups, at which point I'm like "nah, screw this".
If your data is that important you'll want 3-2-1: three copies of the data, on two different media, with one offsite (offsite includes cloud these days).
> But every time I get a little bit paranoid about the idea of "what if it suddenly dies and I lose everything?"
Buy two. Sync the devices regularly (a rough sketch of a one-way mirror is below). Better to have two 512GB drives, deal with syncing them once in a while, and live with half the space than to have one 1TB drive and lose it all to any single issue.
Never completely depend on a single storage device.
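As referenced above, a bare-bones sketch of a one-way mirror between two such drives (mount points are placeholders; in practice rsync -a does this job better):

```python
# One-way mirror sketch: copy anything new or changed from drive A to drive B.
# Mount points below are examples only.
import filecmp
import shutil
from pathlib import Path

def mirror(src: Path, dst: Path) -> None:
    for p in src.rglob("*"):
        if not p.is_file():
            continue
        target = dst / p.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy if missing or if the contents differ
        if not target.exists() or not filecmp.cmp(p, target, shallow=False):
            shutil.copy2(p, target)

if __name__ == "__main__":
    mirror(Path("/mnt/music-a"), Path("/mnt/music-b"))
```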
> Are there any recommendations for particular manufacturers to look for?
Anything can always randomly suddenly fail. Low probability, but reasonably possible. So if it contains anything collectible, you'd want mirroring (ZFS is pure awesome) for availability and separate backups just in case.
While I know that SSDs will die eventually, this has yet to happen to any of my private drives.
Some of my drives are over 10 years old and still work, something unheard of when it comes to rotating rust.
And those are regularly used too, some with ridiculous uptimes measured in years.
The drives I replaced were changed for capacity reasons, not wear.
Our company runs a bunch of servers with SSDs and even there they hold up pretty well. The last error was caused by a faulty controller some months ago.
It would be survivorship bias if I ignored the failing disks. However, all of these are from machines that have been in my family from the beginning. I don't have precise tracking of all the HDDs we ever had, but I remember maybe 4-5 out of maybe 20 failing over the course of ~30 years, which is not incredibly awesome, but far from "unheard of".
I have multiple consumer grade drives made in 2008-2010 that have been used a lot and have more than 7 years of recorded time of the head flying over the platters and they still work fine. Some with zero reallocated sectors and others with 1-10 in the last decade. But the number hasn't changed for 3+ years in any drive so it doesn't bother me.
Before you say it, of course I keep backups of everything. That's why I still use decade old hardware without much worry.
I have 5 external SSDs that I use for backups (Crucial brand), none more than 8 years old, and 3 of the 5 have failed completely - luckily I keep multiple backups in multiple places, but no one should be lulled into a false sense of security with these things. They will and do fail, and often at the worst possible time. It sometimes keeps me up worrying whether I have enough redundant backups in place.
I have never had an internal SSD fail. But I have had a SanDisk Extreme Portable SSD die after about 14 months. Looking at the reviews a ton of people have had the same thing happen to them.
External SSDs seem to be significantly less reliable. My theory is that manufacturers are only designing them for short bursts of data transfer. I often transfer hundreds of gigabytes at a time, and these external drives get extremely hot when doing so.
Can you post model numbers? I've never had good experiences with Crucial SSDs (internal or external) compared to Samsung, SanDisk and Intel. I've always bought the higher-end models and not run an M.2 SSD in a small external enclosure like some do.
> two SSDs with high reliability ratings fail for us sequentially in a RAID array in a server,
Happens all the time when you buy storage in batches. We learned a lesson: go through 3-4 suppliers and split your orders over a three-month period. Buying from one place in bulk is just guaranteeing data loss in your future.
I know that this idea comes up time and time again when discussing drives, but I've been working on large storage systems for over 10 years now, dealing with drives and their reliability, and have yet to see such an occurrence.
I've dealt with deployed systems that had (in many different clusters) a total of upwards of 100K HDDs and also with 10K SSDs and for extended periods.
I saw tons of drive failures of many types, but never even once did I see two or more drives from the same batch fail soon after one another.
Individual sellers probably use the same shipping method for every order and they may order in large batches which are transported at the same time. Although SSDs are a bit more resilient, hard temperature spikes in a single batch won't be detected after shipping. In the past, we would do this for hard drives because although parked heads are safe, high G-loads could actually unpark a head.
There has been a case of a firmware bug, which caused SSDs to fail after a fixed amount of power-on time: https://news.ycombinator.com/item?id=32048148. There are some discussions of how RAID disks may serially fail, which shouldn't be inherent to SSDs.
I wonder whether this "third-party" ranking is meaningful. The first-party makers Samsung/Micron (Crucial)/Hynix all ship a lot of modules, and OEMs like Lenovo generally adopt them. I agree that Kingston is one of the top brands.
Adding onto the other comments here, this is not my experience at all. If a "spinning rust" harddrive makes it past a year or two, it seems to go on basically indefinitely. I have a drive from 2009 in active use, plus six (out of six) just about at ten years old in a NAS that get scrubbed monthly and have no errors. I've also had several that were around that age and only got replaced for performance or space reasons.
I've only had one fail several years in, and I'm pretty sure it was environmentals (smoke) that did it in.
I have a bunch of 3.5” external spinning hard drives, which are about 5 years old at this point. Contrary to internet legends that that's a recipe for data loss and that really they should be in a NAS checking their integrity 24/7, they exhibit no data corruption whatsoever. (I do verify that periodically.)
For us, under heavy write load, the lifetime ("till the wear level indicator hits zero") is around 4-5 years (with some as short as 2), but I've had a case of a personal SSD dropping bad blocks with 95% life left.
We do have some that sit at 1% life left and refuse to die tho
> Some of my drives are over 10 years old and still work, something unheard of when it comes to rotating rust.
Not at all unheard of! HDDs tend to fail young, or last a very long time.
The oldest working hard drive I have is in my SPARCstation10, so they're about 30 years old now.
In my ZFS home server nearly all the drives are from its original build in 2010, so over 12 years old now. I had to replace one drive early in the life of that system (year 2-ish, forgot exactly) and the rest are going strong.
Over the summer I had this idea I wanted a full linux install on a USB stick I could move from machine to machine. Years ago I had splurged on a really nice Sandisk Extreme 64GB (200MB/s+) and so I installed to that.
After installing, on the first boot KDE immediately informed me my drive was moments from death. I really appreciated that, as I had no idea. I didn't know a thumb drive even had SMART capabilities. I had been using that drive in Windows for random things and it certainly never told me.
Checking SMART it said something to the tune of data loss expected within 24 hours. Yikes.
Did you backup the disk and then run it to see if the prediction was correct?
My first instinct would be to do the backup (of course), but a close second would be "yeah, that's probably not so precise as to make me be quite that lucky today..."
I thought about running `stress -d` on it and letting it go for an hour/day but decided I'd rather just label the drive "BAD" and keep it in my junk drawer. Sometimes tasks don't require reliability and I'd rather not destroy it if I can get a few more uses out of it.
Percentage Used: 2%
Data Units Read: 1,118,827,429 [572 TB]
Data Units Written: 432,707,145 [221 TB]
Power Cycles: 484
Power On Hours: 1,106
The power-on hours, during which the read-write activity happened, correspond to only 46 days, because the rest of the time the SSD was presumably powered-down by the OS, due to inactivity.
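For reference, here is how those SMART figures map to the bracketed values, assuming the NVMe convention that a "data unit" is 1000 512-byte blocks:

```python
# Mapping the SMART counters above to human-readable figures.
# NVMe reports "data units" of 1000 * 512 bytes; power-on time is in hours.
data_units_read = 1_118_827_429
data_units_written = 432_707_145
power_on_hours = 1_106

unit = 1000 * 512  # bytes per NVMe data unit
print(f"read:       {data_units_read * unit / 1e12:.1f} TB")     # ~572.8 TB
print(f"written:    {data_units_written * unit / 1e12:.1f} TB")  # ~221.5 TB
print(f"powered on: {power_on_hours / 24:.1f} days")             # ~46 days
```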
Wow, so that's what did it. That's absurd. You know, sometimes I regret looking too deeply into the firmware and hardware side of development. It makes me wonder whether I'm also sitting atop a ticking time bomb I haven't prepared for, because the failure mode is so orthogonal to the otherwise entirely reasonable model of the hardware that I can actually fit inside my head. Not a very comfortable feeling.
I just replace them once they reach 20% of "percentage (of their lifespan) used" or so, and then they become yet more drives to hold on-site / offline backups.
More than half the failures I have seen are 'electronics/firmware' problems, i.e. not caused by degradation of the flash memory, but by the controller going bad, refusing to initialize, having a bad internal power supply, etc.
Obviously without the manufacturer's internal debug tools it's really hard to properly root cause issues.
In the old endurance test of SATA SSDs, where they were all written to death, most of the drives were still accessible after the last successful write. But none or almost none of the drives were even visible to the OS after a system reboot. It seems that while the memory itself may still be readable, the controller and OS can't handle such a failure mode, and it certainly wasn't tested. So I wouldn't expect any modern SSD to be readable after exhausting all writes completely.
Officially, they're supposed to go into read-only mode. In practice, out of the roughly two dozen failures I've seen, a single one was still readable and the rest were a complete loss of data. I wouldn't count on it, personally.
I've had two failures.
One started audibly clicking, and I replaced it before full failure (IT support said it was common).
The second failure was a complete disconnect from the OS (USB drive) with no warning. Once it cooled down I tried again; the device appeared but reported no disk installed.
Another approach is to look at hard use data from racked storage providers.
Eg:
Backblaze: 2022 Drive Stats Mid-year Review
> As of June 30, 2022, there were 2,558 SSDs in our storage servers. This compares to 2,200 SSDs we reported in our 2021 SSD report. We’ll start by presenting and discussing the quarterly data from each of the last two quarters (Q1 2022 and Q2 2022).
> And the Winner Is… At this point we can reasonably claim that SSDs are more reliable than HDDs, at least when used as boot drives in our environment. This supports the anecdotal stories and educated guesses made by our readers over the past year or so. Well done.
> We’ll continue to collect and present the SSD data on a regular basis to confirm these findings and see what’s next. It is highly certain that the failure rate of SSDs will eventually start to rise. It is also possible that at some point the SSDs could hit the wall, perhaps when they start to reach their media wearout limits. To that point, over the coming months we’ll take a look at the SMART stats for our SSDs and see how they relate to drive failure. We also have some anecdotal information of our own that we’ll try to confirm on how far past the media wearout limits you can push an SSD. Stay tuned.
For some reason I was obsessed with this topic for a few hours once and ended up finding this YouTube video where someone basically did the experiment I was considering doing: https://www.youtube.com/watch?v=jHpSIBpvU0A .. hammering at some old SSDs non-stop to see how quickly he could break them.
This is why I prefer separate SSDs (NVMe) in my system, split by usage.
* `/root`, System/OS: fastest SSD with plenty of spare space
* `/home`: 1+ TB for data, games, media, etc...
* swap, `/tmp`, `/var`: separate cheap SSD (or even HDD) for frequent and unimportant writes
I don't care about SSD reliability (and failure modes) on laptop/desktop computers. I even run my desktop with two SSDs in RAID 0.
Even if you have a RAID 1 setup (two SSDs operating redundantly), there are plenty of points of failure, many of which are more likely than a modern SSD failing:
* Theft
* Physical damage
* Water damage
* Data corruption
* rm -rf
You need backups in any case, even if SSDs were 100% reliable.
Preface: there was a test of SSD memory endurance where drives were actually written to death and the precise amount of data was measured (not estimated). That was in the age of SATA SSDs, so not super recent, but memory cells were actually more durable back then, due to the larger process node and fewer bits per cell. In the end, 256 GB drives failed after 1-3 petabytes of writes.
So when I buy a 1 TB modern mid-range drive, I expect about 4 PB of writes on it, but I halve that number because of the smaller process node and more levels per cell, and then halve it again because my personal build (an SFF PC) runs at higher temperatures, which is not good for lifespan.
None of my drives have reached even 10% of 1 PB yet, so I don't worry too much about memory lifespan.
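That back-of-the-envelope estimate, written out explicitly (the derating factors are the commenter's own rules of thumb, not datasheet values):

```python
# The heuristic above, spelled out. These factors are the commenter's own
# rules of thumb (derived from the old SATA endurance tests), not spec values.
capacity_tb = 1                 # a 1 TB mid-range drive
pb_per_tb_old = 4               # ~4 PB of writes per TB, per the old tests
expected_pb = capacity_tb * pb_per_tb_old
expected_pb /= 2                # newer process node, more bits per cell
expected_pb /= 2                # hot SFF build
print(f"expected endurance: ~{expected_pb:.0f} PB of writes")  # ~1 PB
```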
I use USB stick/thumb drives in STBs/PVRs and no doubt they take a thrashing when repeatedly recording TV programs. Like most of us, I've many of them and whilst I try to keep the TV ones separate from those I use on my computers they often get mixed when I need one for my PC in a hurry.
What I want to know is: is there any general-purpose utility capable of testing a variety of brands of these devices and making a decent assessment of their state/reliability?
It seems that drive manufacturers don't provide much help here, as I've not seen any such utilities from them. I raised this matter with a SanDisk rep at a trade show several years back and he said he didn't know of any.
Same problem applies to SD cards (SDHC, SDXC, etc.), so any info on them would also be useful.
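In the absence of a vendor utility, one crude approach (in the spirit of f3/h2testw) is to fill the device with a reproducible pseudorandom stream and read it back for comparison. The sketch below is only illustrative (Python 3.9+): it destroys whatever is on the device, needs root, and for a meaningful result you'd have to drop caches or replug the device between the two passes so the reads hit the flash rather than the page cache.

```python
# Fill-and-verify sketch for a thumb drive or SD card. DESTROYS all data on
# the target. Usage (example): python3 flashcheck.py /dev/sdX 64000000000
import os
import random
import sys

CHUNK = 4 * 1024 * 1024  # 4 MiB per write/read

def check(device: str, size_bytes: int, seed: int = 0) -> None:
    # write pass: reproducible pseudorandom stream
    rng = random.Random(seed)
    with open(device, "wb", buffering=0) as f:
        written = 0
        while written + CHUNK <= size_bytes:
            f.write(rng.randbytes(CHUNK))
            written += CHUNK
        os.fsync(f.fileno())
    # NOTE: drop caches or unplug/replug here so the read pass hits the flash,
    # not the OS page cache.
    rng = random.Random(seed)  # regenerate the same stream
    bad = 0
    with open(device, "rb", buffering=0) as f:
        offset = 0
        while offset + CHUNK <= size_bytes:
            if f.read(CHUNK) != rng.randbytes(CHUNK):
                bad += 1
                print(f"mismatch in 4 MiB chunk at offset {offset}")
            offset += CHUNK
    print(f"verify finished: {bad} bad chunk(s)")

if __name__ == "__main__":
    check(sys.argv[1], int(sys.argv[2]))
```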
That's interesting. I have a 500GB Crucial SSD that's essentially new and it failed (progressively, over some days, until the drive wasn't recognized). I've done everything to try to get the data off it, but nothing I do can see the drive (it's no longer recognized as a SATA drive).
It may be worthwhile heating it up in the oven and see what happens (I can control the temp pretty accurately). Take the case off first etc. If it falls apart or the solder melts nothing's lost over and above what's happened already.
That's probably a bad idea. Unless I'm confusing it with a different storage technology, heating flash memory has the not-really-side effect of erasing the data on it. The reason it rejuvenates it is because that also erases most of the effects of wear. (Think writing and erasing on a piece of paper until it's unreadably covered in smeared half-erased pencil residue, then grinding the paper up into pulp and pouring a new sheet of paper.) It can't be worse than nothing, but it's unlikely to be any better.
I'd recommend instead (or at least first) desoldering the component memory chips and trying to read the data off of them directly with a microcontroller (bit-banging whatever protocol the drive uses internally). It's more (and slower and fiddlier) work, but also more likely to get at least some data back.
First, I didn't lose any data but only time when that SSD failed, so going to the extent of desoldering the memory chips isn't necessary. Fortunately, that SSD was only a backup/consolidation from other drives so I still had all that data.
The only reason for suggesting that course of action was curiosity: as I recall from old times, data would sometimes come back to 'life' after we erased 2716s, 2732s, etc. and then exposed them to excessive heat (it was never intended as a means of unerasing them after we'd exposed them to UV light).
The real issue remains and that's that manufacturers aren't prepared to tell us anything about them and I reckon that's a significant problem.
That's remarkable. I wonder if there could be a market for a device actually built with doing this in mind? Or at least a "recycling" service provided by a manufacturer that buys back the drive to put the old wine into a new bottle, so to speak, and create a new market for these "used" drives? Surely Google or Amazon would appreciate the profits that could come from getting more from their hardware investment like this.
Isn't the problem with this that they work until they fail? You can run software that tells you how worn the drive is, but you can't figure out how likely it is to die. Some drives have stats you can access, but generally you can't tell until it fails.
There used to be a great site from a data center publishing physical HDD stats; I haven't looked for it in a long time, but I presume they would have SSD stats these days too.
I think you're right about working until they fail but I've only ever had anecdotal evidence to this effect.
I also reckon much of the problem comes from the fact that information about them is proprietary: manufacturers just don't tell us much, as doing so may reveal trade secrets. Several decades ago I was involved in work where we had to have as large a storage capacity as possible irrespective of cost; back then a 1GB SanDisk was worth somewhere between $1k and $2k (we were replacing single-step time-lapse remote monitoring film cameras with TV and needed large storage).
We approached SanDisk (about the only manufacturer with such large drives at the time) and they were very reticent about telling us anything worthwhile, even though we were a large international organization and had clout. We needed to know the reliability, so we had to investigate it ourselves; while we made some progress, it was never fully satisfactory.
Anecdotal info we learned from various sources was that manufacturers had ways of testing the cells by altering the threshold voltage, the point where the gate potential switches from 0 to 1. At a critical point one could check how many gates failed to switch, and this voltage shifts over time/with use. Monitoring it could provide useful info, such as knowing when to retire a device before it fails.
How accurate this info is I don't know, but it seems to make sense. If true, we users should be demanding from manufacturers utilities capable of doing such testing. Trouble is, manufacturers continue to maintain this secrecy.
PS: several days ago I put a brand new SanDisk 128GB stick in one of my PVRs and it's really hot to the touch even when it's on standby (not recording TV). This isn't the first time I've noticed how hot they get, and it isn't the PVR's fault, as I have several different brands and the thumb drive gets very hot in each one, including my PC. One wonders what this elevated temperature does to reliability/service life.
Has there been an evaluation on whether the SMART data is valid/accurate?
It would seem that the vendor is incentivized to not report the most accurate data so their drive doesn't come across as bad and/or to avoid warranty claims.