- Consumer drives like Samsung 980 Pro and WD SN 850 Black use TLC as SLC when about 30+% of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s. After that, it slows markedly. If the filesystem doesn’t automatically trim free space, the drive will eventually be stuck in slow mode all the time.
- Write amplification factor (WAF) is not discussed. Random small writes and partial block deletions will trigger garbage collection, which ends up rewriting data to reclaim freed space in a NAND block.
- A drive with a lot of erased blocks can endure more TBW than one that has all user blocks with data. This is because garbage collection can be more efficient. Again, enable TRIM on your fs.
- Overprovisioning can be used to increase a drive’s TBW. If, before you write to your 0.3 DWPD 1024 GB drive, you partition it so that you use only 960 GB, you now effectively have a 1 DWPD drive.
- Per the NVMe spec, there are indicators of drive health in the SMART log page.
- Almost all current datacenter or enterprise drives support an OCP SMART log page. This allows you to observe things like the write amplification factor (WAF), rereads due to ECC errors, etc.
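For anyone who wants to play with the numbers in the bullets above, here is a rough sketch of the usual DWPD/TBW/WAF arithmetic. The 5-year warranty period is my assumption (vendors differ), and the function names are made up for illustration.

```python
# Back-of-envelope endurance arithmetic. The 5-year warranty default is an
# assumption; check the datasheet for the period a given drive is rated over.

def tbw_from_dwpd(dwpd, capacity_gb, warranty_years=5.0):
    """Total terabytes written implied by a DWPD rating."""
    capacity_tb = capacity_gb / 1000.0
    return dwpd * capacity_tb * 365.0 * warranty_years

def dwpd_from_tbw(tbw, capacity_gb, warranty_years=5.0):
    """Drive writes per day implied by a TBW rating."""
    capacity_tb = capacity_gb / 1000.0
    return tbw / (capacity_tb * 365.0 * warranty_years)

def write_amplification(nand_bytes_written, host_bytes_written):
    """WAF: bytes physically written to NAND per byte written by the host."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written

if __name__ == "__main__":
    # The 0.3 DWPD, 1024 GB drive from the bullet above: about 560 TBW.
    print(round(tbw_from_dwpd(0.3, 1024), 2))
    # A drive that wrote 3 bytes to NAND for every host byte has a WAF of 3.
    print(write_amplification(3_000, 1_000))
```

The point of the WAF term is that rated endurance is ultimately about NAND writes, so anything that lowers WAF (TRIM, spare area) stretches the same flash over more host writes.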
You’re also missing an important factor: Many drives now reserve some space that cannot be used by the consumer so they have extra space to work with. This is called factory overprovisioning.
> - Consumer drives like Samsung 980 Pro and WD SN 850 Black use TLC as SLC when about 30+% of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s. After that, it slows markedly. If the filesystem doesn’t automatically trim free space, the drive will eventually be stuck in slow mode all the time.
This is true, but despite all of the controversy about this feature, it’s hard to run into it in practical consumer use patterns.
With the 980 Pro 1 TB you can write 113 GB before it slows down. (Source: https://www.techpowerup.com/review/samsung-980-pro-1-tb-ssd/... ) So you need to be able to source that much data from another high-speed SSD and then fill nearly 1/8th of the drive to encounter the slowdown. Even when it slows down, you’re still writing at 1.5 GB/s. Also remember that the drive is factory overprovisioned, so there is always some amount of space left to handle some of this burst writing.
For as much as this fact gets brought up, I doubt most consumers ever encounter this condition. Someone who is copying very large video files from one drive to another might encounter it on certain operations, but even in slow mode you’re filling the entire drive capacity in roughly 10 minutes.
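To put numbers on that, here is a rough sketch using the figures quoted above (113 GB of cache, 5 GB/s burst, 1.5 GB/s after). It assumes a 1000 GB drive and treats both speeds as constant, which is a simplification; the function name is just for illustration.

```python
# Two-phase write model: fast until the SLC cache is used up, slow after that.
# Constant speeds and a 1000 GB capacity are simplifying assumptions.

def fill_time_seconds(total_gb, cache_gb, fast_gb_per_s, slow_gb_per_s):
    """Seconds needed to write total_gb with a cache_gb fast region first."""
    fast_part = min(cache_gb, total_gb)
    slow_part = total_gb - fast_part
    return fast_part / fast_gb_per_s + slow_part / slow_gb_per_s

if __name__ == "__main__":
    whole_drive = fill_time_seconds(1000, 113, 5.0, 1.5)
    print(f"whole drive: {whole_drive / 60:.1f} min")
    # A 50 GB copy never leaves the fast region in this model.
    print(f"50 GB copy: {fill_time_seconds(50, 113, 5.0, 1.5):.0f} s")
```

In this model a full-drive write lands on the order of ten minutes, and anything smaller than the cache finishes at burst speed.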
> You’re also missing an important factor: Many drives now reserve some space that cannot be used by the consumer so they have extra space to work with. This is called factory overprovisioning.
This has always been the case, which is why even a decade ago the “pro” drives came in odd sizes like 120 GB vs. 128 GB.
Products like that still exist today and the problem tends to show up as drives age and that pool shrinks.
DWPD, and the TB-written ratings that modern consumer drives use, are just different ways of communicating that contract.
FWIW, if you do a drive-wide discard and then only partition 90% of the drive, you can dramatically reduce the garbage collection slowdown on consumer drives.
In the world of ML and containers you can hit that if you, say, have fstrim scheduled once a week to avoid the cost of online discards.
I would rather have visibility into the size of the reserve space through SMART, but I doubt that will happen.
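A quick sketch of what leaving 10% unpartitioned buys you, using the common definition of overprovisioning as spare space divided by user-visible space. This only counts the space you give up yourself; the factory reserve is not visible, so it is left out. The 1000 GB capacity and the function names are assumptions for illustration.

```python
# User-added overprovisioning from leaving part of the drive unpartitioned.
# Factory spare area is unknown and deliberately not included here.

def user_spare_gb(advertised_gb, partitioned_fraction):
    """GB left unpartitioned (and, after a discard, free for the controller)."""
    if not 0.0 < partitioned_fraction <= 1.0:
        raise ValueError("partitioned_fraction must be in (0, 1]")
    return advertised_gb * (1.0 - partitioned_fraction)

def user_op_ratio(advertised_gb, partitioned_fraction):
    """Spare space as a fraction of the space that is actually partitioned."""
    spare = user_spare_gb(advertised_gb, partitioned_fraction)
    return spare / (advertised_gb * partitioned_fraction)

if __name__ == "__main__":
    print(f"{user_spare_gb(1000, 0.90):.0f} GB spare")
    print(f"{user_op_ratio(1000, 0.90):.1%} extra overprovisioning")
```

The discard step matters: without it, the controller has no way to know the unpartitioned range holds no live data.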
> You’re also missing an important factor: Many drives now reserve some space that cannot be used by the consumer so they have extra space to work with. This is called factory overprovisioning.
I think it is safe to say that all drives have this. Refer to the available spare field in the SMART log page (likely via smartctl -a) to see the percentage of factory overprovisioned blocks that are still available.
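If you want to pull that field out programmatically, something like the sketch below works against the JSON that `smartctl -j -a` prints for NVMe drives. The sample values are made up for illustration, not from a real drive, and the helper name is hypothetical. The data-unit conversion follows the NVMe definition (one unit is 1000 blocks of 512 bytes).

```python
import json

# Illustrative sample only; real output from `smartctl -j -a /dev/nvme0`
# contains many more fields. These values are invented for the example.
SAMPLE_SMARTCTL_JSON = """
{
  "nvme_smart_health_information_log": {
    "available_spare": 100,
    "available_spare_threshold": 10,
    "percentage_used": 3,
    "data_units_written": 12345678
  }
}
"""

BYTES_PER_DATA_UNIT = 512 * 1000  # NVMe reports data units in thousands of 512 B

def summarize_nvme_health(smartctl_json_text):
    """Extract spare and wear fields from smartctl's NVMe JSON output."""
    log = json.loads(smartctl_json_text)["nvme_smart_health_information_log"]
    return {
        "available_spare_pct": log["available_spare"],
        "spare_threshold_pct": log["available_spare_threshold"],
        "percentage_used": log["percentage_used"],
        "host_tb_written": log["data_units_written"] * BYTES_PER_DATA_UNIT / 1e12,
    }

if __name__ == "__main__":
    for key, value in summarize_nvme_health(SAMPLE_SMARTCTL_JSON).items():
        print(f"{key}: {value}")
```

Note that `data_units_written` counts host writes, not NAND writes, so it cannot give you the WAF on its own.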
I hypothesize that as this OP space dwindles, writes get slower because they are more likely to get bogged down behind garbage collection.
> I doubt most consumers ever encounter this condition. Someone who is copying very large video files from one drive to another might encounter it on certain operations
I agree. I agree so much that I question the assertion that drive slowness is a major factor in machines feeling slow. My slow laptop is about 5 years old. Firefox spikes to 100+% CPU for several seconds on most page loads. The drive is idle during that time. I place the vast majority of the blame on software bloat.
That said, I am aware of credible assertions that drive wear has contributed to measurable regression in VM boot time for a certain class of servers I’ve worked on.
PCIe 5.0 SSDs have been available for 6+ months now, and in principle you could back up your SSD at 15 GB/s, but:
> you’re still writing at 1.5GB/sec.
Except for a few seconds at the start, the whole process runs as if you had PCIe 2.0 (15+ years ago). Even with SSDs this fast, there is no chance of making a quick backup/restore. And during the restore you're too slow for the second time in a row.
It's crazy that back in the days of slow PCIe 1.0, fast SLC was in use rather than slow PLC. Now, with PCIe 5.0, when you really need fast SLC, you get slow TLC or very slow QLC, with even slower PLC coming.