Transforming a QLC SSD into an SLC SSD (theoverclockingpage.com)
251 points by userbinator 28 days ago | 153 comments



You don't need to go through all that trouble to use most cheap DRAMless SSDs in pSLC mode. You can simply under-provision them by using only 25-33% capacity.

Most low-end DRAMless controllers run in full-disk caching mode. In other words, they first write *everything* in pSLC mode until all cells are written; only when there are no cells left do they go back and rewrite/group some cells as TLC/QLC to free up space. And they do that only when necessary; they don't do it in the background to free up more space.

So, if you simply create a partition 1/3 (for TLC) or 1/4 (for QLC) the size of the disk, and make sure the remaining empty space is TRIMmed and never used, it will always be writing in pSLC mode.
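
On Linux, a minimal sketch of that setup could look like the following (device name and the exact fraction are placeholders, and everything on the drive is wiped):

    sudo blkdiscard /dev/sdX                                    # trim the entire drive first
    sudo parted -s /dev/sdX mklabel gpt mkpart primary 0% 25%   # use only ~1/4 of a QLC drive
    sudo mkfs.ext4 /dev/sdX1                                    # format and use just this partition
    # leave the remaining ~75% unpartitioned and never touch it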

You can verify that the SSD you're interested in runs in this mode by searching for "HD Tune" full-drive write benchmark results for it. If the write speed is fast for the first 1/3-1/4 of the drive, then dips to abysmal speeds for the rest, you can be sure the drive is using full-drive caching mode. As I said, most of these low-end DRAMless Silicon Motion/Phison/Maxio controllers are, but of course the manufacturer might've modified the firmware to use a smaller cache (like Crucial did for the test subject BX500).


How can I verify that things stay this way?

Partitioning off a small section of the drive feels very 160 GB SCSI "Let's only use the outer sectors".


Even keeping the drive always 75% empty would be enough, but partitioning off is the easiest way to make sure it never exceeds 25-33% full (assuming the drive behaves like that in the first place).

To verify that the drive uses all of its capacity as a cache, you can run a full-drive sequential write test (like the one in HD Tune Pro) and analyze the speed graph. If, say, a 480GB drive writes at full speed for the first 120GB and then the write speed drops for the remaining 360GB, the drive is suitable for this kind of use.
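
On Linux a crude (and destructive) equivalent is to stream sequential writes at the raw device and watch the reported rate, e.g. (device name is a placeholder, and this wipes the drive):

    # the running throughput from status=progress will visibly sag once the pSLC cache runs out
    # (it's a cumulative average, so the cliff shows up smoothed rather than as a sharp step)
    sudo dd if=/dev/zero of=/dev/sdX bs=1M oflag=direct status=progress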

I think controllers might be doing some GC jobs in the background to always keep some amount of cells ready for pSLC use, but it should be a few GBs at most and shouldn't affect the use case depicted here.


> Partitioning off a small section of the drive feels very 160 GB SCSI "Let's only use the outer sectors".

In that it was very reliable at accomplishing the goal?


Short stroking bigger disks for higher IOPS and storage speed was a de-facto method in some HPC centers. Do this to a sufficiently large array and you can see unbelievable IOPS numbers for that generation of hardware.


That is what an ideal FTL would do if only a fraction of the LBAs are accessed, but as you say some manufacturers will customise the firmware to do otherwise, while this mod basically guarantees that the whole space is used as SLC.


Scroll down to the section called “SLC CACHING” towards the end of TFA. Your approach will work for only 45GB even though the actual SLC cache is 120GB in size, because the process kicks in before the SLC is fully consumed (so it can page data in and out of it).

If you don’t need 66% of the drive’s SLC capacity, the small partition approach is indeed easier and safer.


Oh, I already did. You'll see that if you scroll up and read the last sentence of my comment above.


How do you make sure the empty space is trimmed? Can you trim a portion of a disk?


The literal answer is yes, an ATA TRIM, SCSI UNMAP, or NVMe Deallocate command can cover whatever range on a device you feel like issuing it for. (The device, in turn, can clear all, none, or part of it.) On Linux, blkdiscard accepts the -o, --offset and -l, --length options (in bytes) that map more or less exactly to that. Making a(n unformatted) partition for the empty space and then trimming it is a valid workaround as well.
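
For example, something like this should discard everything past the first 120 GiB of a device (size and device name are placeholders):

    # with --length omitted, blkdiscard discards from the offset to the end of the device
    sudo blkdiscard --offset $((120 * 1024**3)) /dev/sdX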

But you’re most probably doing this on a device with nothing valuable on it, so you should be able to just trim the whole thing and then allocate and format whatever piece of it that you are planning to use.


Create a partition that you'll never use and run blkdiscard [1] from util-linux on it.

[1]: https://man7.org/linux/man-pages/man8/blkdiscard.8.html / https://github.com/util-linux/util-linux/blob/master/sys-uti...


AFAIK Windows runs TRIM when you format a partition. So you can create a dummy partition and format it. Then you can either delete or simply not use it.

On Linux, blkdiscard can be used in the same manner (create a dummy partition and run blkdiscard on it, e.g. blkdiscard /dev/sda2).


If one prefers working with LVM for their devices, that can be a similar wrench: making/removing a logical volume can do the same.

It depends on 'issue_discards' being set in the config (lvm.conf).
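
For reference, the relevant bits look roughly like this (VG/LV names are made up):

    # /etc/lvm/lvm.conf, in the devices { } section:
    #     issue_discards = 1
    # with that set, removing (or shrinking) a logical volume also discards its extents:
    sudo lvremove -y vg0/scratch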

This behavior has drifted over time and I haven't quite rationalized why. Hoping someone can remind me, if nothing else; I'm away from a real computer for a bit.


Windows by default issues TRIMs basically instantly when deleting a file, and runs "Optimize disk" (which trims all free space) on a schedule by default as well.


You can go into Properties > Tools > Optimize; the same button that runs defrag on spinning drives runs TRIM on solid-state devices.


What about external SSDs over USB? How do you trim those?


There are trim-equivalent commands in the ATA, SCSI, and NVMe command sets. So the OS can issue the SCSI command to the USB device that's using UASP, and the bridge chip inside the external SSD can translate that to the ATA or NVMe counterpart before passing it on to the SSD controller behind the bridge. Not all external SSDs or bridge chips actually support trim passthrough, but these days it's standard functionality.
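
On Linux you can check whether discards actually make it through a particular bridge with lsblk (device name is a placeholder):

    # nonzero DISC-GRAN / DISC-MAX values mean the kernel will pass discards down this path
    lsblk --discard /dev/sda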


I wonder how to do it on macOS then. I have several external SSDs and none of them can be trimmed.


https://kb.plugable.com/data-storage/trim-an-ssd-in-macos

Apparently macOS doesn't expose the ability for userspace to manually issue trim commands to block devices (or at least doesn't ship a suitable tool to do so), so the best available workaround is to tell the filesystem layer that it should do automatic trimming even on external drives.
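
For the record, the knob such guides point at is Apple's built-in trimforce utility; note it's a system-wide setting aimed mainly at third-party internal/SATA drives, and whether it helps any particular USB enclosure is hit or miss:

    # asks for confirmation and reboots the machine afterwards
    sudo trimforce enable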


:mind-blown:

i knew about "preconditioning" for SSDs when it comes to benchmarking, etc. didn't realize this was the why.

thanks!


What happens if the remaining space is TRIMMED but routinely accessed? (for example by dd, read only)


If a logical block address is not mapped to any physical flash memory addresses, then the SSD can return zeros for a read request immediately, without touching the flash.


Does the mapping happen on first write? Is TRIM then a command that signals the SSD to unmap that block?


Yes.


You read zeroes.


Not guaranteed by default for NVMe drives. There's an NVMe feature bit for "Read Zero After TRIM" which, if set for a drive, guarantees this behavior, but many drives of interest (as of 2024) do not set it.
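
If you have nvme-cli handy, you can check what a given drive reports via the DLFEAT field of Identify Namespace (a sketch; device name is a placeholder):

    # the DLFEAT bits say what reads of deallocated blocks return (zeroes, ones, or undefined)
    sudo nvme id-ns /dev/nvme0n1 --human-readable | grep -i dlfeat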


Hmmm, when I quick-formatted a drive (which TRIMs the whole thing), then tried reading it back in a disk hex editor, I just saw zeroes.


It seems like that would be a likely common behavior for the FTL, but other options are possible (e.g., reading the old blocks) and it wasn't guaranteed by the spec, which is why they added this NVMe flag (so-called "DZAT") so that you can actually rely on it.


ssd firmwares are a mistake. they saw how easy it is to sell crap, with non ecc (i.e. bogus ram) being sold as the default and ran (pun intended) with it.

so if under provisioned now they work as pSLC, giving you more data resilience in short term but wasting more write cycles because they're technically writing 1111111 instead of 1. every time. if you fill them up then they have less data resilience.

and the best part, there's no way you can control any of it based on your needs.


> giving you more data resilience in short term but wasting more write cycles because they're technically writing 1111111 instead of 1. every time.

No, that's not how it works. SLC caches are used primarily for performance reasons, and they're faster precisely because they aren't doing the equivalent of writing four ones (and especially not seven!?) to a QLC cell.


Technically they are writing (0,0,0,1) instead of (0.0625).


This hack seems to take a 480GB SSD and transform it into a 120GB SSD

However the write endurance (the amount of data you can write to the SSD before expecting failures) increases from 120TB to 4000TB which could be a very useful tradeoff, for example if you were using the disk to store logs.

I've never seen this offered by the manufacturers though (maybe I haven't looked in the right place). I wonder why not?


There are companies selling SLC SSDs (often built from TLC or QLC flash but running it in SLC mode) for industrial applications, for example Swissbit.


But they cost far more than what SLC should be expected to cost (4x the price of QLC or 3x the price of TLC.) The clear answer to the parent's question is planned obsolescence.


> The clear answer to the parent's question is planned obsolescence.

The clear answer is lack of demand. People want capacity more than they want endurance they aren't going to use. Early SSD deaths are not yet a common issue, certainly not with MLC or TLC anyway. But capacity is an issue


I don't understand how the author goes from a 3.8 WAF (Write Amplification Factor) to a 2.0 WAF and gets a 30x increase in endurance. I'd expect about 2x from that.

From what I can see, he seems to be taking the 120TBW that the OEM warranties on the drive for the initial result, but then using the NAND's P/E cycles spec for the final result, which seems suspicious.

The only thing that I could be missing is the NAND going to pSLC mode somehow increases the P/E cycles drastically, like requiring massively lower voltage to program the cells. But I think that would be included in the WAF measure.

What am I missing?


QLC memory cells need to store and read back the voltage much more precisely than SLC memory cells. You get far more P/E cycles out of SLC because answering "is this a zero or a one?" remains fairly easy long after the cells are too worn to reliably distinguish between sixteen different voltage levels.
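
As a back-of-the-envelope check, taking made-up but plausible rated P/E figures (the article doesn't spell these out, so treat them purely as illustrative) and endurance ≈ capacity × P/E cycles / WAF:

    QLC mode:  480 GB ×    ~900 P/E / 3.8 WAF ≈   ~114 TBW
    pSLC mode: 120 GB × ~60,000 P/E / 2.0 WAF ≈ ~3,600 TBW

So nearly all of the ~30x would come from the P/E jump of running the cells as SLC; the WAF improvement only accounts for roughly a factor of two.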


the author is wrong. what you mention is only true for an actual SLC chip+firmware. qlc drives probably don't even have the hardware to use the entire cell as slc, and they adopt one of N methods to save time/writes/power by underutilizing the resolution of the cell. none of them gives you all the benefits, and all of them increase some downsides to improve one upside.

and you can't choose.


Even if they do it in a slapdash way, it's still going to be 0 versus "pretty high" and that's a lot easier than gradients of 16ths. Dropping the endurance to match QLC mode would require intentional effort.


what's missing here: a QLC cell is not 4-ary, it stores 4 bits per cell. 2^4 = 16, so it's actually a 16-level cell. That's contradictory to the name "quad level cell", yes, for some reason.


Respectfully: to the extent that I can understand what you're trying to say, you don't seem to know what you're talking about. Stop trying so hard to bash the whole industry for making tradeoffs you don't agree with, and put a little more effort into understanding how these things actually work.


we are all here reading a machine-translated article from a pt_br warez forum on using the wrong firmware on an ssd controller to talk to the firmware on a "smart" nand flash, to mimic a semblance of control of your own device.

but yeah, I'm the delusional one and the industry is very sane and caring for the wishes of the consumer. carry on.


See, you're still taking every opportunity to rant about "control", while continuing to get the technical details wrong (this time: the bit about firmware on a smart NAND chip, which is a real thing but not what the article is about). You're not even bothering to make a cogent case for why retail SSDs should expose this low-level control, just repetitively complaining. You could have actually made the conversation more interesting by taking a different approach.


i could complain all day about how it's impossible to write a decent driver for an ssd here; even hdd drivers seem decent by comparison, which is a far cry, but besides amusing you, where do you think this will go?


I wonder if it would be useful as cache disks for ZFS or Synology (with further tinkering)?


To dive slightly into that: You don't necessarily want to sacrifice space for a read cache disk: having more space can reduce writes as you do less replacement.

But where you want endurance is for a ZIL SLOG (the write cache, effectively). Optane was great for this because of really high endurance and very low latency persistent writes, but, ... Farewell, dear optane, we barely knew you.

The 400GB optane card had an endurance of 73 PB written. Pretty impressive, though at almost $3/GB it was really expensive.

This would likely work but as a sibling commenter noted, you're probably better off with a purpose-built, high endurance drive. Since it's a write cache, just replace it a little early.


AIUI the slog is only for synchronous writes; most people using zfs at home don’t do any of those (unless you set sync=always which is not the default).

https://jrs-s.net/2019/05/02/zfs-sync-async-zil-slog/


Under-provisioning has been the standard recommendation for ZFS SSD cache/log/l2arc drives ever since those special vdev types have been a thing.


Optane 905p goes for $500 a piece (1T) I believe.


For how long?

Terrific for a hobby project, build farm, or even a business in a prototype stage (buy 3-4 then).

Hardly acceptable in a larger setting where continuity over 10 years is important. Of course, you don't need the exact same part to be available in 10 years (which is not unheard of, though), just something compatible or at least comparable.


If you have a scenario where Optane makes sense today, in 10 years it'll be cost effective to use at least that much DRAM, backed by whatever storage is mainstream then and whatever capacitors or batteries you need to safely flush that DRAM to storage.

A dead-end product on clearance sale isn't the right choice for projects where you need to keep a specific mission-critical machine running for a decade straight. But for a lot of projects, all that really matters is that in a few years you can set up a new system with equal or better performance characteristics and not need to re-write your application to work well on the new hardware. I think all of the (vanishingly few) scenarios where Optane NVMe SSDs make sense fall into the latter category. (I feel sorry for anyone who invested significant effort into writing software to use Optane DIMMs.)


I’ve often wondered when the DRAM-backed storage revolution was going arrive.

Not long ago, 64GB SSDs were the bare minimum you could get away with, and only the most expensive setups had 64GB RAM. Now we're seeing 64GB modules for consumer laptops priced reasonably cheap.

I wonder: if RAM prices head towards $0.05/GB (around $50 for the cheapest 1TB) that we’re currently seeing for SSDs, would that allow the dream of a legitimately useful RAM disk to become a reality?


> I wonder: if RAM prices head towards $0.05/GB (around $50 for the cheapest 1TB) that we’re currently seeing for SSDs, would that allow the dream of a legitimately useful RAM disk to become a reality?

Become? I strongly doubt it.

You can make a great RAM disk today, but if you don't find that useful enough then the future isn't going to make it much better.

In that future, you don't need a RAM disk for your 800GB compile folders to live in cache or for your zero-loading-screen games to stream data at 50GB/s off your PCIe 7.0 drive.


Big banks and stock traders do that. They just don't organize it like a disk, because what'd be the point?

(And, of course, you want ECC memory for that.)


In a larger setting you're looking at a much shorter hardware refresh time than 10 years.


Manufacturers offer that, in the form of TLC drives. Which are supported, unlike this hack which might cause data loss.

This gives you 120GB with 4000TB write endurance, but you can buy a 4TB TLC drive with 3000TB write endurance for $200.


Then you could use this technique to achieve something like a 1.2TB disk with 40PB TBW?

I’d be fascinated to hear any potential use cases for that level of endurance in modern data storage.


> use cases for that level of endurance in modern data storage.

All-flash arrays. Saying that as someone who has a bunch of smaller (400GB) 2.5" SAS SSDs combined into larger all-flash arrays, with each one of those SSDs rated for about 30PB of endurance.

I'm expecting the servers to be dead by the time that endurance is exhausted though. ;)


Exactly, I’ve done similar maths on my disks, and realised that it would be 20 years before they approach their end of life.

By which point, they will be replaced for some new tech that will be cheaper, faster and more reliable and power efficient.


Which drive would that be? The ones I'm seeing cost a lot more than $200.


My friend picked up a 3.84 TB Kingston SEDC600M with 7 PB of write endurance on sale for $180 a couple of months ago. That same place now sells them for around $360. Definitely an original drive. Maybe you just have to be on the lookout for one for when they go on sale.


SSD prices fluctuate a lot. I recently bought 4TB SSDs for 209eu but they are more expensive now (SNV2S/4000G, QLC though)


I'll second that question!


data longevity depends on the implementation in the firmware, into which you have zero visibility. most consumer drives will lower longevity.


What isn't prominently mentioned in the article is that endurance and retention are highly related --- flash cells wear out by becoming leakier with each cycle, and so the more cycles one goes through, the faster it'll lose its charge. The fact that SLC only requires distinguishing between two states instead of 16 for QLC means that the drive will also hold data for (much) longer in SLC mode for the same number of cycles.

In other words, this mod doesn't only mean you get extreme endurance, but retention. This is usually specified by manufacturers as N years after M cycles; early SLC was rated for 10 years after 100K cycles, but this QLC might be 1 year after 900 cycles, or 1 year after 60K cycles in SLC mode; if you don't actually cycle the blocks that much, the retention will be much higher.

I'm not sure if the firmware will still use the stronger ECC that's required for QLC vs. SLC even for SLC mode blocks, but if it does, that will also add to the reliability.


About ten years ago I got my hands on some of the last production FusionIO SLC cards for benchmarking. The software was an in-memory database that a customer wanted to use with expanded capacity. I literally just used the fusion cards as swap.

After a few minutes of loading data, the kernel calmed down and it worked like a champ. Millions of transactions per second across billions of records, on a $500 computer... and a card that cost more than my car.

Definitely wouldn't do it that way these days, but it was an impressive bit of kit.


I worked at a place where I can say FusionIO saved the company. We had a single Postgres database which powered a significant portion of the app. We tried to kick off a horizontal scaling project around it to little success - turns out that partitioning is hard on a complex, older codebase.

Somehow we ended up with a FusionIO card in tow. We went from something like 5,000 read QPS to 300k read QPS on pgbench using the cheapest 2TB card.

Ever since then, I’ve always thought that reaching for vertical scale is more tenable than I originally thought. It turns out hardware can do a lot more than we think.


The slightly better solution for these situations is to set up a reverse proxy that sends all GET requests to a read replica and the server with the real database gets all of the write traffic.

But the tricky bit there is that you may need to set up the response to contain the results of the read that is triggered by a successful write. Otherwise you have to solve lag problems on the replica.


You can get up to, I think, half a thousand cores in a single server, with multiple terabytes of RAM. You could run the entirety of Wikipedia's or Stack Overflow's or Hacker News's business logic in RAM on one server, though you'd still want replicas for bandwidth scaling and failover. Vertical scaling should certainly get back in vogue.

Not to mention that individual servers, no matter how expensive, cost a tiny fraction of the equivalent cloud.

Remember the LMAX Disruptor hype? Their pattern was essentially to funnel all the data for the entire business logic onto one core, and make sure that core doesn't take any bullshit - write the fastest L1-cacheable nonblocking serial code with input and output in ring buffers. Pipelined business processes can use one core per pipeline stage. They benchmarked 20 million transactions per second with this pattern - in 2011. They ran a stock exchange on it.


Back when the first Intel SSDs were coming out, I worked with an ISP that had an 8 drive 10K RAID-10 array for their mail server, but it kept teetering on the edge of not being able to handle the load (lots of small random IO).

As an experiment, I sent them a 600GB Intel SSD in laptop drive form factor. They took down the secondary node, installed the SSD, and brought it back up. We let DRBD sync the arrays, and then failed the primary node over to this SSD node. I added the SSD to the logical volume, then did a "pvmove" to move the blocks from the 8 drive array to the SSD, and over the next few hours the load steadily dropped down to nothing.

It was fun to replace 8x 3.5" 10K drives with something that fit comfortably in the palm of my hand.


In the nineties they used battery backed RAM that cost more than a new car for WAL data on databases that desperately needed to scale higher.


I’d also recommend this if you’re using eMMC in embedded devices. On a Linux system, you can use the `mmc` command from `mmc-utils` to configure your device in pSLC mode. It can also be done in U-Boot but the commands are a bit more obtuse. (It’s one-time programmable, so once set it’s irreversible.)

In mass-production quantities, programming houses can preconfigure this and any other eMMC settings for you.


That makes eMMC slightly less awful!


I wish this kind of deep dive with bus transfer rates was more common. It would be great to have a block diagram that lists every important IC model number / working clock frequency + bus width / working clock rate between these ICs for every SSD.


Some Kingston SSDs allow you to manage over-provisioning (i.e. to choose the capacity-endurance tradeoff) by using a manufacturer-provided software tool.


I don't think that would change how many bits are stored per cell, though? If you, say, set overprovisioning to 80%, then that's going to be 80% of the QLC capacity, and it's going to use the remaining 20% still in QLC mode, it's not going to recognize that it can use SLC with 20% of the SLC overprovisioned.


Yeah, all over-provisioning does is give the controller more spare cells to play with. The cells will still wear at the same rate as if you didn't over-provision; however, depending on how the controller is wear leveling, it could further improve the life of the drive because each cell is being used less often.

This mod (I only just skimmed the post) provides a longer life not by using the cells less often (or keeping more in reserve), but by extending each cell's life by relaxing how precisely charge has to be stored to represent the cell's state, in return decreasing the number of bits that can be stored in the cell and so decreasing the capacity.


It'd be nice if manufacturers provided a way to downgrade an SSD to SLC via some driver settings.


While ssds do not, all flash chips do. So if you were ever going to try building your own SSD or simply connect some flash directly up to your soc via some extra pins, you would be able to program them this way. I imagine extending NVMe to offer this is possible if there was enough popular demand.


NVMe already supports low level reformatting.


The great thing about disks is that they don't require drivers at all. A driver-settings Windows app is not going to be open-sourced if such a thing were to exist.


but how would they make more money?


I'd buy such a device. Currently I'm holding on to my last pair of SSDs from the pre-QLC era, refusing to buy anything new.


There are still new SSDs that use TLC, such as the Silicon Power UD90 (I have one in my system). Not only that, some of them will run in SLC mode when writing new data and then move the data to TLC later - advertised as SLC Caching - which could be even better than always-TLC drives (even ones with a DRAM cache).


Your comment, along with other users, suggests that TLC is a positive attribute for consumers, however, the transition from SLC and MLC NAND to TLC and QLC 3D-NAND actually marked a decline in the longevity of SSDs.

Using a mode other than SLC with current SSDs is insane given how different it is from planar NAND, as current 3D-NAND consumes writes for everything.

3D-NAND: reading data consumes writes [0]:

    " Figure 1a plots the average SSD lifetime consumed by the read-only workloads across 200 days on three SSDs (the detailed parameters of these SSDs can be found from SSD-A/-B/-C in Table 1). As shown in the figure, the lifetime consumed by the read (disturbance) induced writes increases significantly as the SSD density increases. In addition, increasing the read throughput (from 17MBps to 56/68MBps) can greatly accelerate the lifetime consumption. Even more problematically, as the density increases, the SSD lifetime (plotted in Figure 1b) decreases. In addition, SSD-aware write-reduction-oriented system software is no longer sufficient for high-density 3D SSDs, to reduce lifetime consumption. This is because the SSDs entered an era where one can wear out an SSD by simply reading it."
3D-NAND: data retention consumes writes [1]:

    " 3D NAND flash memory exhibits three new error sources that were not previously observed in planar NAND flash memory:

    (1) layer-to-layer process variation, 
    a new phenomenon specific to the 3D nature of the device, where the average error rate of each 3D-stacked layer in a chip is significantly different;

    (2) early retention loss, 
    a new phenomenon where the number of errors due to charge leakage increases quickly within several hours after programming; and

    (3) retention interference, 
    a new phenomenon where the rate at which charge leaks from a flash cell is dependent on the data value stored in the neighboring cell. "

[0] https://dl.acm.org/doi/10.1145/3445814.3446733

[1] https://ghose.cs.illinois.edu/papers/18sigmetrics_3dflash.pd...


Even datacenter-grade drives scarcely use SLC or MLC anymore since TLC has matured to the point of being more than good enough even in most server workloads, what possible need would 99% of consumers have for SLC/MLC nowadays?

If you really want a modern SLC drive there's the Kioxia FL6, which has a whopping 350,400 TB of write endurance in the 3TB variant, but it'll cost you $4320. Alternatively you can get 4TB of TLC for $300 and take your chances with "only" 2400 TB endurance.


Current tech meets the needs of normie home users (most of whom will never see even a full drive write) and big enterprise (servers have specific retirement schedules, each and every server is fully disposable, complete redundancies abound) but leaves out in the cold those of us running small businesses and smaller data centers where machines are overprovisioned and not on a depreciation schedule, hopefully with a RAID 1(0) or ZFS/LVM being the safety mechanism in place.


I got 4TB of TLC for $230 (Silicon Power UD90). It even has SLC caching (can use parts of the flash in SLC mode for short periods of time).


True, I was looking at the prices for higher end drives with on-board DRAM, but DRAM-less drives like that UD90 are also fine in the age of NVMe. Going DRAM-less was a significant compromise on SATA SSDs, but NVMe allows the drive to borrow a small chunk of system RAM over PCIe DMA, and in practice that works well enough.

(Caveat: that DMA trick doesn't work if you put the drive in a USB enclosure, so if that's your use-case you should ideally still look for a drive with its own DRAM)


TLC cannot mature as long as it continues to use 3D-NAND without utilizing more advanced material science. Reading data and preserving data consume writes, which degrades the memory, because the traces in the vertical stack of the circuit create interference.

Perhaps there are techniques available to separate the traces, but this would ultimately increase the surface area, which seems to be something they are trying to avoid.

You should not use datacenter SSD disks as a reference, as they typically do not last more than two and a half years. It appears to be a profitable opportunity for the SSD manufacturer, and increasing longevity does not seem to be a priority.

To be more specific, we are talking about planned obsolescence for consumer and enterprise SSD disks.

> If you really want a modern SLC drive there's the Kioxia FL6, which has a whopping 350,400 TB of write endurance in the 3TB variant, but it'll cost you $4320.

Did you read the OP article?


I agree wholeheartedly. It’s not something a large enterprise can do, but for my own home and multiple small business needs I purchased a good number of Samsung 960/970 Pro NVMe drives when they came out with the TLC 980 Pro.

I’m still rocking some older Optanes and scavenge them from retired builds. They’ll last longer than a new 990 Pro.


> Your comment, along with other users, suggests that TLC is a positive attribute for consumers, however, the transition from SLC and MLC NAND to TLC and QLC 3D-NAND actually marked a decline in the longevity of SSDs.

The bit that you're pointedly ignoring and that none of your quotes address is the fact that SLC SSDs had far more longevity than anyone really needed. Sacrificing longevity to get higher capacity for the same price was the right tradeoff for consumers and almost all server use cases.

The fact that 3D NAND has some new mechanisms for data to be corrupted is pointless trivia on its own, bordering on fearmongering the way you're presenting it. The real impact these issues have on overall drive lifetime, compared to realistic estimates of how much lifespan people actually need from their drives, is not at all alarming.

Not using SLC is not insane. Insisting on using SLC everywhere is what's insane.


> Your comment, along with other users, suggests that TLC is a positive attribute for consumers

TLC is better than QLC, which is specifically what my comment was addressing; I never implied that it's better than SLC though, so just don't, please.

It's interesting to see that 3D-NAND has other issues even when run in SLC mode, though.


> I never implied that it's better than SLC though, so just don't, please.

My apologies.

> It's interesting to see that 3D-NAND has other issues even when run in SLC mode, though.

Basically the SSD manufacturers are increasing capacity by adding more layers (3D-NAND). When one cell in the vertical stack is read, the interference produced by the traces in the area increases the number of cells that need to be rewritten, which consumes the life of the device, by design.


> When one cell in the vertical stack is read, the interference produced by the traces in the area increases the number of cells that need to be rewritten, which consumes the life of the device

You should try being honest about the magnitude of this effect. It takes thousands of read operations at a minimum to cause a read disturb that can be fixed with one write. What you're complaining about is the NAND equivalent of DRAM rowhammer. It's not a serious problem in practice.


It's not the NAND equivalent, as the larger the stack, the more writes hit the contiguous cells, not just a rewrite of a single cell.

Here, the dishonest ones are the SSD manufacturers of the last decade, and they are feeling so comfortable that they introduced QLC into the market.

> It's not a serious problem in practice.

It's as serious as this: reading data consumes the disk, and the faster it's read, the faster it's consumed [0]. You should have noticed that SSD disks no longer come with a 10-year warranty.

    "under low throughput read-only workloads, SSD-A/-B/-C/-D/-E/-F extensively rewrite the potentially-disturbed data in the background, to mitigate the read (disturbance) induced latency problem and sustain a good read performance. Such rewrites significantly consume the already-reduced SSD lifetime. "
Under low throughput read-only workloads.

It is a paper from 2021, which means sci-hub can be used to read it.

[0] https://dl.acm.org/doi/10.1145/3445814.3446733


> It's as serious as this: reading data consumes the disk, and the faster it's read, the faster it's consumed

Numbers, please. Quantify that or GTFO. You keep quoting stuff that implies SSDs are horrifically unreliable and burning through their write endurance alarmingly fast. But the reality is that even consumer PCs with cheap SSDs are not experiencing an epidemic of premature SSD failures.

EDIT:

> You should have noticed that SSD disks no longer come with a 10-year warranty.

10-year warranties were never common for SSDs. There was a brief span of time where the flagship consumer SSDs from Samsung and SanDisk had 10-year warranties because they were trying to one-up each other and couldn't improve performance any further because they had saturated what SATA was capable of. The fact that those 10-year warranties existed for a while and then went away says nothing about trends in the true reliability of the storage. SSD warranties and write endurance ratings are dictated primarily by marketing requirements.


In a 2min search,

https://www.reddit.com/r/DataHoarder/comments/150orlb/enterp...

    "So, on page 8's graphs, they show that 800GB-3800GB 3D-TLC SSDs had a very low "total drive failure" rate. But as soon as you got to 8000GB and 15000GB, the drives had a MASSIVE increase in risk that the entire drive has hardware errors and dies, becomes non-responsive, etc."
Study: https://www.usenix.org/system/files/fast20-maneas.pdf

(with video): https://www.usenix.org/conference/fast20/presentation/maneas


Would you care to explain how any of that supports the points you're actually making here?

Some of what you're spamming seems to directly undermine your claims, eg.:

> Another finding is that SLC (single level cell), the most costly drives, are NOT more reliable than MLC drives. And while the newest high density 3D-TLC (triple level cell) drives have the highest overall replacement rate, the difference is likely not caused by the 3D-TLC technology


"likely" not caused by. Any case I delete such spamming? link.

> Would you care to explain how any of that supports the points you're actually making here?

Other day, if you don't mind.


On the page 7 of the usenix study,

    "The last column in Table 1 allows a comparison of ARRs across flash types. A cursory study of the numbers indicates generally higher replacement rates for 3D-TLC devices compared to the other flash types. Also, we observe that 3D-TLC drives have consumed 10-15X more of their spare blocks."
Latter follows

    "we observe that SLC models are not generally more reliable than eMLC models that are comparable in age and capacity. For example, when we look at the ARR column of Table 1, we observe that SLC models have similar replacement rates to two eMLC models with comparable capacities [...] This is consistent with the results in a field study based on drives in Google’s data centers [29], which does not find SLC drives to have consistently lower replacement rates than MLC drives either. Considering that the lithography between SLC and MLC drives can be identical, their main difference is the way cells are programmed internally, suggesting that controller reliability can be a dominant factor."
What certainly follows,

    "Overall, the highest replacement rates in our study are associated with 3D-TLC SSDs. However, no single flash type has noticeably higher replacement rates than the other flash types studied in this work, indicating that other factors, such as capacity or lithography, can have a bigger impact on reliability."
So programmed obsolescence is present in the drivers, as well as in the 3D-NAND that degrades over time with reads (the chosen traces design, not the layers themselves). Interesting.

China, are you reading this? You have the opportunity to shake the market and dominate it globally, just by implementing a well-designed product, honest drivers and modest nm (not lowering to today's sizes, just enough to ensure decent energetic efficiency and good speed).


* Where I wrote "drivers", it should be read as controllers and firmware.


The massive increase is still 1/500 chance per year.


> When one cell in the vertical stack is read, the interference produced by the traces in the area increases the number of cells that need to be rewritten, which consumes the life of the device

So 3D-NAND suffers interference between the stacked layers? (Introducing Columnhammer... /s)


Samsung 870 EVO (SATA) and 980 Pro/990 Pro (NVMe) are all TLC drives. The Kingston KC3000 is faster than the 980 Pro, hence it's probably TLC, too.


A decent rule of thumb is that if a drive uses TLC, it will probably say so in the spec sheet.

If it's left ambiguous then it's either QLC, or a lottery where the "same" model may be TLC or QLC.


Kingston NV2 is in that "what you get may differ" category, and Kingston explicitly says that what you get may change. I have two NV2s with differing die count, for example. Their controller might be different too. They're external, short-use drives, so I don't care.

So, returning to previously mentioned ones, from their respective datasheets:

    - 870 EVO: Samsung V-NAND 3bit MLC
    - 980 Pro: Samsung V-NAND 3bit MLC
    - 990 Pro: Samsung V-NAND TLC
    - KC3000: NAND: 3D TLC
    - NV2: NAND: 3D // Explicit Lottery.


A bit confused … the article is about a SATA SSD in the ~500 MB/s range. Does what it says, and what's discussed in more detail here, also apply to NVMe drives at 1000+ MB/s? Is it the same?


I bought up a lot of 960 and 970 Pro models when the 980 came out and it was TLC, to have MLC drives for logs and caches. Little did I know that TLC was just the beginning of the decline and QLC was right around the corner for even enterprise!


This table shows a lot of TLC models, at least some of which are still sold:

https://www.johnnylucky.org/data-storage/ssd-database.html


The point I was making is that there is no profit to be made by extending the life of drives. And sample size of one (i.e. you) is not representative of the market. There is always a demand for storage and people will keep buying worse products because there is no other choice.


I don't understand this logic. Consider the two possibilities here.

The first is that only weird tech people are interested in doing this. Then they might as well allow it because it's a negligible proportion of the market but it makes those customers favor rather than dislike your brand, and makes them more likely to recommend your devices to others, which makes you some money.

The second is that it would be widely popular and large numbers of customers would want to do it, and thereby choose the drives that allow it. Then if Samsung does it and SanDisk doesn't, or vice versa, they take more of the other's customers. Allowing it is the thing makes them more money.

Meanwhile the thing that trashes most SSDs isn't wear, it's obsolescence. There are millions of ten year old QLC SSDs that are perfectly operational because they lived in a desktop and saw five drive writes over their entire existence. They're worthless not because they don't work, but because a drive which is newer and bigger and faster is $20. It costs the manufacturer nothing to let them be more reliable because they're going in the bin one way or the other.

The status quo seems like MBAs cargo culting some heuristic where a company makes money in proportion to how evil they are. Companies actually make money in proportion to how much money they can get customers to spend with them. Which often has something to do with how much customers like them.


There are millions of ten year old QLC SSDs

In 2014 QLC was nothing but a research curiosity. The first QLC SSD was introduced in 2018:

https://www.top500.org/news/intel-micron-ship-first-qlc-flas...

You have to also remember that people buy storage expecting it to last. I have decades-old magnetic media which is tiny but still readable.


QLC has shipped in flash storage devices since 2009:

https://www.slashgear.com/sandisk-ships-worlds-first-memory-...

But the point is that it doesn't really matter what people were using 10 years ago. Devices from that era are of negligible value even if they're perfectly operational because they're tiny and slow.

The point you raise is a different one -- maybe you have an old device and you don't want to use it, you just want to extract the data that's on it. Then if the bits can no longer be read, that's bad. But it's also providing zero competition for new devices, because the new device doesn't come with your old data on it. The manufacturer has no reason to purposely want you to lose your data, and a very good reason not to -- it will make you hate them.


Wild! I had assumed this is a hardware level distinction


How many bits a particular NAND chip can store per cell is presumably hardware-level, but I believe it's possible to achieve SLC on all of them anyway, even if they support TLC or QLC.

Hell, the Silicon Power NVMe SSD I have in my machine right now will use SLC for writes, then (presumably) move that data later to TLC during periods of inactivity. Running the NAND in SLC mode is a feature of these drives, it's called "SLC caching".


Of course it is trivial to just write 000 for zero and 111 for one in the cells of a TLC SSD to turn it into effectively a SLC SSD, but that in itself doesn't explain why it's so much faster to read and write compared to TLC.

For example, if it had been DRAM where the data is stored as charge on a capacitor, then one could imagine using a R-2R ladder DAC to write the values and a flash ADC to read the values. In that case there would be no speed difference between how many effective levels was stored per cell (ignoring noise and such).

From what I can gather, the reason the pseudo-SLC mode is faster is down to how flash is programmed and read, and relies on the analog nature of flash memory.

Like DRAM there's still a charge that's being used to store the value, however it's not just in a plain capacitor but in a double MOSFET gate[1].

The amount of charge changes the effective threshold voltage of the transistor. Thus to read, one needs to apply different voltages to see when the transistor starts to conduct[2].

To program a cell, one has to inject some amount of charge that puts the threshold voltage to a given value depending on which bit pattern you want to program. Since one can only inject charge, one must be careful not to overshoot. Thus one uses a series of brief pulses and then do a read cycle to see if the required level has been reached or not[3], repeating as needed. Thus the more levels per cell, the shorter pulses are needed and more read cycles to ensure the required amount of charge is reached.

When programming the multi-level cell in single-level mode, you can get away with just a single, larger charge injection[4]. And when reading the value back, you just need to determine if the transistor conducts at a single level or not.

So to sum up, pseudo-SLC does not require changes to the multi-level cells as such, but it does require changes to how those cells are programmed and read. So most likely it requires changing those circuits somewhat, meaning you can't implement this just in firmware.

[1]: https://en.wikipedia.org/wiki/Flash_memory#Floating-gate_MOS...

[2]: https://dr.ntu.edu.sg/bitstream/10356/80559/1/Read%20and%20w...

[3]: https://people.engr.tamu.edu/ajiang/CellProgram.pdf

[4]: http://nyx.skku.ac.kr/publications/papers/ComboFTL.pdf


> but it does require changes to how those cells are programmed and read. So most likely it requires changing those circuits somewhat, meaning you can't implement this just in firmware

Fortunately everyone shipping TLC/QLC disks needs to use a pSLC cache for performance reasons, so that hardware is already there.


Could this be used to extend the lifetime of an already worn-out SSD? I wonder if there's some business in china taking those and reflashing them as "new".


The only rejuvenation process that I know is heat, either long period exposure to 250°C or short-term at higher temperature (800°C).

https://m.hexus.net/tech/news/storage/48893-making-flash-mem...

https://m.youtube.com/watch%3Fv%3DH4waJBeENVQ&sa=U&ved=2ahUK...


That first article was 12 years ago when MLC was the norm and had 10k endurance.

Macronix have known about the benefits of heating for a long time but previously used to bake NAND chips in an oven at around 250C for a few hours to anneal them – that’s an expensive and inconvenient thing to do for electronic components!

I wonder if the e-waste recycling operations in China may be doing that to "refurbish" worn out NAND flash and resell it. They already do harvesting of ICs so it doesn't seem impossible... and maybe this effect was first noticed by someone heating the chips to desolder them.


So the trick is to somehow redirect all of the heat energy coming from cpus onto the storage, in bursts? :D


Getting closer: now they're putting heatsinks on the controller chips of SSDs :D


Maybe 6-8 years ago, Samsung (?) used to advise foregoing a heatsink on their nvme sticks for optimal results.


Technically, QLC NAND that is no longer able to distinguish at QLC levels should certainly still be suitable as MLC for a while longer, and SLC, for all practical intents and purposes, forever.


Yes, but certainly no consumer or even enterprise ssd firmware has bothered to integrate that functionality.


I thought QLC and TLC memory chips are different at the physical level, not that it's just a matter of firmware.


There are physical differences: QLC requires more precise hardware, since you need to distinguish between more charge levels. But you can display a low-quality picture on a high-definition screen, or average 4 physical pixels in a camera sensor to get a virtual one; same thing here, you combine together some charge levels for increased reliability.

Put another way, you can turn a QLC into a TLC, but not the other way around.


The memory cells are identical. The peripheral circuitry for accessing the memory array gets more complicated as you support more bits per cell, and the SRAM page buffers have to get bigger to hold the extra bits. But everyone designs their NAND chips to support operating with fewer bits per cell.

Sometimes TLC and QLC chips will be made in different sizes, so that each has the requisite number of memory cells to provide a capacity that's a power of two. But it's just as common for some of the chips to have an odd size, eg. Micron's first 3D NAND was sold as 256Gbit MLC or 384Gbit TLC (literally the same die), and more recently we've seen 1Tbit TLC and 1.33Tbit QLC parts from the same generation.


Is it possible for SSD firmware to act “progressively” from SLC to MLC to TLC and to QLC (and maybe PLC in the future)? E.g. for a 1TB QLC SSD, it would act as SLC for usage under 256GB, then MLC under 512GB, then TLC under 768GB, and then QLC under 1TB (and PLC under 1280GB).


It's theoretically possible, but in practice when a drive is getting close to full what makes sense is to compact data from the SLC cache into the densest configuration you're willing to allow, without any intermediate steps.


That's just a normal ssd rated at the QLC capacity.


DIWhy type stuff. Still, fun hack. TLC media has plenty of endurance. We see approximately 1.3-1.4x NAND write amplification in production workloads at ~35% fill rate with decent TRIMing.


TFA reports a maximum write amplification factor of 3.75, which puts you in different ballgame territory altogether.


That's why I mentioned it. They claimed it improved to 1.8 in pSLC mode.


Not surprising given that it doesn't have to "compact" the data into TLC mode and thus rewrite it, if the whole drive is in SLC mode.


It mentions the required tool being available from um... interesting places.

Doing a Duck Duck Go search on the "SMI SM2259XT2 MPTool FIMN48 V0304A FWV0303B0" string in the article shows this place has the tool for download:

https://www.usbdev.ru/files/smi/sm2259xt2mptool/

The screenshot in the article looks to be captured from that site even. ;)

Naturally, be careful with anything downloaded from there.


There were several instances where I saw an interesting tool for manipulating SSDs and SD cards only available from strange Russian websites. This one at least has an English UI ... A lot of research seems concentrated there and I wonder why it did not catch the same level of interest in the west.


These are genuine factory tools supplied by chip vendors such as Silicon Motion, supposedly under NDA, leaked and passed around loosely among Chinese factories. These things are sometimes repacked with malware installers, so blindly running one on your dev machine with AWS keys might not be the best idea. Trying to run it on Linux or macOS, or rewriting it in Rust, might not be great either.

It doesn't happen in the West because manufacturing happens in China in Chinese language. I suppose it's easier for Russian guys to (figuratively) walk into their smoke room and ask for the USB key.


and I wonder why it did not catch the same level of interest in the west.

Because people in the west are too scared of IP laws.


Yeah. That site has a lot of info for a huge number of flash controllers/chipsets/etc.

Wish I had a bunch of spare time to burn on stuff like this. :)


Good to see they are still available.

The wide variety of controller/memory combinations makes it quite a moving target.

This is the "mass production" software that makes it possible to provision, partition, format, and even place pre-arranged data or OS's in position before shipping freshly prepared drives to a bulk OEM customer. On one or more "identical" drives at the same time.

For USB flash thumb drives the same approach is used. Factory software like this which is capable of modifying the firmware of the device is unfortunately about the only good way to determine the page size and erase block size of a particular USB drive. If the logical sectors storing your information are not aligned with the physical memory blocks (which somewhat correspond to the "obsolete" CHS geometry), the USB key will be slower than necessary, especially on writes. Due to write-amplification, and also it will wear out much sooner.

Care does not go into thumb drives like you would expect from SSDs, seems like very often a single SKU will have untold variations in controller/memory chips. Also it seems likely that during the production discontinuities when the supply of one of these ICs on the BOM becomes depleted, it is substituted with a dissimilar enough chip that a revision of the partitioning, formatting, and data layout would be advised, but does not take place because nobody does it. And it still works anyway so nobody notices or cares. Or worse, it's recognized as an engineering downgrade but downplayed as if in denial. Wide variation in performance within a single SKU is a canary for this, which can sometimes be rectified.
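
From the OS side you can at least sanity-check alignment against whatever geometry the device chooses to advertise (which, as noted, is often not the real NAND page/erase-block size), e.g.:

    # reports whether partition 1 starts on the "optimal" boundary the device advertises
    sudo parted /dev/sdX align-check opt 1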


I was unable to find the source code, so it is important to be careful. In my case it sounds like a leap of faith that I don't have in me (my apologies to the developers).

In any case, this is a feature that manufacturers should provide. I wonder how it could be obtained.


That is the actual manufacturer's tool.


> I wonder how it could be obtained.

Reverse engineering and a huge amount of free time.


oh that western superiority complex... hits once again... beside the mark


> western superiority complex

What are you on about?


it's a well-known site and you could check that with web.archive.org dating it back to at least 2013.


What does that have to do with?

> western superiority complex


In countries where people have been less conditioned to be mindless sheep, you can more easily find lots of truth that doesn't toe the official line.

Spreading xenophobic FUD only serves to make that point clearer: you can't argue with the facts, so you can only sow distrust.


> Spreading xenophobic FUD

?


Silicon Motion controllers are trash.


I thought they were the best in class. What are the alternatives?



