Reverse Engineering a NAND Flash Device Management Algorithm (joshuawise.com)
101 points by jwise0 on Aug 4, 2014 | 29 comments

I realize that this whole process was more than just data recovery (it's a very valuable learning experience too), but if it was just about data recovery, couldn't he buy another SD card and re-solder the IC from the broken board to the new one?

Ha, yes, good point :-) I didn't write about that, but I did take some pictures of failed attempts at that.

That's one of the first things I did, actually. After dumping the contents of the flash off, I went on Amazon and hit 'reorder' on the same SD card that I'd bought before. Unfortunately, it was not the same: in the picture [1], the left is the one I'd purchased this time, and the right is the one I'd destroyed. The deals that low-cost SD card makers get on NAND flash vary greatly from day to day, so they just manufacture based on whatever controller and flash combination they can get cheapest on any given day: even the same SKU is unlikely to stay the same internally very long.

I did also try soldering to the BGA pads on the damaged one [2] [3], but no joy: I imagine that there were some traces that went backwards on the board before going towards the controller (for instance, to meet the TSOP leads), and on inserting the SD card into my laptop, I still had no signs of life.

[1] http://joshuawise.com/photos/etc/sd-card/sd-fux-11.xscale.jp...

[2] http://joshuawise.com/photos/etc/sd-card/sd-fux-13.xscale.jp...

[3] http://joshuawise.com/photos/etc/sd-card/sd-fux-15.xscale.jp...

5 minutes of googling == 6 months of reverse engineering :)

recovery tools for SM2683EN flash controller: http://www.usbdev.ru/files/smi/

xor formulas and block structure for Transcend card: http://flash-extractor.com/library/SM/EN2683/EN2683b%20BA__e...

Aha, very good! Yes, I had seen the second link -- and, in fact, posted on the Soft-Center forums at the time. It gave me some of the basic information, but sadly, without the "key" to what some of that means, it's not terribly useful to me :-( For instance, I'm still not sure what "xor 0186" means, and how that translates to the whitening scheme I saw.

The "Update size" and "Update enable" did give me the idea to do what I called 'sector updates'. Do you have any more information on how those work?

I didn't have that 'usbdev.ru' site at the time. That page seems specific to the USB versions, not the SD card (SM2683) parts; unfortunately, I speak very little Russian. Are there any particular parts I should look at?

Thanks so much for any help you might be able to provide! I'd like to fill in the blanks in my knowledge of these things; in particular, I'd feel a lot more comfortable if I knew how the sector updates worked...

Well, 5 minutes of googling and 6 months of learning russian :)

Very cool hack. I love reading stuff like this; thank you for sharing your experience and knowledge. I was going to ask the same question about just re-soldering onto a new SD card of a similar type. I was wondering whether the controllers can detect the exact chip type they are connected to; if so, it might work. Also, for de-soldering and soldering SMT parts, a hot-air rework station makes it easy. It never hurts to have the right tools :)

Unfortunately, it was not the same: in the picture [1], the left is the one I'd purchased this time, and the right is the one I'd destroyed.

The one you destroyed has a single Samsung 128Gbit TLC flash; the one you bought has a pair of Micron 64Gbit MLC. I'd say the latter is almost certainly better from a reliability perspective, and probably even cost more to manufacture.

Maybe, if they're buying the memory at market prices. I believe that for high-volume NAND flash consumers, there is probably more of a spot-pricing scheme in place: for whatever reason, if Micron had a whole bunch of 64Gbit MLC around because (say) HTC stopped making a phone yesterday, then Transcend would be plenty happy to scoop it up for a low price.

(edit: Googling for NAND flash spot pricing results in http://www.dramexchange.com , which seems to confirm those sorts of suspicions. I think the market is probably pretty volatile...)

SD cards are the lowest bin tier as well, given their low performance requirements and low margins relative to SSDs, embedded designs, etc. The leftovers and rejects tend to end up in that channel.

In that vein, the 64Gbit Micron devices may in fact be 128Gbit dies with half-dead arrays, so they may have a similar process node and reliability to the Samsung device.

The MLC is undoubtedly superior to TLC however.

Ah I see, it would've been too easy then :). Thanks for the detailed response!

Brilliant work and excellent write-up. Thanks for sharing your efforts!

Just chiming in to second this.

Crazy to think what archaeologists may have to deal with in 1000 years. Or (a little more sci-fi) findings on other planets.

I'm always humbled when I see other people's soldering skills. Goddamn. And I love reading this type of stuff, doing things just "because". I found a rusty USB stick under an overpass a few months ago; after some cleaning/soldering/tricks I was finally able to read it, and it turned out to be some kid's schoolwork from 3 years ago, haha.

On a side note, I would suggest posting this over at Hackaday; the marketers-pretending-to-be-hackers crowd here won't appreciate this.

Thanks. My soldering skills are not particularly special, though magnification certainly helps steady the hand! In the end, the real saviour here was the Schmartboard, which has these nicely recessed divots in the board that you can just 'push' solder along.

I'll consider sending this to Hackaday, too -- thanks for the reminder. That said, I've found that the Hacker News audience is pretty diverse in interest; you might be surprised what comes up between bouts of startup fever...

I loved the writeup! Thanks for the detailed notes. Very impressive.

Although the account is new, I've been lurking for some time. [deleted tirade] In short: no, I would not be surprised, and I'm not. This article was a nugget buried inside a mountain of irrelevant shit.

You are lucky that the SD card you had used a discrete package for the flash. To reduce costs, quite a few of them just encapsulate a bare die, which is nowhere near as robust; even assuming the die didn't crack, trying to wirebond one of those without special machinery is nearly impossible. MicroSD cards are almost exclusively constructed this way.

There's also a very interesting article about reverse-engineering the microcontroller used inside: http://www.bunniestudios.com/blog/?p=3554

Very interesting! I wonder where on the flash the firmware for the SD card is stored -- or perhaps it's stored in the controller's own EEPROM? If I could dump it out, that would be very valuable indeed.

My EE knowledge is a few years out of date, but I was surprised to learn that excessive correlation between pages causes problems. The XOR key used for decorrelation is apparently not too hard to reverse engineer, so I wonder if this could be turned into an attack against solid state storage devices. Would storing a particular data stream which becomes very correlated once the XOR is applied lead to data corruption? Wear leveling and filesystems might make this difficult to pull off, but it still scares me a bit.
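To make the concern concrete: if whitening were a single repeating XOR key, and that key were known, crafting the "bad" input would be trivial, because XOR is its own inverse. A minimal sketch, assuming a hypothetical 16-byte repeating key (the replies below point out that real controllers usually vary the key with the address):

```python
def whiten(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (hypothetical single-key whitening)."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))


def craft_payload(target_raw: bytes, key: bytes) -> bytes:
    """Host data that whitens to exactly `target_raw` on the flash.

    XOR is its own inverse, so whiten(target_raw ^ key, key) == target_raw.
    """
    return whiten(target_raw, key)


if __name__ == "__main__":
    # Stand-in for a reverse-engineered key; purely illustrative.
    key = bytes((17 * i + 3) & 0xFF for i in range(16))
    # In NAND an erased cell typically reads as 1, so an all-zero raw page
    # means "program every cell" (SLC view): a plausible worst-case pattern.
    worst_case = bytes(512)
    payload = craft_payload(worst_case, key)
    print(whiten(payload, key) == worst_case)  # True
```

Whether such a pattern actually hurts depends on the cell-coupling and ECC behaviour discussed in the replies; the sketch only shows that a fixed key offers no protection against deliberate input.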

Due to close physical proximity, there will always be some degree of capacitive coupling between the cells. This coupling will cause a cell's potential to increase slightly when its neighbors are programmed. Having all of your neighbors programmed to the highest potential state is the worst case, as your delta V from coupling is greatest. If it is shifted enough, there would be a bit error at that cell.
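A toy model of that worst case: each neighbour's rise in threshold voltage (Vt) above the erased level couples a fraction gamma onto the victim cell, and the victim is read against fixed reference voltages. The four MLC levels, the midpoint read references, and gamma = 0.03 are made-up illustrative numbers, not values from any datasheet.

```python
# Toy model of cell-to-cell coupling. All numbers are hypothetical.
# Target threshold voltages (volts) for the four MLC states 0..3.
LEVELS = [-2.0, 0.5, 1.5, 2.5]
ERASED = LEVELS[0]
GAMMA = 0.03  # hypothetical coupling ratio per neighbouring cell


def coupling_shift(neighbor_states, gamma=GAMMA):
    """Vt shift on a victim cell: gamma times each neighbour's rise from erased."""
    return gamma * sum(LEVELS[s] - ERASED for s in neighbor_states)


def read_state(vt):
    """Read back a state using reference voltages midway between targets."""
    refs = [(lo + hi) / 2 for lo, hi in zip(LEVELS, LEVELS[1:])]
    return sum(vt > r for r in refs)


if __name__ == "__main__":
    victim = LEVELS[1]  # cell programmed to state 1
    for neighbors in ([0, 0, 0, 0], [1, 2, 1, 2], [3, 3, 3, 3]):
        vt = victim + coupling_shift(neighbors)
        print(neighbors, round(vt, 2), "->", read_state(vt))
```

With these numbers, only the "all neighbours at the highest state" case pushes the victim across a read reference, which is the kind of single-bit error the ECC then has to absorb.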

Data randomization seeks to mitigate this issue by normalizing the distribution of states across the page. Having a single XOR key wouldn't do a very good job for the reasons you noted. When I worked on flash, we used elements of the address to seed a PRNG for data randomizing. So the XOR key varies across the entire device.
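A minimal sketch of address-seeded randomizing, assuming a 16-bit Galois LFSR as the PRNG. This is not any particular vendor's scheme: the LFSR width, its taps (0xB400, a maximal-length polynomial), and the `page_seed()` mixing function are all hypothetical choices for illustration.

```python
def lfsr16_keystream(seed: int, n: int) -> bytes:
    """n bytes from a 16-bit Galois LFSR (taps 0xB400: x^16+x^14+x^13+x^11+1)."""
    state = (seed & 0xFFFF) or 0xACE1  # an all-zero state would lock the LFSR up
    out = bytearray()
    for _ in range(n):
        byte = 0
        for _ in range(8):
            bit = state & 1
            state >>= 1
            if bit:
                state ^= 0xB400
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def page_seed(block: int, page: int) -> int:
    """Hypothetical seed derivation mixing block and page address bits."""
    return ((block * 0x9E37) ^ (page * 0x79B9) ^ 0x5A5A) & 0xFFFF


def randomize(data: bytes, block: int, page: int) -> bytes:
    """XOR data with an address-dependent keystream (same call de-randomizes)."""
    key = lfsr16_keystream(page_seed(block, page), len(data))
    return bytes(d ^ k for d, k in zip(data, key))


if __name__ == "__main__":
    page = b"\xff" * 16  # identical host data written to two different pages
    print(randomize(page, 7, 0).hex())
    print(randomize(page, 7, 1).hex())
```

Because the keystream depends on the address, the same host data lands on the array as different raw patterns in different pages, so one crafted input cannot line up with every page the way it could with a single fixed key.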

There are other systems in place in flash to further mitigate these issues. All programming is adaptive, using feedback between programming pulses to hit the target. The pages within a block are intelligently ordered so that a programmed cell cannot possibly have all of its neighbors programmed from lowest to highest potential.
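The "feedback between programming pulses" is usually described as incremental step pulse programming (ISPP) with a verify after each pulse. The sketch below is a toy model under stated assumptions: the pulse response, voltages, step size, and per-cell `speed` factor are invented to show why verify matters (fast and slow cells need different pulse counts but all stop just past the target).

```python
import random


def program_cell(target_vt, vt=-2.0, v_start=14.0, v_step=0.3,
                 max_pulses=40, speed=1.0):
    """Toy incremental-step-pulse programming (ISPP) with program-verify.

    Each pulse raises Vt by an amount that grows with the gate voltage,
    scaled by a per-cell `speed` factor; after every pulse the cell is
    verified and programming stops as soon as it passes. Returns
    (final_vt, pulses_used). All constants are hypothetical.
    """
    v_pgm = v_start
    for pulse in range(1, max_pulses + 1):
        vt += speed * 0.1 * (v_pgm - 13.0)  # toy pulse response
        if vt >= target_vt:                 # verify step
            return vt, pulse
        v_pgm += v_step                     # step the program voltage up
    raise RuntimeError("program failure: cell never verified")


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        speed = rng.uniform(0.7, 1.3)  # fast and slow cells differ
        vt, pulses = program_cell(1.5, speed=speed)
        print(f"speed={speed:.2f} pulses={pulses} final_vt={vt:.2f}")
```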

But yes, in general, if you had the right data stream, you would be able to slightly degrade the BER, possibly past what the ECC can repair. There are a lot of systems in place though, as NAND is inherently lossy to begin with. These issues are compounded by MLC designs which have tighter margins per cell.
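To see why "slightly degrade the BER" can matter, here is the standard binomial estimate of how often a codeword exceeds the ECC's correction capability. It assumes independent bit errors, which real NAND does not strictly satisfy; the codeword size and correction strength t used below are hypothetical.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(n: int, k: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), computed in log space (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log1p(-p))


def uncorrectable_prob(raw_ber: float, n_bits: int, t: int) -> float:
    """P(more than t bit errors in an n-bit codeword).

    Assumes independent bit errors at rate raw_ber and an ECC that corrects
    up to t errors per codeword.
    """
    return sum(exp(log_binom_pmf(n_bits, k, raw_ber))
               for k in range(t + 1, n_bits + 1))


if __name__ == "__main__":
    # Hypothetical: 1 KiB of data per codeword (parity bits ignored), t = 40.
    n_bits, t = 8192, 40
    for ber in (1e-3, 2e-3, 4e-3):
        p = uncorrectable_prob(ber, n_bits, t)
        print(f"raw BER {ber:.0e}: P(uncorrectable) = {p:.3e}")
```

The tail probability rises very steeply as the raw BER approaches t / n_bits, which is why a modest, pattern-induced BER increase can matter far more than its size suggests.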

SSDs have yet another layer of system mitigation. I know of at least one manufacturer that disables NAND-level randomizing in favor of encrypting every bit of data that is programmed. Some drives have enough redundancy that they can lose an entire flash die without losing data -- as if losing a disk in a RAID setup.
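The RAID analogy can be sketched with RAID-5-style XOR parity across dies: one parity page per stripe lets any single lost page be rebuilt from the survivors. The stripe layout here is hypothetical; real drives use their own (usually undocumented) schemes.

```python
from functools import reduce


def xor_pages(pages):
    """Bytewise XOR of equal-length pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def add_parity(stripe):
    """Append one parity page to a stripe of pages from different dies."""
    return stripe + [xor_pages(stripe)]


def rebuild(stripe_with_parity, lost_index):
    """Recover the page from a failed die by XOR-ing all surviving pages."""
    survivors = [p for i, p in enumerate(stripe_with_parity) if i != lost_index]
    return xor_pages(survivors)


if __name__ == "__main__":
    dies = [b"die0 data", b"die1 data", b"die2 data"]
    stripe = add_parity(dies)
    print(rebuild(stripe, 1))  # b'die1 data'
```

The cost is one die's worth of capacity per stripe, and it only covers a single failure per stripe.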

You probably shouldn't be storing anything important long-term on a device that programs NAND raw, e.g. flash drives and SD cards. They aren't designed or spec'd for high reliability.

This whole XOR scheme seems destined to fail! Why not just use a 64b/66b (or similar) encoding scheme?
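For comparison, 64b/66b (as used in 10GBASE-R Ethernet) is itself largely a scrambling scheme: each 64-bit block gets an unscrambled 2-bit sync header ("01" for data, "10" for control), which guarantees a transition at least every 66 bits for 3.125% overhead, while the payload's statistics come from a self-synchronizing scrambler with polynomial 1 + x^39 + x^58. A minimal bit-level sketch of that scrambler and the data-block framing (control blocks omitted):

```python
import random

MASK58 = (1 << 58) - 1


def scramble(bits, state=0):
    """Self-synchronizing scrambler, G(x) = 1 + x^39 + x^58.

    `state` holds the last 58 *output* bits; bit i is the output from i+1
    steps ago. Returns (scrambled_bits, new_state).
    """
    out = []
    for b in bits:
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(s)
        state = ((state << 1) | s) & MASK58
    return out, state


def descramble(bits, state=0):
    """Inverse of scramble(); `state` holds the last 58 *received* bits."""
    out = []
    for s in bits:
        out.append(s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1))
        state = ((state << 1) | s) & MASK58
    return out, state


def frame_data_block(payload64, state):
    """Prefix the unscrambled '01' sync header used for data blocks."""
    if len(payload64) != 64:
        raise ValueError("payload must be 64 bits")
    scrambled, state = scramble(payload64, state)
    return [0, 1] + scrambled, state


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.getrandbits(1) for _ in range(640)]
    tx, _ = scramble(data, state=rng.getrandbits(58))
    # A receiver starting from the wrong state resynchronizes after 58 bits.
    rx, _ = descramble(tx, state=0)
    print(rx[58:] == data[58:])  # True
```

So adopting 64b/66b would add a hard (if loose) run-length bound, but the payload would still rely on a scrambler, much like the XOR randomizer.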

The XOR scheme is perfectly good enough. If it were a real issue affecting customers it would be replaced (but it isn't).

The XOR scheme is extremely cheap (compact) and does not need to operate serially on the data stream (good for performance). The only applications that use the NAND-provided randomizer are the cheapest of controllers. In fact, even the SD controller in the linked article used its own XOR scheme. A system designer can always turn off the built-in randomizer and replace it with whatever method they choose -- they all do, for various reasons. At the controller level it can be implemented in, typically, higher-performance and more compact logic processes. It does not need to be duplicated for multichannel devices, as it would if it were in the NAND.

The XOR scheme is perfectly good enough

...until someone finds a way to exploit it, as has happened with CDs' "weak sector" copy-protection schemes. It's only a matter of when it will happen, not if.

Corrupting the storage of a test pattern isn't particularly useful. MAYBE you could cause premature tagging of bad blocks, wearing out a flash drive/card faster. If the system you are using allows these kinds of writes to your storage device, you have more pressing issues.

Only the most primitive SD/flash drive controllers actually use this scheme anyway -- encryption is much better at randomizing.

This reminds me of the CD copy-protection schemes that rely on "weak sectors" - data that, once passed through the whitening scheme used for CDs, created long runs of 0s/1s that made them very difficult to read and even more difficult to duplicate in CD-R burners.

I agree that it also scares me when certain patterns of data are essentially harder to store than others - and the "solution" is to just make it statistically unlikely, as opposed to using a more robust encoding like RLL that guarantees worst-case behaviour (although requires more overhead).
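One cheap way to get a hard worst-case bound instead of a statistical one is bit stuffing (HDLC stuffs a 0 after five consecutive 1s; USB after six). It is a simpler relative of the (d,k) RLL codes mentioned above, not RLL itself, and its overhead varies with the data rather than being a fixed rate. This hypothetical variant limits runs of both 0s and 1s to k:

```python
import random


def stuff(bits, k):
    """Bit stuffing: after k identical bits, insert the complement.

    Guarantees no run longer than k in the output (k >= 2).
    """
    out, last, run = [], None, 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == last else 1
        last = b
        if run == k:
            out.append(1 - b)          # forced transition
            last, run = 1 - b, 1       # the stuffed bit starts a new run
    return out


def unstuff(bits, k):
    """Inverse of stuff(): drop the bit that follows every run of k."""
    out, last, run, skip = [], None, 0, False
    for b in bits:
        if skip:                       # this is a stuffed bit
            skip = False
            last, run = b, 1
            continue
        out.append(b)
        run = run + 1 if b == last else 1
        last = b
        if run == k:
            skip = True
    return out


def max_run(bits):
    """Length of the longest run of identical bits."""
    best = run = 0
    for i, b in enumerate(bits):
        run = run + 1 if i and b == bits[i - 1] else 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    data = [0] * 20 + [1] * 20         # pathological input with long runs
    coded = stuff(data, k=5)
    print(max_run(data), max_run(coded), len(coded) - len(data))
```

The guarantee holds for every input, including adversarial ones; the price is that pathological data costs more space than random data, which is exactly the trade-off against "statistically unlikely".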

This problem of some sequences of bits being problematic has actually been around for a long time - see http://en.wikipedia.org/wiki/Lace_card for example - and is one of the reasons for the odd character layout of EBCDIC.

I also wondered this! I suspect it should be possible to do that; it would certainly be an interesting attack vector to try on cloud storage systems...

Very nice article.

I wrote a lot of the flash object store for the Apple Newton, back in 1992. I've often wondered how many of the things we came up with were later patented by other companies.

Impressive work and a fantastic writeup to boot. Kinda makes me want to accidentally break something (okay not really).

ECC explanation is also good.
