
Reverse Engineering a NAND Flash Device Management Algorithm - jwise0
http://joshuawise.com/projects/ndfslave
======
Coko
I realize that this whole process was more than just data recovery (it's a
very valuable learning experience too), but if it was _just_ about data
recovery, couldn't he buy another SD card and re-solder the IC from the broken
board to the new one?

~~~
jwise0
Ha, yes, good point :-) I didn't write about that, but I did take some
pictures of failed attempts at that.

That's one of the first things I did, actually. After dumping the contents of
the flash off, I went on Amazon and hit 'reorder' on the same SD card that I'd
bought before. Unfortunately, it was not the same: in the picture [1], the
left is the one I'd purchased this time, and the right is the one I'd
destroyed. The deals that low-cost SD card makers get on NAND flash vary
greatly from day to day, so they just manufacture based on whatever controller
and flash combination they can get cheapest on any given day: even the same
SKU is unlikely to stay the same internally very long.

I did also try soldering to the BGA pads on the damaged one [2] [3], but no
joy: I imagine that there were some traces that went backwards on the board
before going towards the controller (for instance, to meet the TSOP leads),
and on inserting the SD card into my laptop, I still had no signs of life.

[1] http://joshuawise.com/photos/etc/sd-card/sd-fux-11.xscale.jpg

[2] http://joshuawise.com/photos/etc/sd-card/sd-fux-13.xscale.jpg

[3] http://joshuawise.com/photos/etc/sd-card/sd-fux-15.xscale.jpg

~~~
flashsd
5 minutes of googling == 6 months of reverse engineering :)

recovery tools for SM2683EN flash controller:
http://www.usbdev.ru/files/smi/

xor formulas and block structure for Transcend card:
http://flash-extractor.com/library/SM/EN2683/EN2683b%20BA__ec_de_d5_7a__2x2

~~~
jwise0
Aha, very good! Yes, I had seen the second link -- and, in fact, posted on the
Soft-Center forums at the time. It gave me some of the basic information, but
sadly, without the "key" to what some of that means, it's not terribly useful
to me :-( For instance, I'm still not sure what "xor 0186" means, or how that
translates to the whitening scheme I saw.

The "Update size" and "Update enable" did give me the idea to do what I called
'sector updates'. Do you have any more information on how those work?

I didn't have that 'usbdev.ru' site at the time. That page seems specific to
the USB versions, not the SD card (SM2683) parts; unfortunately, I speak very
little Russian. Do you have any particular parts I should look at?

Thanks so much for any help you might be able to provide! I'd like to fill in
the blanks in my knowledge of these things; in particular, I'd feel a lot more
comfortable if I knew how the sector updates worked...

------
userbinator
You are lucky that the SD card you had used a discrete package for the flash -
to reduce costs, quite a few of them just encapsulate a bare die, which is
nowhere near as robust; even assuming the die didn't crack, trying to wirebond
one of those without special machinery is nearly impossible. MicroSD cards are
almost exclusively constructed this way.

There's also a very interesting article about reverse-engineering the
microcontroller used inside:
http://www.bunniestudios.com/blog/?p=3554

~~~
jwise0
Very interesting! I wonder where on the flash the firmware for the SD card is
stored -- or perhaps it's stored in the controller EEPROM? If I could dump it
out, that would be very valuable indeed.

------
aaron_l
My EE knowledge is a few years out of date, but I was surprised to learn that
excessive correlation between pages causes problems. The XOR key used for
decorrelation is apparently not too hard to reverse engineer, so I wonder if
this could be turned into an attack against solid state storage devices. Would
storing a particular data stream which becomes very correlated once the XOR is
applied lead to data corruption? Wear leveling and filesystems might make this
difficult to pull off, but it still scares me a bit.

~~~
bahahah
Due to close physical proximity, there will always be some degree of
capacitive coupling between the cells. This coupling will cause a cell's
potential to increase slightly when its neighbors are programmed. Having all
of your neighbors programmed to the highest potential state is the worst case,
as your delta V from coupling is greatest. If it is shifted enough, there
would be a bit error at that cell.

Data randomization seeks to mitigate this issue by normalizing the
distribution of states across the page. Having a single XOR key wouldn't do a
very good job for the reasons you noted. When I worked on flash, we used
elements of the address to seed a PRNG for data randomizing. So the XOR key
varies across the entire device.
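
To make the address-seeded idea concrete, here is a minimal sketch. It is not any vendor's actual randomizer: the 16-bit LFSR, the `page_seed` mixing constants, and the function names are all assumptions chosen for illustration. The only point it shows is that seeding the keystream from the block and page address gives every page a different XOR key.

```python
def lfsr_stream(seed: int, nbytes: int) -> bytes:
    """Return nbytes of keystream from a 16-bit Galois LFSR (taps 0xB400)."""
    # An LFSR stuck at zero never leaves zero, so substitute a nonzero seed.
    state = (seed & 0xFFFF) or 0xACE1
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            lsb = state & 1
            state >>= 1
            if lsb:
                state ^= 0xB400
            byte = (byte << 1) | lsb
        out.append(byte)
    return bytes(out)


def page_seed(block: int, page: int) -> int:
    """Hypothetical mixing of address bits into a 16-bit seed."""
    return ((block * 0x9E37) ^ (page * 0x79B9) ^ 0x5A5A) & 0xFFFF


def scramble(data: bytes, block: int, page: int) -> bytes:
    """XOR data with the keystream for (block, page); the same call unscrambles."""
    keystream = lfsr_stream(page_seed(block, page), len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


page = bytes(16)                       # a page of all-zero bytes
a = scramble(page, block=0, page=0)
b = scramble(page, block=0, page=1)    # same data, different page, different result
restored = scramble(a, block=0, page=0)
```

Because XOR is its own inverse, reading a page back only needs the same address to regenerate the keystream; nothing extra has to be stored alongside the data.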

There are other systems in place in flash to further mitigate these issues.
All programming is adaptive, using feedback between programming pulses to hit
the target. The pages within a block are intelligently ordered so that a
programmed cell cannot possibly have all of its neighbors programmed from
lowest to highest potential.

But yes, in general, if you had the right data stream, you would be able to
slightly degrade the BER, possibly past what the ECC can repair. There are a
lot of systems in place though, as NAND is inherently lossy to begin with.
These issues are compounded by MLC designs which have tighter margins per
cell.
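
As a rough illustration of what "the right data stream" means when the whitening key is fixed and known, the sketch below pre-XORs the desired physical pattern with the key so that the controller's own XOR cancels it out. The key value and the choice of an all-zero target pattern are assumptions made up for the example, not details of any real controller.

```python
from itertools import cycle

# Hypothetical fixed whitening key; a real controller's key is not known here.
KEY = bytes.fromhex("a55a3cc30ff0")


def whiten(data: bytes, key: bytes = KEY) -> bytes:
    """XOR data with a repeating key; the same call also un-whitens."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


# Pattern the attacker wants physically stored (assumed worst case: all zeros).
target = bytes(16)
crafted = whiten(target)   # what the attacker asks the host to write
stored = whiten(crafted)   # what the controller actually programs into the NAND
```

Here `stored` ends up equal to `target`, which is why a single fixed key offers little protection. With a key that varies by address, the attacker would also need to know where the data will physically land, which wear leveling makes hard to predict.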

SSDs have yet another layer of system mitigation. I know of at least one
manufacturer that disables NAND level randomizing in favor of encrypting every
bit of data that is programmed. Some drives have enough redundancy that they
can lose an entire flash die without losing data -- as if losing a disk in a
raid setup.
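
The comment does not say which redundancy scheme those drives use. As one possibility, the sketch below assumes simple RAID-5-style XOR parity across dies; the die contents and helper names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(stripes: list) -> bytes:
    """XOR all stripes together to form (or consume) the parity stripe."""
    return reduce(xor_bytes, stripes)


# One stripe of data per die (made-up contents of equal length).
dies = [b"die0data", b"die1data", b"die2data"]
p = parity(dies)

# Suppose die 1 fails: XOR the survivors with the parity to rebuild it.
rebuilt = parity([dies[0], dies[2], p])
```

The cost is one extra die's worth of capacity per stripe, and this scheme only survives the loss of a single die at a time.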

You probably shouldn't be storing anything important long term on a device
that programs NAND raw, i.e. flash drives and SD cards. They are neither
designed nor spec'd for high reliability.

~~~
StillBored
This whole XOR scheme seems destined to fail! Why not just use a 64b/66b (or
similar) encoding scheme?

~~~
bahahah
The XOR scheme is perfectly good enough. If it were a real issue affecting
customers it would be replaced (but it isn't).

The XOR scheme is extremely cheap (compact) and does not need to operate
serially on the data stream (good for performance). The only applications that
use the NAND-provided randomizer are the cheapest of controllers. In fact,
even the SD controller in the linked article used its own XOR scheme. A
system designer can always turn off the built-in randomizer and replace it
with whatever method they choose -- they all do for various reasons. At the
controller level it can typically be implemented in higher-performance and
more compact logic processes. It does not need to be duplicated for
multichannel devices, as it would if it were in the NAND.

~~~
userbinator
_The XOR scheme is perfectly good enough_

...until someone finds a way to exploit it, as has happened with CD's "weak
sector" copy protection schemes. It's only a matter of when it will happen,
not if.

~~~
bahahah
Corrupting the storage of a test pattern isn't particularly useful. MAYBE you
could cause premature tagging of bad blocks, wearing out a flash drive/card
faster. If the system you are using is allowing these kinds of writes to your
storage device, you have more pressing issues.

Only the most primitive SD/flash drive controllers actually use this scheme
anyway -- encryption is much better at randomizing.

------
kabdib
Very nice article.

I wrote a lot of the flash object store for the Apple Newton, back in 1992.
I've often wondered how many of the things we came up with were later patented
by other companies.

------
mng2
Impressive work and a fantastic writeup to boot. Kinda makes me want to
accidentally break something (okay not really).

------
kasperset
The ECC explanation is also good.

