
Anatomy of a Solid-state Drive - Anon84
http://queue.acm.org/detail.cfm?id=2385276
======
Luyt
_TRIM/UNMAP. A common problem with SSDs is that a host file system could erase
free data but didn't have a way of telling the storage device it no longer
needed that data. The TRIM/UNMAP interface lets the SSD clear the LBA (logical
block addressing) entries in the FTL, giving it more free space to use in
garbage collection and reducing write amplification. OS/X, Microsoft Windows,
and Linux have implemented TRIM._

FreeBSD has TRIM support, too.
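
For anyone curious what the hint looks like from the host side: on Linux you
can hand the device a discard range yourself with the BLKDISCARD ioctl (this
is roughly what the blkdiscard utility does). A minimal sketch; the device
path is a placeholder, and the call destroys whatever is stored in that range:

    /* trim_demo.c - ask an SSD to discard (TRIM) a byte range.
     * Sketch only: needs root, and it throws away the data in the range. */
    #include <fcntl.h>
    #include <linux/fs.h>      /* BLKDISCARD */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/sdX", O_WRONLY);       /* placeholder device */
        if (fd < 0) { perror("open"); return 1; }

        /* range[0] = offset in bytes, range[1] = length in bytes */
        uint64_t range[2] = { 0, 1ULL << 20 };     /* discard the first 1 MiB */
        if (ioctl(fd, BLKDISCARD, range) < 0)
            perror("BLKDISCARD");                  /* device may not support discard */

        close(fd);
        return 0;
    }

On a mounted filesystem you would normally let the filesystem issue these for
you (mount with the discard option, or run fstrim periodically) rather than
discarding raw ranges by hand.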

~~~
jethroalias97
I feel like what they are describing is more defragmentation than garbage
collection. The article uses garbage collection to mean taking sparsely filled
blocks and consolidating them into a single block to free the sparsely filled
ones. Garbage collection in the OO context is where you free memory the
program can no longer interact with. Perhaps it means different things in
different contexts?

~~~
wmf
A sparsely-filled block contains some valid data and some old invalid data,
i.e. garbage.
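
To make the distinction concrete, here is a toy sketch of the consolidation
step the article calls garbage collection: copy the still-valid pages out of a
sparsely filled erase block into a fresh one, then erase the old block so it
can be reused. The layout and names are purely illustrative, not how any
particular controller does it:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define PAGE            4096
    #define PAGES_PER_BLOCK 64

    struct block {
        uint8_t data[PAGES_PER_BLOCK][PAGE];
        bool    valid[PAGES_PER_BLOCK];   /* false = stale data, i.e. garbage */
        int     used;                     /* next free page in this block */
    };

    /* Move src's live pages into dst, then "erase" src so it is free again. */
    static void gc_compact(struct block *dst, struct block *src)
    {
        for (int p = 0; p < PAGES_PER_BLOCK; p++) {
            if (!src->valid[p])
                continue;                                 /* skip the garbage */
            memcpy(dst->data[dst->used], src->data[p], PAGE);
            dst->valid[dst->used++] = true;
        }
        memset(src, 0, sizeof *src);                      /* block is empty again */
    }

    int main(void)
    {
        static struct block a, b, fresh;   /* two sparse blocks plus an empty one */
        a.valid[3] = a.valid[9]  = true;   /* pretend only a few pages are live */
        b.valid[0] = b.valid[42] = true;

        gc_compact(&fresh, &a);            /* fresh now holds all the live pages */
        gc_compact(&fresh, &b);            /* a and b can be handed out again */
        return 0;
    }

In that sense it is closer to a copying garbage collector than to
defragmentation: the live data gets relocated so that whole blocks of garbage
can be erased and reused.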

------
JoeAltmaier
A picture would have been worth 1000 words here.

------
revelation
I wonder why there aren't more products that use DDR3 memory for storage.
Constant power supply and a reasonably small emergency battery seem like
no-brainers for server operations, and you don't have the creeping failure
modes you get with SSDs. Is it the lack of error correction? The need to
refresh the state?

~~~
weaksauce
Lack of permanence, really. I have seen servers go down for more than a few
hours or even a few days. It's basically a game of chicken with your data.
What happens when your power goes out for longer than you can provide backup
supply to? You lose all the data that was in the DDR3.

The usual way to deal with that is a two-phase system: lots of DDR3, and
servers running a caching layer on top of it (memcached, etc.). You can buffer
your writes in RAM as well, but that, as always, is a tradeoff.

~~~
ghshephard
"What happens when your power goes out for longer than you can provide backup
supply to?" - You flush to Spinning disk.

~~~
tisme
Tough to flush to spinning disk _after_ your power goes down. So you'd have to
do it continuously, which would remove most of the performance gains.

~~~
ghshephard
You include a battery in the package. You only need enough power to spin up
the disk, flush the RAM cache, and then power down. In fact, the DDR3 disk
drive could include all three: RAM + battery + disk. Say, 64 GB of DDR3
PC3-10600, 1024Meg x 64 (Crucial part #: CT2KIT102464BA1339) for $600, a 2.5"
80 GB Seagate ST980815A for $44, and a 10.8 V, 4,800 mAh PA3534U-1BRS battery
(overkill for flushing, but they are cheap) for $19.81.

Add a charging circuit for the battery at $3.75, a controller for $5, a RAM
socket board for $4.25, a disk interface connector for $2.00, a case for
$7.00, assembly for $6.50, and assorted screws/packaging for $1.00.

You could sell a 64 GB DDR3 disk with a backup disk for $693.50, 87% of which
would be the cost of the memory itself. Larger such systems would be even more
dominated by the memory cost, since the other components (the disk in
particular) don't increase much in cost, and apart from the RAM sockets, most
of them don't increase in cost at all.

The battery would need to be swapped out every four years or so, but that
would only cost about $20 each time.
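
For a feel of how simple the flush step is, here is a rough sketch of the
drain-to-disk routine such a device's controller would run when mains power
drops (everything here is illustrative: the buffer size, the backing path, and
the fact that it runs as a host process at all; a real controller would talk
to its own battery monitor and disk directly):

    /* Hypothetical flush-on-power-loss routine for a RAM-backed drive. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define CACHE_BYTES (64UL << 20)   /* 64 MiB stand-in for the 64 GB cache */
    #define CHUNK       (1UL  << 20)   /* flush 1 MiB at a time */

    /* Drain the RAM cache to the backup disk, then sync it to survive power-off. */
    static int flush_cache_to_disk(const char *cache, size_t len, const char *path)
    {
        int disk = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (disk < 0) { perror("open"); return -1; }

        for (size_t done = 0; done < len; ) {
            size_t  want = len - done < CHUNK ? len - done : CHUNK;
            ssize_t n    = write(disk, cache + done, want);
            if (n <= 0) { perror("write"); close(disk); return -1; }
            done += (size_t)n;
        }
        fsync(disk);      /* force it onto the platters before the battery dies */
        close(disk);
        return 0;
    }

    int main(void)
    {
        char *cache = malloc(CACHE_BYTES);   /* pretend: the battery-backed DDR3 */
        if (!cache) return 1;
        memset(cache, 0xAB, CACHE_BYTES);    /* "hot" data we would otherwise lose */

        /* ...mains power just dropped; the battery gives us a few minutes... */
        int rc = flush_cache_to_disk(cache, CACHE_BYTES, "backup.img");
        free(cache);
        return rc ? 1 : 0;
    }

The battery only has to cover spinning the disk up and streaming the cache out
once, not keeping the whole system alive.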

~~~
gus_massa
Looks good, but people will forget to change the batteries and then fill the
forums with complaints when they lose all their data.

------
ck2
Does it make sense anymore for SSDs to go through a hard drive interface?

Even SATA3 cannot keep up with a RAID of SSDs.

~~~
ChuckMcM
No, it doesn't. It never did. However, from a marketing / market-acceptance
perspective, it was a requirement. When you introduce a technology like flash
you have to ask "Who is going to use this?" and "How?" The first mass-market
winners for flash were smartphones and digital cameras. Remember when Apple
bought up all the available flash to launch the second-generation iPhone? All
that flash capacity wasn't needed when people weren't buying a new camera or
phone, so some folks started putting the tech into USB sticks, and those were
_really_ popular. So the USB sticks and digital cameras and USB-based card
readers all made the flash appear like a disk drive, which meant it could
immediately be put to use by consumers. And _that_ sort of cemented the idea
that "flash is for disks" in the minds of many people, and folks have built a
huge market around that.

Of course, what started this revolution was memory, or EPROM to be precise.
Back when dinosaurs walked the earth, you stored firmware in a chip that
physically had a window on the top of it. This was a "memory" chip that you
would write by injecting charge into a transistor gate, forcing electrons to
tunnel across to it (or out of it, depending on the technology). But you
needed a special programmer to do that, and to erase it you had to get rid of
those charges, so you literally shined an ultraviolet light through the window
and the photons kicked the electrons right out of Dodge. It was painful, and
the chip companies responded with something called "EEPROM," or electrically
erasable programmable read-only memory, which you could erase with a special
high-voltage signal on the motherboard doing the work of the ultraviolet lamp
in previous generations. As density grew and erase times shortened,
manufacturers added the ability to erase only part of the memory, and they
could do it reasonably quickly, "in a flash" as it were. To distinguish
memories that could be quickly erased from those using older, slower
technologies, they started calling them "Flash" memories.

Of course, as the article points out, flash is nothing at all like a hard
drive; that people use it that way is an artifact. It much more closely
resembles something called "drum memory" [1], which, back when actual
random-access memory was very expensive to produce, made computers better. A
drum had a lot of fixed heads and spun a piece of ferromagnetic material under
them, which meant there was no 'seek' time: you picked the head you wanted
electronically and could read or write a few hundred bytes to a couple of
kilobytes of data. This enabled virtual memory in a big way, because if you
matched the amount of data on a drum track to a 'page' of memory, you could
simply write out or read in a page of memory faster than either tape or disk
could manage. The only problem was that you had to read all of a track and
write all of a track, so a read-modify-write cycle meant reading in the track,
modifying it, and rewriting the entire track. Sound familiar? It should; that
is exactly how flash ended up working.
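
To spell the analogy out in code: flash lets you program a page at a time but
erase only a whole block, so a naive in-place update of one page turns into a
full block rewrite, exactly like rewriting the drum's whole track. A toy model
(sizes illustrative, and real FTLs avoid this by remapping pages instead of
rewriting in place):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE            4096               /* smallest unit you can program */
    #define PAGES_PER_BLOCK 64
    #define BLOCK (PAGE * PAGES_PER_BLOCK)     /* smallest unit you can erase */

    static uint8_t chip[BLOCK];                /* one simulated erase block */

    /* Update one page "in place", the way a naive controller would have to. */
    static void update_page(int page, const uint8_t *new_data)
    {
        static uint8_t shadow[BLOCK];

        memcpy(shadow, chip, BLOCK);                   /* 1. read the whole block  */
        memcpy(shadow + page * PAGE, new_data, PAGE);  /* 2. modify one page       */
        memset(chip, 0xFF, BLOCK);                     /* 3. erase (all bits to 1) */
        memcpy(chip, shadow, BLOCK);                   /* 4. program it all back   */
    }

    int main(void)
    {
        uint8_t data[PAGE];
        memset(data, 0x42, sizeof data);
        update_page(7, data);                          /* touch 4 KiB... */
        printf("changed %d bytes, rewrote %d bytes\n", PAGE, BLOCK);
        return 0;
    }

That gap between bytes changed and bytes rewritten is the write amplification
the article keeps coming back to.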

So here you have a random-access memory that started life as a memory but
gained commercial acceptance as a pseudo disk drive, and a generation of
programmers and system designers who had never heard of drum memory, or
considered it nothing more than a curious artifact of the "before time," when
people stored data as dots on a cathode-ray tube, for heaven's sake. But they
should have paid attention, because that is exactly where flash belongs:
sitting "beside" really fast dynamic RAM and even faster static RAM (which is
on chip and usually called level 1, level 2, or level 3 cache memory). If you
remember Jeff Dean's list of latencies every programmer should know [2],
reading 4K from DRAM is on the order of 0.5 to 1 microsecond, reading/writing
4K over the network is around 10 microseconds, and a 4K read/write from disk
is closer to 15,000 microseconds (15 milliseconds). Reading 4K from flash on
the PCI bus is on the order of 4 microseconds: somewhere between getting it
from the network and getting it from RAM. The reason it is so much faster over
the PCI bus is that you just map the PCI address space into memory space and
memcpy from flash to RAM.
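
That last step is literal. On Linux the kernel exposes each PCI BAR as a sysfs
file you can mmap, after which the device's memory is just bytes in your
address space. A rough userspace sketch (the device path and length are
placeholders, and a real PCIe flash card would normally hide this behind its
kernel driver):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAP_LEN 4096

    int main(void)
    {
        /* BAR0 of a hypothetical device, as exposed by the kernel in sysfs. */
        int fd = open("/sys/bus/pci/devices/0000:03:00.0/resource0",
                      O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        void *bar = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (bar == MAP_FAILED) { perror("mmap"); return 1; }

        /* No SATA protocol, no block layer: "read 4K from flash" is a memcpy. */
        char buf[4096];
        memcpy(buf, bar, sizeof buf);

        munmap(bar, MAP_LEN);
        close(fd);
        return 0;
    }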

If you compare latency and bandwidth going through an SSD interface, you find
that not only is there a whole bunch of kernel between you and the SATA chip,
which has its own protocol and drivers, but you are also constrained to a
6 Gbit pipe that is probably shared with other SATA ports on the same SATA
controller chip. From a systems-architecture perspective, attaching flash to
your machine through an SSD plug is lame.

Now, that said, folks have started figuring this out. People like Intel make
PCI flash cards: big, memory-like chunks of storage. All of the wear leveling
is built into the flash controller, just like the dynamic memory refresh logic
is built into the DRAM controller. The processor sees something that looks
like memory; occasionally operations take longer than expected if the
controller is in the middle of something. The current challenge, though, is
that so far vendors think the same flash chips you can buy for $2/GB as an SSD
should cost $200/GB as a PCI card. That math is seriously holding back flash,
as is the fact that the best slot to use for flash is the one your video card
is sitting in (16x PCIe): Intel has yet to add another 16x PCIe port for
non-volatile memory cards, or architectural support for putting PCI address
resources into the page table. That will happen though; when, I can't predict,
but it will, because people keep asking for it and it makes some really killer
server architectures possible.

[1] A pretty decent write up on drum memory from Wikipedia -
<http://en.wikipedia.org/wiki/Drum_memory>

[2] A github copy of Jeff Dean's observation updated with Flash SSDs -
<https://gist.github.com/2841832>

~~~
confluence
This comment is godlike - do you have a blog or the like where you keep
details of your thoughts?

I think I just got a blast of the future of server architecture, future
latency, and the end of HDDs reading this - many thanks :D

------
rll
Mojibake from acm.org?

~~~
Confusion
I don't see any mojibake?

~~~
rll
Yeah, it looks like they fixed it. All the apostrophes were misconverted, most
likely from Windows-1252, when I read this yesterday, but they are fine now.

------
BIair
When I see anatomy in a title, I expect to see pictures.

