

SandForce SSD controller beats Intel X-25M by compressing data before storing - Alex3917
http://www.anandtech.com/storage/showdoc.aspx?i=3702&p=1

======
wmf
Whenever compression or dedupe is involved, it matters _what data_ you're
writing; unfortunately Anand doesn't specify that so it's hard to tell how
realistic these numbers are.
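
For instance (toy numbers, nothing to do with the drive's actual algorithm),
the same megabyte can shrink to almost nothing or not at all depending on
what it is:

    import os, zlib

    # compression-based throughput claims depend entirely on the input:
    # same amount of data, wildly different compressed sizes
    samples = {
        "zeros":            b"\0" * 1_000_000,
        "english-ish text": b"the quick brown fox jumps over the lazy dog " * 22_000,
        "random / already compressed": os.urandom(1_000_000),
    }
    for name, data in samples.items():
        print(f"{name:30s} {len(zlib.compress(data, 1)):>9,d} bytes from {len(data):,d}")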

~~~
xtho
On the last but one page, there is some discussion of this issue.

~~~
weaksauce
And the conclusion is that it's slower than the original Vertex, but still
faster in writes than the X25-M G2 (~140MB/s vs. 109MB/s), where the original
Vertex was able to hit 185MB/s.

FTA: _Presumably the majority of your file writes aren’t going to be
compressed files so your performance shouldn’t be gated by this issue, even
then I’ve shown that you shouldn’t be any worse off than you would be with
Intel’s X25-M._

~~~
seunosewa
Video files are compressed, and they are the biggest drive fillers.

------
andrewcooke
it would be interesting to compare that to a file system with transparent
compression on an intel drive.

this is the best way i have found to think about what they are doing (it's a
while since i read the article, so apologies if i am just repeating what is
said, or get something wrong - iirc, anand hints at the below but doesn't
state quite as much):

given an arbitrary, unreliable storage medium, you need to store both "raw"
data and additional information for error recovery. it seems that until now
there were technical / historical reasons that made it optimal / normal to
store these separately, as just described (ie a disk stores a chunk of
data and then has a relatively small checksum afterwards).

but there's no real reason why that need be optimal in all cases. for example,
raid does something different. raid 1 stores two copies (ignore that each has
error correction too, for the sake of argument). now raid has certain
technical motivations (cheap disks fail, hopefully independently) that make
that reasonable.

so what is new about ssds compared to spinning disks that is the enabling
factor here? one guess is that since you can read from various chips at the
same time you can do something like raid. for example, say you have 8 memory
chips, then you could use a raid 5 style approach with one chip as parity,
losing 1/8 (12.5%) of your space, or two chips, losing 1/4 of it.
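
as a toy illustration of that raid 5 style idea (plain xor parity striping;
i'm not claiming this is what sandforce actually implements):

    def stripe_with_parity(data, n_data_chips=7):
        # split a logical block across 7 "chips" and xor them into an 8th
        # parity chip, raid-5 style
        chunk = -(-len(data) // n_data_chips)           # ceiling division
        data = data.ljust(chunk * n_data_chips, b"\0")  # pad to whole chunks
        chips = [data[i * chunk:(i + 1) * chunk] for i in range(n_data_chips)]
        parity = bytearray(chunk)
        for c in chips:
            for i, b in enumerate(c):
                parity[i] ^= b
        return chips + [bytes(parity)]

    def rebuild(chips, lost):
        # xor of all surviving chips (data + parity) recovers the lost one
        out = bytearray(len(chips[0]))
        for j, c in enumerate(chips):
            if j != lost:
                for i, b in enumerate(c):
                    out[i] ^= b
        return bytes(out)

    chips = stripe_with_parity(b"some block of user data " * 10)
    assert rebuild(chips, lost=3) == chips[3]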

my inference (and what i think anand implies at some point) is that when you
sit down and do the maths, there's some number of parity chips (say) that
allows you to start using cheaper chips (with higher error rates).
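
to make 'do the maths' concrete, a toy version with made-up numbers (nothing
from the article): if each bit flips independently with probability p and the
ecc can correct up to t bits per 4096-bit sector, the chance of an
unrecoverable sector is just a binomial tail. in this toy, flash with ten
times the raw error rate needs roughly five times the correctable bits to get
back to a similar failure probability - the sort of trade that lets you buy
cheaper chips and spend the savings on redundancy:

    from math import comb

    def p_uncorrectable(p, n_bits, t):
        # probability of more than t bit errors in an n_bits sector,
        # assuming independent flips with probability p each
        ok = sum(comb(n_bits, k) * p**k * (1 - p)**(n_bits - k)
                 for k in range(t + 1))
        return max(0.0, 1.0 - ok)   # clamp float noise; tiny values print as 0

    for ber in (1e-3, 1e-2):        # "good" flash vs "cheap" flash (made-up rates)
        for t in (8, 16, 40, 80):   # ecc strength: correctable bits per sector
            print(f"ber={ber:g}  t={t:>2}  "
                  f"p(sector uncorrectable)={p_uncorrectable(ber, 4096, t):.2g}")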

but that doesn't entail compression.

so either i am missing something, or the compression is optional - perhaps it
is being used to hide the fact that they are having to use so much space for
error correction? or perhaps it is just a marketing gimmick?

 _edit: or perhaps without compression it's actually too slow to sell? i think
this may be it. and if so, putting compression on an intel drive will actually
beat this._

or perhaps there's something about the approach that means the data have to be
compressed anyway?

~~~
wmf
You're confusing two separate features. The improved error correction allows
using cheaper flash. The compression/dedupe is what improves performance and
endurance.

~~~
andrewcooke
i am? i thought that is what i was saying...

what's confusing me is that this is being sold as a "high end" drive. it's
not. it's a drive engineered to be as cheap as possible, using lower quality
components, that's being sold as "fast" only because it has compression. put
compression (eg with zfs - i don't know what other file systems have
transparent compression) on an intel drive and it should be better.
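
concretely, what i mean is something like this toy write path (zlib level 1
standing in for whatever zfs actually uses; this all runs on the host, the
drive never sees the uncompressed bytes):

    import os, zlib

    def compressed_write(f, block):
        # host-side "transparent compression": shrink the block before it
        # ever reaches the drive, prefixing the compressed length so it can
        # be read back. zfs does this properly at the block layer; this is
        # just the idea.
        payload = zlib.compress(block, 1)
        f.write(len(payload).to_bytes(4, "big"))
        f.write(payload)

    def compressed_read(f):
        n = int.from_bytes(f.read(4), "big")
        return zlib.decompress(f.read(n))

    data = b"a fairly repetitive block of data " * 100
    with open("blocks.bin", "wb") as f:
        compressed_write(f, data)

    print(os.path.getsize("blocks.bin"), "bytes hit the disk for",
          len(data), "logical bytes")

    with open("blocks.bin", "rb") as f:
        assert compressed_read(f) == data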

but the review hints that the two are linked. if they are, then perhaps it's
not as simple as that.

~~~
wmf
Even if they were aiming for low cost, I wouldn't blame them for marketing the
thing as high performance given that it's currently #1.

~~~
andrewcooke
sure, everyone knows that companies will spin things however they can
(although i don't understand why some people - particularly americans - seem
so fond of pointing this out; it's hardly the most positive aspect of
capitalism).

but the review could have been a little more questioning. why not compare it
to writing compressed data on an existing drive?

(and i'm sure i don't need to point out, to a connoisseur of free markets like
yourself, that although they make a living convincing people to buy the latest
product, and so work hand in glove with the manufacturers, they also have to
compete for readers by reputation, which requires some level of integrity)

------
alexgartrell
To be fair, there's a good chance I'm missing something here, but 25 gb of
data (and not super-redundant low-entropy ascii, but a lot of executable machine
code) being compressed into 11 gb seems pretty amazing. Have any third parties
verified these numbers?

Edited to add: Beyond that, it seems like all this tech would make random
lookups prohibitively expensive, because a seek(30) doesn't necessarily jump to
the (30 * A + B)'th place. But I'm not a file systems/storage guy at all, so
I'd love for some education here.
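
One way such a scheme could keep random reads cheap (a guess at the shape of
it, not a description of SandForce's actual design): SSD controllers already
maintain a logical-to-physical map for wear levelling, so variable-sized
compressed records just mean the map entries point at (offset, length) pairs
instead of fixed slots. Roughly:

    import zlib

    class ToyCompressedStore:
        # fixed-size logical blocks, variable-size physical records, plus a
        # map from logical block number -> (offset, length) on the "flash".
        # (garbage collection of stale records omitted.)
        BLOCK = 4096

        def __init__(self):
            self.media = bytearray()   # pretend flash
            self.l2p = {}              # logical block no. -> (offset, length)

        def write_block(self, lbn, data):
            assert len(data) == self.BLOCK
            payload = zlib.compress(data, 1)
            self.l2p[lbn] = (len(self.media), len(payload))
            self.media += payload

        def read_block(self, lbn):
            off, length = self.l2p[lbn]   # one table lookup, no scanning,
            return zlib.decompress(self.media[off:off + length])  # so seek stays O(1)

    store = ToyCompressedStore()
    store.write_block(30, b"x" * 4096)
    assert store.read_block(30) == b"x" * 4096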

~~~
houseabsolute
I just compressed the 1.6MB cgo application into 400kB using standard Mac OS X
zip. A few other applications I tried compressed similarly well.

~~~
jbellis
standard zip is far more cpu intensive than anything you could afford to run
on something where you (a) have very limited cpu resources and (b) need to add
very very little latency.

~~~
houseabsolute
tar+gzip performed similarly well (1.6MB -> 450kB) and is a lot faster than
normal zip. GZip was similarly able to compress my Mac's System folder from
four gigabytes to two. Anyway, the point stands that executable files are
quite compressible. If you'd like me to try it with an algorithm you consider
more appropriate, please direct me to the download page.
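
For what it's worth, even zlib's fastest setting (level 1, far cheaper than a
normal zip run) gets a similar ratio on a binary; a quick check along these
lines, with /bin/ls standing in for any executable:

    import zlib

    # measure how well a binary compresses at zlib's cheapest setting;
    # point it at any executable you like
    with open("/bin/ls", "rb") as f:
        raw = f.read()

    packed = zlib.compress(raw, level=1)
    print(f"{len(raw)} -> {len(packed)} bytes "
          f"({len(packed) / len(raw):.0%} of original)")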

------
dpifke
Unpaginated version:

<http://www.anandtech.com/printarticle.aspx?i=3702>

