
The Intel Optane SSD DC P4800X (375GB) Review: Testing 3D XPoint Performance - randta
http://www.anandtech.com/show/11209/intel-optane-ssd-dc-p4800x-review-a-deep-dive-into-3d-xpoint-enterprise-performance
======
bbulkow
Aerospike CTO here. We did a lot of testing with this drive, and the
interesting bit is that performance doesn't degrade under write pressure. With
a NAND drive, when you push writes high ( usually needed in front-edge /
microservice apps ), the read latencies take a real hit, and often climb to
millisecond averages at the device. Optane simply doesn't behave that way.

All of that means you have to code to it differently. It isn't DRAM, it's not
Flash. We've done the work to integrate and test at Aerospike ( semi-plug
sorry ), so the system works.

The XPoint tech gets really interesting later in the year, when three things
happen: the higher density drives show up, Dell ships their high-NVMe chassis,
and more clouds support it.

Regarding cloud - IBM's BlueMix has announced Optane, and the other providers
have a wide variety of plans. I can't comment more.

Finally, Intel has been clear about putting this kind of tech on the memory
bus, and that really opens the doors to some interesting choices, some data
structure changes. That requires new coding, we're on it.

Here's an interesting ComputerWorld article about our experience with Optane:
[http://www.computerworld.com/article/3188338/data-
storage/wh...](http://www.computerworld.com/article/3188338/data-storage/what-
aerospike-learned-from-testing-intels-superfast-optane-ssds.html)

~~~
bch
> All of that means you have to code to it differently.

> That requires new coding, we're on it.

Can you give a rough example to guide my thinking? I understand OSes are
working out how to handle something that has both traditional disk and RAM
properties, and are sorting out storage subsystems and drivers, but I assume
you're talking about something closer to user-level structures and algorithms?

~~~
bbulkow
Yes, I think there are clear user-level approaches you can ( and must ) take.

There are some interesting talks out there about NAND, how it works, and how
to optimize for it - I saw something here on HN a few days ago about writing
your own time-series database, which got a variety of the facts wrong but was
an example of how to choose data structures that are NAND-reasonable. You can
look up some of my YouTube talks and slideshare, for example - I've been
talking about this for a while.

At a high level, NAND has more IOPS than god, because it doesn't seek. An old
enterprise spindle literally does 200 to 250 seeks per second, while Flash can
read from 500,000 different random locations per second. That's so far apart
that different user-level approaches are called for.
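
A toy sketch of what I mean ( hypothetical, not our code ): on a spindle you'd
stream sequentially, but on NAND you win by keeping many random reads in
flight at once. The device path, block count, and thread count here are made
up for illustration.

    /* Hypothetical sketch ( not Aerospike code ).
     * Compile with: gcc -O2 -pthread randread.c */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define NUM_THREADS       16
    #define BLOCK_SIZE        4096
    #define READS_PER_THREAD  10000
    #define DEVICE_BLOCKS     1000000ULL  /* ~4 GB of 4K blocks, illustrative */

    static int g_fd;

    static void *reader(void *arg) {
        unsigned int seed = (unsigned int)(long)arg;
        void *buf;
        /* O_DIRECT wants block-aligned buffers */
        if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
            return NULL;
        for (int i = 0; i < READS_PER_THREAD; i++) {
            /* random, block-aligned offset: "seeking" costs nothing on NAND */
            off_t off = (off_t)(rand_r(&seed) % DEVICE_BLOCKS) * BLOCK_SIZE;
            if (pread(g_fd, buf, BLOCK_SIZE, off) < 0)
                perror("pread");
        }
        free(buf);
        return NULL;
    }

    int main(void) {
        /* O_DIRECT bypasses the page cache so we exercise the device, not DRAM */
        g_fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
        if (g_fd < 0) { perror("open"); return 1; }

        pthread_t t[NUM_THREADS];
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&t[i], NULL, reader, (void *)(i + 1));
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(t[i], NULL);
        close(g_fd);
        return 0;
    }

One thread of that against a spindle gets you a couple hundred reads a second;
sixteen of them against decent NAND gets you into the hundreds of thousands.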

In terms of XPoint, let me give you one detail. What does a "commit" look like
in XPoint? What do the different kinds of memory barriers look like? What's
the best way to validate this kind of persistence on restart, which you don't
have to do with DRAM? Does that change your "malloc" free list structure,
because you need to validate? Is it a good idea to chop up all the space
available, so you can validate different parts independently, or does that
mean you end up with the multi-record transaction problem? These are the kinds
of things we consider in database design on new hardware ( obligatory: we are
hiring ).
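
To make the "commit" question concrete, here's a toy sketch ( again
hypothetical, not our actual code ) of one way a commit could look on
byte-addressable persistent memory. The ordering is the whole game: the
payload has to be durable before the flag that says it's valid, and that flag
is also what you check when you validate on restart.

    /* Hypothetical sketch ( not Aerospike code ): assumes a persistent-memory
     * region is already mapped in, and that `struct record` lives in it. */
    #include <immintrin.h>   /* _mm_clflush, _mm_sfence */
    #include <stdint.h>
    #include <string.h>

    #define CACHE_LINE 64

    /* flush every cache line covering [addr, addr + len) */
    static void flush_range(const void *addr, size_t len) {
        const char *p   = (const char *)((uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1));
        const char *end = (const char *)addr + len;
        for (; p < end; p += CACHE_LINE)
            _mm_clflush(p);
    }

    struct record {
        uint64_t valid;        /* 0 = unused, 1 = committed; checked on restart */
        uint64_t len;
        char     payload[240];
    };

    void commit_record(struct record *r, const void *data, uint64_t len) {
        /* step 1: write the payload and its length, then make them durable */
        memcpy(r->payload, data, len);
        r->len = len;
        flush_range(r->payload, len);
        flush_range(&r->len, sizeof r->len);
        _mm_sfence();              /* payload is persistent past this point */

        /* step 2: only now flip the valid flag and make *it* durable */
        r->valid = 1;
        flush_range(&r->valid, sizeof r->valid);
        _mm_sfence();              /* record is now committed */
    }

Whether you use clflush, clflushopt, or clwb, how coarse the fences are, and
how you chop up the valid flags so regions can be validated independently are
exactly the kinds of design choices I mean.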

------
saosebastiao
Impressive.

On a semi-related note, the fact that cloud providers offer managed Postgres
databases is great, but things like this keep pushing me to think about bare
metal in colo. A $20,000 server/backup combo with a couple of these will give
me 5x the performance of a server that costs me $70k/year on AWS _before
provisioned IOPS_. That's a huge gap to play with for taking care of your own
maintenance costs.

~~~
dboreham
It has always been the case that owned metal is much cheaper than cloud. Cloud
wins for things like very small machines (you can't buy 20% of a metal box);
temporary deployments; and startups where the investors don't want to be on
the hook for big $$ capex that they end up needing to shift on eBay post
flame-out. Also where you just don't have the ability to run hardware
(increasingly the case) and where for whatever reason the cost to deploy and
run metal would add up to more than your $70k delta (easy if it needs one more
person on staff).

Also remember that funky storage card may have a firmware bug that costs you
hundreds of hours to track down and get resolved. That time could have been
burned by Jeff Bezos' guys instead.

~~~
chx
I never understood this. Just rent a dedicated box or ten. There's no capex,
and it's way cheaper to have 3-5 times the capability you need just sitting
there instead of renting stuff from Amazon with all the I/O unpredictability,
complexity, etc. People talk as if colo and cloud are the only solutions --
and I am like, renting dedicated is the only solution. I don't know exactly
how huge your site needs to be for colo to be worth it, but Examiner was ~75th
on Quantcast, we ran it on a dozen dedicated boxes, and colo would've been
ridiculously more expensive.

~~~
bkeroack
With public cloud, you're paying for the control plane. Period.

Yes the raw compute cost is much higher (and the performance often less) than
bare metal, but software development is really, _really_ expensive. With the
public cloud you get the result of literally millions of programmer-hours "for
free". To many that's worth it (at least at first, below a certain scale).

This is also one reason why Kubernetes is really exciting, BTW. It's a control
plane that you don't have to rent.

~~~
chx
I really badly do not understand what EC2 gives you that a dedicated box
doesn't. Yes, if you use all the other services, then there's a value but then
again for example SQS is a particularly shitty queue.

~~~
zxcmx
It's not ec2, it's the ecosystem.

DDOS, multi-path, multi-az, dr, snapshots, hardware redundancy, elasticity,
freedom from datacenter contracts, fiber contracts, driving to rack stuff,
jammed fingers, staff costs, firmware bugs, bad hardware batches, cdn,
compliance costs. I have more but I'll stop there.

It doesn't make sense at very small or large scales but it captures a hell of
a lot of the middle.

~~~
chx
Most of what you described is colo and not dedicated boxes...

~~~
ericd
And honestly, even colo doesn't take that much time unless you're scaling up
like crazy. Once we racked our machines, we almost never had to touch them.
You can get a LOT of resources in a single box for $10k/server these days.

------
mtanski
Some people like to claim that old storage technologies go away. But in
reality, old storage technologies live on alongside the new... all that
happens is that we end up with more tiers to deal with.

Drum magnetic memory has been replaced, but we still have spinning rust, tape,
optical, DRAM, SRAM, SSD...

~~~
bbulkow
There are only 3 forms of bit storage, historically.

1) Poking holes in things ( tape, DVD, etc )

2) Magnets ( core memory, drum memory, tape, disks )

3) Circuits ( DRAM, NAND, etc )

Examples of all three of these still exist.

What's interesting about XPoint is that it is literally a fourth form that has
never been commercially available: melting a substance and cooling it quickly
or slowly, forming either a crystal or an amorphous solid, which then has
different properties. We don't know what the substance is, but it's cool that
we now have this 4th thing.

~~~
klodolph
There are only 3 forms that really survive to this day. There are a couple of
notable obsolete ones:

4) Delay lines. I would definitely not categorize these as circuits. The bits
are stored as pressure waves in motion.

5) Electrostatic charge. Talking about the Williams tube. The bits are stored
as residual charge on a phosphor surface.

~~~
kabdib
... and magnetic bubble memory, which is a very odd hybrid somewhere between
tape and delay lines.

Also, phase-change optical (as in rewritable CDs/DVDs).

Also also, printing stuff out at high bit-density with good ECC, and scanning
it back in again.

Also also also, chemically encoded bits (as in the DNA-based storage that was
demonstrated recently).

There are more, not in common use or very scalable (e.g., mechanical
switches).

~~~
klodolph
I think it's a bit conspicuous how you mention good ECC for printing stuff
out, since good ECC is a staple of many forms of storage (especially hard
drives, optical drives, tape) and we don't mention e.g. "storing bits on
spinning rust with good ECC and reading it back in again".

I think the problem with paper is that it's not really part of the computer
any more. You need a human to take the paper out of the printer, store it, and
put it back in the scanner. At least with tape and optical, we have robots to
do that for us.

~~~
kabdib
I hadn't thought of that. Your average "crappy thumb drive" probably has more
ECC than you'd need on a physical piece of paper. You're right.

~~~
mtanski
Although when crappy thumb drives fail the whole thing tends to be unreadable
or doesn't even show up as a USB device.

Generally no amount of ECC will help the default failure mode of these.

------
benlwalker
I'm the technical lead for the Storage Performance Development Kit
([http://spdk.io](http://spdk.io)). I have one of these in my development
system and we're hoping to post benchmarks fairly soon. SPDK further reduces
the latency at QD 1 by several microseconds. It's an impressive device!

~~~
justinclift
Any word on real world endurance?

There's some speculation these devices are massively overprovisioned due to
the er... cells (or equivalent) wearing out much faster than the early
hype/info/etc. suggested.

So, it'd be interesting to get a real-world idea of whether these things are
likely to explode badly ;) 6 months after purchase or not (etc). :)

------
gigatexal
Jeebus, that's one performant drive.
[http://www.anandtech.com/show/11209/intel-optane-ssd-
dc-p480...](http://www.anandtech.com/show/11209/intel-optane-ssd-
dc-p4800x-review-a-deep-dive-into-3d-xpoint-enterprise-performance/4) \-- I
can't believe those numbers. The QD1 numbers are impressive to say the least.

------
ganfortran
The random access performance is IMPRESSIVE! It is going to shake up the DB
landscape, since now we probably aren't forced to use data structures tailored
to sequential writes, and can consider all the other data structures as well.
Very exciting indeed!

------
frozenport
At $5 per GB, I would instead buy RAM.

Edit: why the downvotes? I'm currently doing image processing and using a RAM
drive with ~200GB.

~~~
Crespyl
The reason you would use this over RAM is persistence: it doesn't need to stay
powered to keep the data.

If all you need is a massive amount of temporary storage for some algorithm,
you'll still need RAM, but if you want a stupidly fast backing store for a
huge amount of source and then output data, this is pretty incredible.

~~~
Dylan16807
When your computation fits in RAM, what's the point of a lower latency backing
store that does fewer gbps per dollar?

------
strictnein
Another nice review of the same part:
[https://www.pcper.com/reviews/Storage/Intel-Optane-SSD-
DC-P4...](https://www.pcper.com/reviews/Storage/Intel-Optane-SSD-
DC-P4800X-375GB-Review-Enterprise-3D-XPoint)

------
randta
Better-than-NAND performance, if only slightly, for a first-generation memory
product, but I still feel like it won't be able to scale to DRAM speeds. The
search for a universal memory goes on...

~~~
Analemma_
They're not advertising it as universal memory or a DRAM replacement though,
so I'm not sure why you would make that comparison. They're advertising it as
sitting between RAM and an SSD on the storage hierarchy, and it seems to
mostly deliver on that promise.

~~~
endorphone
One of the proposed uses for 3D XPoint has been replacing system memory, owing
to its extremely low latency (which it has already delivered), high throughput
(where it's still early), and supposedly almost unimaginable
reliability/rewrite endurance. In DIMM form it wouldn't have the overhead of a
storage controller, could hypothetically be very parallel, etc.

It's extremely early in the technology, and I imagine we will get there. The
first SSDs were terrible compared to SSDs now, and 3D XPoint as a technology
is extremely scalable and refinable.

~~~
Dylan16807
> extremely low latency (which it has already delivered)

Are we looking at the same numbers? "probably under 10 microseconds" is pretty
terrible compared to DRAM.

~~~
endorphone
3D XPoint has extremely low latency (e.g. 7 usec), and has already been
demonstrated as such. Putting it through a PCI slot and a storage controller
is not the same. The discussion is about running it as DIMM memory through a
normal memory controller.

~~~
Dylan16807
So you remove the storage controller and PCIe bus and you go from 10us to 7us.
7us is still a hundred times slower than DRAM, is it not?

~~~
_wmd
You're ignoring a lot of context being created by the parent comment. While
the latencies may never directly compete with DRAM, planned densities are
already more than favourable. Having a 2TB chunk of persistent memory mapped
into your address space with single-thread performance exceeding 100,000
random reads/sec with extremely consistent latency is a game-changer in, for
example, database applications.

~~~
Dylan16807
What context am I missing? While Analemma_ was talking about using it to
augment DRAM, like you are, endorphone was very specifically talking about
_replacing_ DRAM in that comment. And that comment claims that it already has
good enough latency for the job.

~~~
endorphone
It's turtles all the way down, and I think this is needlessly splitting hairs.

L1->L2->L3->(L4->)DRAM->Storage

3D XPoint is being considered as the DIMM-module, byte-addressed "memory"
before storage. Saying that it "augments" DRAM is almost meaningless because
we already have multiple levels of DRAM.

~~~
davrosthedalek
If I am not mistaken, we have only one level of _D_ RAM. Caches are SRAM.
(Also, DRAM on DIMM is not byte-addressable)

~~~
cocoablazing
The PS2, Xbox 360, Wii, and some Intel chips with Iris graphics provide on-die
DRAM caches.

------
elorant
What are the use cases for an SSD like that? Would I see improvements if I
loaded my DB onto it, or is it meant for working with bigger files?

~~~
mozumder
These are going to be absolute beasts for databases.

Databases are best on drives that have reduced latencies. Data access is
random and keeps jumping around the drive for joins. Your SELECT statements
are going to speed up in proportion to latency improvements.

------
ksec
I am rather hoping this makes large-capacity DRAM drop in price.

~~~
scurvy
I'll settle for NAND pricing to drop. There's a bit of a large-scale NAND
shortage right now. Companies are bearing the brunt of it, but they'll pass it
on to consumers if it continues much longer.

