
Intel Announces Optane DIMMs - p1esk
https://www.anandtech.com/show/12828/intel-launches-optane-dimms-up-to-512gb-apache-pass-is-here
======
ChuckMcM
These are going to change things for a lot of reasons, not the least of which
is significantly higher transaction rates in systems. In particular, once
you're past a memory barrier your data is consistent and persistent across
reboots, so a commit takes nanoseconds rather than microseconds or
milliseconds, which means large transactional systems can do operations up to
6 orders of magnitude faster. In file systems, and especially so-called 'IRON'
or error-corrected systems, this will significantly boost the speed of
operations as more writes can be delayed safely to ensure that dense spinning
media is writing sequentially rather than randomly. Using the old NetApp
cluster architecture I expect you could get a couple of gigabytes per second
of random (from the clients) R/W performance on a very reasonable number of
drives.

Of course it also makes it possible to rapidly reboot into an OS if all you
need in the boot loader is to copy a gigabyte of 'ram' from one place to
another and then jump to it. What that enables is powering down servers that
aren't being used with the ability to power them up in milliseconds rather
than seconds. That has been a goal of Google and other cloud providers for
building 'power proportionate computing' clusters.

Very cool stuff indeed.

~~~
baq
For rebooting, you'd have to have super-fast POST and device initialization
after a power cycle, so I wouldn't say milliseconds.

~~~
martincmartin
He doesn't mean rebooting will take microseconds/nanoseconds; he means the
transaction commit will take that long. Before, a transaction commit meant
flushing to disk, at least as far as the disk controller. Now, commit = memory
barrier.
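
A minimal sketch of what that new commit path might look like on x86 (just an
illustration, assuming CLWB support and a buffer already mapped from
persistent memory; instead of write() plus fsync(), you do plain stores, flush
the cache lines, and fence):

    
    
      #include <immintrin.h>
      #include <cstring>
      
      // 'dst' points into a persistent-memory mapping; 'src' is ordinary DRAM.
      void commit_record(char* dst, const char* src, std::size_t len) {
          std::memcpy(dst, src, len);                  // plain stores
          for (std::size_t off = 0; off < len; off += 64)
              _mm_clwb(dst + off);                     // flush each cache line
          _mm_sfence();                                // order the flushes; data is now durable
      }
    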

~~~
ChuckMcM
Actually and "restarting". Technically it isn't "booting" because the OS is
already in memory and initialized, it just isn't running.

Given a typical motherboard that you tell to bypass POST, it can go from
power applied to memory controllers alive in just a few microseconds. The code
needed to go from there to running is also quite small, especially if there
aren't things like graphics controllers on the PCI bus that have to be
initialized first.

~~~
namibj
And even those can probably be lazily initialized: take a page fault in the
IOMMU and handle it as soon as the rest of the bring-up is done, or whenever
nothing else is running and something is blocking on the GPU initialisation.

------
al_james
What I can't work out from both the article and the comments here: from an
application point of view, do I use this like I use memory, or do I use it
like I use a disk?

No matter how fast a disk is, using it means either some expensive
serialization/deserialization step (plus the associated memory accesses to
create the 'working' object that my logic actually operates on), or writing my
algorithms to forgo in-memory objects (and the associated features offered by
my programming language, e.g. classes/objects or whatever) and working from
the raw byte values.

What I really want, and what would be a game changer in how we use these
things, is for my programming language's heap (or at least part of it) to be
made persistent. In that case, instead of:

    
    
      var mything = new Thing();
      load_thing_from_disk(mything);
    

I might have:

    
    
      persistent var mything = new Thing();
    

Done. However this also introduces more questions, like transactional commits
to memory etc (as few apps are coded to ensure consistency of memory across
reboots).

However, I can't help thinking that some way to harness fast persistent memory
without needing some complex disk-to-logic mapping would be a game changer.

Edited: spelling and wording

~~~
pbalcer
Disclaimer: I work at Intel on PMDK (pmem.io)

It _is_ the game changer that you wish for, since the marshaling logic that
you mention is gone. Persistent Memory can be accessed directly through a
memory-mapped file, bypassing the traditional read()/write() I/O paths. Recent
file systems have also been modified to a) skip the page cache layer and b)
forgo the msync() call that would otherwise be required to synchronize the
modified pages. This is what's called DAX (Direct Access [0]). In place of
msync() you can now just use CPU cache flush instructions. These two file
system changes entirely eliminate kernel code from the I/O path (apart from
the initial page faults).
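
For the curious, the mapping side might look roughly like this (just a
sketch, assuming a file on a DAX-capable filesystem and a kernel/libc new
enough to define MAP_SYNC; error handling omitted):

    
    
      #include <fcntl.h>
      #include <sys/mman.h>
      #include <unistd.h>
      #include <cstddef>
      
      // Map a file that lives on a DAX filesystem. With MAP_SYNC the mapping
      // goes straight to the device, so CPU cache flushes (not msync) are
      // what make stores durable.
      void* map_pmem(const char* path, std::size_t len) {
          int fd = open(path, O_RDWR);
          void* addr = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
          close(fd);  // the mapping stays valid after close
          return addr == MAP_FAILED ? nullptr : addr;
      }
    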

The Persistent Memory Development Kit contains libpmemobj [1], which is
almost exactly what you are imagining ;) It's a persistent heap, with
transactions for durability. It's not as nice (yet) as your code snippet, but
here's a C++ example [2] of a persistent queue push:

    
    
      obj::transaction::exec_tx(pool, [this, &value] {
        auto n = obj::make_persistent<Node>(value, nullptr);
    
        if (head == nullptr) {
          head = tail = n;
        } else {
          tail->next = n;
          tail = n;
        }
      });
    

`make_persistent` is, akin to `make_unique`, a memory allocation of a "Node"
class. Once allocated, we can just assign the newly allocated object to a
different persistent variable. No kernel code executing, no serialization ;)

[0] -
[https://www.kernel.org/doc/Documentation/filesystems/dax.txt](https://www.kernel.org/doc/Documentation/filesystems/dax.txt)

[1] - [https://github.com/pmem/pmdk](https://github.com/pmem/pmdk)

[2] -
[https://github.com/pmem/pmdk/blob/master/src/examples/libpme...](https://github.com/pmem/pmdk/blob/master/src/examples/libpmemobj%2B%2B/queue/queue.cpp)

~~~
twic
If your data contains pointers, then for it to be round-tripped through
persistence correctly, I would imagine you'd need to map it at the same
virtual memory address every time. Which isn't possible. Have I got that
wrong?

~~~
wmf
When you mmap() a file you can specify the virtual address so it will be the
same every time.

~~~
pbalcer
Yes, but to accomplish that you would have to use the MAP_FIXED flag, which is
quite dangerous because it can replace previous mappings. That can lead to
problems with dynamic memory allocation since almost all malloc()
implementations use anonymous mmap.
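
A sketch of the hint-based alternative (purely illustrative, not how PMDK
actually does it): pass the recorded address as a hint without MAP_FIXED and
verify what the kernel gave you; newer kernels also offer MAP_FIXED_NOREPLACE,
which fails instead of clobbering an existing mapping.

    
    
      #include <sys/mman.h>
      #include <cstddef>
      
      // Try to map 'fd' at the address used last time, without silently
      // replacing whatever may already live there.
      void* map_at_hint(void* wanted, std::size_t len, int fd) {
          void* addr = mmap(wanted, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
          if (addr != MAP_FAILED && addr != wanted) {
              munmap(addr, len);  // kernel picked another spot; treat as failure
              return nullptr;     // caller can relocate pointers or retry
          }
          return addr == MAP_FAILED ? nullptr : addr;
      }
    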

~~~
repolfx
Yes, but this is a trivial problem to fix on 64-bit machines. There's so much
address space the kernel can just be told to never pick certain address ranges
for unfixed mmaps, leaving the rest of the address space free for persistent
heaps.

The actual hard part of persistent heaps isn't the persistence part. It's
transactionality and upgrade management.

------
georgeaf99
Simon Peter, at UT Austin, has been working on a file system that works better
for systems with NVM. It looks like existing file systems (e.g. EXT4) are
going to suck in this new paradigm (DRAM -> NVM -> SSD). It’s a pretty
interesting read and the benchmarks are damn impressive.

[https://www.cs.utexas.edu/~simon/sosp17-final207.pdf](https://www.cs.utexas.edu/~simon/sosp17-final207.pdf)

~~~
rybosome
Interesting read, thanks for sharing.

It didn’t even occur to me that the file systems will need to change to fully
take advantage of NVRAM. I wonder at what point the abstraction will stop
leaking and require another higher layer to account for differences in
performance. I’m sure the OS will need tuning, but applications might not
unless they’re pretty bare metal.

~~~
zuck9
One of the reasons why Apple's new file system APFS was developed was
optimization for flash storage. HFS+ was designed with floppy and spinning
disks in mind.

~~~
jwandborg
What notable changes did APFS implement to accommodate flash storage?

------
kibwen
Anybody want to give a brief comparison of how these compare in practice to
DDR4 and SSD storage? I assume it's "slower than the former, faster than the
latter", but having an idea of the magnitudes would be useful.

~~~
konceptz
Linus did a video about Optane I found helpful.

[https://youtu.be/cwy4ujt0qHM](https://youtu.be/cwy4ujt0qHM)

~~~
djsumdog
This video is about the Optane devices you place in M.2/NVMe slots to be used
as a cache.

It will be interesting to see the real benchmarks on these DIMMs. We all want
to know if they're comparable to real DDR4 memory.

The current market is terribly overpriced (there's some debate over whether
there's price fixing among the big three or whether it's a genuine
shortage/supply problem with the Note recalls and new phone releases). DDR4 is
nearly double the price it was the last time I did a build, over a year ago.
:-/

EDIT: Looks like these chips will be specialized for certain server
boards/CPUs and only share the DIMM interface and not protocol.

~~~
kibwen
Thanks for the clarification. I'm watching the video now and it's interesting
(Optane is way cheaper than I expected), and it still seems useful as a lower
bound for what we can expect from the DIMM version.

------
JohnBooty
I'm so confused. I can't find a good explanation of how applications and/or
the OS "see" these DIMMs.

How does my OS/app see this? Is it accessed like regular DRAM memory... except
slower and persistent?

Or would my OS see it as a "normal" drive... except one that's really fast and
happens to be connected via a DIMM slot instead of PCIe/SATA/whatever?

~~~
sethhochberg
Your operating system will see them as storage, not regular DRAM. Intel has
released SDKs for developers to use if their applications need to be able to
interact with persistent memory in any kind of detailed way (say, if you
develop a database product, or a filesystem).

[http://pmem.io/pmdk/](http://pmem.io/pmdk/)

It is very likely that generic kernel support will come for use in Linux and
Windows directly, building on top of the existing DAX systems in those
operating systems (DAX, direct access, being the APIs used for I/O to
memory-like devices, bypassing cache layers that are useful for more
traditional storage types). This would allow a user to create a regular old
storage volume on their NVDIMM for general use.

[https://www.kernel.org/doc/Documentation/filesystems/dax.txt](https://www.kernel.org/doc/Documentation/filesystems/dax.txt)

Do note that NVDIMMs aren't a drop-in replacement for regular DRAM DIMMs,
despite using the same bus and electrical subsystem. You'll need proper
hardware support on your motherboard and CPU, since memory controllers are on
the CPU these days.

~~~
twic
> Your operating system will see them as storage

Do you mean it will present them to userland as storage, rather than see them
as storage?

Seeing them as storage implies to me that the DIMM emulates an AHCI device,
which I don't think is the case.

~~~
sethhochberg
Correct, yeah, I probably oversimplified that part a bit. The kernel will be
fully aware that your NVDIMM is an NVDIMM; none of the technical details
available so far suggest there will be any kind of emulation of legacy storage
protocols.

------
onli
I don't really understand how this is supposed to work. These are DIMMs,
meaning they are RAM alternatives? Will they be cheaper and slower than DDR4,
but persistent and getting benefits from that? Which advantages, exactly?
Wouldn't the OS need to be aware of that, defeating the point of them being
DIMMs?

128GB, 256GB and 512GB per module is sadly too much for consumer motherboards.
Why not a 16GB version? Didn't Intel even launch Optane with those small
sizes?

~~~
zokier
The memory bus is pretty much the fastest interconnect you can attach anything
to in your system; for example, looking at some random Xeon CPU, it has 6
channels of DDR4-2666, each capable of transferring about 21 GB/s for a total
of 128 GB/s. Compared to that, even 16-lane PCIe 3.0 is relatively slow at 16
GB/s.
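
For reference, the back-of-the-envelope math behind those figures (assuming
an 8-byte-wide channel and roughly 985 MB/s per PCIe 3.0 lane after encoding
overhead):

    
    
      DDR4-2666:  2666 MT/s x 8 bytes   ≈ 21.3 GB/s per channel
      6 channels: 6 x 21.3 GB/s         ≈ 128 GB/s
      PCIe 3.0:   16 lanes x ~985 MB/s  ≈ 15.8 GB/s
    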

While I don't have the figures to back it up, I believe the differences in
latency (and by extension random access performance) are even more dramatic,
and that's where the real performance advantages come from.

~~~
ksec
The bus is fast but Optane is slow. It doesn't even saturate a 4x PCIe link at
512GB capacity. So the final Optane DIMM on a 6-channel memory setup may only
be about 21-24 GB/s.

~~~
nickflood
But it's only going to get better from here on; this is a first revision of a
product with novel tech inside it. The first SLC SSDs had abysmal performance
numbers and density by today's standards.

------
reacharavindh
I can already imagine storage systems where blocks are just written to this
memory before acknowledging a synchronous commit, and reads are served
entirely from this cache or at most one direct hop from it. It's going to be
amazing as this tech matures. Persistent RAM will change a lot of our
architectures for the better.

~~~
wmf
That's pretty much how SANs used to work.

~~~
reacharavindh
If you're referring to NetApp's use of NVRAM to take in writes and
acknowledge the I/O before actually writing to disk, it's not that simple.
They mirror those writes to a partner node's NVRAM for HA, which is a network
tax on the core I/O path. With a mature Optane DIMM, it may not be necessary
to do that network dance in the I/O path.

I sincerely hope no storage company acknowledges a write after just writing to
volatile main memory. That's a recipe for disaster when a node goes down.

~~~
atomicUpdate
> I sincerely hope no storage company acknowledges a write after just writing
> to volatile main memory. That's a recipe for disaster when a node goes down.

Sorry to break this to you, but this is exactly what every major storage
product does. Data is mirrored to separate DDR and then good status is given
to the host. The data will be destaged at some point later, when cache space
is required for some other operation.

The data itself is safe as long as the battery backup (or capacitor for
smaller systems) is charged enough to handle a power outage. The storage
system knows the battery levels and may not allow a write cache if there isn't
enough supplemental power to destage the full write cache in the event of a
power loss.

~~~
heavenlyblue
This is just mixing up the issues; it's the same as asking "what if the HDD
that acknowledged the write suddenly failed catastrophically?"

------
JudasGoat
We never really worried about write endurance with dynamic RAM, but memory
would seem to be a lot busier than disk I/O, at least in some server
applications.

~~~
marshray
We need reviewers to report a new metric: time to failure at continuous max
write throughput.

For some SSDs, it's under a week.

~~~
pilsetnieks
Are you sure? A quick back of a napkin calculation seems to suggest that a
fully saturated SATA3 connection would still be a measly 36 terabytes in a
week.

It seems quite low for any practical purpose. I don't doubt that there
probably are some tiny shitty drives that will conk out after a week like that
but are there any reasonably popular drives like that?

~~~
opencl
The 500GB 970 Pro is rated for 2300 MB/s sequential writes and 600 TB write
endurance. That's about _three days_ to exhaust the write endurance. Latest
high end SSD model from the leading manufacturer. Not that it could actually
come anywhere near sustaining that throughput for three days straight.

[https://www.anandtech.com/show/12674/samsung-
announces-970-p...](https://www.anandtech.com/show/12674/samsung-
announces-970-pro-and-970-evo-nvme-ssds)
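
For reference, the arithmetic behind that "three days" figure:

    
    
      600 TB / 2300 MB/s ≈ 260,000 s ≈ 72 hours ≈ 3 days
    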

~~~
AlphaSite
The perf isn’t really that good for extended periods.

------
chrisper
I am guessing that this now means we also have to encrypt the data in our
DIMMs, especially since it is persistent.

------
dajonker
There are lots of papers out there discussing the possibilities of non-volatile
memory, e.g. Let’s Talk About Storage & Recovery Methods for Non-Volatile
Memory Database Systems (Arulraj, Pavlo, Dulloor, 2015)[1]. However, they all
seem to work with "simulated" NVM. It will be interesting to see how these
real units compare to the simulations.

[1]
[https://www.cs.cmu.edu/~jarulraj/papers/2015.storage.sigmod....](https://www.cs.cmu.edu/~jarulraj/papers/2015.storage.sigmod.pdf)

------
ksec
So we could fit 512GB DIMMs, up to 8 TB of Optane, and use them like an
in-memory DB with persistence? I wonder what performance improvement I would
get on a Postgres DB compared to RAM + NVMe SSD.

~~~
derefr
A lot of the overhead there might come from Postgres (e.g. building temporary
in-memory indices for hash-joins, because it's assuming random access to the
tablespace is slow.) Ideally you have a DBMS that already understands that
it's running on nonvolatile memory, and so doesn't have separate "in memory"
and "on disk" formats for its data.

Maybe something like Aerospike? (don't know much about it but I've heard it's
"for" that)

~~~
ovao
Or some LMDB-backed database, sure. With LMDB you'd essentially map the entire
address space of the disk, then just persist pointers. Writes would still not
be as performant as you would hope, but random read performance would be in
the small handfuls of nanoseconds per read.

------
whitepoplar
How would Optane-on-DIMM affect the performance of relational databases as
compared to the NVMe variant of Optane?

~~~
en4bz
As far as I know NVMe is exclusively block-based, so you get to write in
512-byte or 4K blocks. I believe the DIMM versions are byte-addressable, or at
least they will be at some point in the future.

~~~
21
I'm pretty sure the smallest addressable unit of RAM in modern computers is a
whole cache line (64 bytes).

~~~
rrix2
While being technically correct, that is still much smaller than 512 or 4K
blocks.

------
Aardwolf
So it's like putting an SSD in your RAM slots? But what about speed, and
especially the number of write cycles?

~~~
dmix
I believe the whole idea, beyond the pure speed of RAM vs SSDs, is that they
are promoting a programming 'model' where you bypass operating system I/O
(transiting via pages/blocks) and instead directly read/write data (by the
byte) to the memory from the application. Which could be useful for particular
use cases / subsets of data.

They mention write cycles in the article...

> The existing enterprise Optane SSD DC P4800X initially launched with a write
> endurance rating of 30 drive writes per day (DWPD) for three years, and when
> it hit widespread availability Intel extended that to 30 DWPD for five
> years. Intel is now preparing to introduce new Optane SSDs with a 60 DWPD
> rating

~~~
JohannFlobuster
The idea is to drive CPU utilization as well.

For example, in the x86 world, pairing NVMe drives with NVDIMMs (putting the
write-heavy portions of the application on NVDIMM) can drive core utilization,
say in SQL, from 40% to 100%.

Even a single 8GB DIMM can dramatically increase utilization and performance.

~~~
monocasa
And literally half the bandwidth. NVMe means that you have to write the data,
flush the caches over those ranges to ensure that it's actually in DRAM, then
instruct the NVMe controller to read it back out of DRAM. With these drives
you just write (and maybe flush), and it's done. And they probably have their
own dedicated DRAM protocol controller.

And it might even be slightly better than halving the bandwidth needs, since
swapping DRAM banks isn't free, so you might be saving on mildly thrashing
your DRAM controller when you're using DRAM and the NVMe drive is trying to
read at the same time.

~~~
xorblurb
Modern DMA is cache coherent (maybe except if you opt-out of it? I'm not even
sure you can). It is still costly though.

------
patrickg_zill
So will regular RAM become "level 4 cache" and Optane exist on the other side
of that?

Basically you run everything you can in memory and then just mmap() in the
files you want to use?

~~~
marshray
But that's how we do it now. It's just that Optane persistent storage is
attached to a different bus with much better latency and throughput. Probably
categorically and disruptively better.

How will user-mode applications refer to their persistent data if not by a
filesystem path? You gotta put access permissions on some kind of object that
humans can copy-and-paste into their backup scripts.

------
joshbaptiste
I found this PDF to be very helpful, explaining the general use case of
persistent memory programming.

[https://www.usenix.org/system/files/login/articles/login_sum...](https://www.usenix.org/system/files/login/articles/login_summer17_07_rudoff.pdf)

------
theandrewbailey
What's the API for these Optane DIMMs? Does the program decide what gets
placed in them? Does the OS? (if so, how?)

~~~
justincormack
To the OS, it's just memory and you can just mmap it. There are some
filesystem abstractions built on top though, e.g.
[https://lwn.net/Articles/729770/](https://lwn.net/Articles/729770/)

~~~
jnwatson
What about multithreading? An OS can read from an SSD and then context-switch
to do other useful stuff until the data comes back. Once the OS issues a
request to (relatively) slow memory, the only things that can use that core
are other instructions in the pipeline and other hyperthreads; i.e. no other
OS threads are allowed.

~~~
wtallis
The fastest NVMe SSDs are already right around the threshold where there isn't
time to complete a pair of context switches before you get the data back, and
the difference in latency between polling and waiting for an interrupt is
significant. These Optane DIMMs should be fast enough that only
hardware-managed context switches like hyperthreading/SMT are usable without
performance loss.

------
rurban
See also the timeline of those PCM devices, based on chalcogenide glass, which
were invented in the '60s by Stanford R. Ovshinsky. He always insisted that
it's better than DRAM. Finally we are getting there.

[https://en.wikipedia.org/wiki/Phase-
change_memory#Timeline](https://en.wikipedia.org/wiki/Phase-
change_memory#Timeline)

Theoretically it should be 1000x more durable than flash, and also 1000x
faster, but they are not there yet. But it looks like they solved the packing
problem. And Micron insists that it is chalcogenide-based but not
"phase-change memory", the one they started in 2012 and took back in 2014.

~~~
mattthebaker
PCM to Micron is an existing product/design that is NOR-Flash like but
utilizes PC material as the storage medium.

3D XPoint is said to be able to stack the storage medium on the die and
requires no access transistor -- which "PCM" has. So it is based on PC
material but has some new kind of selector.

The 3D/stacking part is important for scaling/density -- even NAND-Flash has
hit 2d limits and gone vertical.

------
blt
Reading this thread, it seems like there are doubts about whether this
particular product will really work like non-volatile RAM. Either way, it's an
indicator that the real thing will be here soon. I can't wait to see how fast
persistent memory will change the architecture of our computing environments.
I think it will have huge effects that we can't foresee yet.

------
bitL
How does it compare to SanDisk ones? Is this going to enable HP's The Machine
without memristors?

------
pulse7
Now hibernation will be fast and usable. Stand-by will not be needed
anymore...

------
bbulkow
I was on the outside-vendor panel at the event (along with Oracle and Redis
Labs), if anyone would like an engineering view of the technology.

~~~
wmf
Everything we want to know is exactly what you're not allowed to talk about.

------
redshirt
No need for paging for the most part with these. Current virtual memory
systems won’t be able to deal...can’t wait to see what’s next.

------
fooyc
What databases are built with this kind of storage in mind already?

~~~
augustl
VoltDB comes to mind.

This interview from 2013 is still worth a listen imo!

[http://www.se-radio.net/2013/12/episode-199-michael-
stonebra...](http://www.se-radio.net/2013/12/episode-199-michael-stonebraker/)

------
AsianScum
The persistent memory age is coming. Can we install NVDIMMs in a desktop?

~~~
pbalcer
The product being announced is for data centers.

------
erichocean
This is ideal for something like LMDB. Can't wait.

------
libeclipse
> Optane DC Persistent Memory DIMMs have twice the error correction overhead
> of ECC DRAM modules.

That is pretty impressive.

------
xchip
"640K ought to be enough for anybody." (apparently 4 people in HN hated this
comment)

------
magoon
Oh, the security implications of this...

e.g. malloc owner PID x is now PID y

------
xrobot7
Throwaway account for obvious reasons. Was closely involved in the development
of it.

To summarize, this product is shot, and is just hype.

I'll check the technical questions/gaps and answer or fill them in tomorrow.

------
ukblewis
Lol, looks like Linus actually inspired them. I mean with the whole DDR
industry acting corruptly to keep prices sky high, this seems like a
reasonable way to add some competition.

~~~
mkhalil
I assume you mean Linus from YouTube's "Linus Tech Tips", and not Linus
Torvalds. Surprised you wouldn't clarify that, considering what website this
is.

Considering Intel's history, I wouldn't bet that they direct their efforts
based on a YouTube reviewer.

Also, considering Intel's very anti-competitive behavior (e.g. [0], [1], [2]),
I am wary of saying that Intel entering the DDR industry will make it any less
corrupt.

[0]: [https://www.theverge.com/2014/6/12/5803442/intel-
nearly-1-an...](https://www.theverge.com/2014/6/12/5803442/intel-nearly-1-and-
a-half-billion-fine-upheld-anticompetitive-practices) [1]:
[https://www.wired.com/2009/12/ftc-sues-intel-for-anti-
compet...](https://www.wired.com/2009/12/ftc-sues-intel-for-anti-competitive-
practices/) [2]:
[https://www.youtube.com/watch?v=osSMJRyxG0k](https://www.youtube.com/watch?v=osSMJRyxG0k)

~~~
B1FF_PSUVM
> Intel entering the DDR industry

Eh.
[https://en.wikipedia.org/wiki/Intel_1103](https://en.wikipedia.org/wiki/Intel_1103)

RAM was their 1xxx product line, EPROM was 2xxx, microprocessors 4xxx (later
8xxx, what with 8 glorious bits of data bus ...)

