

Why Intel is adding instructions to speed up non-volatile memory - joe_bleau
http://danluu.com/clwb-pcommit/

======
bahahah
There are several storage class memories that are nearing commercialization.
Intel is betting big on at least one of them. Most technologies in this class
are orders of magnitude faster and have orders of magnitude better endurance
than flash memory, while being only slightly slower than DRAM, yet
non-volatile.

It is plausible that with another layer of in-package cache they could
eliminate DRAM altogether, replacing it with ultrafast NVM. Imagine the
resume/suspend speed and power savings of a machine whose state is always
stored in NVM.

~~~
runeks
> There are several storage class memories that are nearing commercialization.

I'm very interested in this. Could you point out which technologies are
nearly ready for commercialization?

My understanding is that the current cost is orders of magnitude higher per
unit of storage for these new technologies compared to NAND flash or even DDR3
RAM. But of course, a dedicated fab could change that very quickly.

~~~
jhallenworld
Well nvDIMMs are available right now (from companies like Netlist, Agigatech,
Viking, Smart, Micron). This is DRAM with an analog switch, a controller and
flash memory. When you lose power, the DRAM is disconnected from the processor
and the contents are copied to the flash. The newer technologies might be
cheaper, but as far as I know their write performance is not yet as good as
DRAM's.

The issue is the cache: the data is not non-volatile until it has been written
back to DRAM. Even then, you need some advance warning of a power outage for
it all to work.
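To make the cache point concrete, here is a minimal C sketch of writing a buffer back out of the CPU cache before treating it as persistent. It assumes an x86 CPU with SSE2 and 64-byte cache lines, and uses the long-available CLFLUSH (`_mm_clflush`) followed by SFENCE; on a CPU that supports CLWB, `_mm_clwb` (compiled with `-mclwb`) would write the lines back without evicting them. `flush_range` is a hypothetical helper name, and even after the flush the data only survives if the memory side itself is protected against power loss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush; also pulls in _mm_sfence */
#endif

/* Assumed cache-line size (64 bytes on current x86 parts). */
#define CACHE_LINE 64u

/* Write back (and evict) every cache line overlapping [addr, addr + len),
 * then fence (required for the weakly ordered CLWB/CLFLUSHOPT, harmless
 * for CLFLUSH). Returns the number of cache lines touched. */
static size_t flush_range(const void *addr, size_t len)
{
    assert(addr != NULL || len == 0);
    if (len == 0)
        return 0;
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;
    for (uintptr_t p = start; p < end; p += CACHE_LINE) {
#if defined(__SSE2__)
        _mm_clflush((const void *)p); /* _mm_clwb((const void *)p) with CLWB */
#endif
        lines++;
    }
#if defined(__SSE2__)
    _mm_sfence();
#else
    /* Placeholder for non-x86 builds: orders stores but flushes nothing. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
    return lines;
}
```

A range that straddles a line boundary costs two flushes even if it is only a few bytes long, which is one reason persistent data structures tend to be laid out along cache-line boundaries.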

Unibus (the bus used by PDP-11 core memory systems) had an early warning
signal to give the memory controller a chance to write back the data from the
previous (destructive) read.

------
sweis
I've heard predictions that a significant portion of new x86 servers will be
using non-volatile memory within the next 5-7 years.

Memory is becoming the new disk. This could have major security implications,
as memory contents are unencrypted in general.

Fortunately, Intel CPUs will have hardware support to encrypt SGX enclaves.
Perhaps that support can be used for general memory access as well.

~~~
ams6110
If non-volatile memory is becoming the new disk, why is it any more or less
likely to be encrypted than current disk storage (mostly not, as far as I've
seen)?

~~~
sweis
Long story short, memory bandwidth is much higher than the best x86 crypto
implementations can handle.

Encrypting disks or network is no problem today, but we'll need architectural
changes to support full memory encryption without a performance hit.

~~~
WallWextra
This could be done mostly transparently, with the encryption in the memory
controller. Addresses and data are already scrambled with a (non-
cryptographic) scrambling code for EMI reasons. Of course, a sufficiently fast
hardware crypto core would be required.

EDIT: Also, I forgot that the last generation of consoles (and, I assume, the
current one) has transparent encryption of main memory.
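A toy C model of what "transparent" encryption in the memory controller means: the CPU reads and writes plaintext, while the DIMM only ever holds data XORed with an address-dependent keystream. This is an illustration only. The SplitMix64 mixing function used here is not cryptographic (a real design would need a hardware cipher fast enough to keep up with memory bandwidth), and `controller_key`, `mem_write`, and `mem_read` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* SplitMix64 finalizer: a fast bijective mixer, NOT a cipher. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

#define WORDS 16
static uint64_t dram[WORDS];  /* what the DIMM actually stores */
static uint64_t controller_key = 0x0123456789abcdefULL; /* hypothetical per-boot key */

/* The keystream depends on the address, so equal plaintexts stored at
 * different addresses end up as different bit patterns in DRAM. */
static uint64_t keystream(uint64_t addr) { return mix64(controller_key ^ addr); }

static void mem_write(uint64_t addr, uint64_t value)
{
    assert(addr < WORDS);
    dram[addr] = value ^ keystream(addr);  /* scrambled on the way out */
}

static uint64_t mem_read(uint64_t addr)
{
    assert(addr < WORDS);
    return dram[addr] ^ keystream(addr);   /* unscrambled on the way in */
}
```

Software never sees the scrambling, which is why the change can live entirely in the controller.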

~~~
justincormack
Indeed, there is discussion of encrypted memory on the controller in RISC-V at
[http://lowrisc.org](http://lowrisc.org)

------
Animats
Computing really hasn't figured out how to handle non-volatile memory yet.
It's almost always used to emulate rotating disks, with file systems, named
files, and a trip through the OS to access anything. Access times for non-
volatile memory are orders of magnitude faster than disk access times, so
small accesses are feasible. But that's not how it's treated under existing
operating systems.

There are alternatives. Non-volatile memory could be treated as a key/value
store, or a tree, with a storage controller between the CPU and the memory
device. With appropriate protection hardware, this could be accessed from user
space through special instructions. That's what I thought this article
indicated. But no. This is just better cache management for the OS.
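A minimal C sketch of the key/value idea, assuming the non-volatile region is simply a fixed array of slots that can be read and written with ordinary loads and stores (simulated here by a static array). The layout, the open-addressing scheme, and the names `kv_put`/`kv_get` are hypothetical; the point is that a small lookup touches a few words instead of going through a file system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a fixed-size, open-addressed hash table living
 * directly in a (simulated) non-volatile region. Key 0 marks an empty slot. */
#define SLOTS 64

struct slot { uint64_t key; uint64_t value; };
static struct slot nvm_region[SLOTS];  /* stand-in for NVM */

static size_t home(uint64_t key)
{
    return (size_t)(key * 0x9e3779b97f4a7c15ULL) % SLOTS;
}

/* Insert or update; returns 0 on success, -1 if the table is full. */
static int kv_put(uint64_t key, uint64_t value)
{
    assert(key != 0);
    for (size_t i = 0, s = home(key); i < SLOTS; i++, s = (s + 1) % SLOTS) {
        if (nvm_region[s].key == 0 || nvm_region[s].key == key) {
            /* Intended order: value first, then key. This sketch does not
             * enforce it; on real NVM each store would need a cache-line
             * flush plus a fence before the next one. */
            nvm_region[s].value = value;
            nvm_region[s].key = key;
            return 0;
        }
    }
    return -1;
}

/* Returns 1 and stores the value in *out if the key is present, else 0. */
static int kv_get(uint64_t key, uint64_t *out)
{
    for (size_t i = 0, s = home(key); i < SLOTS; i++, s = (s + 1) % SLOTS) {
        if (nvm_region[s].key == 0)
            return 0;
        if (nvm_region[s].key == key) {
            *out = nvm_region[s].value;
            return 1;
        }
    }
    return 0;
}
```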

~~~
harshreality
You can't use current flash chips that way, because of write endurance.

~~~
rbanffy
Also, current flash memories do not allow single-address writes. At least the
write endurance problem could be addressed by adding wear leveling to an
address translation layer. The single-address problem could be addressed by a
caching/grouping layer that interacts with the leveling mechanism. Add to that
a dump of every core's state in a single block write, and you can recover to
an internally consistent state after a power failure.
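A toy C sketch of an address translation layer with dynamic wear leveling, with hypothetical sizes and names. Because flash cannot be overwritten in place, each logical write is redirected to the least-worn free physical block and the map is updated; the block holding the old copy becomes free for reuse.

```c
#include <assert.h>

/* Hypothetical geometry: 4 logical blocks backed by 8 physical blocks. */
#define LOGICAL  4
#define PHYSICAL 8
#define UNMAPPED (-1)

static int map[LOGICAL];           /* logical -> physical block */
static int in_use[PHYSICAL];       /* 1 if the block holds live data */
static unsigned erases[PHYSICAL];  /* per-block program/erase count */

static void ftl_init(void)
{
    for (int l = 0; l < LOGICAL; l++) map[l] = UNMAPPED;
    for (int p = 0; p < PHYSICAL; p++) { in_use[p] = 0; erases[p] = 0; }
}

/* Returns the physical block that received the write. */
static int ftl_write(int lblock)
{
    assert(lblock >= 0 && lblock < LOGICAL);
    int best = -1;
    for (int p = 0; p < PHYSICAL; p++)  /* least-worn free block wins */
        if (!in_use[p] && (best < 0 || erases[p] < erases[best]))
            best = p;
    assert(best >= 0);                  /* holds while PHYSICAL > LOGICAL */
    if (map[lblock] != UNMAPPED)
        in_use[map[lblock]] = 0;        /* the old copy becomes garbage */
    erases[best]++;
    in_use[best] = 1;
    map[lblock] = best;
    return best;
}
```

Rewriting the same logical block over and over spreads the wear across every physical block instead of burning out one of them.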

------
unwind
Very interesting! It's always fun to see "external" development in the general
field of computer architecture affect low-level stuff like a CPU's cache and
memory subsystems.

It wasn't super-easy to figure out _who_ in the grand ecosystem view of things
is going to have to care about these instructions, but I'd guess database and
OS folks.

Also, if the author reads this, the first block quote with instruction
descriptions has an editing fail, it repeats the same paragraph three times
(text begins "CLWB instruction is ordered only by store-fencing operations").

------
0x0
Isn't there a higher risk of data loss if your "hard drive" is 100% memory-
mapped? All it would take is one buggy kernel driver writing through an
invalid pointer or memset'ing the whole thing to 0.

~~~
signa11
Well, the same is true now as well, right? For example, a buggy driver can
overwrite a buffer-cache pointer with something else, and then you are hosed.
If you are playing in kernel-land and are not careful enough, you are courting
disaster...

~~~
0x0
True, but if it overruns a buffer, it still has to build a valid
SCSI/ATAPI/whatever command packet and submit it to the controller with
repeatedly increasing block numbers. That's a lot of instructions, while
something that clears the entire address space could probably be done in 1-2
assembly instructions (mov rcx, -1; rep stosq).

------
jhallenworld
Support for non-volatile memory needs to be added to Linux. For example, one
should be able to map the non-volatile memory into user space and directly
access it. There needs to be some BIOS-OS interaction so that the OS doesn't
treat the non-volatile memory as general memory (for the likely case where
only some of the memory is non-volatile). Alternatively, the non-volatile
memory should be usable as a block device.
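A minimal POSIX C sketch of the "map it into user space and access it directly" model. A regular file stands in for the non-volatile memory, so the helper names and the path used below are hypothetical; the shape is the same either way: `mmap` with `MAP_SHARED`, ordinary stores through a pointer, then an explicit step to make the stores durable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of the file at `path` so that loads and stores reach it
 * without read()/write() calls. Returns NULL on failure. */
static void *map_persistent(const char *path, size_t len)
{
    assert(len > 0);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Make stores to [p, p + len) durable; p must be page-aligned. For a
 * page-cache-backed file this is msync(); with byte-addressable NVM it
 * would instead be cache-line flushes plus a fence. */
static int persist(void *p, size_t len)
{
    return msync(p, len, MS_SYNC);
}
```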

The non-volatile memory needs a layer of RAID-like volume management. For
example, when you transfer the memory from one system to another, there should
be a way to determine that the memory is inserted in the correct slots
(remember there is RAID-like interleaving/striping across memory modules).

------
shaurz
How to solve the context switch overhead issue:
[https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript](https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript)

~~~
JoeAltmaier
How about a CPU that has scores of hyperthreads? They don't block in the
kernel; they stall on a semaphore register bitmask. The mask bits can include:
timer register matches another register; interrupt complete; event signaled.

Now I can do almost all of my I/O, timer, and inter-process synchronization
without ever entering the kernel or swapping out thread context. I've been
waiting for this chip since the Z80.
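A software model of the proposed semaphore register, written in C11 with atomics. Each bit is an event source, and a thread "stalls" (here: spins) until any bit in its wait mask is set, then atomically consumes only those bits. The bit assignments and function names are hypothetical; in the hardware being described, the spin loop would be a stall that costs no issue slots.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit assignments for the event register. */
enum {
    EV_TIMER_MATCH = 1u << 0,  /* timer register == compare register */
    EV_IRQ_DONE    = 1u << 1,  /* interrupt complete */
    EV_SIGNALED    = 1u << 2,  /* software event signaled */
};

static _Atomic uint32_t event_reg;

static void raise_event(uint32_t bits)
{
    atomic_fetch_or(&event_reg, bits);
}

/* Spin until any bit in `mask` is set, then atomically clear and return
 * the bits that woke us; bits outside `mask` are left pending. */
static uint32_t wait_any(uint32_t mask)
{
    assert(mask != 0);
    for (;;) {
        uint32_t cur = atomic_load(&event_reg);
        uint32_t hit = cur & mask;
        if (hit == 0)
            continue;
        if (atomic_compare_exchange_weak(&event_reg, &cur, cur & ~hit))
            return hit;
    }
}
```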

~~~
rbanffy
While not exactly a chip (it never reached the board stage), I designed a processor
in college where the register file was keyed to a task-id register. This way,
context switches could take no longer than an unconditional jump.
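A toy C model of a register file keyed to a task-id register, with hypothetical sizes. Each task owns its own bank of registers, so a context switch is a single write to `task_id` with nothing saved to or restored from memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes: 4 tasks, 8 registers each. */
#define NTASKS 4
#define NREGS  8

static uint32_t bank[NTASKS][NREGS];
static unsigned task_id;  /* selects the active register bank */

/* Register r of the currently running task. */
static uint32_t *reg(unsigned r)
{
    assert(r < NREGS);
    return &bank[task_id][r];
}

static void switch_to(unsigned t)
{
    assert(t < NTASKS);
    task_id = t;  /* the entire context switch */
}
```

The trade-off is silicon: the register file grows linearly with the number of tasks it can hold at once.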

I dropped this feature when I switched to a single-task stack-based machine
(inspired by my adventures with GraFORTH - thank you, Paul Lutus). This ended
up being my graduation project.

