Intel’s Plans for 3DXP DIMMs Emerge (realworldtech.com)
126 points by dzaragozar 9 months ago | 73 comments

It seems obvious in retrospect, but persistent memory adds a pretty exciting new advantage for persistent data structures.
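
The fit is easy to sketch: a persistent (immutable) data structure never overwrites old versions, so on byte-addressable NVM every historical version could survive a reboot for free. A toy Python illustration (all names invented):

```python
# A persistent (immutable) singly linked list: each "update" returns a
# new version that shares all unchanged nodes with the old one. Nothing
# is ever overwritten in place, so on byte-addressable NVM every
# historical version would survive a power cycle at no extra cost.

class Node:
    __slots__ = ("value", "rest")
    def __init__(self, value, rest=None):
        self.value = value
        self.rest = rest

def push(lst, value):
    """Return a new list version; the old version remains valid."""
    return Node(value, lst)

def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.value)
        lst = lst.rest
    return out

v1 = push(None, 1)
v2 = push(v1, 2)       # v2 shares v1's node
print(to_list(v2))     # [2, 1]
print(to_list(v1))     # [1] -- old version still intact
```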

Another thought: As potentially paradigm changing technology like this becomes available will it ever make sense to redesign the OS?

I used to think so - once you have persistent primary memory, you can install operating systems and applications directly into primary memory, so there's no longer any point in having a distinction between "install" and "run/boot".

However, iOS and Android have shown that it's possible to do away with this distinction even with a traditional OS running underneath. So I now tend to think that instead what will happen is more continual evolutionary changes at the OS level to work better in a "boot once" environment, rather than a revolution.

> so there's no longer any point in having a distinction between "install" and "run/boot".

When Linux boots, the in-memory state changes quite a bit. Even the actual code gets modified during boot. The whole process takes well under a second. Linux does support "execute in place" (XIP), but it's barely a win, and I don't think it works on x86.

A more interesting idea is to put your OS installation on a DAX (direct access) filesystem.
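
For the unfamiliar: DAX lets an mmap of a file hit the storage media directly, with no page cache in between. A rough Python sketch of the programming model, using an ordinary temp file as a stand-in; on a real DAX mount (e.g. ext4 with -o dax over a /dev/pmem device) the same stores would land in persistent memory:

```python
import mmap, os

# Stand-in path; on a DAX-mounted filesystem this mmap would map the
# persistent media directly, bypassing the page cache entirely.
path = "/tmp/pmem_demo.bin"

with open(path, "wb") as f:
    f.truncate(4096)                      # one page of "persistent" state

fd = os.open(path, os.O_RDWR)
try:
    m = mmap.mmap(fd, 4096)
    m[0:5] = b"hello"                     # plain stores, no write() syscall
    m.flush()                             # on real pmem: cache-line flush + fence
    m.close()
finally:
    os.close(fd)
```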

Seriously, yes. If there was ever a time to rethink OS design, surely this is it.

That being said, operating systems like Linux tend to capture most of the value from these kinds of advances - often by dint of being able to simply 'get out of the way' if a sufficiently important user space process wants access to the device.

But one would suspect that things have changed sufficiently from the 1970s to warrant a ground-up rethink. Core counts, distributed systems (the Plan 9 folks already took a swing at this in the 90s), nearly ubiquitous graphics/GPGPU accelerators, persistent memory, nearly ubiquitous access to 64-bit address spaces (at least for desktops and most phones) - you'd think something would change about design. I don't work in the area so I don't know what that is...

> Seriously, yes. If there was ever a time to rethink OS design, surely this is it.


Traditional servers are persistent: they never turn off. 500+ days of uptime is typical. And today, with VMs which at worst... hibernate... it seems like "never turning off" might be the norm.

On the contrary, as a security professional I’d be thrilled if servers had a lifespan of hours instead of weeks or months. Reimaging VMs/containers/machines from scratch frequently gives so many advantages.

When OS, system, or library updates happen, you can easily launch replacement servers on the updated stack, put them in the rotation, and decommission the old ones. This is so much simpler than trying to run OS upgrades in-place across an entire fleet. The longer a machine has been running between reboots, the lower my belief in its odds of upgrading and restarting cleanly.

Further, this regularly tests your load-balancing setup and fundamentally gives you the capacity to scale up and down as load demands. Problems will be discovered early on, instead of during crunch time when you have to scale or when a few of your machines go offline during peak hours.

Security-wise, you don't just get the benefit of fast, regular updates; you also get assurances that users haven't left stale data like unencrypted database exports, PII dumps, etc. lying around. Log in to a long-lived machine some day and check out users' home directories. That shit is a gold mine if someone who wants to do harm gets on your systems.

Not to mention regular reimaging makes it harder for an attacker to establish a permanent foothold in your infra.

None of this has anything to do with fast persistent storage, but I sincerely hope the era of 500-day uptimes is waning.

You forgot one: frequent rebuilds kill off any intrusion as the world reflashes -- unless the attacker gets into IoT devices or microcontroller firmware.

I did mention that it makes it harder for an attacker to keep a foothold in your infrastructure, but I think I wasn't as clear as I wanted to be.

But yeah, it's bad that an attacker has been able to get to a critical system, but it's a phenomenal defense if any of their beacons or remote access tools last at most a few hours or days before being wiped. This makes an attacker's life much harder.

> None of this has anything to do with fast persistent storage, but I sincerely hope the era of 500-day uptimes is waning.

On the contrary. Persistent memory means that infinite uptime is the future. Which, as you note, is difficult. Resetting the OS every now and then to a known state is a good practice, although disruptive to a lot of workflows.

If anything, I consider your post to be an argument AGAINST persistent memory.

Persistent memory might enable those sorts of uptimes, but it doesn't inherently mandate it.

But traditional operating systems still assume RAM contents are volatile (because currently they are), most filesystems assume disks are glacially slow, etc.

A traditional spinning-rust HDD has an effective latency of ~10 ms. The NVMe version of 3DXP has an effective latency of ~50 µs, two orders of magnitude better. Not sure how low the DIMM version will go, but maybe another order of magnitude?

If so, we're talking three orders of magnitude difference. That would radically affect the assumptions going into storage algorithms. Suddenly you can no longer spend millions of instructions trying to avoid I/O. Batching of I/O is also not needed to the same degree. Complex syncing of memory and disk is not needed. Etc etc.
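
Back-of-the-envelope on the numbers above (the DIMM figure is pure speculation):

```python
import math

# Latency figures from the comment; the DIMM number is a guess.
hdd_latency_s = 10e-3   # spinning disk, ~10 ms
nvme_xp_s     = 50e-6   # NVMe 3D XPoint, ~50 us
dimm_guess_s  = 5e-6    # DIMM version: maybe another ~10x? (speculative)

print(f"HDD vs NVMe XPoint: {hdd_latency_s / nvme_xp_s:.0f}x")     # 200x
print(f"HDD vs DIMM guess:  {hdd_latency_s / dimm_guess_s:.0f}x")  # 2000x
print(f"orders of magnitude: {math.log10(hdd_latency_s / dimm_guess_s):.1f}")
```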

> But traditional operating systems still assume RAM contents are volatile (because currently they are)

RAM is only volatile on startup. Certainly not when a VM hibernates and comes back.

> RAM is only volatile on startup.

No it isn't. Anything that needs to survive a power cycle needs to go to non-volatile storage. And this is assumed to be very, very slow.

I don't think "computers stay up a long time these days" is an argument against doing OS research on order-of-magnitude-faster, byte-addressable persistent storage.

We seem to be doing pretty well with a bunch of abstractions from the 1970s, as well as with the idea of just building giant trapdoors into our hardware whenever these abstractions fail (e.g. most databases, DPDK in the network space, etc). It's not a crisis. It just seems like a pretty good time to do some basic OS research (aside from all the usual headwinds for that, e.g. massive complexity of underlying hardware, difficulty finding meaningful workloads for a "toy" OS, etc).

Your ideas around uptime are still in the '70s. systemd updates require reboots. Then there are Spectre and Meltdown BIOS updates; gotta reboot for those. Oh, and the SSD and NIC firmware as well.

To think we formerly had only one devil, in glibc. Now everything is constantly being updated and it's fine. We've moved on from the uptime-as-phallic-measuring-stick mantra. Patch, reboot, and stay secure.

Probably the optimal use of this technology is databases, since they rely most on random access to large persistent data.

Any operating system that's designed around this technology is probably going to look like a database.

Basically, boot to Postgres and all "files" are now SQL tables, stored in NVDIMM. Indexes are in DRAM, and critical nodes are in cache.

All data (system and user) is organized and opinionated: All photos are in a photo database, with tables for IPTC metadata. All music. All executable files. If you're browsing the web, it'll probably cache data in local SQL tables. etc..

I can envision using SQL stored procedures as actual apps, perhaps with an API to access graphics hardware, network, sound, etc..
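
A toy sketch of the idea, with sqlite3 standing in for the hypothetical NVDIMM-resident Postgres (schema invented for illustration):

```python
import sqlite3

# Toy "filesystem as database": files are rows, metadata lives in
# dedicated tables. sqlite3 stands in for the NVDIMM-resident Postgres.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE files (
        path TEXT PRIMARY KEY,
        mime TEXT,
        body BLOB
    )""")
db.execute("CREATE TABLE photo_iptc (path TEXT, tag TEXT, value TEXT)")

db.execute("INSERT INTO files VALUES (?, ?, ?)",
           ("/photos/cat.jpg", "image/jpeg", b"\xff\xd8"))
db.execute("INSERT INTO photo_iptc VALUES (?, ?, ?)",
           ("/photos/cat.jpg", "Caption", "A cat"))

# "ls /photos" becomes a query:
rows = db.execute(
    "SELECT path, mime FROM files WHERE path LIKE '/photos/%'").fetchall()
print(rows)   # [('/photos/cat.jpg', 'image/jpeg')]
```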

The entire information world is either a database or a cache (or communication between them), layered on top of each other over and over. Every new storage technology typically ends up as yet another layer, acting as either a database or a cache (or both).

This case is fairly unique in that it can actually remove a layer: RAM (typically a cache) isn't necessary if NVM is viable at the same speeds. But in general, new storage tech just adds another layer, and the software world reacts by rushing in as if to fill a void, creating new software to take advantage of it, which ends up being... another database or cache, often with tradeoffs similar to the cache/database layers around it. In the limit, I see more and more layers of cache/database until they merge into some kind of continuous data/cache field with a continuous tradeoff gradient between size and latency.

Hey, that sounds like a mainframe... I wonder what it'll take to get z/OS running on commodity hardware.

Mainframes have always led the way in computer architecture.

And yet z systems have now moved to PCI express, an interconnect designed for desktop PCs

They can afford to, considering how much one costs :)

Persistent data structures? Well yes, but that's just the tip of the iceberg. There are other scenarios, like powerful real-time analytics, that could benefit from 100TB of RAM immediately.

100TB systems at RAM speeds are theoretically possible without this new memory. For example, 64-bit systems could easily provide enough address space.

The problem is that, practically speaking, server systems often limit usable address bits. And other problems remain, not the least of which is the crazy price of 100TB of DDR4, physical slots, etc. The price would be crazy even for most enterprise projects.

So yes, this new generation of memory will be disruptive, but also keep in mind that even though it's faster than SSDs, that's not nearly enough. I'm not positive, but IIRC it's still 2 or 3 orders of magnitude slower than conventional memory.

Does that mean this new wave of persistent RAM isn't useful and awesome? Not at all; I've already started using it.

But it does mean it's still at the stage where you have to analyze your scenarios carefully, see if it's a good fit for your architecture and environment, and benchmark your particular stack to verify assumptions and make sure it helps you the best way it can.

There will, almost inevitably, be someone who needs 101TB of memory. Then you're back to the same place where you need to scale out instead of up. If you asked cloud architects whether they'd rather have cheaper, lower-latency networking or faster, more expensive storage, you'd probably get the former most of the time.

Spark already works nicely with 100+TB datasets, and those can sit in memory across a thousand spot instances. Technology like TidalScale's hyperkernel can also merge multiple systems into a single addressable memory space at the OS level, so that you can run non-distributed applications across multiple commodity machines (like a reverse VM).

If 3D XPoint can offer competitive prices and speeds relative to traditional DRAM, then it will have a place in the market. Nobody has seen pricing or benchmarks for these yet. For Intel, however, this could expand their component share from CPU/chipset/network/storage to also include memory. That is pretty compelling, since it's a market they haven't monetized (not counting memory controllers) since the early days of Intel.

I would also think OSes that tend to emphasize their primary file system as 'the distributed memory', like DragonFly BSD, would benefit significantly.

I am speculating, of course, but the whole Hammer2 design of DragonFly BSD emphasizes a cross-machine, 'database-like' file system, with built-in transparent state snapshots, state branches, etc. [1].

So with this new type of persistent storage, DF's Hammer2 could erase the difference between 'persistent state' and in-memory-only state.

That would eliminate the need for reconciliations, application-specific backups, and application-specific distributed architectures.

[1] http://apollo.backplane.com/DFlyMisc/hammer2.txt

> Another thought: As potentially paradigm changing technology like this becomes available will it ever make sense to redesign the OS?

Realistically, its implications are much bigger for applications that depend heavily on persistent storage, like databases. They make tons of assumptions about persisting to block storage, whereas 3DXP could let them function entirely "in memory", so all that block-storage-specific optimization is now working against them. I'm just generalizing here, though.

Zero serialization. Imagine installing a program and always having it "running". It may be swapped out, but literally everything in it is ready to go when you switch to its window.
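
"Zero serialization" in miniature: state lives in a mapped region as fixed-layout structures, so there is no save/load encode step at all. A hedged Python sketch with an ordinary temp file standing in for an NVDIMM mapping (the layout and path are made up):

```python
import mmap, os, struct

# App state "saved" by simply existing in a mapped region -- no encode/
# decode step. A temp file stands in for a persistent-memory mapping.
FMT = "<qd16s"                    # counter, last_value, name: fixed layout
path = "/tmp/app_state.bin"

with open(path, "wb") as f:
    f.truncate(struct.calcsize(FMT))

fd = os.open(path, os.O_RDWR)
m = mmap.mmap(fd, struct.calcsize(FMT))

# Mutate state in place; on real NVM this *is* the persistent copy.
struct.pack_into(FMT, m, 0, 42, 3.14, b"session".ljust(16, b"\0"))
m.flush()

counter, last, name = struct.unpack_from(FMT, m, 0)
print(counter, name.rstrip(b"\0"))   # 42 b'session'
m.close(); os.close(fd)
```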

We have that right now, do we not? You don't have to quit apps except on reboot.

Except that many apps are so buggy you have to restart them often in practice. NVM won't change that, sadly.

I'd still want a way to kill it.

This is a huge PITA with mobile devices - I have no clue what code is, or isn't, being executed at any given time. Even if I force-kill an app, it has still most likely left some background service running, that will still use data, trigger GPS updates, wake the phone up, etc. What I wanted since the very day I first got my smartphone is to have PC-like control over applications.

In a perfect world of total ubiquity of wireless electricity, not to mention infinite CPU speeds and free and unlimited bandwidth, having everything running all the time in some way might be ok. As it is today, we still need the ability to kill software (and have it stay down), up to and including rebooting everything, to deal with obscure bugs in applications, OS and drivers. Not to mention being able to have some semblance of understanding of the device's state.

Leaking memory would be dangerous.

While there are some analytics workloads that will benefit tremendously, the main use case will be improving server utilization.

Currently RAM is not a compressible resource like CPU. However, many applications don't have a fixed or easily predictable RAM footprint, so you have to overprovision. Swap exists to solve that, but its performance impact means it often can't be used for server applications.

These DIMMs will blur the boundary between memory and swap and make swap again viable.

I don't get your logic. CPUs can execute a finite number of instructions in a given timeframe; that's not compressible. Memory, on the other hand, can be compressed, which works great for storage. Sure, it'll be slower, but compressing seldom-used memory pages, as macOS does, is indeed possible.

I think his point is that you can "run" a large number of applications at the same time on a CPU. It will execute everything, albeit slowly. This might not be acceptable for performance reasons, but it's doable.

He's not talking about actual data compression in RAM. Because even with compression, with current OSes, if you try to fit more than 20GB of data, let's say becoming 10GB compressed, into 5GB of RAM, it's not possible. You have to swap and at that point your performance is completely gone.

The performance gap between an overloaded CPU and swapping is humongous. One is annoying or slightly troublesome, the second is a death knell.
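
To make the compression point concrete, a quick sketch; the exact sizes depend on zlib internals and the random bytes, so treat the numbers as ballpark:

```python
import os, zlib

# "Compressing RAM" only buys you anything on redundant pages.
page_redundant = b"\x00" * 4096    # zeroed page: compresses to almost nothing
page_random    = os.urandom(4096)  # incompressible: compression adds overhead

print(len(zlib.compress(page_redundant)))  # tiny (tens of bytes)
print(len(zlib.compress(page_random)))     # larger than the input itself
```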

> and make swap again viable

You shouldn't work in political marketing :-)

One interesting thing for databases is that as nonvolatile storage latency decreases, traditional btrees get more attractive relative to newer log-structured designs. Especially if the write endurance is increased as well over current SSDs.

Or to put it another way: there's not a lot of reason to have two layers of log-structured storage. Your SSD already needs its own log-structured flash translation layer, and if that's tuned properly for your database workload, then another layer of the same kind of thing may not help much.

There's so much amazing stuff I could do with this. Imagine persistent Redis? Huge huge pages? Booting from a DIMM?

The possibilities are endless.

Why can't we just put a battery onto DRAM that maintains state if the power goes out, and be done with it?

That is how storage worked on PDAs back in the day, with volatile memory. Let the batteries completely die, or change them incorrectly, and you lost your data. Let's not go back to those days!

DRAM is not that dense; you can fit 128 GB of DRAM or 512 GB of XPoint on a DIMM. XPoint is also supposed to be cheaper than DRAM.

Because the power consumption to keep DRAM refreshed is fairly high so you'd need a pretty big battery, and because it would still be more expensive than 3DXP. It's just not practical for most use cases.

I figured that, but can we have some numbers here?

We do on RAID cards. It has limits.


Battery-backed DRAM has been around for a decade or more.

You mean like suspend-to-RAM?

I guess in a theoretical NVM-only system you could pull the plug at any time and instantly resume when the power is back on? If I'm reading right, though, the latency of 3DXP is somewhere in the 10-20µs ballpark, still 100-1000x slower than DRAM.

Yes but it's also cheaper than DRAM.

You could resume after pulling the plug as long as things are consistent. If you commit data in the wrong order you could have trouble!

The CPU has internal state as well that won’t be persisted - at the very least the registers.

One option would be to save them when you notice power loss.

No; not unless everything else in your system retains state, and that includes all the registers in every single chipset/controller/processor.

Price. Remember, current DRAM is 2.5x the price it was two to three years ago. So the XP DIMM being 4x cheaper than DRAM now wouldn't look that different if DRAM dropped back to its median level.

Is there any tech that's faster than DRAM and cheaper than SRAM? There's a need to fill that gap.

L2/L3 cache?

That's usually SRAM.

Website times out. Also, this story hasn't been reported by any other news source. Is there some other site where I can see the story?

It's not a news story... mostly analysis/predictions. (If you're talking about the generic announcement of XPoint DIMMs, that was in the news at the end of May: https://www.anandtech.com/show/12828/intel-launches-optane-d...)

Website should be up, I just rebooted AWS :)

Oh don't do that. We have other shit running on it.

I have coworkers who believe AWS is "Amazon WorkSpace" and complain "my AWS is slow, please reboot my AWS."

Unfortunately it’s still timing out :(

Should work now. Apache was getting freaked out.

Yup it works, thanks!

Found a copy freshly archived today @ archive.org:


Sadly this only archived the first of four pages.


Thank you for being a gentleman (or gentlewoman)!

On a few occasions I've found myself in the presence of a senior engineer at a large defense company who would never stop talking about how persistent memory will change everything forever. Fair enough, but he'd go on about it in the weirdest ways. I think his impression is that the CPU registers would also be nonvolatile. I'm concerned that guy might be a few electrons short of a full orbital.

Saving and restoring CPU registers and flushing caches into non-volatile memory doesn't require much time or energy.

Specifically, you can normally do something like that between when you notice the power dropping and when it drains too much and you have to shut down.

This is perfectly possible. I used to work on something that did just this but at the time used a small battery and an SSD to quickly dump the volatile state before power loss. This meant we had to limit the amount of volatile data that could be stored in ram (due to the transfer rate and up time on the battery). We were eagerly awaiting 3DXP DIMMs so that we could remove that limit. It really will have a big impact on critical systems where any data loss is not acceptable.

Shouldn't be hard: all they have to do is a light context switch to an idle thread on a power-loss interrupt, and make the idle thread externally re-entrant.

The initial release was really underwhelming, given the hype around this. So my personal (uninformed) expectations is just incremental improvement to the initial product.

Moving 3D XPoint memory from the peripheral IO bus to the memory bus is way more than an incremental improvement.

Says the company that claimed it would be 1000x faster than NAND flash. It isn't, and moving to a different bus isn't going to change that.

> Says the company that claimed it would be 1000x faster than NAND flash. It isn't, and moving to a different bus isn't going to change that.

Using the IO bus instead of the memory bus is exactly why existing Optane products haven't delivered latency that's 1000x better than NAND flash. NVMe transactions take at least 5-10µs even with DRAM as the SSD media rather than NAND flash or 3D XPoint. Moving to the memory bus is a prerequisite to 3D XPoint fulfilling those original performance claims.
