Another thought: As potentially paradigm changing technology like this becomes available will it ever make sense to redesign the OS?
However, iOS and Android have shown that it's possible to do away with this distinction even with a traditional OS running underneath. So I now tend to think that instead what will happen is more continual evolutionary changes at the OS level to work better in a "boot once" environment, rather than a revolution.
When Linux boots, the in memory state changes quite a bit. Even the actual code gets modified during boot. The whole process takes well under a second. Linux does support an “execute in place”, but it’s barely a win, and I don’t think it works on x86.
A more interesting idea is to put your OS installation on a DAX (direct access) filesystem.
That being said, operating systems like Linux tend to capture most of the value from these kind of advances - often by dint of being able to simply 'get out of the way' if a sufficiently important user space process wants access to the device.
But one would suspect that things have changed sufficiently from the 1970s to warrant a ground-up rethink. Core counts, distributed systems (the Plan 9 folks already too a swing at this in the 90s), nearly ubiquitous graphics/GPGPU accelerators, persistent memory, nearly ubiquitous access to 64-bit address spaces (at least for desktop and most phones) - you'd think something would change about design. I don't work in the area so I don't know what that is...
Traditional servers are persistent: they never turn off. 500+ days of uptime is typical. And today, with VMs which at worst... hibernate... it seems like "never turning off" might be the norm.
When OS, system, or library updates happen, you can easily launch replacement servers on the updated stack, put them in the rotation, and decommission the old ones. This is so much simpler than trying to run OS upgrades in-place across an entire fleet. The longer a machine has been running between reboots, the lower my belief in its odds of upgrading and restarting cleanly.
Further, this regularly tests your load balancing setup and pretty much fundamentally gives you capacity to scale up and down as load permits. Problems will be discovered early on, instead of during crunch time when you have to scale or when a few of your machines go offline during peak hours.
Security-wise, you don’t just get the benefit of fast, regular updates. But you also get assurances that users haven’t left stale data like unencrypted database exports, PII dumps, etc. lying around. Go on a long-lived machine some day and check out users’ home directories. That shit is a gold mine if someone who wants to do harm gets on your systems.
Not to mention regular reimaging makes it harder for an attacker to establish a permanent foothold in your infra.
None of this has anything to do with fast persistent storage, but I sincerely hope the era of 500-day uptimes is waning.
But yeah, it's bad that an attacker has been able to get to a critical system, but it's a phenomenal defense if any of their beacons or remote access tools last at most a few hours or days before being wiped. This makes an attacker's life much harder.
On the contrary. Persistent memory means that infinite uptime is the future. Which, as you note, is difficult. Resetting the OS every now and then to a known state is a good practice, although disruptive to a lot of workflows.
If anything, I consider your post to be an argument AGAINST persistent memory.
A traditional spinning rust HDD has an effective latency of ~10 ms. The NVME version of the 3DXP has an effective latency of ~50 us, or two orders of magnitude better. Not sure how low the DIMM version will go, but maybe another order of magnitude?
If so, we're talking three orders of magnitude difference. That would radically affect the assumptions going into storage algorithms. Suddenly you can no longer spend millions of instructions trying to avoid I/O. Batching of I/O is also not needed to the same degree. Complex syncing of memory and disk is not needed. Etc etc.
RAM is only volatile on startup. Certainly not when a VM hibernates and comes back.
No it isn't. Anything that needs to survive a power cycle needs to go to non-volatile storage. And this is assumed to be very, very slow.
We seem to be doing pretty well with a bunch of abstractions from the 1970s, as well as with the idea of just building giant trapdoors into our hardware whenever these abstractions fail (e.g. most databases, DPDK in the network space, etc). It's not a crisis. It just seems like a pretty good time to do some basic OS research (aside from all the usual headwinds for that, e.g. massive complexity of underlying hardware, difficulty finding meaningful workloads for a "toy" OS, etc).
To think we formerly only had one devil in glibc. Now everything is constantly being updated and it's fine. We've moved on from the uptime as phallic measuring stick mantra. Patch, reboot, and stay secure.
Any operating system that's designed around this technology is probably going to look like a database.
Basically, boot to Postgres and all "files" are now SQL tables, stored in NVDIMM. Indexes are in DRAM, and critical nodes are in cache.
All data (system and user) is organized and opinionated: All photos are in a photo database, with tables for IPTC metadata. All music. All executable files. If you're browsing the web, it'll probably cache data in local SQL tables. etc..
I can envision using SQL stored procedures as actual apps, perhaps with an API to access graphics hardware, network, sound, etc..
100TB systems at RAM speeds are theoretically possible without this new memory. For,example 64-bit systems could easily provide enough address space.
The problem is practically speaking, server systems limit address bit capabilities quite often. And other problems still remain, not the least of which is the crazy price for 100TB of DDR4,physical slots, etc. The price would be crazy even for most enterprise projects.
So yes this new generation of memory will be disruptive, but also keep in mind even though it’s faster than SSDs, that’s not nearly enough. I’m not positive, but IIRC correctly it’s still 2 or 3 orders of magnitude slower than conventional memory.
Does that mean this new wave od persistent RAM it’s not useful and awesome? Not at all, I’ve already started using it.
But it does mean it’s still at the stage where you have to analyze your scenarios carefully, see if it’s a good for your architecture and environment, and benchmark your particular stack to verify assumptions and make sure it’s help you the best way it can.
Spark already works nicely with 100+TB datasets, and those can sit in memory across a thousand spot instances. Technology like tidalscale's hyperkernel can also merge together multiple systems into a single addressable memory space at the OS level so that you can run non-distributed applications across multiple commodity machines (like a reverse VM).
If 3d xpoint can give competitive price and speeds to tradional DRAM, then it will have a place in the market. Nobody has seen pricing yet nor benchmarks for these. For Intel however, this could increase their component share from CPU/chipset/network/storage to also include memory. That is pretty compelling since it's a market they haven't monetizes (not counting memory controllers) since the early days of Intel.
I am speculating, of course, but the whole Hammer 2 design of DF BSD emphasizes cross-machine 'database-like' file system, with built-in transparent state snapshots, state-branches, etc. .
So with this new type of persistent storage, DF's Hammer2 could erase the difference between 'persistent state' and in-memory-only state.
Therefore eliminating the need for reconciliations, application-specific backups, and application-specific distributed architectures.
Realistically, it's implications are much bigger for applications that depend heavily on persistent storage, like databases. They make tons of assumptions about persisting to block storage, whereas 3DXP could enable them to function entirely "in memory", so all that block storage specific optimization they have is now working against them. I'm just generalizing here, though.
Except that many apps are so buggy you have to restart them often in practice. NVM won't change that, sadly.
This is a huge PITA with mobile devices - I have no clue what code is, or isn't, being executed at any given time. Even if I force-kill an app, it has still most likely left some background service running, that will still use data, trigger GPS updates, wake the phone up, etc. What I wanted since the very day I first got my smartphone is to have PC-like control over applications.
In a perfect world of total ubiquity of wireless electricity, not to mention infinite CPU speeds and free and unlimited bandwidth, having everything running all the time in some way might be ok. As it is today, we still need the ability to kill software (and have it stay down), up to and including rebooting everything, to deal with obscure bugs in applications, OS and drivers. Not to mention being able to have some semblance of understanding of the device's state.
Currently RAM is not a compressible resource like CPU. However many applications don't have a fixed or if easily predictable RAM footprint and so you have to overprovision. Swap has been there to solve that but with its performance impact, it often can't be used for server applications.
These DIMMs will blur the boundary between memory and swap and make swap again viable.
He's not talking about actual data compression in RAM. Because even with compression, with current OSes, if you try to fit more than 20GB of data, let's say becoming 10GB compressed, into 5GB of RAM, it's not possible. You have to swap and at that point your performance is completely gone.
The performance gap between an overloaded CPU and swapping is humongous. One is annoying or slightly troublesome, the second is a death knell.
You shouldn't work in political marketing :-)
The possibilities are endless.
You could resume after pulling the plug as long as things are consistent. If you commit data in the wrong order you could have trouble!
Would be to save them when you notice power loss.
Using the IO bus instead of the memory bus is exactly why existing Optane products haven't delivered latency that's 1000x better than NAND flash. NVMe transactions take at least 5-10µs even with DRAM as the SSD media rather than NAND flash or 3D XPoint. Moving to the memory bus is a prerequisite to 3D XPoint fulfilling those original performance claims.