
The final step for huge-page swapping - bitcharmer
https://lwn.net/SubscriberLink/758677/0d004feb1bbc862b/
======
mmt
> The advent of nonvolatile memory is changing the equation, though, and
> swapping is starting to look interesting again

It's not clear to me why the _nonvolatile_ part makes any difference to swap.
I'd expect it would make more sense to attach a volatile, RAM-based SSD [1],
maybe via PCIe or even once removed via storage fabric.

Perhaps it's just incidental, and it's merely price and/or capacity that's
important, considering that it's attached via the memory bus.

[1] Not that I've ever seen such a product with a high enough capacity for a
low enough price. It seems like it could be a way to recycle old and/or slow
RAM, maybe a startup idea, but I'm no hardware guy.

~~~
shawn
In the game and simulation industry, texture resolution is a problem. If you
run the math on how big textures need to be to match pixel resolution 1:1 in
3D space, it quickly becomes infeasible for large scenes.
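A quick back-of-envelope calculation makes the point (assuming an uncompressed RGBA8 texture; the exact numbers are illustrative, not from the comment):

```python
# Back-of-envelope: memory needed for a single 128k x 128k RGBA8 texture.
side = 128 * 1024                # texels per side
bytes_per_texel = 4              # RGBA, 8 bits per channel
base_level = side * side * bytes_per_texel
with_mips = base_level * 4 // 3  # a full mip chain adds roughly 1/3 on top

print(base_level // 2**30)       # 64 GiB for the base level alone
print(with_mips // 2**30)        # ~85 GiB with mipmaps
```

At 64+ GiB for one texture, keeping only the visible tiles resident is the only workable approach.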

One solution is to stream the mip levels, and to create virtualized mipmaps.
This basically lets you create a 128k by 128k texture, which works fine
because only small blocks are loaded. It’s literally the same thing as virtual
memory, but for texture memory.

One downside is that when you shut down the game / simulation, all of that
data no longer exists. It was virtual. It either streamed from the server or
you packed it to disk somewhere. Either way you have to set it up again the
next time the program starts.

I don’t know whether the present topic would let you bypass this limitation,
but if there was some way to start a program with a huge amount of memory
allocations preloaded, that would be very attractive. And if you can _reboot_
without losing that, then suddenly you can start crafting persistent worlds
that load instantly with ~infinite resolution.

This is the dream that voxel tech was supposed to make into reality, and it
will only become more of a concern over time as mixed reality tech matures.

~~~
sterlind
the feature you're looking for is a memory-mapped file. I've seen tremendous
performance improvements from it, but for some reason it's a really underused
feature. Memory-map performance is incredible.
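A minimal sketch of the technique sterlind describes, using Python's `mmap` binding over the same POSIX call (function name and file contents are mine, for illustration):

```python
import mmap
import os
import tempfile

def mmap_roundtrip():
    """Write a file, map it into memory, modify it through the mapping,
    and read it back through ordinary I/O."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello mapped world")
        os.close(fd)
        with open(path, "r+b") as f:
            with mmap.mmap(f.fileno(), 0) as mm:   # map the whole file
                # Reads go straight through the page cache; no read() calls.
                assert mm[:5] == b"hello"
                # Writes hit the mapping; the kernel writes pages back lazily.
                mm[0:5] = b"HELLO"
                mm.flush()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)

print(mmap_roundtrip())  # b'HELLO mapped world'
```

The win for huge data sets is that pages are faulted in on demand rather than read up front, which is exactly the behavior swapping generalizes.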

~~~
wtallis
Does Windows have any analog to madvise()? It seems like that's probably
necessary to make things work well when you're mapping huge data sets,
unless you're sure they reside on _really_ fast storage.

~~~
PeCaN
Windows 8 and later has PrefetchVirtualMemory[1]. Before that your best bet is
to get a little hacky and try reading the file asynchronously to get it in the
cache (Windows caches files pretty aggressively, so this more or less works).

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/hh780543(v=vs.85).aspx

~~~
senozhatsky
Seems that VirtualAlloc() [1] is a little bit closer to madvise(), but it's
still pretty far off. madvise() is quite powerful; for instance, take a look
at MADV_FREE.

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx

-ss
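For context, a hedged sketch of what MADV_FREE does, via Python's `mmap` binding (function name is mine; MADV_FREE needs Linux 4.5+ and Python 3.8+, so the code only hints where the platform supports it):

```python
import mmap

def mark_reclaimable(nbytes=1 << 20):
    """Dirty some anonymous memory, then hint that the kernel may reclaim
    it lazily. Returns whether the hint could be given on this platform."""
    buf = mmap.mmap(-1, nbytes)      # private anonymous mapping
    buf[:] = b"\x01" * nbytes        # touch every page so all are dirty
    hinted = hasattr(mmap, "MADV_FREE")
    if hinted:
        # Pages may now be dropped under memory pressure instead of being
        # written to swap; contents are undefined until rewritten.
        buf.madvise(mmap.MADV_FREE)
    buf.close()
    return hinted

print(mark_reclaimable())
```

The point is that the application can hand cheap reclaim candidates to the kernel, something VirtualAlloc's MEM_RESET only partially approximates.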

~~~
PeCaN
I figured the use-case here was MADV_WILLNEED, which is what
PrefetchVirtualMemory is for. VirtualAlloc flags would be for things like
reserving virtual pages without committing them.

There's no single API on Windows that does the myriad things that madvise
does.
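The MADV_WILLNEED use-case mentioned above can be sketched in Python, whose `mmap` binding exposes the same call on POSIX systems from 3.8 on (function name and test file are mine):

```python
import mmap
import os
import tempfile

def prefetch_first_byte(path):
    """Map a file read-only and ask the kernel to read it ahead -- the
    POSIX counterpart of what PrefetchVirtualMemory does on Windows 8+."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mmap.madvise() exists where the OS supports it (Python 3.8+).
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
                mm.madvise(mmap.MADV_WILLNEED)  # kick off readahead now
            return mm[0]                        # page is likely cached by now

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)
os.close(fd)
print(prefetch_first_byte(path))  # 120, i.e. ord('x')
os.unlink(path)
```

Without the hint, each first touch of a page takes a synchronous fault to storage; with it, readahead overlaps with computation.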

------
deckiedan
All those quoted performance improvements sound great - but what kind of
real-world workload or environment would one need to be in to see this in
action? Anywhere swap is used? Or does it need special hardware? Would an
SSD-backed low-memory VM be improved?

~~~
saas_co_de
All sorts of cloud hosting things would benefit from this. The ability to swap
in and out very fast lets you run more containers or VMs on the same machine
when only a fraction of those instances are active at any given time - but
even a swapped-out instance must be able to swap back in and respond with
reasonable latency.

Any kind of "serverless" hosting or any architecture where you have a
container per user would benefit from this kind of development.

~~~
Baech8ei
If NVMe swap also gets faster with huge pages, then testing a whole
microservice swarm on a local dev machine might benefit: currently idle
services can be swapped out more swiftly while something else chews on data.

------
snvzz
By using swap, determinism is bye-bye as disk access, unlike RAM, is not
deterministic.

I'm afraid of latency peaks caused by this, too.

~~~
lmm
> By using swap, determinism is bye-bye as disk access, unlike RAM, is not
> deterministic.

No, because what's driving this change is the rise of NVMe rather than
spinning disk, and NVMe has consistent access times.

~~~
snvzz
Consistent doesn't imply deterministic.

~~~
gravypod
Most consumer DRAM is non-deterministic. A small fraction of users use tools
to provide stronger determinism guarantees, but issues like bit flips and
variable access latency still apply to DRAM, although these issues are far
smaller in timescale and scope.

Nonetheless, I'm assuming this set of optimizations is targeted not only at
NVMe storage but also at 3D XPoint. Intel and Micron have done a lot of work
in this space and promise to provide non-volatile, low-latency, high-density
memory in a single package. Micron has seemingly abandoned QuantX (their 3D
XPoint product), but Intel has recently made some huge strides in this area
which, if adopted, will definitely push them to compete in this space.

Soon Intel Optane will be available in a DDR4-like package. It will still be
closer to disk-like than RAM-like, and that is why I think this effort to push
for large-page swapping will be interesting. Its being non-volatile also
provides some cool possibilities for the future if this tech becomes more
mainstream. Imagine being able to provision a portion of your swap that always
contains the state of certain programs you care about, or certain always-used
libraries. Maybe even caching the entire OS in DRAM-like storage so that all
boots read directly from your non-volatile swap.

Also, as a very important aside, this is from the patches mentioned in this
article:

    From: Huang Ying <ying.huang@intel.com>

------
sumanthvepa
When the article mentions faster non-volatile memory, are they referring to
NVMe SSDs or Intel's Optane? Are SSDs that much faster to merit a new memory
management strategy?

~~~
sp332
Optane is fast enough to be qualitatively more similar to RAM than to disk.
[https://www.youtube.com/watch?v=cwy4ujt0qHM](https://www.youtube.com/watch?v=cwy4ujt0qHM)
The video shows that performance varies with workload differently from RAM
though, so I think having a separate class for it is appropriate.

~~~
imtringued
The Blender benchmark is impressive: CPU utilisation increased from 30-50% to
60-100% by using Optane as swap instead of a regular SSD. Optane is still a
very early product and it's already beating conventional SSDs.

------
PixyMisa
You seem to have put your subscriber link into the URL. Probably best to
remove it.

~~~
mkj
They're meant to be shared, good advertising for LWN.
[https://lwn.net/op/FAQ.lwn#slinks](https://lwn.net/op/FAQ.lwn#slinks)

