What is a VMM in this context? The documentation mostly left me more confused th...

jarusl · on June 9, 2019

Functionally it would be very similar to KVM/Xen. The goal was to be able to run fully virtualized OS environments on large-scale HPC systems, allowing users to deploy non-standard OS images. This was before containerization with Shifter and Singularity came along. Existing solutions at the time (KVM, Xen, etc.) were all designed around commodity datacenter/cloud environments, which had different design goals, mainly high levels of consolidation. The resulting design decisions were not optimal in an HPC environment, since consolidation on those systems wasn't really in the picture at the time. The Palacios project looked at how to minimize overheads in the VMM architecture while still delivering the features that would be useful in the HPC context.

walterbell · on June 9, 2019

Are there examples of HPC-related VMM improvements that could be upstreamed to KVM/Xen or CPU silicon/microcode?

jarusl · on June 9, 2019

I'm not sure there is much that can be directly upstreamed, since the main outcomes were more high-level design ideas than localized improvements.

One of the takeaways from the project was that, for a subset of workloads and environments, it would be very nice if the underlying system supported large, coarse-grained resource allocations that could be managed independently. This isn't a new observation by any means, but its benefits are clear in a virtualized environment.

For instance, with memory, one of the advantages Palacios had was that it preallocated a VM's memory at initialization using a set of large contiguous physical memory regions. This allowed it to represent the majority of the underlying memory map in a static and compact form that could be queried in constant time without locking. Nested page tables were managed independently and lock-free on each core using local memory, which, among other things, avoided NUMA crossings during page table walks.
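
To make the constant-time, lock-free lookup concrete, here is a minimal Python sketch of that kind of static map. It is not Palacios code: the names `GuestMemoryMap`, `REGION_SIZE`, and `translate`, the assumption that all regions are the same size, and the 128 MiB figure are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical block size: each contiguous host region backs 128 MiB of guest memory.
REGION_SIZE = 128 * 1024 * 1024


@dataclass(frozen=True)
class GuestMemoryMap:
    """Guest-physical to host-physical map, built once at VM initialization.

    The map is immutable after construction, so concurrent readers
    never need to take a lock.
    """
    region_size: int
    host_bases: Tuple[int, ...]  # host physical base address of each region

    def translate(self, gpa: int) -> int:
        """Translate a guest physical address with a single index computation."""
        if gpa < 0:
            raise ValueError("negative guest physical address")
        index, offset = divmod(gpa, self.region_size)
        if index >= len(self.host_bases):
            raise ValueError("guest physical address outside preallocated memory")
        return self.host_bases[index] + offset


# Example: a 256 MiB guest backed by two (made-up) contiguous host regions.
memory_map = GuestMemoryMap(REGION_SIZE, (0x1_0000_0000, 0x4_0000_0000))
host_addr = memory_map.translate(REGION_SIZE + 0x1000)  # lands in the second region
```

Because the table is just a short array of base addresses, a lookup is one division and one add, regardless of how much memory the guest has.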

Large physically contiguous memory allocations in Linux have never been well supported, and Palacios relied on memory hotplug to achieve it. This is certainly not a great way to do it, but there really wasn't another way to dynamically allocate large contiguous memory regions (hundreds of megabytes) at runtime.

Carrying that through to the hardware side, it would be nice to have an alternative to nested page tables that resembled something more like segmentation.

jchw · on June 9, 2019

Virtual Machine Monitor, often also called a hypervisor (though this is only correct some of the time: specifically, when not running in a hosted environment).