Efficiently using memory when you have three levels (Hypervisor->OS->JVM) that are largely opaque to each other is a real challenge. Doing ballooning at the JVM level [0] instead of the OS level makes sense for VMs primarily running Java applications. VMware often seems to be the first to get a new technology out, but its competition quickly follows (e.g. KSM for transparent page sharing and compcache for memory compression). I wonder if Oracle or Red Hat are working on their own JVM ballooning implementations.
[Disclosure: I worked at VMware from 2002 through 2011.]
I take issue with the claim that the competition "quickly followed", at least for the transparent page sharing feature. VMware had been shipping transparent page sharing in its type 1 hypervisor since ESX 1.5, which was released before I joined the company.
KSM was first proposed in 2008, and I believe it didn't actually ship until 2009 (note: the delay was at least in part because the developers wanted to avoid the possibility of exposure to patent litigation since VMware held a patent on the technology, software patents are evil, blah blah).
You can tell a similar story for VMotion/live migration of running virtual machines; VMware first shipped it in 2003 and it was at least 2007 before any competing hypervisors were shipping a similar feature (Hyper-V didn't have it until 2008 R2, Xen had it sooner - possibly in 2007?).
Azul Systems has been marketing a different solution to this same problem in its Zing VMs for some time now. I believe their solution is based on adding extra APIs to the underlying kernel etc., so the JVM's heap isn't so opaque to the OS/hypervisor.
The VMware solution sounds like a simpler, pragmatic "hack" to solve the problem. It would be interesting to hear from someone who has used both technologies to see how they compare in practice.
As I understand it, Azul instead solves this problem by giving control of the memory-mapping unit to the VM, which allows some really neat tricks. (I'd expect it also allows VM->kernel privilege escalation, but their machines are Java appliances anyway.)
Is it just me or is running a VM in a VM just crazy? The application stack now looks like
OS -> Hypervisor -> VM -> OS -> VM -> App
Where "App" might be a service, so you might have this stack duplicated dozens of times for an actual app that a user can use. And you don't save anything either: rather than "processes", your sysadmins now manage "VMs". What happened to
Yes, it bothers me a bit. I think the current "everything in a VM" culture unfortunately overlooks lighter-weight isolation (jails, zones, LXC, lguest, etc.). Some of those are not quite mature, but still. Relatedly, sometimes license requirements are per CPU, and the license accepts VMware restrictions but not cgroup or resource-manager-type restrictions.
Or you could just run the JVMs on the base OS and use good old file system security if you really need sandboxing. The supposed advantages of adding an OS VM layer seem iffy if you're just running one app per box.
It always amazed me that the JVM doesn't have any options to restrict CPU resources (something VMware does have). The only alternative I can think of is to use the unix taskset utility to limit JVMs to specific CPU cores. Anybody got experience with a setup like this?
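For what it's worth, the taskset route is a one-liner either way (a sketch; the core list, jar name, and PID are placeholders):

```shell
# Launch a JVM pinned to cores 0 and 1 (hypothetical app.jar)
taskset -c 0,1 java -jar app.jar

# Or retarget an already-running JVM by its PID
taskset -cp 0,1 12345
```

Note that taskset only controls *where* the JVM runs, not *how much* CPU time it gets; two JVMs pinned to the same cores still compete freely with each other.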
Limiting memory usage and file permissions with the JVM is indeed easy.
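For the record, a sketch of what that looks like, assuming a hypothetical app.jar and policy file (paths and limits are placeholders):

```shell
# app.policy: allow reads only under /data (hypothetical path)
cat > app.policy <<'EOF'
grant {
  permission java.io.FilePermission "/data/-", "read";
};
EOF

# Cap the Java heap at 512 MB and run under the security manager
java -Xmx512m -Djava.security.manager \
     -Djava.security.policy=app.policy -jar app.jar
```

One caveat: -Xmx caps only the Java heap; permgen, thread stacks, and native allocations live outside that limit, so the process footprint will be somewhat larger.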
Create a cgroup hierarchy with the CPU subsystem attached, then put each JVM in its own cgroup in that hierarchy. By default, each JVM will get an equal amount of CPU time. A JVM can be given relatively more CPU time by increasing cpu.shares for its cgroup. You can similarly manage memory and i/o with the memory and blkio subsystems.
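Concretely, that looks something like this (a sketch, assuming the cpu subsystem is mounted at /sys/fs/cgroup/cpu and you're root; group names and PIDs are placeholders):

```shell
# Create one cgroup per JVM under the cpu subsystem
mkdir /sys/fs/cgroup/cpu/jvm-a /sys/fs/cgroup/cpu/jvm-b

# Move each JVM (by PID) into its group
echo 1234 > /sys/fs/cgroup/cpu/jvm-a/tasks
echo 5678 > /sys/fs/cgroup/cpu/jvm-b/tasks

# Give jvm-a twice the CPU weight of jvm-b (the default is 1024)
echo 2048 > /sys/fs/cgroup/cpu/jvm-a/cpu.shares
```

Shares only matter under contention: if jvm-a is idle, jvm-b can still use all the CPU, which is exactly the work-conserving behavior you usually want.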
There are a few ways to solve this; the simplest might be to replace the OS with a library OS (a libOS) linked into the JVM, implementing whatever the JVM needs from the OS API directly in terms of what a specific VM offers. This is the idea behind the MIT exokernel operating system, XOK.
[0] http://pubs.vmware.com/vfabric5/topic/.../vfabric-tc-server-...