I spent some time digging into this for the fast, light, and easy-to-use replacement for Docker Desktop and (co)lima that I've been working on.
The root of the problem is that VM memory is allocated on demand, but never freed after it's used for the first time. In other words, once used, it can never be released. Since my app has a lighter userspace, it starts out using less memory than other VMs, but eventually reaches the memory limit given enough usage and time. (My optimized memory management setup means the VM works well with a lower memory limit than others, but it doesn't solve the fundamental problem.)
Linux has a feature called "page reporting" that tells the hypervisor which chunks of memory are no longer in use, so the hypervisor can drop them to reduce usage on the host side. WSL 2 actually uses this feature, but I suspect it becomes less effective with longer VM uptime because memory becomes more fragmented over time. Since Hyper-V has been limited to dropping contiguous 2 MiB chunks of memory until recently [1], fragmentation is likely the reason many users report high memory usage. Page cache is definitely a contributor as well, but a much easier one to fix. It looks like Microsoft is working on the problem with page reporting.
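To make the fragmentation point concrete, here is a rough sketch of how one could estimate, from inside a guest, how much free memory sits in blocks large enough to be reported. It parses text in the format of Linux's /proc/buddyinfo. The function name, the 4 KiB page size, and the cutoff at order 9 (2 MiB) are assumptions for illustration, not what WSL 2 or any particular VMM actually runs.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB base pages
MIN_REPORT_ORDER = 9    # assumption: 4 KiB * 2**9 = 2 MiB minimum reportable block


def reportable_free_bytes(buddyinfo_text, min_order=MIN_REPORT_ORDER,
                          page_size=PAGE_SIZE):
    """Return (reportable_bytes, total_free_bytes) for /proc/buddyinfo-style text.

    Each data line looks like:
        Node 0, zone   Normal   c0 c1 c2 ... c10
    where cN is the number of free blocks of 2**N contiguous pages.
    """
    reportable = 0
    total = 0
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        # Skip anything that does not look like a "Node N, zone NAME ..." line.
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        counts = [int(c) for c in fields[4:]]
        for order, count in enumerate(counts):
            size = count * (page_size << order)
            total += size
            # Only blocks at or above the cutoff count as reportable here.
            if order >= min_order:
                reportable += size
    return reportable, total
```

On a Linux guest this could be fed with `reportable_free_bytes(open("/proc/buddyinfo").read())`. If the first number is much smaller than the second, most free memory is stuck in fragments below the 2 MiB cutoff.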
Although Apple's Virtualization.framework doesn't support page reporting, I was able to implement it with some workarounds and confirmed that this works with QEMU on Linux. Unfortunately, while free memory is correctly reported to macOS, nothing actually seems to get freed. I'm planning to report this to Apple because it seems like memory ballooning (essentially a more limited and primitive version of page reporting) doesn't work as documented, whether it's Virtualization.framework or another VMM like QEMU. If/when Apple fixes this, it'll be possible to reduce memory usage significantly. Details from my investigation into what's going on with memory management on the XNU side: https://twitter.com/kdrag0n/status/1612309883411640321
The good news: From my testing, the issue isn't as bad as it appears. The "free" memory tends to compress quite well, so XNU's memory compression does a good job at taking care of it when you're actually running low on memory.
(Shameless plug on this topic: The app I'm working on already has quite a few improvements over others: fast networking (30 Gbps), VirtioFS and bidirectional filesystem sharing, Rosetta for fast x86, full Linux (not only Docker), lower CPU usage, native Swift UI, and other tweaks. Email in bio for waitlist. Details to avoid spamming this thread: https://news.ycombinator.com/item?id=34374176)
> This feature is powered by a Linux kernel patch that allows small contiguous blocks of memory to be returned to the host machine when they are no longer needed in the Linux guest. We updated the Linux kernel in WSL2 to include this patch, and modified Hyper-V to support this page reporting feature. In order to return as much memory to the host as possible, we periodically compact memory to ensure free memory is available in contiguous blocks. This only runs when your CPU is idle. You can see when this happens by looking for the ‘WSL2: Performing memory compaction’ message inside of the output of the dmesg command.
I didn't realize they already had triggers for compaction, thanks for sharing! I suspect fragmentation is still a major issue (along with page cache management, which DAX should help with), so it'll be interesting to see if/how Microsoft improves memory management.
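Here is a toy model of why compaction matters for page reporting. It is only a sketch: memory is a list of used/free flags, "compaction" just packs the used pages at the start, and the 512-pages-per-block figure assumes 4 KiB pages and 2 MiB blocks. None of the names come from the kernel, which works very differently in practice.

```python
import random

PAGES_PER_BLOCK = 512  # assumption: 2 MiB block / 4 KiB page


def free_blocks(used, block=PAGES_PER_BLOCK):
    """Count aligned blocks in which every page is free."""
    return sum(
        1
        for start in range(0, len(used) - block + 1, block)
        if not any(used[start:start + block])
    )


def compact(used):
    """Toy compaction: pack all used pages at the start of memory."""
    n_used = sum(used)
    return [True] * n_used + [False] * (len(used) - n_used)


def demo(total_blocks=64, used_fraction=0.25, seed=0):
    """Scatter used pages randomly, then compare free blocks before and after."""
    rng = random.Random(seed)
    n = total_blocks * PAGES_PER_BLOCK
    used = [False] * n
    for i in rng.sample(range(n), int(n * used_fraction)):
        used[i] = True
    return free_blocks(used), free_blocks(compact(used))
```

`demo()` returns the number of fully free 2 MiB blocks before and after compaction. With only a quarter of the pages in use but scattered at random, almost no whole block is free until the used pages are packed together.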
[1] https://lkml.org/lkml/2022/9/30/81