

Ultra KSM: transparent full-system memory deduplication for Linux - api
https://code.google.com/p/uksm/

======
hapless
Important notes:

- RHEL 5 and derivatives include ksm right out of the box. There's a script
out there to make use of it: ksmd.

- ksm splits transparent huge pages back into 4 KB pages whenever it finds a
small page inside a huge page that it can merge. (In other words, you can
enable transparent huge pages and ksmd at the same time, but ksmd is likely to
negate the benefits of hugepages.)

- ksm is freaking dynamite -- it cut memory usage on my personal
virtualization host by a third.
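One way to check a number like "a third" on your own host is to read the counters mainline KSM publishes under /sys/kernel/mm/ksm/. This sketch assumes a kernel built with CONFIG_KSM and uses the kernel documentation's reading of pages_sharing as roughly the number of page frames saved; the function names are made up for illustration. Note that mainline KSM only scans memory an application has marked with madvise(MADV_MERGEABLE), which QEMU/KVM does for guest RAM.

```python
import os
from pathlib import Path

# Mainline KSM exposes its counters here (only on kernels with CONFIG_KSM).
KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_counters(ksm_dir: Path = KSM_DIR) -> dict:
    """Read whichever integer KSM counters exist in the sysfs directory."""
    names = ("run", "pages_shared", "pages_sharing", "pages_unshared", "full_scans")
    counters = {}
    for name in names:
        path = ksm_dir / name
        if path.exists():
            counters[name] = int(path.read_text().strip())
    return counters


def estimate_saved_bytes(pages_sharing: int, page_size: int = 4096) -> int:
    """pages_sharing counts extra mappings of already-shared pages,
    i.e. roughly how many page frames deduplication has freed."""
    return pages_sharing * page_size


if __name__ == "__main__":
    if not KSM_DIR.is_dir():
        print("KSM sysfs interface not found on this system")
    else:
        c = read_ksm_counters()
        page_size = os.sysconf("SC_PAGE_SIZE")
        saved = estimate_saved_bytes(c.get("pages_sharing", 0), page_size)
        print(f"run={c.get('run')} shared={c.get('pages_shared')} "
              f"sharing={c.get('pages_sharing')} saved={saved / 2**20:.1f} MiB")
```

Comparing the estimated savings against total RAM in use gives the kind of ratio quoted above.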

~~~
joshu
What do transparent huge pages get you?

~~~
hapless
The OS presents a view of a flat memory space to userspace processes, but it's
not real. Memory is really divided into 4 KB pages. Every time an
application accesses its flat memory space, there are layers of indirection
(page tables) to walk before arriving at the real, physical page.
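To make the "layers of indirection" concrete: on x86-64 with 4-level paging, a 48-bit virtual address is split into four 9-bit table indices plus a 12-bit offset into the 4 KB page. With a 2 MB page the walk stops one level early and the low 21 bits become the offset. A small sketch (the function and constant names are illustrative):

```python
PAGE_SHIFT = 12   # 4 KB base pages
INDEX_BITS = 9    # each x86-64 page table holds 512 entries
LEVELS = ("PML4", "PDPT", "PD", "PT")


def split_virtual_address(vaddr: int, huge_2mb: bool = False) -> dict:
    """Split a 48-bit x86-64 virtual address into page-table indices.

    With huge_2mb=True the walk ends at the PD level and the low
    21 bits are the offset into the 2 MB page.
    """
    levels = LEVELS[:3] if huge_2mb else LEVELS
    offset_bits = PAGE_SHIFT + (INDEX_BITS if huge_2mb else 0)
    parts = {}
    for depth, name in enumerate(levels):
        # The top-level index sits highest; each level below is 9 bits lower.
        shift = offset_bits + INDEX_BITS * (len(levels) - 1 - depth)
        parts[name] = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
    parts["offset"] = vaddr & ((1 << offset_bits) - 1)
    return parts


addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5A5
print(split_virtual_address(addr))
# {'PML4': 1, 'PDPT': 2, 'PD': 3, 'PT': 4, 'offset': 1445}
print(split_virtual_address(addr, huge_2mb=True))
# {'PML4': 1, 'PDPT': 2, 'PD': 3, 'offset': 17829}
```

Each index is one memory lookup the CPU has to perform on a TLB miss, which is why skipping the last level with a huge page helps.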

The system has a cache to store the indirection results, so that memory within
a cached page can be accessed quickly, without tracing the indirections.
Unfortunately, on x86 systems, this cache of indirection results (the TLB) is
very, very, very small. One way to stretch the TLB is to let each individual
TLB entry cover more memory by increasing the page size.

With transparent hugepages, if an application requests 32 MB of RAM, the
system can back it with 2 MB "hugepages" (the huge page size on x86-64)
instead of 4 KB pages. Each hugepage occupies a single entry in the TLB, so
the 32 MB needs 16 entries instead of 8,192. Fewer entries eaten means fewer
TLB misses, and fewer page-table walks, when multiple processes are competing
for resources.

In other words, memory access under contention is faster and context switches
are less expensive when you have transparent hugepages enabled. That's a big
performance win for virtualization hosts, because they are likely to
experience a lot of contention.
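The TLB arithmetic behind the 32 MB example can be checked directly. This sketch assumes x86-64, where a transparent hugepage is 2 MB; the 64-entry TLB is a purely illustrative number, since real TLB sizes differ between CPUs and between TLB levels.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_entries_needed(region_bytes: int, page_bytes: int) -> int:
    """Entries needed to map a region; a partial page still needs an entry."""
    return -(-region_bytes // page_bytes)  # ceiling division


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory a TLB with `entries` entries can map at once."""
    return entries * page_bytes


region = 32 * MIB
print(tlb_entries_needed(region, 4 * KIB))  # 8192 entries with 4 KB pages
print(tlb_entries_needed(region, 2 * MIB))  # 16 entries with 2 MB hugepages

# Hypothetical 64-entry TLB: reach grows by the page-size ratio (512x).
print(tlb_reach(64, 4 * KIB) // KIB, "KiB")  # 256 KiB
print(tlb_reach(64, 2 * MIB) // MIB, "MiB")  # 128 MiB
```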

------
wmf
The improved hash function might be a problem; IIRC KSM doesn't use hashing
because of VMware's patent.
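For context on the hashing question: as far as I understand mainline KSM, it finds identical pages by keeping candidates in red-black trees ordered by page content (compared with memcmp), and it only uses a checksum to skip pages that are still changing. The sketch below shows the same comparison-only idea in miniature by sorting pages on their bytes and grouping equal neighbors; the function name is made up, and this is an illustration rather than KSM's actual algorithm.

```python
from itertools import groupby

PAGE_SIZE = 4096


def find_duplicate_pages(pages: list) -> list:
    """Group indices of byte-identical pages using only byte comparisons.

    Pages are ordered by content (like a content-keyed tree), so identical
    pages end up adjacent; no hash of the page contents is computed.
    """
    order = sorted(range(len(pages)), key=lambda i: pages[i])
    groups = []
    for _, run in groupby(order, key=lambda i: pages[i]):
        idxs = list(run)
        if len(idxs) > 1:
            groups.append(sorted(idxs))
    return groups


pages = [bytes(PAGE_SIZE), b"a" * PAGE_SIZE, bytes(PAGE_SIZE),
         b"b" * PAGE_SIZE, b"a" * PAGE_SIZE]
print(find_duplicate_pages(pages))  # [[0, 2], [1, 4]]
```

The trade-off is that each insertion costs O(log n) full page comparisons instead of one hash lookup, which is roughly what an "improved hash function" would be trying to avoid.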

~~~
fleitz
Perhaps they should use message authentication codes instead of hashing :)

------
rlpb
I wonder if hardware support for memory de-duplication is feasible. A
de-duplication controller could use DMA and only lock the bus when other
devices (e.g. the CPU) aren't using it. It could then scan for duplicates with
virtually no overhead and interrupt the OS when it detects a hit.

As a further enhancement, could hardware pick up on memory writes and detect
duplication on the fly? What if a device kept track of a hash of all pages?
The concurrency possibilities in hardware allow for O(1) operation (if there
is enough silicon area available). Partial out-of-order page writes would have
to be queued for hash re-computation, but successful de-duplication is less
likely in these cases anyway.
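The second idea, a hash of every page that is kept current as writes arrive, can be modeled in software to see the data structures involved. This is only a model under assumptions, not a hardware design: the class and method names are hypothetical, BLAKE2b stands in for whatever hash the hardware would compute, tiny 8-byte "pages" keep the example short, and every hash match is confirmed with a full byte comparison before it is reported.

```python
import hashlib
from collections import defaultdict, deque


class DedupTracker:
    """Software model of a device that keeps a hash of every page.

    Hypothetical sketch: writes never re-hash inline; they drop the page's
    stale hash and queue it, and drain() re-hashes queued pages and reports
    duplicates (standing in for the interrupt to the OS).
    """

    def __init__(self, pages):
        self.pages = [bytearray(p) for p in pages]
        self.hash_of = {}                 # page id -> current digest
        self.by_hash = defaultdict(set)   # digest -> ids of pages with it
        self.pending = deque()            # page ids awaiting re-hash
        self.queued = set()               # fast "already queued?" check
        for pid in range(len(self.pages)):
            self._index(pid)

    def _index(self, pid):
        digest = hashlib.blake2b(self.pages[pid], digest_size=16).digest()
        self.hash_of[pid] = digest
        self.by_hash[digest].add(pid)

    def write(self, pid, offset, data):
        """Apply a (possibly partial) write and queue the page for re-hash."""
        page = self.pages[pid]
        if offset < 0 or offset + len(data) > len(page):
            raise ValueError("write crosses the page boundary")
        page[offset:offset + len(data)] = data
        old = self.hash_of.pop(pid, None)
        if old is not None:
            self.by_hash[old].discard(pid)
            if not self.by_hash[old]:
                del self.by_hash[old]
        if pid not in self.queued:
            self.queued.add(pid)
            self.pending.append(pid)

    def drain(self):
        """Re-hash queued pages; return (page, twin) for each duplicate found."""
        hits = []
        while self.pending:
            pid = self.pending.popleft()
            self.queued.discard(pid)
            self._index(pid)
            for other in self.by_hash[self.hash_of[pid]]:
                # Confirm byte-for-byte so a hash collision is never merged.
                if other != pid and self.pages[other] == self.pages[pid]:
                    hits.append((pid, other))
                    break
        return hits


tracker = DedupTracker([bytes(8), b"\x01" * 8, b"\x02" * 8])
tracker.write(1, 0, bytes(4))   # partial write: page 1 is now half zeros
print(tracker.drain())          # []
tracker.write(1, 4, bytes(4))   # second half: page 1 is now all zeros
print(tracker.drain())          # [(1, 0)]
```

Lookups in the digest index are O(1) on average, which mirrors the hope in the comment; the queue is where partial writes wait, so a page written twice before draining is only re-hashed once.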

