The main issues it was able to solve for Aerospike were critical. It added debugging capabilities that made it possible to distinguish a memory leak from memory fragmentation (the two can look identical to an untrained eye). It substantially reduced fragmentation, allowing better RAM utilization. It also allowed Aerospike to get the most out of multi-CPU, multi-core, multithreaded system performance.
This paper was a good read. I recently did some work using libhugetlbfs to back an application with huge pages, and you might want to consider using an arena backed by huge pages as well. For very large structures, TLB misses can really add up, and huge pages can improve performance by a large margin for workloads where lots of large buffers are being accessed and scanned. If tests suggest this route fits your workload, I would recommend managing the pages with libhugetlbfs directly rather than relying on transparent huge pages.
Thanks for the comments! I'll check out "libhugetlbfs".
On the CentOS 6.4 / Linux 2.6.32-358.11.1 kernel systems described in the paper, Transparent Huge Pages were enabled, and according to all my investigations, THP was working automatically and effectively as expected.
Specifically, correlating the "AnonHugePages" numbers between "/proc/meminfo" (system-wide) and "/proc/<PID>/smaps" (for the Aerospike database daemon) showed that a substantial portion of the process's virtual memory was in huge pages, and that the page table space ("VmPTE" in "/proc/<PID>/status") was not excessive.
Do you have a compelling use-case / metrics for manually controlling huge pages vs. simply relying on THP managed by the kernel?