The Discourse project uses a tool called thpoff to disable transparent hugepages for its Ruby apps[0], and Ruby 2.6+ disables transparent hugepages for all applications[1].
Is that still true with recent kernels? Around Postgres we had huge problems with earlier kernels with THP enabled, but with recent kernels it's become a pretty small penalty.
There is nothing wrong with the Linux COW implementation. The problem is that a 2MB page is too much to duplicate for a single-byte change. The 4KB page works very well for that, and it is actually very hard to do better in user space with a different persistence model.
AFAIK the only thing that could be improved in Linux is that when THP is enabled, it should split the 2MB page into 4KB pages and copy just the one that was touched.
> The problem is that 2MB pages are too much to duplicate for a single byte change.
Splitting up such pages into their 4KB counterparts when dirtying a COWed huge page isn't particularly crazy. I don't think there's much reason to do that when explicit huge pages are used, but with THP it seems pretty obvious that there's potential for substantial regressions without that feature.
> and actually is very hard to do better in user-space with a different persistence model.
How so? It's not like managing a buffering layer over disks is a particularly new thing. There are plenty of data stores whose persistence models don't have the problems (uneven write rates, explosion in memory usage after fork, latency increases due to additional page misses, ...) of fork()ing and then writing out the data in the child.
Maybe they could avoid this issue by cloning in smaller increments via memfd + mmap(..., MAP_PRIVATE)? That's assuming persistence doesn't fundamentally require an atomic clone of the whole address space.
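A minimal sketch of the mechanics behind that idea, not the full persistence scheme: the dataset lives in a memfd, and a private mapping of one chunk gets copy-on-write at 4KB granularity when dirtied, instead of duplicating the whole address space the way a fork() snapshot does. The chunk size and layout here are arbitrary assumptions.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    const size_t chunk = 1UL << 20;     /* 1 MiB persistence chunks (arbitrary) */
    const size_t total = 16 * chunk;

    /* Keep the dataset in an anonymous memfd rather than plain heap memory. */
    int fd = memfd_create("dataset", MFD_CLOEXEC);
    if (fd < 0 || ftruncate(fd, (off_t)total) < 0)
        return 1;

    /* The "live" view the application mutates. */
    char *live = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (live == MAP_FAILED)
        return 1;
    memset(live, 'A', total);

    /* Take a MAP_PRIVATE view of one chunk. Any page dirtied through this
     * view is copied lazily, 4KB at a time. */
    char *view = mmap(NULL, chunk, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (view == MAP_FAILED)
        return 1;

    view[0] = 'X';                                    /* COWs exactly one 4KB page */
    printf("live=%c view=%c\n", live[0], view[0]);    /* live=A view=X */

    /* Caveat: a private page that has NOT been copied yet can still observe
     * later writes made through the live mapping, so a real point-in-time
     * snapshot would need extra coordination on top of this. */
    munmap(view, chunk);
    munmap(live, total);
    close(fd);
    return 0;
}
```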
Huge pages are especially valuable for workloads where you cannot predict or control the memory access pattern into a large buffer.
Transparent huge pages are great for applications like the JVM example from the article. And it's not terribly difficult to limit them to a single buffer/pool if that's what's right for your application.
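For instance, with the system-wide control set to "madvise", a single pool can opt in via madvise(MADV_HUGEPAGE) while the rest of the process keeps ordinary 4KB pages. A minimal sketch; the pool size and the choice to mmap it directly are assumptions:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* One dedicated pool that opts in to THP while the rest of the
     * process stays on 4KB pages (when the sysfs mode is "madvise"). */
    const size_t pool_size = 64UL << 20;        /* 64 MiB, a multiple of 2 MiB */

    void *pool = mmap(NULL, pool_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pool == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Ask the kernel to back this range with transparent huge pages.
     * This is advisory: unaligned or fragmented ranges may still end up
     * with (some) 4KB pages. */
    if (madvise(pool, pool_size, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    /* ... use the pool ... */
    munmap(pool, pool_size);
    return 0;
}
```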
> Do not blindly follow any recommendation on the Internet, please! Measure, measure and measure again!
They have the advantage that they can be used opportunistically. Huge pages require contiguous physical memory regions which may not always be available due to fragmentation. So the kernel converting allocations to huge pages on a best-effort basis gives you the benefit of huge pages without the hard errors and constraints that relying on huge pages reserved at boot time entails.
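For contrast, a minimal sketch of the non-transparent path: explicitly reserved huge pages (MAP_HUGETLB, backed by the vm.nr_hugepages pool) fail outright when the pool is empty or fragmented, which is exactly the hard-error behavior that best-effort THP avoids.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    const size_t len = 2UL << 20;               /* one 2 MiB huge page */

    /* Explicit huge pages come from a pre-reserved pool (sysctl
     * vm.nr_hugepages). If none are available, the mapping fails
     * instead of silently falling back to 4KB pages the way THP does. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        fprintf(stderr, "mmap(MAP_HUGETLB) failed: %s\n", strerror(errno));
        return 1;
    }

    munmap(p, len);
    return 0;
}
```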
Also note that making THP opt-in instead of opt-out is already possible: just set the sysfs control to "madvise" instead of "always". Many distros do this by default already.
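A quick way to see which mode a machine is in is to read the standard sysfs control (the active mode is shown in brackets, e.g. "always [madvise] never"); changing it requires root.

```c
#include <stdio.h>

int main(void)
{
    /* Print the current THP mode; the active one is in brackets. */
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    char mode[128];
    if (fgets(mode, sizeof mode, f))
        fputs(mode, stdout);

    fclose(f);
    return 0;
}
```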
The problem I have with THP is that while it initially looks great on our workload (yay! a core saved per server!), it often starts to degrade badly after several days or even many weeks depending on memory fragmentation and pressure.
[0] https://github.com/discourse/discourse_docker/blob/ade283329...
[1] https://bugs.ruby-lang.org/issues/14705