Transparent Hugepages: measuring the performance impact (2017) (alexandrnikitin.github.io)
53 points by DyslexicAtheist on March 31, 2019 | 20 comments



The Discourse project uses a tool called thpoff to disable transparent hugepages for its Ruby apps[0], and Ruby 2.6+ disables transparent hugepages for all applications[1].

[0] https://github.com/discourse/discourse_docker/blob/ade283329...

[1] https://bugs.ruby-lang.org/issues/14705
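
For the curious, a thpoff-style wrapper boils down to a single prctl() call before exec'ing the target, since the flag is inherited across fork and preserved across execve. A minimal sketch (assuming Linux >= 3.15 for PR_SET_THP_DISABLE; I haven't verified thpoff's exact source):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_THP_DISABLE
    #define PR_SET_THP_DISABLE 41
    #endif

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return 2;
        }
        /* The flag is inherited across fork() and preserved across execve(),
         * so everything the wrapped program spawns also runs without THP. */
        if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0) {
            perror("prctl(PR_SET_THP_DISABLE)");
            return 1;
        }
        execvp(argv[1], &argv[1]);
        perror("execvp");
        return 1;
    }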


Worth noting that the main reason Rubyists are disabling THP is memory consumption, which is something the author didn't measure.


THP + Redis = disaster. Never do it... Redis will warn you at startup if it detects such a condition. The LATENCY DOCTOR command will also yell at you.
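
The startup check amounts to something like this (a sketch, not Redis's actual code; it assumes the usual sysfs layout, where the active mode is shown in brackets):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[128] = {0};
        FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
        if (!f) {
            perror("fopen");        /* no THP support in this kernel? */
            return 1;
        }
        if (!fgets(buf, sizeof(buf), f))
            buf[0] = '\0';
        fclose(f);
        /* The active mode is bracketed, e.g. "always madvise [never]". */
        if (strstr(buf, "[always]"))
            fprintf(stderr, "WARNING: THP is 'always'; expect fork/COW "
                            "latency spikes and memory blow-up.\n");
        return 0;
    }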


Is that still true with recent kernels? Around Postgres we had huge problems with earlier kernels with THP enabled, but with recent kernels it's gotten to be a pretty small penalty.


Still true, and it always will be, because the incompatibility between Redis and huge pages is at a more fundamental level.


What is the incompatibility?



IOW, a bad persistence design in redis currently interacts with a suboptimal COW-after-fork implementation in linux.


There is nothing wrong with the Linux COW implementation. The problem is that a 2MB page is too much to duplicate for a single-byte change. The 4KB page works very well for that, and it's actually very hard to do better in user space with a different persistence model.

AFAIK the only thing that could be improved in Linux is that when THP is enabled, it should split the 2MB page into 4KB pages and copy just the one that changed.
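
To put numbers on the duplication cost, here's a small demonstration sketch (mine, not from the article; it assumes x86-64's 2MB huge pages and /proc/self/smaps_rollup, i.e. Linux >= 4.14). The roles are swapped relative to Redis (here the child writes; in Redis the parent does), but the COW cost is the same whichever side dirties the page:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define REGION  (256UL * 1024 * 1024)   /* 256MB of data */
    #define HUGE_SZ (2UL * 1024 * 1024)     /* assumed huge page size */

    /* Print this process's Private_Dirty from /proc/self/smaps_rollup. */
    static void print_private_dirty(const char *tag)
    {
        char line[256];
        FILE *f = fopen("/proc/self/smaps_rollup", "r");
        while (f && fgets(line, sizeof(line), f))
            if (strncmp(line, "Private_Dirty:", 14) == 0)
                printf("%s %s", tag, line);
        if (f)
            fclose(f);
    }

    int main(void)
    {
        char *p = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        madvise(p, REGION, MADV_HUGEPAGE);  /* ask for THP backing */
        memset(p, 1, REGION);               /* fault everything in */

        if (fork() == 0) {
            print_private_dirty("before:");
            for (size_t off = 0; off < REGION; off += HUGE_SZ)
                p[off] ^= 1;                /* one-byte write per 2MB */
            print_private_dirty("after: ");
            _exit(0);
        }
        wait(NULL);
        return 0;
    }

On a kernel that COWs huge pages wholesale, "after" is roughly 256MB above "before"; with 4KB COW it's roughly 512KB, a ~500x difference for the same 128 one-byte writes.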


> The problem is that 2MB pages are too much to duplicate for a single byte change.

Splitting such pages into their 4KB counterparts when dirtying a COWed huge page isn't particularly crazy. I don't think there's much reason to do that when huge pages were explicitly requested, but given THP it seems pretty obvious that there's potential for substantial regressions without that feature.

> and actually is very hard to do better in user-space with a different persistence model.

How so? It's not like managing a buffering layer over disks is a particularly new thing. There are plenty of data stores with persistence models that avoid the problems (uneven write rate, explosion in memory usage after fork, latency increases due to the increase in page misses, ...) of fork()ing and then writing out the data in the child.


Here the problem is different: writing to disk a point-in-time snapshot of what is in memory. Redis is not a common data store.


Maybe they could avoid this issue by cloning in smaller increments via memfd + mmap(..., MAP_PRIVATE)? That's assuming persistence doesn't fundamentally require an atomic clone of the whole address space.


Which is, I believe, exactly what FreeBSD does.
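
A rough sketch of the memfd + MAP_PRIVATE idea from the comment above (purely hypothetical, not anything Redis does; memfd_create needs glibc >= 2.27). It also shows why the atomicity caveat matters:

    #define _GNU_SOURCE                     /* for memfd_create() */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define SIZE (64UL * 1024 * 1024)

    int main(void)
    {
        int fd = memfd_create("dataset", 0);
        if (fd < 0 || ftruncate(fd, SIZE) != 0) { perror("memfd"); return 1; }

        /* Live, writable view that the "server" keeps mutating. */
        char *live = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        /* "Snapshot" view for the persistence pass: COW happens lazily,
         * 4KB at a time, instead of fork() duplicating the address space. */
        char *snap = mmap(NULL, SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
        if (live == MAP_FAILED || snap == MAP_FAILED) { perror("mmap"); return 1; }

        memset(live, 'a', SIZE);
        live[0] = 'b';
        /* Pages not yet faulted into the private view still track the file,
         * so snap[0] may well print 'b': the non-atomicity caveat above. */
        printf("live=%c snap=%c\n", live[0], snap[0]);
        return 0;
    }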


Huge pages are especially valuable to a workload where you cannot predict/control the memory access pattern into a large buffer.

Transparent huge pages are great for applications like the JVM example from the article. It's not terribly difficult to localize them to one buffer/pool if that's what's right for your application.
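
For example (a sketch in C; the JVM exposes the same idea via -XX:+UseTransparentHugePages): align the pool to the huge page size and hint just that region:

    #include <stdlib.h>
    #include <sys/mman.h>

    #define HUGE_SZ (2UL * 1024 * 1024)     /* assumed 2MB huge pages */

    /* Allocate a pool that is eligible for THP even when the sysfs control
     * is set to "madvise": align to the huge page size and hint the region. */
    void *alloc_huge_pool(size_t size)
    {
        void *p = NULL;
        size = (size + HUGE_SZ - 1) & ~(HUGE_SZ - 1);   /* round up to 2MB */
        if (posix_memalign(&p, HUGE_SZ, size) != 0)
            return NULL;
        madvise(p, size, MADV_HUGEPAGE);    /* best effort, this region only */
        return p;
    }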

> Do not blindly follow any recommendation on the Internet, please! Measure, measure and measure again!

Amen!


> Huge pages are especially valuable to a workload where you cannot predict/control the memory access pattern into a large buffer.

They're also especially valuable to workloads where you can predict the memory access pattern.


Sure, if they're TLB-bound.


Applications that benefit from huge pages should have/add support for them. Transparent huge pages are a hack.


They have the advantage that they can be used opportunistically. Huge pages require contiguous physical memory regions which may not always be available due to fragmentation. So the kernel converting allocations to huge pages on a best-effort basis gives you the benefit of huge pages without the hard errors and constraints that relying on huge pages reserved at boot time entails.

Also note that THP as opt-in instead of opt-out is already possible: you just have to set the sysfs control to "madvise" instead of "always", and many distros already do this by default.
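
To illustrate the difference (a sketch, assuming 2MB huge pages): explicit MAP_HUGETLB fails hard when the reserved pool is empty, while the THP hint just degrades to 4KB pages:

    #include <stdio.h>
    #include <sys/mman.h>

    #define SZ (32UL * 1024 * 1024)

    int main(void)
    {
        /* Explicit huge pages: needs pages reserved via vm.nr_hugepages;
         * fails hard with ENOMEM when the pool is empty or fragmented. */
        void *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED)
            perror("mmap(MAP_HUGETLB)");    /* no fallback, just an error */

        /* THP: the mapping always succeeds; huge pages are used only if the
         * kernel can assemble them, otherwise you silently get 4KB pages.
         * The hint is honored even when the sysfs control is "madvise". */
        void *q = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (q != MAP_FAILED)
            madvise(q, SZ, MADV_HUGEPAGE);
        return 0;
    }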


The problem I have with THP is that while it initially looks great on our workload (yay! a core saved per server!), it often starts to degrade badly after several days or even many weeks depending on memory fragmentation and pressure.

It keeps getting better, maybe one day..


Just turn it off..



