Transparent Hugepages: measuring the performance impact (2017) (alexandrnikitin.github.io)
53 points by DyslexicAtheist on March 31, 2019 | 20 comments



The Discourse project uses a tool called thpoff to disable transparent hugepages for its Ruby apps[0], and Ruby 2.6+ disables transparent hugepages for all applications[1].

[0] https://github.com/discourse/discourse_docker/blob/ade283329...

[1] https://bugs.ruby-lang.org/issues/14705
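
For the curious, a thpoff-style wrapper boils down to a single prctl() call before exec'ing the target, since the flag is inherited across fork and preserved across execve. A minimal sketch (assuming Linux >= 3.15 for PR_SET_THP_DISABLE; I haven't verified thpoff's exact source):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_THP_DISABLE
    #define PR_SET_THP_DISABLE 41
    #endif

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return 2;
        }
        /* The flag is inherited across fork() and preserved across execve(),
         * so everything the wrapped program spawns also runs without THP. */
        if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0) {
            perror("prctl(PR_SET_THP_DISABLE)");
            return 1;
        }
        execvp(argv[1], &argv[1]);
        perror("execvp");
        return 1;
    }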


Worth noting that the main reason Rubyists are disabling THP is memory consumption, which is something the author didn't measure.


THP + Redis = disaster. Never do it... Redis will warn you at startup if it detects such a condition. The LATENCY DOCTOR command will also yell at you.
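
The startup check amounts to something like this (a sketch, not Redis's actual code; it assumes the usual sysfs layout, where the active mode is shown in brackets):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[128] = {0};
        FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
        if (!f) {
            perror("fopen");        /* no THP support in this kernel? */
            return 1;
        }
        if (!fgets(buf, sizeof(buf), f))
            buf[0] = '\0';
        fclose(f);
        /* The active mode is bracketed, e.g. "always madvise [never]". */
        if (strstr(buf, "[always]"))
            fprintf(stderr, "WARNING: THP is 'always'; expect fork/COW "
                            "latency spikes and memory blow-up.\n");
        return 0;
    }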


Is that still true with recent kernels? Around Postgres we had huge problems with earlier kernels with THP enabled, but with recent kernels it's gotten to be a pretty small penalty.


Still true, and it always will be, because the incompatibility between Redis and huge pages is at a more fundamental level.


What is the incompatibility?



IOW, a bad persistence design in redis currently interacts with a suboptimal COW-after-fork implementation in linux.


There is nothing wrong with the Linux COW implementation. The problem is that a 2MB page is too much to duplicate for a single-byte change. The 4KB page works very well for that, and it's actually very hard to do better in user space with a different persistence model.

AFAIK the only thing that could be improved in Linux is that when THP is enabled, it should split the 2MB page into 4KB pages and copy just the one that changed.
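
To put numbers on the duplication cost, here's a small demonstration sketch (mine, not from the article; it assumes x86-64's 2MB huge pages and /proc/self/smaps_rollup, i.e. Linux >= 4.14). The roles are swapped relative to Redis (here the child writes; in Redis the parent does), but the COW cost is the same whichever side dirties the page:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define REGION  (256UL * 1024 * 1024)   /* 256MB of data */
    #define HUGE_SZ (2UL * 1024 * 1024)     /* assumed huge page size */

    /* Print this process's Private_Dirty from /proc/self/smaps_rollup. */
    static void print_private_dirty(const char *tag)
    {
        char line[256];
        FILE *f = fopen("/proc/self/smaps_rollup", "r");
        while (f && fgets(line, sizeof(line), f))
            if (strncmp(line, "Private_Dirty:", 14) == 0)
                printf("%s %s", tag, line);
        if (f)
            fclose(f);
    }

    int main(void)
    {
        char *p = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        madvise(p, REGION, MADV_HUGEPAGE);  /* ask for THP backing */
        memset(p, 1, REGION);               /* fault everything in */

        if (fork() == 0) {
            print_private_dirty("before:");
            for (size_t off = 0; off < REGION; off += HUGE_SZ)
                p[off] ^= 1;                /* one-byte write per 2MB */
            print_private_dirty("after: ");
            _exit(0);
        }
        wait(NULL);
        return 0;
    }

On a kernel that COWs huge pages wholesale, "after" is roughly 256MB above "before"; with 4KB COW it's roughly 512KB, a ~500x difference for the same 128 one-byte writes.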


> The problem is that 2MB pages are too much to duplicate for a single byte change.

Splitting such pages into their 4KB counterparts when dirtying a COWed huge page isn't particularly crazy. I don't think there's much reason to do that when huge pages were explicitly requested, but given THP it seems pretty obvious that there's potential for substantial regressions without that feature.

> and actually is very hard to do better in user-space with a different persistence model.

How so? It's not like managing a buffering layer over disks is a particularly new thing. There are plenty of data stores with persistence models that avoid the problems (uneven write rate, explosion in memory usage after fork, latency increases due to the increase in page misses, ...) of fork()ing and then writing out the data in the child.


Here the problem is different: writing to disk a point-in-time snapshot of what is in memory. Redis is not a common data store.


Maybe they could avoid this issue by cloning in smaller increments via memfd + mmap(..., MAP_PRIVATE)? That's assuming persistence doesn't fundamentally require an atomic clone of the whole address space.


Which is, I believe, exactly what FreeBSD does.
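
A rough sketch of the memfd + MAP_PRIVATE idea from the comment above (purely hypothetical, not anything Redis does; memfd_create needs glibc >= 2.27). It also shows why the atomicity caveat matters:

    #define _GNU_SOURCE                     /* for memfd_create() */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define SIZE (64UL * 1024 * 1024)

    int main(void)
    {
        int fd = memfd_create("dataset", 0);
        if (fd < 0 || ftruncate(fd, SIZE) != 0) { perror("memfd"); return 1; }

        /* Live, writable view that the "server" keeps mutating. */
        char *live = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        /* "Snapshot" view for the persistence pass: COW happens lazily,
         * 4KB at a time, instead of fork() duplicating the address space. */
        char *snap = mmap(NULL, SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
        if (live == MAP_FAILED || snap == MAP_FAILED) { perror("mmap"); return 1; }

        memset(live, 'a', SIZE);
        live[0] = 'b';
        /* Pages not yet faulted into the private view still track the file,
         * so snap[0] may well print 'b': the non-atomicity caveat above. */
        printf("live=%c snap=%c\n", live[0], snap[0]);
        return 0;
    }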


Huge pages are especially valuable to a workload where you cannot predict/control the memory access pattern into a large buffer.

Transparent huge pages are great for applications like the JVM example from the article. It's not terribly difficult to localize them to one buffer/pool if that's what's right for your application.
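
For example (a sketch in C; the JVM exposes the same idea via -XX:+UseTransparentHugePages): align the pool to the huge page size and hint just that region:

    #include <stdlib.h>
    #include <sys/mman.h>

    #define HUGE_SZ (2UL * 1024 * 1024)     /* assumed 2MB huge pages */

    /* Allocate a pool that is eligible for THP even when the sysfs control
     * is set to "madvise": align to the huge page size and hint the region. */
    void *alloc_huge_pool(size_t size)
    {
        void *p = NULL;
        size = (size + HUGE_SZ - 1) & ~(HUGE_SZ - 1);   /* round up to 2MB */
        if (posix_memalign(&p, HUGE_SZ, size) != 0)
            return NULL;
        madvise(p, size, MADV_HUGEPAGE);    /* best effort, this region only */
        return p;
    }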

> Do not blindly follow any recommendation on the Internet, please! Measure, measure and measure again!

Amen!


> Huge pages are especially valuable to a workload where you cannot predict/control the memory access pattern into a large buffer.

They're also especially valuable to workloads where you can predict the memory access pattern.


Sure, if they're TLB-bound.


Applications that benefit from huge pages should have/add support for them. Transparent huge pages are a hack.


They have the advantage that they can be used opportunistically. Huge pages require contiguous physical memory regions which may not always be available due to fragmentation. So the kernel converting allocations to huge pages on a best-effort basis gives you the benefit of huge pages without the hard errors and constraints that relying on huge pages reserved at boot time entails.

Also note that THP as opt-in instead of opt-out is already possible: you just have to set the sysfs control to "madvise" instead of "always", and many distros already do this by default.
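
To illustrate the difference (a sketch, assuming 2MB huge pages): explicit MAP_HUGETLB fails hard when the reserved pool is empty, while the THP hint just degrades to 4KB pages:

    #include <stdio.h>
    #include <sys/mman.h>

    #define SZ (32UL * 1024 * 1024)

    int main(void)
    {
        /* Explicit huge pages: needs pages reserved via vm.nr_hugepages;
         * fails hard with ENOMEM when the pool is empty or fragmented. */
        void *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED)
            perror("mmap(MAP_HUGETLB)");    /* no fallback, just an error */

        /* THP: the mapping always succeeds; huge pages are used only if the
         * kernel can assemble them, otherwise you silently get 4KB pages.
         * The hint is honored even when the sysfs control is "madvise". */
        void *q = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (q != MAP_FAILED)
            madvise(q, SZ, MADV_HUGEPAGE);
        return 0;
    }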


The problem I have with THP is that while it initially looks great on our workload (yay! a core saved per server!), it often starts to degrade badly after several days or even many weeks depending on memory fragmentation and pressure.

It keeps getting better, maybe one day..


Just turn it off..



