
Transparent Hugepages: measuring the performance impact (2017) - DyslexicAtheist
https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/
======
kawsper
The Discourse project uses a tool called thpoff to disable transparent
hugepages for their Ruby apps[0], and Ruby 2.6+ disables transparent
hugepages for all applications[1].

[0] https://github.com/discourse/discourse_docker/blob/ade283329cae98177ef2578a801c557eac36e81f/image/base/thpoff.c

[1] https://bugs.ruby-lang.org/issues/14705
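
The whole trick in a wrapper like thpoff fits in a few lines. Here's a
minimal sketch (not the actual thpoff source; it assumes the
prctl(PR_SET_THP_DISABLE) approach, available since Linux 3.15, where the
setting is inherited across fork and preserved across exec):

    /* thpoff-style wrapper sketch: disable THP for this process,
     * then exec the real program. */
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return 1;
        }
        /* PR_SET_THP_DISABLE (Linux 3.15+) turns THP off for this
         * process; children inherit it across fork and exec. */
        if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0) {
            perror("prctl(PR_SET_THP_DISABLE)");
            return 1;
        }
        execvp(argv[1], &argv[1]);
        perror("execvp");  /* reached only if exec fails */
        return 1;
    }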

~~~
nateberkopec
Worth noting that the main reason Rubyists are disabling THP is memory
consumption, which is something the author didn't measure.

------
antirez
THP + Redis = disaster. Never do it... Redis will warn you at startup if it
detects such a condition. Also, the LATENCY DOCTOR command will yell at you.

~~~
anarazel
Is that still true with recent kernels? Around postgres we had huge problems
with earlier kernels with THP enabled, but with recent kernels it's gotten to
be a pretty small penalty.

~~~
antirez
Still true, and it always will be, because the incompatibility between Redis
and huge pages is at a more fundamental level.

~~~
justincormack
What is the incompatibility?

~~~
hashhar
https://redis.io/topics/latency#latency-induced-by-transparent-huge-pages

~~~
anarazel
IOW, a bad persistence design in redis currently interacts with a suboptimal
COW-after-fork implementation in linux.

~~~
antirez
There is nothing wrong with the Linux COW implementation. The problem is that
2MB pages are too much to duplicate for a single-byte change. The 4kb page
works very well for that, and it is actually very hard to do better in
user-space with a different persistence model.

AFAIK the only thing that could be improved in Linux is that when THP is
enabled, it should split the 2MB page into 4kb pages and copy just the one
that changed.
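
To make the amplification concrete: dirty one byte in each 2MB region of a
1GB heap after a BGSAVE-style fork, and 4kb pages cost you 512 x 4kb = 2MB
of copying, while 2MB huge pages cost 512 x 2MB = the entire 1GB. A rough
sketch to observe it (assumes Linux; sizes are illustrative, and you watch
the parent's RSS in /proc/<pid>/status while it runs):

    /* COW-after-fork amplification demo: dirty one byte per 2 MiB
     * region and watch the parent's RSS grow. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define REGION (1UL << 30)   /* 1 GiB heap, illustrative */
    #define HUGE   (2UL << 20)   /* 2 MiB, the THP size on x86-64 */

    int main(void) {
        char *p = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        memset(p, 1, REGION);            /* fault everything in */

        pid_t pid = fork();              /* the "snapshot" child */
        if (pid == 0) { sleep(30); _exit(0); }

        /* One byte per 2 MiB: with 4 KiB pages the kernel copies
         * ~2 MiB total; with THP it copies ~1 GiB. */
        for (size_t off = 0; off < REGION; off += HUGE)
            p[off] = 2;

        printf("dirtied %lu regions, check RSS now\n", REGION / HUGE);
        sleep(30);
        return 0;
    }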

~~~
anarazel
> The problem is that 2MB pages are too much to duplicate for a single byte
> change.

Splitting such pages into their 4kb counterparts when dirtying a COWed huge
page isn't particularly crazy. I don't think there's much reason to do that
when explicitly requested huge pages are used, but given THP it seems pretty
obvious that there's potential for substantial regressions without that
feature.

> and actually is very hard to do better in user-space with a different
> persistence model.

How so? It's not like management of a buffering layer over disks is a
particularly new thing. There's plenty of different data stores that have
persistence models that don't have the problems (uneven write rate, explosion
in memory usage after fork, latency increases due to the increase in page
misses, ...) of fork() and then writing out the data in the fork.

~~~
antirez
Here the problem is different: write to disk a point-in-time snapshot of what
is in memory. Redis is not a common data store.

------
wyldfire
Huge pages are especially valuable to a workload where you cannot
predict/control the memory access pattern into a large buffer.

Transparent huge pages are great for applications like the JVM example from
the article. It's not _terribly_ difficult to localize it for one buffer/pool
if that's what's right for your application.
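
For example, something along these lines opts a single pool into huge pages
with madvise(MADV_HUGEPAGE) and leaves the rest of the process on 4kb pages
(a sketch; the call is a hint, not a guarantee, and needs the THP sysfs mode
set to "madvise" or "always"):

    /* Opt a single buffer/pool into THP; the rest of the process
     * keeps ordinary 4 KiB pages. */
    #include <stdio.h>
    #include <sys/mman.h>

    #define POOL_LEN (64UL << 20)   /* 64 MiB, illustrative */

    int main(void) {
        void *pool = mmap(NULL, POOL_LEN, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (pool == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hint: back this region with huge pages where possible.
         * Only 2 MiB-aligned stretches of the mapping benefit. */
        if (madvise(pool, POOL_LEN, MADV_HUGEPAGE) != 0)
            perror("madvise(MADV_HUGEPAGE)");

        /* ... use pool as the application's hot buffer ... */
        munmap(pool, POOL_LEN);
        return 0;
    }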

> Do not blindly follow any recommendation on the Internet, please! Measure,
> measure and measure again!

Amen!

~~~
coherentpony
> Huge pages are especially valuable to a workload where you cannot
> predict/control the memory access pattern into a large buffer.

They're also especially valuable to workloads where you can predict the memory
access pattern.

~~~
wyldfire
Sure, if they're TLB-bound.

------
karmakaze
Applications that benefit from huge pages should have/add support for them.
Transparent huge pages are a hack.
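
Explicit support isn't much code either; a sketch using MAP_HUGETLB, which
allocates from the pool reserved via vm.nr_hugepages and fails cleanly when
none are available:

    /* Explicit (non-transparent) huge pages: request 2 MiB pages
     * directly instead of relying on THP. */
    #include <stdio.h>
    #include <sys/mman.h>

    #define LEN (32UL << 20)   /* 32 MiB; must be a multiple of 2 MiB */

    int main(void) {
        void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            /* e.g. ENOMEM when no huge pages are reserved */
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }
        /* ... use p, knowing it is huge-page backed ... */
        munmap(p, LEN);
        return 0;
    }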

~~~
the8472
They have the advantage that they can be used opportunistically. Huge pages
require contiguous physical memory regions, which may not always be available
due to fragmentation. So the kernel converting allocations to huge pages on a
best-effort basis gives you the benefit of huge pages without the hard errors
and constraints that relying on huge pages reserved at boot time entails.

Also note that making THP opt-in instead of opt-out is already possible: you
just have to set the sysfs control to "madvise" instead of "always", and many
distros already do this by default.
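
Checking which mode a box is in is trivial; a sketch that just reads the
sysfs control mentioned above (the bracketed word is the active mode):

    /* Print the system-wide THP mode, e.g. "always [madvise] never". */
    #include <stdio.h>

    int main(void) {
        char buf[128];
        FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
        if (!f) { perror("fopen"); return 1; }
        if (fgets(buf, sizeof buf, f))
            fputs(buf, stdout);   /* brackets mark the active mode */
        fclose(f);
        return 0;
    }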

------
atomt
The problem I have with THP is that while it initially looks great on our
workload (yay! a core saved per server!), it often starts to degrade badly
after several days or even weeks, depending on memory fragmentation and
pressure.

It keeps getting better, maybe one day...

------
dboreham
Just turn it off...

