
LHD: Improving Cache Hit Rate by Maximizing Hit Density [pdf] - jsnell
http://www.cs.cmu.edu/~beckmann/publications/papers/2018.nsdi.lhd.pdf
======
jandrewrogers
Cache replacement algorithms are fascinating and have always been one of my
favorite research and development hobbies. I really like the LHD concept but
the _comparison_ to other algorithms makes it less informative for practical
use cases than it could be. Specifically:

- It primarily compares against relatively weak, naive algorithms like LRU
because (inexplicably) that is what many open source systems use. All of the
high-performance closed-source systems I've seen use better algorithms that
are not represented here. Maybe an availability bias? Also, some great
algorithms don't even have names AFAIK, just implementations, which makes
them difficult to reference.

- CPU cache efficiency of algorithm implementations is often highly relevant
to the real-world performance of cache replacement algorithms. This is
frequently overlooked in the academic literature, which focuses on hit rate
under various workloads. Many algorithms that produce good hit rates have poor
throughput in practice due to CPU cache behavior; quite a bit of R&D at
database companies focuses on minimizing CPU cache thrashing from these
algorithms. A slight reduction in hit rate that substantially reduces CPU
cache thrashing can be a big win. Cache throughput often comes from surprising
places (see the CLOCK sketch after this list).

- Schedule-based replacement mechanics are ignored. Given some large number
of concurrent in-flight operations against the cache, you can adaptively
reschedule the order of operations to approximate Bélády's optimal algorithm.
For many workloads you can get a step-function improvement in cache throughput
from this, and it applies to the aforementioned CPU cache efficiency as well.
Designing schedule-optimized cache replacement algorithms is not trivial; the
obvious formulation is NP-hard, so the challenge is in finding efficient
approximations (a toy reordering sketch follows below).
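
To make the CPU cache point concrete, here is a minimal sketch (my own
illustration, not from the paper) of why an array-based policy like CLOCK
tends to be friendlier to the CPU cache than a textbook doubly-linked-list
LRU: hits set one bit in a flat array, and victim selection walks contiguous
memory instead of chasing scattered list nodes.

    # Sketch: array-based CLOCK (second chance), illustrative only.
    # Metadata lives in two contiguous arrays, so the eviction scan is a
    # sequential walk; a linked-list LRU dereferences scattered nodes on
    # every hit, which thrashes the CPU cache.
    class Clock:
        def __init__(self, capacity):
            self.capacity = capacity
            self.keys = [None] * capacity   # contiguous slot array
            self.ref = [0] * capacity       # reference bits, also contiguous
            self.index = {}                 # key -> slot
            self.hand = 0

        def access(self, key):
            slot = self.index.get(key)
            if slot is not None:            # hit: set one bit, no list surgery
                self.ref[slot] = 1
                return True
            while True:                     # miss: sequential scan for a victim
                if self.keys[self.hand] is None or self.ref[self.hand] == 0:
                    victim = self.keys[self.hand]
                    if victim is not None:
                        del self.index[victim]
                    self.keys[self.hand] = key
                    self.ref[self.hand] = 1
                    self.index[key] = self.hand
                    self.hand = (self.hand + 1) % self.capacity
                    return False
                self.ref[self.hand] = 0     # second chance: clear and move on
                self.hand = (self.hand + 1) % self.capacity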
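
And a toy illustration of the scheduling effect, again my own sketch rather
than any of the closed-source designs: with a cache of size 1 and in-flight
requests [A, B, A], serving them in arrival order gets zero hits, while
greedily clustering requests for the same key gets one. Greedy key-clustering
is a crude stand-in for the efficient approximations I mean.

    # Toy sketch of schedule-based replacement: reorder in-flight requests so
    # re-references to the same key land close together, approximating what
    # Belady's algorithm would achieve on the original order.
    from collections import OrderedDict

    def lru_hits(requests, capacity):
        cache, hits = OrderedDict(), 0
        for key in requests:
            if key in cache:
                cache.move_to_end(key)
                hits += 1
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)   # evict LRU
                cache[key] = True
        return hits

    def cluster_by_key(requests):
        # Greedy reorder: keep first-arrival order of keys, but serve all
        # in-flight requests for the same key back to back.
        order, counts = [], {}
        for key in requests:
            if key not in counts:
                order.append(key)
                counts[key] = 0
            counts[key] += 1
        return [k for k in order for _ in range(counts[k])]

    inflight = ["A", "B", "A"]
    print(lru_hits(inflight, capacity=1))                  # 0 hits
    print(lru_hits(cluster_by_key(inflight), capacity=1))  # 1 hit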

Generally, though, I like the formulation of LHD. It would be interesting to
see how it sits in the broader scope of real-world cache replacement algorithm
implementations. This is one of those areas of database computer science where
the closed-source design theory is richer than what makes it into the academic
literature, so it is more difficult to figure out what the state of the art
even looks like.

~~~
NovaX
Can you provide examples of closed-source / unnamed algorithms? Most of the
advanced algorithms, like ARC, provide only a modest improvement over LRU.
LIRS does quite well, but it is complex and usually poorly implemented. The
newest generation of systems has begun adopting TinyLFU, with research
focusing on making it adaptive. I would be interested in understanding the
unpublished work (you can reach out privately if desired: ben.manes @ gmail).
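
For context, the core of TinyLFU is an admission filter rather than an
eviction policy: on a miss, the incoming key is admitted only if its
approximate frequency, kept in a compact sketch such as count-min and
periodically halved to age out stale history, beats the eviction victim's.
A rough sketch (parameters and names are illustrative, not any particular
implementation):

    # Rough sketch of TinyLFU-style admission: a count-min sketch estimates
    # frequencies, and a new key only displaces the eviction candidate if it
    # is historically more popular. Periodic halving ("reset") ages counts.
    import random

    class CountMinSketch:
        def __init__(self, width=1024, depth=4):
            self.width = width
            self.table = [[0] * width for _ in range(depth)]
            self.seeds = [random.randrange(1 << 31) for _ in range(depth)]

        def _buckets(self, key):
            for row, seed in enumerate(self.seeds):
                yield row, hash((seed, key)) % self.width

        def add(self, key):
            for row, col in self._buckets(key):
                self.table[row][col] += 1

        def estimate(self, key):
            return min(self.table[row][col] for row, col in self._buckets(key))

        def halve(self):  # aging: keeps the sketch adaptive to workload shifts
            for row in self.table:
                for i in range(len(row)):
                    row[i] >>= 1

    def admit(sketch, candidate, victim):
        # Keep whichever key has the higher frequency estimate; reject the
        # newcomer on a tie to resist cache pollution by one-hit wonders.
        return sketch.estimate(candidate) > sketch.estimate(victim)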

------
tadkar
Disclaimer: I’m definitely not an expert in this area and my understanding of
the paper may be off.

I wonder if Jeff Dean (and Google) have tried a neural network version of
this. The core idea here seems similar to “learned index structures”:

[https://research.google.com/pubs/pub46518.html](https://research.google.com/pubs/pub46518.html)

~~~
Ninn
I dont think this works well as an adaptive workload/case, training the
network cant be done online and is a rather heavy task, as such it works well
when you have a well defined pattern you can detect/train against.

Although kudos to the authors, i really like the intuition of this approach!

~~~
tadkar
I think backprop can be made to work online, and the cache gets instant
feedback on its predictions, so I see no reason why this couldn’t be made to
work. In some ways their conditional probability estimates are a form of
classifier. It would also be really interesting to see how this idea plays
with specialized NN training hardware like the TPU, which would certainly make
the computation easier.
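
As a concrete stand-in for what I mean by online feedback (my own toy with
invented features, not anything from the paper): every hit or unused eviction
is a labeled example of whether keeping the object paid off, so even a simple
logistic model can update the moment the outcome is known.

    # Sketch of the online-feedback idea: a logistic model scores each
    # object's chance of being reused before eviction, updated immediately
    # when the outcome is observed (hit -> label 1, evicted unused -> 0).
    # Features and learning rate are invented for illustration.
    import math

    class OnlineReusePredictor:
        def __init__(self, n_features, lr=0.01):
            self.w = [0.0] * n_features
            self.lr = lr

        def score(self, x):  # predicted probability of reuse
            z = sum(wi * xi for wi, xi in zip(self.w, x))
            return 1.0 / (1.0 + math.exp(-z))

        def update(self, x, label):  # one SGD step on the logistic loss
            err = self.score(x) - label
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]

    # Hypothetical features per object: (age, log size, recent hit count).
    model = OnlineReusePredictor(n_features=3)
    model.update((0.2, 3.1, 5.0), label=1)   # object was hit again: reinforce
    model.update((0.9, 7.4, 0.0), label=0)   # evicted unused: penalize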

------
hinkley
One of the HTTP caches has a priority policy where you can prioritize small
files over large ones, under the presumption that for some workloads
communication overhead (including setup, teardown, and connection pooling)
dominates the cost. Storing ten small results may be less wear and tear on the
system than one big one.

That is essentially the same observation this policy is based on, but with
gradations instead of a binary decision.
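
In policy terms that is roughly the GreedyDual-Size family: score each object
by its miss cost divided by its size, so that many cheap-to-keep small objects
can collectively outrank one large one. A one-function sketch with an invented
cost model:

    # Sketch of a size-aware priority in the GreedyDual-Size family: objects
    # are ranked by (miss cost / size). The per-request overhead constant is
    # invented; it stands in for connection setup/teardown costs.
    PER_REQUEST_OVERHEAD = 1.0  # fixed cost dominating small fetches

    def priority(size_bytes, transfer_cost):
        # Higher is more worth keeping; small objects win when the fixed
        # connection overhead dwarfs the transfer cost.
        miss_cost = PER_REQUEST_OVERHEAD + transfer_cost
        return miss_cost / size_bytes

    # Ten 1 KB objects vs. one 10 KB object, same total bytes:
    print(10 * priority(1024, 0.1))   # aggregate value of the small objects
    print(priority(10240, 1.0))       # value of the single large object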

------
dormando
The paper seems to miss the modified-2Q algorithm, which was in memcached as
of the cited 1.4.33 (it became the default in 1.5.0). It has very different
performance characteristics than straight LRU.

I still need to give it a more thorough read, but I'm spotting other errors,
e.g., FB memc (according to their paper) balances pages by tail age, not hit
rate.

Seems interesting, though.

------
skyde
It’s not clear from the paper how the rank is updated. If anyone understands
it better, could you explain?

The eviction does random sampling of N entries and is straightforward. But
how does it compute the hit density online, per entry?
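
For what it’s worth, here is my mental model of the sampling side, with names
I made up; the part I can’t pin down is how the hit-density table itself is
refreshed online:

    # My reading of the eviction step: sample up to N random entries and
    # evict the one whose estimated hit density (looked up by the entry's
    # age) is lowest. As I read the paper, hit density comes from the age
    # distributions of hits and evictions; how that table is rebuilt online
    # is exactly my question. All names here are my own guesses.
    import random
    from collections import namedtuple

    Entry = namedtuple("Entry", ["value", "last_access_time"])

    def evict_one(cache, hit_density_at_age, now, sample_size=64):
        candidates = random.sample(list(cache.items()),
                                   min(sample_size, len(cache)))
        victim_key, _ = min(
            candidates,
            key=lambda kv: hit_density_at_age(now - kv[1].last_access_time))
        del cache[victim_key]
        return victim_key

    cache = {"a": Entry(1, 90.0), "b": Entry(2, 10.0)}
    # With a density estimate that decays with age, the older entry goes:
    print(evict_one(cache, lambda age: 1.0 / (1.0 + age), now=100.0))  # "b"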

------
yjh0502
Could the algorithm be applied to the block cache problem with a single item
size, like a filesystem block cache or a database cache?

~~~
smaddox
I was wondering the same thing. I suspect its value would decrease in those
cases, due to the lower-level representation providing less semantic
association, and therefore less information (per object) for modeling hit
rate and lifetime. Also, the higher object count would require more
resources.

Edit: perhaps in a file system, the higher-level information could still be
provided?

