LHD: Improving Cache Hit Rate by Maximizing Hit Density [pdf] (cmu.edu)
48 points by jsnell on April 15, 2018 | 11 comments



Cache replacement algorithms are fascinating and have always been one of my favorite research and development hobbies. I really like the LHD concept, but the comparison to other algorithms makes it less informative for practical use cases than it could be. Specifically:

- It primarily compares against relatively weak, naive algorithms like LRU because (inexplicably) that is what many open source systems use. All of the high-performance closed-source systems I've seen use better algorithms that are not represented here. Maybe an availability bias? Also, some great algorithms don't even have names AFAIK, just implementations, which makes them hard to reference.

- The CPU cache efficiency of an algorithm's implementation is often highly relevant to the real-world performance of cache replacement algorithms. This is frequently overlooked in the academic literature, which focuses on hit rate under various workloads. Many algorithms that produce good hit rates have poor throughput in practice due to CPU cache behavior; quite a bit of R&D at database companies focuses on minimizing the CPU cache thrashing these algorithms cause. A slight reduction in hit rate that substantially reduces CPU cache thrashing can be a big win. Cache throughput often comes from surprising places.

- Schedule-based replacement mechanics are ignored. Given some large number of concurrent in-flight operations against the cache, you can adaptively reschedule the order of operations to approximate Bélády's optimal algorithm. For many workloads you can get a step-function improvement in cache throughput from this, and it applies to the aforementioned CPU cache efficiency as well. Designing schedule-optimized cache replacement algorithms is not trivial; the obvious formulation is NP-hard, so the challenge is in finding efficient approximations (a toy sketch of the lookahead part follows below).
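
To make the lookahead part of that last point concrete (not the rescheduling itself): if the cache can see the keys of a window of queued, in-flight requests, eviction can approximate Bélády's algorithm over that window and fall back to LRU for everything else. A toy sketch in Python; LookaheadCache and the pending window are my own names, not anything from the paper or a real system:

    from collections import OrderedDict

    class LookaheadCache:
        """Toy cache that uses a known window of queued requests to
        approximate Belady's algorithm: evict the resident key whose next
        known use is furthest away; fall back to LRU for keys that do not
        appear in the window at all."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.store = OrderedDict()  # key -> value, oldest first

        def _victim(self, pending):
            # Position of each resident key's next occurrence in the window.
            next_use = {}
            for pos, key in enumerate(pending):
                if key in self.store and key not in next_use:
                    next_use[key] = pos
            unreferenced = [k for k in self.store if k not in next_use]
            if unreferenced:
                return unreferenced[0]              # least recently used
            return max(next_use, key=next_use.get)  # furthest future use

        def access(self, key, value, pending=()):
            if key in self.store:
                self.store.move_to_end(key)         # refresh recency on a hit
                return self.store[key]
            if len(self.store) >= self.capacity:
                del self.store[self._victim(pending)]
            self.store[key] = value
            return value

The hard part, actually reordering the in-flight operations so that this lookahead pays off, is where the NP-hard scheduling problem I mentioned shows up; the sketch only exploits a window whose order is already fixed.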

Generally though, I like the formulation of LHD. It would be interesting to see where it sits in the broader landscape of real-world cache replacement implementations. This is one of those areas of database computer science where the closed-source design theory is richer than what makes it into the academic literature, so it is harder to figure out what the state of the art even looks like.


Can you provide examples of closed-source / unnamed algorithms? Most of the advanced algorithms, like ARC, provide only a modest improvement over LRU. LIRS does quite well, but is complex and usually poorly implemented. The newest generation of systems has begun adopting TinyLFU, with research focusing on making it adaptive. I would be interested in understanding the unpublished work (you can reach out privately if desired: ben.manes @ gmail).
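
For context, TinyLFU's key idea is an admission filter in front of the eviction policy: a new entry only displaces the policy's chosen victim if an approximate frequency sketch says the newcomer is used more often. A minimal Python sketch of that idea using a plain count-min sketch; the published design also adds a doorkeeper and periodically halves the counters to age the sketch, which this omits:

    import hashlib

    class CountMinSketch:
        """Small approximate frequency counter (a few hashed counter rows)."""

        def __init__(self, width=1024, depth=4):
            self.width, self.depth = width, depth
            self.rows = [[0] * width for _ in range(depth)]

        def _indexes(self, key):
            digest = hashlib.sha256(str(key).encode()).digest()
            for row in range(self.depth):
                yield row, int.from_bytes(digest[row * 4:row * 4 + 4], "big") % self.width

        def record(self, key):
            for row, idx in self._indexes(key):
                self.rows[row][idx] += 1

        def estimate(self, key):
            return min(self.rows[row][idx] for row, idx in self._indexes(key))

    def tinylfu_admit(sketch, candidate_key, victim_key):
        """Admit the incoming entry only if it looks more frequent than the
        entry the eviction policy would evict to make room for it."""
        return sketch.estimate(candidate_key) > sketch.estimate(victim_key)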


Disclaimer: I’m definitely not an expert in this area and my understanding of the paper may be off.

I wonder if Jeff Dean (and Google) have tried a neural network version of this. The core idea here seems similar to “learned index structures”:

https://research.google.com/pubs/pub46518.html


I don't think this works well for an adaptive workload: training the network can't be done online and is a rather heavy task. It works well when you have a well-defined pattern you can detect and train against.

Kudos to the authors, though; I really like the intuition of this approach!


I think backprop can be made to work online, and the cache gets instant feedback on its predictions, so I see no reason why this couldn't work. In some ways their conditional probability estimates are one form of classifier. It would also be really interesting to see how this idea plays with specialized NN training hardware like the TPU; it would certainly make the computation easier.


One of the HTTP caches has a priority policy that lets you prioritize small files over large ones, under the presumption that for some workloads communication overhead (including setup, teardown, and connection pooling) dominates the cost. Storing ten small results may be less wear and tear on the system than one big one.

That is essentially the same observation this policy is based on, but with gradations instead of a binary decision.
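
To illustrate the gradation with a made-up scoring function (my own toy, not the policy of any particular HTTP cache or LHD's actual metric): rank entries by observed hits per byte per second resident instead of a small-versus-large cutoff, and evict the lowest-ranked entry:

    def graded_score(hit_count, size_bytes, age_seconds):
        """Rank an entry by observed hits per byte of cache consumed per
        second resident; evict the entry with the lowest score. A small,
        moderately hot object can therefore outrank a large, hotter one."""
        return hit_count / (size_bytes * max(age_seconds, 1e-9))

    # A 1 KB object with 5 hits over 60 s outranks a 1 MB object with
    # 200 hits over the same window.
    small = graded_score(hit_count=5, size_bytes=1_024, age_seconds=60)
    large = graded_score(hit_count=200, size_bytes=1_048_576, age_seconds=60)
    assert small > large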


The paper seems to miss the modified-2Q algorithm, which was in memcached as of the cited 1.4.33 (and became the default in 1.5.0). It has very different performance characteristics than straight LRU.

I still need to give it a more thorough read, but I'm spotting other errors, e.g., FB's memcached (according to their paper) balances pages by tail age, not hit rate.

Seems interesting though.


It’s not clear from the paper how the rank is updated. If anyone understands it better, could you explain?

The eviction does random sampling of N entries and is straightforward. But how does it compute the hit density online, per entry?
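
My current reading, which I'd appreciate being corrected on: hit density isn't stored per entry but is computed from shared histograms of how many hits and evictions occur at each (coarsened) age, and the paper also conditions on object classes. For a sampled entry of age a, the estimate is expected future hits divided by size times expected remaining lifetime, both taken from the histogram tails. A much-simplified, class-free Python sketch; the hits_by_age / events_by_age counters are my stand-ins, not the paper's exact estimator:

    import random
    from collections import namedtuple

    # Hypothetical cache-entry record; the real metadata is richer.
    Entry = namedtuple("Entry", ["key", "size", "age"])

    def hit_density(entry, hits_by_age, events_by_age):
        """Estimate hit density for an entry of a given age from shared
        histograms: hits_by_age[a] counts hits observed at age a, and
        events_by_age[a] counts hits plus evictions at age a. The estimate
        is expected future hits divided by (size x expected remaining
        lifetime), conditioned on the entry having reached its current age."""
        a = entry.age
        future_hits = sum(hits_by_age[a:])
        future_lifetime = sum((t - a) * events_by_age[t]
                              for t in range(a, len(events_by_age)))
        if future_lifetime == 0:
            return 0.0
        return future_hits / (entry.size * future_lifetime)

    def evict(entries, hits_by_age, events_by_age, sample_size=64):
        """LHD-style eviction: sample a few resident entries at random and
        evict the one with the lowest estimated hit density."""
        sample = random.sample(entries, min(sample_size, len(entries)))
        return min(sample, key=lambda e: hit_density(e, hits_by_age, events_by_age))

If that reading is right, the tail probability of surviving to age a cancels between the numerator and denominator, which is why raw counts can be used directly in the ratio.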


Could the algorithm be applied to block-cache problems with a single item size, like a filesystem block cache or a database cache?


I was wondering the same thing. I suspect its value would decrease in those cases, since the lower-level representation provides less semantic association, and therefore less information (per object) for modeling hit rate and lifetime. Also, the higher object count would require more resources.

Edit: perhaps in a file system, the higher-level information could still be provided?


Even within database caches, while the page size is often fixed, you are still caching and operating on more abstract multi-page objects of variable size. A good cache replacement algorithm has some implicit awareness of the relative sizes of these objects to optimize the replacement policy.



