Cache replacement algorithms are fascinating and have always been one of my favorite research and development hobbies. I really like the LHD concept but the comparison to other algorithms makes it less informative for practical use cases than it could be. Specifically:
- It primarily compares against relatively weak, naive algorithms like LRU because (inexplicably) that is what many open source systems use. All of the high-performance closed source systems I've seen use better algorithms that are not represented here. Maybe an availability bias? Also, some great algorithms don't even have names AFAIK, just implementations, which doesn't lend them to easy reference.
- CPU cache efficiency of algorithm implementations is often highly relevant to real-world performance of cache replacement algorithms. This is frequently overlooked in the academic literature, which focuses on hit rate under various workloads. Many algorithms that produce good hit rates have poor throughput in practice due to CPU cache behavior; quite a bit of R&D at database companies focuses on minimizing CPU cache thrashing from these algorithms. A slight reduction in hit rate that substantially reduces CPU cache thrashing can be a big win. Cache throughput often comes from surprising places. (A small sketch of the memory-layout point follows after this list.)
- Schedule-based replacement mechanics are ignored. Given some large number of concurrent in-flight operations against the cache, you can adaptively reschedule the order of operations to approximate Bélády's optimal algorithm. For many workloads you can get a step-function improvement in cache throughput from this, and it applies to the aforementioned CPU cache efficiency as well. Designing schedule-optimized cache replacement algorithms is not trivial; the obvious formulation is NP-hard, so the challenge is in finding efficient approximations. (A naive greedy sketch of the lookahead part of this idea appears at the end of this comment.)
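To illustrate the CPU cache efficiency point: a CLOCK-style policy keeps its eviction metadata in one contiguous array, so both hits and eviction sweeps touch a handful of CPU cache lines instead of pointer-chasing a heap-allocated LRU list. This is only a minimal sketch of the memory-layout idea, not any particular production design.

```cpp
// Minimal CLOCK-style policy sketch: eviction metadata lives in one
// contiguous array of bytes, so a sweep touches a few CPU cache lines
// instead of chasing LRU list pointers scattered across the heap.
#include <cstddef>
#include <cstdint>
#include <vector>

class ClockPolicy {
public:
    explicit ClockPolicy(std::size_t slots) : ref_(slots, 0), hand_(0) {}

    // Record a hit on a slot: a single byte write, no list splicing.
    void touch(std::size_t slot) { ref_[slot] = 1; }

    // Pick a victim: sweep the array, clearing reference bits until an
    // unreferenced slot is found. Sequential access is prefetch-friendly.
    std::size_t evict() {
        for (;;) {
            if (ref_[hand_] == 0) {
                std::size_t victim = hand_;
                hand_ = (hand_ + 1) % ref_.size();
                return victim;
            }
            ref_[hand_] = 0;
            hand_ = (hand_ + 1) % ref_.size();
        }
    }

private:
    std::vector<uint8_t> ref_;  // one reference bit per slot, packed together
    std::size_t hand_;          // current clock hand position
};
```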
Generally though, I like the formulation of LHD. It would be interesting to see how it sits in the broader scope of real-world cache replacement algorithm implementations. This is one of those areas in database computer science where the closed source design theory is richer than what makes it into the academic literature, so it is more difficult to figure out what the state-of-the-art even looks like.
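Regarding the scheduling point above, here is a naive greedy version of the lookahead half of the idea: when an eviction is needed, use the window of pending in-flight requests to pick the resident key whose next use is farthest away (or absent), i.e. Bélády restricted to the visible window. All names are illustrative, and the genuinely hard part, actually reordering the operations themselves, is not shown.

```cpp
// Greedy lookahead sketch: evict the resident key that is reused latest
// within the pending window (or not at all). This only approximates
// Bélády over the visible window; it does not reschedule operations.
#include <cstddef>
#include <deque>
#include <limits>
#include <string>
#include <unordered_map>
#include <unordered_set>

std::string pick_victim(const std::unordered_set<std::string>& resident,
                        const std::deque<std::string>& pending) {
    // First position at which each key is used again within the window.
    std::unordered_map<std::string, std::size_t> next_use;
    for (std::size_t i = 0; i < pending.size(); ++i)
        next_use.emplace(pending[i], i);  // keeps the earliest position

    // Assumes `resident` is non-empty when an eviction is required.
    std::string victim;
    std::size_t farthest = 0;
    bool found = false;
    for (const auto& key : resident) {
        auto it = next_use.find(key);
        std::size_t pos = (it == next_use.end())
                              ? std::numeric_limits<std::size_t>::max()
                              : it->second;
        if (!found || pos > farthest) {
            victim = key;
            farthest = pos;
            found = true;
        }
    }
    return victim;
}
```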
Can you provide examples of closed-source / unnamed algorithms? Most of the advanced algorithms, like ARC, provide only a modest improvement over LRU. LIRS does quite well, but is complex and usually poorly implemented. The newest generation of systems has begun adopting TinyLFU, with research focusing on making it adaptive. I would be interested in understanding the unpublished work (you can reach out privately if desired: ben.manes @ gmail).
I don't think this works well in an adaptive workload/case: training the network can't be done online and is a rather heavy task. As such, it works well when you have a well-defined pattern you can detect and train against.
Kudos to the authors, though; I really like the intuition of this approach!
I think backprop can be made to work online, and the cache gets instant feedback on its predictions, so I see no reason why this couldn't be made to work. In some ways their conditional probability estimates are one form of classifier. It would also be really interesting to see how this idea plays with specialized NN training hardware like the TPU; it would certainly make the computation easier.
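To make the "instant feedback" point concrete, here is a toy sketch of an online predictor: a logistic model over a few per-object features that takes one gradient step per access as soon as the cache learns whether the object was actually reused. The features and constants here are made up for illustration and are not from the paper.

```cpp
// Toy online reuse predictor: logistic regression over hand-picked
// per-object features, updated with one SGD step per observed outcome.
#include <array>
#include <cmath>
#include <cstddef>

struct OnlinePredictor {
    std::array<double, 3> w{};  // weights for [age, size, past_hits] (illustrative features)
    double lr = 0.01;           // learning rate

    // Probability that the object will be reused before eviction.
    double predict(const std::array<double, 3>& x) const {
        double z = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) z += w[i] * x[i];
        return 1.0 / (1.0 + std::exp(-z));
    }

    // One gradient step as soon as the outcome (reused or not) is known.
    void update(const std::array<double, 3>& x, bool reused) {
        double err = predict(x) - (reused ? 1.0 : 0.0);
        for (std::size_t i = 0; i < x.size(); ++i) w[i] -= lr * err * x[i];
    }
};
```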
One of the HTTP caches has a priority policy where you can prioritize small files over large ones, under the presumption that for some workloads communication overhead (including setup/teardown and connection pooling) dominates the cost. Storing ten small results may be less wear and tear on the system than one big one.
Which is essentially the same observation this policy is based on, but with gradations instead of a binary decision.
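One common way to express that gradation is a GreedyDual-Size style priority: rank objects by estimated miss cost divided by size, plus an aging term, so small or expensive-to-refetch objects float to the top. This is a sketch of the general technique, not any specific HTTP cache's policy, and the flat per-miss cost model is only an assumption.

```cpp
// GreedyDual-Size style priority sketch: priority = aging + cost / size.
// Evict the lowest-priority object; on eviction, raise the aging floor so
// long-idle objects eventually lose out even if they were once cheap to keep.
struct SizeAwarePriority {
    double aging = 0.0;  // raised to the evicted object's priority on each eviction

    double priority(double miss_cost, double size_bytes) const {
        return aging + miss_cost / size_bytes;  // small/expensive objects rank higher
    }

    void on_evict(double evicted_priority) { aging = evicted_priority; }
};
```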
The paper seems to miss the modified-2Q algorithm, which was in memcached as of the cited 1.4.33 (it became the default in 1.5.0). It has very different performance characteristics than straight LRU.
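For readers unfamiliar with it, here is a rough sketch of the classic 2Q idea (not memcached's modified variant, whose details differ, and with the original paper's ghost queue omitted): new keys sit in a small probationary FIFO, and only keys touched a second time get promoted into the main LRU, which keeps one-shot scans from flushing the hot set the way plain LRU does.

```cpp
// Simplified 2Q sketch: A1 is a probationary FIFO for first-touch keys,
// Am is the protected LRU for keys that have been touched at least twice.
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

class TwoQ {
public:
    TwoQ(std::size_t a1_cap, std::size_t am_cap)
        : a1_cap_(a1_cap), am_cap_(am_cap) {}

    void access(const std::string& key) {
        if (am_pos_.count(key)) {            // hit in main queue: move to front
            am_.erase(am_pos_[key]);
            am_.push_front(key);
            am_pos_[key] = am_.begin();
        } else if (a1_pos_.count(key)) {     // second touch: promote to main queue
            a1_.erase(a1_pos_[key]);
            a1_pos_.erase(key);
            am_.push_front(key);
            am_pos_[key] = am_.begin();
            if (am_.size() > am_cap_) { am_pos_.erase(am_.back()); am_.pop_back(); }
        } else {                             // first touch: probationary FIFO
            a1_.push_front(key);
            a1_pos_[key] = a1_.begin();
            if (a1_.size() > a1_cap_) { a1_pos_.erase(a1_.back()); a1_.pop_back(); }
        }
    }

private:
    std::size_t a1_cap_, am_cap_;
    std::list<std::string> a1_, am_;
    std::unordered_map<std::string, std::list<std::string>::iterator> a1_pos_, am_pos_;
};
```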
I still need to give it a more thorough read, but I'm spotting other errors, e.g., FB memcached (according to their paper) balances pages by tail age, not hit rate.
I was wondering the same thing. I suspect its value would decrease in those cases, due to the lower-level representation providing less semantic association, and therefore less information (per object) for modeling hit rate and lifetime. Also, the higher object count would require more resources.
Edit: perhaps in a file system, the higher-level information could still be provided?
Even within database caches, while the page size is often fixed, you are still caching and operating on more abstract multi-page objects of variable size. A good cache replacement algorithm has some implicit awareness of the relative sizes of these objects to optimize the replacement policy.
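As a sketch of what that implicit size awareness can look like for variable-size, multi-page objects: rank eviction candidates by observed hits per byte, so a large object has to earn proportionally more hits to stay resident. The estimator here is deliberately crude; LHD's actual hit-density estimate is considerably more sophisticated.

```cpp
// Crude hits-per-byte eviction sketch: evict the object with the lowest
// observed hit density. Assumes `objs` is non-empty.
#include <cstddef>
#include <vector>

struct CachedObject {
    std::size_t size_bytes;
    double observed_hits;
};

std::size_t pick_eviction_index(const std::vector<CachedObject>& objs) {
    std::size_t victim = 0;
    double worst = objs[0].observed_hits / objs[0].size_bytes;
    for (std::size_t i = 1; i < objs.size(); ++i) {
        double density = objs[i].observed_hits / objs[i].size_bytes;
        if (density < worst) { worst = density; victim = i; }
    }
    return victim;
}
```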