

Cache Eviction: When Are Randomized Algorithms Better Than LRU? - luu
http://danluu.com/2choices-eviction

======
nkurz
Nice article, and wonderful graphs!

I'm unfamiliar with DineroIV, but it might be interesting to see how the
companion line prefetcher affects your results for L3. On most current
systems[1], each request to memory fetches two 64B cachelines --- the line
requested, and the adjacent line in the same 128B block.

The usefulness of this obviously depends on the workload, and I don't know how
the SPEC CPU tests would be affected. But I suspect it might improve the
relative performance of the LRU strategy on those where the second cacheline
isn't ever used. Maybe the same would be true for the other prefetchers?

[1] Coincidentally, Intel just recently documented the appropriate MSRs for
turning the various hardware prefetchers on and off at runtime:
https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors
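
The pairing rule is simple enough to sketch. Here's a toy model (my own
illustration, not from the article) of which line addresses one memory
request brings in when the adjacent-line prefetcher is on, assuming 64B
lines paired within an aligned 128B block:

```python
LINE = 64    # cacheline size in bytes
# Adjacent-line prefetch pairs the two 64B lines of an aligned 128B block.

def lines_fetched(addr):
    """Return the two 64B line addresses fetched for one request:
    the requested line plus its buddy in the same 128B block."""
    line = addr & ~(LINE - 1)   # align down to the 64B line
    buddy = line ^ LINE         # flip bit 6 to get the other line in the block
    return (line, buddy)
```

So a demand miss on either half of a 128B block pulls in both halves,
which is why a workload that never touches the second line effectively
wastes half its fill bandwidth and cache capacity.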

------
qwerta
In my experience, maintaining an LRU queue has significant overhead. Random
(hash-based) displacement has almost zero overhead.
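
A rough sketch of the difference (my own illustration, with made-up class
names): LRU must update its recency ordering on every hit, while random
displacement does no bookkeeping at all until an eviction is needed.

```python
import random
from collections import OrderedDict

class LRUCache:
    """Fully-associative LRU: every hit pays for a recency update."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # bookkeeping on every hit
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[key] = None
        return False

class RandomCache:
    """Random displacement: hits touch no metadata."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = set()

    def access(self, key):
        if key in self.entries:
            return True                        # no per-hit work at all
        if len(self.entries) >= self.capacity:
            self.entries.discard(random.choice(tuple(self.entries)))
        self.entries.add(key)
        return False
```

In a hardware or highly concurrent software cache, that per-hit metadata
write is exactly where the LRU overhead shows up.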

~~~
mnw21cam
A fully-associative LRU cache can have significant overhead, yes. An 8-way
set-associative LRU cache, not so much.
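
To make the point concrete, here's a toy sketch (my own illustration) of
why the set-associative case is cheap: recency is tracked per set, so an
update only ever touches a list of at most `ways` entries.

```python
class SetAssociativeLRU:
    """N-way set-associative LRU: recency tracked within each small set."""
    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a tiny list, most recently used at the end.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        s = self.sets[addr % self.num_sets]  # simple modulo set indexing
        if addr in s:
            s.remove(addr)
            s.append(addr)                   # cheap: at most `ways` entries
            return True
        if len(s) >= self.ways:
            s.pop(0)                         # evict this set's LRU line only
        s.append(addr)
        return False
```

With 8 ways, the LRU state per set fits in a few bits and the update is a
small fixed-cost shuffle, rather than maintaining a global recency order
over the whole cache.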

