When the access pattern is random: recently used items are not more likely to be accessed than other items.
Or worse: the access pattern is "anti-recent": items recently accessed are less likely to be accessed again than other items.
This is undergraduate stuff I once had to know for exams in machine architectures and operating systems.
The choice of LRU replacement is favorable when there exists "locality of reference"; that's what it's predicated upon.
All of the replacement policy choices are only substitutes for a "magic oracle": a replacement strategy which can peer into the future and know which items will be accessed soonest. With a magic oracle, we can not only retain those items in the cache in preference to other items, to great advantage, but we can automatically prefetch the correct items that are not yet in the cache but are about to be accessed.
In general, randomized algorithms provide a defense against situations in which things don't work out according to the best case assumptions. For instance, they help defend against tickling worst cases in algorithms (e.g. sorting) that have good average-case behavior.
>When the access pattern is random: recently used items are not more likely to be accessed than other items.
Wouldn't they perform equally well in this case? There's no advantage in LRU, but also no disadvantage if they're really unpredictably random.
It can prevent any otherwise harmless sweep/scan of the system from completely busting your entire cache for all of the other uses.
That's the key take-away from my experience with optimizing control loops for microcontrollers with small caches. As soon as your loop is too bit, your code suddenly runs at a hundredth of the speed.
Conventional LRU uses linked-lists and when the LL no longer fits in the L1/L2 caches, you will see the performance hit you mention (given that accessing the associated LL node and reordering the list will most likely result in two cache misses).
Try libclc  for a compact segmented cache of n 64B containers that exactly fit a cache line, with each container having its own eviction policy. The overhead of LRU per container (of 7 data lines) is 9.14 bits/entry and the same cache line access will give you both the data lines and the LRU order.
He refers to the LRU-like policy of the I/D cache itself.
I was very excited when I actually got to implement it on a real world project.
I was writing a scale out job which used ffmpeg to make clips of video files. To speed it up I kept the downloaded files (which could be 150 GB in size) as a local cache. Quite often a clip is made of the same file. When the disk was full (there was a separate disk for download and clip output) selected two of the downloaded files randomly and deleted the older one. Loop till there was enough disk space, or no files.
It's something I thought I would never actually get to implement in the real word, and thus far is working very well, the caching speeds things up and the eviction seems to avoid too many cache misses.
Default block placement policy chose datanodes randomly for new blocks. This change selects 2 datanodes randomly and picks the one with lower used disk space.
I haven't used a true LRU in quite a while because the coordinated bookkeeping required is as disaster on multithreaded programs. ARC and 2Q can be largely lock-free.
BTW for expiration I have simply invalidated on next access. This seems to have worked well for me with DNS caching, for example, where basically you have three hot records (gmail.com, hotmail.com, yahoo.com) that expire every 5 seconds. In this case of course there is value in proactively refreshing cache entries with some probability proportional to the expiry time so you don't have thundering herds.
For fixed expiration (e.g. 5 min TTL), I use time-bounded LRU queues. Then if the head item has not expired, its successors have not either. If it has, it is merely a poll.
For custom expiration (varies per-entry), I use a hierarchical timer wheel. This replaces sorting with hashing, with an amortized O(1) cost.
For refresh, that's when I do it on access since its being used. I keep a refresh timestamp and an expiration timestamp, so that a refresh can occur while the data is valid but aging, but a blocking load is required when its too stale (expired).
You might enjoy this slide deck of the ideas that I use:
If the overhead of the cache itself is significant, as in kernel virtual memory management. A clock based algorithm is usually preferable which uses an array to approximate node based structures. ARC and LIRS have CAR and CLOCK-Pro respectively to simulate them. CAR is vulnerable to re-use distances near the boundary of the cache size.  CLOCK-Pro is widely used, notably in Linux VM.
It would be much more interesting if they used the traces from ARC and LIRS, perhaps also UMass and Wikipedia. These traces are publically available and used by multiple papers from a variety of groups.
I have not seen a sampling-based policy that outperforms a policy that has tracks history outside of its working set. Since the examples specifically avoided that comparison, I suspect that it under performs in comparison. That isn't to say it isn't a useful advancement, but that a more robust analysis is needed to validate their claims.
A treap is an unbalanced binary tree where the tree depth is proportional to the frequency of use to get fewer indirections per access (assuming an uneven distribution of lookups). Essentially it has a MFU or LFU queue built into it and now I'm curious about how it would behave as a cache data structure.
It is a tree algorithm for tracking the approximate last use time.
> The reason why Redis does not use a true LRU implementation is because it costs more memory.
...but it sounds like it wouldn't necessarily be totally desirable to use true LRU anyhow. :-)