Multigenerational LRU: more performant, versatile, straightforward than Linux's (kernel.org)
156 points by marcodiego on March 13, 2021 | 78 comments



What is the relationship of this change with mm/workingset.c? [1]

That file implements a custom algorithm, Double Clock, which is inspired by LIRS. This patch touches related code, but not that file, where the core algorithm is described. I tried to reimplement it in a simulator for analysis and found that its hit rate varied significantly across workloads, because the split between the LRU/LFU regions is based on the machine's total memory capacity [2]. That caused it to do better on MRU/LFU workloads on high-memory systems and better on LRU workloads on low-memory systems [3]. Talking to the author, there were other undocumented details that might make those regions adaptive. In the end, I gave up trying to understand the page cache and whether there was a benefit to switching to an alternative eviction policy. It is well-written but very confusing code that seems scattered if you're not a kernel developer.

[1] https://github.com/torvalds/linux/blob/master/mm/workingset....

[2] https://github.com/torvalds/linux/blob/1590a2e1c681b0991bd42...

[3] https://docs.google.com/spreadsheets/d/16wEq5QBzqOtownEtZvZe...


This sounds similar to https://en.wikipedia.org/wiki/Adaptive_replacement_cache which has patent issues?


ARC shares the concept of non-resident (ghost) entries with LIRS, but uses them to dynamically adjust the SLRU partition (an LRU where probation entries are promoted to a protected region on a second hit). LIRS has fixed partitions and uses a larger history (~3x) to track the "inter-reference recency". LIRS is a much more complex algorithm, but it offers a higher hit rate and is more scan resistant. Linux's DClock calculates the "refault distance", which is its analogous concept.
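
To make the refault-distance idea concrete, here is a heavily simplified sketch in Java (illustrative only, not the kernel's actual code; the class and method names are made up):

    import java.util.HashMap;
    import java.util.Map;

    // Simplified sketch of the "refault distance" idea: when a page is evicted,
    // remember the value of a global eviction counter; when the page comes back
    // (a "refault"), the number of evictions since then approximates how much
    // larger the inactive list would have needed to be to keep it resident.
    class RefaultDistanceSketch {
      private long evictions;                                   // global eviction clock
      private final Map<Long, Long> shadows = new HashMap<>();  // page -> clock at eviction

      void onEvict(long page) {
        shadows.put(page, evictions++);
      }

      // Returns true if the refaulting page should go straight to the active list.
      boolean onRefault(long page, long activeListSize) {
        Long snapshot = shadows.remove(page);
        if (snapshot == null) return false;          // no history: treat as a cold fault
        long refaultDistance = evictions - snapshot;
        return refaultDistance <= activeListSize;    // it would have stayed resident had
                                                     // the inactive list been that much larger
      }
    }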

You can compare the code of ARC [1], LIRS [2], and DClock [3] to see how similar they are.

ARC got a lot of attention, but its hit rate is not that great. It is modestly better than LRU, but very wasteful in metadata overhead. LIRS is much better, except that it is complex and the paper is confusing, so almost all implementations are broken. DClock seems okayish by borrowing LIRS's ideas, but I think it performs worse in most comparisons.

[1] https://github.com/ben-manes/caffeine/blob/master/simulator/...

[2] https://github.com/ben-manes/caffeine/blob/master/simulator/...

[3] https://github.com/ben-manes/caffeine/blob/master/simulator/...


I wonder if ZFS on Linux could be adapted to use this instead of the custom ARC implementation it brings along?


I would recommend W-TinyLFU [1]. It offers higher hit rates, is adaptive, has lower metadata overhead, is O(1), and is the only policy that is robust against all workload patterns we have tried. This is what I proposed to the Linux folks when analyzing DClock.

I met Kirk McKusick (of BSD fame) at FAST '20, where he expressed interest in it for ZFS. We tried introducing the idea to contributors at FreeBSD, Delphix, and LLNL. There is frustration with ARC being memory hungry and with its locking being expensive. Unfortunately, while there is interest in replacing ARC, all of our contacts were too busy to explore that option.

[1] https://github.com/ben-manes/caffeine/wiki/Efficiency
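
For anyone who wants to try it, a minimal Caffeine setup is only a few lines of Java (a size-bounded cache is what enables the W-TinyLFU policy):

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;

    public class Example {
      public static void main(String[] args) {
        // A size-bounded cache; eviction is driven by W-TinyLFU.
        Cache<String, byte[]> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .recordStats()
            .build();

        cache.put("key", new byte[] {1, 2, 3});
        byte[] value = cache.getIfPresent("key");
        System.out.println(value != null
            ? "hit, hit rate = " + cache.stats().hitRate()
            : "miss");
      }
    }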


Window-TinyLFU seems cool. Here's a blog post going through it at a level I mostly understand:

https://9vx.org/post/on-window-tinylfu/

The core trick is to use some kind of Bloom-filter-style structure to keep approximate counts of accesses. Caffeine uses a count-min sketch, but you could also use a counting Bloom filter, and these are really just two members of a family of very similar things. The counts are occasionally halved, so they are really a sort of decaying average, allowing for changes in access pattern over time.
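
A toy version of that frequency sketch, with the halving, might look like the following Java (illustrative only; Caffeine's real sketch packs 4-bit counters into longs):

    import java.util.Random;

    // Toy count-min sketch with aging: counters are periodically halved so the
    // estimate behaves like a decaying frequency rather than an all-time count.
    class CountMinSketch {
      private final int[][] table;       // depth rows of width counters each
      private final int[] seeds;
      private final int width;
      private final long resetAfter;
      private long additions;

      CountMinSketch(int depth, int width) {
        this.table = new int[depth][width];
        this.seeds = new Random(1).ints(depth).toArray();
        this.width = width;
        this.resetAfter = 10L * width;   // halve once the sketch is "full enough"
      }

      private int index(int row, Object key) {
        int h = key.hashCode() * seeds[row];
        return Math.floorMod(h ^ (h >>> 16), width);
      }

      void increment(Object key) {
        for (int row = 0; row < table.length; row++) {
          table[row][index(row, key)]++;
        }
        if (++additions >= resetAfter) {
          halve();
        }
      }

      int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
          min = Math.min(min, table[row][index(row, key)]);
        }
        return min;
      }

      private void halve() {             // the decaying-average trick
        for (int[] row : table) {
          for (int i = 0; i < row.length; i++) {
            row[i] >>>= 1;
          }
        }
        additions >>>= 1;
      }
    }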

There's a minor trick of using a simple Bloom filter (the post says "single-bit minimum-increment counting Bloom filter", but that is just a Bloom filter, isn't it?) as a 'doorkeeper', so you don't bother tracking frequency for objects until they are accessed for a second time. This helps if you have a long tail of objects which are only accessed once, because it means the main frequency counter can focus on a small subset of objects, and so be more accurate.
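
And the doorkeeper in front of it can be a plain Bloom filter that swallows the first access (again a toy sketch with made-up sizes):

    import java.util.BitSet;

    // Toy doorkeeper: a plain Bloom filter that absorbs the first access to a key.
    // The caller only bumps the main frequency sketch when this returns true.
    class Doorkeeper {
      private final BitSet bits = new BitSet(1 << 16);

      boolean allowCounting(Object key) {
        int h1 = Math.floorMod(key.hashCode(), 1 << 16);
        int h2 = Math.floorMod(key.hashCode() * 0x9E3779B9, 1 << 16);
        if (bits.get(h1) && bits.get(h2)) {
          return true;                 // seen before since the last reset: count it
        }
        bits.set(h1);                  // first sighting: just remember it
        bits.set(h2);
        return false;
      }

      void reset() {                   // cleared whenever the sketch counters are halved
        bits.clear();
      }
    }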

The actual cache is split into two parts. New objects are added to the first part, which is managed by simple LRU. When objects are evicted from the LRU queue, their access frequency is examined, and if it is high enough, the object is added to the second part. The use of frequency like this is called a cache admission policy.

There are a lot of details I don't understand. Mostly to do with how a cache admission policy is used in a two-part cache, rather than anything to do with Window-TinyLFU.

How is the second part of the cache managed? LRU? Or do you evict the object with the lowest frequency?

What is the frequency threshold for adding something to the second part?

Are objects added to the first part if they have only just been added to the doorkeeper? Or only if they were already known to the doorkeeper? Does the doorkeeper also get reset occasionally?

What are the relative sizes of the first and second parts? Do they change?

Is an object ever added directly to the second part? You might have an object with high historical frequency, but which has been evicted because it wasn't used recently.

I couldn't find a good general overview of cache admission policies.


Understanding that caching is a universal sequence prediction problem explains a lot: efficiency maps to the abstract "compressibility" of the access sequence from the cache's perspective. Optimal sequence prediction is intractable, so any practical cache optimizes for good performance on some sequences to the exclusion of others.

An understudied approach to cache efficiency is to dynamically shape the sequence of accesses the cache sees to better match the set of sequences it is designed to predict. This means putting some code/logic in front of the cache to make the sequences more cache friendly, either by filtering out "noise" accesses that are likely to pollute the sequence prediction model or re-ordering accesses, when permissible, to look like a sequence the cache is better at predicting.

The objective is that the added overhead of cache admission is offset by improved efficiency of cache eviction by making the sequences more predictable.

Both types of cache admission policies -- noise filtering and sequence reshaping -- are commonly used in database engines in extremely limited ways. An example of the former are so-called "cold scan" optimizations. An example of the latter are so-called "synchronized scan" optimizations, which have minimal benefit in many architectures.
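
As a purely hypothetical illustration of the noise-filtering flavor, a filter in front of the cache could bypass accesses that look like part of a long sequential scan so the eviction policy never sees them (all names and thresholds below are invented):

    // Toy scan filter: accesses that continue a long sequential run bypass the
    // cache and are served directly from storage, keeping scan "noise" out of
    // the eviction policy's view of the access sequence.
    class ScanFilter {
      private static final int SCAN_THRESHOLD = 64;   // consecutive blocks
      private long lastBlock = -2;
      private int runLength;

      // Returns true if this access should go through the cache,
      // false if it should bypass it.
      boolean admitToCache(long block) {
        runLength = (block == lastBlock + 1) ? runLength + 1 : 0;
        lastBlock = block;
        return runLength < SCAN_THRESHOLD;
      }
    }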

Cache admission is an open-ended algorithm problem with complex tradeoffs. Sequence reshaping is particularly interesting, and almost completely unstudied, because in theory it allows you to exceed the efficiency of Bélády's optimal algorithm, since one of the limiting assumptions of that theorem is not always true in real systems. It is not trivial to design systems that can employ these kinds of optimizations broadly in a way that provides a net benefit.

Of course, in real systems, cache efficiency is about more than cache hit rate. Both cache admission and cache eviction also need to be CPU efficient, and most academic algorithms are not. Even though cache access/management is in the hot path, you want it to be invisible in the profiler.


Check out some of the papers from Coho Data and its alums. When I was looking, circa 2014-16, they had some really interesting takes on tracking access (inter-reference gap) across huge object populations. They were positioning it towards tiered storage migrations, but it seemed generally applicable to admission control and eviction beyond normal working sets.


There's some really good explanation (and implementation) in the Caffeine library[1]. Ben Manes worked with Gil Einziger on the implementation and actually contributed to an updated version of the TinyLFU paper, IIRC. Gil has a ton of very relevant work[2][3] around cache admission, control, and information density.

Unfortunately, cache admission is an understudied and woefully under-implemented area of CS and computer engineering as far as I can tell. There's some decent work on hardware-level problems (cache line insertion/eviction/etc.), but not much in general-purpose software caching. Gil's research was definitely the most relevant and relatable when I was involved in this area a few years back.

[1] https://github.com/ben-manes/caffeine/wiki/Efficiency#window...

[2] https://scholar.google.com/citations?user=kWivlnsAAAAJ&hl=en

[3] https://github.com/gilga1983


The minimum-increment approach for a frequency sketch is supposed to improve the sketch's accuracy. Instead of incrementing all of the hashed counters (e.g. 4), it first reads them to find the minimum and then only increments those at that minimum. When I analyzed this for TinyLFU's purposes it wasn't beneficial: TinyLFU only cares about heavy hitters to judge relative worth, so the hit rate was unchanged.
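
For reference, the minimum-increment (conservative update) variant looks roughly like this toy code, which is not Caffeine's implementation:

    // Conservative update ("minimum increment"): read the hashed counters first
    // and only increment the ones currently at the minimum, which keeps the
    // overestimation of unrelated keys smaller.
    class ConservativeSketch {
      private final int[][] table = new int[4][1 << 12];

      private int index(int row, Object key) {
        int h = key.hashCode() * (31 + 2 * row);     // toy per-row hash
        return Math.floorMod(h ^ (h >>> 16), table[row].length);
      }

      void increment(Object key) {
        int min = estimate(key);
        for (int row = 0; row < table.length; row++) {
          int i = index(row, key);
          if (table[row][i] == min) {                // only the minimal counters move
            table[row][i]++;
          }
        }
      }

      int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
          min = Math.min(min, table[row][index(row, key)]);
        }
        return min;
      }
    }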

The two halves can be any algorithm, but LRU and SLRU were chosen in Caffeine. SLRU uses a probation and a protected region (a 20/80 split), so that on a subsequent hit the item is moved into the protected region. This more aggressively removes an item that was not reused, e.g. one inaccurately promoted by TinyLFU due to hash collisions. It is a cheap way to get a good recency/frequency eviction ordering.
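
A toy SLRU along those lines (illustrative only; sizing and overflow handling are simplified, and the caller evicts victim() when over capacity):

    import java.util.LinkedHashMap;

    // Toy SLRU: new entries land in a small probation segment (~20%); a hit there
    // promotes to the protected segment (~80%); protected overflow demotes back to
    // probation, whose LRU head is offered as the eviction victim.
    class Slru<K> {
      private final LinkedHashMap<K, Boolean> probation = new LinkedHashMap<>(16, 0.75f, true);
      private final LinkedHashMap<K, Boolean> protect = new LinkedHashMap<>(16, 0.75f, true);
      private final int protectedMax;

      Slru(int capacity) {
        this.protectedMax = capacity - Math.max(1, capacity / 5);
      }

      void recordAccess(K key) {
        if (protect.get(key) != null) {
          return;                                     // get() refreshes access order
        }
        if (probation.remove(key) != null) {
          protect.put(key, Boolean.TRUE);             // second hit: promote
          if (protect.size() > protectedMax) {
            K demoted = protect.keySet().iterator().next();
            protect.remove(demoted);
            probation.put(demoted, Boolean.TRUE);     // demote the protected LRU entry
          }
        } else {
          probation.put(key, Boolean.TRUE);           // new entry starts on probation
        }
      }

      // The candidate offered for eviction: probation's least recently used entry.
      K victim() {
        LinkedHashMap<K, Boolean> source = probation.isEmpty() ? protect : probation;
        return source.isEmpty() ? null : source.keySet().iterator().next();
      }
    }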

The frequency threshold is a direct comparison of the two victims (the admission window's and the main region's). If the candidate has a higher estimated frequency than the main region's victim, then it is admitted; otherwise it is discarded.
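
In code, using the toy CountMinSketch sketched upthread, that decision is just a frequency comparison (the real implementation adds extra guards, e.g. against hash flooding):

    // When the admission window overflows, its LRU victim (the candidate) is
    // compared against the main region's victim; whichever has the lower
    // estimated frequency is the one that gets discarded.
    class TinyLfuAdmittor {
      private final CountMinSketch sketch;

      TinyLfuAdmittor(CountMinSketch sketch) {
        this.sketch = sketch;
      }

      // Returns true if the window's candidate should replace the main victim.
      boolean admit(Object candidate, Object mainVictim) {
        return sketch.estimate(candidate) > sketch.estimate(mainVictim);
      }
    }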

The admission window accepts all new items and is placed in front of TinyLFU. TinyLFU implements the doorkeeper, popularity sketch, and reset interval. When reset, the doorkeeper is cleared and the counters are halved.

The optimal admission/main sizes depend on the workload. An LRU-biased workload prefers a large admission window (e.g. blockchain mining), while an MRU/LFU-biased one prefers a smaller window (e.g. loops/scans). The policy is adaptive, using hill climbing [1] to walk the hit rate curve towards the best configuration. The design evolution is described in [2-4].
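
A stripped-down version of that hill-climbing loop might look like this (illustrative only; the constants and the step decay of the real policy are omitted):

    // Toy hill climber for the admission window size: sample the hit rate over an
    // epoch and keep moving the window boundary in the same direction while the
    // hit rate improves, reversing direction when it degrades.
    class WindowClimber {
      private final int maximum;        // total cache capacity, in entries
      private double previousHitRate;
      private int stepSize = 8;         // entries to move per adjustment
      private int windowSize;

      WindowClimber(int maximum, int initialWindow) {
        this.maximum = maximum;
        this.windowSize = initialWindow;
      }

      // Called once per epoch with the hit rate observed during that epoch;
      // returns the new window size (the rest of the capacity is the main region).
      int adjust(double hitRate) {
        if (hitRate < previousHitRate) {
          stepSize = -stepSize;         // it got worse: walk the other way
        }
        previousHitRate = hitRate;
        windowSize = Math.max(1, Math.min(maximum - 1, windowSize + stepSize));
        return windowSize;
      }
    }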

There is no fast track promotion to the main region. A popular item will end up being promoted. The SLRU ordering and reset interval will age it out if unused. The adaptive sizing will grow or shrink the frequency region based on how effective it is.

Cache admission has not been well studied. While TinyLFU is an admission policy, W-TinyLFU turns it into a promotion strategy by always admitting into the window cache. Prior to TinyLFU, Bloom filters were used by CDNs to keep one-hit wonders from polluting the cache [5]. Afterwards, AdaptSize [6] used entry sizes instead of frequency as the admission criterion. RL-Cache [7] tries to defer to ML to sort it out from a dozen signals. There is a paper under review that optimizes W-TinyLFU for size-aware eviction.

[1] https://dl.acm.org/doi/10.1145/3274808.3274816

[2] http://highscalability.com/blog/2016/1/25/design-of-a-modern...

[3] http://highscalability.com/blog/2019/2/25/design-of-a-modern...

[4] https://docs.google.com/presentation/d/1NlDxyXsUG1qlVHMl4vsU...

[5] https://people.cs.umass.edu/~ramesh/Site/HOME_files/CCRpaper...

[6] https://www.usenix.org/conference/nsdi17/technical-sessions/...

[7] https://ieeexplore.ieee.org/document/9109339


There is no patent issue, given that IBM is a member of the OIN.


I have to say how much I love the Chrome OS profiling ecosystem. Google engineers bring these changes to lkml with the data in hand. Reduces tab kills by a factor of 20x, in the field? Very nice.


It's not just Chrome OS, it covers data centers too. We call it Google-wide profiling. Check these papers out:

Profiling a warehouse-scale computer https://research.google/pubs/pub44271/

Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers https://research.google/pubs/pub36575/

AutoFDO: Automatic Feedback-Directed Optimization for Warehouse-Scale Applications https://research.google/pubs/pub45290/


I don't see how this would be useful for people other than Google/Amazon/Microsoft because, to my uneducated self, it seems like their problem is driven by this:

> Over the past decade of research and experimentation in memory overcommit, we observed a distinct trend across millions of servers and clients: the size of page cache has been decreasing because of the growing popularity of cloud storage. Nowadays anon pages account for more than 90% of our memory consumption and page cache contains mostly executable pages.


The implication of `growing popularity of cloud storage` is that today the user's data files sit in remote cloud storage rather than on the local file system. This means a typical most-used application on a user's Android/Chrome OS device (say a web browser or a streaming app) has very little local file storage and hence very little page cache usage. The bulk of memory is non-page-cache memory, i.e. anon memory. Based on this shift in the memory mix, this patchset enhances the swap system to evict anon pages better. It is useful for end-user Linux devices like phones and laptops.


Well, you are just confirming the parent's point. Not everybody uses the cloud. A database like PostgreSQL very much uses local file storage and hence has very high page cache usage. Or am I missing something?


No - the post carefully lays out that client devices are _also_ dominated by anon pages rather than the page cache.

Additionally, TFA has massively impressive statistics for client-side devices, as a cousin comment notes.


To me, the TFA says they've tested what is convenient for them: cloud compute nodes, Chromebooks (which are not desktops in the traditional sense), and phones. Which is fine, since it suits their needs.

But I maintain it needs more independent testing on good old non-cloud servers, especially databases (which are not client-side devices). It may very well be that it is positive for that workload too. Or not. That is all I'm saying.


> But I maintain it needs more independent testing on good old non-cloud servers, especially databases

Don't a lot of databases bypass the page cache for their payload data anyhow?


Yes. Bypassing the kernel cache is a major optimization used by virtually all databases that are serious about prioritizing performance. It won't add anything to the top end of database engine performance.

However, there are plenty of databases that do not prioritize performance which could benefit, including most open source ones.



Universally. The kernel's page cache is the very last thing that a database operator wants.


I'm one of the Google engineers who work on multigenerational LRU.

The kernel page cache doesn't know better than databases themselves when it comes to what to cache. All high performance databases do this in user space, and they use AIO with direct IO or the latest io_uring to bypass the kernel page cache. Modern cloud (distributed) file systems do the same, including GFS: https://en.wikipedia.org/wiki/Google_File_System
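
For what it's worth, even from Java you can opt out of the page cache for reads; a minimal sketch (O_DIRECT requires block-aligned buffers, offsets, and lengths):

    import com.sun.nio.file.ExtendedOpenOption;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class DirectRead {
      public static void main(String[] args) throws Exception {
        Path path = Path.of(args[0]);
        int blockSize = (int) Files.getFileStore(path).getBlockSize();

        // ExtendedOpenOption.DIRECT maps to O_DIRECT on Linux: reads bypass the
        // kernel page cache, so buffer, offset, and length must be block-aligned.
        try (FileChannel channel = FileChannel.open(path,
            StandardOpenOption.READ, ExtendedOpenOption.DIRECT)) {
          ByteBuffer buffer = ByteBuffer.allocateDirect(blockSize * 2)
              .alignedSlice(blockSize);
          buffer.limit(blockSize);
          int n = channel.read(buffer, 0);
          System.out.println("read " + n + " bytes without touching the page cache");
        }
      }
    }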

And speaking of io_uring, check this out to see how much improvement there is when copying without going through the page cache: https://wheybags.com/blog/wcp.html


FWIW, the Use Cases section presents data about the impact in mobile and laptop environments:

On Android, our most advanced simulation that generates memory pressure from realistic user behavior shows 18% fewer low-memory kills, which in turn reduces cold starts by 16%.

...

On Chrome OS, our field telemetry reports 96% fewer low-memory tab discards and 59% fewer OOM kills from fully-utilized devices and no UX regressions from underutilized devices.


The cost of memory seems like a problem because end users experience P95+ behavior, but the people who write the spec sheet target P50.

Memory that looks expensive because it has little effect on P50 (expensive to the device maker) is cheap to the end user, who experiences a big change.


I wonder how Linux with this change compares to XNU (iOS)?

For some reason (maybe because I work with Android) I always thought that XNU is better at memory management (fewer low memory kills, better paging, etc.).


> This causes "janks" (slow UI rendering) and negatively impacts user experience.

If this makes phones faster, I am all for it.

Apple is pretty good at this with ios.


>"On Android and Chrome OS, executable pages are frequently evicted despite the fact that there are many less recently used anon pages.

This causes "janks" (slow UI rendering) and negatively impacts user experience."


[flagged]


"Eschew flamebait. Avoid unrelated controversies and generic tangents."

https://news.ycombinator.com/newsguidelines.html


I genuinely didn’t think a negative opinion about a marketing word would be this controversial but point well taken. Last time I criticize presentation.


A more engaging summary of the article: Google engineer proposes changes in caching algorithm based on observations from "millions of servers and clients", arguing for improved performance both on low-end (phone) and high-end (hundreds of gigabytes of ram) devices.


Probably would have made a good top-level comment. My comment is about the poor taste of the word “performant”


Performant is a pretty common word in technology, and has been since at least 2003, when I first heard it. It's used in the same vein as "scalable": in both cases, it suggests the positive nature of the performance or scaling.


[flagged]


Yah, screw anyone who's on the leading edge of linguistic drift!

After all, if someone doesn't properly use the second-person singular pronoun of yore, they're an illiterate scoundrel! Language shouldn't change.

I think your argument has been decimated (which means reduced by a literal 10%; metaphorical and drifting meanings of words are bad!)


[flagged]


The point is that people do not view it as poor language. They understand that language is a protocol for communication and as such it evolves and shifts over time, particularly across generations (emoji are a great example of this), and they view people who look down on others for relatively benign infractions against the timeless holy grail that is The Way The English Language Should Be Used as rather pedantic idiots who often bring more disruption to a conversation than clarity.

There is no race to the top or the bottom. There is no race at all. Looking down on someone for their preferred set of words and the way they use them will only make you look bad and, at the end of the day, prevent you from communicating effectively with other members of your species.

You should work on that.


No, I think I’ll continue to judge people for their behavior. I also look down on people when they use “ain’t” instead of “aren’t” or when they don’t correctly conjugate their verbs. Being prevented from communicating with people who fail to speak proper English has few practical downsides and many upsides. Looking bad to those who fail to speak proper English is probably more of an upside as well. An apparent failure to speak English well is indistinguishable from an inability to speak English well.

Demonizing intolerance of poor English does not help to improve anyone’s ability to communicate precisely and efficiently, it just seems to make people feel better about themselves. There’s nothing wrong with setting high standards for people unless you dislike excellence.


If you want to be guided on that lonely and cold road by nothing but a false sense of superiority then it ain't on me to stop you. Just don't be surprised when, or fail to realise why, people judge you right back for your behaviour.


> people judge you right back for your behaviour.

If certain people judge me back for promoting proper English then it’s highly likely their association with me is undesirable in the first place. I prefer to surround myself with people who value excellence, your opinion of what is excellence notwithstanding.

Btw non-native speakers are perfectly capable of speaking proper English and in no way does having a high bar for language exclude them. It is not uncommon for a non-native speaker to speak English better than a native speaker either.

In reality, people can and will always be judged not only for the content of the message but also for their presentation. It’s important that young people learn and accept this to maximize their chances of success.


And excellence is only achieved by having a thoroughbred grasp of the English language? Damn. Sucks to be a non-native speaker then!

I would also add that the truly excellent people in this world would probably not want to surround themselves with you at the first sign of your rather patronizing language superiority complex.

You should judge people for the content of their message, not the exact method of delivery. And in that regard most people will find you severely lacking despite your Flawless Usage Of The Proper Way To Speak English.


Huh? I bet I could come up with a few words and a few different ways of using them that you'd look down on me for. I could probably do it in one word and one usage.


Even deliberately corrupting language can be joyous.

My friends were all telling each other "chillax motherfuton" for awhile because of how it made another friend rage. :D


"relatively benign infractions" is the key part of my reply.


You're doing the same thing that other dude is doing; you just think you're better.


What's wrong with this particular drift though? Presumably you don't hold that all linguistic drift is regrettable?


I’ve made no claim about “linguistic drift,” and I’ve explained many times in this thread why “performant” is in poor taste.


Use of "performant" in relation to efficiency in this fashion goes back to at least the 1960s, from a fairly cursory glance at https://books.google.com.

Starting with a few from the 70s:

The IEEE from 1979: https://books.google.com/books?id=IJ4XAQAAMAAJ&q=%22performa... "It should be noted that the problem of scailing is considerably simplified from the fact that the operators dispose of greater ranges ( from 1 to 238 for l - 20 , instead of 1 to 104 for the more performant analog computers "

We can even step outside of the electrical engineering community, here's a reference to it from the Iron and Steel society, in the same year: https://books.google.com/books?id=D9RaAAAAYAAJ&q=%22performa... "As this cooling system revealed itself as unsufficiently performant , we have started with the study of literature in order to define if other more efficient equipments were available"

We can push back to 1974, in relation to satellites in space: https://books.google.com/books?id=ZXHvAAAAMAAJ&q=%22performa... "So , the program is very descriptive in the effects development , and very performant . Furthermore , it has been compared with numerical integration at each step ( after the computation of each effect and each coupling )..."

Or just a little further in relation to APL from the ACM: https://books.google.com/books?id=0qApAQAAIAAJ&q=%22performa... "However , this theorical gain is nothing but an encouragement ; the execution phasis is very performant because , before it , a lot of preparation has been done during the deferment phasis : this should be taken into account in a realistic ..."

So in the 60s, 1964: https://books.google.com/books?id=l1_vAAAAMAAJ&q=%22performa... "Such a progress comes partially from the availability of faster and larger computers and of more performant numerical methods . It is however important to remark that , at the same time , analytic progress has also been achieved , which is very ..."

Australian Mechanical Engineer uses it in relation to the Wankel engine: https://books.google.com/books?id=rzIiAQAAMAAJ&q=%22performa...

Judging from the uses I'm seeing in the Google search, it's a fair claim that Computer Science has been using "performant" in reference to efficiency and performance for most of its existence.

Judging people for using a word in a way that has that much history in the field tends to reflect mostly on you.


You can quote a million papers from your cursory Google Books search; it’s still poor English, in addition to being vague and essentially meaningless in most contexts. In nearly every instance it can be replaced with “good” or just dropped altogether with no loss of information, appeals to authority notwithstanding. I’ve rarely regretted judging someone’s intelligence based on their use of “performant”; it has a surprisingly high SNR.


It doesn't matter if you can or cannot replace it with other words. It's what the industry has decided is an appropriate term to use in the circumstances.

> in addition to being vague and essentially meaningless in most contexts

It's not vague, it has a literal meaning. Efficient. The definition comes from the French word "performant", the meaning of which goes back to the 17th century.

This isn't people going "ohh it's related to performance, let's make up a word that sounds similar", like "supposably" or similar.

It's "Let's take an established word with the same roots that has the same meaning as we want and use it".


> It's not vague, it has a literal meaning. Efficient.

See a sister thread for commenters mock-debating whether it means improved throughput or reduced latency. If a speaker indeed means “efficient,” then it would be thoughtful to use “efficient” for clarity’s sake. “Performant” has been used so often and so recklessly that its meaning has been diluted to a general, vague “fastness,” from which most thoughtful readers cannot glean anything specific.


> You can quote a million papers from your cursory Google Books search

Still better data than the literally nothing that you brought to the discussion so far.


“Literally nothing” is false:

> In nearly every instance it can be replaced with “good” or just dropped altogether with no loss of information,

That is obviously true and easily checkable. If that is not true, explain how in rational terms.


> Multigenerational LRU: better, more versatile, more straightforward than Linux's

> Multigenerational LRU: more versatile, straightforward than Linux's

Neither of these have as much information as the original headline. The version I came up with that's actually equivalent to the original without using the word "performant" has to use a word that means the same thing.

> Multigenerational LRU: faster, more versatile, more straightforward than Linux's

And that still means I don't get to use the "parallel more" construct.


"performs better" the dictionary will catch up

edit - apparently oxford dictionary has it already as "performs well"


“Performs better” is just as vague. What does it mean to perform? What does it mean to do it better? Most of the time people actually mean “more efficient CPU usage”


for this case, where the thing is a cache and the description a headline, "more performant" seems perfectly cromulent. it says: here's a cache that does all the stuff a cache is expected to do, but better


Performant is problematic because it's tech jargon that is only now gaining grudging acceptance as a word in the overall population.

It's not wrong, but it's best avoided if you want to avoid pedants waving their King's English stick at you.


FYI “problematic” is also a vague and essentially meaningless term that is no more specific than “bad.”


English has lots of words that mean mostly the same thing. Do you seethe at every entry in a thesaurus? How does one determine which word is blessed by the higher minded beings you choose to exclusively associate with?


I don’t usually seethe, it’s more like remote embarrassment. I think there is a German word for that. I often recommend a “non-hire” as well.

Edit: I changed “non hire” to “non-hire” because I have no personal issue with acknowledging my usage of improper punctuation. Why should it be controversial to point out usage of improper vocabulary in others?


FYI, non-hire is the type of phrase that should be hyphenated.


Problematic is just fine, brah-- with usage dating to the late 17th century and with a pretty clear connotation of "presenting numerous problems" instead of just being undesirable/bad.


I suggest "performy" and "problemy" as shorter. (Although many people seem to think length a virtue in words) And instead of "more performant" – "performore"?!


Take it up with the people who coined problematic 4 centuries ago.


Seems like a very proactive development


So, we need words that say on what dimension it is performing better.

I hereby coin “latencyant” and “throughputant”.


If a system is highly latencyant, does that mean it performs well in terms of latency, or does it mean it exhibits high latency?


I was thinking “lower latency”. Is there a suffix that means “less of”? “Timeless” is already taken...


Well, there are hypo- and hyper-.


Now I like hypolatent and throughputful.


latencyless and throughputful?

latencyette and throughputous?

latencycule and throughputty?


what about hit rate and time and space efficiency? i think "cachy" says it all - the cachiest cache


Awesome work! But I don't understand why they haven't implemented a neural network to take care of this. Jeff Dean even mentioned something about it[0].

[0] - https://learningsys.org/nips17/assets/slides/dean-nips17.pdf (pp 29, 50 etc)


I guess your comment has been downvoted since it comes off a bit like "neural network all the things".

But it did make me wonder, so I went searching and found this paper: "Applying Deep Learning to the Cache Replacement Problem"[1], which sounded quite interesting.

In the paper they train a LSTM network to predict if an item should be cached or not, but then analyze the resulting model and identify key features of its performance. From the paper:

We use these insights to design a new hand-crafted feature that represents a program’s control-flow history compactly and that can be used with a much simpler linear learning model known as a support vector machine (SVM). Our SVM is trained online in hardware, and it matches the LSTM’s offline accuracy with significantly less overhead; in fact, we show that with our hand-crafted feature, an online SVM is equivalent to a perceptron, which has been used in commercial branch predictors.

We use these insights to produce the Glider cache replacement policy, which uses an SVM-based predictor to outperform the best cache replacement policies from the 2nd Cache Replacement Championship. Glider significantly improves upon Hawkeye, the previous state-of-the-art and winner of the 2nd Cache Replacement Championship.

Training a neural network and then using it to restate the problem in a more optimal way is an approach I haven't seen before (though I'm just a casual observer of the field), and it sounds very interesting.

[1]: https://www.cs.utexas.edu/~lin/papers/micro19c.pdf


I don't think I've read that before, thanks! Yeah, I believe there's also a very primitive NN on AMD's latest chips as well, used in their branch predictor. I can't find the URL at such short notice, though.


Two other papers that may be of interest are RL-Cache [1] (admission policy) and DeepCache [2] (prefetch policy).

[1] https://ieeexplore.ieee.org/document/9109339

[2] https://dl.acm.org/doi/10.1145/3229543.3229555


Do you think it's possible to run a neural net all the time without destroying cache coherency? Would it need to run on dedicated silicon? I think the idea could be interesting, but it seems like there are practical ramifications to work through.


From my understanding of NNs and operating systems I don't see any problem with it, but then again I'm not an expert. But Jeff definitely knows what he's talking about.


The cynic in me summarizes "Google engineer says page cache is shrinking as Google has all your data instead".



