Hacker News
Cache Is King: The Inevitability of L4 (nextplatform.com)
17 points by Katydid on Jan 17, 2020 | 12 comments



Why don't we have two sets of memory: the main, slow, cached memory, and a smaller set of super-fast, low-latency memory, like cache, but one that the programmer and compiler can use explicitly?

Sort of like special-purpose performance instructions such as SIMD, but on the memory side.


Special instructions won't help latency. There are prefetch instructions, but they basically just do loads. On x64 (at least on Intel, with the adjacent-line prefetcher enabled) each load pulls a minimum of 128 bytes (two cache lines) from memory into the cache.

The Cell processor from Sony's PlayStation 3 did have explicitly managed local store on its SPEs instead of a transparent cache (I think), and it was notoriously difficult to program for.

The real reason cache isn't handled explicitly, though, is that it isn't necessary. You can get good performance and cache usage at the C++ level; you just have to know how the CPU works and access memory linearly so it can be prefetched. I've tried to use prefetch instructions, and beating the out-of-order machinery in the CPU is actually very difficult.
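For concreteness, here's a minimal sketch of what I mean (function names and the prefetch distance are mine, not from the article): a plain linear walk that the hardware prefetcher already handles well, next to the same walk with an explicit hint.

    #include <cstddef>
    #include <cstdint>

    // Plain linear walk: the stride is obvious, so the hardware
    // prefetcher streams cache lines in ahead of the loads on its own.
    uint64_t sum_linear(const uint64_t* data, size_t n) {
        uint64_t sum = 0;
        for (size_t i = 0; i < n; ++i)
            sum += data[i];
        return sum;
    }

    // Same walk with an explicit hint (GCC/Clang builtin). Prefetching
    // 16 elements (two cache lines) ahead is a guess; in practice this
    // rarely beats the out-of-order core plus the hardware prefetcher.
    uint64_t sum_with_hint(const uint64_t* data, size_t n) {
        uint64_t sum = 0;
        for (size_t i = 0; i < n; ++i) {
            __builtin_prefetch(&data[i + 16], /*rw=*/0, /*locality=*/3);
            sum += data[i];
        }
        return sum;
    }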


It's a common misconception that cache is "low latency". It certainly is compared to main memory (which can take ~200 cycles before it starts feeding the CPU the bytes it wanted), but an L3 hit can easily take 40-60 cycles as well, so it's not even an order of magnitude difference. By the time you're hitting L3, you're kind of already screwed.
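Rough arithmetic to put those cycle counts in wall-clock terms (ballpark figures of mine, assuming a 3 GHz core):

    200 cycles (main memory)  ~= 200 / 3 GHz ~= 67 ns
     50 cycles (L3 hit)       ~=  50 / 3 GHz ~= 17 ns
      4 cycles (L1 hit)       ~=   4 / 3 GHz ~=  1.3 ns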

For perf, I'd much rather have larger L1, or a (much) bigger register file.


Some processors allow caches to be configured as scratchpad memory, which is what you're describing.


We already have an L4 cache, but we call it RAM. It caches our shared libraries and swap. Its line size is 4K (8K on POWER).
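One concrete way to see that analogy (a sketch of mine, not from the comment above; the file path is arbitrary and error handling is omitted): map a file and touch it, and the kernel faults 4K "lines" from disk into RAM on first access.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
        // Map a file read-only. Nothing is read from disk yet; RAM acts
        // as the "cache" and the 4K page is its "line size".
        int fd = open("/etc/hosts", O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        char* p = static_cast<char*>(
            mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));

        long sum = 0;
        for (off_t i = 0; i < st.st_size; i += 4096)
            sum += p[i];   // first touch of each page is a "miss" (page fault)

        munmap(p, st.st_size);
        close(fd);
        return sum & 0x7f;
    }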

This was all understood perfectly even in the original von Neumann paper in 1948. There, registers cached RAM which cached drum, which cached tape.


This article is specifically about another level of cache between L3 and main memory becoming standard in modern PC architectures.

It has already happened in a few places not mentioned in the article: Xeon Phi with its on-package high-bandwidth memory, and Broadwell CPUs that could use their embedded DRAM as either video memory or a level 4 cache.


The point is that there is nothing at all special about cache vs. any other kind of memory. The distinctions are purely practical, made according to relative timing vs. bus cycles and, at times, interrupt latency and head-arm settling time, which dictate the mechanism for faulting in misses. All of these times have varied by many orders of magnitude over the years without producing any fundamental change in the principles involved. It's all just memory hierarchy, and has been for fully seven decades, since the very beginning.

It takes a very small time window, and very limited historical awareness, to see Ln, for any n, as indicating anything special. In this case, it indicates that what we have lately called "RAM" will now be called "L4" -- just as what was so recently RAM is now L3 -- as persistent phase-change memory moves into the place lately occupied by disks and, more recently, SSDs, and the distinction between "RAM" and "storage" gets fuzzy once again.

Even persistence across power loss has phased in and out, over the decades. The slowest memory happens to now be both random-access and persistent, again, and pretty darn fast this time.


Not only is that not true, that's not the point of this article at all. It is only speculating about when an L4 cache will be added to modern architectures.

> It takes a very small time window, and very limited historical awareness, to see Ln, for any n, as indicating anything special.

This article actually goes through the history of cache additions and talks about how it isn't special but is more a matter of evolution. You seem to be arguing against points no one is making.

> it indicates what we have lately called "RAM" will now be called "L4"

This is a bizarre semantic game to play, since cache is fundamentally about a hierarchy of locality, while main memory is a way for the entire computer to communicate through storage that may or may not involve the CPU.


What is bizarre is pretending to understand what you read better than other people, while supporting your argument by directly contradicting what it actually says right there in black and white.


Would you please stop posting in the flamewar style to HN? We've had to ask you this before. Continuing to do it eventually ends in your account getting banned.

https://news.ycombinator.com/newsguidelines.html


Your replies had nothing to do with the article at all. Respond with actual information or not at all please.


Please don't be a jerk in HN comments, regardless of how wrong another commenter is or you feel they are.

https://news.ycombinator.com/newsguidelines.html




