
Linux: Controlling access to the memory cache - signa11
https://lwn.net/Articles/694800/
======
restalis
This is made to sound so revolutionary and novel, but in fact the cache is only
an intermediary memory: fast, transparent to processes, and automatically
managed. Badly managed, as it turned out. What hurts the most is that there
definitely is some sort of influence on how a process can take advantage of
caching, yet _not under its control_, and even with the best practices in
memory optimization it all falls flat once a few processing layers get stacked
on top of this clever hardware cache management. The fast memory should have
been left under the control of software. It should have been kept small and
directly addressable, like memory in the old computers (16 bit range), and what
is now considered "the main memory" should have been developed as a new thing
that could be left to grow massively. I think the current situation is less the
result of quality engineering vision and more the result of marketing
departments putting pressure on the development teams to come up with clever
tricks to beat the competition.

___________

P.S.: I already thought about backward compatibility considerations being
paramount. That would have meant that keeping the technical advancement
transparent to existing usage was a reason in itself, but that doesn't quite
hold water. The 16 bit to 32 bit architecture switch opened the possibility of
adopting a secondary (but slower) layer of RAM going over 64 KiB while keeping
the 16 bit manageable memory range close to the CPU, as (at least one of) the
cache layers.

~~~
obl
Unless you want programmers to explicitly manage this fast memory, the
hardware is a better place to do those optimizations than the software.

Compilers (even the high-end modern ones) very quickly become clueless about
both memory and execution profiles. Those problems are hard to reason about
statically. Is this pointer the same as this other one? Will this loop be
executed 10 times or 10^6 times? The hardware, being a dynamic optimizer, has a
lot of runtime information (pointer values, cache entry usage, ...) to take
advantage of.

Sure, one can argue that we just need better JITs, PGO, and programming
languages that are easier to analyze statically. In the current state of
affairs, we are way better off leaving this to the hardware.

By the way, you do have "some" control over it using NT (non-temporal) moves
on x86. Turns out almost nobody uses them except by hand in very specific
peak-bandwidth code. Compilers don't dare emit those, since the penalty of
getting it wrong outweighs the benefits of getting it right.
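
For anyone who hasn't seen them, here is a minimal sketch of what "by hand"
looks like, assuming an x86 target with SSE2 and a 16-byte aligned
destination. The helper name copy_nontemporal is made up for illustration;
the intrinsics (_mm_stream_si128, _mm_loadu_si128, _mm_sfence) are the
standard ones from emmintrin.h. On other targets, or with an unaligned
destination, it simply falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in _mm_sfence */
#endif

/* Hypothetical helper: copy n bytes, asking the CPU not to keep the
 * destination lines in cache. Useful only when dst will not be read
 * again soon; otherwise it is slower than a plain memcpy. */
static void copy_nontemporal(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__SSE2__)
    /* _mm_stream_si128 requires a 16-byte aligned destination. */
    if (((uintptr_t)dst & 15u) == 0) {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        size_t blocks = n / 16;

        for (size_t i = 0; i < blocks; i++)
            _mm_stream_si128(d + i, _mm_loadu_si128(s + i));

        /* NT stores are weakly ordered; fence before anyone reads dst. */
        _mm_sfence();

        /* Copy the tail (fewer than 16 bytes) the ordinary way. */
        memcpy((char *)dst + blocks * 16,
               (const char *)src + blocks * 16, n % 16);
        return;
    }
#endif
    memcpy(dst, src, n); /* portable fallback */
}
```

Whether this wins anything depends entirely on the access pattern, which is
exactly the information a compiler usually doesn't have.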

~~~
restalis
_" In the current state of affairs, we are way better off leaving this to the
hardware."_

That is something I'm not so sure about. That might have been true before,
when the hardware served the then-simple computational model of a single
high-frequency processing core well. Things have gotten more complicated since
the first CPU cache offerings, and a general use-case implementation in
hardware can only give you so much. The performance boost that once could be
relied on without much care has now dissipated. This cache thing has now become
just a leaky abstraction. We cannot abstract it away completely if we care
about performance, nor can we control it in any meaningful way. It is not a
good model anymore. Just recognize it as a failed experiment.

 _" Unless you want programmers to explicitly manage this fast memory..."_

Yes, that's exactly what I have in mind (and implied before). Most
applications do not need maximal performance and can disregard the fast memory
completely. That will only mean that those processes either will not consume
their share, leaving more of it for the processes that need it, or that the
fast memory will (occasionally) be used indirectly, through the optimized bits
in the layers that the given process relies on. This memory, however, won't be
unproductively overwritten or invalidated on every cold cache or whatnot.

------
revelation
There is already a sysfs for every CPU in the system, just add another file
with the bitmap?
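
Whatever file ends up carrying it, the bitmap itself is easy to handle from
userspace. A rough sketch, assuming the mask follows the rule Intel's cache
allocation hardware imposes (the set bits must form one contiguous run) and
assuming the file would take the mask as plain hex text. The function names
cbm_is_contiguous and cbm_format are made up for illustration and are not
part of any kernel interface.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* True if mask is non-zero and its set bits form a single contiguous run. */
static bool cbm_is_contiguous(uint64_t mask)
{
    if (mask == 0)
        return false;

    /* Isolate the lowest set bit, then add it to the mask: the carry
     * ripples through the lowest run of 1s and clears it. If any bit of
     * the original mask survives, there was a second run (a gap). */
    uint64_t lowest = mask & (~mask + 1);
    uint64_t carried = mask + lowest;
    return (carried & mask) == 0;
}

/* Render a valid mask as hex text (assumed file format). Returns the
 * number of characters snprintf reports, or -1 if the mask is rejected. */
static int cbm_format(uint64_t mask, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (!cbm_is_contiguous(mask))
        return -1;
    return snprintf(buf, len, "%" PRIx64, mask);
}
```

So 0xf0 would be accepted and written out as "f0", while 0x5 (a gap between
the set bits) would be rejected before it ever reaches the kernel.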

Or maybe we can add another pseudo-socket interface in the style of rtnetlink,
or the ioctl this one fella has proposed. Frankly, they need to take a page
from Windows and offer proper programmatic interfaces to all of this stuff.

~~~
signa11
> Frankly, they need to take a page from Windows and offer proper programmatic
> Interfaces to all of this stuff.

cgroup does provide programmatic control over this feature. perhaps you had
something more in mind. could you please elaborate? thanks!

