
PSI: Pressure stall information for CPU, memory, and IO - henridf
https://lwn.net/Articles/759658/
======
Scaevolus
Handling OOM livelocks is exciting, and they have a good explanation for why
the current OOM-killer fails:

> One usecase is avoiding OOM hangs/livelocks. The reason these happen is
> because the OOM killer is triggered by reclaim not being able to free pages,
> but with fast flash devices there is _always_ some clean and uptodate cache
> to reclaim; the OOM killer never kicks in, even as tasks spend 90% of the
> time thrashing the cache pages of their own executables. There is no
> situation where this ever makes sense in practice.
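
PSI exposes this as /proc/pressure/memory, so a userspace daemon can watch
the "full" line and start killing before the livelock sets in. A minimal
sketch of that policy, assuming the some/full avg10/avg60/avg300 file format
from the series (the 10% threshold and 5-second poll are arbitrary
illustrations):

    # Poll the "full" line of /proc/pressure/memory; if tasks were
    # fully stalled on memory for >10% of the last 10 seconds, react.
    while sleep 5; do
        avg10=$(awk '/^full/ { sub("avg10=", "", $2); print $2 }' /proc/pressure/memory)
        if awk -v p="$avg10" 'BEGIN { exit !(p > 10) }'; then
            echo "memory pressure full avg10=${avg10}%: shed load or kill"
        fi
    done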

------
SomeHacker44
Interesting!

After reading it, I realized I was actually hoping for information at a lower
level than the VM for memory pressure: live, actionable information about
DRAM bandwidth usage and delays caused by the hardware memory system (TLBs,
L1/L2/L3 caches, main-memory contention), etc. I have not found existing
tools insufficient for monitoring/dealing with VM swapping - OTOH I usually
seek to keep that at zero and leave a little swap just to allow for some
chance of alerting and recovery before the OOM killer kicks in.

~~~
grandmczeb
Can you give a little more detail about the kind of information you’d like
from the hardware?

~~~
Filligree
How much time does each CPU core spend running/waiting for L1/L2/L3/DRAM? How
often are those stalls due to cross-core contention for the same bit of
memory? Which execution units are in use? What's the limiting factor on
throughput?

That's what I can think of in the first two minutes, anyhow. It all comes down
to the last one.

~~~
zlynx
Intel CPUs provide a crazy amount of performance counters. I bet that the
numbers you want are in there somewhere. Look into the perf tool.
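
For example, the generic stall and cache-miss event aliases get at some of
this (whether each alias is actually wired up depends on the CPU model and
kernel version):

    # Per-core view of cycles lost to stalls and cache misses,
    # system-wide for 10 seconds. Event support varies by CPU.
    perf stat -a --per-core \
        -e cycles,stalled-cycles-frontend,stalled-cycles-backend \
        -e cache-misses,LLC-load-misses,dTLB-load-misses \
        -- sleep 10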

~~~
Filligree
Alas, I'm using AMD.

------
zokier
Digging through the LKML thread, this appears to be the corresponding userland
component for the OOM use-case:

[https://github.com/facebookincubator/oomd](https://github.com/facebookincubator/oomd)

There was also a more minimal proof-of-concept example posted by the Endless
OS guys:

[https://gist.github.com/dsd/a8988bf0b81a6163475988120fe8d9cd](https://gist.github.com/dsd/a8988bf0b81a6163475988120fe8d9cd)

------
teddyh
Sounds good. The next step would be to start using this instead of load
average in all the appropriate places, like batch(1), etc.
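
For instance, a batch wrapper could gate on the CPU pressure file instead of
the one-minute load average. A rough sketch (the 5% threshold and job name
are made up):

    # Run a queued job only when recent CPU pressure is low, in the
    # spirit of batch(1)'s load-average gate.
    some=$(awk '/^some/ { sub("avg10=", "", $2); print $2 }' /proc/pressure/cpu)
    if awk -v p="$some" 'BEGIN { exit !(p < 5) }'; then
        ./my-batch-job    # placeholder for the actual queued work
    fi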

------
everybodyknows
Curious that there is no mention of the existing "memory" cgroup. On some
desktop Linux systems, you'll find it here:

    ls -l /sys/fs/cgroup/memory/

The 000-permission 'memory.pressure_level' file controls asynchronous
notifications to apps, advising prompt shedding of load: a process registers
an eventfd against it through 'cgroup.event_control' and is woken when
pressure crosses the low, medium, or critical level. This is apparently the
mechanism alluded to in a Googler's recent blog post, written from the point
of view of Go server coding:
[https://news.ycombinator.com/item?id=17551012](https://news.ycombinator.com/item?id=17551012)

------
politician
I'm happy to see a new take on trying to produce a meaningful load metric.

~~~
lolc
Especially as it promises to disentangle IO and CPU load.
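
The series exposes one file per resource, so CPU contention, memory
thrashing, and IO waits each show up separately. A quick look, assuming the
/proc/pressure layout from the patches:

    # One pressure file per resource, each reporting stall
    # percentages over 10s/60s/300s windows.
    head /proc/pressure/cpu /proc/pressure/io /proc/pressure/memory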

------
cjhanks
I have long looked for an efficient metric for measuring VM pressure. Hope to
see this, or something like it, merged.

