
I've heard that this problem is caused by Linux's overcommitting strategy. Basically, an initial memory allocation never fails (unless you set special flags), but no memory is actually allocated on the spot; memory is only allocated when it is accessed. And if Linux runs out of memory when a program accesses a piece of yet-to-be-allocated memory, it will try really _really_ hard to free up memory so that the access can succeed.

That's what's causing the lock ups.
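
Just to illustrate the deferred-allocation part (this is a hedged sketch, not from the comment above; the 8 GiB figure is arbitrary): the malloc itself is granted almost unconditionally, and the physical pages only get claimed when you write to them, which is exactly where the reclaim pressure hits.

    /* Sketch: allocate a big buffer, then touch it page by page.
     * With Linux's default overcommit the malloc usually succeeds even if
     * free RAM is smaller; pages are only backed on first write. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t sz = (size_t)8 * 1024 * 1024 * 1024;   /* 8 GiB, an arbitrary size */
        char *p = malloc(sz);                         /* typically granted on the spot */
        if (!p) { perror("malloc"); return 1; }
        puts("allocation granted, touching pages...");
        for (size_t i = 0; i < sz; i += 4096)         /* fault in one page at a time */
            p[i] = 1;                                 /* this is where the stalls can start */
        puts("done");
        free(p);
        return 0;
    }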

Sounds to me like this would be difficult to fix without breaking backward compatibility.

In the meantime, you can probably improve your quality of life quite a bit by using something like: https://github.com/facebookincubator/oomd




It's more complex than that. Lazy allocation isn't the problem in itself; it's a common optimization. The problem comes when Linux allows programs to (lazily) allocate a total amount of memory that is larger than the available RAM+swap before it starts failing allocations. Then, when processes actually try to use that memory, there is no physical memory to back it, and the only way out is to kill a process (OOM).

This may certainly seem stupid at first sight. I don't remember the exact reason why Linux does this, but I recall the argument being that without it you wouldn't use all available RAM efficiently and allocations would start failing earlier than expected, or something like that.

It's actually pretty easy to change this behaviour: there is a sysctl (/proc/sys/vm/overcommit_memory) that defaults to 0, but you can disable the overcommitting behaviour and even tune it. Setting it to "2" disables the overcommitting logic entirely, and that's what some people use to avoid memory thrashing situations (though you can still hit the OOM killer in some cases IIRC): https://www.kernel.org/doc/html/latest/vm/overcommit-account...
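
A small hedged sketch of the difference (not from the linked doc; the 64 GiB figure is arbitrary, pick something bigger than your RAM+swap): report the current mode, then request more memory than the machine can back. Under mode 2 (strict accounting) the request is refused up front with ENOMEM instead of being granted and OOM-killed later.

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        /* read the current overcommit mode */
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
        int mode = -1;
        if (f) { if (fscanf(f, "%d", &mode) != 1) mode = -1; fclose(f); }
        printf("vm.overcommit_memory = %d\n", mode);

        /* request far more than the machine can back */
        void *p = malloc((size_t)64 * 1024 * 1024 * 1024);
        if (!p)
            printf("refused up front: %s\n", strerror(errno));
        else
            printf("granted; pages would only be backed when touched\n");
        free(p);
        return 0;
    }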


Overcommit is needed when a large process fork-execs a smaller process. If overcommit is disabled then forking a large process will fail even if it would be safe in practice. A proper implementation of spawn() could fix this but that's not the Unix way.
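
Rough illustration of that failure mode (the 2 GiB size and the "true" helper are just placeholders): a process with a large dirty heap forks only to exec something tiny. fork() nominally needs a full copy of the address space, so under strict accounting it can fail with ENOMEM even though the child would have exec'd and dropped that copy immediately.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        size_t sz = (size_t)2 * 1024 * 1024 * 1024;
        char *heap = malloc(sz);
        if (heap) memset(heap, 1, sz);      /* actually dirty the pages */

        pid_t pid = fork();                 /* needs another 2 GiB of commit charge */
        if (pid < 0) {
            perror("fork");                 /* ENOMEM here despite the tiny child */
            return 1;
        }
        if (pid == 0) {
            execlp("true", "true", (char *)NULL);   /* child replaces itself right away */
            _exit(127);
        }
        waitpid(pid, NULL, 0);
        free(heap);
        return 0;
    }

For what it's worth, posix_spawn() as implemented in glibc sidesteps this by using a vfork-style clone, so the child shares the parent's address space until the exec and no second commit charge is needed.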


Turning off overcommitting can break certain applications that rely on it. For example, the address sanitizer reserves a huge address space as shadow memory.

I do not think overcommitting is the problem. I believe the problem is that Linux won't allow memory accesses to fail. It gets stuck in a loop trying to free up memory, and eventually triggers the OOM killer.

It could have just let the memory access fail.


.. or swap?


or both!?



