Yeah. I didn't really think of it as a bug at first, but I'm glad someone called it that. I wish the system would just kill the browser or low priority processes instead of freezing everything in an instant.
You can configure Linux to return ENOMEM errors. The theory is that most applications will effectively treat this as fatal and die; so, would you rather kill the app that happened to make the most recent memory allocation or would you rather kill the app using all the memory (and have some control over keeping processes like sshd from dying)?
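(For reference, the knob being described here is the vm.overcommit_memory sysctl; mode 2, together with vm.overcommit_ratio or vm.overcommit_kbytes, enforces a hard commit limit.) A minimal sketch of how the refusal actually surfaces to a program under that configuration; purely illustrative, and the chunk size is arbitrary:

    /* Sketch: allocate and touch memory in chunks until the kernel
     * refuses. With vm.overcommit_memory=2 the failure shows up here
     * as malloc() returning NULL with errno ENOMEM; with the default
     * heuristic overcommit you typically get OOM-killed while touching
     * pages instead of ever seeing NULL. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        const size_t chunk = 64 * 1024 * 1024;   /* 64 MiB per request */
        size_t total = 0;

        for (;;) {
            void *p = malloc(chunk);
            if (p == NULL) {
                fprintf(stderr, "refused after %zu MiB: %s\n",
                        total >> 20, strerror(errno));
                return 1;
            }
            memset(p, 0xAA, chunk);              /* actually back the pages */
            total += chunk;                      /* leak on purpose: it's a demo */
        }
    }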
I've tried using a VM with overcommit turned off and a modest amount of memory. Among other things, my mail reader, mutt, used more than half the system's memory when looking at my mail archive, so it couldn't fork to exec an editor to write a new mail.
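To make the mutt failure concrete: fork() has to reserve commit for a copy-on-write duplicate of the parent's entire address space, even though the child will immediately exec a small editor, so a process holding more than half of memory can't fork once overcommit is off. A hedged sketch of that pattern and the usual workaround, posix_spawn (typically vfork/CLONE_VM underneath, so no duplicate reservation); the hard-coded vi path is just for illustration:

    /* Sketch: a big process spawning a small editor. fork() must
     * reserve commit for a full CoW copy of the caller, so with
     * overcommit off it fails with ENOMEM once we hold most of memory;
     * posix_spawn() avoids the duplicate reservation. */
    #include <errno.h>
    #include <spawn.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern char **environ;

    int spawn_editor(const char *path) {
        char *argv[] = { "vi", (char *)path, NULL };
        pid_t pid = fork();

        if (pid == 0) {                       /* child: replace ourselves */
            execvp("vi", argv);
            _exit(127);
        }
        if (pid == -1 && errno == ENOMEM) {   /* the mutt situation */
            int rc = posix_spawn(&pid, "/usr/bin/vi", NULL, NULL, argv, environ);
            if (rc != 0) {
                fprintf(stderr, "posix_spawn: %s\n", strerror(rc));
                return -1;
            }
        } else if (pid == -1) {
            perror("fork");
            return -1;
        }

        int status;
        waitpid(pid, &status, 0);
        return status;
    }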
The fundamental problem here is that the kernel can't time-travel far enough into the future to make sure no higher-priority process will try to allocate more memory; it has to decide whether to return ENOMEM at the time of the allocation. There is a reasonable(-ish?) default of denying allocations that request more memory than is currently available (using the Linux definition of available, i.e. total memory minus the sum of the resident sets of all processes, give or take). Of course, this limit only works properly if other processes don't add anything to their working sets, but again, the kernel's fortune-telling abilities are quite limited.
It can't look into the future, but it certainly could use the information from the past.
Has this application already allocated 90% of the memory? Has it been steadily growing its allocation without releasing much back? Well then, why let the system run out completely? Why not stop it at, say, 10% or 5% left?
For a desktop OS that would be very unusual in my experience.
For a server, perhaps, but I'd say even for a server it would be better that, say, the core application returns a 500 Internal Server Error than that the system is forced to start killing random processes.
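The "stop it before memory runs out completely" idea a few comments up is roughly what userspace OOM daemons such as earlyoom or systemd-oomd do. A minimal sketch of the polling half, with the actual policy (what to kill, signal, or shed) left as a placeholder:

    /* Sketch: poll /proc/meminfo and act while there is still headroom,
     * in the spirit of earlyoom. take_action() is a placeholder for the
     * policy: SIGTERM the biggest grower, shed load, page someone, ... */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static long meminfo_kb(const char *key) {        /* e.g. "MemAvailable:" */
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) return -1;
        char line[128];
        long kb = -1;
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, key, strlen(key)) == 0) {
                sscanf(line + strlen(key), "%ld", &kb);
                break;
            }
        }
        fclose(f);
        return kb;
    }

    int main(void) {
        const double threshold = 0.10;               /* act at 10% left */
        for (;;) {
            long total = meminfo_kb("MemTotal:");
            long avail = meminfo_kb("MemAvailable:");
            if (total > 0 && avail >= 0 && (double)avail / total < threshold) {
                fprintf(stderr, "low memory: %ld of %ld kB left\n", avail, total);
                /* take_action(); */
            }
            sleep(5);
        }
    }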
A lot of processes don't handle ENOMEM well. Your mail client asks for memory for a buffer to write an email and it gets ENOMEM, what's it going to do? Silently fail to do anything? Pop up an error message (which will probably take memory to display)? Exit?
> Pop up an error message (which will probably take memory to display)?
I don't know how common this is on the Linux/POSIX side, but in DOS and Windows it is usual practice to allocate some amount of memory at startup and use that for error-handling code, so that things like showing error messages will not cause any more allocations.
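A rough sketch of that reserve-at-startup pattern in C; the names (g_reserve, xmalloc) and the 256 KiB size are made up for illustration, and it only helps if the reserve was actually touched so it is really backed:

    /* Sketch of the "rainy day" reserve: grab a block at startup and
     * only release it when an allocation fails, so the error path has
     * memory to work with. Names and size are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define RESERVE_BYTES (256 * 1024)

    static void *g_reserve;                  /* freed only on OOM */

    void reserve_init(void) {
        g_reserve = malloc(RESERVE_BYTES);
        if (g_reserve)
            memset(g_reserve, 0, RESERVE_BYTES);   /* force real backing pages */
    }

    void *xmalloc(size_t n) {
        void *p = malloc(n);
        if (p == NULL && g_reserve != NULL) {
            free(g_reserve);                 /* give the error path room */
            g_reserve = NULL;
            fprintf(stderr, "out of memory requesting %zu bytes\n", n);
            /* save drafts / state here, then exit or retry */
            exit(1);
        }
        return p;
    }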
You can configure your kernel for that, but you can't configure most software written for Linux not to make absurd allocations.
It's also hard to get a good idea of how much memory a program is really using, which makes setting reasonable limits for things tricky.
The best you can do is to have a small swap space and alert when it gets to 50% full, then fix whatever is growing. But then you have the problem of filesystem pages evicting anonymous pages to swap, which ruins the utility of swap usage as a gauge and/or drives you to configure way more swap than is reasonable. Maybe a bigger swap space with alerts on the swap I/O rate would be OK, except that it's awfully painful to have swap of even 0.5x RAM when you have 1 TB of RAM.
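If you do go the "alert on swap I/O rate" route, the counters are the cumulative pswpin/pswpout fields in /proc/vmstat; a rough sketch, with an arbitrary threshold and interval:

    /* Sketch: alert on the swap I/O *rate* rather than swap usage,
     * using the cumulative pswpin/pswpout page counters in /proc/vmstat. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static long vmstat_counter(const char *key) {    /* "pswpin" / "pswpout" */
        FILE *f = fopen("/proc/vmstat", "r");
        if (!f) return 0;
        char line[128];
        long val = 0;
        while (fgets(line, sizeof line, f)) {
            size_t n = strlen(key);
            if (strncmp(line, key, n) == 0 && line[n] == ' ') {
                sscanf(line + n, "%ld", &val);
                break;
            }
        }
        fclose(f);
        return val;
    }

    int main(void) {
        const long alert_pages_per_sec = 1000;       /* tune for your box */
        long prev = vmstat_counter("pswpin") + vmstat_counter("pswpout");
        for (;;) {
            sleep(10);
            long cur = vmstat_counter("pswpin") + vmstat_counter("pswpout");
            if ((cur - prev) / 10 > alert_pages_per_sec)
                fprintf(stderr, "swap I/O at %ld pages/s\n", (cur - prev) / 10);
            prev = cur;
        }
    }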
IMO, that's exactly what it should do. The time when RAM was expensive is gone. Hard-drive swap was always more than a bit of a kludge. So the OS needs to list the available options to the user (they -should- choose), and tell apps "you're SOL, so do whatever you need to do."
It can. System-wide it's one extreme or the other. With overcommit enabled, you'll pretty much never get refused. With overcommit disabled, you'll get refusals as soon as you reach the commit limit, which means a lot of allocated-but-never-touched pages wasting space.
The middle ground is your own config unfortunately - cgroups can limit available memory, but you'll have to set it up by hand.
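The by-hand setup is mostly writing a couple of files under cgroup v2. A sketch, assuming the unified hierarchy is mounted at /sys/fs/cgroup, the memory controller is enabled in the parent's cgroup.subtree_control, and you have the privileges; the group name, limits, and "myserver" are placeholders:

    /* Sketch: capping a workload with cgroup v2, by hand. memory.max is
     * the hard limit (hit it and the group gets reclaimed/OOM-killed
     * locally), memory.high throttles before that. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int write_file(const char *path, const char *val) {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return -1; }
        fprintf(f, "%s\n", val);
        return fclose(f);
    }

    int main(void) {
        mkdir("/sys/fs/cgroup/demo", 0755);
        write_file("/sys/fs/cgroup/demo/memory.high", "1500M");  /* soft cap */
        write_file("/sys/fs/cgroup/demo/memory.max", "2G");      /* hard cap */

        /* Move ourselves into the group, then exec the real workload. */
        char pid[32];
        snprintf(pid, sizeof pid, "%d", (int)getpid());
        write_file("/sys/fs/cgroup/demo/cgroup.procs", pid);

        execlp("myserver", "myserver", (char *)NULL);
        perror("execlp");
        return 1;
    }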
An operating system should never overallocate memory because one cannot build reliable applications and infrastructure on top of a kernel which is lying to the application.
Windows will never overcommit memory. As a result on my PC unless I dedicate a substantial (over 20%) fraction of my drive to swap (most of which will never, ever be touched) I will 'run out of RAM' far before even half of the physical RAM in my PC has been used. This seems extremely wasteful.
> one cannot build reliable applications and infrastructure on top of a kernel which is lying to the application.
You cannot make reliable systems thinking like that. The point of resilience is __not__ to rely on the correctness of the kernel's advertised behavior, nor on the correctness of your assumptions about it.
Actually, I can and have: I've engineered ultra-reliable production systems that ran for over ten years without issues. I'd come back to work at a former employer and the system would still be there, serving. This was not an isolated scenario.
One cannot build highly available systems and networks on unreliable, lying software. Basic guarantees are required. If one has 10,000 systems and one loses power on even a portion of such lying systems without basic guarantees, no amount of distribution will guarantee data consistency. I'm no stranger to building very large distributed clusters, but those builds start with an operating system and software which do not overcommit and which are paranoid about data integrity and correctness of operation. In fact, I'm specialized in designing such networks and systems, from hardware to storage all the way up to application software.
Mapped memory is part of that lying. If you want to use efficient mmap of huge files without lots of manual bookkeeping, you don't want the system to actually allocate as much memory as you specify. With overcommit off you're setting aside memory you're almost never going to use.
Many databases rely on the overcommit being possible.
Partially. Without overcommit the system has to guarantee there's enough backing memory for you to modify the whole file before anything gets written back to the drive.
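That is how Linux's commit accounting treats file mappings, for what it's worth: a shared (or read-only) file-backed mapping isn't charged against the commit limit, because its pages can always be dropped or written back, while a private writable mapping of the same file is charged in full up front, since in the worst case every page gets a copy-on-write duplicate that only anonymous memory or swap can back. A small sketch (huge.dat is a placeholder):

    /* Sketch: commit accounting for a large file mapping. The shared
     * read-only mapping costs no commit; the private writable mapping
     * of the same file is charged for its full size, even if only a
     * few pages ever get modified. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("huge.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        /* Cheap even with overcommit off: backed by the page cache,
         * clean pages can be dropped at any time. */
        void *ro = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);

        /* Charged in full against the commit limit: every page might
         * need an anonymous CoW copy. Often fails with ENOMEM under
         * vm.overcommit_memory=2 if the file is big enough. */
        void *rw = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE, fd, 0);

        if (ro == MAP_FAILED) perror("mmap shared");
        if (rw == MAP_FAILED) perror("mmap private");
        return 0;
    }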
There's the rub. Different people in different situations will want different outcomes from a low-memory situation. One of those different outcomes has to be the default.
Actually, most Linux users are blissfully unaware that Linux overallocates memory, and of those that know about it, very few understand the consequences for applications (such as data corruption or data loss). I doubt anyone who fully understood the consequences would want to run their infrastructure on something so unreliable; to me it's only logical that they wouldn't.
You've got it backwards. Those of us who care about reliability would never rely on the reliability of a single system to safely store data. And overallocating memory is not even remotely that important compared to everything else that can go wrong on a system.
IPv4 on its own is not reliable, but TCP adds reliability on top of it. Aren't distributed systems in general also designed to be reliable despite being composed of unreliable components?