This has two effects:
(a) malloc() never returns NULL. It always returns a valid address, even though your system may be out of memory.
(b) by the time the kernel finds out that it's run out of physical pages, your process is already trying to use that memory... which means there's no way for your process to cope gracefully. You have to trust the kernel to do the right thing (either to scavenge a page from elsewhere, or to kill a process to make space). If you're very lucky, it'll send you a SIGSEGV...
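A minimal sketch of what that looks like in practice (my own illustration, assuming a default Linux overcommit setting and a 64-bit build): the allocation call succeeds, and trouble only arrives when the pages are actually touched.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        size_t huge = (size_t)8 << 30;      /* 8 GiB, likely more than RAM + swap */
        char *p = malloc(huge);
        if (p == NULL) {                    /* with overcommit this rarely triggers */
            fprintf(stderr, "malloc failed\n");
            return 1;
        }
        printf("malloc succeeded, now touching the pages...\n");
        memset(p, 1, huge);                 /* faults pages in; OOM killer may strike here */
        printf("survived\n");
        return 0;
    }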
This is not always true. For example, you can use ulimit to set an upper bound on the amount of address space the process can consume. Also, on a 32-bit system it is easy to imagine a process trying to allocate over 4G of memory and running out of address space.
jes@themisto:~$ cc -o alloc alloc.c
jes@themisto:~$ ulimit -v 100000
jes@themisto:~$ ./alloc
malloc returned NULL after allocated 94227K
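For reference, alloc.c is presumably something along these lines (my reconstruction, not the poster's actual code): allocate in 1K blocks, touch each one, and report how far we got when malloc() finally gives up under the ulimit.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        size_t kbytes = 0;
        for (;;) {
            char *p = malloc(1024);
            if (p == NULL) {              /* the address-space limit makes this happen */
                printf("malloc returned NULL after allocating %zuK\n", kbytes);
                return 0;
            }
            memset(p, 0, 1024);           /* touch the block so it is really backed */
            kbytes++;
        }
    }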
"oh yes it does"
"oh no it doesn't"
demo of malloc failing
"alloca() won't fail!"
Stop moving the goalposts. Besides, alloca() tends to crash your program a lot sooner as stack allocation is usually very limited, especially in a shared-memory multithreading context. Hardly anyone uses alloca().
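To illustrate (my sketch, not from the thread): the default stack is typically only a few megabytes, so a large alloca() blows the stack long before the heap would be exhausted.

    #include <alloca.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        size_t big = 64UL * 1024 * 1024;   /* 64 MB, far beyond a typical 8 MB stack */
        char *p = alloca(big);             /* "succeeds": it just moves the stack pointer */
        memset(p, 0, big);                 /* touching the pages overruns the stack: SIGSEGV */
        printf("still alive\n");           /* normally never reached */
        return 0;
    }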
One reason why overcommit is popular is because of fork(). Imagine a process that does malloc(lots of memory), then forks and execs a tiny program (/bin/true or something like it). If the fork() call succeeds, this means the OS has guaranteed that all the memory in the child process is available to be written over. i.e. it has had to allocate 'lots of memory' x 2 in total, even if only 'lots of memory' x 1 will actually be used.
Without overcommit, fork() can fail even if the system won't ever use anywhere close to the limit of RAM+swap space.
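A hedged sketch of that scenario: the parent holds a large allocation, then fork()s just to exec /bin/true. With strict accounting the fork() has to be charged for a full copy of that allocation, even though the child never touches it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        size_t lots = (size_t)1 << 30;     /* 1 GiB, "lots of memory" */
        char *big = malloc(lots);
        if (!big) { perror("malloc"); return 1; }
        memset(big, 1, lots);              /* make sure it is really backed */

        pid_t pid = fork();                /* strict accounting must promise 1 GiB x 2 here */
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) {
            execl("/bin/true", "true", (char *)NULL);
            _exit(127);                    /* only reached if exec failed */
        }
        waitpid(pid, NULL, 0);
        return 0;
    }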
That's a non-sequitur.
If you don't check for NULL return, you have undefined behaviour--that is: your program might end up doing just about anything. If the OOM killer kills you, all visible effects of your program will still be perfectly consistent with the semantics of your program. And you have to be able to safely deal with that scenario anyhow, as much more fundamental resources such as electricity might run out at any time as well.
Imagine your program is processing a folder of emails. If the program is halted part-way through (or if you yank the power cable out) then the mail folder will be left in an inconsistent state.
If the program runs on an OS that does not overcommit memory, it is possible to write code that checks every malloc() and, if it hits a memory limit, shuts down gracefully, fixing the mail folder so that its state is correct.
If the OS overcommits, then even if you checked every malloc() call, your program might die at any time because of a SIGSEGV or the OOM killer nuking it. There's no way to tidy up an incomplete run.
It's nothing to do with undefined behaviour, or dereferencing NULL pointers.
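A minimal sketch of what that checking looks like (repair_mail_folder() is a hypothetical cleanup routine of mine, not code from the thread): on a strict-accounting OS, the NULL return is the one point where the program can still put the folder back into a consistent state before exiting.

    #include <stdio.h>
    #include <stdlib.h>

    static void repair_mail_folder(void) {
        /* hypothetical: roll back the half-finished changes on disk */
        fprintf(stderr, "restoring mail folder to a consistent state\n");
    }

    static void *checked_malloc(size_t n) {
        void *p = malloc(n);
        if (p == NULL) {
            repair_mail_folder();          /* graceful shutdown path */
            fprintf(stderr, "out of memory, exiting cleanly\n");
            exit(EXIT_FAILURE);
        }
        return p;
    }

    int main(void) {
        char *buf = checked_malloc(1 << 20);   /* 1 MB working buffer */
        /* ... do the actual mail processing with buf ... */
        free(buf);
        return 0;
    }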
However, if you don't check for allocation failures, you get undefined behaviour, which you indeed cannot handle safely ... other than by avoiding the undefined behaviour in the first place by checking for allocation failures.
Also, if you check for allocation failures, you won't get a SIGSEGV. SIGSEGV is for invalid virtual addresses, not for lack of resources.
Allocation failures can and do occur outside of malloc() when overcommit is the memory policy. Malloc might return memory that it believes is valid, but when you try to write to it later on, the OS discovers that it has no free space in the swap and no spare pages to evict. Result: process death (SIGSEGV? SIGBUS? not sure, but it doesn't matter)
Did anyone claim otherwise?
> Whether that is a SIGSEGV or some other signal, or just the heavy boots of the OOM killer nuking your process, doesn't really matter.
It does, because you can catch SIGSEGV.
> The key point is that your program cannot handle them. It is impossible to handle every case.
It can, and it has to, if you don't want it to be defective. (Where "handling" does not mean "continue execution", but "don't corrupt persistent state"--you cannot continue execution with insufficient resources anyway, there is no way around that).
> Result: process death (SIGSEGV? SIGBUS? not sure, but it doesn't matter)
SIGKILL. See above.
Have a system call that allocates memory to a process or process group, but doesn't assign it. This call will fail if overall memory usage is too high.
When the process tries to get memory, it will only tap into that reserved allocation if the system is out of memory.
The OOM killer will never kill processes that have reserved spare memory.
The result? Long-running daemons with stable memory usage can be relatively easily protected against the OOM killer. Ideally the OOM killer never runs amok. But in the real world, it would be nice to be able to limit the damage if it has, and guarantee a usable shell for a superuser while it is going on.
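Roughly, the proposed interface might look like this (purely hypothetical; nothing like it exists today):

    #include <stddef.h>
    #include <sys/types.h>

    /* Hypothetical: set aside 'bytes' of emergency memory for a process
       (or its process group). Fails up front if overall memory usage is
       already too high, instead of failing at page-fault time later. */
    int mem_reserve(pid_t pid, size_t bytes);

    /* Hypothetical: give the reservation back. */
    int mem_release(pid_t pid, size_t bytes);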
Also, if memory is guaranteed to be available at a later point, it cannot really be used for anything, as there is nowhere to put the contents of that memory the moment the process it's reserved for wants to use it.
But there are two APIs that can be used to implement the same goal: Linux has a procfs setting "oom_score_adj" that can be used to decrease the risk of a particular process being killed (commonly used for sshd for obvious reasons) and there are the mlock()/mlockall() syscalls that you can use to make sure some address space is backed by actual RAM rather than potentially swap.
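For example (a minimal sketch using those two existing mechanisms, error handling abbreviated): a daemon can lower its own OOM score via /proc/self/oom_score_adj (negative values need privileges; -1000 exempts it entirely) and pin its pages with mlockall().

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        /* Make this process an unattractive OOM-killer target. */
        int fd = open("/proc/self/oom_score_adj", O_WRONLY);
        if (fd >= 0) {
            const char *adj = "-1000";     /* requires root/CAP_SYS_RESOURCE */
            if (write(fd, adj, strlen(adj)) < 0)
                perror("write oom_score_adj");
            close(fd);
        }

        /* Pin current and future allocations into physical RAM. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");

        pause();                           /* stands in for the daemon's real work */
        return 0;
    }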
It is true that the reserved memory could not be used for anything. This is a feature. The memory has been reserved for emergencies, and will keep part of the system usable when everything else goes belly up.
The two alternate methods that you specify are not as useful.
With oom_score_adj I cannot have a shell that can be logged in to and remain entirely usable during a fork bomb. Yes, you can ssh in. But good luck running arbitrary commands.
With mlock()/mlockall() I can guarantee that a process is responsive during heavy memory pressure, but good luck if it needs to allocate more memory.
And neither system makes it particularly easy to set things up so that the OOM killer will avoid killing all daemons whose memory usage remained stable during memory pressure. (And that is the biggest problem with the OOM killer, that it killed random things you'd have preferred stayed up.)
But that's due to PID exhaustion, not due to memory exhaustion.
> With mlock()/mlockall() I can guarantee that a process is responsive during heavy memory pressure, but good luck if it needs to allocate more memory.
Well, good luck if the process needs more memory than it had reserved in your scheme. Obviously, you have to mlock() sufficient memory for peak need.
> And neither system makes it particularly easy to set things up so that the OOM killer will avoid killing all daemons whose memory usage remained stable during memory pressure. (And that is the biggest problem with the OOM killer, that it killed random things you'd have preferred stayed up.)
No, it doesn't kill random things, it kills things that are easy to reconstruct (little CPU use so far) and that have lots of memory allocated, and where the oom_score_adj doesn't tell it to spare the process for other reasons.
As for the OOM killer, its logic has changed over time. All that I definitely know is that if a slow leak in application code causes Apache processes to grow too quickly over time, it is usually a good idea to reboot the machine because you never know what random daemon got killed before the actual culprit.
Of course you also need to fix application code...
The logic that is applied probably works well in the desktop/developer case where what went wrong probably went wrong recently and there is a person who can notice. That isn't the context where I've usually encountered it.
Your point being? That it's untechnically correct?
> The solutions you mention are non-trivial. It is really easy to make a mistake. While it may not be the operating systems fault at the end of the day, designing a system like this is setting up the overall experience for failure. A well designed system should make it easy to get right and not the other way around.
Which is all kind of true, but doesn't change that those APIs are the way they are, for historical reasons. Just because some API is broken doesn't mean your code will work correctly when you pretend it's not broken.
edit: just to be sure: yes, the special-casing of zero-sized allocations is somewhat of a bug, which we still have to live with. Other than that, the API is actually perfectly fine--if you don't check for allocation failures and you can't handle it if your program gets interrupted at any point, that's just a bug in your code; there is no useful general way to abstract those problems away.
That isn't the case anymore, at least on Linux and BSD. fork() uses a copy-on-write scheme so the OS only allocates/copies the parent memory space if the child attempts to write to it.
The point was that after you fork() in a copy-on-write scheme, the OS has now promised that more memory is available to write on than may actually exist. If the OS avoided overcommitting, it would have to reserve lots of memory right there and then (without necessarily writing to it) just to be sure that you wouldn't run out at a later date.
This is only true on Linux. FreeBSD, for example, has partial overcommit by default and WILL start returning NULL after overcommitting by some percentage of memory (I think 30% or so by default).
You can also enable behaviour like you describe by setting overcommit_memory=2 together with either a ratio (overcommit_ratio, default 50) or an exact number of kbytes (overcommit_kbytes).
Unless of course you run out of virtual address space.
It's really annoying, your application may get killed without any way to react, even to just print a "Low memory" error.
You can run out of both physical memory and swap space. Or your system is swapping so heavily that, for all practical purposes, it becomes unusable.
I should probably add this to my list of non-security reasons (or at least an angle on one) for using a Nizza-style architecture, where critical functionality is taken out of the Linux part to run on a microkernel or a small runtime. Maybe there's even potential to go further and have such apps monitor for this sort of thing, with a signal to applications that works across platforms. Not just this design flaw, but potentially others where the app doesn't have the necessary visibility. What do you think?