Plus, of course, a lot of modern systems use memory overcommit --- if you ask the OS for memory, it gives you uninitialised address space, and it only allocates pages of physical RAM the first time you touch each page of address space.

This has two effects:

(a) malloc() never returns NULL. It always returns a valid address, even though your system may have run out of memory.

(b) by the time the kernel finds out that it's run out of physical pages, your process is already trying to use that memory... which means there's no way for your process to cope gracefully. You have to trust the kernel to do the right thing (either to scavenge a page from elsewhere, or to kill a process to make space). If you're very lucky, it'll send you a SIGSEGV...
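
A quick way to see this for yourself (a hypothetical sketch, assuming Linux with the default overcommit settings; the 8 GiB figure is made up, pick something larger than your RAM+swap): the malloc() succeeds instantly, and the trouble only starts once the pages are actually touched.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(void) {
      size_t sz = (size_t)8 << 30;      /* 8 GiB -- adjust to exceed RAM + swap */
      char *p = malloc(sz);
      if (p == NULL) {                  /* rarely taken under overcommit */
          fprintf(stderr, "malloc failed\n");
          return 1;
      }
      printf("malloc succeeded, touching pages...\n");
      memset(p, 1, sz);                 /* faults in real pages; the OOM killer may strike here */
      printf("survived\n");
      free(p);
      return 0;
  }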




> (a) malloc() never returns NULL.

This is not always true. For example, you can use ulimit to set an upper bound on the amount of address space the process can consume. Also, on a 32-bit system it is easy to imagine a process trying to allocate over 4G of memory and running out of address space.


With overcommit you will only get NULL from malloc() when you run out of address space (or of the subset of address space that malloc uses). When you overrun the address-space ulimit, you are killed by some signal.


That's not right: http://pastebin.com/aA7dasDx

  jes@themisto:~$ cc -o alloc alloc.c
  jes@themisto:~$ ulimit -v 100000
  jes@themisto:~$ ./alloc
  malloc returned NULL after allocated 94227K
EDIT: Ubuntu 15.04, x86_64, Linux 3.19
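
(In case the pastebin link rots: alloc.c was presumably something along these lines -- a guess at the original, not the actual code.)

  /* alloc.c -- guessed reconstruction, not the original pastebin */
  #include <stdio.h>
  #include <stdlib.h>

  int main(void) {
      size_t kb = 0;
      for (;;) {
          if (malloc(1024) == NULL) {
              printf("malloc returned NULL after allocated %zuK\n", kb);
              return 0;
          }
          kb++;
      }
  }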


However, ulimit will only affect your process; if other processes consume enough RAM, you can still run out of pages while under the influence of ulimit.


OK, mmap() will fail and thus malloc() will return NULL. The point is that you cannot rely on that. For example, try what happens with a recursive function that calls alloca(1024).


Sorry, how is this relevant to heap allocation? alloca() just allocates memory off the stack - it's literally just bumping the stack pointer by the number of bytes you requested (and hence it's freed when the stack pointer is reset on returning from the function call). alloca() should only fail when your current thread runs out of stack space - which will generally be only a few megabytes. The total stack for a new thread is allocated from the heap at its creation, not during its execution, and certainly not as part of a call to alloca().
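
For what it's worth, alloca() has no failure return at all; here is a sketch of the suggested experiment (hypothetical, and it ends in a crash rather than a nice error):

  #include <alloca.h>
  #include <stdio.h>

  /* Grab 1K of stack per call and recurse. There is no NULL to check:
     once the (typically ~8 MB) stack is exhausted the process just faults. */
  static void recurse(unsigned long depth) {
      char *buf = alloca(1024);
      buf[0] = (char)depth;                 /* touch it so it isn't optimised away */
      if (depth % 1024 == 0)
          printf("depth %lu (~%lu KB of stack used)\n", depth, depth);
      recurse(depth + 1);
  }

  int main(void) {
      recurse(1);
      return 0;
  }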


"malloc() won't fail"

"oh yes it does"

"oh no it doesn't"

demo of malloc failing

"alloca() won't fail!"

Stop moving the goalposts. Besides, alloca() tends to crash your program a lot sooner as stack allocation is usually very limited, especially in a shared-memory multithreading context. Hardly anyone uses alloca().


As other people are commenting, malloc can still return NULL, but the critical point here is that even if you handle the NULL cases, you will still miss the other situations where your process runs out of memory. In short, if the OS over-commits RAM, it is impossible for a program to safely handle out of memory situations.

One reason why overcommit is popular is because of fork(). Imagine a process that does malloc(lots of memory), then forks and execs a tiny program (/bin/true or something like it). If the fork() call succeeds, this means the OS has guaranteed that all the memory in the child process is available to be written over. i.e. it has had to allocate 'lots of memory' x 2 in total, even if only 'lots of memory' x 1 will actually be used.

Without overcommit, fork() can fail even if the system won't ever use anywhere close to the limit of RAM+swap space.
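
A hypothetical sketch of that scenario (the 2 GiB figure is made up): with strict accounting the fork() has to be charged the full size of the parent's heap, even though the child exec()s immediately and never touches it.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void) {
      size_t sz = (size_t)2 << 30;          /* "lots of memory" */
      char *big = malloc(sz);
      if (big == NULL) return 1;
      memset(big, 0, sz);                   /* actually use it, so it is real RAM */

      pid_t pid = fork();                   /* strict accounting must reserve another 2 GiB here */
      if (pid < 0) {
          perror("fork");                   /* ENOMEM without overcommit, though nothing is ever copied */
      } else if (pid == 0) {
          execl("/bin/true", "true", (char *)NULL);
          _exit(127);
      } else {
          waitpid(pid, NULL, 0);
      }
      free(big);
      return 0;
  }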


I'm with you on the no-point-checking-for-malloc-fail issue. It has to be done in every place, or it is rarely effective. My old colleague Mike Rowe used to call it "Putting an altimeter on your car. So if you drive over a cliff, you can see how far it is to the ground."


Definitely. In practice, it's impossible to do correctly. Even if your own code gets every malloc() correct (and also properly checks the return code of every syscall and library call that might fail due to lack of memory), you still have to trust that every library you use is also perfectly written and handles its memory failures just as perfectly. It'll never happen.


There's just one place that it makes sense: for very large buffers. If they fail, it doesn't mean the system is doomed. So it doesn't hurt to check that video-frame buffer alloc, or that file-decompression buffer. But for any of the small change (anything within orders of magnitude of the mean allocation size), it's pointless.
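
i.e. something like this (a sketch; the frame dimensions and recovery strategy are made up):

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Worth checking: one huge allocation whose failure can be reported and
     recovered from (drop the frame, fall back to streaming, etc.). */
  uint8_t *alloc_frame(size_t width, size_t height) {
      uint8_t *frame = malloc(width * height * 4);   /* e.g. 3840x2160 RGBA ~ 33 MB */
      if (frame == NULL)
          fprintf(stderr, "can't buffer a %zux%zu frame, skipping it\n", width, height);
      return frame;
  }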


> In short, if the OS over-commits RAM, it is impossible for a program to safely handle out of memory situations.

That's a non-sequitur.

If you don't check for NULL return, you have undefined behaviour--that is: your program might end up doing just about anything. If the OOM killer kills you, all visible effects of your program will still be perfectly consistent with the semantics of your program. And you have to be able to safely deal with that scenario anyhow, as much more fundamental resources such as electricity might run out at any time as well.


You're splitting hairs.

Imagine your program is processing a folder of emails. If the program is halted part-way through (or if you yank the power cable out) then the mail folder will be left in an inconsistent state.

If the program runs in an OS that does not overcommit memory, it is possible to write code that checks every malloc() and, if it hits a memory limit, you could shut down gracefully, fixing the mail folder so that its state is correct.

If the OS overcommits, then even if you checked every malloc() call, your program might die at any time because of a SIGSEGV or the OOM killer nuking it. There's no way to tidy up an incomplete run.

It's nothing to do with undefined behaviour, or dereferencing NULL pointers.


If you leave something in a corrupted state when execution is aborted at some random point, that means that your code (and possibly your on-disk data structure) is defective. It is perfectly possible to handle that case safely (using transactions, journaling, rollback, atomic replace, ...).
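
The classic write-temp-then-rename pattern is one example (a sketch with abbreviated error handling): readers see either the old file or the complete new one, never a half-written state, no matter where the writer gets killed.

  #include <stdio.h>
  #include <unistd.h>

  /* Atomic replace: write the new contents to a temp file, then rename() it
     over the original. rename() is atomic on POSIX filesystems, so a crash
     (OOM kill, power loss, ...) at any point leaves old or new, not a mix. */
  int save_atomically(const char *path, const char *data, size_t len) {
      char tmp[4096];
      snprintf(tmp, sizeof tmp, "%s.tmp", path);

      FILE *f = fopen(tmp, "w");
      if (!f) return -1;
      if (fwrite(data, 1, len, f) != len || fflush(f) != 0 || fsync(fileno(f)) != 0) {
          fclose(f);
          unlink(tmp);
          return -1;
      }
      if (fclose(f) != 0) { unlink(tmp); return -1; }
      return rename(tmp, path);              /* the atomic step */
  }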

However, if you don't check for allocation failures, you get undefined behaviour, which you indeed cannot handle safely ... other than by avoiding the undefined behaviour in the first place by checking for allocation failures.

Also, if you check for allocation failures, you won't get a SIGSEGV. SIGSEGV is for invalid virtual addresses, not for lack of resources.


You can get memory errors outside of malloc() on an OS that overcommits RAM. Whether that is a SIGSEGV or some other signal, or just the heavy boots of the OOM killer nuking your process, doesn't really matter. The key point is that your program cannot handle them. It is impossible to handle every case.

Allocation failures can and do occur outside of malloc() when overcommit is the memory policy. malloc() might return memory that it believes is valid, but when you try to write to it later on, the OS discovers that it has no free space in swap and no spare pages to evict. Result: process death (SIGSEGV? SIGBUS? not sure, but it doesn't matter).


> You can get memory errors outside of malloc() on an OS that overcommits RAM.

Did anyone claim otherwise?

> Whether that is a SIGSEGV or some other signal, or just the heavy boots of the OOM killer nuking your process, doesn't really matter.

It does, because you can catch SIGSEGV.
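
(A minimal sketch of catching it with sigaction; note that only async-signal-safe calls like write() are allowed in the handler, so "handling" mostly means logging and exiting cleanly:)

  #include <signal.h>
  #include <string.h>
  #include <unistd.h>

  static void on_segv(int sig) {
      (void)sig;
      const char msg[] = "caught SIGSEGV, bailing out\n";
      write(STDERR_FILENO, msg, sizeof msg - 1);   /* write() is async-signal-safe */
      _exit(1);
  }

  int main(void) {
      struct sigaction sa;
      memset(&sa, 0, sizeof sa);
      sa.sa_handler = on_segv;
      sigaction(SIGSEGV, &sa, NULL);

      volatile char *p = NULL;
      *p = 0;                                      /* invalid address -> SIGSEGV -> handler */
      return 0;
  }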

> The key point is that your program cannot handle them. It is impossible to handle every case.

It can, and it has to, if you don't want it to be defective. (Where "handling" does not mean "continue execution", but "don't corrupt persistent state"--you cannot continue execution with insufficient resources anyway, there is no way around that).

> Result: process death (SIGSEGV? SIGBUS? not sure, but it doesn't matter)

SIGKILL. See above.


Here is the mechanism that I wish existed...

Have a system call that allocates memory to a process or process group, but doesn't assign it. This call will fail if overall memory usage is too high.

When the process tries to get memory, it will only tap into that reserved allocation if the system is out of memory.

The OOM killer will never kill processes that have reserved spare memory.

The result? Long-running daemons with stable memory usage can be relatively easily protected against the OOM killer. Ideally the OOM killer never runs amok. But in the real world, it would be nice to be able to limit the damage if it has, and guarantee a usable shell for a superuser while it is going on.


The thing is: There isn't really any such thing as "the memory of a process". What about a page of a shared library that hasn't been loaded yet, but is mapped in more than one process, for example?

Also, if memory is guaranteed to be available at a later point, it cannot really be used for anything, as there is nowhere to put the contents of that memory the moment the process it's reserved for wants to use it.

But there are two APIs that can be used to implement the same goal: Linux has a procfs setting "oom_score_adj" that can be used to decrease the risk of a particular process being killed (commonly used for sshd for obvious reasons) and there are the mlock()/mlockall() syscalls that you can use to make sure some address space is backed by actual RAM rather than potentially swap.
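
Roughly like this (a hypothetical sketch for a daemon; -1000 is the "never kill me" end of the oom_score_adj range, and both steps need the right privileges):

  #include <stdio.h>
  #include <sys/mman.h>

  int main(void) {
      /* Ask the OOM killer to look elsewhere (-1000 .. 1000). */
      FILE *f = fopen("/proc/self/oom_score_adj", "w");
      if (f) {
          fputs("-1000", f);
          fclose(f);
      }

      /* Pin current and future mappings into RAM (needs CAP_IPC_LOCK
         or a generous RLIMIT_MEMLOCK). */
      if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
          perror("mlockall");

      /* ... daemon main loop ... */
      return 0;
  }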


Nothing you said disagreed with what I said.

It is true that the reserved memory could not be used for anything. This is a feature. The memory has been reserved for emergencies, and will keep part of the system usable when everything else goes belly up.

The two alternate methods that you specify are not as useful.

With oom_score_adj, I cannot have a shell that can be logged in to and be entirely usable during a fork bomb. Yes, you can ssh in. But good luck running arbitrary commands.

With mlock()/mlockall() I can guarantee that a process is responsive during heavy memory pressure, but good luck if it needs to allocate more memory.

And neither system makes it particularly easy to set things up so that the OOM killer will avoid killing all daemons whose memory usage remained stable during memory pressure. (And that is the biggest problem with the OOM killer, that it killed random things you'd have preferred stayed up.)


> With oom_score_adj, I cannot have a shell that can be logged in to and be entirely usable during a fork bomb. Yes, you can ssh in. But good luck running arbitrary commands.

But that's due to PID exhaustion, not due to memory exhaustion.

> With mlock()/mlockall() I can guarantee that a process is responsive during heavy memory pressure, but good luck if it needs to allocate more memory.

Well, good luck if the process needs more memory than it had reserved in your scheme. Obviously, you have to mlock() sufficient memory for peak need.

> And neither system makes it particularly easy to set things up so that the OOM killer will avoid killing all daemons whose memory usage remained stable during memory pressure. (And that is the biggest problem with the OOM killer, that it killed random things you'd have preferred stayed up.)

No, it doesn't kill random things, it kills things that are easy to reconstruct (little CPU use so far) and that have lots of memory allocated, and where the oom_score_adj doesn't tell it to spare the process for other reasons.


I have definitely been on a system with plenty of pids available, that was completely unusable due to memory exhaustion.

As for the OOM killer, its logic has changed over time. All that I definitely know is that if a slow leak in application code causes Apache processes to grow too quickly over time, it is usually a good idea to reboot the machine because you never know what random daemon got killed before the actual culprit.

Of course you also need to fix application code...

The logic that is applied probably works well in the desktop/developer case where what went wrong probably went wrong recently and there is a person who can notice. That isn't the context where I've usually encountered it.


You might be technically right, but I guarantee you that by your definition the vast majority of software out there is defective. The solutions you mention are non-trivial. It is really easy to make a mistake. While it may not be the operating system's fault at the end of the day, designing a system like this is setting up the overall experience for failure. A well-designed system should make it easy to get right, not the other way around.


> You might be technically right, but I guarantee you that by your definition the vast majority of software out there is defective.

Your point being? That it's untechnically correct?

> The solutions you mention are non-trivial. It is really easy to make a mistake. While it may not be the operating system's fault at the end of the day, designing a system like this is setting up the overall experience for failure. A well-designed system should make it easy to get right, not the other way around.

Which is all kind of true, but doesn't change that those APIs are the way they are, for historical reasons. Just because some API is broken doesn't mean your code will work correctly when you pretend it's not broken.

edit: just to be sure: yes, the special-casing of zero-sized allocations is somewhat of a bug, which we still have to live with. Other than that, the API is actually perfectly fine--if you don't check for allocation failures and you can't handle your program getting interrupted at any point, that's just a bug in your code; there is no useful general way to abstract those problems away.


I think the point is that it's easier to use techniques that keep the mail folder always in a consistent or recoverable state, such as journaling or copy-modify-move, than it is to fully handle out-of-memory conditions. Since the former technique is both easier and can handle problems that the latter can't, it's always preferable.


> If the fork() call succeeds, this means the OS has guaranteed that all the memory in the child process is available to be written over. i.e. it has had to allocate 'lots of memory' x 2 in total, even if only 'lots of memory' x 1 will actually be used.

That isn't the case anymore, at least on Linux and BSD. fork() uses a copy-on-write scheme so the OS only allocates/copies the parent memory space if the child attempts to write to it.


You just repeated the parent's point while thinking you disagreed with it.

The point was that after you fork() in a copy-on-write scheme, the OS now has promised that more memory is available to write on than may actually exist. If the OS avoided overallocating, it would have to right there and then reserve lots of memory (without necessarily writing to it) just to be sure that you wouldn't run out at a later date.


You're correct. I do agree with the point, but I misinterpreted where he was going with that example.


Use posix_spawn rather than fork and exec; then at least the OS may not allocate twice the memory, e.g. the BSDs will not.
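
Roughly (a sketch): since there is no fork(), the implementation is free to use vfork()/clone(CLONE_VM) underneath, so the parent's address space never has to be duplicated or even reserved.

  #include <spawn.h>
  #include <stdio.h>
  #include <sys/wait.h>

  extern char **environ;

  int main(void) {
      pid_t pid;
      char *argv[] = { "true", NULL };

      int err = posix_spawn(&pid, "/bin/true", NULL, NULL, argv, environ);
      if (err != 0) {
          fprintf(stderr, "posix_spawn failed: %d\n", err);
          return 1;
      }
      waitpid(pid, NULL, 0);
      return 0;
  }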


Good luck getting any existing 3rd-party code to use posix_spawn ...


> (a) malloc() never returns NULL. It always returns a valid address, even though your system may have run out of memory.

This is only true on Linux. FreeBSD, for example, by default has partial overcommit and WILL start returning NULL after overcommitting some percentage of memory (I think 30% or so by default).


Most major Linux distros default to "heuristic overcommit", aka vm.overcommit_memory=0, the intent being to refuse obviously excessive allocations without sticking to a strict limit.

You can also enable the behaviour you describe by setting overcommit_memory=2 and either a ratio via overcommit_ratio (default 50) or an exact number of kbytes via overcommit_kbytes.

Source: https://www.kernel.org/doc/Documentation/vm/overcommit-accou... https://www.kernel.org/doc/Documentation/sysctl/vm.txt


It should be noted though that (a) this behaviour is configurable, and (b) it's not actually true. If you malloc() so much memory that it cannot possibly be provided, Linux will return a failure, even in overcommit mode.
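
e.g. (a sketch; assumes the default heuristic overcommit mode and a machine with far less than 64 TiB of RAM + swap):

  #include <stdio.h>
  #include <stdlib.h>

  int main(void) {
      /* ~64 TiB is "obviously excessive" for the heuristic check,
         so malloc() returns NULL even with overcommit enabled. */
      void *p = malloc((size_t)1 << 46);
      printf("malloc %s\n", p ? "succeeded" : "returned NULL");
      free(p);
      return 0;
  }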


> (a) malloc() never returns NULL. It always returns a valid address, even though your system may have run out of memory.

Unless of course you run out of virtual address space.


Linux can easily be configured not to do that. Then malloc returns null, when the underlying mmap nicely fails, and this can happen when the address space is nowhere near exhausted.


Hm, do you have any tests/references on that? I always thought that in such cases the system simply swaps memory out to free some RAM.


Google for Linux OOM Killer (Out Of Memory Killer).

It's really annoying, your application may get killed without any way to react, even to just print a "Low memory" error.


The entire concept of the Linux OOM Killer is asinine


It is a well known design feature (I would call it a flaw) of Unix/Linux which makes forking processes with copy-on-write memory possible.

You can run out of both physical memory and swap space. Or your system is swapping so heavily that, for all practical purposes, it becomes unusable.


Probably one of the best examples of my claim that UNIX has bad architecture. A well-designed system will either prevent or catch this sort of thing with apps able to detect and/or recover from it without geniuses writing them. Then, there's UNIX/Linux...

I should probably add this to my list of non-security reasons (or at least an angle) for using a Nizza-style architecture [1], where critical functionality is taken out of the Linux part to run on a microkernel or small runtime. There might even be potential to go further and have apps that monitor for this sort of thing, with a signal to apps that works across platforms. Not just this design flaw, but potentially others where the app doesn't have the necessary visibility. What do you think?

[1] http://genode-labs.com/publications/nizza-2005.pdf


Coming from a Windows viewpoint, I would have thought the same thing. AFAIK, under Windows you just end up hitting the system swap file. Of course, you can start getting into a situation where physical memory is so low that everything grinds to a (ahem, virtual) halt....


Yes, that's the first response. "running out of physical pages" here in the GP really means "running out of free or reclaimable pages and running out of swap".



