

If malloc fails, it's not for the obvious reason - scvalex1
http://www.scvalex.net/posts/6/

======
saurik
In addition to this guy having a total misunderstanding of how this worked and
why it eventually failed ("control structures"?!), the article glosses over
how this behavior relates to configuration options such as vm.overcommit... it
doesn't even use the correct term (overcommit), so readers don't even leave
with the ability to rapidly learn more about the subject.

Yet, I have now been tricked into seeing it twice, as in addition to being on
the HN home page for over 6 hours (with not that many upvotes, but still way
too many) it was renamed by a moderator from the original title it was posted
with ("Malloc never fails", weirdly in the opposite direction of the policy of
"use the original title" that is normally cited for legitimizing renames) so
you couldn't recognize "oh, this is that wrong article from earlier".

Worse, a somewhat informative comment on this site from the user "khm", one
that links to a better article that discusses these issues with the right
terminology and in context with the OOM killer, is marked [dead], caught by
some spam filter or maybe even downvoted/flagged to death by readers.

Frowny _pants_.

~~~
pootch
I agree, this was a non-enlightening article which did not educate readers on
the concept of memory overcommit and just focused on malloc failure.

------
munin
> We just need to be careful because malloc returning successfully does not
> always mean that we can use the requested memory.

this is not really true and it's dangerous to think this way (IMO).

one of the awesome things about virtual memory is you can "use" more memory
than you actually have. if you only have 4GB of physical pages on your system
and you allocate 18GB of virtual memory, things will be fine until you write
to more pages than you have.

at this point, things are still fine! the memory manager will pick some pages
(usually ones that haven't been accessed in a while) and page them out to a
paging file. this behavior is transparent to your application. when your
application tries to access those pages, the memory manager will bring those
pages back in. if it can't bring the pages back in because all the pages are
taken, it will choose some pages to put back into the page file. and so on ...

"so what happens when the page file is full?" well, modern operating systems
have a page file that grows. so really, "what happens when the page file is
uncomfortably large" is the question, and the answer there is the kernel will
start killing tasks, and the first on the chopping block will probably be
yours.

some platforms allow you to register for notification from the memory manager
that there is a lot of pressure for physical pages or virtual addresses. the
problem with trying to gain awareness like this is that the pressure on memory
is generally totally outside your control. if the memory manager tells your
app 'hey, free any memory you're not using' you are going to say 'oh yeah
right i just had these hundreds of megabytes just sitting around but now that
you mention it i probably should release them'. uh huh.

imo short takeaway:

1\. check if malloc returned NULL because the behavior of the allocator is
platform dependent.

2\. if you get a non-null pointer from malloc, treat it as valid. there is
nothing else you can do. perhaps touching it will make the kernel kill your
task, perhaps not! it could be because your app is using too much memory. it
could be because the app next door is using too much memory! you can't know.

3\. you should take an operating systems class from a university if you want
to really explore this topic in depth

~~~
joshhart
With an application using garbage collection it's quite easy to have hundreds
of megabytes just sitting around, so I think it could be a good idea in some
cases.

~~~
cbhl
Perhaps, but an application using garbage collection will likely have malloc
abstracted away from the developer.

~~~
anders0
But couldn't the language's runtime (or the GC library, or whatever) register
for these notifications?

~~~
cbhl
Yes, but my impression is that the set of conditions where this helps (you
have memory pressure AND the runtime has lots of memory it can free up) is
rather rare. If you have memory pressure, that's usually the same time that
the runtime is actually trying to make use of lots of memory.

IIRC, Android provides something along these lines (as does Windows Phone
7.1), and when you get under memory pressure (common on Honeycomb-based
tablets) you end up with thousands of these notifications, each freeing up 1KB
or 2KB when you really need to free up tens of megabytes (which you can only
do by serializing and terminating an application).

------
tjdetwiler
This isn't really that surprising for anyone who's taken a half-decent OS
course.

------
negamax
Makes me respect the early engineers and programmers who put great thought
into building these blocks. Seriously, reading through all this in the
HTML5/Android/iOS world of today, with IDEs, code samples, books, and whole
communities to help you, makes me respect the early programmers so much. They
were the real risk takers, venturing into something entirely unknown and
creating a whole new discipline that is empowering every existing discipline
and creating many more.

------
khm
This guy is wrong, for the record. The default behavior of malloc on linux
involves kernel magic to decide whether to allocate the memory. Since he
wasn't actually writing any data to any of his RAM, it didn't fail until he
ran out of address space. If he'd actually used the memory, it would have
failed sooner.

The other, bigger problem is his assumption that all linux installations work
like this. Frequently vm.overcommit_memory is set so that malloc calls for
more than the available amount of memory fail outright, which means the
oom-killer never gets involved. He's making a ton of dangerous assumptions
based on tooling around with code instead of reading the documentation on the
matter (i.e. malloc(3)).

malloc can fail, and you should deal with it well: <http://ewontfix.com/3/>

------
paupino_masano
I love articles like this. I always learn something new. I must admit that I
did think that malloc would use VM if necessary hence giving a potentially
"unlimited" address space, but I guess it is nice to have that cemented rather
than theorized. Can we ever really count on memory being allocated to RAM? I
think not (it's against the kernel/user buffer right?), but it's good to know
these things.

Of course: I attempt to always be in a learning phase so if anyone can expand
(or elaborate) then I'd appreciate it :)

~~~
thwarted
_I must admit that I did think that malloc would use VM if necessary hence
giving a potentially "unlimited" address space_

There's no distinction between "VM" and non "VM" as far as malloc is
concerned. From the perspective of the process, it's one big, flat address
space. It's always "virtual", because the values of the pointers do not
necessarily equal the physical layout of the memory that is directly backed by
RAM.

------
ch0wn
Meta: The article itself is wrong, as was pointed out here. However, I
learned a lot from the discussion around it. Should I upvote the submission or
not?

------
joshavant
The Phrack link brings back fond, fond memories.

------
afhof
So once you've allocated more memory than you have, what happens when you try
to write to any of it?

~~~
jws
I altered the program to first allocate 64G, then go through and write on each
page of the allocation on my 16G machine, with the swap partition dropped.

After suitable screwing around to keep gcc from optimizing away the writes, it
goes a bit like this…

    
    
      jim@gvdev:~$ ./fail
      Before
      Allocated 1 GB
      Allocated 2 GB
      ...
      Allocated 64 GB
      After
      Writing in 0 GB
      Writing in 1 GB
      ...
      Writing in 15 GB
      ... rather long delay here...
      Write failed: Broken pipe
      upstairs:~ jim$    <----- bad news here, notice the machine.
      upstairs:~ jim$ slogin gvdev.XXXXX.net
      ssh: connect to host gvdev.XXXXX.net port 22: Network is unreachable
    

… so now I have to get back in "going out" clothes, go downstairs, and drive
over to gvdev and reboot it.

~~~
mileswu
Kinda OT, but out of interest, why do you use 'slogin' instead of 'ssh'? Just
old habits dying hard from 'rlogin'?

~~~
jws
Yes.

------
kephra
Now run the same code on a VZ/Virtuozzo instance, and malloc will fail much
earlier, because VZ limits allocated memory and not dirty memory.

As a result, a Xen system with 128MB RAM can run much more than a VZ instance
with 512MB.

------
feralchimp
Does Linux behave similarly by default if one uses calloc instead?

~~~
EdiX
Of course, it's basically a wrapper around malloc. All allocated memory is
subject to this behaviour: malloc/calloc, forked "copy-on-write" memory, mmaps
and statically allocated segments (especially the stack).

In a system with an MMU, all allocating memory does is tell the kernel not to
give you a segfault when you access some range of virtual addresses. To
actually materialize a virtual page you have to access it.

~~~
alexkus
It doesn't behave the same with calloc because calloc zeros the memory and
therefore writes to each page.

Just try the same program replacing malloc( 1 << 30 ) with calloc( 1, 1<<30).

~~~
pedrocr
>because calloc zeros the memory and therefore writes to each page.

One does not imply the other. Internally what the kernel can do is link the
page address it gives you to the zero page and mark it as copy on write. Only
when you actually write to it will it allocate an actual page to back it. Only
if your libc implements calloc as malloc+memset would this be a problem. Does
glibc do that?

In fact the copy on write is probably also done on malloc as well. Even though
the manpage implies different behavior (malloc doesn't guarantee setting the
memory to 0, while calloc does) I don't think any sane kernel will give you
someone else's free()'d memory. It would be a security leak.

~~~
vsync
> Only if your libc implements calloc as malloc+memset would this be a
> problem. Does glibc do that?

I just checked (see my reply to the parent) and it doesn't.

> In fact the copy on write is probably also done on malloc as well. [...] I
> don't think any sane kernel will give you someone else's free()'d memory

You won't get someone else's freed memory but you're quite likely to get your
own back and in that case it won't necessarily be zeroed.

~~~
pedrocr
>You won't get someone else's freed memory but you're quite likely to get your
own back and in that case it won't necessarily be zeroed.

Surely the kernel never gives a process its own pages back. That would keep an
unneeded page around when it could just be pointing to the zero page.

Reading your other comment, I assume what you mean is that it is a two-step
process where the kernel always gives zero pages, but glibc's malloc
implementation keeps some stock of pages and will hand them back in a
malloc() after a free(). That way you're not guaranteed to get zero'd memory
on every malloc(), since not all of it comes straight from the kernel.

The calloc() implementation has checks for that and will do the clearing when
the memory is coming from the glibc stock and not the kernel. But even in that
case it's only doing clearing when the page is already in the process address
space. So a _process_ will always receive zero pages from the kernel, but the
malloc() implementation is made more efficient by giving you back some of your
own free()'d memory that from the kernel's point of view was never given back.

Does that sound about right?

~~~
vsync
I didn't look in enough detail on the Linux side to see if it _always_ gives a
zero page reference, or does/doesn't clear out a fresh page when it's
referenced based on how it was allocated and by whom. I could see that saving
time but I can easily see just nuking a whole page being faster than the work
of tracking and checking.

From the glibc side I believe you are exactly correct.

------
rgbrgb
For anyone interested: <http://lxr.linux.no/linux+v3.1/lib/inflate.c#L244>

------
arekp
False. malloc() fails with a negative argument.

~~~
to3m
malloc's argument is of type size_t, which is unsigned.

~~~
arekp
Extending my comment, I am almost sure that passing a small negative argument
(which is converted to a large positive number) causes malloc to return NULL
on Linux.

------
throwaway_95014
tl;dr : "Hi, I have a blog and it's taught me nothing about virtual memory".

