
Does malloc lazily create the backing pages for an allocation on Linux? (2009) - tzhenghao
https://stackoverflow.com/questions/911860/does-malloc-lazily-create-the-backing-pages-for-an-allocation-on-linux-and-othe
======
blibble
it'll call mmap (probably), which will then find 1GB(+ overhead) of contiguous
address space

mmap will go and set up the mapping for that address space (~262144 pages'
worth with 4k pages)

unless someone's been messing with overcommit: the physical memory allocation
is normally lazy

this works because the page table entries for the range start out not present

when the application first reads/writes a page not yet allocated: a fault
will occur, the kernel will step in, (hopefully) find an unused physical 4 KB
of memory, update the page table entry to point to it and mark it present,
then return to the application (which will be blissfully unaware that anything
happened)
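
A minimal sketch of how one might observe this from user space, assuming Linux and glibc. The helper names (`pages_for`, `resident_pages`, `lazy_demo`) are made up for illustration; `mincore` is the real syscall that reports which pages of a mapping are resident. On a typical system the "before" count should be at or near zero and the "after" count at least the number of pages written, but the exact numbers depend on the kernel and settings such as transparent huge pages.

```c
#define _DEFAULT_SOURCE   /* exposes mincore() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of pages needed to cover `bytes`, rounding up. */
size_t pages_for(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Count the pages of [addr, addr + len) that the kernel reports as resident.
 * Returns -1 if the query fails. */
static long resident_pages(void *addr, size_t len, size_t page_size)
{
    size_t n = pages_for(len, page_size);
    unsigned char *vec = malloc(n);
    long count = -1;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < n; i++)
            count += vec[i] & 1;   /* low bit set means "resident" */
    }
    free(vec);
    return count;
}

/* Map `bytes` of anonymous memory, write to the first `touch` pages, and
 * report the resident page count before and after the writes.
 * Returns 0 on success, -1 on error. */
int lazy_demo(size_t bytes, size_t touch, long *before, long *after)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page = (size_t)ps;

    unsigned char *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = resident_pages(p, bytes, page);
    for (size_t i = 0; i < touch && i * page < bytes; i++)
        p[i * page] = 1;           /* first write to a page triggers the fault */
    *after = resident_pages(p, bytes, page);

    munmap(p, bytes);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

Calling `lazy_demo((size_t)64 << 20, 16, &before, &after)` maps 64 MiB and touches 16 pages; `pages_for((size_t)1 << 30, 4096)` reproduces the 262144 figure for 1 GiB with 4k pages.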

~~~
dividuum
> it'll call mmap (probably), which will then find 1GB(+ overhead) of
> contiguous address space

It does. Last time I dug into that I think there was a 128KB threshold.
Smaller allocations would use sbrk while larger ones would use mmap.

~~~
saagarjha
The limit is now dynamic on glibc:
[https://sourceware.org/glibc/wiki/MallocInternals#Malloc_Alg...](https://sourceware.org/glibc/wiki/MallocInternals#Malloc_Algorithm),
[http://man7.org/linux/man-pages/man3/mallopt.3.html](http://man7.org/linux/man-pages/man3/mallopt.3.html)

------
OskarS
Suppose you WANT malloc to be eager (i.e. make sure all the pages are
committed right away, so you don't pay a performance penalty for it later),
what do you do?

The obvious answer, just looping through it and setting it to zero, would be
replaced with a call to `calloc` by an optimizing compiler, and doesn't calloc
also do some kind of similar "lazy" trick? So what do you do? Mark the pointer
as volatile and loop through it and set all the memory to 0 (or some other
value)? Call memset, I guess?

EDIT: apparently malloc + memset compiles to calloc as well [0], so if you
want to allocate, zero out memory, and make sure it's committed, using
volatile seems like the best bet to me...

[0]: [https://godbolt.org/z/qhaaYc](https://godbolt.org/z/qhaaYc)

~~~
htfy96
First of all, you only need to touch a single byte of each page to populate
it. Besides, this approach doesn't ensure that the backing pages are
allocated, since it's (technically) legal to only allocate one physical zero
page after the calloc.

Therefore, you must use OS-dependent primitives to force eager allocation. On
Linux this can be achieved with two mmap flags: MAP_POPULATE initializes the
page table entries eagerly (but doesn't guarantee a backing store allocation),
and MAP_LOCKED touches all pages (just like the calloc approach, but at kernel
level and more deterministic). Note that MAP_LOCKED doesn't report an error
when populating fails due to low memory, so you still need to touch the pages
again at user level.
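
A rough sketch of what that could look like, assuming Linux. The function name `map_eager` is invented here; the flags are the real mmap flags named above. Locking can be refused (for example when RLIMIT_MEMLOCK is small), so this version falls back to MAP_POPULATE alone rather than failing outright.

```c
#define _GNU_SOURCE       /* exposes MAP_POPULATE, MAP_LOCKED, MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `len` bytes of anonymous memory and ask the kernel to prefault it.
 * Tries MAP_POPULATE | MAP_LOCKED first, then MAP_POPULATE alone if the
 * locked mapping is refused. Returns NULL if both attempts fail. */
void *map_eager(size_t len)
{
    assert(len > 0);      /* mmap rejects zero-length mappings */

    int base = MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   base | MAP_LOCKED, -1, 0);
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, base, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

A successful return only means the mapping exists. Since a failed prefault may go unreported, a caller that needs certainty would still write to the pages afterwards, and release the region with `munmap(p, len)` when done.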

~~~
OskarS
Why wouldn’t the “volatile” approach be valid? I mean, the only way to
guarantee that you can write to memory without error (or to make sure it’s
backed by actual memory pages) is to actually write to it, and the way to
guarantee you’re doing that in a platform independent way in C is to use
volatile, right?

Are you really saying that this is not possible with malloc, you have to drop
down to platform-dependent system calls like mmap?

~~~
codys
Because it's legal for a platform to perform page deduplication as long as,
from the perspective of the program, the correct operating model is followed.
Many operating systems (including Linux) have a "zero page" that is used for
_all_ zeroed memory blocks in the system. Writing a non zero value then would
replace it with a different page.

This means that writing zero is unlikely to cause a real allocation to occur.
Writing a non zero value may be more likely to cause a page allocation.

That said: it can be entirely legal for a platform to do more page
deduplication than just special casing the zero page. Linux has some kernel
tooling called KSM (Kernel Samepage Merging).
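
Putting the volatile idea from upthread together with the non-zero point, a portable-ish sketch might look like this. `touch_pages` is a made-up helper, and the only non-standard call is POSIX `sysconf` for the page size. It writes one non-zero byte per page through a volatile pointer so the stores can't be dropped or folded into `calloc`. It makes no attempt to defeat KSM-style merging, which as far as I know normally only applies to memory a process has opted in.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Write one non-zero byte into each page-sized stride of `buf`, plus the
 * final byte so a trailing partial page is covered when `buf` is not
 * page-aligned. Overwrites buffer contents, so use it on fresh allocations.
 * Returns the number of strides written. */
size_t touch_pages(void *buf, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;   /* 4096 is a fallback guess */
    volatile unsigned char *p = buf;            /* volatile: stores must happen */
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page) {
        p[off] = (unsigned char)(touched % 255 + 1);   /* 1..255, never 0 */
        touched++;
    }
    if (len > 0)
        p[len - 1] = 1;

    return touched;
}
```

Typical use would be `buf = malloc(n); if (buf) touch_pages(buf, n);` right after allocating. Whether each write ends up with its own physical page is still up to the platform.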

------
zabzonk
malloc() is a C (and C++) Standard Library function that you could write
user-level code for yourself (in fact, there's an extended exercise in TCPL to
do just that). It doesn't do the low-level handling of pages - that's down to
the Linux kernel.

~~~
acqq
But the fact that malloc is implemented in the C library doesn’t mean that the
question and the answers are wrong. Eventually system specific calls have to
occur. The question is very specific, asking what happens in Linux when one
makes a malloc call requesting a big allocation:

“does it actually create 1 GiB worth of swap pages? Or does it mprotect the
address range and create the pages when you actually touch them like mmap
does?”

And how it is implemented can be seen, as an example, here:

[https://sources.debian.org/src/glibc/2.28-10/malloc/malloc.c...](https://sources.debian.org/src/glibc/2.28-10/malloc/malloc.c/)

Note both mprotect and mmap references.

~~~
temac
The question is only semi-specific and the result depends on a ton of
variables, like: which libc do you use; which kernel (maybe a patched Linux
kernel like on Android, don't know if they change anything regarding memory
alloc though); which kernel configuration do you use (esp. with regard to
overcommit); do you have any other additional tuning; etc.

> does it actually create 1 GiB worth of swap pages

Create? No; probably never.

> does it mprotect the address range and create the pages when you actually
> touch them like mmap does?

Both mmap and mprotect are syscalls, and syscalls do not (directly) call other
syscalls. So mmap does not do that. And malloc can't because it does not run
anymore after it has returned.

On top of all of that, malloc(insanely_big_value) can still fail immediately
by returning NULL even if you have overcommit activated. Granted, 1GB is
probably never an "insanely_big_value"? Not even sure on 32-bit systems
though.
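
A small sketch of that failure mode, with a hypothetical helper `malloc_succeeds`. A request of `SIZE_MAX` bytes can't fit in any address space, so it is expected to come back NULL regardless of overcommit settings. Where the cutoff sits for merely large sizes depends on the libc, kernel and configuration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if malloc(n) handed back a non-NULL pointer, 0 otherwise.
 * The allocation is released again immediately. */
int malloc_succeeds(size_t n)
{
    /* Storing into a volatile object keeps the compiler from optimizing
     * the malloc/free pair away and assuming success. */
    void *volatile p = malloc(n);
    int ok = (p != NULL);

    free(p);              /* free(NULL) is a no-op */
    return ok;
}
```

`malloc_succeeds(SIZE_MAX)` should report 0 immediately, without touching any memory, while a small request such as `malloc_succeeds(4096)` should report 1.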

~~~
acqq
Asking "in Linux" obviously didn't mean "on Android" and also not "in Linux
after I patch the kernel configuration."

The question was clear, the answer on the site was clear, but the answer here
to which I responded was by somebody who I suspect simply hasn't read what was
written there.

------
lightedman
Is this why when I create an encrypted RAM drive that Linux shits itself? I
assume that it should not fill itself up yet it does almost instantly and
brings the system down to its knees.

~~~
tedunangst
No.

