
Once upon a time, memory allocators made sense - dchest
https://github.com/Tarsnap/libcperciva/commit/cabe5fca76f6c38f872ea4a5967458e6f3bfe054
======
david-given
Plus, of course, a lot of modern systems use memory overcommit --- if you ask
the OS for memory, it gives you uninitialised address space, and it only
allocates pages of physical RAM the first time you touch each page of address
space.

This has two effects:

(a) malloc() never returns NULL. It always returns a valid address, even
though your system may be out of memory.

(b) by the time the kernel finds out that it's run out of physical pages, your
process is already trying to use that memory... which means there's no way for
your process to cope gracefully. You have to trust the kernel to do the right
thing (either to scavenge a page from elsewhere, or to kill a process to make
space). If you're _very_ lucky, it'll send you a SIGSEGV...

~~~
joosters
As other people are commenting, malloc can still return NULL, but the critical
point here is that even if you handle the NULL cases, you will still miss the
other situations where your process runs out of memory. In short, if the OS
over-commits RAM, it is impossible for a program to safely handle out of
memory situations.

One reason why overcommit is popular is because of fork(). Imagine a process
that does malloc(lots of memory), then forks and execs a tiny program
(/bin/true or something like it). If the fork() call succeeds, this means the
OS has guaranteed that all the memory in the child process is available to be
written over. i.e. it has had to allocate 'lots of memory' x 2 in total, even
if only 'lots of memory' x 1 will actually be used.

Without overcommit, fork() can fail even if the system won't ever use anywhere
close to the limit of RAM+swap space.

~~~
JoeAltmaier
I'm with you on the no-point-checking-for-malloc-fail issue. It has to be done
in every place, or it's rarely effective. My old colleague Mike Rowe used to
call it "Putting an altimeter on your car. So if you drive over a cliff, you
can see how far it is to the ground."

~~~
joosters
Definitely. In practice, it's impossible to do correctly. Even _if_ your own
code gets every malloc() correct (and also properly checks the return code of
every syscall and library call that might fail due to lack of memory), you
still have to trust that every library you use is also perfectly written and
handles its memory failures just as perfectly. It'll never happen.

~~~
JoeAltmaier
There's just one place that it makes sense: for _very_ large buffers. If they
fail, it doesn't mean the system is doomed. So it doesn't hurt to check that
video-frame buffer alloc, or that file-decompression buffer. But for any of
the small change (anything within orders of magnitude of the mean allocation
size) it's pointless.

------
jheriko
erm... allocating zero was never a good idea. it had varying behaviours on
varying platforms for as long as i could remember and was one of those special
cases to avoid.

the allocators didn't necessarily even meet the alignment specification
either... even if the documents said they did. i remember reading that malloc
on windows would return things on 8-byte boundaries only to find pointers
ending in 4 (rather than 0 or 8) coming out of it...

and as many have pointed out returning null may or may not happen in a number
of situations. i've seen it happen enough times when i ask for too much memory
to believe that it is a useful fail case to check for...

~~~
cperciva
_erm... allocating zero was never a good idea. it had varying behaviours on
varying platforms for as long as i could remember and was one of those special
cases to avoid._

No, allocating zero bytes is a perfectly reasonable thing to do, as long as
you remember that NULL is not necessarily an allocation failure.

    
    
        if (((buf = malloc(buflen)) == NULL) && (buflen > 0))
            goto OUTOFMEMORY;
    

is perfectly good code.

~~~
jheriko
that's not even close to reasoning imo, it's just an assertion.

... but i agree that the code is safer if you check for bad cases.

i can't see why allocating zero ever is more sensible than not doing it at all
(and therefore avoiding even having to know about this problem, much less deal
with it).

------
kazinator
The fix is to wrap malloc and friends with your own routines.

You can have it so that "my_malloc" always returns NULL for a zero-length
object. Or, if you prefer, so that it always returns a unique pointer:

    
    
       void *my_malloc(size_t size)
       {
          return size ? malloc(size) : 0;  /* option A */
       }
    
    
       void *my_malloc(size_t size)
       {
          return malloc(size ? size : 1);  /* option B */
       }
    

And you can make my_realloc have the nice freeing behavior:

    
    
       void *my_realloc(void *ptr, size_t size)
       {
         return (ptr && size)
                ? realloc(ptr, size)
                : !ptr
                  ? my_malloc(size) 
              : (free(ptr), (void *)0);
       }
    

Note how we still carefully implement a parallel requirement to the one in ISO
C. Namely that my_realloc(NULL, size) calls my_malloc(size), regardless of
size, just like realloc(NULL, size) is required to be equivalent to
malloc(size). We want all the routines in this allocator to be drop-in
replacements such that any code which is written to the ISO C allocator spec
can be blindly retargeted to use them.

Pretty much any serious C program, if indeed it doesn't have an entire
allocator of its own, at least wraps the standard one, for the sake of more
uniform behaviors, as well as easy retargetability to embedded scenarios.
(Not only embedded machines, but say, embedding in a larger application, where
you are told "you must use this table of function pointers as your allocator"
and it doesn't quite look like malloc: zero allocations are not allowed,
realloc is missing, ...)

In this particular program, realloc is wrapped under a function called resize;
resize had to be patched not to rely on realloc(nonnull, 0), but users of
resize don't have to change. The programmer doesn't have to look for fifty
uses of resize to fix them. However, it's worth it to wrap the functions at a
lower level anyway, with an identical API.

The C library is far from perfect. Do not expect consistency, completeness,
beauty, symmetry and so on.

There are worse things in it than realloc(ptr, 0) not behaving in the neat way
that you would like. Oh such as:

    
    
       isalpha(str[42]);   // str is char array
    

being undefined behavior if char happens to be signed, and str[42] is
negative, because isalpha takes an int argument which is expected to hold an
unsigned byte value in [0, UCHAR_MAX] (or EOF), not a char value. Whoooops!

~~~
klodolph
You can make my_realloc() simpler:

    
    
        void *my_realloc(void *ptr, size_t size)
        {
            return size ? realloc(ptr, size) : (free(ptr), (void *)0);
        }
    

According to the C standard, realloc(NULL, size) is guaranteed to be
equivalent to malloc(size). From n1548 7.22.3.5 para 3:

> If ptr is a null pointer, the realloc function behaves like the malloc
> function for the specified size.

Also, free(0) is well-defined. From n1548 7.22.3.3 para 2:

> If ptr is a null pointer, no action occurs.

------
antirez
The idea that when you reallocate you mean to throw the data away does not
make sense... In general the article honestly seems a bit confused in what it
states.

~~~
cperciva
realloc can change the size of a memory allocation and return the same
pointer.

~~~
antirez
What I mean is that, when you say:

> Unfortunately, some people are not sane, and have decided that it's equally
> valid for realloc(p, 0) to return NULL meaning-failure and _not free p_.

What you want is the following semantics:

    
    
        char *my_old_vector = ... ; /* We created / allocated in some way. */
        add_important_stuff_to(my_old_vector);
    
        char *new_vector = realloc(my_old_vector, size+100);
        if (new_vector == NULL) {
           /* I still have the old vector and can recover. */
        }

------
acqq
What I don't understand is why some programmers invest so much energy in using
realloc. When you know that for most cases realloc actually does a new alloc
and copy of the existing content, but has the corners of undefined or unwanted
behavior, why not doing it in your own function in a way that you fully (for
your needs) control?

~~~
acconsta
For large allocations, realloc can remap virtual memory instead of doing a
naive copy:

[http://blog.httrack.com/blog/2014/04/05/a-story-of-realloc-and-laziness/](http://blog.httrack.com/blog/2014/04/05/a-story-of-realloc-and-laziness/)

~~~
cperciva
And for small allocations, realloc can expand or shrink an allocation in-
place, because it knows where it is placing other memory allocations.

~~~
acqq
Yes. If the allocation block is 16 bytes for example, the string growing from
10 to 11, 12, 13 etc. could still be in the same place.

But in which use cases are frequent reallocs actually needed, so much that you
can see the performance impact? I'd really like to know, as I personally never
had such problems. When single allocations were too expensive I've just used
some kind of memory pool. For small stuff, realloc is still more expensive
than the few instructions a pool costs on average.

~~~
cperciva
The classic example of frequent reallocs is this perl code:

    
    
        $x .= $_ while (<>);
    

I believe this has been fixed now, but perl used to realloc for each append
operation, which resulted in O(N^2) time complexity if realloc didn't operate
in-place.

~~~
acqq
But surely you wouldn't assume that a realloc-on-every-append approach is both
optimal and portable in such a case?

~~~
cperciva
No, I don't. If you look at the code in question, every time I expand an
allocation I at least double it.

~~~
acqq
I'd expect that such growth seldom allows the realloc to remain in-place
(unless it's the last thing allocated before and already in a big preallocated
chunk of the allocator)? Have you observed what you then get in your program?
Which allocator is used underneath?

~~~
cperciva
I haven't looked. I use whatever allocator is in libc anyway, so it will
depend on what platform you're running on.

At worst, using realloc produces the same results as malloc/memcpy/free. At
best, it might save a memcpy. No harm in giving it that flexibility.

------
bjornsing
To me it seems more reasonable to draw the insanity line at returning NULL for
a successful zero byte allocation.

(While it's true that dereferencing a pointer to a zero-byte allocation should
probably be undefined behavior, that doesn't mean it shouldn't be possible to
do the != NULL test to see if the allocation was successful.)

~~~
akeruu
To me, the insanity is trying to allocate zero bytes

~~~
zAy0LfpBZLC8mAC
That's because you think of it as a special case, which it isn't. If you want
to store a string that's n bytes long, you need to allocate n bytes, so a
natural thing to write is, for example,
dst.len = strlen(src); dst.buf = malloc(dst.len); --- it's insanity if you
need to add a special case for strings of a specific length.

~~~
tbirdz
A standard C string will never be 0 bytes. There is the zero terminator at the
end of the string, so all strings will be at least 1 byte large.

~~~
carussell
The example is clearly forgoing the use of standard C strings for a char array
tagged with the length.

------
asveikau
This is way too hyperbolic. I don't know what specific "once upon a time"
period is being described. However, malloc(0), realloc(NULL, x), and
realloc(y, 0) have been portability headaches for as long as I have been
writing C. None of this is all that surprising.

~~~
klodolph
The behavior of realloc(NULL, x) is always equivalent to malloc(x) according
to the C standard. But yes, I never remember a time when malloc(0) or
realloc(y, 0) played nice.

~~~
asveikau
> The behavior of realloc(NULL, x) is always equivalent to malloc(x) according
> to the C standard.

AFAIK if you try to do this with even a fairly recent Microsoft libc it won't
work. I haven't checked the standards and history on this one, but keep in
mind it's been only a recent development that MS gives a crap about C99.

The idea that C89 did not mandate this makes sense to me because I remember
now-obsolete Unixes not liking this either.

[Edit: The MS documentation claims that it supports realloc(NULL, x) going
back quite a while. I know I've been bitten by it not doing that in the
current century, however...]

~~~
masklinn
> The idea that C89 did not mandate this makes sense to me because I remember
> now-obsolete Unixes not liking this either.

C89 did mandate this:

> If ptr is a null pointer, the realloc function behaves like the malloc
> function for the specified size.

C89 also mandated that realloc(ptr, 0) be equivalent to free(ptr), which was
removed in C99:

> If size is zero and ptr is not a null pointer, the object it points to is
> freed.

------
carussell
> So what happens if realloc(p, 0) fails? It's going to return NULL, since
> that's what realloc does if it fails; but should it also free the allocation
> p? The sane answer is yes

If realloc(p, 0) is a synonym for free(p), what does it mean for it to "also
free the allocation p" in the event of failure? Unpacking it, the statement
seems to be saying the same thing as, "if it fails, then what it should do is
to not fail".

~~~
cperciva
Freeing a memory allocation is guaranteed to never fail. Allocating zero bytes
can fail (with dumb C libraries). The question is "if realloc fails to
allocate zero bytes, should it still free the original allocation".

~~~
pcwalton
> Allocating zero bytes can fail (with dumb C libraries).

They might not be dumb; they might be forced into it to maintain compatibility
with buggy programs and/or configure scripts (e.g. [1]).

[1]:
[http://www.openwall.com/lists/musl/2013/01/15/1](http://www.openwall.com/lists/musl/2013/01/15/1)

~~~
cperciva
Unfortunately, as I wrote in the commit message which spawned this thread,
this just encourages people to write more buggy programs.

~~~
pcwalton
Or, more likely, it just encourages customers to abandon your alternative C
library in favor of glibc to avoid breaking configure scripts (which is what
forced musl's hand). I definitely think you're right in principle, but I can't
blame musl's author for bowing to market pressure any more than I can blame
Intel for sticking to x86 backwards compatibility.

------
kbart
I don't get it. For me it's always simple logic: if you don't have anything
(0 bytes) to allocate/reallocate, you simply don't do it. Period. It's like
division by zero --- you always have to check that the divisor is not zero;
if it is, skip the division altogether.

~~~
zAy0LfpBZLC8mAC
Except the semantics of a zero-length buffer are perfectly well-defined, while
the semantics of division by zero are not defined at all in the types commonly
used in computing. A proper comparison therefore would be addition, maybe,
which is also perfectly well-defined for zero operands. Do you special-case
"zero addition" in your code?
if(a==0){if(b==0){x=0;}else{x=b;}}else{if(b==0){x=a;}else{x=a+b;}} ?

~~~
kbart
I'd argue that a zero-length buffer is far from "perfectly well-defined".
Quoting C99:

 _" If the size of the space requested is zero, the behavior is
implementation-defined: either a null pointer is returned, or the behavior is
as if the size were some nonzero value, except that the returned pointer shall
not be used to access an object."_

To maintain my sanity as a C programmer, I steer away from "implementation
specific behavior" as much as possible, thus I prefer checking if buffer size
is non-zero before allocating.

~~~
zAy0LfpBZLC8mAC
I am not talking about C, but about the abstract concept. C's implementation
is obviously kind of broken. Just as C would be broken if the behaviour of
"zero additions" was somehow "implementation defined". That doesn't mean that
something is fundamentally wrong with adding zero, it just means that you have
to work around a broken language.

~~~
kbart
The topic _is_ about C and I never said that such behavior is normal or
anticipated. It's one of many cases where C standard lacks concreteness and we
can only workaround such issues. In a perfect world, many things would be
different.

~~~
zAy0LfpBZLC8mAC
Well, then I might have misread your comment. I thought you meant that "it's
just illogical to allocate 0 bytes anyway, just as it doesn't make sense to
divide by 0".

------
theseoafs
I don't know if the distinction between which behaviors are "sane" and
"insane" is a good one here. Evidently the only insane choice is to rebuff the
standard in your implementation of the core C memory management functions.

------
coldpie
The REAL problem here is C's garbage way of dealing with errors. NULL is
overloaded both to be a valid pointer and an error signal. In the author's
ideal world, how would a malloc failure be communicated?

~~~
scott_s
Null is not a valid pointer, though. It's a valid pointer _value_, but it's
undefined to dereference such a pointer. If the purpose of malloc is to return
an address to usable memory, then null is outside of that range. Values
outside of the legal range of a function are fine for error conditions.
Because of that, null seems like a reasonable return value for error on
malloc: I could not allocate memory for you, which I indicate by returning an
address that you are not supposed to dereference.

However, I do agree it would be better if there was some other kind of error
condition, preferably something like the option type from various other
languages, or just an easy way to return multiple values like in Go. But C is
not that language.

~~~
dragontamer
> Null is not a valid pointer, though.

There are embedded systems out there (that have C compilers) that have
addresses at 0 actually. 8051 C programmers know what I'm talking about.

IIRC, 0 is totally an address when inside the Linux kernel as well. What else
do you call the first block of memory on a system?

Just because in userland the Linux Kernel pages memory to a very high-numbered
region does not mean that under all circumstances "0" is an invalid address.
The entire concept of "null == 0 == invalid" is an innate falsehood, an
abstraction brought about by the malloc function (that a lot of other
libraries have decided to copy).

But really, what do YOU call the first physical byte of RAM? Most systems I
know of... from 8051 all the way to even the Linux kernel... call it 0.

~~~
zAy0LfpBZLC8mAC
Nobody says that NULL has to be 0. C's NULL is just a pointer value that can
be compared for equality but can not be used for pointer arithmetic or be
dereferenced. Implementations commonly reserve some address (often 0) to
represent that pointer value, as that is just the most efficient way to do it
(rather than making pointers larger than the normal word size to be able to
represent the sentinel value)--also, object sizes are limited to one less than
the number of possible addresses anyway, simply because you can have objects
that are 0-sized.

~~~
avian
From the point of view of C, it seems that NULL is indeed always defined as 0.
At least according to the C FAQ:

[http://c-faq.com/null/machnon0.html](http://c-faq.com/null/machnon0.html)

It might be that the internal representation used by the compiler is
different, but from C a NULL pointer will always be equal to 0. See also:

[http://c-faq.com/null/ptrtest.html](http://c-faq.com/null/ptrtest.html)

~~~
masklinn
You're confusing a "null pointer constant" and a null pointer. An integral 0
converted to a pointer at runtime is not required to yield a null pointer;
only an integral constant 0 converted at compile time is.

