
Why does calloc exist? - wyldfire
https://vorpus.org/blog/why-does-calloc-exist/
======
_RPM

        buf = calloc(huge, huge);
        if (errno) perror("calloc failed");
        printf("calloc(huge, huge) returned: %p\n", buf);
        free(buf);
    

This has a flaw: errno doesn't magically get reset to zero. You should check
the return value of calloc first, then use errno. Checking if (errno) is not
the right way to determine whether there was an error.

~~~
avar

        Its [errno's] value is significant only when the return value of
        the call indicated an error (i.e., -1 from most calls; -1 or NULL
        from most library functions); a function that succeeds is allowed
        to change errno.

~~~
imron
Note the difference between "is allowed" vs "must".

If you write a program that _relies_ on this behaviour you're going to have a
hard to track down bug at some point.

~~~
takeda
That's an even stronger case for not relying on errno to catch errors.

The code should be something like this:

    
    
        buf = calloc(huge, huge);
        if (!buf)
            perror("calloc failed");
        printf("calloc(huge, huge) returned: %p\n", buf);
        free(buf);

~~~
imron
Yes, my post above was agreeing that you should check the error condition of
the return value, rather than relying on errno to be cleared on success.

~~~
takeda
Ah, my bad.

------
bluefox
That's a nice alternative history fiction.

Here's an early implementation: [https://github.com/dspinellis/unix-history-
repo/blob/Researc...](https://github.com/dspinellis/unix-history-
repo/blob/Research-V7/usr/src/libc/gen/calloc.c)

~~~
LukeShu
You haven't proven it wrong.

Here's the earliest implementation in that repo (in Research UNIX V6; your
link in V7): [https://github.com/dspinellis/unix-history-
repo/blob/Researc...](https://github.com/dspinellis/unix-history-
repo/blob/Research-V6/usr/source/iolib/calloc.c)

    
    
        calloc(n, s)
        {
        return(alloc(n*s));
        }
    

There are several interesting things we learn from poking around V6 though:

\- `calloc` originated not on UNIX, but as part of Mike Lesk's "iolib", which
was written to make it easier to write C programs portable across PDP 11 UNIX,
Honeywell 6000 GCOS, and IBM 370 OS[0]. Presumably the reason calloc is the-
way-it-is is hidden in the history of the implementation for GCOS or IBM 370
OS, not UNIX. Unfortunately, I can't seem to track down a copy of Bell Labs
"Computing Science Technical Report #31", which seems to be the appropriate
reference.

\- `calloc` predates `malloc`. As you can see, there was a `malloc`-like
function called just `alloc` (though there were also several _other_ functions
named `alloc` that allocated things other than memory). (Ok, fine, since V5
the kernel's internal memory allocator happened to be named `malloc`, but it
worked differently[1]).

[0]: [https://github.com/dspinellis/unix-history-
repo/blob/Researc...](https://github.com/dspinellis/unix-history-
repo/blob/Research-V6/usr/doc/iolib/iolib) (format with `nroff -ms
usr/doc/iolib/iolib`)

[1]: [https://github.com/dspinellis/unix-history-
repo/blob/Researc...](https://github.com/dspinellis/unix-history-
repo/blob/Research-V6/usr/sys/ken/malloc.c)

~~~
ksherlock
OpenBSD added calloc overflow checking on July 29th, 2002. glibc added calloc
overflow checking on August 1, 2002. Probably not a coincidence. I'm going to
say nobody checked for overflow prior to the August 2002 security advisory.

[https://github.com/openbsd/src/commit/c7b2af4b3f7e78424f8943...](https://github.com/openbsd/src/commit/c7b2af4b3f7e78424f8943119b1397773e619e77)

[https://github.com/bminor/glibc/commit/0950889b810736fe7ad34...](https://github.com/bminor/glibc/commit/0950889b810736fe7ad340a13a5ecf76672e1a84)

[http://cert.uni-stuttgart.de/ticker/advisories/calloc.html](http://cert.uni-
stuttgart.de/ticker/advisories/calloc.html)

~~~
mnay
It is embarrassing that glibc did not check for overflow in its _calloc_
implementation prior to 2002. It is not only a security flaw but also a
violation of the C standard (even the first version, ratified in 1989, usually
referred to as C89).

The standard reads as follows:

    
    
      void *calloc(size_t nmemb, size_t size);
    
      The calloc function allocates space for an array of nmemb objects, each of whose size is size.[...]
    

and,

    
    
      The calloc function returns either a null pointer or a pointer to the allocated space.
    

So if it cannot allocate space for _an array of nmemb objects, each of whose
size is size_, then it has to return a null pointer.
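The check the standard requires can be sketched in a few lines; `checked_calloc` is an illustrative name for this example, not the actual glibc or OpenBSD code:

```c
/* Sketch of the overflow check a conforming calloc needs. */
#include <stdlib.h>
#include <string.h>

void *checked_calloc(size_t nmemb, size_t size)
{
    /* If nmemb * size would overflow size_t, a conforming calloc
       must return a null pointer rather than a too-small block. */
    if (size != 0 && nmemb > (size_t)-1 / size)
        return NULL;
    void *p = malloc(nmemb * size);
    if (p != NULL)
        memset(p, 0, nmemb * size);   /* calloc also zeroes the block */
    return p;
}
```

The OpenBSD fix linked above uses essentially this division-based guard before multiplying.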

------
wyldfire
> So basically, calloc exists because it lets the memory allocator and kernel
> engage in a sneaky conspiracy to make your code faster and use less memory.
> You should let it! Don't use malloc+memset!

On the flip side, if your critical metric is latency then these tricks of
calloc's and the OS's are exactly what you try to avoid. memset() the buffer,
and if you have the privileges you should mlock() it to prevent it from being
paged out. Of course, this all presumes that it's not an ephemeral buffer to
begin with. Best to change your design to leverage a long-lived resource if
possible.
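The pattern described above might be sketched like this; `make_pinned_buffer` is a hypothetical helper name, and the sketch assumes a POSIX system where `mlock(2)` is available (it can fail without privileges or a sufficient RLIMIT_MEMLOCK, which is treated as non-fatal here):

```c
/* Latency-oriented allocation: fault every page in up front with
   memset, then pin the buffer so it can't be paged out later. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

char *make_pinned_buffer(size_t len)
{
    char *buf = malloc(len);
    if (buf == NULL)
        return NULL;

    memset(buf, 0, len);        /* touch every page now, not on the hot path */
    if (mlock(buf, len) != 0)   /* pin; best-effort if unprivileged */
        perror("mlock (continuing unpinned)");
    return buf;
}
```

The caller would `munlock()` and `free()` the buffer at teardown; the point is that no first-touch page faults land in the latency-critical path.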

~~~
valarauca1

        Best to change your design to leverage a long-lived
        resource if possible.
    
        On the flip side, if your critical metric is latency
        then these tricks [...] are exactly what you try to avoid
    

If you keep the buffer alive as long as possible with a slab allocator, or
just a smart memory-management strategy, then _how_ you acquire the buffer is
ultimately trivial; it's likely dwarfed by your _other_ startup tasks (reading
config, opening sockets, etc.)

~~~
rcfox
I think he was referring to the part where a calloc'd memory page will be
zeroed the first time it's used, rather than all at once at the beginning.

In a real-time system, the start-up time matters less than having predictable
response times.

~~~
valarauca1
In a realtime system you can't use virtual memory because the access times are
unpredictable.

~~~
wyldfire
I specifically avoided that word because it triggers particular deadlines that
people have in mind. If my application requires no more than X ms latency, I
don't care to handwring over realtime vs. soft realtime vs. whatever, but it's
still critical to fit in the budget. And indeed you can get reliable low-
latency products to work on Linux, with virtual memory. But like I said,
pinning is a great way to keep those peaks down.

~~~
valarauca1
The division you are looking for is

 _Hard Realtime_ : Embedded system, no virtual memory/OS. Or special OS
provisions to let them run.

 _Soft Realtime_ : Responsive.

In this case you are aiming for the second. There are several million things
that'll net greater performance; we're talking about saving a matter of
nanoseconds in C/C++. How you load your config will have more effect than this.

If you want to save $1,000,000, rolling pennies is a start. But there are
likely way bigger savings elsewhere, worth way more, for less effort.

~~~
karmakaze
I didn't think how it's achieved matters to the classification, which is by
the consequence of missing a deadline, according to good ol' Wikipedia:

Hard – missing a deadline is a total system failure.

Firm – infrequent deadline misses are tolerable, but may degrade the system's
quality of service. The usefulness of a result is zero after its deadline.

Soft – the usefulness of a result degrades after its deadline, thereby
degrading the system's quality of service.

~~~
valarauca1
Your difference between soft/firm is an arbitrary decision made by a manager,
not really a hard-and-fast _if I can't meet this deadline my system is a total
technical failure_.

You've created a false dichotomy.

------
jblow
Sorry, but this is just goofy and bad.

If you depend on copy-on-write functionality, then you need to use an API that
is specced to guarantee copy-on-write functionality. If that means you use an
#ifdef per platform and do OS-specific stuff, then that is what you do.

Anything else is amateur hour.

If copy-on-write is a desirable feature, then as the API creator, your job is
to expose this functionality in the clearest and simplest way possible, not to
hack it in obscurely via the implementation details of some random routine.
(And then surprise people who _didn't_ expect copy-on-write with the
associated performance penalties.)

This is why we can't have nice things.

~~~
mannykannot
>Anything else is amateur hour.

Such as writing the optimizing compilers that make it feasible for you to use
C at all?

~~~
jblow
I don't understand your reply. How is this not a non sequitur?

~~~
sowbug
The output of a modern C compiler is unpredictable in terms of performance.
Yet mannykannot suspects you still use such tools. Please explain your
inconsistency.

~~~
et1337
Even if this weren't a fallacious argument, he's actually in the process of
replacing Thekla's C/C++ workflow with a custom language called Jai.

~~~
megabochen
Isn't Jai piggybacking on C?

~~~
et1337
He started with two backends. One generates bytecode for an internal
interpreter, and this is still needed because all Jai code can be run at
compile time. The other backend generates C code, but it's a temporary
measure. He just added the LLVM backend:
[https://www.youtube.com/watch?v=HLk4eiGUic8](https://www.youtube.com/watch?v=HLk4eiGUic8)

~~~
megabochen
oh thanks, I didn't see the last one :)

------
Animats
The real reason "calloc" exists was that it was really easy to hit 16-bit
overflow back in the PDP-11 days.

~~~
LukeShu
Historically, not quite true.

No version of Research UNIX V1 through V7, nor any of BSD 1, 2, 3, 4, or 4.4
did overflow checking. They all just did `m * n` or `m *= n`.

~~~
jcranmer
If you look through the history of CVEs, you'll find that pretty much every
implementation of calloc or a calloc-like function starts with m * n and ends
up only changing after someone points out the security flaw.

------
ben_bai
> Plus, if we wanted to, we could certainly write our own wrapper for malloc
> that took two arguments and multiplied them together with overflow checking.
> And in fact if we want an overflow-safe version of realloc, or if we don't
> want the memory to be zero-initialized, then... we still have to do that.

Like reallocarray(3) does?

    
    
        buf = malloc(x * y);
        // becomes
        buf = reallocarray(NULL, x, y);
        
        newbuf = realloc(buf, (x * y));
        // becomes
        newbuf = reallocarray(buf, x, y);

~~~
syntheticnature
reallocarray(3) looks nifty, but until it's available on a wider, ideally more
standard-driven basis than just OpenBSD and FreeBSD, it's likely to not see
wide uptake.
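In the meantime, projects that want it can carry a small fallback; this sketch mirrors the OpenBSD semantics (overflow-checked realloc that sets errno to ENOMEM), with `my_reallocarray` as an illustrative name:

```c
/* Fallback sketch for platforms without reallocarray(3). */
#include <errno.h>
#include <stdlib.h>

void *my_reallocarray(void *ptr, size_t nmemb, size_t size)
{
    /* Fail cleanly if nmemb * size would overflow size_t. */
    if (size != 0 && nmemb > (size_t)-1 / size) {
        errno = ENOMEM;   /* matches OpenBSD's behavior on overflow */
        return NULL;
    }
    return realloc(ptr, nmemb * size);
}
```

This is roughly what the projects listed below bundle from OpenBSD, modulo the exact overflow test used.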

~~~
brynet
It's already gaining adoption outside of the BSDs. OS X/iOS seem to have it as
part of their libmalloc. Android Bionic libc has it as part of the code they
sync from upstream OpenBSD.

Many open source projects include their own, or simply bundle the OpenBSD
implementation:

    
    
      * mandoc
      * flex
      * unbound and nsd
      * tor
      * tmux
      * libbsd
      * libressl
      * xorg-xserver
      * ...

The list only continues to grow; several more examples can be found on GitHub.

~~~
achivetta
There's code for it in Darwin's libmalloc, but it's not exposed as API.

reallocarray() has some difficulties as an interface, mostly inherited from
realloc(). I'm a bigger fan of reallocarr(), but that's NetBSD only. We (the
operating systems community) need to find a consensus here, but I'm not
convinced that reallocarray() is that consensus yet.

~~~
ben_bai
reallocarray is a very thin layer around realloc. No surprises there. Simple
find and replace to bring overflow checking into your code.

reallocarr changes the semantics. Equally easy in new pieces of code and a
little harder when converting existing code.

------
nicolast
And then there's of course when calloc returns non-zeroed memory once in a
while, which causes... 'interesting' bugs.

[https://bugzilla.redhat.com/show_bug.cgi?id=1293976](https://bugzilla.redhat.com/show_bug.cgi?id=1293976)
[http://cve.mitre.org/cgi-
bin/cvename.cgi?name=CVE-2015-5229](http://cve.mitre.org/cgi-
bin/cvename.cgi?name=CVE-2015-5229)

------
AceJohnny2
> _And at least we aren't trashing the cache hierarchy up front – if we delay
> the zero'ing until we were going to write to the pages anyway, then that
> means both writes happen at the same time, so we only have to pay one set of
> TLB / L2 cache / etc. misses._

Ooh, nice one. My first impression was that calloc was just lazy-allocating,
which is fine in most cases but when you want precise control over timing,
maybe you want to be _sure_ that memory is zero'd at allocating time rather
than pay the cost unexpectedly at use time.

But the cache-awareness makes that a moot point. You'd be paying double cache-
eviction costs if you were clearing that memory up front: once at clearing
time, and once at actual-writing time. This implementation of calloc avoids
that.

------
Waterluvian
Not sure how I feel about, "oh everyone's looking this way! Let me get
political"

~~~
tdeck
Really felt like a bait and switch to click that link and get an angry rant.

------
Manishearth
I've always been surprised that memset is usually just a nonmagical for loop.
I used to expect that the OS does things to magically make it faster (running
lazily, etc).

~~~
caf
CPUs are pretty great at running nonmagical for loops, so you'd have to be
zeroing a pretty giant block of memory before it made sense to get the OS
involved at all.

~~~
Manishearth
Of course :) But the OS could get involved for larger blocks of memory.

Also, I wonder if zeroing large chunks of memory would be faster to do in
kernel space using real addresses. You can avoid the multiple real memory
lookups involved in a single virtual write.

(Of course, we already avoid those often, but it _could_ be useful to avoid
entirely. Not sure what the tradeoffs are here)

~~~
caf
Paging is still enabled in kernel mode, the kernel uses virtual addresses.

(The kernel's linear mapping of physical memory _can_ take advantage of huge
pages though, which means that there might be one or two fewer levels of page
tables involved with those addresses.) TLB misses aren't significant if you're
bulk-writing to a block of memory anyway; you'll max out the bandwidth of the
memory without that being an issue.

------
Const-me
Let’s see what happens after the allocation.

With malloc + memset, the OS will likely allocate that memory in huge pages,
on PC that would be 2-4MB / page depending on the architecture,
[https://en.wikipedia.org/wiki/Page_(computer_memory)#Huge_pa...](https://en.wikipedia.org/wiki/Page_\(computer_memory\)#Huge_pages)

If I calloc then write, the OS can't give me huge pages because of that copy-
on-write thing. Instead, the OS will gradually give me the memory in tiny 4 KB
pages. For large buffers you should expect TLB cache misses, therefore slowing
down all operations on that memory.

~~~
mikeash
There's no reason the OS can't use huge pages for a calloc. In fact, "give me
some zeroed memory" tends to be the _only_ interface exposed by the kernel,
since security requires zeroing memory before handing it out to userspace
anyway.

~~~
Const-me
This article _creates_ a reason why the OS might not be able to use huge pages
for a calloc.

If a substantial number of people read this article, believe what's written,
and [re]design their software under the wrong assumption that calloc returns a
sparse copy-on-write memory buffer at no performance cost, then the OSes will
no longer be able to use huge pages for calloc. Doing that would dramatically
increase physical memory usage for such software.

------
drfuchs
Originally, calloc was the function Unix programmers were expected to use by
default, since it avoids any sort of intermittent bugs due to your forgetting
to initialize some field in the data structure you're allocating. But clearing
the memory to all zeros took precious time, so if you were an advanced
programmer, and knew for a fact that you were going to fill it all in
yourself, you could optimize by calling malloc.

------
MaulingMonkey
It's harder to forget to multiply by sizeof(T) when calloc-ing as well.

~~~
topkekz
you can do sizeof(T[n]) instead of sizeof(T) * n
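Both points can be illustrated side by side; the helper names here are made up for the example:

```c
/* Three ways to size an allocation of 10 structs. */
#include <stdlib.h>

struct T { long a; char b; };

struct T *alloc_ten_explicit(void)
{
    return malloc(sizeof(struct T) * 10);   /* easy to forget the sizeof */
}

struct T *alloc_ten_arraytype(void)
{
    return malloc(sizeof(struct T[10]));    /* sizeof does the multiply */
}

struct T *alloc_ten_calloc(void)
{
    return calloc(10, sizeof(struct T));    /* overflow-checked and zeroed */
}
```

The array-type form still has no overflow check, of course; only the calloc version fails cleanly on huge counts.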

------
IgorPartola
I don't get it. The two behaviors are completely orthogonal. Why can't I have
a malloc() that does lazy copy-on-write for large arrays and why can't I have
an error checking malloc() and why can't I have a calloc() that allocates the
memory up front and doesn't zero it out? I get the "it's historic" argument,
but this seems like a silly distinction. Sounds like what you want to do
practically is basically just make your malloc() wrap a calloc() with size 1,
and stop explicitly memset()ing. Or just introduce your own functions:

    
    
        moarmem(n) // malloc(n)
        moarmemslower(n) // p = malloc(n); memset(p, 0, n);
        moarmemfaster(n) // calloc(n, 1)
        evenmoarmem(p, n) // realloc(p, n)
        fuggetaboutit(p) // free(p)

~~~
kr7
> Why can't I have a malloc() that does lazy copy-on-write for large array

malloc will generally do lazy copy-on-write above a certain limit; it was
128kb for glibc last time I checked.

> why can't I have an error checking malloc()

reallocarray is becoming the de-facto standard for that.

> why can't I have a calloc() that allocates the memory up front and doesn't
> zero it out?

Er, malloc?
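That glibc threshold is even tunable; this glibc-specific sketch uses `mallopt(3)` with `M_MMAP_THRESHOLD` (the default is 128 KiB, and glibc also adjusts it dynamically):

```c
/* glibc-specific: lower the size at which malloc switches to serving
   requests via anonymous mmap (and hence lazily zeroed, copy-on-write
   pages) instead of carving them out of the heap. */
#include <malloc.h>
#include <stdlib.h>

void lower_mmap_threshold(void)
{
    /* Requests of 64 KiB and up now likely go straight to mmap;
       mallopt returns nonzero on success. */
    mallopt(M_MMAP_THRESHOLD, 64 * 1024);
}
```

Setting the threshold explicitly also disables glibc's dynamic adjustment of it, so this is a tuning knob to use deliberately.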

------
kamouth
"(I mean, let's be honest: if we really cared about security we wouldn't be
writing in C.) "

Why so ?

~~~
BugsBunnySan
Should be the other way around, shouldn't it: "Only if you really care about
security should you be allowed to write in C" :)

(i.e., don't use a professional-grade band-saw, if you're not a professional)

------
duaneb
This is a great example of why _alloc is an abstraction over virtual memory.

What this doesn't express is that dealing with page allocation directly can be
quite annoying to get correct cross-platform. You generally don't want to do
that unless a) you're optimizing past the "Knuth level" and know you need to
for performance (e.g. mapping files to memory), b) you're writing something
where you run dynamic code (JIT or dynamic recompilation), or c) you're
writing your own allocator and/or using page faults to get some functionality,
a la Go's stop-the-world hack.

Basically, don't bypass _alloc unless you have a reason.

------
notacoward
I always thought it was because of padding. An array of M structures each N
bytes long could require more than M*N bytes (certainly has on some
architectures I've worked with). But I guess that's not it after all.

~~~
mikeash
C accounts for padding in the size of the individual type. By the time you do
sizeof, it's already rounded up to where you can just do M*N. For example:

    
    
        struct S {
            long a;
            char b;
        };
    

On my computer (64-bit Mac), sizeof(struct S) is 16, due to 7 bytes of padding
after b. Since the compiler handles the padding, that means calloc doesn't
have to.
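The claim is easy to check; the exact numbers assume a typical LP64 platform (8-byte long, 8-byte struct alignment):

```c
/* sizeof already includes trailing padding, so an array of struct S
   is just sizeof(struct S) * n bytes: 8 for a, 1 for b, 7 padding. */
#include <stddef.h>

struct S {
    long a;
    char b;
};

/* On LP64: sizeof(struct S) == 16 and offsetof(struct S, b) == 8,
   so calloc(n, sizeof(struct S)) needs no extra padding arithmetic. */
```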

~~~
notacoward
_Today's_ compilers. How about the compilers when calloc was first defined?
As I said, I've worked on compilers that behaved differently, either always or
according to various options. The computing universe has actually become less
diverse in some ways than it used to be, so we should be careful of drawing
conclusions about old interfaces based on today's monoculture. Is it really
_impossible_ for people here to imagine that some of the dozens of platforms
that had their own compilers and C libraries chose to do the rounding up in
the latter? It would actually be a pretty reasonable choice, for different
microarchitectures capable of running the same instruction set and binaries
but with different cache subsystems. That way you could make the decision at
run time instead of compile time. Many of the early RISC machines did even
weirder tricks than that to wring out the last bit of performance on multiple
generations without having to recompile.

------
rcthompson
If you calloc some memory and then the first thing you do is write to it, can
the compiler optimize away the initial write of zeros, since they will just be
overwritten?

~~~
Serplat
For smaller memory allocations that don't go directly to the OS, I suppose
it's theoretically possible (though I'm not sure any compilers do it). For the
larger allocations that the author mentions that go directly to the OS to
fulfill, however, the compiler wouldn't be able to optimize this away. In that
case, the zero'ing occurs in the kernel, which is something that the compiler
has no control over.

The zero'ing is done largely for security reasons anyway. If a program could
somehow disable that feature and get leftover memory from the kernel, it could
very easily contain passwords, secret keys, and other important bits of data
that you wouldn't want random programs on your computer to have access to.

------
jedisct1
Good operating systems also provide `reallocarray()`.

~~~
astrobe_
It's in the standard library, not in the operating system.

~~~
binarycrusader
The standard library is usually delivered as part of the operating system on
*nix-like platforms.

~~~
masklinn
In fact, the standard library is usually part of the operating system in the
sense that it's the interface to the OS, on both *nix-like and non-*nix-like
systems. Linux is the exception (in that raw syscalls are officially
supported), not the rule.

~~~
binarycrusader
Yes, exactly. This is especially true on Darwin and Solaris.

------
wfunction
No, the 2 GB array should still take a quota of 2 GB. It just wouldn't take 2
GB's worth of time to initialize. The overcommit "feature" in Linux is a bug
that crashes C programs in ways that violate the language's guarantees (such
as when a write occurs to a location in memory that was allocated correctly).

------
dimman
There are some unfortunate statements in there (if taken out of context) that
require you to read the whole thing for it to make sense. Like "...but most
of the array is still zeros, so it isn't actually taking up any memory...",
which is a bit ambiguous if not read in the complete context; with the
context, it makes sense.

------
smegel
> But calloc lives inside the memory allocator, so it knows whether the memory
> it's returning is fresh from the operating system, and if it is then it
> skips calling memset. And this is why calloc has to be built into the
> standard library, and you can't fake it yourself.

Err...mmap(2)?

~~~
theseoafs
What the author means is that you can't fake it yourself from within the C stdlib.

------
MichaelBurge
I suppose another alternative would be for memset() to check if the page is
already mapped to the zero page, and to do nothing if it is. There are some
bitset-related data structures that should make that pretty efficient.

------
ericfrederich
Somebody needs to go update all the StackOverflow answers saying that malloc
is faster. According to this, calloc seems to always be faster, with several
other benefits as well.

~~~
bcpermafrost
It is not as you say.

The article suggests that malloc + memset is slower than calloc.

Whether malloc is faster depends on your use case: if your plan is to
eventually memset the whole buffer anyway, then just use calloc; otherwise
malloc will be faster.

~~~
Twirrim
n00b question: Why would you not memset? I would assume you'd want to start
with all zeroed memory in almost all cases.

~~~
lerpa
Not always, you could be planning on filling the data with something else.
Very common to do that.

~~~
elua
E.g.: reading from a file or copying memory?

------
angusp
> I mean, let's be honest: if we really cared about security we wouldn't be
> writing in C.

How so? C is low level, so to be secure you must be fully aware of the
behaviours and side effects of what you're doing. In another, perhaps higher-
level language, sure, there may be fewer of these gotchas, but to be properly
secure you need a similar amount of knowledge about background behaviour.

------
edblarney
Why do so many people disagree on something that should be nearly empirical?

