
Reallocarray() in OpenBSD: Integer Overflow Detection for Free - nkurz
http://lteo.net/blog/2014/10/28/reallocarray-in-openbsd-integer-overflow-detection-for-free/
======
dezgeg
Unfortunately, having this kind of function doesn't fix the real problem,
which is those certain kinds of programmers who won't let a single machine
cycle be wasted.

Here's just a completely random example from the Linux kernel:
[https://lkml.org/lkml/2014/8/19/194](https://lkml.org/lkml/2014/8/19/194)

    
    
        > You are implying that such an overflow could arise in this code.
        > This is false.
        [...]
        > bNumEndpoints is an unsigned 8-bit integer.
    

So instead of changing the code code to be obviously correct (wrt.
multiplication overflow), we'll instead keep the old code whose correctness
becomes clear only after the programmer has looked up the types of variables
involved.

This mindset is exactly why vulnerabilities in C code are still common. It
really needs to stop.

~~~
kazinator
Overflows can crop up all over the place in C: you have to instrument every
single +, - and * operator instance with overflow checks, each one of which
needs a sane recovery strategy.

Some overflows indicate internal errors that should panic the kernel. Others
are configuration or resource issues that can bubble up some errno value to
user space. Some can have a fallback strategy: this calculation overflowed,
try another one. Some overflows can be handled by "saturation" (a.k.a
clamping): common in digital signal processing and supported natively in some
DSP machine languages.

In other words, a silly patch that fixes a non-problem in one tiny place in a
huge codebase doesn't accomplish anything. It's a big, wide-reaching task.

~~~
dezgeg
> In other words, a silly patch that fixes a non-problem in one tiny place in
> a huge codebase doesn't accomplish anything.

I don't understand this line of thinking at all.

"reallocarray()/kmalloc_array() doesn't fix all the possible integer overflow
issues in C, so why bother using it at all."

Several people in this discussion have mentioned exact vulnerabilities that
would have been prevented by reallocarray():

meowface: "Here is one of many examples of an integer overflow resulting in an
improper size being passed to a memory allocation function further leading to
remote code execution: [https://docs.google.com/document/d/1tHElG04AJR5OR2Ex-
m_Jsmc8...](https://docs.google.com/document/d/1tHElG04AJR5OR2Ex-
m_Jsmc8S5fAbRB3s4RmTG_PFnw/edit) "

tedunangst: "You've never heard of the one openssh exploit?"

~~~
kazinator
I took realloc_array and applied it to a program I'm working on which contains
a bunch of realloc calls. It was completely useless; I had to invent a
different interface.

It was useless because several calculations take place in every instance to
produce the size, whereas realloc_array only checks one multiplication.

------
alexlarsson
Or you could use glib which has this exact function (g_realloc_n) since a long
time.

~~~
tedunangst
I think you mean g_try_realloc_n.

~~~
alexlarsson
True

------
foxhill
am i the only one that thinks this is a total non-issue?

if you're going to overflow a size_t, you're already in trouble. realloc is
used for.. um, reallocating data. it's probably grown many times before you
come close to exhausting 32 bits of size (assuming i386). if you are coming
close to 4gb allocations, you're already likely running out memory. as such,
you're program is probably going to be killed for using too much memory before
you are in any risk of overflowing size_t.

similarly, in a 64bit runtime.. well, you will certainly be killed by the
kernel before you come close to overflowing size_t.

if you're reallocing based on untrusted user input.. well, again, you're
already in trouble. "fixing" realloc will only make problems occur earlier
(which i suppose is good).

EDIT: what i really should have said, is "make problems occur somewhere else".

finally, returning NULL for failure.. well, malloc doesn't fail. most (sane)
codes don't check the return value of malloc because it doesn't make sense to.
the kernel doesn't allocate memory until first access, at which point you're
going to have trouble regardless.

colour me sceptical, but this seems like a fix to a non-existent problem.

~~~
Sanddancer
The Linux kernel doesn't allocate memory until first access. Most other Unixes
actually allocate when you ask them to allocate. So on BSDs, etc, malloc can
and will fail when you ask the computer to allocate more memory than it can
actually access. Overcommitting RAM is a rather silly "optimization" and tends
to cause more trouble than it actually solves.

~~~
Dylan16807
It's also rather silly to need 50GB of never-used swap space to back up a
fork-heavy program.

------
kazinator
The multiplication factor is often some small constant, like 2 in the case of
an array grown by doubling. It's worth making reallocarray an inline function
to handle a few constant values:

    
    
        inline void *reallocarray(void *old, size_t x, size_t y)
        {
           switch (y) {
           case 1:
             return realloc(old, x);
             ...
           case 2:
             // drastically simplified overflow check
             ...
    
           default:
             // perhaps a switch (x) with mirror cases here
             return realloc_array_general(old, x, y);
           }
        }
    

The 1 case occurs when programmers are following some coding convention
whereby malloc calls are replaced with this "single allocator function to bind
them all": char *buf = reallocarray(size, 1);

When the function is inlined, the non-matching cases turn to dead code since
the switch expression is constant.

~~~
kazinator
I just looked at some of my recent code where I'm using realloc. It's obvious
that this approach is insufficient, because it assumes there is only one
multiplication. I have situations where an array is being resized. The size is
expressed in terms of elements, rather than bytes. So two multiplications are
going on: the size is increased first, and then in the realloc call, the
argument expression has a multiplication by the element size also.

An overflow checking realloc won't help if the first multiplication oveflows.

If I make this version instead:

    
    
        void *realloc_array(void *old,
                            size_t existing_size,
                            size_t resize_factor,
                            size_t elsize);
    

that's no good either because I want the new size also if everything went
well. I need something like:

    
    
        void *realloc_array(void *old,
                            size_t existing_size,
                            size_t resize_factor,
                            size_t elsize,
                            size_t *p_newsize_out);
    

Okay, now we're talking. Resize my vector old, which contains existing_size
elements, by a factor of resize_factor. The individual elements have size
elsize. If all goes well (non-NULL is returned), give me the new size (in
elements, not bytes) in the location *p_newsize_out.

~~~
kazinator
After some thought and experimentation, I have arraived at this interface:

    
    
        void *resize_array(void *old,
                           size_t existing_size,
                           size_t new_size
                           size_t resize_factor,
                           size_t elsize);
    

The client code computes the new size, while retaining the old. If these are
supposed to be related by an integer factor, then this is passed as
resize_factor. If they are not related that way, then resize_factor is passed
in as zero. The elsize parameter must be nonzero, in any case.

If a nonzero resize_factor is supplied, then the calculation new_size =
existing_size * resize_factor is checked for overflow, otherwise the zero
value of resize_factor indicates that this check is not applicable; however,
the function still at least validates that new_size > existing_size.

The calculation bytes = new_size * elsize is always checked for overflow.

------
PythonicAlpha
Sure a good thing, but for me also a good example, why I changed from C/C++ to
a scripting language.

In application development, you seldom have the need for maximum speed in all
your coding. And always worrying about such things as what is addressed by
this small enhancement, is a pain.

In most scripting languages, such details are just falling away.

I still program in C/C++ -- but just for very speed-intensive "hot-spot"
functions, in form of Python enhancement modules. And in those, I have the
time and patience to look for all these implementation details. All the rest,
I do in Python.

~~~
bstamour
You won't need this enhancement in C++ if you just use `std::vector` or
`std::array`.

~~~
PythonicAlpha
I was talking about Python enhancement modules -- and no, I won't consider C++
a good replacement for Python.

~~~
bstamour
> In application development, you seldom have the need for maximum speed in
> all your coding. And always worrying about such things as what is addressed
> by this small enhancement, is a pain.

That's what my comment was addressing: In C++ with vector, map, etc, you don't
need to remember to safely realloc manually, because the containers do it for
you automatically.

~~~
PythonicAlpha
Might be, but in C++ there are still enough problems lingering, that using
Python is worthwhile. Many workarounds have been forged in C++, but still
there are so many problems lingering and to use C++ right, you really must be
a master of a language that is 3 times as difficult to learn than most other
programming languages.

------
kazinator
Issue: the function should be treating nmemb == 0 or size == 0 as errors
rather than a feature.

The way it is, you can do

    
    
        /* obfuscated way of doing free(ptr); */
        realloc_array(ptr, 0, whatever); 
    

There is no need to carry on the tradition that realloc serves as a complete
memory allocator in a single function.

The program could be pass nmemb or size as zero due to a bug, rather than the
intent to free.

Or as a compromise, zero could be accepted for nmemb (zero sized array), but
not for size (the base elements must have nonzero sizes).

------
SEJeff
I wonder if GNU libc will ever get something like this.

~~~
vyqfi
I don't get the fuzz. This is like the 3rd or 4th submission about this
function. And all it does is testing that

> ((SIZE_MAX / nmemb) >= size)

before doing the size_t multiplication. That's no rocket science. Anyone who's
worried about his size_ts overflowing should be able to come up with this.

~~~
quotemstr
I don't understand why more people don't define local helper functions. I've
worked on FOSS codebases with _hundreds_ of invocations of strncpy, each
carefully followed with an additional NUL-termination check to work around
strncpy's well-known problem. Why wouldn't you just write a little wrapper?

~~~
PythonicAlpha
In my opinion the C library was one of C's most best innovations. But there
are also some dark spots. strncpy is one of it -- and one of the things, where
the C library is gravely outdated today.

------
psgbg
This is an amazing post.

So, if this is available in libbsd it means that is also available for linux
too, I'm right?

------
Demiurge
I have only had to use C for a small amount of work over the years, so I have
a naive question. This seems like something very essential, why is it that
this function took more than 30 years to add?

~~~
tedunangst
For a long time the prevailing attitude was "don't make mistakes." Then after
it became apparent some problems were both common and serious, people argued
(and continue to do so) that if a function doesn't solve _every_ problem, it's
not worth adding. Whataboutism.

------
tribaal
That's very neat, and also a surprisingly concise bit of code!

