Unfortunately, having this kind of function doesn't fix the real problem, which is the kind of programmer who won't let a single machine cycle be wasted.
> You are implying that such an overflow could arise in this code.
> This is false.
[...]
> bNumEndpoints is an unsigned 8-bit integer.
So instead of changing the code to be obviously correct (wrt. multiplication overflow), we'll keep the old code, whose correctness becomes clear only after the programmer has looked up the types of the variables involved.
This mindset is exactly why vulnerabilities in C code are still common. It really needs to stop.
One of the main criticisms I hear about OpenBSD is that its performance isn't as good as Linux's. I've looked at a couple of articles quoting benchmark comparisons, so I'll assume for the sake of argument that it's true.
The thing is that I don't care. I'm still an OpenBSD advocate and run it whenever I get the chance. I don't actually care about performance at that level. What matters more to me is consistency and good man pages.
I imagine more people are like me, but the thing about performance is that it's quantifiable: it can be easily measured. We all probably over-optimize to some degree, whether we're C developers or not.
There's just something satisfying about getting a quantifiably better score. This is why we get code that sacrifices almost everything for runtime speed.
The slowness isn't generally impacted much by stuff like this. It's more things like OpenBSD's poor multicore performance that are the issue. Just compare the performance of FreeBSD's multithreaded PF to OpenBSD's single-threaded PF. Still, none of this matters, because there's one major guarantee that the OpenBSD devs try to fulfil: correctness.
I suppose there are probably large groups of people who care about multicore performance on the metal, but my dominant use case is herds of single core VMs, each doing one thing (well!). In this case, host CPU and memory consumption matter, and OpenBSD is very hard to beat in this metric. My OpenBSD VMs normally consume 1/4 of the CPU time and run happily in half the RAM as my FreeBSD, Linux and Windows VMs. I don't know if there is a significant performance difference on a single CPU, but I certainly don't have any complaints about performance.
The thing is, the "slowness" in question is different every time someone wants to talk about it. Because people don't benchmark it, don't actually know what is slower and by how much, and just go on the "common knowledge" that OpenBSD is slow. The "OpenBSD is slow" meme was going strong in the 1990s, when it didn't support SMP at all and 99% of the people talking about it were using a single CPU.
It's different because the slowness is across the board.
FreeBSD and, especially, Linux have optimizations all over the place. A call to time(2) or gettimeofday(2) is hundreds of times faster in Linux than on OpenBSD. In most scenarios the difference doesn't matter. But it does matter in, e.g., an asynchronous network service with lots of events with timeouts being constantly registered.
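A quick-and-dirty way to see this for yourself: time a tight loop of clock_gettime(2) calls (a sketch, not a rigorous benchmark; the iteration count is arbitrary). On Linux this typically stays in the vDSO and never enters the kernel:

#include <stdio.h>
#include <time.h>

int main(void)
{
    enum { N = 10 * 1000 * 1000 };
    struct timespec start, now, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < N; i++)
        clock_gettime(CLOCK_MONOTONIC, &now);   /* the call being measured */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.1f ns per call\n", secs / N * 1e9);
    return 0;
}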
I prefer OpenBSD, and I like that they focus on correctness, bugs, and more high-level features. I run all my personal services on OpenBSD. But commercially and at scale I run Linux, and treat each box as a throw-away in terms of security.
How does that make it not a bug? They've repeatedly responded to the "openbsd is slow" meme by pointing out that they do not have the time to benchmark anything, and if people do find slow things they should submit bug reports so they can be fixed. Nobody has ever submitted such a bug report.
I mentioned it on the mailing list a long time ago, when I was more naive about these things. They said I should submit a patch. However, adding virtual syscalls is a non-trivial undertaking, and I'm not experienced enough with kernel internals; it would require wide-ranging changes, including adding special mapping support, hacking the program loader, and tweaking the clock subsystem.
Plus you also have to hack userspace. Most importantly, you need to add VDSO support to the linker. Refactoring time(2) would be easy, but for higher-granularity clocks--gettimeofday(2) and clock_gettime(2)--you need to make careful use of RDTSC and similar instructions for each architecture. _And_ you need to be able to test it well because of synchronization issues on older platforms. On x86, for example, RDTSC is synchronized, but for a long time it varied across cores, and when Intel and AMD fixed that there was still a small product window where it could vary across CPU packages.
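For flavor, the primitive such a fast userspace clock is built on: reading the x86 time-stamp counter directly (assuming GCC or Clang on x86-64). Turning raw ticks into wall-clock time -- calibration plus the cross-core/cross-package synchronization issues mentioned above -- is exactly the hard part:

#include <stdint.h>
#include <x86intrin.h>

static inline uint64_t read_tsc(void)
{
    return __rdtsc();   /* executes RDTSC; cycle count since reset */
}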
And the part where the developers explicitly say being slower than necessary is a bug and should be reported as such means it is a bug. This honestly is not complicated.
My favorite thing with the OpenBSD folks isn't that they add APIs that avoid classes of security bugs but rather that they then ruthlessly update their code base to rely on the safer version.
Well, you might waste a cycle on the first run (or any run after the branch has been flushed from the branch prediction table), as well as wasting cycles elsewhere from clobbering the table.
One solution to both of these problems I have thought about, but do not work at a low enough level to know if it is already a thing, is an opcode for unlikely branches. Semantically, this would behave like a normal jmp instruction, but the CPU branch predictor would always assume that the jump is not taken, and not bother keeping track of it.
In this particular case, it would also involve a way of signalling to the compiler (or the compiler inferring), but for loops it could be useful.
Of course, in this case it is also possible that the compiler can infer that the branch can never be taken and remove it, so you really do not lose anything.
> One solution to both of these problems I have thought about, but do not work at a low enough level to know if it is already a thing, is an opcode for unlikely branches.
x86 actually does have jump instructions with branch-prediction hints, but, surprisingly, modern CPUs completely ignore those hints. Perhaps they never really caused large enough performance improvements to be worthwhile, or something.
> In this particular case, it would also involve a way of signalling to the compiler (or the compiler inferring), [...]
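On the compiler side, GCC and Clang already give you __builtin_expect, usually wrapped in likely()/unlikely() macros as in the Linux kernel. It only influences code layout; it doesn't emit a hardware hint. A minimal sketch (checked_alloc is a made-up example):

#include <stdint.h>
#include <stdlib.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

void *checked_alloc(size_t nmemb, size_t size)
{
    /* the compiler moves the overflow branch out of the hot path */
    if (unlikely(nmemb != 0 && size > SIZE_MAX / nmemb))
        return NULL;
    return malloc(nmemb * size);
}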
I suspect nobody playing with the USB code would have to look up the type of bNumEndpoints. The `b' prefix tells you it's a byte (this is a common thing in the USB spec), and you'll probably know it's a byte anyway due to familiarity...
Another way of looking at the matter is that you shouldn't change code that's already correct.
> I suspect nobody playing with the USB code would have to look up the type of bNumEndpoints.
I could very well imagine a security auditor (who is not a USB expert) reading this code.
> Another way of looking at the matter is that you shouldn't change code that's already correct.
I personally find this kind of attitude extremely worrying. Perceptions about what is bad/good code or secure/insecure code have definitely changed since the first days of Unix. We shouldn't let code slip through the cracks just because it's old.
Overflows can crop up all over the place in C: you have to instrument every single +, - and * operator instance with overflow checks, each one of which needs a sane recovery strategy.
Some overflows indicate internal errors that should panic the kernel. Others are configuration or resource issues that can bubble up some errno value to user space. Some can have a fallback strategy: this calculation overflowed, try another one. Some overflows can be handled by "saturation" (a.k.a. clamping): common in digital signal processing and supported natively in some DSP machine languages.
In other words, a silly patch that fixes a non-problem in one tiny place in a huge codebase doesn't accomplish anything. It's a big, wide-reaching task.
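To illustrate the saturation option above: with the GCC/Clang checked-arithmetic builtins (GCC >= 5), a saturating add is a few lines. The name here is illustrative:

#include <stdint.h>

static uint32_t sat_add_u32(uint32_t a, uint32_t b)
{
    uint32_t sum;
    if (__builtin_add_overflow(a, b, &sum))
        return UINT32_MAX;   /* clamp at the maximum instead of wrapping */
    return sum;
}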
You don't have to instrument every single arithmetic operation. Multiplicative overflow can only happen if the value of one or the other operand is >= 2^(N/2), where N is the number of value bits.
On 64-bit machines, this means if you use 32-bit values and properly promote them, or one 32-bit value and one 64-bit value, you don't have to instrument.
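A sketch of that promotion trick (names illustrative): widen both operands to 64 bits so the product cannot wrap, then range-check the result once:

#include <stddef.h>
#include <stdint.h>

/* returns 0 and stores the byte count on success, -1 on overflow */
static int scaled_size(uint32_t count, uint32_t elsize, size_t *out)
{
    uint64_t bytes = (uint64_t)count * elsize;  /* cannot overflow 64 bits */
    if (bytes > SIZE_MAX)                       /* only possible on 32-bit */
        return -1;
    *out = (size_t)bytes;
    return 0;
}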
Also, often the better strategy is to design your code to be immune to overflow. Garbage in, garbage out: if somebody gives you garbage, they can expect to get garbage out. All that matters is that you don't allow that garbage to taint the state of your program. I usually use unsigned arithmetic, because then I don't need to worry about undefined behavior in C. And for general bounds-checking tasks my only condition is whether something is >= object-size.
Some of us rigorously analyze our code for overflow problems.
Those of you who don't are like the programmers from 20 years ago who thought that it was too tedious to do bounds checking on buffers, and that because of all the existing code it wouldn't improve security.
Well, it did matter. Like with buffer overflows, it's a collective action problem. The dilemma you point out doesn't exist if you stop being so resistant to change.
Your comments basically state the same thing: "we avoid these problems in C by conscientiously choosing the data representations and defending against the problems while creating and implementing the original design" which is true. But this is different from going back into a large and complex C program whose design and implementation neglected the issues.
Fixing some realloc calls is possible only because the function has an error return code, and so when the overflow occurs in the multiplication, it can "piggy back" on that null return. The hope is that the caller is already handling null properly and so everything is cool. (Is it? Was a test case ever executed in which that particular realloc call returned null?)
In general, multiplications in C take place in situations where there is no expectation of an error, and so no error handling strategy in place in the surrounding code. It may be difficult to introduce it, and even more difficult to test all the cases.
> I usually use unsigned arithmetic, because then I don't need to worry about undefined behavior in C.
That is wrongheaded, I'm afraid. The unsigned types behave poorly under arithmetic, because they have a large discontinuity immediately to the left of zero. And because their wrapping behavior is well-defined, it basically means that unintended wrapping is not diagnosable as an error (without generating reams of false positives).
It is unfortunate that size_t is an unsigned type. This is for historic reasons. When C ran on machines where size_t was 16 bit, making it signed would have meant limiting the size of an allocation request to 32767, rather than allowing up to 65535. That would have hurt, and induced people into workarounds such as writing their own allocators. (Which, ironically, they do anyway, to this day.)
Unsigned types are best used for things like bit fields, and simulating the arithmetic of particular machines. They are also useful for doing clock arithmetic on a wrapping interrupt timer register. (This is best done with macros or functions that safely wrap it, like "time_is_before(t1, t2)".)
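A sketch of that wrap-safe comparison, assuming a 32-bit wrapping tick counter (the same idiom as the Linux kernel's time_before()):

#include <stdbool.h>
#include <stdint.h>

static inline bool time_is_before(uint32_t t1, uint32_t t2)
{
    /* unsigned subtraction wraps cleanly; the sign of the difference,
       reinterpreted as signed, says which tick came first (valid as
       long as the two ticks are less than half the range apart) */
    return (int32_t)(t1 - t2) < 0;
}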
"unsigned char" is useful, but note that values of this type undergo default promotion (on most compilers, by and large) to a signed value of type int.
Signed overflow is undefined in C. Period. That means that detecting overflow for signed types is really messy.
For addition one method would be to subtract the maximum possible positive value. The only way to do that is to use a pre-defined macro. Which means the logic for your check is divorced from the actual operation--in other words, if the type of the operand changes, your check becomes _silently_ worthless.
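Spelled out, the pre-check looks like this; note how the <limits.h> macros must match the operand type -- change int to long and the check silently stops protecting you, which is exactly the bit-rot risk described above:

#include <limits.h>

static int add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b)    /* would exceed INT_MAX */
        || (b < 0 && a < INT_MIN - b);   /* would go below INT_MIN */
}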
The alternative exhibits at least _two_ bad security practices: 1) an independent constraint on input divorced from the risky operation is a recipe for bugs because you have no warning when the conditions change such that your input verification fails to properly constrain the values; and more generally, 2) relying on assumptions about a type, rather than relying on the actual behavior of a type or information derived directly from the type, is especially bad form in C because of its loose typing.
I think I'll stick with my strategy, which has worked out well, doesn't rely on undefined behavior, and is less susceptible to bit rot.
Regarding your objection of making changes to existing code, I would just say that we shouldn't let the perfect be the enemy of the good. I also like tedunangst's description of that predisposition: whataboutism.
I took realloc_array and applied it to a program I'm working on which contains a bunch of realloc calls. It was completely useless; I had to invent a different interface.
It was useless because several calculations take place in every instance to produce the size, whereas realloc_array only checks one multiplication.
The discussion has changed; it is about the pointless patch to some USB code where an 8 bit endpoint count is multiplied by the size of some structure.
am i the only one that thinks this is a total non-issue?
if you're going to overflow a size_t, you're already in trouble. realloc is used for.. um, reallocating data. it's probably grown many times before you come close to exhausting 32 bits of size (assuming i386). if you are coming close to 4gb allocations, you're already likely running out of memory. as such, your program is probably going to be killed for using too much memory before you are at any risk of overflowing size_t.
similarly, in a 64bit runtime.. well, you will certainly be killed by the kernel before you come close to overflowing size_t.
if you're reallocing based on untrusted user input.. well, again, you're already in trouble. "fixing" realloc will only make problems occur earlier (which i suppose is good).
EDIT: what i really should have said, is "make problems occur somewhere else".
finally, returning NULL for failure.. well, malloc doesn't fail. most (sane) codes don't check the return value of malloc because it doesn't make sense to. the kernel doesn't allocate memory until first access, at which point you're going to have trouble regardless.
colour me sceptical, but this seems like a fix to a non-existent problem.
1) malloc absolutely returns NULL on Linux, even with overcommit. Using setrlimit (which properly sandboxed daemons should be doing), it's trivial to get malloc to return NULL; see the sketch after this list.
2) The issue isn't whether malloc returns NULL. Technically speaking, unsigned arithmetic doesn't even "overflow", which in C is a term meaning the result is undefined. It returns the value modulo 2^N. In other words, in the vast majority of situations malloc will succeed, but return a buffer smaller than you requested.
3) According to your logic, people shouldn't check object boundaries, either. Many buffer overflow exploits utilize arithmetic overflow, because usually the memory you're interested is nowhere near the object you're overflowing. And we know these overflows happen and can be triggered even when the application isn't using very much memory.
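To make point 1 concrete, a minimal sketch (the 1 MiB limit is arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit lim = { 1 << 20, 1 << 20 };  /* cap address space at ~1 MiB */
    if (setrlimit(RLIMIT_AS, &lim) != 0) {
        perror("setrlimit");
        return 1;
    }
    void *p = malloc(64 * 1024 * 1024);        /* far over the limit */
    printf("malloc: %p\n", p);                 /* prints (nil): malloc failed */
    return 0;
}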
People with your opinions shouldn't be allowed anywhere near C code.
>"fixing" realloc will only make problems occur earlier (which i suppose is good).
Yes, you should certainly suppose it's good, because it means the difference between an early out-of-memory error and a serious security vulnerability that can result in arbitrary code execution, like a buffer overflow.
That is the point of wrappers like these: to improve security.
>if you're reallocing based on untrusted user input.. well, again, you're already in trouble.
You are in trouble, but the trouble will be considerably less severe if the untrusted input hits this function instead of hitting malloc or realloc.
> Yes, you should certainly suppose it's good, because it means the difference between an early out-of-memory error and a serious security vulnerability that can result in arbitrary code execution, like a buffer overflow.
my point is that this scenario is so improbable that it's not worth worrying about.
> You are in trouble, but the trouble will be considerably less severe if the untrusted input hits this function instead of hitting malloc or realloc
heartbleed, SQL injection, and hundreds of other vulnerabilities are more likely to occur from untrusted user input. if you overflow a realloc, you don't know where the destination address is. you're far away from the stack, too. there is very little (if any) risk. i have never come across a realloc based exploit.
there is no need to apologise. if i am wrong, i am wrong.
i must say, this is the first (or second..) time i've heard of such an attack. and, whilst i'm not saying you're incorrect, i will say this - if you take user input and act upon it without validation, you're going to have a bad time. realloc does exactly what it says it will do.
is this a fault of realloc, or the programmer that wrote that code? my opinion is that it's a fault of the programmer. and i can already see that others disagree with that.
Yes, it is programmer error, but the key takeaway is that it's pretty damn easy to make a subtle yet critical error when writing in C or C++. Even if you're a developer at Google and have been writing C/C++ for many years.
Tools and wrappers like these try to mitigate the risk of these errors.
You appear to be vehemently denying the existence of a class of bugs that has plagued programs for longer than you've been programming. Perhaps reconsider how much experience you actually have.
even though, prior to this thread, i'd never heard of 2 bugs exploiting this issue, i'm not denying their existence at all. only their likelihood of occurring.
10 years since i started programming, 7 of which i consider good enough to be "experience", is the answer to your statement. less than many here, but i'm certainly not a novice.
"Likelihood" is the wrong way to think about security vulnerabilities. Once an exploit such as this is known to be possible, attackers can contrive scenarios to exploit it.
Whether or not mmap and malloc nicely return null, or whether you get a segfault on first access, depends on what OS you're using and how it is configured. You can configure Linux not to overcommit memory. I don't think it's good practice to write user-space code on the assumption that "modern OSes overcommit anyway, so we will never see null bubble out of malloc". There are conditions where that may be justified, but one example where that doesn't fly is writing portable, reusable components.
Defence in depth. This single overflow check could very well be the difference between denial-of-service-via-OOM and heap overflow.
Should we then abandon ASLR or PIE or other defensive measures as well, just because some code is "already in trouble" due to a bad strcpy or something similar?
> This single overflow check could very well be the difference between denial-of-service-via-OOM and heap overflow
OOM in general cannot be used for remote code execution. heap overflow can.
> Should we then abandon ASLR or PIE or other defensive measures as well, just because some code is "already in trouble" due to a bad strcpy or something similar
certainly not, ASLR is so general that it doesn't make sense to not do it. more importantly, they're entirely transparent to the programmer, and can be used on existing codes.
> OOM in general cannot be used for remote code execution. heap overflow can.
That was exactly my point. Using reallocarray() instead of malloc(x * sizeof(foo)) can prevent a heap overflow bug. For a concrete example (assuming 32-bit size_t):
size_t user_input = ...;
size_t* ptr = malloc(user_input * sizeof(size_t));
if (!ptr)
    return "FAIL: Out of memory";
for (size_t i = 0; i < user_input; i++)
    ptr[i] = whatever();
Now, a user_input of 2^30 + 1 overflows the multiplication, so only 4 bytes are allocated, and then the for loop causes a heap overflow. Boom.
But if the code was this:
size_t user_input = ...;
size_t* ptr = reallocarray(NULL, user_input, sizeof(size_t));
if (!ptr)
    return "FAIL: Out of memory";
for (size_t i = 0; i < user_input; i++)
    ptr[i] = whatever();
Now, a user_input of 2^30 + 1 would cause reallocarray() to return NULL, and the function returns. Heap overflow avoided by using reallocarray().
Of course, that code is still somewhat bad, since a user_input of 2^30 - 1 will cause the program to use large amounts of memory.
as you correctly state, user input of 2^30 is.. well, it's unlikely. more likely is a user being able to cause a program to do billions of reallocs. in which case, OOM is going to happen before overflow and exploitation.
The Linux kernel doesn't allocate memory until first access. Most other Unixes actually allocate when you ask them to allocate. So on BSDs, etc, malloc can and will fail when you ask the computer to allocate more memory than it can actually access. Overcommitting RAM is a rather silly "optimization" and tends to cause more trouble than it actually solves.
darwin, NT, and linux all do this. that is the majority of systems. it's not a silly optimisation at all. at worst, it performs exactly as well as malloc. at best, it saves allocating memory. which on mobile platforms (again, most of computing) is a big deal.
Imagine that this allocation is for a buffer that is supposed to take user input. If the size is too small, you could easily be exposed to the classic buffer-overflow style of attacks. (That is, users know it's possible, craft a scenario to induce it, then deliver payloads.) So, yes, it's an issue.
well the solution is to validate user input. user input should certainly be limited to 4gb, at the very least. a wrapper around an allocator is definitely not the solution.
If you're going to validate user input, you first need to get it into kernel space, which requires allocating memory for it.
User input is not limited to 4 GB. Users can provide whatever they want; that's the point. You have to code for malicious users who are pushing your system towards its edge cases, trying to find holes to break through.
> If you're going to validate user input, you first need to get it into kernel space, which requires allocating memory for it.
i don't buy this - if you are going to allocate space, you must know the size of the space you are allocating. which you can validate, and certainly does not require going into kernel space to do.
Oh come on, what do you think the purpose of reallocarray is? It's exactly to include that boilerplate for you so that you don't have to write your own "validation" before every alloc.
Letting the user (try to) use as much as he needs can be perfectly okay. Even if it weren't, it doesn't necessarily mean the program itself needs to impose some limit to check the inputs against. There are other means, e.g. rlimits (which are enabled by default on OpenBSD).
What reallocarray does is the bare minimum validation that should always be done. And in many cases that is enough, though not always.
> am i the only one that thinks this is a total non-issue?
OpenBSD's slogan boasts that there have only ever been "two remote holes in the default install"; one of those holes would have been prevented by this.
whilst i appreciate the prudence of this measure, it doesn't do anything for existing programs. this will not fix any codes in OpenBSD that currently ship with it.
either you mean intercepting malloc/realloc calls, in which case that can't happen as the function signature is different, or you mean literally modifying the source code of existing programs, and releasing that.
which, let's face it, in the BSD space (and particularly with regard to memory allocations from user input), is highly unlikely to change any time soon.
> or you mean literally modifying the source code of existing programs, and releasing that.
>
> which, let's face it, in the BSD space (and particularly with regard to memory allocations from user input), is highly unlikely to change any time soon.
I'm not sure how to parse your sentence. Are you saying OpenBSD (which should be known for its mass audits) isn't likely to make changes in its source code any soon?
The multiplication factor is often some small constant, like 2 in the case of an array grown by doubling. It's worth making reallocarray an inline function to handle a few constant values:
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* realloc */

inline void *reallocarray(void *old, size_t x, size_t y)
{
    switch (y) {
    case 1:
        return realloc(old, x);
    case 2:
        // drastically simplified overflow check
        if (x > SIZE_MAX / 2)
            return NULL;
        return realloc(old, 2 * x);
    // ... more small constant cases ...
    default:
        // perhaps a switch (x) with mirror cases here
        return realloc_array_general(old, x, y);
    }
}
The 1 case occurs when programmers are following some coding convention whereby malloc calls are replaced with this "single allocator function to bind them all": char *buf = reallocarray(NULL, size, 1);
When the function is inlined, the non-matching cases become dead code, since the switch expression is constant.
I just looked at some of my recent code where I'm using realloc. It's obvious that this approach is insufficient, because it assumes there is only one multiplication. I have situations where an array is being resized. The size is expressed in terms of elements, rather than bytes. So two multiplications are going on: the size is increased first, and then in the realloc call, the argument expression has a multiplication by the element size also.
An overflow-checking realloc won't help if the first multiplication overflows.
Okay, now we're talking. Resize my vector old, which contains existing_size elements, by a factor of resize_factor. The individual elements have size elsize. If all goes well (non-NULL is returned), give me the new size (in elements, not bytes) in the location *p_newsize_out.
The client code computes the new size, while retaining the old. If these are supposed to be related by an integer factor, then this is passed as resize_factor. If they are not related that way, then resize_factor is passed in as zero. The elsize parameter must be nonzero, in any case.
If a nonzero resize_factor is supplied, then the calculation new_size = existing_size * resize_factor is checked for overflow, otherwise the zero value of resize_factor indicates that this check is not applicable; however, the function still at least validates that new_size > existing_size.
The calculation bytes = new_size * elsize is always checked for overflow.
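A sketch of that interface; every name here is illustrative, not an existing API:

#include <stdint.h>
#include <stdlib.h>

void *vec_resize(void *old, size_t existing_size, size_t new_size,
                 size_t resize_factor, size_t elsize, size_t *p_newsize_out)
{
    if (elsize == 0)
        return NULL;                             /* elsize must be nonzero */
    if (resize_factor != 0) {
        /* verify new_size == existing_size * resize_factor, sans overflow */
        if (existing_size != 0 && resize_factor > SIZE_MAX / existing_size)
            return NULL;
        if (new_size != existing_size * resize_factor)
            return NULL;
    } else if (new_size <= existing_size) {
        return NULL;                             /* must at least grow */
    }
    if (new_size > SIZE_MAX / elsize)            /* bytes would overflow */
        return NULL;
    void *p = realloc(old, new_size * elsize);
    if (p != NULL)
        *p_newsize_out = new_size;               /* new size, in elements */
    return p;
}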
Sure, it's a good thing, but for me it's also a good example of why I changed from C/C++ to a scripting language.
In application development, you seldom have the need for maximum speed in all your coding. And always worrying about such things as what is addressed by this small enhancement is a pain.
In most scripting languages, such details just fall away.
I still program in C/C++ -- but just for very speed-intensive "hot-spot" functions, in the form of Python extension modules. And in those, I have the time and patience to look for all these implementation details. All the rest, I do in Python.
> In application development, you seldom have the need for maximum speed in all your coding. And always worrying about such things as what is addressed by this small enhancement is a pain.
That's what my comment was addressing: In C++ with vector, map, etc, you don't need to remember to safely realloc manually, because the containers do it for you automatically.
Might be, but C++ still has enough problems lingering that using Python is worthwhile. Many workarounds have been forged in C++, but plenty of problems remain, and to use C++ right, you really must be a master of a language that is three times as difficult to learn as most other programming languages.
Under the current standards, isn't C a proper subset of C++, and therefore, isn't "C/C++" just a way of saying "C++" that emphasizes that that includes C?
I had thought I remembered reading somewhere that C++11 had made C++ a superset of C (I haven't used C++ in anger in quite a long time), but from further reading, while it has aligned more closely with C, it hasn't actually subsumed it.
Modern C and modern C++ are closer to siblings now, with the parent being K&R C. Each has added features that the other has not embraced, and writing code in one like you would in the other is a recipe for pain and suffering.
I don't get the fuss. This is like the 3rd or 4th submission about this function. And all it does is test that
> ((SIZE_MAX / nmemb) >= size)
before doing the size_t multiplication. That's not rocket science. Anyone who's worried about his size_ts overflowing should be able to come up with this.
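Indeed -- modulo error-reporting details, the whole function is about this long (a sketch built around the quoted check, not the verbatim OpenBSD source):

#include <stdint.h>
#include <stdlib.h>

void *reallocarray(void *optr, size_t nmemb, size_t size)
{
    if (nmemb != 0 && SIZE_MAX / nmemb < size)
        return NULL;                 /* nmemb * size would wrap */
    return realloc(optr, nmemb * size);
}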
I don't understand why more people don't define local helper functions. I've worked on FOSS codebases with hundreds of invocations of strncpy, each carefully followed with an additional NUL-termination check to work around strncpy's well-known problem. Why wouldn't you just write a little wrapper?
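For instance, something this small would do (a sketch; the name is made up): copy at most size-1 bytes and always NUL-terminate:

#include <string.h>

static void copy_str(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return;
    strncpy(dst, src, size - 1);    /* may leave dst unterminated... */
    dst[size - 1] = '\0';           /* ...so terminate unconditionally */
}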
In my opinion the C library was one of C's best innovations. But there are also some dark spots. strncpy is one of them -- and one of the places where the C library is gravely outdated today.
Programmers just use the wrong tool for the job - arguably because strncpy is poorly named. strncpy is intended for filling zero-padded fixed-size strings, not copying normal single-zero-terminated strings.
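For the record, strncpy's intended use looks like this -- filling a fixed-size, zero-padded field (think the name fields in struct utmp), where the destination is NUL-padded but not guaranteed to be NUL-terminated:

#include <string.h>

struct record {
    char name[8];   /* fixed-size, zero-padded, possibly unterminated */
};

void set_name(struct record *r, const char *name)
{
    strncpy(r->name, name, sizeof r->name);  /* pads with NULs; no terminator
                                                if name fills the field */
}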
I think the main benefit of adding functions like reallocarray is that they make the right tool for the job available by default. Plain realloc for array resizing is a timebomb in the same way as strncpy for normal strings - it will work (by accident) until it doesn't, and when it fails, it's likely to be an exploitable security hole.
I have only had to use C for a small amount of work over the years, so I have a naive question. This seems like something very essential; why did it take more than 30 years to add this function?
For a long time the prevailing attitude was "don't make mistakes." Then after it became apparent some problems were both common and serious, people argued (and continue to do so) that if a function doesn't solve every problem, it's not worth adding. Whataboutism.
Here's just a completely random example from the Linux kernel: https://lkml.org/lkml/2014/8/19/194