#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define LENGTH 128

int main(int argc, char **argv) {
    char *string = NULL;
    int length = 0;
    if (argc > 1) {
        string = argv[1];
        length = strlen(string);
        if (length >= LENGTH) exit(1);
    }

    char buffer[LENGTH];
    memcpy(buffer, string, length);
    buffer[length] = 0;

    if (string == NULL) {
        printf("String is null, so cancel the launch.\n");
    } else {
        printf("String is not null, so launch the missiles!\n");
    }

    printf("string: %s\n", string);  // undefined for null but works in practice

#if SEGFAULT_ON_NULL
    printf("%s\n", string);  // segfaults on null when bare "%s\n"
#endif

    return 0;
}
nate@skylake:~/src$ clang-3.8 -Wall -O3 null_check.c -o null_check
nate@skylake:~/src$ null_check
String is null, so cancel the launch.
string: (null)
nate@skylake:~/src$ icc-17 -Wall -O3 null_check.c -o null_check
nate@skylake:~/src$ null_check
String is null, so cancel the launch.
string: (null)
nate@skylake:~/src$ gcc-5 -Wall -O3 null_check.c -o null_check
nate@skylake:~/src$ null_check
String is not null, so launch the missiles!
string: (null)
It appears that Intel's ICC and Clang still haven't caught up with GCC's optimizations. Ouch if you were depending on that optimization to get the performance you need!
But before picking on GCC too much, consider that all three of those compilers segfault on printf("string: "); printf("%s\n", string) when string is NULL, despite having no problem with printf("string: %s\n", string) as a single statement. Can you see why using two separate statements would cause a segfault? If not, see here for a hint: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609
Very interesting.
In the same vein, I once stumbled upon the following optimizer behaviour:
'''
int array[8];
// later
array[i]
'''
The optimizer deduces from the access "array[i]" that "i >= 0 && i < 8", leading it to optimize away conditions like "i > 10".
Of course, if optimization leads to semantic differences, the program being compiled was wrong or ambiguous in the first place. However, it does mean that to debug your programs, you now have to know how optimizers work.
A couple of years back, I worked at a company where I helped maintain a Win32 application written in C, using (Open)Watcom as the compiler.
A few years before I started working there, the other programmers had been bitten by a bug in Watcom's optimizer that sometimes would wrongly optimize away comparisons involving floating point numbers.
Consequently, all code was compiled with optimizations disabled completely. (I tried to debate, but then I compiled the project with most optimizations enabled and did a benchmark - the difference in performance was minimal, so I stopped bothering).
It's usually argued that it would be too hard for the compiler to avoid false positives from templates and macro expansion. I don't like this argument, since distinguishing between "generated" code and "explicit" code isn't that hard. Also, the warning mechanism doesn't need to be perfect to be beneficial: it's generally better to catch some of the security flaws than to catch none. The one tool I've found that does catch many of these errors is "stack": https://github.com/xiw/stack
It operates by identifying "unstable" code. Essentially, it uses Clang to compile the code twice, once with optimizations on and once with optimizations off, then checks whether any "basic blocks" have been removed. Its main problem is that it's difficult to set up: you have to locally compile a specific older version of LLVM with a particular set of compile flags.
But once you have it up, it catches a lot of non-obvious errors and doesn't have many false positives. Unfortunately, in the sample code I gave above, "stack" doesn't detect any errors, because Clang doesn't optimize out the "if (string)" statement like GCC does. And it doesn't catch the switch from printf() to puts() because that's a change of which function is called rather than a change in control flow.
If the compiler is optimising code away, it's optimising away its copy of a check that the diligent programmer has already written. I.e. the implied assertion that the check is not performed at all is untrue; the check is simply not performed twice.
Of course programmers are not always diligent but perhaps in that case a language which unconditionally demands diligence is not the right choice.
> It also adds a very subtle, exceptional case to several very common functions, burdening programmers.
Dealing with the burden of subtle and/or exceptional cases is the price of using a low-level language. If the price is unacceptable, don't buy the product.