Memcpy (and friends) with NULL pointers (imperialviolet.org)
44 points by runesoerensen on June 29, 2016 | 6 comments



Here's another example of this fine feature:

  #include <stdio.h>
  #include <string.h>
  #include <stdlib.h>
  #define LENGTH 128

  int main(int argc, char **argv) {
      char *string = NULL;
      int length = 0;
      if (argc > 1) {
          string = argv[1];
          length = strlen(string);
          if (length >= LENGTH) exit(1);
      }

      char buffer[LENGTH];
      memcpy(buffer, string, length);
      buffer[length] = 0;

      if (string == NULL) {
          printf("String is null, so cancel the launch.\n");
      } else {
          printf("String is not null, so launch the missiles!\n");
      }

      printf("string: %s\n", string);  // undefined for null but works in practice

      #if SEGFAULT_ON_NULL
      printf("%s\n", string);          // segfaults on null when bare "%s\n"
      #endif

      return 0;
  }

  nate@skylake:~/src$ clang-3.8 -Wall -O3 null_check.c -o null_check
  nate@skylake:~/src$ null_check
  String is null, so cancel the launch.
  string: (null)

  nate@skylake:~/src$ icc-17 -Wall -O3 null_check.c -o null_check
  nate@skylake:~/src$ null_check
  String is null, so cancel the launch.
  string: (null)

  nate@skylake:~/src$ gcc-5 -Wall -O3 null_check.c -o null_check
  nate@skylake:~/src$ null_check
  String is not null, so launch the missiles!
  string: (null)
It appears that Intel's ICC and Clang still haven't caught up with GCC's optimizations. Ouch if you were depending on that optimization to get the performance you need!

But before picking on GCC too much, consider that all three of those compilers segfault on printf("string: "); printf("%s\n", string) when string is NULL, despite having no problem with printf("string: %s\n", string) as a single statement. Can you see why using two separate statements would cause a segfault? If not, see here for a hint: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609
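For what it's worth, one way to keep the program above well-defined is to never hand memcpy a NULL pointer at all, even when the length is zero. A minimal sketch, assuming a helper I'm calling copy_or_empty (a made-up name, not something from the article or from libc):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (name invented for illustration).
 * Copies `len` bytes from `src` into `dst` and NUL-terminates the result.
 * memcpy is only called when there is something to copy, so it never sees
 * a NULL source and the compiler gets no "src != NULL" hint from the call.
 * Returns 0 on success, -1 if the arguments cannot be satisfied. */
int copy_or_empty(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (dst == NULL || dst_size == 0 || len >= dst_size)
        return -1;              /* no room for the data plus the terminator */
    if (src == NULL && len > 0)
        return -1;              /* nothing valid to read from */
    if (len > 0)
        memcpy(dst, src, len);  /* src is known to be non-NULL here */
    dst[len] = '\0';
    return 0;
}
```

With this in place of the bare memcpy, a later "string == NULL" test still means what it says, because nothing before it gave the optimizer grounds to assume otherwise.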


Very interesting. In the same vein, I once stumbled upon the following optimizer behaviour:

  int array[8];

  // later
  array[i]

The optimizer deduces from the access "array[i]" that "i >= 0 && i < 8", leading it to optimize away conditions like "i > 10".

Of course, if optimization leads to semantic differences, this means the program being compiled was wrong/ambiguous in the first place. However, now, to debug your programs, you must know how optimizers work.
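To make the ordering issue concrete, here is a small sketch (function names are made up for illustration, and whether a given compiler actually drops the check depends on version and flags). In the first function the access happens before the range check, so the optimizer may treat the check as dead code. In the second the check comes first, so the access only ever happens for valid indices.

```c
#define N 8

static const int array[N] = {10, 11, 12, 13, 14, 15, 16, 17};

/* Risky ordering: array[i] is evaluated before the check. For i outside
 * [0, N) the access is undefined behaviour, so the compiler is allowed to
 * assume it never happens and remove the range check that follows. */
int lookup_check_after(int i)
{
    int value = array[i];
    if (i < 0 || i >= N)
        return -1;
    return value;
}

/* Safe ordering: the check comes first, so the access is only reached
 * with an index inside [0, N) and the check cannot be optimized away. */
int lookup_check_first(int i)
{
    if (i < 0 || i >= N)
        return -1;
    return array[i];
}
```

Both functions agree for valid indices. Only lookup_check_first has defined behaviour for something like i == 10.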


A couple of years back, I worked at a company where I helped maintain a Win32 application written in C, using (Open)Watcom as the compiler.

A few years before I started working there, the other programmers had been bitten by a bug in Watcom's optimizer that sometimes would wrongly optimize away comparisons involving floating point numbers.

Consequently, all code was compiled with optimizations disabled completely. (I tried to debate, but then I compiled the project with most optimizations enabled and did a benchmark - the difference in performance was minimal, so I stopped bothering).


Can't the compiler warn about this?


It's usually argued that it would be too hard for the compiler to avoid false positives from templates and macro expansion. I don't like this argument, since distinguishing between "generated" code and "explicit" code isn't that hard. Also, the warning mechanism doesn't need to be perfect to be beneficial: it's generally better to catch some of the security flaws than to catch none. The one tool I've found that does catch many of these errors is "stack": https://github.com/xiw/stack

It operates by identifying "unstable" code. Essentially, it uses Clang to compile the code twice, once with optimizations on and once with optimizations off. Then it checks to see if any "basic blocks" have been removed. Its main problem is that it's difficult to set up. You have to locally compile a specific older version of LLVM with a particular set of compile flags.

But once you have it up, it catches a lot of non-obvious errors and doesn't have many false positives. Unfortunately, in the sample code I gave above, "stack" doesn't detect any errors, because Clang doesn't optimize out the "if (string == NULL)" check like GCC does. And it doesn't catch the switch from printf() to puts() because it's a change of which function is called rather than a change in control flow.
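On the printf() to puts() switch: GCC and Clang commonly rewrite printf("%s\n", s) into puts(s), and puts has no "(null)" fallback. Passing NULL for %s is undefined either way, so the portable fix is to substitute a placeholder yourself before printing. A rough sketch, where printable and print_line are hypothetical helper names:

```c
#include <stdio.h>

/* Hypothetical helper: returns s itself, or a visible placeholder when s
 * is NULL, so the caller never passes NULL to a %s conversion. */
const char *printable(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Prints one line. Even if the compiler turns this printf into puts,
 * the argument is guaranteed to be a valid string. */
void print_line(const char *s)
{
    printf("%s\n", printable(s));
}
```

This doesn't depend on what any particular libc does with a NULL argument, which is the part that varied in the example above.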


If the compiler is optimising code away, it's optimising away its copy of code that the diligent programmer has already written, i.e. the implied assertion that the check is not performed at all is not true. The check is simply not performed twice.

Of course programmers are not always diligent but perhaps in that case a language which unconditionally demands diligence is not the right choice.

> It also adds a very subtle, exceptional case to several very common functions, burdening programmers.

Dealing with the burden of subtle and/or exceptional cases is the price of using a low-level language. If the price is unacceptable, don't buy the product.



