It’s not up to the platform. If it looks like a code path might dereference a null pointer, the compiler can and will make wild and bizarre optimizations in the surrounding code.
If code dereferences a NULL pointer, the spec imposes no constraints whatsoever on what happens after that point.
Yes, some compilers will make "wild and bizarre" optimisations (they are neither) based on the assumption that any pointer which is dereferenced is not NULL - because if it were NULL, the code would be allowed to do the exact same thing anyway, since there are no constraints in that case.
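For concreteness, the classic shape of that optimisation (a minimal sketch, not taken from any real codebase):

    int get(int *p) {
        int x = *p;      /* dereference: the compiler may now assume p != NULL */
        if (p == NULL)   /* ...so it is allowed to delete this check as dead code */
            return -1;
        return x;
    }

Both gcc and clang will happily drop that NULL check at normal optimisation levels, because the dereference on the previous line already "proved" that p is non-NULL.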
But compilers don't have to do that. There are no constraints. A compiler can emit anything at all for a NULL pointer dereference, including the machine code you'd naively expect - which may trap on some CPUs, or always return 0 on reads and swallow writes on others. It may also emit code that wipes your hard disk, and still be a conforming implementation. Or make demons fly out of your nose.
A compiler vendor may choose to guarantee a specific behaviour for some invalid constructs. They are allowed to do this because there are no constraints on what they can do. It's just that most don't, because 99% of people comparing compilers care more about benchmarks than they do about what happens with invalid code.
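gcc is a concrete example: -fwrapv makes signed overflow well-defined wrapping, and -fno-delete-null-pointer-checks stops the optimiser from assuming that a dereferenced pointer is non-NULL. Both flags exist precisely for people who want a guarantee more than they want the benchmark numbers.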
So in some ways, it is up to the compiler/platform/CPU what actually happens with code that has invalid behaviour. But as a C author, you can't assume any particular behaviour, because it might change from one CPU to another. Or from one OS to another. Or from one compiler to another. Or from one version of one compiler to another.
This. In the case of AIX, IBM designed every component of the platform: the CPU, the operating system and the compiler. When they tell you that it’s safe to read up to 4K (or whatever) of zero bytes from NULL, they can make that promise. But obviously such code is not portable.
But this means the vast majority of C code isn't portable, since you can't know what it will do on a given platform and compiler combination, and the presence of even a single line of UB invalidates the whole program.
It basically means that the whole portability argument that is supposed to favor C is just wrong: every compiler and platform actually brings its own C dialect, and it is sheer coincidence that it works at all.
Are you seriously suggesting that the vast majority of C code causes signed int overflows or dereferences invalid pointers?
Not sure about the C code bases you've been spending time in, but in the ones I've looked at, the overwhelming majority of the code has been well-defined by either the C standard or the implementation. In the cases where it hasn't, the vast majority of those cases were genuine bugs that needed fixing, and the fix made the code well-defined.
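A typical example of the kind of bug I mean (a sketch, not from any particular codebase):

    #include <limits.h>

    /* Buggy: if len + 100 overflows, that is UB for signed int, so the
       compiler may assume it cannot happen and delete the check. */
    int grow_bad(int len) {
        if (len + 100 < len)
            return -1;
        return len + 100;
    }

    /* The well-defined fix: test against the limit before adding. */
    int grow_good(int len) {
        if (len > INT_MAX - 100)
            return -1;
        return len + 100;
    }

The first version looks like defensive programming, but the check is exactly the kind of thing an optimiser is entitled to remove; the second expresses the same intent in well-defined terms.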
The number of codebases I've seen that actually relied on unspecified behaviour, or on whatever the current compiler/OS/CPU happened to do with undefined behaviour, is minuscule. (Or an entry in an obfuscated/underhanded coding contest.)
Yes. The nominal reality is that effectively all extant C code has undefined behavior, but the practical reality is that most of that code will work portably as long as some reasonable precautions are taken.
The compiler doesn't have to come to your home and actively haunt you. The behaviour is indeed up to the platform (by "platform" I understand "compiler + host system" or whatever is used to run the program).
Ah - I suppose I don’t think of the compiler as part of the platform. Most platforms I work with tend to use gcc or clang, and as a result everyone is subject to the gcc / clang optimizations.
I recall IBM’s AIX used to put a zeroed-out block at address zero, so code like this was guaranteed to always return 0:
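    int always_zero(void) {
        int *p = NULL;   /* reconstructed from memory, not the original snippet */
        return *p;       /* address 0 was mapped to a page of zeros, so this
                            read yielded 0 on AIX instead of trapping */
    }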
The reason was that it allowed their optimizing compiler to speculatively dereference a pointer before executing a surrounding if condition.
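Roughly this kind of rewrite, in other words (a sketch of the idea, not IBM's actual output):

    /* What you wrote: */
    int read_or_zero(int *p) {
        int val = 0;
        if (p != NULL)
            val = *p;
        return val;
    }

    /* What the optimizer could safely produce on AIX: hoist the load
       above the test, since a NULL p harmlessly reads 0 from the
       mapped zero page rather than faulting. */
    int read_or_zero_hoisted(int *p) {
        int tmp = *p;
        return (p != NULL) ? tmp : 0;
    }

On a platform that traps on NULL reads, that hoist would introduce a crash the original program didn't have, which is why the zero page was needed to make the transformation legal.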