Assuming you've found a compiler bug every time switching off the optimizer suppresses a bug is not a good idea. 99.99999% of the time the problem is in your code.
Experienced programmers usually will continue to think the bug is in their own code unless they can prove otherwise.
If the program works at some compiler optimization levels and not others, then think about what the optimizer is doing and how that may change the circumstances under which the bug appears. I agree that it is probably a memory corruption issue and that by turning off optimizations, you are hiding the symptom and not fixing the bug.
I think there should be a law that states: if you use a language like C or C++, you must ensure it compiles cleanly with all warnings turned on AND runs without error under a tool like Valgrind.
There are simply too many places where bugs may creep in to leave it to chance. The tools exist - use them!
I also tend to test a build linked with gcc’s mudflap:
gcc -g -fmudflap -lmudflap
This was xlC though.
Experienced programmers usually will continue to think the bug is in their own code unless they can prove otherwise.
The fact of the matter is that a compiler like gcc is used by thousands (tens of thousands? more?) of people almost daily. Usually you have to be doing some pretty crazy stuff to find a bug in it. Bugs that go away when you turn off optimizations are usually either race-condition or memory-access related.
That said, I have seen exactly one optimizer bug that I know of. Back in 1993, Borland C++ completely omitted one of my inline destructors from the binary. I had to review the assembly to convince myself I wasn't imagining things.
I don't think the percentage is 99.99999% though. Compiler bugs do exist. I have seen several (more than four) in 12 years of C programming.
(Note the 'day-to-day'. Day-to-day programming and 'all the programming you do in x*10 years of C programming' are very different sets of code.)
I once wrote some MPI code for a class. It ran properly at -O0, though the compiler warned of a variable that was declared but never used. Compiling at any level other than -O0, or removing that variable declaration from the source, caused the program to segfault immediately. It turned out to be a memory error somewhere else in the program (I forget exactly what, but it stemmed from my misunderstanding of some of the message-passing calls).
I don't have the experience of ESR, but I find the advice a bit dangerous if taken as general advice. In particular, the idea that a heisenbug is often caused by a compiler bug: most likely, the heisenbug is not a heisenbug at all, but just less visible depending on the compiler flags. That was the case for the vast majority of "heisenbugs" I have encountered in C.
And, for the naysayers, Hercules is built with -W -Wall.
If you are working with your own code and care if it works:
1) Turn on all compiler warnings
2) Change your code so it compiles clean
3) Run under Valgrind (or equivalent).
4) Address all reported errors, specifically whitelisting them if necessary.
5) If you find a bug, don't stop until you've found the cause. You're done when you understand what caused the bug to appear, not when the symptoms go away.
6) Use open source tools, since otherwise you'll be tempted to blame some unspecified 'bug in the compiler'. (Not that ESR would be using any other.)
7) If it is a compiler bug, report it, along with the smallest test case you can generate.
Since C makes it easy to overrun memory, it's pretty easy to make horrible mistakes and have them produce seemingly random consequences.
The fact that the bug changes when you change optimizer settings, add trace statements, or add debug code would make me suspect memory corruption first.
In fact, I think it's a good assumption to always begin by suspecting your own code.
There's another example whose details I can't remember, but it was related to the fact that gcc adds code to zero-initialize your stack on first access unless optimizations are turned on. Code that checked for null pointers worked fine until optimizations were turned on, at which point it was discovered that a variable was being used uninitialized.
It was actually this class that motivated the whole compiler bug-finding project. The quality of the average embedded compiler is appalling; students trip over codegen bugs all the time.
Of course as many people are pointing out in this thread, most of the time the compiler is not to blame when changing optimization options changes program behavior.
But it's much more than the optimizer. Even code generation at -O0 can be broken by assumptions about alignment, insn size, etc. This usually happens when you're using a very new or very old part and the gcc developers make assumptions based on their limited dev board setups.
All appreciation should be paid to those gcc developers as it is a very difficult job they do for free. Thanks!
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35653, for example.
I was young.
The terminally curious may download a file containing the assembler output, and the C source, of the offending file from http://www.hercules-390.org/esamebug.zip . This corresponds to revision 5627 of the Hercules emulator as found in the Subversion repository at svn://svn.hercules-390.org/hercules/trunk . The emulator itself is at http://www.hercules-390.org .
The routine is in the generated assembler as z900_load_multiple_long.
Also, have you run valgrind against the test?
I've never run valgrind... it'll be interesting to see just what it does to Hercules execution speed. mudflap, too. Getting that built into the code might get even more interesting.
Always treat __asm with caution!