

Finding Compiler Bugs by Removing Dead Code - mehrdada
http://blog.regehr.org/archives/1161

======
petercooper
I knew this reminded me of something, and it turns out it was one of his older
posts which is also worth a read:
[http://blog.regehr.org/archives/970](http://blog.regehr.org/archives/970)
(Finding Undefined Behavior Bugs by Finding Dead Code)

~~~
pasbesoin
Just mentioning since I looked for myself.

The PDF it links is 404, but the Wayback Machine caught it in a few different
revisions (the date of Regehr's post would correspond more closely with the
earlier revision):

[http://web.archive.org/web/20130810232543/http://pdos.csail....](http://web.archive.org/web/20130810232543/http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf)

[http://web.archive.org/web/20131214080638/http://pdos.csail....](http://web.archive.org/web/20131214080638/http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf)

------
pedrocr
It's an interesting method. Does anyone know if these kinds of torture tests
get collected into a common library to help any future compilers or if they
just result in a paper and bug reports to current ones?

~~~
pascal_cuoq
The generated tests themselves are hardly worth the bytes they occupy. The
value is in the compact program that can generate as many of them as desired
very quickly, with varying options to focus on varying aspects of compilation
(with or without bit-fields, with or without deep pointer nesting, …)
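A toy sketch of the workflow pascal_cuoq describes (this is not Csmith itself; the generator, the second evaluator, and the `allow_mul` option are all made up for illustration): a tiny generator emits unlimited fresh test inputs, with an option to focus on a language feature, and two independent implementations are compared on each one.

```python
# Illustrative differential-testing sketch (not Csmith): the value is
# in the generator, which can produce as many fresh tests as desired.
import random

def gen_expr(depth, allow_mul=True):
    """Randomly generate a fully parenthesized arithmetic expression.
    allow_mul mimics a generator option ("with or without bit-fields")."""
    if depth == 0:
        return str(random.randint(0, 9))
    ops = ['+', '-'] + (['*'] if allow_mul else [])
    op = random.choice(ops)
    return f"({gen_expr(depth - 1, allow_mul)} {op} {gen_expr(depth - 1, allow_mul)})"

def interpret(expr):
    """An independent evaluator: recursive descent over the generated
    grammar, instead of delegating to Python's own eval()."""
    expr = expr.strip()
    if expr.isdigit():
        return int(expr)
    inner, depth = expr[1:-1], 0   # strip outer parens, find top-level op
    for i, ch in enumerate(inner):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif depth == 0 and ch in '+-*' and i > 0:
            lhs, rhs = interpret(inner[:i]), interpret(inner[i + 1:])
            return lhs + rhs if ch == '+' else lhs - rhs if ch == '-' else lhs * rhs

def differential_test(n=1000):
    """Compare the two evaluators on freshly generated inputs;
    any mismatch is a bug in one of them."""
    for _ in range(n):
        e = gen_expr(depth=3, allow_mul=random.random() < 0.5)
        assert interpret(e) == eval(e), e

differential_test()
```

Saving the generated expressions would be pointless; rerunning the generator tomorrow produces a new batch for free, which is the point being made about Csmith.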

We forced Xuejun Yang (who turned Randprog, the prototype that preceded
Csmith, into Csmith) to fix more bugs than were necessary for his PhD, and
more than you can expect to see fixed in a research prototype (I am one of
the developers of Frama-C; pitting Frama-C “against” Csmith was my hobby for
a summer, and we found and reported as many bugs in Csmith as we found in
Frama-C). The sentence “over the last couple of years I’ve slacked off on
reporting compiler bugs” near the post's conclusion is telling. You can expect
the same story to unfold for EMI. Researchers are not rewarded for maintaining
software ad vitam æternam, even if the software is useful, but hopefully they
have released or will soon release the generator as open source, and then, if
you find it too useful to abandon, you can fix or work around bugs as you find
them.

As an example, I think that there is still one bug in Csmith that we work
around by discarding programs that show the symptoms that usually indicate it
(we still use Csmith to test Frama-C after we have finished a major feature
that could introduce the sort of bug it can detect).

See also
[http://blog.regehr.org/archives/1058](http://blog.regehr.org/archives/1058)
on the same blog for additional thoughts on the fate of academic software.

~~~
Jabbles
I disagree that the tests are not obviously worth keeping. Having a large
suite of regression tests is vital to stop bugs reappearing.

This type of research spawns other research and projects outside of academia
by acting as a proof of concept, even if the original researchers stop
reporting bugs.

For instance Csmith (and others) inspired Gosmith, which has found a number of
bugs in the Go compiler. I hope that someone will use the obviously successful
strategy of EMI to improve it further.

[https://code.google.com/p/gosmith/](https://code.google.com/p/gosmith/)
[https://code.google.com/p/go/issues/list?q=label:GoSmith](https://code.google.com/p/go/issues/list?q=label:GoSmith)

~~~
pascal_cuoq
> I disagree that the tests are not obviously worth keeping. Having a large
> suite of regression tests is vital to stop bugs reappearing.

What I left implicit is that a non-reduced Csmith test makes a terrible
regression test. It may for instance spend several seconds incrementing a
counter from 0 to 4000000000 before switching to the entirely unrelated
computation that once triggered a bug. The value of these randomly generated
tests is in generating a new one the next time, not in saving them to run
again and again. Running the same randomly generated tests again and again
would find very few bugs and would be a criminal waste of electricity.

The _reduced_ program is worth keeping as a regression test, because it is
typically a few lines long and these few lines contain a construct that once
tripped the compiler. Sometimes the reduced version can be rewritten by a
human to be even more concise and readable than the output of C-reduce. But as
I said in another comment, one compiler's regression test does not obviously
make a good test for another compiler.

~~~
Jabbles
Sounds like we agree then :) If we assume that no (Csmith) bug will be fixed
without a reduced test-case, then the choice of which one to use as a
regression test is obvious.

------
michaelfeathers
Has metamorphic testing been exploited for property-based testing?

~~~
davorak
Looks like there is significant overlap between metamorphic testing and
property-based testing. Can anyone come up with examples that are clearly one
but not the other?

[http://en.wikipedia.org/wiki/Metamorphic_testing](http://en.wikipedia.org/wiki/Metamorphic_testing)

~~~
regehr
Metamorphic testing means taking an existing test case and mutating it into a
new test case that produces the same answer (or at least an answer that can be
easily predicted).
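
In miniature, using a textbook example rather than EMI (the `metamorphic_check` driver and the two sine identities are chosen here purely for illustration): starting from an existing test input, derive mutated inputs whose correct answer is known relative to the original, so no external oracle is needed.

```python
# Metamorphic testing sketch: mutate an existing test case into new
# ones whose expected answer is known from the original answer.
import math
import random

def metamorphic_check(f, x):
    """Check f at inputs derived from x via relations that must
    preserve the answer, without knowing the true value of f(x)."""
    y = f(x)
    # Relation 1: sin(pi - x) == sin(x)
    assert math.isclose(f(math.pi - x), y, abs_tol=1e-12)
    # Relation 2: sin(x + 2*pi) == sin(x)
    assert math.isclose(f(x + 2 * math.pi), y, abs_tol=1e-12)

for _ in range(1000):
    metamorphic_check(math.sin, random.uniform(-10, 10))
```

EMI applies the same idea to compilers: deleting code the profiler showed to be dead is a mutation that must not change the program's output, so any change in behavior is a compiler bug.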

Property-based testing is, as far as I can tell, a meaningless term since all
testing is property-based.

~~~
michaelfeathers
This is what I was referring to:
[http://en.wikipedia.org/wiki/QuickCheck](http://en.wikipedia.org/wiki/QuickCheck)

[http://blog.jessitron.com/2013/04/property-based-testing-wha...](http://blog.jessitron.com/2013/04/property-based-testing-what-is-it.html)
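
The QuickCheck idea reduces to a small loop (a Python sketch rather than the Haskell original; `quickcheck`, `random_list`, and `prop_sorted` are made-up names for illustration): state a property that should hold for all inputs, throw random inputs at it, and report any counterexample.

```python
# Minimal QuickCheck-style property-based testing driver.
import random
from collections import Counter

def quickcheck(prop, gen, trials=500):
    """Feed random inputs from gen() to the property; return the
    first counterexample found, or None if the property held."""
    for _ in range(trials):
        x = gen()
        if not prop(x):
            return x          # counterexample found
    return None               # property held on all trials

def random_list():
    return [random.randint(-50, 50) for _ in range(random.randint(0, 20))]

def prop_sorted(xs):
    """Properties of sorted(): output is ordered, is a permutation
    of the input, and sorting twice changes nothing."""
    ys = sorted(xs)
    return (all(a <= b for a, b in zip(ys, ys[1:]))
            and Counter(ys) == Counter(xs)
            and sorted(ys) == ys)

assert quickcheck(prop_sorted, random_list) is None
```

Seen this way, metamorphic relations are one particular family of properties, which is why the two techniques overlap so much; real QuickCheck also shrinks counterexamples, much as C-reduce shrinks Csmith programs.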

