

57 Small Programs that Crash Compilers - hedgehog
http://blog.regehr.org/archives/696

======
cperciva
Small program #58: <http://llvm.org/bugs/show_bug.cgi?id=10604>

Unlike the others, this one was actually found in production.

------
AngryParsley
I wonder what he's doing that csmith[1] doesn't do. I quickly skimmed the
paper[2], but nothing jumped out at me. In fact it looks like some of the
reducers are implemented as plugins on top of csmith. I guess I'll have to
read the whole thing later.

1\. <http://embed.cs.utah.edu/csmith/>

2\. <http://www.cs.utah.edu/~regehr/papers/pldi12-preprint.pdf>

~~~
neilc
_I wonder what he's doing that csmith doesn't do._

The contribution is techniques for reducing the size of test cases, including
test cases that might be generated by csmith (you realize the author of the
blog post is one of the csmith authors, right?). From the PLDI'12 paper:

 _Using randomized differential testing, Csmith automates the construction of
programs that trigger compiler bugs. These programs are large out of
necessity: we found that bug-finding was most effective when random programs’
average size was 81 KB. In this paper, we use 98 bug-inducing programs created
by Csmith as the inputs to several automated program reducers..._
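
To make "reducing a test case" concrete, here is a minimal sketch of a greedy, line-based reducer in the spirit of delta debugging. It is not the algorithm from the paper, just an illustration of the idea. The names `reduce_lines` and `still_fails` are hypothetical; in real use `still_fails` would run the compiler on the candidate and report whether it still crashes.

```python
from typing import Callable, List


def reduce_lines(lines: List[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines while the failure keeps reproducing.

    `still_fails` is a hypothetical predicate supplied by the caller: it
    returns True when the candidate program still triggers the bug.
    """
    if not still_fails(lines):
        raise ValueError("the starting test case must trigger the bug")

    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            # Try the program with lines[i : i + chunk] removed.
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate   # smaller and still failing: keep it
            else:
                i += chunk          # this chunk is needed: move past it
        chunk //= 2                 # retry with finer-grained removals
    return lines
```

With a stand-in predicate such as `lambda ls: "TRIGGER_1" in ls and "TRIGGER_2" in ls`, a six-line input containing both markers shrinks to just those two lines. A real reducer for C also has to avoid producing candidates with undefined behavior, which a purely textual sketch like this one does not attempt.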

------
ecesena
Wow... amazing! ;) Only one crash for ICC. Is it that much better, or just less
analyzed?

~~~
tedunangst
My experience working with compiler-like products is that susceptibility to
crashes is related to choice of data structure for AST/IR. If you allow
"weird" stuff in your AST, some component won't handle it well and crash. If
your AST is strongly typed (but less flexible) this is less of a problem.

As a concrete example, EDG (used by ICC) has a strongly typed AST. GCC's AST
consists of a single type (tree), which allows you to build absurd trees. You
could represent the equivalent of

    int x = goto struct { while (1); }

because the data type doesn't prohibit a goto target that happens to be a
struct. This will probably explode when it gets to some later compiler stage.
If you have distinct goto_node and label_node types, the compiler is less
likely to accidentally create such monstrosities. You don't usually get such
trees directly from the parser, but from some middle transformation pass.
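
A rough sketch of the difference, in Python rather than in either compiler's actual data structures. All names here (`Tree`, `GotoNode`, `LabelNode`, `StructNode`, `lower_goto`) are made up for illustration. Python does not enforce annotations at runtime, so the typed variant checks its field at construction time; in a statically typed language the compiler would reject the bad node before the program ever ran.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """Single catch-all node type: any kind may hold any operands."""
    kind: str
    name: str = ""
    operands: List["Tree"] = field(default_factory=list)


def lower_goto(node: Tree) -> str:
    """A later pass that assumes a goto's first operand is a label."""
    target = node.operands[0]
    if target.kind != "label":
        # The malformed tree is only noticed here, far from where it was built.
        raise RuntimeError(f"internal error: goto targets a {target.kind}")
    return f"jmp {target.name}"


@dataclass(frozen=True)
class LabelNode:
    name: str


@dataclass(frozen=True)
class StructNode:
    name: str


@dataclass(frozen=True)
class GotoNode:
    """Distinct node type: the target must be a label."""
    target: LabelNode

    def __post_init__(self) -> None:
        # Runtime stand-in for what a static type checker would enforce.
        if not isinstance(self.target, LabelNode):
            raise TypeError(
                f"goto target must be a LabelNode, got {type(self.target).__name__}"
            )
```

Here `Tree("goto", operands=[Tree("struct", "s")])` builds without complaint and only fails once `lower_goto` sees it, while `GotoNode(StructNode("s"))` is rejected at the point where a transformation pass tries to create it.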

~~~
ecesena
I quote: the stronger the theory behind it, the stronger the software.

