
The Following Code Causes Segfault in Clang - DaNmarner
http://llvm.org/bugs/show_bug.cgi?id=20516
======
lindig
If you are looking for code to break a C compiler, you can try my tool Quest
[https://github.com/lindig/quest](https://github.com/lindig/quest). It tries
to generate code that shows that a C compiler handles parameter passing
wrong. I usually run it in a loop, like here on Mac OS X 10.9.4 with gcc:

    
    
        :quest $ gcc --version
         Configured with: --prefix=/Library/Developer/CommandLineTools/usr  --with-gxx-include-dir=/usr/include/c++/4.2.1
         Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
         Target: x86_64-apple-darwin13.3.0
         Thread model: posix
    
        :quest $ while true; do 
         > ./main.native -test gcc -n 1 > foo.c
         > gcc -O2 -o foo foo.c 
         > ./foo || break
         > echo -n .
         > done
         ................................................................
         ................................................................
         .................................................
         Assertion failed: (b32 == b43), function callee_b0f, file foo.c, line 128.
         Abort trap: 6
    

This means the tool found C code where parameter passing is not compiled
properly. It took about 10 seconds to find this. The test case is pretty
small:

    
    
        :quest $ wc foo.c 
    	 140     444    3485 foo.c
    

The generated code where the assertions check that parameters are received
correctly looks like this:

    
    
        static
        union bt8 *
        callee_b0f(struct bt4 *bp7,
    	double *bp8,
    	struct bt6 bp9,
    	float bp10,
    	struct bt7 bp11,
    	double bp12,
    	short int bp13,
    	...)
        {
    	va_list ap;
    	typedef int bd0;
    	typedef struct bt0 bd1;
    	typedef int bd2;
    	typedef union bt3 bd3;
    	bd0 b41;
    	bd1 b42;
    	bd2 b43;
    	bd3 b44;
    	
    	/* seed: 2040 */
    	va_start(ap, bp13);
    	QUEST_ASSERT(b34 == bp7);
    	QUEST_ASSERT(b35 == bp8);
    	QUEST_ASSERT(b36.b24.b18 == bp9.b24.b18);
    	QUEST_ASSERT(b36.b24.b19 == bp9.b24.b19);
    	QUEST_ASSERT(b36.b24.b20 == bp9.b24.b20);
    	QUEST_ASSERT(b36.b24.b21 == bp9.b24.b21);
    	QUEST_ASSERT(b36.b24.b22 == bp9.b24.b22);
    	QUEST_ASSERT(b36.b24.b23 == bp9.b24.b23);
    	QUEST_ASSERT(b36.b25 == bp9.b25);
    	QUEST_ASSERT(b36.b26 == bp9.b26);
    	QUEST_ASSERT(b37 == bp10);
    	QUEST_ASSERT(b38.b27 == bp11.b27);
    	QUEST_ASSERT(b39 == bp12);
    	QUEST_ASSERT(b40 == bp13);
    	b41 = va_arg(ap, bd0);
    	b42 = va_arg(ap, bd1);
    	b43 = va_arg(ap, bd2);
    	b44 = va_arg(ap, bd3);
    	QUEST_ASSERT(b30 == b41);
    	QUEST_ASSERT(b31.b0 == b42.b0);
    	QUEST_ASSERT(b32 == b43);
    	QUEST_ASSERT(b33.b10.b1 == b44.b10.b1);
    	va_end(ap);
    	return b29;
        }

------
danieljh
While we're on the subject of segfaulting compilers, here's what I found just
a few days ago:

    
    
        python -S -c 'print("void f(){} int main(){return (" + "*"*10**7 + "f)();}")' | gcc -xc -
    

(This is legal C -- look it up. Don't argue with me over the practical
relevance of this please)

~~~
deathanatos
I will point out that there is a section called "Translation limits" that
discusses how compilers can't really be expected to compile every legal
program, because they run on a machine with a finite amount of memory.

> Both the translation and execution environments constrain the implementation
> of language translators and libraries. The following summarizes the
> language-related environmental limits on a conforming implementation; the
> library-related limits are discussed in clause 7.

> The implementation shall be able to translate and execute at least one
> program that contains at least one instance of every one of the following
> limits:

> 4095 characters in a logical source line

Of course, it notes:

> Implementations should avoid imposing fixed translation limits whenever
> possible.

Note that these aren't strict limits and don't really have an effect on the
legality of your program. I feel it's more of a discussion of the limits
imposed by reality, and of what compilers must handle at a bare minimum.

And honestly, I would hope most modern compilers would do better than the
noted limits and I'd also hope for a decent error message, not "gcc: internal
compiler error: Segmentation fault (program cc1)" (which is what the program
generates).

Last,

> This is legal C

Is it? You're returning the result of a function that returns void in a
function that returns int (and even if main were void, I still don't think
that's legal). Were gcc able to handle the abusive number of stars, it would
say,

    
    
        <stdin>: In function ‘main’:
        <stdin>:1:23: error: void value not ignored as it ought to be
    

(which is what it says if you remove some of the stars.) Granted, this can be
corrected, and your example will still cause the same output. (Which doesn't
seem nearly as interesting as the linked C++ code. I'd _like_ to know why that
causes a segfault. With yours, I'd like to know why you were doing that.)

~~~
danieljh
You are right in that the return is wrong and accidentally stayed in during
example reduction down to a smaller version. Without it, the result is still
the same.

The reason I was testing this was a discussion on IRC about functions decaying
to pointer to functions, such that they are endlessly dereferenceable. The
snippet above crashes GCC -- hard.

So, while the implementation is free not to handle 10^7 dereferencing
operations, I'm not sure a hard crash is the right answer.

Here's a version without the return and using a lambda to shorten it further:

    
    
        python -S -c 'print("int main(){(" + "*"*10**7 + "+[]{})();}")' | g++ -std=c++11 -xc++ -

------
archgoon
Hmm...

    
    
        Unable to find instantiation of declaration!
        UNREACHABLE executed at SemaTemplateInstantiateDecl.cpp:4384!
    

Not quite so unreachable...

[https://gist.github.com/cwgreene/d689f010619310dbbc77](https://gist.github.com/cwgreene/d689f010619310dbbc77)

[https://github.com/llvm-mirror/clang/blob/b310439121c875937d...](https://github.com/llvm-mirror/clang/blob/b310439121c875937d78cc49cc969bc1197fc025/lib/Sema/SemaTemplateInstantiateDecl.cpp#L4384)

------
udp
Something I found last week that crashes with clang-503.0.40:

    
    
        template<class T> class foo
        {
        public:
    
            ~ foo()
            {
            }
    
            foo &operator = (const foo &rhs)
            {
                foo::~foo();
                new (this) foo (rhs);
    
                return *this;
            }
        };
    
        int main(int argc, char * argv[])
        {
            foo<int> a, b;
            b = a;
        }

~~~
archgoon
This is the same bug.

------
hamburglar
Is there some legitimate reason to want to have A's destructor called twice on
a single instance?

~~~
misnome
Probably not, but the compiler crashing isn't a good way of notifying the user
of that!

~~~
hamburglar
Ah, I didn't realize the segfault was in the compiler itself. The title
("segmentation fault on calling destructor in member function") made it sound
like the generated code crashed. Now that there's a gist of a callstack it's
clearer.

------
andrewchambers
Something tells me C++ isn't the best thing to implement a compiler with.

~~~
golemotron
I modded you up because clang is written in C++, and even if I didn't know
this I'd suspect it, because segfaults in languages that are not weakly typed
(unlike C and C++) are incredibly rare.

There are better languages to write compilers in. OCaml is one.

~~~
nly
C++ probably isn't 'weakly typed', whatever that means.

~~~
yzzxy
You probably aren't "qualified to make that statement", if you don't know what
weak typing is and are too lazy to google it.

~~~
nly
[https://en.wikipedia.org/wiki/Strong_and_weak_typing](https://en.wikipedia.org/wiki/Strong_and_weak_typing)

> In general, these terms do not have a precise definition. Rather, they tend
> to be used by advocates or critics of a given programming language, as a
> means of explaining why a given language is better or worse than
> alternatives.

