> They discover bugs by testing random programs against multiple compilers. If the result from any of the compilers disagree, then there must be a bug.
How does this make sense?
If the result differs from the specification, it is a bug.
If the result is unspecified in the specification, the different compilers can differ as much as they want without any of them being considered buggy.
From the linked paper: "Although this compiler-testing approach has been used before [6, 16, 23], Csmith’s test-generation techniques substantially advance the state of the art by generating random programs that are expressive—containing complex code using many C language features—while also ensuring that every generated program has a single interpretation. To have a unique interpretation, a program must not execute any of the 191 kinds of undefined behavior, nor depend on any of the 52 kinds of unspecified behavior, that are described in the C99 standard."
I took "They guarantee that the inputs are legal" to mean that they limited it to programs with specified behavior. They don't know what the behavior is -- just that it is specified.
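If they can do this, the approach finds a subset of bugs with no false positives: whenever two compilers disagree on a program with fully specified behavior, at least one of them must be wrong.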
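To make the idea concrete, here is a minimal sketch of differential testing that does not need a real compiler. It is not how Csmith works internally; it only illustrates the comparison step. Three made-up implementations of one fully specified function (counting set bits) stand in for the compilers, and a small xorshift generator stands in for the random program generator. All names (`popcount_loop`, `popcount_kernighan`, `popcount_buggy`, `find_disagreement`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Each "compiler" is modeled as one implementation of the same function. */
typedef unsigned (*popcount_fn)(uint32_t);

/* Reference-style implementation: test one bit at a time. */
unsigned popcount_loop(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Independent implementation: clear the lowest set bit until none remain. */
unsigned popcount_kernighan(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1u;
        n++;
    }
    return n;
}

/* Deliberately wrong: only looks at bits 0..30 and ignores bit 31. */
unsigned popcount_buggy(uint32_t x)
{
    unsigned n = 0;
    for (int i = 0; i < 31; i++) {
        n += (x >> i) & 1u;
    }
    return n;
}

/* Small deterministic xorshift generator standing in for a test generator. */
static uint32_t next_input(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Returns 1 and stores the input in *witness as soon as any implementation
   disagrees with impls[0]; returns 0 if all agree on every trial. */
int find_disagreement(const popcount_fn *impls, size_t count,
                      unsigned trials, uint32_t seed, uint32_t *witness)
{
    uint32_t state = seed ? seed : 1u; /* xorshift must not start at 0 */
    for (unsigned t = 0; t < trials; t++) {
        uint32_t input = next_input(&state);
        unsigned first = impls[0](input);
        for (size_t i = 1; i < count; i++) {
            if (impls[i](input) != first) {
                *witness = input;
                return 1;
            }
        }
    }
    return 0;
}
```

Note that a disagreement only tells you that some implementation is wrong, not which one. With three or more implementations a majority vote gives a reasonable first guess, but the witness input still has to be checked against the specification by hand.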
C compilers can be buggy, particularly when you start working with vendor-supplied compilers for embedded platforms. A colleague was furious when he realized that his board's compiler didn't support function pointers.
I could imagine a DSP architecture that doesn't intrinsically support indirect jumps (especially as DSPs frequently use the Harvard memory model). That would make implementing function pointers tricky. I'd probably work around this by making a set of dispatch macros that expand into a giant switch block where each case is a (static) function call. The other option would be self-modifying code, which is annoying to do, to say the least, particularly for Harvard systems.
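A minimal sketch of that switch-based workaround might look like the following. The handler names, the `op_id` enum, and the `OP_TABLE` macro are all made up for illustration: an integer ID takes the place of the function pointer, and a macro list expands into one `case` per handler, each of which is a direct (static) call.

```c
/* Hypothetical handlers that we would otherwise call through a pointer. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

/* An integer ID stands in for the function pointer. */
enum op_id { OP_ADD, OP_SUB, OP_MUL };

/* Single list of (id, function) pairs; adding a handler means adding a row. */
#define OP_TABLE(X)   \
    X(OP_ADD, op_add) \
    X(OP_SUB, op_sub) \
    X(OP_MUL, op_mul)

/* Expands one table row into a case that makes a direct call. */
#define OP_CASE(id, fn) case id: return fn(a, b);

/* "Calls through" the ID using only direct calls at the source level. */
int dispatch(enum op_id id, int a, int b)
{
    switch (id) {
        OP_TABLE(OP_CASE)
    }
    return 0; /* unknown ID: fall back to a default value */
}
```

For example, `dispatch(OP_MUL, 4, 5)` makes a direct call to `op_mul`. Whether the generated code is free of indirect jumps still depends on how the compiler lowers the `switch`; on a target with no indirect jumps it would presumably have to use a chain of compares and branches.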
If your CPU supports keeping function return addresses on a stack that you can push other things onto, you can do an indirect jump by pushing the address you want to jump to and then "returning" to it. That's a lot easier than self-modifying code or massive switch statements, and just as easy on Harvard as on von Neumann architectures.
Both loumf and Wilya are correct. In support of their answers, remember that the specification does not specify the results of interesting programs. It says "if you do this, this must be the result." But if you limit yourself to only testing such simple cases, you're not going to find any interesting bugs - because such simple programs are likely to have already been tested.
>If the result differs from the specification, it is a bug.
A large part of the C standard is implementation-defined (see acqq's post here: http://news.ycombinator.com/item?id=4131828 ), so the result could differ across compilers without being a bug, and STILL be completely within spec.
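A few concrete cases, as a hedged sketch: the struct and function names below are made up, but each field records something C99 leaves to the implementation (the signedness of plain `char`, the width of `int`, and the result of right-shifting a negative signed value). Two conforming compilers may legitimately report different values here.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical record of a few implementation-defined properties. */
struct impl_traits {
    int char_is_signed;   /* 1 if plain char behaves like signed char */
    int bits_per_int;     /* width of int, padding bits included */
    int neg_shift_result; /* value of -8 >> 1 on this implementation */
};

/* Observes the properties on whatever compiler builds this file. */
struct impl_traits probe_impl(void)
{
    struct impl_traits t;
    t.char_is_signed = (CHAR_MIN < 0);
    t.bits_per_int = (int)(sizeof(int) * CHAR_BIT);
    t.neg_shift_result = -8 >> 1; /* implementation-defined in C99 */
    return t;
}

/* Prints the observed values so that two builds can be compared. */
void print_impl_traits(const struct impl_traits *t)
{
    printf("plain char signed: %d\n", t->char_is_signed);
    printf("bits per int:      %d\n", t->bits_per_int);
    printf("-8 >> 1:           %d\n", t->neg_shift_result);
}
```

A program whose output depends on any of these values can differ between compilers with neither being buggy, which is why a differential tester has to either avoid such constructs or compare only compilers that make the same choices.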
It's certainly related. The difference here is that 1) they're comparing output of multiple systems, rather than looking for obviously erroneous behavior of one (segfaults, memory leaks, failed assertions); and 2) the input data is all correct - fuzzing (per my understanding) usually implies tossing bad data in to see if the system breaks (frequently just slightly bad data is more interesting than complete garbage, but either falls under "fuzzing").
This kind of coinage is a rather large rabbit hole! Once upon a time in 1963, someone asked: what happens if you take the Fourier transform of a Fourier transform? Well, a Fourier transform gives you a spectrum, so let's call a Fourier transform of that, a new concept called a cepstrum. So what are its bins, analogous to frequency bins? Let's call them quefrency bins, and the cepstrum is therefore a quefrency cepstrum. What's the operation when you modify quefrencies in the cepstrum in some manner other than uniformly, analogous to how one might run a frequency spectrum through a frequency-domain filter? Why, liftering, of course.
in case anyone else is confused - it (a cepstrum) is the ft of the log of the modulus of an ft. the ft of an ft is just the original signal, time-reversed (up to a scale factor).
https://en.wikipedia.org/wiki/Cepstrum
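Written out, as a sketch assuming the unitary Fourier transform convention (other conventions add a constant factor), and noting that some definitions of the cepstrum use an inverse transform or the squared magnitude instead:

```latex
% Applying the Fourier transform twice returns the time-reversed signal:
\mathcal{F}\{\mathcal{F}\{x\}\}(t) = x(-t)

% The cepstrum as described above: the transform of the log-magnitude spectrum,
% with q the quefrency variable:
c(q) = \mathcal{F}\{\log\lvert\mathcal{F}\{x\}\rvert\}(q)
```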