Hacker News

Where does one go for a fully representative sample of all kinds of valid C code to test a project like this? Or -- given a formal grammar, has someone produced a tool that generates representative code?
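On the second question: a grammar-fed random generator is easy to sketch. Here's a toy Python illustration of the idea (the grammar below is a tiny made-up expression subset I invented for the example, nowhere near real C):

```python
import random

# Toy illustration: walk a formal grammar at random to emit
# syntactically valid (if meaningless) expression snippets.
# Keys are nonterminals; each maps to a list of productions.
GRAMMAR = {
    "expr": [["term"], ["term", "op", "expr"]],
    "term": [["num"], ["(", "expr", ")"]],
    "op":   [["+"], ["-"], ["*"]],
    "num":  [["0"], ["1"], ["42"]],
}

def generate(symbol="expr", depth=0):
    if symbol not in GRAMMAR:      # terminal: emit it verbatim
        return symbol
    # Past a depth limit, always take the first (shortest) production
    # so the random walk terminates instead of recursing forever.
    rules = GRAMMAR[symbol]
    rule = rules[0] if depth > 8 else random.choice(rules)
    return "".join(generate(s, depth + 1) for s in rule)
```

Each call to `generate()` yields a fresh random expression such as `(1+42)*0`. "Representative" is the hard part, though: a naive random walk over a real C grammar mostly produces garbage that no human would write, which is exactly the problem Csmith's heuristics address.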



I give a +1 to the Csmith recommendation. Its authors used it to tear up all kinds of compilers, finding practical bugs as a result. They even found a few in CompCert, despite its formal proofs, due to specification errors. The middle-end came out flawless, though.

So, such results speak for themselves. I think Csmith has to be the baseline. Not sure if it's easy to tune for the equivalent of basic acceptance tests, though. If not, then it could be nice to have a long list of tests that each correspond to specific C-language features, to help a compiler writer implement them gradually.
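The way Csmith tears up compilers is classic differential testing: feed the same generated program to several implementations and flag any disagreement. A minimal, hedged sketch of the comparison step (using stand-in shell commands rather than real compilers; with real ones you'd compile each binary and compare the runtime checksums Csmith programs print):

```python
import subprocess

def outputs_agree(commands, stdin_text):
    """Run the same input through several implementations and report
    whether their stdout matches -- the heart of differential testing.
    `commands` is a list of argv lists, one per implementation."""
    results = [
        subprocess.run(cmd, input=stdin_text,
                       capture_output=True, text=True).stdout
        for cmd in commands
    ]
    return all(r == results[0] for r in results)
```

The appeal is that no oracle is needed: you never have to know the *correct* output, only that N independent compilers should agree on it.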


Actually, the bugs they found in CompCert were not in the proved sections. The Csmith paper says:

"As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible benefits for compiler users."

https://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf


"The problem is that the 16-bit displacement field is overflowed. CompCert’s PPC semantics failed to specify a constraint on the width of this immediate value, on the assumption that the assembler would catch out-of-range values. In fact, this is what happened. We also found a handful of crash errors in CompCert."

That bug was in the backend, which I believe is verified code, based on this paper's existence:

http://gallium.inria.fr/~xleroy/publi/compcert-backend.pdf

The code correctly implemented a bad spec, as far as I can tell. The crash errors aren't elaborated, so I can't say what they imply. So, for the backend, there was at least one bug found in the verified portions due to an inaccurate spec.

This shows the value of multiple forms of verification, which Orange Book-era research already supported empirically. It's been corroborated by Altran-Praxis and others in recent times. I've included example papers on LOCK's design/cost and Altran-Praxis' CA below, since they're representative of trends I've observed over time. The proving part, for common errors at least, has gotten a lot easier with tools like SPARK, though. I wanted CompCert derived into SPARK at one point.

http://www.cyberdefenseagency.com/publications/LOCK-An_Histo...

https://cryptosmith.files.wordpress.com/2014/10/lock-eff-acm...

http://www.anthonyhall.org/c_by_c_secure_system.pdf


For gradual implementation, I would start from the test suite in tcc [1], eventually adopting the ones from gcc and Csmith.

[1] http://bellard.org/tcc/


Forgot about that one. Thank you! I remember it being small and fast but didn't know it had memory and bounds checking. That's a great combination.


Csmith is a tool that can generate random C programs that statically and dynamically conform to the C99 standard.

It sounds a lot like a fuzzer, although only for testing success cases.


For specific test cases, you could use gcc's test cases:

https://github.com/gcc-mirror/gcc/tree/master/gcc/testsuite

Not sure about the latter, but you could probably accomplish something similar without too much work using a fuzzer.
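For the fuzzer route, a mutation-based sketch is only a few lines: start from a known-good seed program and apply small random edits, then keep whichever mutants still interest you (e.g. those that still compile). Everything here, the seed and the edit set, is illustrative:

```python
import random

# Hypothetical seed: any small known-good C program would do.
SEED = "int main(void) { int x = 1; return x + 2; }"

def mutate(src, rng, n_edits=3):
    """Apply n_edits random single-character edits (replace a char,
    duplicate a char, or delete a char) to a source string."""
    out = list(src)
    for _ in range(n_edits):
        op = rng.choice(("flip", "dup", "drop"))
        i = rng.randrange(len(out))
        if op == "flip":
            out[i] = rng.choice("0123456789+-*;x")
        elif op == "dup":
            out.insert(i, out[i])
        else:  # "drop"
            del out[i]
    return "".join(out)
```

Unlike Csmith's grammar-driven generation, most mutants won't even parse, so this mainly exercises the front end's error handling rather than wrong-code paths.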


Ha, finally an opportunity to show this off! My pet parser project has a built-in "unparser" to turn a parser (specified as a formal grammar) into a randomized test-case generator.

https://github.com/Hardmath123/nearley#the-unparser



