CompCert: A formally verified optimizing C compiler (absint.com)
96 points by adamnemecek on March 2, 2015 | 62 comments


Although it seems there were parts of it that hadn't been formally verified:

"Using Csmith, we found previously unknown bugs in unproved parts of CompCert—bugs that cause this compiler to silently produce incorrect code." (http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf)

The CompCert homepage tells us that the same authors did a study in the same year (2011) without finding any wrong-code errors.

But even if they hadn't, I'd still be excited about CompCert demonstrating that this level of verification is indeed possible, because then verifying the unverified parts would probably be just as feasible, given enough resources.


The Csmith paper notes two types of errors found in CompCert: front-end errors and a misunderstanding of PPC semantics.

The former type of bug was apparently due to a part of the compiler that wasn't verified at the time. According to the Csmith paper, "This bug and five others like it were in CompCert’s unverified front-end code. Partly in response to these bug reports, the main CompCert developer expanded the verified portion of CompCert to include C’s integer promotions and other tricky implicit casts."

Additionally, the CompCert team produced a validated parser, which they discussed in their ESOP 2012 paper, "Validating LR(1) Parsers".

The second type of bug (misunderstanding the semantics of the target architecture) is, as far as I know, impossible to avoid. Any verification project can end up with a true proof of the wrong theorem!


> The second type of bug (misunderstanding the semantics of the target architecture) is, as far as I know, impossible to avoid. Any verification project can end up with a true proof of the wrong theorem!

Indeed. However, although it's impossible to avoid, you can still take steps to make it less likely.

When you formally verify code, you are showing that a program satisfies a particular specification.

If you want to eliminate errors in your specification, you should aim to make it as simple as possible, so that people with domain knowledge (in this case, people with a correct understanding of PPC) would be able to tell whether your specification is correct.
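
To make that concrete, here is a toy Coq sketch (nothing to do with CompCert; double and double_spec are invented names). The specification is the one-line theorem statement, which a domain expert can judge without reading the proof:

    Require Import Lia.

    (* The "program" we want to verify. *)
    Definition double (n : nat) : nat := n + n.

    (* The specification: short enough to eyeball for correctness. *)
    Theorem double_spec : forall n : nat, double n = 2 * n.
    Proof.
      intro n. unfold double. lia.
    Qed.

The proof can be as long and ugly as it needs to be; what you have to trust (besides Coq itself) is only the statement of double_spec. The same principle applies to a formalisation of PPC semantics, just at a much larger scale.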


See also CakeML, "A verified implementation of a significant subset of Standard ML in the HOL4 theorem prover".

https://cakeml.org/

Note that this project, unlike CompCert, is Free Software in the Stallman sense:

https://github.com/CakeML/cakeml/blob/master/COPYING


The fact that Xavier Leroy has been working on CompCert gives it an extra level of credibility for me.

(Xavier Leroy of INRIA is the main author of OCaml, among other things.)


And the original Linux threading support.

His report on the ZINC experiment is well worth reading:

http://gallium.inria.fr/~xleroy/bibrefs/Leroy-ZINC.html


So is this protected from the "Ken Thompson Hack"? http://c2.com/cgi/wiki?TheKenThompsonHack

I guess it would have to be cross-compiled for your system, otherwise your existing compiler could corrupt it...


It's not self-hosted, so it is not yet at the point where the Thompson hack would apply.

That is, CompCert is a C compiler, but it's not written in C; it's written in Coq. So when you use it, you have to trust that the Coq implementation was written correctly, and that nobody has inserted any backdoors in Coq which insert backdoors in CompCert which insert backdoors in your program.

The next level would be to also have a verified Coq compiler, which would make the whole system that much more trustworthy. And at _that_ point you could worry about self-perpetuating backdoors in that compiler, and ways to mitigate them.


Oh, but they're already building a verified Coq compiler: http://www.cs.princeton.edu/~appel/certicoq/


Unless you implement the proved compiler from scratch yourself, no.

If you're not doing massive amounts of implementation yourself, you have not escaped from Ken Thompson's hack. All this does is rephrase how they promise to not hack you... it doesn't change the fact that what they say they do and what they actually do have nothing in particular connecting them.

I wouldn't worry about it until strong AI comes along though. Humans are barely capable of making compilers that work, let alone shipping compromised compilers. (I know Ken demonstrated his hack, but making something that does that under controlled conditions is one thing; making one that could be shipped and robustly implement the hack is quite another.)


As long as it's also capable of reproducible builds (and I would think the proofs would be very hard if the output is not even predictable!), you should be able to cross-build it from a certified platform, and bootstrap it from a trusted one (or more than one), and the bootstraps should converge to an identical binary, thus giving you both claims.


This is probably a Philistine question, but how do we make sure the proof -- that CompCert works as intended -- itself is correct? In mathematics and science, it happens all the time that someone finds a mistake in a proof years after its first publication.


The proof is written in Coq, a proof assistant that mechanically checks every proof.

Of course, Coq itself could have bugs, but it has a small trusted kernel and it produces proof objects that can be checked, so what you get is much, much more reliable than the typical published proof in mathematics.
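
For instance, every proof finished with Qed is elaborated into an explicit proof term that the small kernel re-checks, and you can ask Coq to show that term. (A toy example, not CompCert code; and_swap is just a made-up name.)

    (* A tiny lemma, proved with tactics. *)
    Lemma and_swap : forall A B : Prop, A /\ B -> B /\ A.
    Proof.
      intros A B [a b]. split; assumption.
    Qed.

    (* Display the underlying proof object that the kernel checked. *)
    Print and_swap.

The tactics used to build the proof can be arbitrarily clever or even buggy; only the final term has to pass the kernel's type check.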


I think the parent's question might not be asking about whether the proof is correct in the "do all the statements in it logically hold" sense, but more like "is the proof a proof of what a 'correct' compiler should do", i.e. does the translation follow the C standard on one side, and the target machine's ISA on the other? Starting with the wrong assumptions can lead to valid reasoning to the wrong conclusion, and AFAIK Coq only helps with the "valid reasoning".

In other words, if you have the wrong idea of what a compiler is, you can write one and a proof that it is correct. Humans have to be involved at some point to tell the machine what their idea of a correct compiler is. (Even if the machine somehow knows, that would also be a result of humans teaching it at some point in the past.)


Yes, that's a valid point.

The idea is to first give a high-level formalisation of the C language definition and then to prove that the compiler correctly implements this. In the case of CompCert, the specification is given by a big-step operational semantics. Of course it's possible to make mistakes in it, but it is still small enough to be read and understood by humans (much more so than a whole compiler). From Coq one gets the guarantee that the compiler correctly implements this language definition.

To get an idea of what the specification looks like, you may look at Fig. 3 in http://gallium.inria.fr/~xleroy/publi/cfront.pdf
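
For a rough flavour of what a big-step semantics looks like in Coq, here is a tiny invented expression language (nothing like CompCert's actual definitions, which the paper above gives in full):

    (* Expressions: integer constants and addition. *)
    Inductive expr : Type :=
      | Const : nat -> expr
      | Plus  : expr -> expr -> expr.

    (* Big-step evaluation: "eval e n" means expression e evaluates to n. *)
    Inductive eval : expr -> nat -> Prop :=
      | eval_const : forall n,
          eval (Const n) n
      | eval_plus : forall e1 e2 n1 n2,
          eval e1 n1 ->
          eval e2 n2 ->
          eval (Plus e1 e2) (n1 + n2).

CompCert's semantics for its C subset has the same shape, only with many more constructs and with memory states. The compiler-correctness theorem is stated against exactly these rules, so reviewing the rules is how you convince yourself the right thing was proved.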


While formal proof is not perfect, humanity is not yet aware of methods to achieve higher assurance. It is basically better than anything else we have tried.


Why would this situation be disanalogous to when we prove anything else in mathematics? There's always a chance that anything we believe (well, almost anything) could be wrong.

If you think that disqualifies any man-made proof from being worthy of the name "proof"... perhaps you would be interested in studying philosophy of mathematics? :)


In science it may happen because everything is based on a lot of assumptions, themselves based on our physical perception of the world, which may be proved wrong later on.

Is mathematics subject to this kind of flaw? There is a very restricted set of axioms, and almost everything else is based on formal proofs.



Wiles' proof was not written in a machine-checkable language; if it had been, the error would almost certainly have been caught.


According to my very fresh reading on this topic, you are right, but your comment gives the impression that writing Wiles' proof in a machine-checkable language is not very different from writing it in a human language.

According to this article, I understand it is a lot more difficult to write a proof in a machine-checkable language, in the current state of the art:

http://www.cs.rug.nl/~wim/fermat/wilesEnglish.html

I read a lot of comments on Hacker News and Lobsters saying "you could have avoided this bug by writing your program in Idris, Coq, Haskell, etc. and enjoying the benefits of the Curry-Howard Isomorphism". But I'm under the impression that most of these comments undervalue the effort of "feeding" the type/proof checker with the necessary formalism.

I'm not saying that establishing a proof is worthless. Of course there is a lot of value in having a formal proof. I'm just saying that even for full-time professional mathematicians, machine-checkable formal proofs are still the exception, not the norm. This suggests the cost of building these proofs is still high, and it may explain why the practice is still limited in the field of computer science.


But wouldn't writing Wiles' proof in a machine-checkable language have been a problem in itself?

There is no "universal" machine-checkable language which can be used for any proof. I don't know if Wiles' proof could have been written in Coq, for example. Let's say it can't, and that no existing machine-checkable language can be used for this purpose. In such a case, you would have to develop a new proof language, and make sure the proof checker itself is correct.

I guess that at some point we are hitting the theoretical limits imposed by the halting problem and Gödel's incompleteness theorems?

It's not my field so I'm just asking questions here. Don't hesitate to lecture me :-)


No, you're wrong. This is not a theoretical issue, but a practical one.

Almost all mathematics (and this certainly includes Wiles' proof) could be written in Coq, in theory. It is extremely hard to do in practice.

Mathematicians write proofs for their peers, who have smart brains. Most of the trivial and less-than-trivial details are omitted. Coq cannot accept this (because it is very stupid and can't figure out the missing steps). It turns out that it is particularly hard and boring to fill in the holes in a "human" proof.
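
As a trivial made-up illustration: a paper proof would just say "clearly n + 0 = n", but in Coq even that has to be established explicitly (or looked up in the library), here by induction on n:

    Theorem add_zero_r : forall n : nat, n + 0 = n.
    Proof.
      induction n as [| n IH].
      - reflexivity.                       (* base case: 0 + 0 = 0 *)
      - simpl. rewrite IH. reflexivity.    (* step: S n + 0 = S (n + 0) = S n *)
    Qed.

(The standard library already has this one, of course, but every such "obvious" step in a large human proof has to be hunted down or re-proved, and that is where much of the tedium comes from.)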

Gödel comes in when you try to prove Coq's correctness within Coq (but this was partially done, in a sense).


I want to make sure I understand correctly. :)

So I am right that rewriting Wiles' proof in a machine-checkable language is a problem in itself? But I'm wrong when I suggest it is a theoretical issue, when it's actually a practical issue?

I've read that "the theory behind Coq is generally admitted to be consistent with regard to Zermelo-Fraenkel set theory + inaccessible cardinals". [1] I also read that some statements are undecidable in Zermelo–Fraenkel set theory [2]. It follows from this that there must exist some statements that are undecidable in Coq, correct? But I understand that this is a theoretical issue, not a practical one, because "Zermelo–Fraenkel set theory, combined with first-order logic, gives a satisfactory and generally accepted formalism for essentially all current mathematics". Is that the reason why my comment was wrong?

On this topic, I've read an article titled "Computer verification of Wiles' proof of Fermat's Last Theorem" which explains that Wiles' proof could probably be verified with a tool like Coq, but that it would be a massive challenge:

> The mathematics used in Wiles' proof of Fermat's Last Theorem is very complicated. [...] I do not know the provers Coq and Mizar good enough, but I think they are adequate to express the mathematics.

> On the other hand, I do think that the challenge is doable, and that it will be done in the coming fifty years. It is definitely not within reach. The project will have to cover large parts of current pure mathematics. Its scope is comparable with the Bourbaki project, where between 1950 and 1970 almost all of pure mathematics got a new foundation by the efforts of an originally French group of mathematicians who published some 20 books under the pseudonym N. Bourbaki.

[1] https://coq.inria.fr/faq [2] http://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_th... [3] http://en.wikipedia.org/wiki/Hilbert%27s_program [4] http://www.cs.rug.nl/~wim/fermat/wilesEnglish.html


It's my fault, I think I read your answer too quickly and missed a negation somewhere.

So, yes, a formal rewriting of Wiles' proof is "only" a practical challenge. But a big one. Only about 15 years ago were we first able to prove the fundamental theorem of algebra in Coq (a 200-year-old theorem that is routinely taught to undergraduates).


That's okay @thoran! Thanks a lot for helping me gain a better understanding of this topic.


> It follows from this that there must exist some statements that are undecidable in Coq, correct?

Yes, but only those that are undecidable in ordinary mathematics. If you're not using Coq, you have exactly the same problem. (And this concern doesn't apply if we already have the proof and just want to check it).


> Yes, but only those that are undecidable in ordinary mathematics. If you're not using Coq, you have exactly the same problem.

Yes, this is what I understood. Thanks for confirming this point.

> (And this concern doesn't apply if we already have the proof and just want to check it).

Why would this concern not apply anymore if we already have the proof and just want to check it? The proof still needs to be expressed, directly or indirectly, in terms of Zermelo–Fraenkel set theory combined with first-order logic. If the proof uses a different and "incompatible" theory, how could we check it with Coq?

(I understand this is a very rare occurrence in practice and this is the reason why ZFC is so widely used.)


> The proof still needs to be expressed, directly or indirectly, in terms of Zermelo–Fraenkel set theory combined with first-order logic. If the proof uses a different and "incompatible" theory, how could we check it with Coq?

If Wiles' proof wasn't in ZFC we wouldn't call it a proof (at least not without qualification). You don't get to just write mathematical proofs in any language you like - otherwise we could pick a language in which FLT is an axiom, and then the proof of it is trivial.


I don't get it. There are a lot of variants of set theory. ZFC is just one of them, albeit the most common. But some proofs are founded on an extension of ZFC. For example, the Mizar system, which is another proof assistant, has adopted Tarski–Grothendieck set theory, an extension of ZFC.

According to "The QED Manifesto Revisited" [1], it can be difficult to use Coq as-is for some kind of proofs:

> Here are four mathematical statements that most mathematicians will consider to be totally non-problematic: [...] We claim that currently none of the QED-like systems can express all four statements in a good way. Of course one can easily extend the systems with axioms that allow one to write down these statements. However, that really amounts to ‘changing the system’. It would mean that both the library and the automation of the system will not be very useful anymore. Classical & extensional reasoning in Coq or abstract algebra in the HOLs by postulating the necessary types and axioms will not be pleasant without re-engineering the system.

[1] http://mizar.org/trybulec65/8.pdf


> I don't get it. There are a lot of variants of set theory. ZFC is just one of them, albeit the most common. But some proofs are founded on an extension of ZFC. For example, the Mizar system, which is another proof assistant, has adopted Tarski–Grothendieck set theory, an extension of ZFC.

Some mathematicians accept extensions of ZFC. But I would definitely expect a proof that relied on such an extension to have in big red letters "we use these additional axioms", and the mathematical community would hope for and expect a proof without it. (Indeed proofs that make use of the axiom of choice still often flag this up and we look for proofs without it where possible). I am quite surprised that Mizar would do this, and certainly hope that it provides a way to distinguish between proofs that are valid in ZFC and those that aren't.

> According to "The QED Manifesto Revisited" [1], it can be difficult to use Coq as-is for some kind of proofs:

The statement about aleph-0 and aleph-2 (the one that Coq has problems with) isn't really mathematics so much as metamathematics/proof theory, at least if we're viewing it as a statement about the consequences of axioms. Which sure is a valuable field and one we'd like to be able to reason about, but it's circumscribed and doesn't affect normal working mathematics.

If we're talking about accepting that axiom and actually using it in a proof, then it's certainly not "totally non-problematic". The paper admits that most mathematicians wouldn't even know the statement of the proper forcing axiom. They certainly wouldn't write proofs that depended on it.


Thanks a lot @lmm. Now I get it :)

About Mizar, I don't know if you can use it without the additional axioms of Tarski-Grothendieck set theory.


Surely the implementation of the mathematics may be subject to errors, just like any implementation of anything? There is nothing that is so perfect that humans cannot err while applying it.


It is true that there might be errors in anything.

However, this is mitigated by having only a very small core that checks the proofs. Basically, if a wrong proof goes through, there must be an error in the core (which was checked by many people and is very small).


You have to translate between the English spec and the computer-readable proof, and surely it's relatively easy to mess that up in some way and end up proving the wrong thing.


The C spec and x86 assembly are two very well known specs.


If they were so well known that everyone could understand and implement them perfectly, we wouldn't need formal verification in the first place.

I don't doubt that formal verification significantly reduces the chances of bugs, but it's not a panacea (unless we have computers that read our minds). You still need to verify that the proof is proving the right thing, although you can be sure that it's internally consistent.


> If they were so well known that everyone could understand and implement them perfectly, we wouldn't need formal verification in the first place.

We're verifying the compiler from one language to the other. The specifications of the two languages are not that hard to implement. The difficulty is covering all cases in compiling from one to the other. The formal proof does tell us that we haven't missed any.
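
Schematically, the theorem has roughly the following shape (an invented sketch, not CompCert's actual statement; every name below is a placeholder):

    Parameter source_program target_program behaviour : Type.
    Parameter compile        : source_program -> option target_program.
    Parameter source_behaves : source_program -> behaviour -> Prop.
    Parameter target_behaves : target_program -> behaviour -> Prop.

    (* Semantic preservation: whenever compilation succeeds, any behaviour
       of the generated program is one the source semantics allows. *)
    Definition semantic_preservation : Prop :=
      forall (p : source_program) (tp : target_program) (b : behaviour),
        compile p = Some tp ->
        target_behaves tp b ->
        source_behaves p b.

CompCert's real theorem is refined in several ways (it is restricted to source programs without undefined behaviour, and behaviours are traces of observable events), but this is the overall shape, and the proof has to treat every case of compile, which is exactly where hand-written compilers tend to slip.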


I agree that the chances are pretty small, but is that "good enough" when it comes to formally verifying things?

(I know next to nothing about formal verification)


Well, that's also true.

But apparently there is no better thing to do.

There is always a disconnect between what you mean and what you say. It happens all the time and the risk is not going to go away. People who work with proof assistants understand this risk better and most of them are pretty happy that the risk is sufficiently small for their purposes.


Note that it can't be used commercially without a proper license. Also, no 64-bit support.


Two thoughts here:

1. If you use it commercially, you can pay for it.

2. In most cases where it's needed, 64 bit doesn't matter.

(Though I might be wrong on the second one.)


1. Nobody argued that.

2. What do you mean it doesn't matter? You think embedded and similar systems don't need/use 64-bit space? True for most I guess.


But if I use it neither commercially nor for research, it's not clear whether I can pay for it.


It can't be used for anything other than research for free. So you can't write useful systems and give them away without a license either. (You might of course be able to get a license for free, but it rules out using it as a foundation for a Free embedded kit, like writing something like the Arduino platform. Which is a pity, but quite understandable).


"CompCert typically runs twice as fast as the code generated by GCC without optimizations." Lol, That's like saying you run faster than Usain Bolt when he's walking.


You are right, but this sentence makes sense in the context for which CompCert is intended.

The requirements of the DO-178 standard for aeronautics, for instance, are such that the choice for compiling safety-critical C code that runs within the aircraft is basically between either:

- the equivalent of “gcc -O0”, that is, an off-the-shelf compiler with no optimization so that source-level constructs can be mapped to assembly constructs and vice versa, or

- CompCert with optimizations, because the soundness of the optimizations can be justified to a level of formality such that it matters less to be able to trace assembly to source.

A source without much detail, but the best I can find within 5 minutes of googling: http://projects.laas.fr/IFSE/FMF/J3/slides/P05_Jean_Souyiris...


Since GCC optimizations can miscompile, the target audience ("safety-critical and mission-critical software written in C and meeting high levels of assurance", aka Airbus) is currently using GCC without optimizations. So it is a fair comparison for that case, since they can now enable proven-correct optimizations.


Interesting that they are using it without optimisations. The compiler is used far more with optimisations, so those code paths are actually far better tested. I have heard of cases where bugs only appear in unoptimised code.


That may be true as long as you use the same compiler versions as the developer used when writing/testing their code. But then someone decides to deploy that code on an (old and) stable version of Linux with an old(er) GCC version and things break.

For example, ClamAV used to break quite often when built with older compiler versions; in the end we had to add runtime tests to configure, based on testcases from the GCC bugzilla, to make sure people were not using a "stable" but broken compiler. GCC 4.1.0 and 4.1.1 were particularly bad versions.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27603 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26763 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28045 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37573 https://bugzilla.clamav.net/show_bug.cgi?id=670

You might think that you could still use -O2 and selectively turn off optimizations that are known to cause trouble (i.e. -fno-strict-aliasing), but there are cases where turning off that optimization exposes bugs in other optimization passes and a compiler that would be otherwise good at -O2 would break at "-O2 -fno-strict-aliasing": https://bugs.gentoo.org/show_bug.cgi?id=275928#c29 https://bugzilla.clamav.net/show_bug.cgi?id=1581


I don't think this is really true. Generally all optimization passes are run upon the unoptimized version, hence ALL code paths first pass through the -O0 machinery.

Besides, they seem to manually inspect the assembly output themselves for the final production build.


But the code path that gets tested is "first pass + optimisation gives correct code". That doesn't guarantee the first pass generates correct code on its own. I believe there were bugs along those lines at one stage, though I admit they are rare. I've not had to track down a compiler bug in quite a few years.


It is not because those code paths are better tested; it is because people's code is not actually quite correct. E.g. something declared const is not const unless the compiler evaluates it at compile time.


There are certain applications where people fear breakage due to compiler optimizations. (Compilers are prone to break, for your information. They are really complex applications, after all.) CompCert has a fully verified optimization engine comparable to (I guess) GCC's -O2, so that really is an improvement.


It doesn't get close to “gcc -O2” (the page says within 20% of “gcc -O3”). It is only the most useful optimizations that are implemented in CompCert, at perhaps an implementation cost of one PhD thesis per optimization pass. But that is still better than no optimizations in contexts where none of the optimizations from an off-the-shelf's compiler could be trusted, as discussed in this thread.


There is on-going research on more modular frameworks that allow for faster implementation of optimizations. One of them is [1]. I worked on that project and implemented (and proved correct) several optimizations within just weeks. This is not industry-ready stuff, though.

[1] http://www.mpi-sws.org/~neis/pils/


> an implementation cost of one PhD thesis per optimization pass.

This strikes me as a little much. This kind of proof requires insight, yes, but it's hardly the level of a thesis; it hardly requires innovation to demonstrate, just exhaustiveness.


Yeah, -O2 was my wild guess, and in reality the figure may vary a lot. I think that it is quite great anyway.


http://www.irisa.fr/celtique/ext/compcertSSA/ may bring in a whole bunch of optimisations nearly for free. We'll see...


Does anyone know if CompCert can be used to formally verify a compiler for a language other than C, such as assembly language or Python?


And who verifies the verifier? (That is generally the problem I have with the concept of unit tests, too.)


Ideally, a simpler and therefore more trustworthy verifier. So far, the most fleshed out implementation of this idea is Milawa, which developed a chain of 11 increasingly sophisticated verifiers, each one checked by the previous.

http://www.cs.utexas.edu/users/jared/milawa/Web/



