CompCert: A formally verified optimizing C compiler (absint.com)
96 points by adamnemecek on March 2, 2015 | 62 comments


Although it seems there were parts of it that hadn't been formally verified:

"Using Csmith, we found previously unknown bugs in unproved parts of CompCert—bugs that cause this compiler to silently produce incorrect code." (http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf)

The CompCert homepage tells us that the same authors did a study in the same year (2011) without finding any wrong-code errors.

But even if they hadn't, I'd still be excited about CompCert demonstrating that this level of verification is indeed possible, because then verifying the unverified parts would probably be just as feasible, given enough resources.


The Csmith paper notes two types of errors found in CompCert: front-end errors and a misunderstanding of PPC semantics.

The former type of bug was apparently due to a part of the compiler that wasn't verified at the time. According to the Csmith paper, "This bug and five others like it were in CompCert’s unverified front-end code. Partly in response to these bug reports, the main CompCert developer expanded the verified portion of CompCert to include C’s integer promotions and other tricky implicit casts."

Additionally, the CompCert team produced a validated parser, which they discussed in their ESOP 2012 paper, "Validating LR(1) Parsers".

The second type of bug (misunderstanding the semantics of the target architecture) is, as far as I know, impossible to avoid. Any verification project can end up with a true proof of the wrong theorem!


> The second type of bug (misunderstanding the semantics of the target architecture) is, as far as I know, impossible to avoid. Any verification project can end up with a true proof of the wrong theorem!

Indeed. However, although it's impossible to avoid, you can still take steps to make it less likely.

When you formally verify code, you are showing that a program satisfies a particular specification.

If you want to eliminate errors in your specification, you should aim to make it as simple as possible, so that people with domain knowledge (in this case, people with a correct understanding of PPC) would be able to tell whether your specification is correct.
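
To make that concrete, here is a toy Coq sketch (nothing to do with CompCert; double and double_spec are invented names). The specification is the one-line theorem statement, which a domain expert can judge without reading the proof:

    Require Import Lia.

    (* The "program" we want to verify. *)
    Definition double (n : nat) : nat := n + n.

    (* The specification: short enough to eyeball for correctness. *)
    Theorem double_spec : forall n : nat, double n = 2 * n.
    Proof.
      intro n. unfold double. lia.
    Qed.

The proof can be as long and ugly as it needs to be; what you have to trust (besides Coq itself) is only the statement of double_spec. The same principle applies to a formalisation of PPC semantics, just at a much larger scale.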


See also CakeML, "A verified implementation of a significant subset of Standard ML in the HOL4 theorem prover".

https://cakeml.org/

Note that this project, unlike CompCert, is Free Software in the Stallman sense:

https://github.com/CakeML/cakeml/blob/master/COPYING


The fact that Xavier Leroy has been working on CompCert gives it an extra level of credibility for me.

(Xavier Leroy of INRIA is the main author of OCaml, among other things.)


And the original Linux threading support.

His report on the ZINC experiment is well worth reading:

http://gallium.inria.fr/~xleroy/bibrefs/Leroy-ZINC.html


So is this protected from the "Ken Thompson Hack"? http://c2.com/cgi/wiki?TheKenThompsonHack

I guess it would have to be cross-compiled for your system, otherwise your existing compiler could corrupt it...


It's not self-hosted, so it is not yet at the point where the Thompson hack would apply.

That is, CompCert is a C compiler, but it's not written in C; it's written in Coq. So when you use it, you have to trust that the Coq implementation was written correctly, and that nobody has inserted any backdoors in Coq which insert backdoors in CompCert which insert backdoors in your program.

The next level would be to also have a verified Coq compiler, which would make the whole system that much more trustworthy. And at _that_ point you could worry about self-perpetuating backdoors in that compiler, and ways to mitigate them.


Oh, but they're already building a verified Coq compiler: http://www.cs.princeton.edu/~appel/certicoq/


Unless you implement the proved compiler from scratch yourself, no.

If you're not doing massive amounts of implementation yourself, you have not escaped from Ken Thompson's hack. All this does is rephrase how they promise to not hack you... it doesn't change the fact that what they say they do and what they actually do have nothing in particular connecting them.

I wouldn't worry about it until strong AI comes along though. Humans are barely capable of making compilers that work, let alone shipping compromised compilers. (I know Ken demonstrated his hack, but making something that does that under controlled conditions is one thing; making one that could be shipped and robustly implement the hack is quite another.)


As long as it's also capable of reproducible builds (and I would think the proofs would be very hard if the output is not even predictable!), you should be able to cross-build it from a certified platform, and bootstrap it from a trusted one (or more than one), and the bootstraps should converge to an identical binary, thus giving you both claims.


This is probably a Philistine question, but how do we make sure the proof -- that CompCert works as intended -- itself is correct? In mathematics and science, it happens all the time that someone finds a mistake in a proof years after its first publication.


The proof is written in Coq, a proof assistant that mechanically checks every proof.

Of course, Coq itself could have bugs, but it has a small trusted kernel and it produces proof objects that can be checked, so what you get is much, much more reliable than the typical published proof in mathematics.
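
For instance, every proof finished with Qed is elaborated into an explicit proof term that the small kernel re-checks, and you can ask Coq to show that term. (A toy example, not CompCert code; and_swap is just a made-up name.)

    (* A tiny lemma, proved with tactics. *)
    Lemma and_swap : forall A B : Prop, A /\ B -> B /\ A.
    Proof.
      intros A B [a b]. split; assumption.
    Qed.

    (* Display the underlying proof object that the kernel checked. *)
    Print and_swap.

The tactics used to build the proof can be arbitrarily clever or even buggy; only the final term has to pass the kernel's type check.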


I think the parent's question might not be asking about whether the proof is correct in the "do all the statements in it logically hold" sense, but more like "is the proof a proof of what a 'correct' compiler should do", i.e. does the translation follow the C standard on one side, and the target machine's ISA on the other? Starting with the wrong assumptions can lead to valid reasoning to the wrong conclusion, and AFAIK Coq only helps with the "valid reasoning".

In other words, if you have the wrong idea of what a compiler is, you can write one and a proof that it is correct. Humans have to be involved at some point to tell the machine what their idea of a correct compiler is. (Even if the machine somehow knows, that would also be a result of humans teaching it at some point in the past.)


Yes, that's a valid point.

The idea is to first give a high-level formalisation of the C language definition and then to prove that the compiler correctly implements this. In the case of CompCert, the specification is given by a big-step operational semantics. Of course it's possible to make mistakes in it, but it is still small enough to be read and understood by humans (much more so than a whole compiler). From Coq one gets the guarantee that the compiler correctly implements this language definition.

To get an idea of what the specification looks like, you may look at Fig. 3 in http://gallium.inria.fr/~xleroy/publi/cfront.pdf
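
For a rough flavour of what a big-step semantics looks like in Coq, here is a tiny invented expression language (nothing like CompCert's actual definitions, which the paper above gives in full):

    (* Expressions: integer constants and addition. *)
    Inductive expr : Type :=
      | Const : nat -> expr
      | Plus  : expr -> expr -> expr.

    (* Big-step evaluation: "eval e n" means expression e evaluates to n. *)
    Inductive eval : expr -> nat -> Prop :=
      | eval_const : forall n,
          eval (Const n) n
      | eval_plus : forall e1 e2 n1 n2,
          eval e1 n1 ->
          eval e2 n2 ->
          eval (Plus e1 e2) (n1 + n2).

CompCert's semantics for its C subset has the same shape, only with many more constructs and with memory states. The compiler-correctness theorem is stated against exactly these rules, so reviewing the rules is how you convince yourself the right thing was proved.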


While formal proof is not perfect, humanity is not yet aware of methods to achieve higher assurance. It is basically better than anything else we have tried.


Why would this situation be disanalogous to when we prove anything else in mathematics? There's always a chance that anything we believe (well, almost anything) could be wrong.

If you think that disqualifies any man-made proof from being worthy of the name "proof"... perhaps you would be interested in studying philosophy of mathematics? :)


In science it may happen because everything is based on a lot of assumptions, themselves based on our physical perception of the world, which may be proved wrong later on.

Is mathematics subject to this kind of flaw? There is a very restricted set of axioms, and almost everything else is based on formal proofs.



Wiles' proof was not written in a machine-checkable language; if it had been, the error would almost certainly have been caught.


According to my very fresh reading on this topic, you are right, but your comment gives the impression that writing Wiles' proof in a machine-checkable language is not very different from writing it in a human language.

According to this article, I understand it is a lot more difficult to write a proof in a machine-checkable language, in the current state of the art:

http://www.cs.rug.nl/~wim/fermat/wilesEnglish.html

I read a lot of comments on Hacker News and Lobsters saying "you could have avoided this bug by writing your program in Idris, Coq, Haskell, etc. and enjoying the benefits of the Curry-Howard Isomorphism". But I'm under the impression that most of these comments undervalue the effort of "feeding" the type/proof checker with the necessary formalism.

I'm not saying that establishing a proof is worthless. Of course there is a lot of value in having a formal proof. I'm just saying that even for full-time professional mathematicians, machine-checkable formal proofs are still the exception, not the norm. This suggests the cost of building these proofs is still high, and it may explain why the practice is still limited in the field of computer science.


But wouldn't writing Wiles' proof in a machine-checkable language have been a problem in itself?

There is no "universal" machine-checkable language which can be used for any proof. I don't know if Wiles' proof could have been written in Coq, for example. Let's say it can't, and that no existing machine-checkable language can be used for this purpose. In such a case, you would have to develop a new proof language, and make sure the proof checker itself is correct.

I guess that at some point we are hitting the theoretical limits imposed by the halting problem and Gödel's incompleteness theorems?

It's not my field so I'm just asking questions here. Don't hesitate to lecture me :-)


No, you're wrong. This is not a theoretical issue, but a practical one.

Almost all mathematics (and this certainly includes Wiles' proof) could be written in Coq, in theory. It is extremely hard to do in practice.

Mathematicians write proofs for their peers, who have smart brains. Most of the trivial and less-than-trivial details are omitted. Coq cannot accept this (because it is very stupid and can't figure out the missing steps). It turns out that it is particularly hard and boring to fill in the holes in a "human" proof.
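
As a trivial made-up illustration: a paper proof would just say "clearly n + 0 = n", but in Coq even that has to be established explicitly (or looked up in the library), here by induction on n:

    Theorem add_zero_r : forall n : nat, n + 0 = n.
    Proof.
      induction n as [| n IH].
      - reflexivity.                       (* base case: 0 + 0 = 0 *)
      - simpl. rewrite IH. reflexivity.    (* step: S n + 0 = S (n + 0) = S n *)
    Qed.

(The standard library already has this one, of course, but every such "obvious" step in a large human proof has to be hunted down or re-proved, and that is where much of the tedium comes from.)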

Gödel comes in when you try to prove Coq's correctness within Coq (but this was partially done, in a sense).


I want to make sure I understand correctly. :)

So I am right that rewriting Wiles' proof in a machine-checkable language is a problem in itself? But I'm wrong when I suggest it is a theoretical issue, when it's actually a practical issue?

I've read that "the theory behind Coq is generally admitted to be consistent with regard to Zermelo-Fraenkel set theory + inaccessible cardinals". [1] I also read that some statements are undecidable in Zermelo–Fraenkel set theory [2]. It follows from this that there must exist some statements that are undecidable in Coq, correct? But I understand that this is a theoretical issue, not a practical one, because "Zermelo–Fraenkel set theory, combined with first-order logic, gives a satisfactory and generally accepted formalism for essentially all current mathematics". Is that the reason why my comment was wrong?

On this topic, I've read an article titled "Computer verification of Wiles' proof of Fermat's Last Theorem" which explains that Wiles' proof could probably be verified with a tool like Coq, but that it would be a massive challenge:

> The mathematics used in Wiles' proof of Fermat's Last Theorem is very complicated. [...] I do not know the provers Coq and Mizar good enough, but I think they are adequate to express the mathematics.

> On the other hand, I do think that the challenge is doable, and that it will be done in the coming fifty years. It is definitely not within reach. The project will have to cover large parts of current pure mathematics. Its scope is comparable with the Bourbaki project, where between 1950 and 1970 almost all of pure mathematics got a new foundation by the efforts of an originally French group of mathematicians who published some 20 books under the pseudonym N. Bourbaki.

[1] https://coq.inria.fr/faq [2] http://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_th... [3] http://en.wikipedia.org/wiki/Hilbert%27s_program [4] http://www.cs.rug.nl/~wim/fermat/wilesEnglish.html


It's my fault, I think I read your answer too quickly and missed a negation somewhere.

So, yes, a formal rewriting of Wiles' proof is "only" a practical challenge. But a big one. Only about 15 years ago were we first able to prove the fundamental theorem of algebra in Coq (a 200-year-old theorem that is routinely taught to undergraduates).


That's okay @thoran! Thanks a lot for helping me gain a better understanding of this topic.


> It follows from this that there must exist some statements that are undecidable in Coq, correct?

Yes, but only those that are undecidable in ordinary mathematics. If you're not using Coq, you have exactly the same problem. (And this concern doesn't apply if we already have the proof and just want to check it).


> Yes, but only those that are undecidable in ordinary mathematics. If you're not using Coq, you have exactly the same problem.

Yes, this is what I understood. Thanks for confirming this point.

> (And this concern doesn't apply if we already have the proof and just want to check it).

Why would this concern not apply anymore if we already have the proof and just want to check it? The proof still needs to be expressed, directly or indirectly, in terms of Zermelo–Fraenkel set theory combined with first-order logic. If the proof uses a different and "incompatible" theory, how could we check it with Coq?

(I understand this is a very rare occurrence in practice and this is the reason why ZFC is so widely used.)


> The proof still needs to be expressed, directly or indirectly, in terms of Zermelo–Fraenkel set theory combined with first-order logic. If the proof uses a different and "incompatible" theory, how could we check it with Coq?

If Wiles' proof wasn't in ZFC we wouldn't call it a proof (at least not without qualification). You don't get to just write mathematical proofs in any language you like - otherwise we could pick a language in which FLT is an axiom, and then the proof of it is trivial.


I don't get it. There are a lot of variants of set theory. ZFC is just one of them, albeit the most common. But some proofs are founded on an extension of ZFC. For example, the Mizar system, which is another proof assistant, has adopted Tarski–Grothendieck set theory, an extension of ZFC.

According to "The QED Manifesto Revisited" [1], it can be difficult to use Coq as-is for some kind of proofs:

> Here are four mathematical statements that most mathematicians will consider to be totally non-problematic: [...] We claim that currently none of the QED-like systems can express all four statements in a good way. Of course one can easily extend the systems with axioms that allow one to write down these statements. However, that really amounts to ‘changing the system’. It would mean that both the library and the automation of the system will not be very useful anymore. Classical & extensional reasoning in Coq or abstract algebra in the HOLs by postulating the necessary types and axioms will not be pleasant without re-engineering the system.

[1] http://mizar.org/trybulec65/8.pdf


> I don't get it. There are a lot of variants of set theory. ZFC is just one of them, albeit the most common. But some proofs are founded on an extension of ZFC. For example, the Mizar system, which is another proof assistant, has adopted Tarski–Grothendieck set theory, an extension of ZFC.

Some mathematicians accept extensions of ZFC. But I would definitely expect a proof that relied on such an extension to have in big red letters "we use these additional axioms", and the mathematical community would hope for and expect a proof without it. (Indeed proofs that make use of the axiom of choice still often flag this up and we look for proofs without it where possible). I am quite surprised that Mizar would do this, and certainly hope that it provides a way to distinguish between proofs that are valid in ZFC and those that aren't.

> According to "The QED Manifesto Revisited" [1], it can be difficult to use Coq as-is for some kind of proofs:

The statement about aleph-0 and aleph-2 (the one that Coq has problems with) isn't really mathematics so much as metamathematics/proof theory, at least if we're viewing it as a statement about the consequences of axioms. Which sure is a valuable field and one we'd like to be able to reason about, but it's circumscribed and doesn't affect normal working mathematics.

If we're talking about accepting that axiom and actually using it in a proof, then it's certainly not "totally non-problematic". The paper admits that most mathematicians wouldn't even know the statement of the proper forcing axiom. They certainly wouldn't write proofs that depended on it.


Thanks a lot @lmm. Now I get it :)

About Mizar, I don't know if you can use it without the additional axioms of Tarski-Grothendieck set theory.


Surely the implementation of the mathematics may be subject to errors, just like any implementation of anything? There is nothing that is so perfect that humans cannot err while applying it.


It is true that there might be errors in anything.

However, this is mitigated by having only a very small core that checks the proofs. Basically, if a wrong proof goes through, there must be an error in the core (which was checked by many people and is very small).


You have to translate between the English spec and the computer-readable proof, and surely it's relatively easy to mess that up in some way and end up proving the wrong thing.


The C spec and x86 assembly are two very well known specs.


If they were so well known that everyone could understand and implement them perfectly, we wouldn't need formal verification in the first place.

I don't doubt that formal verification significantly reduces the chances of bugs, but it's not a panacea (unless we have computers that read our minds). You still need to verify that the proof is proving the right thing, although you can be sure that it's internally consistent.


> If they were so well known that everyone could understand and implement them perfectly, we wouldn't need formal verification in the first place.

We're verifying the compiler from one language to the other. The specifications of the two languages are not that hard to implement. The difficulty is covering all cases in compiling from one to the other. The formal proof does tell us that we haven't missed any.
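
Schematically, the theorem has roughly the following shape (an invented sketch, not CompCert's actual statement; every name below is a placeholder):

    Parameter source_program target_program behaviour : Type.
    Parameter compile        : source_program -> option target_program.
    Parameter source_behaves : source_program -> behaviour -> Prop.
    Parameter target_behaves : target_program -> behaviour -> Prop.

    (* Semantic preservation: whenever compilation succeeds, any behaviour
       of the generated program is one the source semantics allows. *)
    Definition semantic_preservation : Prop :=
      forall (p : source_program) (tp : target_program) (b : behaviour),
        compile p = Some tp ->
        target_behaves tp b ->
        source_behaves p b.

CompCert's real theorem is refined in several ways (it is restricted to source programs without undefined behaviour, and behaviours are traces of observable events), but this is the overall shape, and the proof has to treat every case of compile, which is exactly where hand-written compilers tend to slip.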


I agree that the chances are pretty small, but is that "good enough" when it comes to formally verifying things?

(I know next to nothing about formal verification)


Well, that's also true.

But apparently there is no better thing to do.

There is always a disconnect between what you mean and what you say. It happens all the time and the risk is not going to go away. People who work with proof assistants understand this risk better and most of them are pretty happy that the risk is sufficiently small for their purposes.


Note that it can't be used commercially without a proper license. Also, no 64-bit support.


Two thoughts here:

1. If you use it commercially, you can pay for it.

2. In most cases where it's needed, 64 bit doesn't matter.

(Though I might be wrong on the second one.)


1. Nobody argued that.

2. What do you mean it doesn't matter? You think embedded and similar systems don't need/use 64-bit space? True for most I guess.


But if I use it neither commercially nor for research, it's not clear whether I can pay for it.


It can't be used for anything other than research for free. So you can't write useful systems and give them away without a license either. (You might of course be able to get a license for free, but it rules out using it as a foundation for a Free embedded kit, like writing something like the Arduino platform. Which is a pity, but quite understandable).


"CompCert typically runs twice as fast as the code generated by GCC without optimizations." Lol, That's like saying you run faster than Usain Bolt when he's walking.


You are right, but this sentence makes sense in the context for which CompCert is intended.

The requirements of the DO-178 standard for aeronautics, for instance, are such that the choice for compiling safety-critical C code that runs within the aircraft is basically between either:

- the equivalent of “gcc -O0”, that is, an off-the-shelf compiler with no optimization so that source-level constructs can be mapped to assembly constructs and vice versa, or

- CompCert with optimizations, because the soundness of the optimizations can be justified to a level of formality such that it matters less to be able to trace assembly to source.

A source without much detail, but the best I can find within 5 minutes of googling: http://projects.laas.fr/IFSE/FMF/J3/slides/P05_Jean_Souyiris...


Since GCC optimizations can miscompile, the target audience ("safety-critical and mission-critical software written in C and meeting high levels of assurance", aka Airbus) is currently using GCC without optimizations. So it is a fair comparison for that case, since they can now enable proven-correct optimizations.


Interesting that they are using it without optimisations. The compiler is used far more with optimisations, so those code paths are actually far better tested. I have heard of cases where bugs only appear in unoptimised code.


That may be true as long as you use the same compiler versions as the developer used when writing/testing their code. But then someone decides to deploy that code on an (old and) stable version of Linux with an old(er) GCC version and things break.

For example, ClamAV used to break quite often when built with older compiler versions; in the end we had to add runtime tests to configure, based on testcases from the GCC bugzilla, to make sure people were not using a "stable" but broken compiler. GCC 4.1.0 and 4.1.1 were particularly bad versions.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27603 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26763 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28045 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37573 https://bugzilla.clamav.net/show_bug.cgi?id=670

You might think that you could still use -O2 and selectively turn off optimizations that are known to cause trouble (i.e. -fno-strict-aliasing), but there are cases where turning off that optimization exposes bugs in other optimization passes and a compiler that would be otherwise good at -O2 would break at "-O2 -fno-strict-aliasing": https://bugs.gentoo.org/show_bug.cgi?id=275928#c29 https://bugzilla.clamav.net/show_bug.cgi?id=1581


I don't think this is really true. Generally all optimization passes are run upon the unoptimized version, hence ALL code paths first pass through the -O0 machinery.

Besides, they seem to manually inspect the assembly output themselves for the final production build.


But the code path that gets tested is "first pass + optimisation gives correct code". That doesn't guarantee the first pass generates correct code on its own. I believe there were bugs along those lines at one stage, though I admit they are rare. I've not had to track down a compiler bug in quite a few years.


It is not because those code paths are better tested; it is because people's code is not actually quite correct. E.g. something declared const is not const unless the compiler evaluates it at compile time.


There are certain applications where people fear breakage due to compiler optimizations. (Compilers are prone to break, for your information. They are really complex applications, after all.) CompCert has a fully verified optimization engine comparable to (I guess) GCC's -O2, so that really is an improvement.


It doesn't get close to “gcc -O2” (the page says within 20% of “gcc -O3”). It is only the most useful optimizations that are implemented in CompCert, at perhaps an implementation cost of one PhD thesis per optimization pass. But that is still better than no optimizations in contexts where none of the optimizations from an off-the-shelf's compiler could be trusted, as discussed in this thread.


There is on-going research on more modular frameworks that allow for faster implementation of optimizations. One of them is [1]. I worked on that project and implemented (and proved correct) several optimizations within just weeks. This is not industry-ready stuff, though.

[1] http://www.mpi-sws.org/~neis/pils/


> an implementation cost of one PhD thesis per optimization pass.

This strikes me as a little much. This kind of proof requires insight, yes, but it's hardly the level of a thesis; it hardly requires innovation to demonstrate, just exhaustiveness.


Yeah, -O2 was my wild guess, and in reality the figure may vary a lot. I think that it is quite great anyway.


http://www.irisa.fr/celtique/ext/compcertSSA/ may bring in a whole bunch of optimisations nearly for free. We'll see...


Does anyone know if CompCert can be used to formally verify a compiler for a language other than C, such as assembly language or Python?


And who verifies the verifier? (That is generally the problem I have with the concept of unit tests, too.)


Ideally, a simpler and therefore more trustworthy verifier. So far, the most fleshed out implementation of this idea is Milawa, which developed a chain of 11 increasingly sophisticated verifiers, each one checked by the previous.

http://www.cs.utexas.edu/users/jared/milawa/Web/



