

CompCert - Compilers you can formally trust - evangineer
http://compcert.inria.fr/index.html

======
davidtgoldblatt
Also worth reading: the paper that found bugs in the proven-correct compiler:
<http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf>

(okay, so only a portion of the compiler is proven correct, and they didn't
find any bugs in that portion).

Note also that a compiler with a proof isn't necessarily correct - it just
means that you've pushed all your bugs into your description of the rules of C
and of the target architecture. Since neither C nor any common CPU
architecture has a formal semantics, someone has to read the (English)
description and convert it to a description in their theorem proving software,
and there's no reason to believe this will be easy.

~~~
skew
One thing I missed the first time reading the paper is that those were the
only bugs they could find in the compiler. After reporting them and getting
fixes, several more CPU-years of random programs haven't turned up another
failure.

All the other compilers pick up tens of failing cases a night, which they skim
off and report as their previously reported tests are fixed (the LLVM and GCC
teams are quite responsive, others less so).

This kind of differential testing also provides some evidence that the
formalization isn't too far off - each "passing" case means that almost all
the compilers produced exactly the same results for the program.

Both projects are awesome, especially if you like reliable compilers.

------
schrototo
This old Knuth quote seems apt:

 _"Beware of bugs in the above code; I have only proved it correct, not tried
it."_

~~~
cwzwarich
That seems like either a statement about the nature of informal proof or
about the disconnect between the proof and the code. How does that apply to
something written in Coq, where the program is extracted from a mechanically
verified proof?

~~~
shasta
Mechanical proof of what, though? The spec can fail to capture intent.

~~~
njs12345
In this case, the semantics of the C subset chosen are captured perfectly by
the assembly produced by the compiler.

Seems like that's the intent of a compiler to me :)

~~~
shasta
Right, but first you have to specify "the semantics of C" and "the semantics
of assembly". How do you prove you did that right?

~~~
cwzwarich
Compcert uses small-step operational semantics for both the source language
and the assembly output (and intermediate languages that are irrelevant to the
final statement of the semantic preservation theorem).

Specifying such a semantics is basically like writing a very simple functional
interpreter for the language. I don't see how it could get much simpler. Here
is the description of the semantics for their input language:

<http://compcert.inria.fr/doc/html/Csem.html>

and their PPC output:

<http://compcert.inria.fr/doc/html/Asm.html>

~~~
shasta
Yes. The approach is probably as good as it can get. And yet, it's not
perfect. That functional interpreter could have a bug. And thus Knuth's quote
is applicable. Testing the compiler could reveal to you that your
specification wasn't really what you intended.

------
click170
I'm confused. The title made me think "list of compilers I can trust", but the
website provides no such thing. It looks to me like they've written their own
compiler.

~~~
p4bl0
Yes. CompCert is a project of the French research institute in computer
science and automation (INRIA). The goal is to have a certified compiler for
the C programming language. I believe they're writing a C compiler in Coq (a
proof assistant also developed mainly at INRIA).

The main guy behind CompCert is Xavier Leroy (already known for OCaml).

~~~
MostAwesomeDude
Yeah, the big thing here is that almost all of this thing is _proven_,
meaning that the compiler's behavior is well-defined and correct, all the
time, on all inputs.

I still had a jaw-drop moment the first time I heard about this; Coq is not a
pleasant language, and proving this thing must have been a nightmare.

~~~
justincormack
What does it do about the undefined behaviour parts of the spec?

~~~
skew
You can see the statement here:
[http://compcert.inria.fr/doc/html/Compiler.html#transf_c_pro...](http://compcert.inria.fr/doc/html/Compiler.html#transf_c_program_correct)

A C compiler doesn't have to promise much of anything about the behavior of
the code produced for a program with undefined behavior.

This statement could be vacuous for a program p with undefined behavior: if
(Cstrategy.exec_program p beh) isn't true for any behavior beh, or is only
true for a behavior which makes (not_wrong beh) false, then the hypotheses are
never satisfied and nothing is promised about the compiled code.

If the hypotheses aren't always unsatisfiable for programs with undefined
behavior, then CompCert actually gives you a guarantee for some bad programs
as well. Nothing in the spec says running a bad program _has_ to make demons
fly out your nose.
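A hypothetical miniature of a theorem with that shape may make the vacuity point concrete. The predicate names echo the linked Coq statement (exec_program, not_wrong), but the model itself is invented:

```python
# Hypothetical model of how a correctness statement can be vacuous for a
# program with undefined behavior. The names mirror the Coq statement
# (exec_program, not_wrong), but this tiny model is illustrative only.

def exec_program(program, beh):
    # Does behavior `beh` describe a possible run of `program` under the
    # source semantics?
    return beh in program["behaviors"]

def not_wrong(beh):
    return beh != "goes_wrong"

def hypothesis_satisfiable(program, all_behaviors):
    # The theorem has the shape: for all beh,
    #   exec_program p beh /\ not_wrong beh  ->  <compiled code exhibits beh>.
    # If no beh satisfies the left-hand side, the implication is vacuously
    # true and nothing is promised about the compiled code.
    return any(exec_program(program, b) and not_wrong(b)
               for b in all_behaviors)

behaviors = ["terminates", "diverges", "goes_wrong"]
good = {"behaviors": {"terminates"}}   # a well-defined program
bad = {"behaviors": {"goes_wrong"}}    # undefined behavior only

print(hypothesis_satisfiable(good, behaviors))  # -> True: a real guarantee
print(hypothesis_satisfiable(bad, behaviors))   # -> False: vacuously true
```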

------
beza1e1
To be realistic: The frontend is not verified. Only very few optimizations are
done. It only seems to have a PPC backend.

However, I still think verified compilers are the future. It will just take
two or three decades until we see them at LLVM quality.

~~~
cwzwarich
It now has x86 and ARM backends.

------
protomyth
I seem to remember a CPU that was stack-based that was built for correctness.
I think it was called the Viper, but I cannot seem to find it. It would be
logical to me to run your formally trusted compiler on a chip designed the
same way.

~~~
daveasdf
You might be thinking of Verisoft and their "VAMP" hardware stack:
<http://www.verisoft.de/SubProject2.html>

The original goal of the project was to have a verified computer stack, with
proofs going from the software, compiler and processor all the way down to the
gate level.

I am not sure how far they got, but I don't think the project is still active,
which is a bit sad.

------
kemiller
Would someone explain "formally trust" in this sense?

~~~
dons
The trust you place is based on mathematical evidence for the soundness of the
compiler. The compiler is thus trustworthy, based on formal arguments, as
opposed to informal arguments (such as "we did some testing" or "no bugs
were reported").

Going back to Kant, formal arguments in this sense are _a priori_ knowledge --
knowledge that is in some sense objective, and independent from any observer
making empirical measurements (as opposed to _a posteriori_ , or experimental,
knowledge).

~~~
kemiller
Thanks. My school's CS department was very strong in theory, among other
things, and so there were a lot of very mathy people in the program mixed in
with the more practical proles. Most of them were very nice, but there was
something of a communication problem because (I assume) they were too
naturally talented at such things to be able to explain them very well. I
remember one guy who would get this look of satisfaction on his face every
time a teacher would start talking about formalisms, as if he were eating his
favorite food, or sitting down after a long day on his feet, and that's my
first association with the word "formal" in a CS context.

~~~
Jebdm
A little bit of explanation about formalisms and the communication problems
you mention from a computer scientist/linguist who gets that same look of
satisfaction:

In natural languages, the relationships between utterances and what they
convey are somewhat fuzzy. Words like "tree" and "run" don't have single
precise meanings; they have a variety of senses associated with them, and
these senses vary in how closely they are related to each other. Biological
trees made of cells are quite closely related to computer science trees by
their shared shape; the relation between running with your legs and running
for office is a bit fuzzier, via the race metaphor; the relation between a
river bank and a financial bank is even more unclear.

Meaning in natural language is built in aggregate from associations, with the
senses of expressions converging over time in groups of communicating
speakers. Natural languages do not clearly define the relationships between
different expressions, nor between expressions and the real world. Because
natural languages are unspecified and evolving, you can only measure how often
(in your observation) different expressions and relationships between
expressions reflect different situations.

Formalisms, like natural languages, are also used to represent meaning via
expressions made of symbols and relationships between them. (Technically,
formalisms don't have to represent anything, but in practice they usually do
because otherwise they're not very useful.) Unlike natural languages, formal
languages have normatively defined specifications. These specifications can be
written in a variety of ways, but usually they define rules (which always
apply) for deriving expressions from each other.

To be able to use formal languages for real-life problems, you need to have
these five skills:

Symbol manipulation: deriving expressions in desirable forms from old ones
using a specification

Interpretation: relating formal expressions to the domain

Encoding: capturing domain knowledge in the formal language

Abstraction: finding consistent patterns in the domain (to be expressed in the
formalism)

Modeling: expressing relationships in the domain using relationships between
symbols

Unfortunately, most people are not very good at these skills. It is hard to
break habits like using intuition and relying on others' ability to resolve
ambiguity. Furthermore, most people are not very good at dealing with
abstractions, especially new ones or higher-level ones. To make things worse,
some of the problems involved in using formal systems are genuinely hard, even
for computers and mathematicians. Obviously, even the most accomplished
mathematicians are not always able to get the results they are looking for,
let alone quickly, but people who are good with formalisms usually don't have
as many problems with things like checking their intuition or dealing with
abstractions.

The problem with communicating formal ideas to people who are not good with
formalisms is rather fundamental, because the very attributes which make
formalisms useful are the ones that those people have trouble understanding.
You can, of course, relate the results of using a formalism, but without using
formalisms it is often very hard to usefully express the "why" of the result.
The problem is compounded because people who are good with formalisms have a
hard time figuring out what is understandable to others, due to the general
"knowing it too well to teach it" effect as well as lack of practice (since
most people who care about formalisms are good with them).

Some talented communicators are able to explain formal results satisfactorily
to others by relating them to familiar abstractions or rephrasing their logic
in everyday terms, but unfortunately this frequently results in
misunderstanding and misapplication of the results. Furthermore, there are a
number of formal results which really can't be properly understood unless you
have mastered the skills above (particularly in theoretical computer science
and formal logic). The problem is akin to trying to talk to non-programmers
about programming, but more general. Programming languages are formal
languages, of a sort, but most programmers only understand a few particular
formalisms. Theoretical computer science and proof theory involve dealing with
a larger and more abstract set of formalisms, including formalisms about
formalisms.

If you need examples of the difficulties, just look at all the ways that
people try to explain Gödel's incompleteness theorems or monads informally.

