
Proving false in Coq using an implementation bug - clarus
https://github.com/clarus/falso
======
jerf
Well, I saw the title and was all prepared to link to Falso, but alas, I am
beaten to the punch. However, I am even more impressed by the fact that, since
the last time I looked at the Falso page, they now have a list of available
implementations, which has already been updated to include Coq!

Meanwhile, my three year old is now looking over my shoulder asking "What's so
funny, dad? What's so funny?" and I have to admit... I have no idea how to
answer that.

~~~
darklajid
I'm 35. Does that help? Can you try to explain it to me then? :)

(I looked at the Readme, looked at the All.v file and .. I don't get it.)

~~~
jerf
Coq is a proof assistant, where "proof" is in the mathematical sense of the
term. Thanks to the Curry-Howard isomorphism [1], it happens that there's a
mapping between proofs and programs, meaning that the "proof assistant" can do
things like "prove a sorting algorithm correctly sorts all inputs" in a
relatively straightforward manner. I mean, still fairly complicated and not
something that you'll encounter in an undergrad computer science education in
general, but feasible. For instance, a similar tool was recently used to prove
that the default sorting algorithm in Python and Java (TimSort) had a bug [2],
which was a bit surprising given how well-exercised it was.

Something that a computer science program _should_ cover is that a logical
implication is always true when the antecedent is false. X -> Y is always true
when X is false. Thus, if you can convince your system to prove a false term,
you can set that as the X and write anything you want for the Y, and prove
anything. (That statement is actually glossing over a bit of discussion on how
you can legally introduce statements into your system, and there's variance
between the logics on that point anyhow, but that's what it comes out to in
most if not all systems.) Therefore, if you can prove a false statement, you
can literally prove anything.
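In Coq itself, this "from falsehood, anything follows" principle (ex falso quodlibet) is a single tactic step; a minimal sketch:

```coq
(* False is defined with no constructors, so case analysis on a
   supposed proof of it closes any goal whatsoever. *)
Theorem anything_from_false : False -> 1 = 2.
Proof.
  intros absurd.
  destruct absurd.
Qed.
```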

This turns out to be an unhelpful characteristic for a proof system to
exhibit. In the case of Coq, it means you can now convince it to take any
program as satisfying any property. _Any_ program, satisfying _any_ property.
You want to prove that "return []" is a proper sorter for any input? This can
do that. Heck, you want to prove that "return []" is a fully valid HTML5
rendering engine? We can do that too!
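To make "_any_ program, satisfying _any_ property" concrete, here is a hedged sketch (the definitions and names are mine, not from Falso): given a proof of False, even a "sorter" that throws its input away satisfies a sortedness spec.

```coq
Require Import List Sorted.
Import ListNotations.

(* A "sorter" that discards its input entirely. *)
Definition bogus_sort (l : list nat) : list nat := [].

(* With a proof of False in hand, any specification follows. *)
Theorem bogus_sort_sorts : False -> forall l, Sorted le (bogus_sort l).
Proof.
  intros absurd l.
  destruct absurd.
Qed.
```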

In this particular case, it exploits a bug in Coq to do that, but the fear
mathematicians have developed over the course of the 20th century is that
while the bug may be obvious, deeper, more subtle problems in Coq could
conceivably be used to prove "False" over the course of a proof, accidentally
and buried in potentially hundreds or thousands of statements, subtly
rendering all such proofs flawed. Such are the things that keep mathematicians
up at night.

As long as I'm explaining things, let's also explain the joke in Falso. There
are, as the page says, a number of different axiom systems. Some of them are
simply wildly different; for instance, Euclid's axioms create Euclidean
geometry, whereas axioms about vertices and edges create graph theory. Some
are more related... the 20th century resulted in several "set theories", which
are ultimately characterized by different axiom sets that lead to different
places. Axioms are neither right nor wrong, they are the _definition_ of a
branch of math, so you can choose them as you like, then see where they take
you. Per the previous paragraphs, you prefer not to have an axiom system that
allows you to prove false, because then your system doesn't actually say
anything: it just says "Yes!" no matter what you ask. In some sense this isn't
"invalid" or "wrong" either, it's just useless. (And in some sense, there's
only one axiom set that just says "return true"; there are just more and less
complicated ways of spelling it, so there's not much value in trying to
"study" one because it's identical to "return true" anyhow.)

The joke in Falso is that it simply takes as an axiom "For all P, the proof of
P is the null statement", which, for the purposes of this post, is essentially
"Assume false is true." From this, Falso can therefore prove... anything! In
one statement! At this point the "comparison table" ought to start making
sense. But the page is full of other little mathematician jokes... for
instance:

    
    
        Let us prove that P = NP by contradiction. Assume that P ≠ NP.
        Then, by the axiom of the Falso system, we have a contradiction.
        Therefore, P = NP.
    

The joke here being that under the Falso system, "proof by contradiction"
isn't actually necessary; you want "P = NP"? Just set it as the "P" in the
axiom and read it off in one step, "The empty statement proves P = NP". Proof
by contradiction is unnecessary frippery providing the thinnest, _thinnest_ of
covers over the triviality of the proof, which is itself the joke. As opposed
to a huge proof of dozens of steps, which could be funny in its own way, it is
also mathematician-funny to provide the _smallest possible_ obfuscation. (The
remainder of the joke is a slightly disguised allusion to the fact that "P ≠
NP" is equally trivially provable under Falso. Estatis Inc. need not wait
anxiously by their mailbox for their million dollars from the Clay Institute.)
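A hedged sketch of what such an axiom looks like as actual Coq text (the axiom name is invented here, not taken from Falso's real source):

```coq
(* The entire "logic": every proposition holds. *)
Axiom anything_goes : forall P : Prop, P.

(* Every "theorem" is now a one-step proof, P = NP included. *)
Theorem zero_is_one : 0 = 1.
Proof. apply anything_goes. Qed.
```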

Mathematicians have a strange sense of humor. Strange, but wonderful.

As I said, the last time I encountered this page [3], which HN says was 2
years ago, it didn't have any implementations, it was just an extended joke.
The fact that it is now actually "available" is also sort of funny.

Oh, and finally, I can't read or use Coq either. Plus, even if I could, this
is ultimately a bug exploit, not a "real proof", which would make it even
harder to understand how what is being done relates to the final result; it's
like seeing some random-looking assembler that turns out to produce arbitrary
code execution due to some quirky combination of bugs: there isn't any
apparent relationship between what the assembler looks like it's doing and
the result it actually produces. Haskell's close enough for me for now. I'm
also willing to wait and see if homotopy type theory follows through on its
promises to make proofs simpler... at the moment, as I am ultimately a
software engineer and not a mathematician, I can't help but see current proof
technology as too much work for too little gain. I wish all those working to
change that assessment all the best, though.

[1]:
[http://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspond...](http://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence)

[2]: [http://envisage-project.eu/proving-android-java-and-python-s...](http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/)

[3]:
[https://news.ycombinator.com/item?id=5677646](https://news.ycombinator.com/item?id=5677646)

~~~
tel
The way the exploit works isn't actually that bad. It just has to do with
exploiting Coq's assumption that its bytecode VM evaluator is equivalent to
normal CBV reduction. Turns out that's false---though that was certainly
always a potential weakness in that assumption.
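For context, `vm_compute` is the Coq tactic that delegates reduction to the kernel's bytecode VM; in ordinary use it agrees with the standard reduction, and the exploit lived exactly in the cases where it doesn't. A harmless example:

```coq
(* vm_compute reduces the goal with the bytecode VM instead of the
   standard reduction machinery, then we close the trivial equality. *)
Goal 2 + 2 = 4.
Proof.
  vm_compute. (* goal becomes 4 = 4 *)
  reflexivity.
Qed.
```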

~~~
jerf
Yes, in case I did not make it clear: what was actually demonstrated here was
not the sort of scary-bad mathematical weakness I was talking about, the kind
that can only be extracted from a long proof. It's basically just a bug in
Coq, and in the long term it has little consequence beyond providing a bit of
fun.

------
anderskaseorg
This incorrectly passes the compiler, which is certainly a bug, but it does
not pass the validator.

    
    
      $ ./configure.sh
      $ make
      coqdep -c -slash -R . Falso "All.v" > "All.v.d" || ( RV=$?; rm -f "All.v.d"; exit ${RV} )
      coqc  -q  -R . Falso   All
      $ make validate
      coqchk -silent -o -R . Falso All
      Type error
      Makefile:133: recipe for target 'validate' failed
      make: *** [validate] Error 1

~~~
panic
What is the validator doing that the compiler isn't?

~~~
tel
My understanding is that it's just smaller. By focusing entirely on validation
it opens up less room for bugs.

------
solomatov
There's a problem with dependently typed languages: none of them has a small
enough kernel to be verified formally.

As far as I know, the only system which achieves this goal is HOL.

~~~
jbapple
The Calculus of Constructions is dependently typed, and it has a reasonably
sized kernel, compared to HOL and ZFC:

Freek Wiedijk, "Is ZF a hack?", [http://www.cs.ru.nl/~freek/zfc-etc/zfc-etc.pdf](http://www.cs.ru.nl/~freek/zfc-etc/zfc-etc.pdf)

OTOH, Coq is based on the Calculus of Inductive Constructions, which (a) is
more complex than the Calculus of Constructions, and (b) has an
implementation in Coq that changes rapidly and is not formalized in any
published paper, as I understand it.

Finally, do you know if this was a bug in the kernel, or if some other
component (the compiler or caml virtual machine) can be blamed? It apparently
was a problem with `vm_compute`, which I don't think I've ever used.

~~~
solomatov
>The Calculus of Constructions is dependently typed, and it has a reasonably
sized kernel, compared to HOL and ZFC:

However, the Calculus of Constructions doesn't have inductive constructions,
without which it's very hard to formalize interesting mathematics.

>Finally, do you know if this was a bug in the kernel, or if some other
component (the compiler or caml virtual machine) can be blamed? It apparently
was a problem with `vm_compute`, which I don't think I've ever used.

Actually, I am aware of another, much more serious problem with the CoIC
formalism, which was found when people tried to use Coq to formalize homotopy
type theory. It was a bug in CoIC, not a software bug.

~~~
CHY872
> Actually, I am aware of another, much more serious problem with the CoIC
> formalism, which was found when people tried to use Coq to formalize
> homotopy type theory. It was a bug in CoIC, not a software bug.

Do you have any references? Intrigued :)

~~~
vilhelm_s
The notion of guardedness (for structural recursion) was a bit too permissive.
This led to inconsistency if you assume e.g. (False -> False = True) (which is
implied by propositional extensionality, and by the univalence axiom). The
same issue also affected Agda. It was discussed a lot on the Coq and Agda
mailing lists, e.g. [https://sympa.inria.fr/sympa/arc/coq-club/2013-12/msg00119.h...](https://sympa.inria.fr/sympa/arc/coq-club/2013-12/msg00119.html) [http://agda.chalmers.narkive.com/E2UeRTOx/re-coq-club-propos...](http://agda.chalmers.narkive.com/E2UeRTOx/re-coq-club-propositional-extensionality-the-return-of-the-revenge)
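For reference, the assumed equation can be written out in Coq like this (propositional extensionality stated as an axiom; the names are mine):

```coq
(* Propositional extensionality: equivalent propositions are equal. *)
Axiom prop_ext : forall P Q : Prop, (P <-> Q) -> P = Q.

(* It implies the (False -> False) = True assumption mentioned above. *)
Lemma false_imp_false_eq_true : (False -> False) = True.
Proof.
  apply prop_ext. split.
  - intros _. exact I.
  - intros _ contra. destruct contra.
Qed.
```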

------
yodsanklai
I've always thought that Coq type checker was small enough to be formally
verified.

~~~
clarus
What must be trusted in Coq is the kernel, which contains 30,000 lines of code
for the latest development version.

It cannot be completely verified by Coq itself due to Gödel's theorem (well,
with this bug it is now possible). Still, some parts of it can be verified,
like the byte-code interpreter. There is actually an ongoing project to make a
certified JIT compiler for Coq:
[http://www.maximedenes.fr/download/coqonut.pdf](http://www.maximedenes.fr/download/coqonut.pdf)

~~~
yodsanklai
> It cannot be completely verified by Coq itself due to Gödel's theorem

What can't be proved in Coq is that the proof system is consistent.

However, we can in theory verify that the implementation of the proof system
satisfies its specification (edit: maybe that's what you were saying!).

~~~
amelius
Doesn't Gödel's theorem say that such a proof cannot be found for _all_
conceivable systems, but that for any _particular_ system, a proof _could_ in
principle be found?

~~~
ttctciyf
I think it's something like this: you can't prove the consistency of a formal
system using that same formal system, unless it is in fact inconsistent, but
you can model a formal system from inside a larger formal system and then
prove things like consistency about the smaller system from inside the larger
one. But then the consistency of the larger system is unproven and you're
stuck with an endless regress into ever larger FS's as you pursue the chimera.

~~~
ObviousScience
Only for sufficiently powerful systems, I believe.

Things like first-order logic are too simple to express the kinds of
statements Gödel used to prove the incompleteness theorem about arithmetic.

------
synthecypher
Anyone care to ELI5?

~~~
Dewie
Logic systems can have bugs in them that make them inconsistent, which means
that you can prove _false_ (or _bottom_ or whatever), which means that you can
prove _anything_ in this logic. And that, in turn, makes the logic system
useless, since any given formula is provable in it. And being able to prove
_everything_ is not fun.

Then you have to patch your logic and make sure that none of your proofs
relied on that inconsistency. Or something like that.

~~~
seanmcdirmid
You can re-run your proofs once the bug is patched, so it's not like this
makes all those proofs useless. Actually, that is the nice thing about Coq.

~~~
jsprogrammer
How do you know there isn't another, yet undiscovered, bug?

~~~
seanmcdirmid
You don't, but so what? Even proofs done well by hand can have obscure
mistakes; most of the time they don't invalidate the proof. Take a proof as
strong evidence of correctness and it all works out from an epistemological
point of view.

------
agumonkey
How much does that invalidate Coq results? Like the four color theorem.

~~~
j2kun
AFAIK the four color theorem was not first proved with Coq, and people
consider it correct regardless of whether Coq has bugs.

~~~
cheatsheet
Yes, the four color theorem was proven by Appel and Haken on an IBM 370-168 at
the University of Illinois in June 1976. Coq was released in May 1989.

~~~
repsilat
I'd be surprised if Appel and Haken did a _formal_ proof on their machine -- I
thought they made a conventional proof with a machine-assisted enumeration of
special cases.

~~~
cheatsheet
My apologies for the lack of clarity: I was following the thought path that
was concerned with the validity of the four color theorem proof because of the
Coq bug.

------
minopret
Too bad this comes too late to earn a bounty in bitcoin at Proof Market.
[https://proofmarket.org/problem/view/12](https://proofmarket.org/problem/view/12)

------
ajuc
Obligatory reference:

> Beware of bugs in the above code; I have only proved it correct, not tried
> it.

[Donald Knuth]

~~~
solomatov
Fortunately, this technology allows us to prove code correct and be sure about
it.

