
A Mathematical Proof Takes 200 Terabytes to State - ghosh
http://m.cacm.acm.org/news/202462-a-mathematical-proof-takes-200-terabytes-to-state/fulltext
======
aplis
And yes, the point is not to construct theorems with ridiculously long proofs;
the point is to prove statements posed by mathematicians which have attracted
mathematicians' interest. From this perspective, the proofs for the bounds
found in both cases _are the only available proofs_, and that is why their
size is an interesting aspect. Notice that in the EDP case, Tao's remarkable
proof for the general case does not give an explicit bound close to the one
computed using SAT proofs for the particular case C=2.

------
aplis
Chriswarbo: it's not that simple. Neither the "Wikipedia-sized proof" nor the
latest 200TB proof takes its size from just a "variable assignment for a SAT
solver". The size comes from "proof certificates" that the SAT problem has no
solution, i.e. you effectively need to provide _verifiable evidence_ that the
search did not find a solution. In the positive case, when a solution exists,
all the evidence you need is indeed just an assignment for the variables.
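
A resolution refutation is one concrete shape such a certificate can take.
Here is a minimal, hypothetical sketch (the actual 200TB proof uses the more
permissive DRAT format, but the verification idea is the same: each step
derives a new clause from earlier ones, ending at the empty clause):

```python
# Hypothetical sketch of an unsatisfiability certificate as a resolution
# refutation; clauses are frozensets of ints, -n meaning "NOT variable n".

def resolve(c1, c2, var):
    """Resolve clause c1 (containing var) with c2 (containing -var)."""
    assert var in c1 and -var in c2
    return (c1 - {var}) | (c2 - {-var})

def check_refutation(clauses, steps):
    """Each step names two known clauses and a pivot variable; the
    certificate is valid if it derives the empty clause."""
    known = [frozenset(c) for c in clauses]
    for i, j, var in steps:
        known.append(frozenset(resolve(known[i], known[j], var)))
    return frozenset() in known

# (x) AND (-x OR y) AND (-y) has no solution; the certificate shows why:
clauses = [{1}, {-1, 2}, {-2}]
steps = [(0, 1, 1),  # resolve (x) with (-x OR y) on x -> (y)
         (3, 2, 2)]  # resolve (y) with (-y) on y      -> empty clause
print(check_refutation(clauses, steps))  # True
```

Checking each step is mechanical and fast; the enormous size comes from how
many such steps the search has to record.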

------
chriswarbo
Hmm, I've not been particularly impressed by the massive proofs I've seen
recently, e.g. the "Wikipedia-sized proof"[1] that's been doing the rounds. I
don't know about this particular proof, but the blowup in size can often come
from using a severely limited representation.

For example, the "Wikipedia-sized proof" seems to be the variable assignment
for a SAT solver, i.e. the problem was reduced to a boolean formula of the
form "a AND (b OR c) OR d AND ..." which, because it's such an inexpressive
language, requires exponentially more symbols than an equivalent encoding in,
say, first-order or higher-order logic. The proof is then an assignment of the
form "a = true, b = true, c = false, ..." such that the whole formula
evaluates to "true", and this proof is massive precisely because it must
encode so many bits.
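
To make that concrete, here is a toy version of such a reduction and its
"proof" (the formula and variable names are made up for illustration):

```python
# Toy stand-in for a theorem reduced to a boolean formula; the "proof"
# is just an assignment that makes the formula evaluate to True.

def formula(a, b, c, d):
    return (a and (b or c)) or (d and not a)

proof = dict(a=True, b=True, c=False, d=False)
print(formula(**proof))  # True
```

The real encodings involve millions of variables, hence the size.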

The reason this is a little underwhelming is that mathematicians do not tend
to work in terms of boolean satisfiability. Adding just a little
expressiveness to the language can make statements and proofs much shorter,
and mathematicians tend to work at very high levels of abstraction.

Another reason why such proofs are underwhelming is that they don't really
give us any insights which we could re-use to perform other proofs; we'd just
have to start the solver from scratch on the new problem.

All they really tell us is that a proof exists, and hence the statement is
true rather than false. So, in effect, these "huge" proofs only really contain
a single bit of information. This can be made rigorous using algorithmic
information theory: given some encoding of the problem (allowed axioms and
deduction rules, plus the theorem statement) we only need a constant number of
bits to make a proof-searching program which enumerates all allowed rules
applied to all allowed axioms in all allowed ways, and halts when one
satisfies the theorem statement. The only unknown is whether the program will
halt, and that's the bit given by the proof. All of the rest of the proof is
redundant, since it can be generated (i.e. decompressed) by running the
program, since we know that it halts.
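
The enumerating program in that argument can be sketched in a few lines (the
"verifier" below is a toy stand-in for a real proof checker):

```python
from itertools import count, product

def find_proof(verifier, alphabet):
    # Enumerate every finite string over `alphabet`, shortest first,
    # and return the first one the verifier accepts.  The program itself
    # is a constant number of bits; the only unknown is whether it halts.
    for length in count(1):
        for candidate in product(alphabet, repeat=length):
            if verifier(candidate):
                return candidate

# Toy verifier: "a valid proof is any string whose symbols sum to 5".
print(find_proof(lambda s: sum(s) == 5, alphabet=[1, 2, 3]))  # (2, 3)
```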

Anything else is basically a time/space tradeoff, to avoid having to wait for
such an enumerating program to finish. I'm pretty sure such gigantic proofs
are not Pareto-optimal though (i.e. there's probably another proof out there
which is both shorter and faster to "decompress").

Of course, there's also the point that it's trivial to construct theorems
which require ridiculously long proofs (in some particular representation, no
matter how powerful) [2].

[1] https://www.newscientist.com/article/dn25068-wikipedia-size-maths-proof-too-big-for-humans-to-check/

[2] https://en.wikipedia.org/wiki/G%C3%B6del%27s_speed-up_theorem

