Hacker News
Learning to prove theorems via interacting with proof assistants (acolyer.org)
126 points by pgunt 30 days ago | 30 comments

There has been a lot of work like this popping up in the past year. I think it's somewhat promising, but for ML for ITPs (interactive theorem provers) to really become useful, I think the models need to better take into account semantic information about terms and types, not just tactics. And this is not easy, because Coq's logic is higher-order and dependently typed, and most work in ML for PL deals with much simpler logics. Still, anything that helps with the tedium is welcome.

Another problem is that you will very rarely find examples of failing proofs, since users rarely commit them. I'm working on a joint UW-UCSD project on collecting and analyzing development data, and this is one of the goals. But our dataset is not very large, since it is hard to get users to agree to something so invasive, and there are still only 1000 or so active Coq users in total in the world right now.

Another big problem is that it is very, very difficult to define what it means for a proof to be "almost" correct. This makes it difficult to apply many existing techniques fruitfully.

Nonetheless, I think the general spirit is right. The biggest gains in proof automation in ITPs this decade are going to come from automation that makes use in some way of existing proofs, rather than from starting from scratch and "blindly" searching for some term with a given type (though such search can be very effective, even decidable, for certain fragments of the logic or certain domains, if the user knows the right tactic to call).

As an example of using semantic information, you might define a bunch of functions that take two natural numbers and return some result. Then you might write some proofs about those functions.

Let's say you write those proofs by induction over the natural numbers. If your functions take multiple inputs, you can choose which argument to induct over. The most effective choice often depends on the definitions of your functions. This means that to pick the best argument, you already need to unfold your definitions. Chances are you do that unfolding in your head, not using a tactic, or at least not one that you keep in your final proof.

So already, an ML technique for ITPs that does not unfold definitions in goals for more information is missing out on the essential piece that makes a user choose one argument for induction over another, even when syntactically the hypotheses and goals may look identical (I can define "add" to break down the first argument or the second argument, my call).
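To make that concrete, here is a minimal Coq sketch (the names add_fst and add_snd are hypothetical, not from the article): both functions compute addition, but they recurse on different arguments, so the lemma statements look syntactically identical while the proofs diverge.

    (* Two additions, extensionally equal, recursing on different arguments. *)
    Fixpoint add_fst (n m : nat) : nat :=
      match n with
      | O => m
      | S n' => S (add_fst n' m)
      end.

    Fixpoint add_snd (n m : nat) : nat :=
      match m with
      | O => n
      | S m' => S (add_snd n m')
      end.

    (* Same statement shape, very different proofs: *)
    Lemma add_fst_0 : forall n, add_fst n 0 = n.
    Proof. induction n; simpl; congruence. Qed.  (* needs induction on n *)

    Lemma add_snd_0 : forall n, add_snd n 0 = n.
    Proof. intro n; reflexivity. Qed.            (* reduces by computation *)

Only by unfolding the definitions can you see which argument the match is on, and hence whether induction is needed at all.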

In general, I think the ITP community likes to pretend that we are using tactics to interact with a black box, but in practice, most of us have internalized heuristics that involve introspecting on the structures of terms and types. So any model that lacks the information to internalize those heuristics is missing out.

Idea for getting "almost correct" proofs: Take a big Coq project like CompCert or whatever. Search the version control history for commits with small changes to definitions. These very likely come with corresponding changes to proofs as well. Now apply only the code changes but not the proof changes: You get a state with "almost correct" proofs and as a bonus known "ground truth" proofs as well.

This is a really interesting idea. In some sense, you can say a proof script is "almost correct" if it proves a slightly different theorem.
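A toy Coq illustration of that framing (hypothetical lemma names): the same script closes two neighboring theorems, so a script mined for one statement is "almost correct" for the other.

    Require Import Lia.

    (* Before a definition tweak, the user proved: *)
    Lemma double_spec : forall n : nat, n + n = 2 * n.
    Proof. intros; lia. Qed.

    (* After the tweak the statement shifts slightly; the old script,
       replayed verbatim, proves the neighboring theorem: *)
    Lemma double_spec' : forall n : nat, n + n + n = 3 * n.
    Proof. intros; lia. Qed.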

I suspect the one difficulty there would be finding such incremental changes on Github. It was surprisingly difficult to find small changes to specifications in Git history for a different project that I did. Too many large commits and too much history revision, so you often lose the incremental changes. Still, it is worth trying.

Good point about the difficulty of finding incremental changes. Maybe it's better the other way round: Pick an arbitrary commit and only roll back the changes it introduces to one proof. This would model the situation where a programmer has just changed the code and is now setting out to adjust the proof.

Have you thought about an interactive site where people could learn Coq by writing proofs? Something very similar to https://projecteuler.net/. You could gather thousands of failed proofs very fast.

That's a neat idea. It would mostly help for gathering data from beginners, which would be very skewed, but I'm sure it could still be useful, especially for developing tools to help beginners.

Yeah, skewing would definitely be a problem. It might be better if it were more like Kaggle, where people compete with each other.

This is a great idea. As an interested novice I'd certainly be an early adopter.

From the article: "As an extra bonus, the generated proofs tend to be shorter than the ground truth proofs collected in CoqGym."

This feels a little misleading; the paper itself says that the phenomenon "... suggests that theorems with longer proofs are much more challenging for the model." It'd be more interesting to see how the automated proof length compares to the manual length for the same proofs (although I'd expect this to be biased downwards, for the same reason).

Oh neat. I recently spent a good bit of time comparing different formal proof systems. The Mizar project [0] is pretty interesting, since its proofs look a lot like standard proofs in the literature.

However, at the moment I'm a bit enamored with Metamath [1]. It's basically a tiny system that just performs formal substitution. Everything else, including propositional calculus, predicate logic, and ZFC set theory, is encoded within the language rather than the implementation. In other words, it's a metalogic that only knows about simple substitution. That way one can easily investigate other logics and axiom systems; e.g., they have material for intuitionistic logic, orthomodular lattices, you name it.

Anyway, the project already has a pretty substantial collection of theorems, including a lot of analysis, topology, algebra and even QM. The cool thing is that you can easily dig up how the high level proofs trace back to the fundamental axioms, and there's no fancy math happening in the metalogic at all.

Super cool stuff.

[0]: http://www.mizar.org
[1]: http://metamath.org


I'd really love to see someone like DeepMind get hold of this. They are perfectly placed to apply a lot of resources to it, and it's right in their wheelhouse.

Don't know how to contact them about it though.

Related: https://arxiv.org/abs/1610.01044

"DeepAlgebra - an outline of a program"

> We outline a program in the area of formalization of mathematics to automate theorem proving in algebra and algebraic geometry. We propose a construction of a dictionary between automated theorem provers and (La)TeX exploiting syntactic parsers. We describe its application to a repository of human-written facts and definitions in algebraic geometry (The Stacks Project). We use deep learning techniques.

What makes you think they aren't?


"Solving Symbolic Equations with PRESS"

> The PRESS program was originally developed, in 1974--75, as a vehicle to explore some ideas about controlling search in automatic theorem proving by using meta-level descriptions and strategies


A lot of this stuff was/is done/doable in Prolog. I've been looking at TLA+ recently and although I'm not to far into it yet, so far everything has been "Oh, I can do that in Prolog."

Can you explain what parts of the TLA+ stuff you can express in Prolog and why you think that might be better (or equally good)? It's true that modeling languages all have something in common with Prolog, but do you think that proofs about a similar Prolog version would be nicer than in TLA+?

I don't know if it's better or nicer. "I am but an egg."

Here is Lecture 4 of Lamport's course: https://lamport.azurewebsites.net/video/video4.html

TLA+ source code:

    EXTENDS Integers

    VARIABLES small, big

    TypeOK == /\ small \in 0..3
              /\ big   \in 0..5

    Init == /\ big   = 0
            /\ small = 0

    FillSmall == /\ small' = 3
                 /\ big'   = big

    FillBig == /\ big'   = 5
               /\ small' = small

    EmptySmall == /\ small' = 0
                  /\ big'   = big

    EmptyBig == /\ big'   = 0
                /\ small' = small

    SmallToBig == IF big + small =< 5
                  THEN /\ big'   = big + small
                       /\ small' = 0
                  ELSE /\ big'   = 5
                       /\ small' = small - (5 - big)

    BigToSmall == IF big + small =< 3
                  THEN /\ big'   = 0
                       /\ small' = big + small
                  ELSE /\ big'   = small - (3 - big)
                       /\ small' = 3

    Next == \/ FillSmall
            \/ FillBig
            \/ EmptySmall
            \/ EmptyBig
            \/ SmallToBig
            \/ BigToSmall
This is what I came up with (SWI Prolog). Note that I made some constraints explicit to prune the search, like guarding the empty_FOO steps with constraints that the jugs must not already be empty:

    :- use_module(library(clpfd)).

    type_ok(Small, Big) :- Small in 0..3, Big in 0..5.

    next_dh(Moves) :- next_dh(0, 0, Moves).

    next_dh(Small, Big, [[Move, Si, Bi]|Moves]) :-
        type_ok(Small, Big),
        die_hard(Move, Small, Big, Si, Bi),
        (Bi = 4 -> Moves = [] ; next_dh(Si, Bi, Moves)).

    die_hard( fill_small, Small, Big, 3, Big) :- Small #< 3.
    die_hard(   fill_big, Small, Big, Small, 5) :- Big #< 5.
    die_hard(empty_small, Small, Big, 0, Big) :- Small #> 0.
    die_hard(  empty_big, Small, Big, Small, 0) :- Big #> 0.

    die_hard(small_to_big, Small, Big, S, B) :-
        Big #< 5, Small #> 0,
        small_to_big(Small, Big, S, B).

    die_hard(big_to_small, Small, Big, S, B) :-
        Small #< 3, Big #> 0,
        big_to_small(Small, Big, S, B).

    big_to_small(Small, Big, S, 0) :-
        Small + Big #=< 3,
        S #= Small + Big.

    big_to_small(Small, Big, 3, B) :-
        Small + Big #> 3,
        B #= Big - (3 - Small).

    small_to_big(Small, Big, 0, B) :-
        Small + Big #=< 5,
        B #= Small + Big.

    small_to_big(Small, Big, S, 5) :-
        Small + Big #> 5,
        S #= Small - (5 - Big).

And here's a query with depth limit and some manual reflow of the list for presentation...

    ?- call_with_depth_limit(next_dh(Moves), 11, _).
    Moves = [
        [fill_big, 0, 5],
        [big_to_small, 3, 2],
        [empty_small, 0, 2],
        [big_to_small, 2, 0],
        [fill_big, 2, 5],
        [big_to_small, 3, 4]
        ] ;

This is very neat. It shows that Prolog is very close to being the holy grail of an executable specification language. But my question was about verification: TLA+ is not just used for modeling but also for proving stuff about the models. Do you have anything similar for Prolog? If yes, I would love to use it :-)

If you're only interested in the specification part of TLA+ but not in proofs, that's fine. But then it's not fair to claim that you can do everything in Prolog that TLA+ does.

> This is very neat.


> Do you have anything similar for Prolog?

Well, what do you think of "Formal Methods: A First Introduction using Prolog to specify Programming Language Semantics"? https://homepage.cs.uri.edu/faculty/hamel/pubs/fcs16.pdf


> An important fundamental idea in formal methods is that programs are mathematical objects one can reason about. Here we introduce students and developers to these ideas in the context of formal programming language semantics. We use first-order Horn clause logic as implemented by Prolog both as a specification and a proof scripting language. A module we have written facilitates using Prolog as a proof assistant and insures that Prolog implements a sound logic. In order to illustrate our approach we specify the semantics of a small functional language and demonstrate various proof approaches and styles.

I just found this the other day and it seems promising.

> it's not fair to claim that you can do everything in Prolog that TLA+ does.

A claim I didn't make. "So far", I said, and, "I'm not to far into it yet" (sic). :-)

(Although it's trivially true because of Turing completeness, eh?)

I have great respect for Lamport. I would hate to be unfair to him.

Oh, that paper is very interesting, thanks for that. It's not really comparable to a fully formal system, though: The user essentially chooses which trivial things to prove to convince themselves of something. For example, a real proof assistant would tell you what exactly to prove for the inductive proof in the factorial example. In this system you pick an induction principle yourself and hope it's correct.

Still, I'd like to see more work along these lines.
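For contrast, here is roughly what that looks like in a fully formal system like Coq: the induction tactic itself generates the exact base and step obligations, rather than the user picking an induction principle and hoping (the lemma is a hypothetical example, not from the paper).

    Require Import Arith.

    Fixpoint fact (n : nat) : nat :=
      match n with
      | O => 1
      | S n' => S n' * fact n'
      end.

    Lemma fact_pos : forall n, 1 <= fact n.
    Proof.
      induction n.
      - (* Coq-generated base case: 1 <= fact 0 *)
        simpl. apply le_n.
      - (* Coq-generated step case, with IHn : 1 <= fact n.
           simpl turns the goal into 1 <= fact n + n * fact n. *)
        simpl. eapply Nat.le_trans; [exact IHn | apply Nat.le_add_r].
    Qed.

Here there is no room to "pick an induction principle and hope it's correct": the obligations are forced by the definition of nat.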

It's Prolog, so there is probably already more. I just found a bachelor's thesis from 2011, "Implementation of a Proof Assistant in Prolog": https://www2.imm.dtu.dk/pubdb/views/edoc_download.php/6043/p... I haven't read it yet.

There's also PRESS, which seems relevant, "Solving symbolic equations with PRESS"

> We describe a program, PRESS, (PRolog Equation Solving System) for solving symbolic, transcendental, non-differential equations in one or more variables. PRESS solves autonomously, i.e. without guidance from the user. The methods used for solving equations are described, together with the service facilities. The principal technique, meta-level inference, appears to be relevant to the broader field of symbolic and algebraic manipulation.



I'm not trying to promote Prolog over TLA+ or anything, I've only just started with TLA+.

The work described in the linked article is very different from what PRESS is doing.

Yeah, I should have said that. I'm just pointing out that Prolog has been used to do things that Coq and Idris do.

This is really very cool, and makes me want to pick up Coq again. I know the set of formally verified proofs is very small as compared to traditional mathematics, but I think Coq and its cousins are vital for a future revolution in mathematics.

I was hoping this was going to be about education; anyone have tips or links on using Coq to get better at math proofs?

Is anyone aware of a proof system that can do independence proofs over a set of axioms?

All of them? I don't understand why you think existing systems are not able to handle independence proofs. You embed your model in the logic of the system, and this approach works for all proof assistants.

Because embedding a model in the logic of the system is not practically achievable in many systems.

