Another problem is that you will very rarely find examples of failing proofs, since users rarely commit them. I'm working on a joint UW-UCSD project on collecting and analyzing development data, and this is one of the goals. But our dataset is not very large, since it is hard to get users to agree to something so invasive, and there are still only 1000 or so active Coq users in total in the world right now.
Another big problem is that it is very, very difficult to define what it means for a proof to be "almost" correct. This makes it difficult to apply many existing techniques fruitfully.
Nonetheless, I think the general spirit is right. The biggest gains in proof automation in ITPs this decade are going to come from automation that makes use in some way of existing proofs, rather than starting from scratch and "blindly" searching for some term with a given type (a search that can be very effective, or even decidable, for certain fragments of the logic or certain domains, if the user knows the right tactic to call).
Let's say you write those proofs via induction over the natural numbers. If your functions take multiple inputs, you can choose which argument to induct over. The most effective choice often depends on the definitions of your functions. This means that to pick the best argument, you already need to unfold your definitions. Chances are you do that unfolding in your head, not using a tactic, or at least not one that you keep in your final proof.
So already, an ML technique for ITPs that does not unfold definitions in goals for more information is missing out on the essential piece that makes a user choose one argument for induction over another, even when syntactically the hypotheses and goals may look identical (I can define "add" to break down the first argument or the second argument, my call).
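To make that concrete, here is a rough Coq sketch (add_l and add_r are hypothetical names, used only for this illustration): addition defined by recursion on the first argument versus the second, with the same-looking goal demanding a different proof for each.

(* Addition by recursion on the first argument... *)
Fixpoint add_l (n m : nat) : nat :=
  match n with
  | 0 => m
  | S n' => S (add_l n' m)
  end.

(* ...and by recursion on the second argument. *)
Fixpoint add_r (n m : nat) : nat :=
  match m with
  | 0 => n
  | S m' => S (add_r n m')
  end.

(* add_l only computes when its first argument is a constructor,
   so this goal needs induction on n. *)
Lemma add_l_n_0 : forall n, add_l n 0 = n.
Proof.
  induction n as [| n' IH]; simpl.
  - reflexivity.
  - rewrite IH. reflexivity.
Qed.

(* The syntactically identical goal for add_r needs no induction at all,
   because add_r n 0 reduces to n by definition. *)
Lemma add_r_n_0 : forall n, add_r n 0 = n.
Proof. reflexivity. Qed.

The two goals differ only in the name of the function; you only know which proof to write, and in the multi-argument case which argument to induct over, by looking at the definitions.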
In general, I think the ITP community likes to pretend that we are using tactics to interact with a black box, but in practice, most of us have internalized heuristics that involve introspecting on the structures of terms and types. So any model that lacks the information to internalize those heuristics is missing out.
I suspect the one difficulty there would be finding such incremental changes on GitHub. For a different project I did, it was surprisingly difficult to find small changes to specifications in Git history. Too many large commits and too much history revision, so you often lose the incremental changes. Still, it is worth trying.
This feels a little misleading; the paper itself says that this phenomenon "... suggests that theorems with longer proofs are much more challenging for the model." It'd be more interesting to see how the lengths of the automatically found proofs compare to the manual lengths for the same theorems (although I'd expect that comparison to be biased downwards, for the same reason).
However, at the moment I'm a bit enamored with Metamath. It's basically a tiny system that just performs formal substitution. Everything else, including propositional calculus, predicate logic, and ZFC set theory, is encoded within the language rather than the implementation. I.e., it's a metalogic that only knows about simple substitution. That way one can easily investigate other logics and axiom systems, e.g. they have stuff for intuitionistic logic, orthomodular lattice models, you name it.
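To give a flavor of the mechanism, here is a toy sketch in Coq of the substitution-only idea; it is my illustration of the principle, not Metamath's actual format or checker, and the names expr, inst and instance_of are made up.

(* Expressions are trees whose leaves are metavariables or fixed
   object-language symbols; the kernel knows nothing about what the
   symbols mean. *)
Inductive expr : Type :=
  | MVar : nat -> expr              (* metavariable *)
  | Const : nat -> expr             (* object-language symbol *)
  | Node : expr -> expr -> expr.    (* juxtaposition of subexpressions *)

(* The only operation: simultaneous substitution of expressions for
   metavariables. *)
Fixpoint inst (s : nat -> expr) (e : expr) : expr :=
  match e with
  | MVar x => s x
  | Const c => Const c
  | Node a b => Node (inst s a) (inst s b)
  end.

(* "e is an instance of the axiom scheme a": some substitution takes a to e.
   Connectives, quantifiers, and the ZFC axioms all live in the choice of
   constants and schemes, not in this kernel. *)
Definition instance_of (a e : expr) : Prop :=
  exists s, inst s a = e.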
Anyway, the project already has a pretty substantial collection of theorems, including a lot of analysis, topology, algebra and even QM. The cool thing is that you can easily dig up how the high level proofs trace back to the fundamental axioms, and there's no fancy math happening in the metalogic at all.
Super cool stuff.
I'd really love to see someone like DeepMind get hold of this. They are perfectly placed to apply a lot of resources to it, and it's right in their wheelhouse.
Don't know how to contact them about it though.
"DeepAlgebra - an outline of a program"
> We outline a program in the area of formalization of mathematics to automate theorem proving in algebra and algebraic geometry. We propose a construction of a dictionary between automated theorem provers and (La)TeX exploiting syntactic parsers. We describe its application to a repository of human-written facts and definitions in algebraic geometry (The Stacks Project). We use deep learning techniques.
> The PRESS program was originally developed, in 1974--75, as a vehicle to explore some ideas about controlling search in automatic theorem proving by using meta-level descriptions and strategies
A lot of this stuff was/is done/doable in Prolog. I've been looking at TLA+ recently, and although I'm not too far into it yet, so far everything has been "Oh, I can do that in Prolog."
Here is Lecture 4 of Lamport's course: https://lamport.azurewebsites.net/video/video4.html
TLA+ source code:
\* The Die Hard water-jug puzzle from the lecture linked above: a 3-gallon
\* jug (small) and a 5-gallon jug (big); the goal is to reach big = 4.
EXTENDS Integers

VARIABLES small, big

TypeOK == /\ small \in 0..3
          /\ big \in 0..5

Init == /\ big = 0
        /\ small = 0

FillSmall == /\ small' = 3
             /\ big' = big

FillBig == /\ big' = 5
           /\ small' = small

EmptySmall == /\ small' = 0
              /\ big' = big

EmptyBig == /\ big' = 0
            /\ small' = small

SmallToBig == IF big + small =< 5
                THEN /\ big' = big + small
                     /\ small' = 0
                ELSE /\ big' = 5
                     /\ small' = small - (5 - big)

BigToSmall == IF big + small =< 3
                THEN /\ big' = 0
                     /\ small' = big + small
                ELSE /\ big' = big - (3 - small)
                     /\ small' = 3

Next == \/ FillSmall
        \/ FillBig
        \/ EmptySmall
        \/ EmptyBig
        \/ SmallToBig
        \/ BigToSmall
%% A Prolog version of the same next-state relation (SWI-Prolog;
%% library(clpfd) provides in/2, #</2, #=/2, etc.).
:- use_module(library(clpfd)).

type_ok(Small, Big) :- Small in 0..3, Big in 0..5.

%% Search for a move sequence from two empty jugs, stopping as soon as
%% the big jug holds 4 gallons.
next_dh(Moves) :- next_dh(0, 0, Moves).

next_dh(Small, Big, [[Move, Si, Bi]|Moves]) :-
    die_hard(Move, Small, Big, Si, Bi),
    (Bi = 4 -> Moves = [] ; next_dh(Si, Bi, Moves)).

%% die_hard(Move, Small, Big, NewSmall, NewBig): one legal move and its effect.
die_hard( fill_small, Small, Big,     3, Big) :- Small #< 3.
die_hard(   fill_big, Small, Big, Small,   5) :- Big #< 5.
die_hard(empty_small, Small, Big,     0, Big) :- Small #> 0.
die_hard(  empty_big, Small, Big, Small,   0) :- Big #> 0.
die_hard(small_to_big, Small, Big, S, B) :-
    Big #< 5, Small #> 0,
    small_to_big(Small, Big, S, B).
die_hard(big_to_small, Small, Big, S, B) :-
    Small #< 3, Big #> 0,
    big_to_small(Small, Big, S, B).

big_to_small(Small, Big, S, 0) :-
    Small + Big #=< 3,
    S #= Small + Big.
big_to_small(Small, Big, 3, B) :-
    Small + Big #> 3,
    B #= Big - (3 - Small).

small_to_big(Small, Big, 0, B) :-
    Small + Big #=< 5,
    B #= Small + Big.
small_to_big(Small, Big, S, 5) :-
    Small + Big #> 5,
    S #= Small - (5 - Big).
?- call_with_depth_limit(next_dh(Moves), 11, _).
Moves = [
  [fill_big, 0, 5],
  [big_to_small, 3, 2],
  [empty_small, 0, 2],
  [big_to_small, 2, 0],
  [fill_big, 2, 5],
  [big_to_small, 3, 4]
].
If you're only interested in the specification part of TLA+ but not in the proofs, that's great. But then it's not fair to claim that you can do everything in Prolog that TLA+ does.
> Do you have anything similar for Prolog?
Well, what do you think of "Formal Methods: A First Introduction using Prolog to specify Programming Language Semantics"? https://homepage.cs.uri.edu/faculty/hamel/pubs/fcs16.pdf
> An important fundamental idea in formal methods is that programs are mathematical objects one can reason about. Here we introduce students and developers to these ideas in the context of formal programming language semantics. We use first-order Horn clause logic as implemented by Prolog both as a specification and a proof scripting language. A module we have written facilitates using Prolog as a proof assistant and insures that Prolog implements a sound logic. In order to illustrate our approach we specify the semantics of a small functional language and demonstrate various proof approaches and styles.
I just found this the other day and it seems promising.
> it's not fair to claim that you can do everything in Prolog that TLA+ does.
A claim I didn't make. "So far," I said, and "I'm not too far into it yet." :-)
(Although it's trivially true because of Turing completeness, eh?)
I have great respect for Lamport. I would hate to be unfair to him.
Still, I'd like to see more work along these lines.
There's also PRESS, which seems relevant: "Solving symbolic equations with PRESS"
> We describe a program, PRESS, (PRolog Equation Solving System) for solving symbolic, transcendental, non-differential equations in one or more variables. PRESS solves autonomously, i.e. without guidance from the user. The methods used for solving equations are described, together with the service facilities. The principal technique, meta-level inference, appears to be relevant to the broader field of symbolic and algebraic manipulation.
I'm not trying to promote Prolog over TLA+ or anything; I've only just started with TLA+.