Hacker News new | past | comments | ask | show | jobs | submit login
Graph Representations for Higher-Order Logic and Theorem Proving (2019) (arxiv.org)
104 points by brzozowski on Sept 6, 2020 | hide | past | favorite | 15 comments

ONNX (and maybe RIF) are worth mentioning.

ONNX: https://onnx.ai/ :

> ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers

RIF (~FOL): https://en.wikipedia.org/wiki/Rule_Interchange_Format

Datalog (not Turing-complete): https://en.wikipedia.org/wiki/Datalog

HOList Benchmark: https://sites.google.com/view/holist/home

"HOList: An Environment for Machine Learning of Higher-Order Theorem Proving" (2019) https://arxiv.org/abs/1904.03241

> Abstract: We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement learning environment. HOL Light comes with a broad coverage of basic mathematical theorems on calculus and the formal proof of the Kepler conjecture, from which we derive a challenging benchmark for automated reasoning. We also present a deep reinforcement learning driven automated theorem prover, DeepHOL, with strong initial results on this benchmark.

I wonder why they don't mention any work based on transformer architectures? The recent work on solving differential equations based on expressions in reverse polish notation seemed like a reasonable idea to apply to theorem proving as well.

Really cool work though!

Solving differential equations symbolically[^1] is a classic pattern matching problem though. It takes a lot of practice recognising which equations can be solved with which method/heuristic, but in the end, the available heuristics are somewhat limited in number and so it's not unreasonable that an ML model would perform well here. Notice that you will still need to actually check your result in some way or the other, because ML models can generate false positives and you really don't want that in maths; so it's not that you're replacing rule-based systems, you're just augmenting them with other approaches (and of course, verifying the solution to a differential equation is basically a solved problem in symbolic analysis, since taking derivatives is a simple rule-based procedure[^2]).

I think theorem proving is much different. The space of possible heuristics you may apply to a proof is basically infinite. It can take a tremendous amount of creativity and intuition to come up with a complicated and novel proof. While I can see ML models being of use for simpler proofs or lemmas within bigger proofs (such as simple epsilon-delta proofs etc.), I have a hard time imagining that they will really be able to do real proofs anytime soon.

[^1]: I emphasise "symbolically" because it is my understanding that outside of simple situations and university lectures, most people don't bother with that, instead solving differential equations numerically.

[^2] There are some subtleties around the fact that, afaik, deciding whether two expressions are equal is undecidable in the general case because at some point you will have to compare e.g. coefficients and the equality of real numbers is undecidable. In practice, you will consider two numbers to be equal if their difference is below a certain threshold, which could in theory also yield false positives, but I find it unlikely that this would occur in practice in the situation of solving an equation.

Actually OpenAI just published an article about using Transformer for theorem proving: https://arxiv.org/abs/2009.03393

I think they can use transformer in the same way for proof path generation: do pattern matching for complex logical statements to find those where specific rule can be applied, similarly as it has be done in differential equations paper.

How does that help and/or address any of what I wrote?

You can generate possible transformations from one proposition to another one, yes. But the space of possible transformations is infinite, how does the system know which ones to try? Moreover, even if you try some local transformation, how do you know that you have made progress?

A system like that could probably perform well on typical undergraduate proofs: epsilon-delta, proof that such and such a thing is a norm, etc., but it's gonna have a hard time with more difficult problems.

This is the differential equations paper I was thinking about https://medium.com/analytics-vidhya/solving-differential-equ...

A transformer is unable to really represent logic, let alone higher order logic and theorem proving.

A transformer is a universal function approximator. The question is whether it can do so reasonably efficiently. Trained on natural language linear sequences, I’m with you. Trained on abstract logical graph representations? I don’t think that question’s answered yet, unless I’m missing something.

How do transformers handle with truth tables, logical connectives, and propositional logic / rules of inference, and first-order logic?

Truth table: https://en.wikipedia.org/wiki/Truth_table

Logical connective: https://en.wikipedia.org/wiki/Logical_connective

Propositional logic: https://en.wikipedia.org/wiki/Propositional_calculus

Rules of inference: https://en.wikipedia.org/wiki/Rule_of_inference

DL: Description logic: https://en.wikipedia.org/wiki/Description_logic (... The OWL 2 profiles (EL, QR, RL; DL, Full) have established decideability and complexity: https://www.w3.org/TR/owl2-profiles/ )

FOL: First-order logic: https://en.wikipedia.org/wiki/First-order_logic

HOL: Higher-order logic: https://en.wikipedia.org/wiki/Higher-order_logic

In terms of regurgitating without critical reasoning?

Critical reasoning: https://en.wikipedia.org/wiki/Critical_thinking

Think about a program that can write code but not execute it. It’s not hard to get a transformer to learn to write python code like merge sort or simple arithmetic code, even though a transformer can’t reasonably learn to sort, nor can it learn simple arithmetic. It’s an important disambiguation. In one view it appears it can’t (learn to sort) and in another it demonstrably can (learn to code up a sort function). It can learn to usefully manipulate the language without needing the capacity to execute the language. They probably can’t learn to “execute” what you’re talking about (execute being a loose analogy), but I’d say the jury’s out on whether they can learn to usefully manipulate it.

Transformer is a function from seq of symbols to seq of symbols. For example, truth table is exactly similar kind of function from variables to [True, False] alphabet.

Transformer can represent some very complex logical operations, and per this article is turing complete: https://arxiv.org/abs/1901.03429, meaning any computable function, including theorem prover can be represented as transformer.

Another question is if it is feasible/viable/rational to build transformer for this? My intuition says: no.

Where can I read more about this?

It looks like Graph NNs are going places.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact