Pushing the Limits of Mathematical Reasoning in Open Language Models

daly · 2024-02-06T21:29:06 1707254946

Traditional AI approaches to games involves reinforcement learning (RL). This type of learning is also used post-LLM as reinforcement learning with human feedback (RLHF) to improve the LLM.

At the moment my effort involves a focus on mathematical proofs using this technology.

RL requires a few things. It requires a notion of 'states'. It requires 'actions' that can be applied within states. It requires a success/fail measure, either when moving from state to state or only at the end of an effort.

Now consider mathematical proofs.

The LEAN project[0] has developed an IDE[1] that guides proofs, It guides the user "from state to state" keeping track of the current state.

There is a database of axioms and theorems[2]. Each of these are strongly typed so they can only be applied in specific states.

There are "tactics" which are actions from state to state.

Success can be found when the proof is successful. Intermediate success can be measured by recording what tactics / axioms / theorems were applied.

The use of strong types means that there are only a few actions that apply in each state. Traditional arguments about exponential growth limiting brute force no longer apply.

SO ... we now have all of the elements of a game. Which means that we can apply reinforcement learning to LEAN proofs. I suspect that "proving theorems" will be one of the next areas overtaken by AI.

[0] http://www.contrib.andrew.cmu.edu/~avigad/Papers/lean_system...

[1] https://en.wikipedia.org/wiki/Lean_(proof_assistant)

[2] https://www.andrew.cmu.edu/user/avigad/Talks/mathematical_co...