Yes, but AlphaZero is based on reinforcement learning, where there is a simple objective to optimize (win or lose the game). There hasn't been much progress in applying reinforcement learning to LLMs to get them to self-improve. I agree with the quote that this will be necessary to get superhuman performance in mathematics, and Lean may very well play a role there, since it can help provide such an objective by checking the correctness of a proof mechanically.
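
To make the "objective check" concrete, here is a tiny Lean 4 sketch. The theorem names are made up for illustration, and the mapping from accept/reject to a reward of 1 or 0 is only my assumption about how a training loop might use it, not a description of any existing system.

```lean
-- Accepted: Lean checks that both sides compute to the same value,
-- so a hypothetical training loop could assign reward 1 here.
theorem two_add_two : 2 + 2 = 4 := rfl

-- Rejected: uncommenting the line below gives a type mismatch error,
-- so the same hypothetical loop would assign reward 0.
-- theorem two_add_two_wrong : 2 + 2 = 5 := rfl
```

The point is just that the pass/fail outcome comes from the proof checker rather than from a human grader or another model's opinion.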