
Yes, but AlphaZero is based on reinforcement learning, where there is a simple objective to optimize: the outcome of the game. There hasn't been much progress in applying reinforcement learning to LLMs to get them to self-improve. I agree with the quote that this will be necessary to get superhuman performance in mathematics, and Lean may very well play a role there, since it can help provide that kind of objective by checking correctness mechanically.
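
To make the Lean point concrete, here is a minimal Lean 4 sketch of what "checking correctness objectively" looks like. The pass/fail framing in the comments is my own illustration of how the checker's verdict could serve as a training signal, not a description of any existing system: a candidate proof either type-checks or it doesn't, with no human judgment involved.

```lean
-- Accepted: the proof term has exactly the stated type.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Accepted: both sides evaluate to the same numeral.
example : 2 + 2 = 4 := rfl

-- Rejected if uncommented: no proof of a false statement type-checks,
-- so a model proposing this would receive a "fail" verdict.
-- example : 2 + 2 = 5 := rfl
```

The verdict is binary and cheap to compute, which is roughly the property that win/loss gives AlphaZero.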


