
AlphaZero is a system that includes both models and search capabilities. This isn't a great example.

One important aspect of the success of AlphaGo and its successors is that the game environment is a closed domain with a stable reward function. With this, we can guide the agent to run MCTS and plan the best move in every state.

However, such a reward system is not available to LLMs in an open-domain setting.
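
To make the "stable reward function" point concrete, here is a minimal UCT-style MCTS sketch. The Game interface (legal_moves, step, is_terminal, reward) is hypothetical, and it assumes a single fixed-perspective reward (no sign-flipping for alternating players) for brevity. The point is that the rollout and backup only work because reward() is well defined for every terminal state.

    # Minimal UCT-style MCTS sketch; Game interface is hypothetical.
    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = {}     # move -> Node
            self.visits = 0
            self.value_sum = 0.0

    def uct_select(node, c=1.4):
        # Pick the child maximizing the UCB1 score.
        return max(
            node.children.values(),
            key=lambda ch: ch.value_sum / (ch.visits + 1e-9)
                           + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
        )

    def search(game, root_state, n_simulations=200):
        root = Node(root_state)
        for _ in range(n_simulations):
            node = root
            # 1. Selection: walk down already-expanded nodes.
            while node.children and not game.is_terminal(node.state):
                node = uct_select(node)
            # 2. Expansion: add one child per legal move.
            if not game.is_terminal(node.state):
                for move in game.legal_moves(node.state):
                    node.children[move] = Node(game.step(node.state, move), parent=node)
                node = random.choice(list(node.children.values()))
            # 3. Rollout: the stable reward function scores the outcome.
            state = node.state
            while not game.is_terminal(state):
                state = game.step(state, random.choice(game.legal_moves(state)))
            reward = game.reward(state)
            # 4. Backpropagation (single-perspective reward for simplicity).
            while node is not None:
                node.visits += 1
                node.value_sum += reward
                node = node.parent
        # Return the most-visited move from the root.
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]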


AlphaZero is not an LLM though? It's primarily a convolutional network with some additional fully connected layers that guide an MCTS at inference time.
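
Roughly that shape, sketched below. This is not AlphaZero's exact architecture (which uses residual blocks and a more elaborate value head); the input planes, board size, and move count are placeholders.

    # Rough sketch: conv trunk + fully connected policy/value heads (PyTorch).
    import torch
    import torch.nn as nn

    class PolicyValueNet(nn.Module):
        def __init__(self, channels=17, board=19, n_moves=362):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            )
            flat = 64 * board * board
            self.policy_head = nn.Linear(flat, n_moves)  # prior over moves, used to guide MCTS
            self.value_head = nn.Sequential(nn.Linear(flat, 1), nn.Tanh())  # position eval in [-1, 1]

        def forward(self, x):
            h = self.trunk(x).flatten(1)
            return self.policy_head(h), self.value_head(h)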

That too! It's definitely not an LLM. It would be a bad architecture choice (almost certainly...). For math, an LLM would be a good choice: arbitrary-length input, sequential data, and it's unclear where/which symbols to pay the most attention to. Math notation is a human-built language, just like English.

Search is a great tool for AI, but you have to have a reasonable search space (like chess, Go, or poker). I'm not close enough to Lean to know whether it can provide something like that (I believe not), but math doesn't feel like a thing where you can reasonably "search" next steps, for most problems.


> It would be a bad architecture choice (almost certainly...)

Naively, it would seem like transformers could line up nicely with turn-based games. Instead of mapping tokens to language as in an LLM, they could map to valid moves given the current game state. And then instead of optimizing the next token for linguistic coherence as LLMs do, you optimize for winning the game.
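
Something like the sketch below, where the token vocabulary is moves and the training signal is the game outcome rather than next-token likelihood. The sizes, the REINFORCE-style loss in the comment, and the legal-move mask are all placeholders.

    # Sketch: a transformer policy over a move vocabulary (PyTorch).
    import torch
    import torch.nn as nn

    class MovePolicy(nn.Module):
        def __init__(self, n_moves=4672, d_model=256, n_layers=4, max_len=512):
            super().__init__()
            self.move_emb = nn.Embedding(n_moves, d_model)   # moves, not words
            self.pos_emb = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_moves)

        def forward(self, move_history, legal_mask):
            # move_history: (batch, seq) indices of moves played so far
            # legal_mask:   (batch, n_moves) booleans for the current position
            pos = torch.arange(move_history.size(1), device=move_history.device)
            h = self.move_emb(move_history) + self.pos_emb(pos)
            h = self.encoder(h)
            logits = self.head(h[:, -1])                     # score the next move only
            return logits.masked_fill(~legal_mask, float("-inf"))

    # Training would weight chosen moves by the eventual game result
    # (e.g. REINFORCE: loss = -log_prob(move) * outcome), instead of
    # cross-entropy against a "correct" next token.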


A lot of the games usually used are Markov: all the state is right there in front of you; it doesn't matter how you got there. As an example - chess - it matters not how you got to state X (...someone will get the edge cases...).
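
The usual resolution of those edge cases is to fold them into the state itself, so the position really is Markov. This is roughly what a FEN string carries; the field names below are illustrative, and threefold repetition is the one rule that still needs history beyond a single FEN.

    # Illustrative chess state with the history-dependent bits folded in.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ChessState:
        piece_placement: str    # where every piece sits
        side_to_move: str       # "w" or "b"
        castling_rights: str    # e.g. "KQkq" -- depends on history, so stored explicitly
        en_passant_square: str  # set only right after a two-square pawn push
        halfmove_clock: int     # for the fifty-move rule
        fullmove_number: int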


