Hacker News

> AlphaZero uses Monte Carlo Tree Search. It definitely has randomness built in.

That's not what MCTS means.

MCTS grew out of the multi-armed bandit problem, which is where its selection rule comes from. MCTS always chooses the "most interesting" path to explore (where "most interesting" is the path that best balances exploration and exploitation, per its hyper-parameters).

AlphaZero improved upon MCTS by letting the neural net fill the role of those hand-tuned hyper-parameters. But AlphaZero, for a given network on a given board-state, will ALWAYS choose the same position as "most interesting".
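A minimal sketch of why that selection step is deterministic (this is illustrative PUCT-style scoring, not AlphaZero's actual code; the moves, values, and counts are made up): the score is a pure function of the network's priors and the visit counts, and an argmax over it returns the same move every time.

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c=1.5):
    # Exploitation term (q) plus an exploration bonus scaled by the
    # network's prior probability for this move (PUCT-style).
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_move(children):
    # children: list of (move, q, prior, visits) tuples.
    parent_visits = sum(v for _, _, _, v in children)
    # max() is deterministic: same inputs in, same chosen move out, every time.
    return max(children,
               key=lambda ch: puct_score(ch[1], ch[2], parent_visits, ch[3]))[0]

# Made-up numbers for illustration.
children = [("e4", 0.52, 0.40, 100), ("d4", 0.51, 0.35, 90), ("c4", 0.50, 0.15, 60)]
print(select_move(children))
```

Run it as many times as you like on the same inputs; there is no source of randomness anywhere in the selection.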

Turning AlphaZero from its current deterministic form into a randomized one would be an easy fix: instead of picking the #1 move, maybe you randomly pick from the top 3 moves, or use some other scheme. But it's just one example of how AlphaZero really isn't designed for competitive use yet (despite playing the best Go of all time).
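One hedged sketch of such a scheme (the moves and visit counts are invented): keep only the top few moves by visit count, then sample among them in proportion to those counts, so the engine is no longer perfectly predictable but still only plays strong moves.

```python
import random

def pick_move(visit_counts, top_k=3, rng=random):
    # Keep only the k most-visited moves, then sample among them
    # weighted by their visit counts.
    top = sorted(visit_counts.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    moves, weights = zip(*top)
    return rng.choices(moves, weights=weights, k=1)[0]

# Made-up numbers for illustration.
counts = {"e4": 800, "d4": 700, "c4": 300, "Nf3": 50}
print(pick_move(counts))  # one of e4 / d4 / c4; Nf3 is cut by top_k=3
```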




MCTS is a random algorithm, and AlphaGo is no exception.

The AI selects a move. What state is the board in now? It doesn't know, because the opponent also selected a move.

MCTS models this with a probability distribution over the possible states, and samples from that distribution repeatedly to build an estimate of the effectiveness of each move it could make.

But what's the probability of each move made by the opponent? And after the simulation has looked as many moves ahead as it can in the time constraints, how good a position is it in?

These are the same question, really: what's the chance of winning from this board state? In Chess you can use a heuristic evaluation to figure it out. In Go, you can't. But you can use a neural network to learn an approximation that improves as it sees more completed games.

AlphaGo does this. MCTS is a random sampling technique; the neural net informs its probability distributions, but that doesn't make it deterministic.
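To make the "random sampling" concrete, here is a toy sketch (not AlphaGo's code, and a made-up Nim-like game rather than Go): a plain Monte Carlo rollout evaluation, which estimates a position's value by averaging the results of random playouts.

```python
import random

def random_playout(pile, rng):
    # Toy game: players alternate taking 1-3 stones from a pile;
    # whoever takes the last stone wins. Moves are chosen uniformly
    # at random, as in a plain MCTS rollout.
    player = 0
    while True:
        pile -= rng.choice(range(1, min(3, pile) + 1))
        if pile == 0:
            return player == 0  # did the player to move at the root win?
        player ^= 1

def estimate_value(pile, n_playouts=2000, seed=None):
    # Estimate the root player's win probability by averaging many
    # random playouts -- the "Monte Carlo" in Monte Carlo Tree Search.
    rng = random.Random(seed)
    wins = sum(random_playout(pile, rng) for _ in range(n_playouts))
    return wins / n_playouts

print(estimate_value(10, seed=42))
```

The AlphaGo-style variant replaces the uniform rollout policy with learned priors and a learned value estimate, but the evaluation is still built from sampled simulations.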


Randomized algorithm or not, LeelaZero seems to play deterministically.

If given White in chess, LeelaZero plays 1. e4. Each time, every time. Guess what that means?

If you're building an opening chess database vs LeelaZero (or at least, this version of LeelaZero: https://lichess.org/@/LeelaZero-UK), you only have to worry about 1. e4 openings.


> If given White in chess, LeelaZero plays 1. e4. Each time, every time. Guess what that means?

Nothing.


Even if it is 100% deterministic (which I'm not convinced of, especially seeing as how distributed and thus ordering-dependent it is), if it's the best in the world, how does that help you compete against it? In order to take advantage of its determinism you'd need to be better than it, and nothing else is.


> if it's the best in the world, how does that help you compete against it? In order to take advantage of its determinism you'd need to be better than it, and nothing else is.

Play AlphaGo against itself. Go rarely has draws (a draw requires a triple ko, a very, very rare position).

Almost every game you play with AlphaZero vs AlphaZero will result in a winner-and-loser. You will quickly be able to characterize the positions that AlphaZero loses in.


Go engine training surpasses this basic level of self-play effectively instantly.

The strongest early moves create the most potential for winning (maximizing potential winning paths, sort of); they do not push the game towards one best end state. They do not have counters.

I saw elsewhere you have some understanding of the game (15kyu), so you should be able to demonstrate this to yourself by playing some of AlphaGo's openings on a board and trying to write deterministic counters to them. You will not be able to push the AI into a situation where it has too few options to avoid loss.

You will also find you need to create a book much larger than a few moves to meaningfully predict play, and so will exceed the number of states that can be stored (referencing your 16TB comment elsewhere).

Please actually try this as I think it is a key to improving your skill in addition to understanding the challenges in automating play.


But if you're not at least as good as it, you won't even be able to get to those positions. You'll lose to it much earlier, in much worse ways.


The cyborg player will have access to the best publicly available software for preparation a year ahead of any competition.

For most of us, that means we'll all have the best version of LeelaZero to grab from GitHub and use in our own personal studies, which should still be superhuman in terms of play.


How is the human player adding anything to this partnership when the program is already so much better?

And don't forget that AlphaZero is gonna be getting better over that year too; you're trying to beat a moving target.



