Hacker News
Multiplayer AlphaZero (arxiv.org)
53 points by hardmaru 21 days ago | 17 comments

Nice and concise paper. All that changed is that the network now outputs a vector of predicted game values, one per player, as we go from a two-player zero-sum game (which can be summarized with a single value) to a game with n players. Makes sense.
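For anyone wondering what that looks like in the tree search, here is a minimal sketch of a vector-valued backup. This is not the paper's code: the `Node` class, `backup`, and `best_child` are hypothetical names, and the exploration term of the usual selection rule is left out to keep it short.

```python
class Node:
    """Search-tree node that keeps one running value total per player."""

    def __init__(self, player_to_move, num_players):
        self.player_to_move = player_to_move  # index of the player who acts here
        self.visits = 0
        self.value_sum = [0.0] * num_players

    def mean_value(self, player):
        # Average predicted outcome for `player` over all visits to this node.
        return self.value_sum[player] / self.visits if self.visits else 0.0


def backup(path, values):
    """Add the whole value vector to every node on the path.

    With two players one would flip the sign of a scalar at each ply;
    with n players each node simply accumulates all n components.
    """
    for node in path:
        node.visits += 1
        for player, value in enumerate(values):
            node.value_sum[player] += value


def best_child(parent, children):
    """Greedy choice: the player to move at `parent` maximises its own component."""
    return max(children, key=lambda c: c.mean_value(parent.player_to_move))
```

The point of the sketch is that each player reads only its own entry of the vector when choosing a move, which is what replaces the single negated value of the two-player case.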

The comparison against MCTS shows strong performance from AlphaZero. I'd be curious to see AlphaZero's performance as a function of the number of its own rollouts, i.e. whether the policy (move-probability) head output alone already encodes enough information to play well, and how deep it has to look ahead to find good play. Finally, for Tic-Tac-Mo and Connect 3x3, it should be possible to determine the optimal move. How much training or lookahead is required to achieve that? Does AlphaZero achieve perfect play in these games?

The paper's first listed contribution is "an independent reimplementation of DeepMind's AlphaZero algorithm". Maybe I missed it, but I don't see a link to a repo with the implementation.

Interesting results, but they really only use toy examples (multiplayer variants of Tic-Tac-Toe and Connect Four). It would be kind of surprising if it didn't work.

> e.g. equity trading

As much as I'd like to get excited about these kinds of papers:

"Recomputing the AlphaGo Zero weights will take about 1700 years on commodity hardware." [1]

[1] https://github.com/leela-zero/leela-zero

Yeah, but that's an embarrassingly parallel task, so you can recreate it with sufficient resources. If there's a sufficiently lucrative application, businesses will happily pay for that.

And it's already been done for the chess version, in a community effort[0]. Well, literally "recomputing the weights" isn't plausible for a neural network of this size, but computing weights that give the network a similar level of performance is.

[0] https://lczero.org/

One thing I think will be critical to the future of software is community efforts to build deep neural nets that match the AI power of the big companies.

There are newer approaches where specialising for the problem domain (Go) gets you a 5-100x speedup: https://github.com/lightvector/KataGo/

Indeed, that's a weird thing to say, especially since AFAIK AlphaGo is for (very large) discrete search spaces (where Monte Carlo Tree Search would be used), and equity trading doesn't strike me as a discrete search space. Or is it?

Yes? At any point in time you can buy or sell a discrete number of shares in a finite universe of stocks.
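To put a rough number on "discrete", here is a back-of-the-envelope sketch. The function name and the example figures (10 stocks, at most 100 shares per order) are made up for illustration, and it assumes one order per stock per decision step.

```python
def joint_action_count(num_stocks: int, max_shares: int) -> int:
    """Number of joint actions if, for each stock, you may sell 1..max_shares,
    hold, or buy 1..max_shares in a single decision step."""
    per_stock = 2 * max_shares + 1
    return per_stock ** num_stocks


# Hypothetical example: 10 stocks, up to 100 shares each way.
print(joint_action_count(10, 100))  # 201 ** 10, on the order of 10**23
```

So the space is finite and discrete, but even with small made-up limits it is far wider per decision than a board game's move list.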

The bigger difference in equity trading is that it's a "game" of hidden information with asymmetric and maybe unknown payoffs. Of course, computers are pretty good at trading stocks, too, but I can say confidently AlphaZero won't be the most useful approach.

One part missing from your analysis is that time is involved in the decisions, that time is continuous, and that time is very important.

Is time really continuous in this case? I have absolutely no idea how it's implemented in practice, but would assume that stock exchange computers use some sort of game loop at a consistent rate.

Sound is not discrete either, yet that does not seem to be a problem on computers.

That's the feature space, not the decision space.

Well, at least the second author, Tucker Balch, is a computational investing consultant / lecturer. I'm sure they'll continue to work this angle, even if it's not there yet.

For anyone interested in seeing how brilliantly AlphaZero played against the best chess engine (at the time), with solid explanations: https://www.youtube.com/watch?v=lFXJWPhDsSY

I'd recommend instead the videos made by Matthew Sadler for chess24[0]. Unlike the guy in your video, Sadler is a strong chess player and literally wrote the book [1] on AlphaZero. The videos are approachable for all levels of chess ability.

[0] https://www.youtube.com/playlist?list=PLAwlxGCJB4NchyTBYik8F...
[1] https://www.newinchess.com/game-changer

