Hacker News

I think the title is a bit misleading: this is a classic Monte Carlo method, but not MCTS per se. No tree structure is built; from the current state, simulations are simply run for each action and the best one is picked.
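To make the distinction concrete, the "flat" approach described above looks something like this sketch. The game hooks (`legal_actions`, `step`, `is_terminal`, `score`) are placeholders of my own invention, not the article's code; plug in real 2048 rules:

```python
import random

# Flat Monte Carlo: no tree is kept between moves. For each legal
# action from the current state, run N random rollouts and pick the
# action with the best average final score.

def rollout(state, step, legal_actions, is_terminal, score, rng):
    """Play uniformly random moves until the game ends; return the final score."""
    while not is_terminal(state):
        state = step(state, rng.choice(legal_actions(state)))
    return score(state)

def flat_monte_carlo(state, step, legal_actions, is_terminal, score,
                     n_rollouts=100, rng=None):
    rng = rng or random.Random(0)
    best_action, best_value = None, float("-inf")
    for action in legal_actions(state):
        total = 0.0
        for _ in range(n_rollouts):
            total += rollout(step(state, action), step, legal_actions,
                             is_terminal, score, rng)
        value = total / n_rollouts          # average rollout score
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```

Note that all statistics are thrown away after each move, which is exactly why this is Monte Carlo evaluation rather than tree search.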

I agree: the article is well explained, but it's not MCTS. To be more accurate, this is plain Monte Carlo, applied here to search the tree of game-turn possibilities and gather statistics on it. But Monte Carlo Tree Search is the name of a specific algorithm involving more than what is done in the article: a persistent tree of per-node statistics and a selection policy (such as UCB1) that balances exploration and exploitation. In fact, a part 2 showing how to extend the code into full MCTS would be a great addition to this article. I imagine that two-step path would be a better MCTS presentation than the classic ones describing the algorithm step by step in one shot.
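For reference, the "extra machinery" full MCTS adds over flat Monte Carlo is the four phases below: selection via UCB1, expansion, simulation, and backpropagation. This is a generic sketch with invented game hooks, not the article's code:

```python
import math, random

class Node:
    def __init__(self, state, parent=None, action=None, actions=()):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.untried = [], list(actions)
        self.visits, self.total = 0, 0.0

def mcts(root_state, legal_actions, step, is_terminal, score,
         n_iter=500, c=1.4, rng=None):
    rng = rng or random.Random(0)
    root = Node(root_state, actions=legal_actions(root_state))
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend via UCB1 while nodes are fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch:
                       ch.total / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: try one untried action, adding a child node.
        if node.untried and not is_terminal(node.state):
            action = node.untried.pop(rng.randrange(len(node.untried)))
            child_state = step(node.state, action)
            child = Node(child_state, node, action,
                         [] if is_terminal(child_state)
                         else legal_actions(child_state))
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout from the new node to a terminal state.
        state = node.state
        while not is_terminal(state):
            state = step(state, rng.choice(legal_actions(state)))
        reward = score(state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda ch: ch.visits).action
```

The key difference from the flat version is that statistics accumulate in the tree across iterations, so promising lines get progressively more rollouts.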

It would be very interesting to see proper MCTS applied to 2048. Monte Carlo Tree Search performs well on problems with linear reward, but 2048's reward is exponential: later states carry much higher tile scores. This nonlinearity tends to cause MCTS to fall into traps - one rollout finds a state with a much higher score, and thereafter biases all subsequent rollouts toward that same (potentially suboptimal) path. If there are clever ways around this, such as board evaluation heuristics that scale linearly, then I'd love to see them. In practice, a minimax approach that enumerates all successor states up to a particular depth has worked best for me.
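The depth-limited search described in the last sentence can be sketched generically as below. Treating the random tile spawn as an adversarial "min" player is a pessimistic assumption (2048's spawns are actually random, so an expectimax variant that averages over chance moves is a common alternative), and all the hooks here are placeholders for real 2048 rules and a heuristic evaluator:

```python
# Depth-limited minimax: the player maximizes over slides, and the
# board's tile placement is modeled as a minimizing opponent.

def minimax(state, depth, maximizing, player_moves, chance_moves,
            apply_move, evaluate, is_terminal):
    if depth == 0 or is_terminal(state):
        return evaluate(state)              # heuristic board evaluation
    if maximizing:                          # our turn: pick the best slide
        return max(minimax(apply_move(state, m), depth - 1, False,
                           player_moves, chance_moves, apply_move,
                           evaluate, is_terminal)
                   for m in player_moves(state))
    # "opponent" turn: the board places a tile; assume the worst case
    return min(minimax(apply_move(state, m), depth - 1, True,
                       player_moves, chance_moves, apply_move,
                       evaluate, is_terminal)
               for m in chance_moves(state))
```

Because the evaluation happens at a fixed horizon rather than via full-game rollouts, the exponential-reward bias described above doesn't compound the same way.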

Thanks for your feedback. My understanding was that the method I used was MCTS without reinforcement learning. I'll do some more research and look into clarifying this in the future. Thanks again.

— Gabriel
