
A lot of people reading the paper miss this. I guess it's not emphasized enough.

In the first paper, the policy trained by self-play has an Elo rating of about 1500, while darkforest2, a policy from Facebook trained with supervised learning, is around the same, if not better. So self-play wasn't of much use the first time around. In the AlphaZero paper, by contrast, the policy trained by self-play has an Elo rating of about 3000.




> A lot of people reading the paper miss this. I guess it's not emphasized enough.

Yeah, it's hilariously underemphasized. One sentence, literally. Fortunately, I was able to ask Silver directly and get confirmation that it's the tree iteration: https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama...



