A lot of people reading the paper miss this. I guess it's not emphasized enough.... | Hacker News

cosminro on Nov 5, 2017 | on: Alpha Go Zero: How and Why It Works

A lot of people reading the paper miss this. I guess it's not emphasized enough.

In the first paper, the self-play-trained policy has an Elo rating of about 1500, while darkforest2, a supervised-trained policy from Facebook, is around the same if not better. So self-play wasn't of much use the first time around. In the AlphaGo Zero paper, by contrast, the self-play-trained policy has an Elo rating of about 3000.
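For a sense of scale, the standard Elo model maps a rating gap to an expected score of 1 / (1 + 10^(gap / 400)). Below is a minimal sketch (the helper name is illustrative), assuming both quoted numbers sit on the same 400-point logistic scale. The two papers may anchor their rating scales differently, so a cross-paper comparison like this is only a rough indication.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the logistic Elo model.

    1.0 means A always wins, 0.5 means an even match.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Approximate ratings quoted in the comment; possibly on different anchors.
old_selfplay_policy = 1500
new_selfplay_policy = 3000

# Expected score of the ~3000 policy against the ~1500 one.
print(f"{elo_expected_score(new_selfplay_policy, old_selfplay_policy):.4f}")
```

A 1500-point gap works out to an expected score above 0.999, i.e. the weaker policy would essentially never win.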

gwern on Nov 5, 2017

> A lot of people reading the paper miss this. I guess it's not emphasized enough.

Yeah, it's hilariously underemphasized: literally one sentence. Fortunately I was able to ask Silver directly and get confirmation that the difference comes from the tree iteration: https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama...
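Concretely, in the AlphaGo Zero setup the network's policy head is trained toward the move distribution produced by MCTS (root visit counts, sharpened by a temperature tau), and its value head toward the game outcome z, so the search acts as a policy-improvement step at every move. A minimal sketch of those two training targets in plain Python, with hypothetical helper names, omitting the L2 regularization term and the network itself:

```python
import math
from typing import List, Sequence


def visit_count_policy(visits: Sequence[int], temperature: float = 1.0) -> List[float]:
    """Turn MCTS root visit counts N(s, a) into a target pi(a) proportional to N^(1/tau)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (tau -> 0 means argmax)")
    powered = [n ** (1.0 / temperature) for n in visits]
    total = sum(powered)
    return [x / total for x in powered]


def policy_value_loss(pi: Sequence[float], p: Sequence[float], z: float, v: float) -> float:
    """(z - v)^2 - sum(pi * log p): value error plus cross-entropy to the search policy.

    pi: search-derived target, p: network policy (entries must be > 0 where pi > 0),
    z: game outcome from the player's perspective, v: network value estimate.
    """
    cross_entropy = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    return (z - v) ** 2 + cross_entropy


visits = [10, 30, 60]             # hypothetical root visit counts for three moves
pi = visit_count_policy(visits)   # roughly [0.1, 0.3, 0.6]
prior = [1 / 3, 1 / 3, 1 / 3]     # hypothetical raw network policy before the update
print(pi)
print(policy_value_loss(pi, prior, z=1.0, v=0.5))
```

Minimizing this loss pulls the raw policy toward what the search found, which is why the raw network alone can end up much stronger than a policy trained by self-play without search in the loop.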