
Actually, the only human games AlphaGo was trained on were publicly available games between strong amateurs. After that, it was trained by playing a huge number of games against itself (reinforcement learning).

A priori, this makes sense: you don't need to train on humans to get a better understanding of the game tree. (See any number of other AIs that have learned to play games from scratch, given nothing but an objective function to optimize.)

Yes, but is it known whether there's some limit to what you can reach this way? I mean, if they had trained it on games by bad amateurs instead of good ones and then had it play itself, would it keep improving all the way to its current level, or would it hit some barrier?

That's why they trained it on human games only initially, and afterwards trained it against itself. I would guess (strongly emphasize: guess) that they trained it on humans just to set initial parameters and to give it an overview of the structure and common techniques. It probably would've been possible to train AlphaGo against itself from scratch, but it would've taken much longer -- amateur play provides a useful shortcut.
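
For what it's worth, here's a toy sketch of the "start from weak play, then improve by self-play" idea. It has nothing to do with how AlphaGo is actually implemented (no neural nets, no tree search). It's single-pile Nim (take 1 to 3 stones, last stone wins), small enough to play out every position exhaustively, and all the names (`first_mover_wins`, `improve`, `self_play`) are made up for illustration.

```python
MAX_TAKE = 3   # a player may take 1..3 stones per turn
PILE = 12      # largest pile size the policy covers


def first_mover_wins(policy, pile, first_move):
    """Play one game out: the player to move takes `first_move`, then both
    sides follow `policy`. Whoever takes the last stone wins."""
    pile -= first_move
    first_mover_moved_last = True
    while pile > 0:
        first_mover_moved_last = not first_mover_moved_last
        pile -= policy[pile]
    return first_mover_moved_last


def improve(policy):
    """One round of self-play: at every pile size, switch to the first move
    that beats the current policy; keep the old move if none does."""
    new_policy = {}
    for pile in range(1, PILE + 1):
        legal = range(1, min(MAX_TAKE, pile) + 1)
        winning = [m for m in legal if first_mover_wins(policy, pile, m)]
        new_policy[pile] = winning[0] if winning else policy[pile]
    return new_policy


def self_play(policy, max_rounds=50):
    """Repeat improvement rounds until the policy stops changing."""
    rounds = 0
    for _ in range(max_rounds):
        new_policy = improve(policy)
        if new_policy == policy:
            break
        policy, rounds = new_policy, rounds + 1
    return policy, rounds


# Deliberately weak starting point: always take a single stone.
weak_start = {pile: 1 for pile in range(1, PILE + 1)}
learned, rounds = self_play(weak_start)
```

In this toy game the bad starting policy doesn't cap the final strength: the learned policy leaves its opponent a multiple of four whenever it can, which is the known optimal strategy for this variant. That only works so cleanly because the game is tiny and every position gets tried, so by itself it says nothing about whether the same holds for Go.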

I don't think there is a theoretical upper limit on this kind of learning. If you do it sufficiently broadly, you will continuously improve your model over time. I suppose it depends to what extent you're willing to explicitly explore the game tree itself.

There is always a risk of getting stuck in a local maximum, thinking you've found an optimal way of playing, so you'd need more data that presents different strategies, I'd think.
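
To make the local-maximum worry concrete, here's a minimal sketch using plain greedy hill climbing on a made-up one-dimensional "landscape" with two peaks. The landscape values and the function names (`hill_climb`, `best_of_restarts`) are assumptions for illustration only; restarting from several positions stands in, very loosely, for "more data that presents different strategies".

```python
# Hypothetical score landscape: a small peak at index 3, a higher one at index 12.
LANDSCAPE = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 0]


def hill_climb(landscape, start):
    """Greedy ascent: step to the better neighbor until no neighbor improves."""
    pos = start
    while True:
        neighbors = [p for p in (pos - 1, pos + 1) if 0 <= p < len(landscape)]
        best = max(neighbors, key=lambda p: landscape[p])
        if landscape[best] <= landscape[pos]:
            return pos  # no neighbor is strictly better: a (possibly local) peak
        pos = best


def best_of_restarts(landscape, starts):
    """Run hill climbing from several starting points and keep the best peak."""
    peaks = [hill_climb(landscape, s) for s in starts]
    return max(peaks, key=lambda p: landscape[p])


stuck = hill_climb(LANDSCAPE, 1)               # climbs the small peak and stops
found = best_of_restarts(LANDSCAPE, [1, 8, 16])
```

Starting at index 1, the climber stops on the lower peak because every single step away from it looks worse. Only the runs that start elsewhere reach the higher one.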
