
I agree that the main strength of AlphaGo seems to be evaluation, using supervised learning + reinforcement learning.

What I found interesting about AlphaGo's final algorithm is that there are so many different methods being used at once:

0. there's the Monte Carlo tree search. while this is definitely a "classic" tree search, this particular tree search algorithm is a fairly recent development, and it relies heavily on sampled statistics, which is perhaps somewhat less classical

1. the policy function approximation they use in the final algorithm, aka the policy network, is a deep network trained with supervised learning on human games. notably, it is NOT the other policy network in the paper, the one further tuned using reinforcement learning - that one made the overall system perform worse!

2. the value function approximation they use in the final algorithm isn't just a network. it's a linear combination of a value network and a rollout estimate produced by a much weaker, faster, simpler policy trained on different features. they find the system performs best when each is given equal weight.

3. from what i understand, the value network is trained (at huge computational cost, particularly in generating the self-play data set required) to approximate the value function one could define by rolling out the reinforcement-learning policy network. it gives similar valuations but runs 1500x faster. in some sense this isn't terribly interesting algorithmically - it's an implementation detail that buys fast game-time evaluation at the cost of a ridiculous amount of offline computation.
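to make the interplay of points 0-2 concrete, here's a rough sketch of how the pieces fit together in one search step: the policy network supplies move priors for the tree-search selection rule, and leaf positions are scored by mixing the value network with a rollout result. this is only an illustration, not the paper's actual code - the C_PUCT constant, the child statistics, and the toy numbers are made up; the equal λ = 0.5 weighting is the one reported in the paper.

```python
import math

LAMBDA = 0.5  # equal weighting of value network and rollout (from the paper)
C_PUCT = 1.0  # exploration constant; illustrative value, not from the paper

def mixed_leaf_value(value_net_estimate, rollout_result):
    """Point 2: leaf evaluation is a linear combination of the slow,
    accurate value network output and a fast rollout outcome."""
    return (1 - LAMBDA) * value_net_estimate + LAMBDA * rollout_result

def select_child(children):
    """Points 0-1: selection inside the tree search. Each child stores a
    visit count N, a mean action value Q, and a prior P taken from the
    supervised policy network. The bonus term decays as visits accumulate,
    so search shifts from the policy prior toward the empirical values."""
    total_visits = sum(c["N"] for c in children)
    def score(c):
        bonus = C_PUCT * c["P"] * math.sqrt(total_visits) / (1 + c["N"])
        return c["Q"] + bonus
    return max(children, key=score)

# toy example: three candidate moves with made-up statistics
children = [
    {"move": "A", "N": 10, "Q": 0.52, "P": 0.6},
    {"move": "B", "N": 3,  "Q": 0.48, "P": 0.3},
    {"move": "C", "N": 0,  "Q": 0.00, "P": 0.1},
]
best = select_child(children)
v = mixed_leaf_value(value_net_estimate=0.7, rollout_result=0.4)  # -> 0.55
```

note how move B wins selection here despite a lower Q: its prior-driven bonus outweighs A's small value edge because A has already been visited more, which is exactly the prior-vs-statistics tension the search balances.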



