"and an optimal solution so complex it appears infeasible to directly approximate using a policy or value function."

To be clear, the above refers to specific concepts in Reinforcement Learning.

A policy is a function from state (in Go, the positions of all the stones) to action (where to place the next stone). I agree that an effective policy function is unlikely, at least one that can be computed efficiently, without tree search. A function that relies on tree search is not what a Reinforcement Learning researcher would typically call a policy function.
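
To make the representational problem concrete, here is a minimal Python sketch. It assumes a simplified board encoding (a flat tuple of points, 0 = empty, 1 = black, 2 = white) and a hypothetical toy policy, `first_empty_point`. Neither comes from any real Go engine. The sketch treats a policy literally as a function from state to action, then shows why storing one as an explicit table cannot scale. The naive count of board colourings, 3 to the power of the number of points, is only an upper bound because it includes illegal positions, but it already has 173 digits on a 19x19 board.

```python
from itertools import product
from typing import Callable, Dict, Optional, Tuple

# Assumed encoding: a board is a flat tuple of points, 0 = empty, 1 = black, 2 = white.
Board = Tuple[int, ...]
# An action is the index of the point to play on, or None to pass.
Move = Optional[int]
# A policy in the RL sense: maps a state directly to an action, with no search.
Policy = Callable[[Board], Move]


def first_empty_point(board: Board) -> Move:
    """Hypothetical toy policy: play the first empty point, or pass if none."""
    for i, point in enumerate(board):
        if point == 0:
            return i
    return None


def tabulate(policy: Policy, n: int) -> Dict[Board, Move]:
    """Store a policy as an explicit table over every colouring of an n x n board."""
    return {board: policy(board) for board in product((0, 1, 2), repeat=n * n)}


table = tabulate(first_empty_point, 2)
print(len(table))                # 81 entries for a 2x2 board (3**4)
print(len(str(3 ** (19 * 19))))  # 173 digits: size bound of the naive 19x19 table
```

Even this upper bound makes the point: any usable policy for full-size Go has to be a compact approximation, not a lookup table.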

A value function is a function from state to numerical "goodness", and it is only one step removed from a policy function: you can choose the action that leads to the state with the highest value. It therefore suffers from the same representational problems as a policy function.
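
The one-step lookahead that turns a value function into a policy can be sketched in Python as follows. This is a minimal sketch under stated assumptions: transitions are deterministic and there is a single agent (in Go the successor state has the opponent to move, which this ignores). The names `greedy_policy`, `toy_value`, `toy_actions` and `toy_step` are hypothetical, chosen for illustration. The resulting policy is only as good as the value function it wraps, which is why the two share the same representational problems.

```python
from typing import Callable, Hashable, Iterable

State = Hashable
Action = Hashable


def greedy_policy(
    value: Callable[[State], float],
    legal_actions: Callable[[State], Iterable[Action]],
    next_state: Callable[[State, Action], State],
) -> Callable[[State], Action]:
    """Turn a state-value function into a policy by one-step lookahead.

    Assumes deterministic transitions: next_state(s, a) is the state reached
    by taking action a in state s. The policy picks the action whose
    successor state has the highest value (ties go to the first action).
    """
    def policy(state: State) -> Action:
        return max(legal_actions(state), key=lambda a: value(next_state(state, a)))
    return policy


# Hypothetical toy problem: walk along the integers 0..10 toward a goal at 7.
def toy_value(s: int) -> float:
    return -abs(s - 7)  # higher is better; peaks at the goal


def toy_actions(s: int) -> list:
    return [a for a in (-1, +1) if 0 <= s + a <= 10]  # stay on the line


def toy_step(s: int, a: int) -> int:
    return s + a


pi = greedy_policy(toy_value, toy_actions, toy_step)
print(pi(3), pi(9))  # 1 -1
```

If `toy_value` were wrong, say peaked at the wrong square, the greedy policy would faithfully walk to the wrong place: the lookahead adds no knowledge the value function lacks.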