Hacker News new | past | comments | ask | show | jobs | submit login

Can you clarify why variance only scales linearly in the duration you are optimizing over? I would have expected it to be exponential, since the size of the space you are searching is exponential in the duration.



Re variance, the argument is not entirely bullet proof, but it goes like this: we know that the variance of the gradient of ES grows linearly with the dimensionality of the action space. Therefore, the variance of the policy gradient (before backprop through the neural net) should similarly be linear in the dimensionality of the combined action space, which is linear in the time horizon. And since backprop through a well-scaled neural net doesn't change the gradient norm too much, the absolute gradient variance of the policy gradient should be linear in time horizon also.

This argument is likely accurate in the case where exploration is adequately addressed (for example, with a well chosen reward function, self play, or some kind of an exploration bonus). However, if exploration is truly hard, then it may be possible for the variance of the gradient to be huge relative to the norm of the gradient (which would be exponentially small), even though the absolute variance of the gradient is still linear in the time horizon.



That makes sense, thanks for clarifying!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: