
Evolved Policy Gradients - gdb
https://blog.openai.com/evolved-policy-gradients/
======
edhu2017
"EPG takes a step toward agents that are not blank slates but instead know
what it means to make progress on a new task, by having experienced making
progress on similar tasks in the past." Can someone explain to me how they
take a step? It seems like they just use random search to define a loss function
for the sub-policy to optimize against. Is it because the loss function is
"learned" over the sequence of actions, making it adaptive?

------
twtw
TL;DR:

Parametrize your loss function and wrap a normal policy optimization with a
random search to find a better loss function. Don't call it "random search,"
call it "evolution strategies" to make it sound sophisticated.

Neat idea.
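That two-level structure can be sketched with a toy example. Everything here is made up for illustration (not the paper's setup): a scalar "policy" θ, a quadratic "learned loss" parametrized by φ, and a quadratic true return the inner loop never sees directly.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_return(theta):
    # The environment's reward; the inner loop never differentiates this.
    return -(theta - 3.0) ** 2

def inner_train(phi, steps=50, lr=0.2):
    # Inner loop: ordinary gradient descent on the *parametrized* loss
    # L(theta; phi) = (theta - phi)^2, whose gradient is 2 * (theta - phi).
    theta = 0.0
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - phi)
    return theta

# Outer loop: evolution strategies (i.e. smoothed random search) over the
# loss parameter phi, scored by the true return of the trained policy.
phi, sigma, alpha, pop = 0.0, 0.5, 0.1, 20
for _ in range(100):
    eps = rng.standard_normal(pop)
    returns = np.array([true_return(inner_train(phi + sigma * e)) for e in eps])
    # ES gradient estimate: correlate returns with the sampled noise.
    phi += alpha / (pop * sigma) * float(np.sum(returns * eps))

print(f"learned loss parameter phi = {phi:.2f}")
```

In this toy, the outer search drives φ toward 3.0, the maximizer of the true return, even though the inner optimizer only ever sees the surrogate loss. That's the whole trick: the "learned" part is the loss, and the ES wrapper is what learns it.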

------
yohann305
Would someone here know how to go about building a physics sandbox with a
virtual robot arm and cubes in a game engine like Unity/UE4, where we'd be
able to apply ML?

Any suggestion is welcome

~~~
mikepurvis
You certainly could, and gamedev tools are better than ever at modelling
real-world physics. However, I would submit that if you want to simulate a
robotic arm, it would be better to use tools designed specifically for that
purpose. There are lots of reasonable-fidelity simulators of real robot arms
that work with Gazebo, and by using the larger ROS ecosystem you can also
process simulated camera or depth-camera data using standard pipelines, which
will also be truer to life than Unity.

See for example:
[http://sdk.rethinkrobotics.com/intera/Gazebo_Tutorial](http://sdk.rethinkrobotics.com/intera/Gazebo_Tutorial),
[http://wiki.ros.org/ur_gazebo](http://wiki.ros.org/ur_gazebo),
[http://wiki.ros.org/katana_gazebo_plugins](http://wiki.ros.org/katana_gazebo_plugins)

