
Reinforcement Learning for Improving Agent Design - hardmaru
https://designrl.github.io
======
hardmaru
Hi,

TL;DR Usually in reinforcement learning, the body designs of robot agents are
fixed. Here, the agent can learn a task, and also learn a better design of its
body at the same time.

Link to the blog article:

[https://designrl.github.io/](https://designrl.github.io/)

(There is a blooper section at the very bottom of the article so check it
out!)

There's also some active discussion about this article on Twitter:

[https://twitter.com/hardmaru/status/1049817185055436800](https://twitter.com/hardmaru/status/1049817185055436800)

Some background:

I have been meaning to write up this fun little experiment for a while now,
after working on previous articles related to Reinforcement Learning (RL) and
Evolution Strategies (ES) that you may have seen in a previous HN
discussion[0].

While I was trying to solve a few RL/ES problems, I found the
BipedalWalkerHardcore[1] task extremely frustrating (actually much harder than
the standard MuJoCo tasks that most papers are based on), but after some
effort I was eventually able to crack it[2]. It made me think that perhaps the
agent's body was simply not well suited to the task, and that even minor
tweaks here and there would make it easier for an RL algorithm to learn a good
set of parameters for the agent's controller neural network.

There has been an exciting line of work on Passive Robotics, where researchers
such as Tad McGeer[3] and Steve Collins[4] built walking robots that walk
naturally on their own without any external power, unlike complicated,
inefficient robots like Asimo[5], which has motors at every joint, all managed
by a central computer. In some ways, many standard RL tasks follow the Asimo
model: we train a neural network to control a fixed, pre-determined robot. I
thought it might make for an interesting little experiment to let the RL
algorithm learn not only the parameters of the neural network controller, but
also, at the same time, a set of parameters that describe the structure of the
agent's body.
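To make the idea concrete, here is a toy sketch (my own illustration, not the
article's actual code): concatenate the body-design parameters and the
controller parameters into a single vector, and optimize them jointly with a
simple population-based evolution strategy. The reward function below is a
made-up stand-in for a real simulated rollout.

```python
import numpy as np

def episode_return(params):
    """Stand-in for a simulated rollout. The first two entries play the
    role of body-design parameters, the rest are controller weights; a
    real version would rebuild the agent's body from the body params and
    then run the policy in the environment."""
    body, policy = params[:2], params[2:]
    return -np.sum((body - 0.5) ** 2) - np.sum((policy - 1.0) ** 2)

def train(n_body=2, n_policy=4, pop=64, sigma=0.1, lr=0.01, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    # Body and controller parameters live in ONE vector, so the same
    # optimizer improves both at once.
    mu = np.zeros(n_body + n_policy)
    for _ in range(iters):
        eps = rng.standard_normal((pop, mu.size))
        rewards = np.array([episode_return(mu + sigma * e) for e in eps])
        # Standardize rewards, then take the usual ES gradient step.
        f = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        mu += lr / (pop * sigma) * eps.T @ f
    return mu

mu = train()
print(episode_return(np.zeros(6)), episode_return(mu))
```

The point of the sketch is just that nothing in the optimizer needs to know
which coordinates are "body" and which are "controller"; the division of
labor only exists inside the rollout.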

We also see work that uses evolution, such as Strandbeests[6], Karl Sims'
virtual creatures[7], and Soft Robots[8], where novel morphologies are
discovered (Josh Bongard teaches an excellent course on evolutionary
robotics[9]). While RL is great at many problems, I feel it is limited at
discovering novel structures, although there have been recent attempts[10].
At the same time, RL is much more sample-efficient at searching within a
pre-defined design space, which is what this article explores, starting with
only the simplest of RL algorithms. Hopefully it will spark more life and
discussion around morphology learning and generative design in the RL
community.

Any feedback welcome!

[0]
[https://news.ycombinator.com/item?id=15680180](https://news.ycombinator.com/item?id=15680180)

[1]
[https://gym.openai.com/envs/BipedalWalkerHardcore-v2/](https://gym.openai.com/envs/BipedalWalkerHardcore-v2/)

[2] [http://blog.otoro.net/2017/11/12/evolving-stable-strategies/](http://blog.otoro.net/2017/11/12/evolving-stable-strategies/)

[3]
[https://www.youtube.com/watch?v=WOPED7I5Lac](https://www.youtube.com/watch?v=WOPED7I5Lac)

[4]
[https://www.youtube.com/watch?v=e2Q2Lx8O6Cg](https://www.youtube.com/watch?v=e2Q2Lx8O6Cg)

[5] [https://en.wikipedia.org/wiki/ASIMO](https://en.wikipedia.org/wiki/ASIMO)

[6] [http://www.strandbeest.com](http://www.strandbeest.com)

[7] [https://www.karlsims.com/evolved-virtual-creatures.html](https://www.karlsims.com/evolved-virtual-creatures.html)

[8] [http://www.evolvingai.org/soft-robots](http://www.evolvingai.org/soft-robots)

[9] [https://goo.gl/tz6gCG](https://goo.gl/tz6gCG)

[10]
[https://openreview.net/forum?id=r1Ue8Hcxg](https://openreview.net/forum?id=r1Ue8Hcxg)

~~~
formalsystem
Was wondering if you're also releasing your slightly modified version of the
OpenAI Gym environment that lets you tweak the agent's body? I hacked
something together for myself but am wondering if you have a cleaner solution.

~~~
hardmaru
I’ll be releasing the code to reproduce the experiments soon.

~~~
formalsystem
Looking forward to contributing!

