
A Robot Leg Learned to Walk by Itself Without Programming in Scarily Short Time - known
https://www.sciencealert.com/this-robot-leg-learned-to-walk-all-by-itself-without-any-programming
======
jimfleming
The paper [0] is less vague than the article. To put this in terms of
reinforcement learning:

1. Sample actions from a random policy distribution (motor babbling).

2. Fit an inverse model to this data with supervised learning. An inverse
model maps the current observation and the next observation to the action
that produced the transition: f(s_t, s_{t+1}) -> a_t

3. Use reinforcement learning to fit a policy that proposes a next
observation closer to a goal: p(s_t) -> s_{t+1}

4. Collect new data from attempts in which the policy and inverse model work
together, and use it to continue training the inverse model.
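
To make the four steps concrete, here is a rough toy sketch, not the paper's
implementation. Everything in it is an assumption for illustration: the 1-D
dynamics in `step`, the hidden `GAIN`, the linear inverse model, and
especially `propose_next_state`, which is a hand-written stand-in for the
policy that step 3 would actually learn with reinforcement learning.

```python
import random

GAIN = 0.5  # hidden actuator gain of the toy system (assumed; the learner never reads it)

def step(state, action):
    """Toy 1-D dynamics (assumed): the action nudges the state."""
    return state + GAIN * action

def babble(n_samples, rng):
    """Step 1: motor babbling, i.e. record transitions under random actions."""
    data, state = [], 0.0
    for _ in range(n_samples):
        action = rng.uniform(-1.0, 1.0)
        next_state = step(state, action)
        data.append((state, next_state, action))
        state = next_state
    return data

def fit_inverse_model(data):
    """Step 2: least-squares fit of a_t ~ w * (s_{t+1} - s_t)."""
    num = sum((s1 - s0) * a for s0, s1, a in data)
    den = sum((s1 - s0) ** 2 for s0, s1, a in data)
    return num / den

def propose_next_state(state, goal, max_delta=0.2):
    """Stand-in for the step 3 policy: move a bounded amount toward the goal."""
    delta = max(-max_delta, min(max_delta, goal - state))
    return state + delta

rng = random.Random(0)
data = babble(200, rng)
w = fit_inverse_model(data)

state, goal = 0.0, 1.0
for _ in range(10):
    target = propose_next_state(state, goal)
    action = w * (target - state)             # inverse model: (s_t, s_{t+1}) -> a_t
    next_state = step(state, action)
    data.append((state, next_state, action))  # step 4: keep the on-policy transition
    state = next_state

w = fit_inverse_model(data)  # refit on babbling data plus on-policy data
```

In this toy the babbling data already pins down the inverse model, so the
refit changes little. On a real leg, the on-policy transitions would be the
ones concentrated near the states that matter for the task.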

Motor babbling is a quick way of generating data but it isn't particularly
efficient. The problem with taking random actions is that most of your data is
going to cover parts of the state space that aren't important for the task.
The addition of the policy allows biasing future attempts towards more useful
areas of the state space to continue training the inverse model.

For more examples of these ideas, this paper [1] also combines a forward
model with an inverse model to improve sample efficiency.

[0] [https://www.nature.com/articles/s42256-019-0029-0](https://www.nature.com/articles/s42256-019-0029-0)

[1] [https://arxiv.org/abs/1606.07419](https://arxiv.org/abs/1606.07419)

~~~
chroem-
They're essentially doing surrogate model optimization, but using a neural
network instead of Gaussian processes. This is the same way that the control
policy for the MIT Cheetah robot was created.

------
veryworried
Meh, after seeing a presentation on how genetic algorithms were used to figure
out the optimal walk for a dinosaur with a given skeletal structure, this
does not seem that impressive or scary, especially since we probably have way
more computing power for machine learning now than we did almost a decade ago.

No programming necessary: just throw in a bunch of variable settings for bone
sizes, lengths, and weights, spawn hundreds of dinosaurs firing random muscle
movements, and in each generation breed only the ones that manage to walk the
farthest by pure luck, until thousands of generations later you get
descendants that are very good at walking.
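
A minimal sketch of that select-and-breed loop, under heavy assumptions:
`distance_walked` is a hypothetical stand-in for a physics simulation, the
genome is just a list of numbers, and "breeding" here is mutation of the
survivors only, with no crossover.

```python
import random

GENOME_LEN = 8  # number of "muscle" parameters per individual (assumed)

def distance_walked(genome):
    """Hypothetical fitness in place of a simulator; best when every gene is 0.5."""
    return -sum((g - 0.5) ** 2 for g in genome)

def mutate(genome, rng, sigma=0.1):
    """Copy a parent and add small Gaussian noise to every gene."""
    return [g + rng.gauss(0.0, sigma) for g in genome]

def evolve(generations=100, pop_size=50, n_parents=10, seed=0):
    rng = random.Random(seed)
    # Start from individuals with random "muscle" settings.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(GENOME_LEN)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=distance_walked, reverse=True)
        parents = population[:n_parents]  # keep only the farthest walkers
        history.append(distance_walked(parents[0]))
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - n_parents)]
        population = parents + children   # survivors carry over unchanged
    return max(population, key=distance_walked), history

best, history = evolve()
```

Because the survivors carry over unchanged, the best score in `history` never
gets worse from one generation to the next.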

~~~
mc32
Yeah, here are genetic algorithms in action back in 1994:
[https://www.youtube.com/watch?v=bBt0imn77Zg](https://www.youtube.com/watch?v=bBt0imn77Zg)

~~~
nck4222
Looks like a lot of those would benefit from a little more evolving.
Apparently the number of generations varied from 50 to 100. This paper is
really interesting, and shows how detailed the simulation is:
[http://www.karlsims.com/papers/siggraph94.pdf](http://www.karlsims.com/papers/siggraph94.pdf)

------
wpasc
Can we (HN community) please stop upvoting sciencealert garbage? They never
cease to post utterly sensationalist pieces that always have clickbait titles
and never accurately represent the information.

