
‘Deep learning’ technique enables robot mastery of skills via trial and error - joshwa
http://newscenter.berkeley.edu/2015/05/21/deep-learning-robot-masters-skills-via-trial-and-error/
======
karpathy
Sometimes it's hard to separate signal from noise when you're not part of a
field and just hearing about projects/papers, so I wanted to quickly pitch in
to say that this is a legitimately ground-breaking approach and line of work
that you can expect to hear much more about in the future. It's probably the
most exciting robotics/manipulation project I'm currently aware of.

What's exciting here is that the entire system is trained end-to-end
(including the vision component). In other words, it's heading towards
agents/robots that consist entirely of a single neural net and nothing else:
there is no software stack at all, just a GPU running a neural net "code
base", from perception to actuators. In this respect the work is similar to
the Atari game-playing agent, which has to learn to see while also learning to
play the game. Except this setting is quite a lot more difficult in some
respects: the actions in the DeepMind Atari paper are few and discrete, while
here the robot is an actual physical system with a high-dimensional,
continuous action space (joint torques). Also, if you're new to the field you
might think "why is the robot so slow?", while someone in the field is
thinking "holy crap, how can it be so fast?"
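To make "a single neural net from perception to actuators" concrete, here is a toy numpy sketch of the pixels-to-torques idea: a crude "vision layer" turns an image into a feature vector, and a motor layer maps those features to joint torques. Every size, filter, and the single linear motor layer here is invented for illustration; the actual system is a deep CNN (with a spatial softmax over feature points) trained end-to-end, not this.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_features(image, filters):
    """Very crude 'vision layer': correlate each filter with the image
    and average the responses over all positions into one number."""
    feats = []
    fh, fw = filters.shape[1:]
    for f in filters:
        resp = 0.0
        # naive valid-mode cross-correlation, averaged over positions
        for i in range(image.shape[0] - fh + 1):
            for j in range(image.shape[1] - fw + 1):
                resp += np.sum(image[i:i + fh, j:j + fw] * f)
        feats.append(resp / ((image.shape[0] - fh + 1) * (image.shape[1] - fw + 1)))
    return np.array(feats)

# hypothetical sizes: a 16x16 grayscale camera image, 4 filters, 7 joint torques
filters = rng.standard_normal((4, 3, 3)) * 0.1
W = rng.standard_normal((7, 4)) * 0.1   # motor layer: features -> torques

def policy(image):
    # the whole "software stack": image in, one torque per motor out
    return W @ conv_features(image, filters)

torques = policy(rng.standard_normal((16, 16)))
print(torques.shape)   # (7,)
```

The point is only the shape of the mapping: camera pixels go in one end, joint torques come out the other, and everything in between is learned weights.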

~~~
tormeh
Neuroscience suggests that the brain has many fixed-purpose computational
units, some neural nets and others not. While doing everything in a single
network is indeed impressive, I am concerned about whether this is the most
efficient approach.

EDIT: Downvoter, care to explain the reason for downvoting?

~~~
zeidrich
Why are you concerned? It's an approach, and it seems to work pretty well.
It's possible it could be improved, but does that mean the results shouldn't
be shown because the approach might not be the most efficient one?

I didn't downvote, but it's tiresome to hear "Yeah, that's all cool and
everything, but I'm concerned that it might not be the absolute best." Nobody
claimed it was the most efficient approach ever possible, only that it was
cool, surprising, and reasonably ground-breaking.

You can't really tell if it's the most efficient approach until you try it,
and compare it with a model that you think might be a more efficient approach.

~~~
tormeh
I said it was impressive and I meant it.

------
deepnet
Learning motor torques directly from vision is a very important result.

[https://youtu.be/EtMyH_--vnU](https://youtu.be/EtMyH_--vnU)

This talk by Sergey Levine, Pieter Abbeel's postdoc, outlines Berkeley's end-
to-end training of deep visuomotor policies in detail.

Here is the paper :

End-to-End Training of Deep Visuomotor Policies, Sergey Levine*, Chelsea
Finn*, Trevor Darrell, Pieter Abbeel.

[http://arxiv.org/abs/1504.00702](http://arxiv.org/abs/1504.00702)

~~~
deepnet
Also

Learning Contact-Rich Manipulation Skills with Guided Policy Search, by
Levine, Wagener, and Abbeel:

[http://rll.berkeley.edu/icra2015gps/robotgps.pdf](http://rll.berkeley.edu/icra2015gps/robotgps.pdf)

[http://rll.berkeley.edu/icra2015gps/](http://rll.berkeley.edu/icra2015gps/)

------
mark_l_watson
I probably made a career direction error in the early 1990s. I had been on
DARPA's neural network tools advisory panel and written the SAIC Ansim
product, but moved on because of a stronger interest in natural language
processing. Now, I think deep learning is getting very interesting for NLP.

This UCB project looks awesome!

BTW, I took Hinton's Coursera neural network class a few years ago, and it was
excellent. Take it if that course is still online.

------
dm3
Could someone explain in simple terms how the target is set for the robot so
that it can learn to accomplish the task? For example, what inputs are
provided in order for it to understand that it needs to put the cap on the
bottle?

~~~
deepnet
The robot learns to see its arm and the target using an ImageNet-trained CNN.

The robot is then shown the task a few times. A human controls the robot for a
few minutes, performing the task.

Then an innovative policy search finds a policy that lets the robot perform
the task from any initial position and that is robust to changes, such as
adding a shirt to the hanger task after training.

Potentially the robot can learn from videos of humans performing the task -
i.e. by copying people.
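The policy-search step is the hard part to picture. As a very loose illustration, here is only the supervised half of the idea: fit a policy to demonstration data by regression. The linear policy, the made-up "expert" matrix, and the synthetic state/action pairs are all stand-ins for the real neural-net policy and robot trajectories; the actual method alternates a step like this with re-optimizing the guiding trajectories.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical demonstration data: states observed while a human
# teleoperates the arm, paired with the actions (torques) taken.
true_K = np.array([[0.5, -0.2],
                   [0.1,  0.8]])        # unknown expert behavior
states = rng.standard_normal((50, 2))
actions = states @ true_K.T + 0.01 * rng.standard_normal((50, 2))

# Supervised step: fit a (linear, for illustration) policy u = K s
# to the demonstrations by least squares.
K_fit, *_ = np.linalg.lstsq(states, actions, rcond=None)
K_fit = K_fit.T

print(np.allclose(K_fit, true_K, atol=0.05))  # True: policy matches the demos
```

With only a few minutes of demonstration the regression target is noisy and local, which is exactly why the search that follows, rather than the imitation itself, does the heavy lifting.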

~~~
your_ai_manager
This is some truly impressive work. I would expect the next step is to have
the robot 'guess' what a new task is based on its similarity to previously
completed tasks.

E.g. when given wood with a protruding nail + hammer, it relates the task to
a previously trained Whac-A-Mole scenario and begins hammering the nail in.
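One very loose sketch of that kind of task matching: embed each known task as a feature vector (the numbers below are hand-made; in practice they might be pooled activations from the policy's vision net) and pick the most similar known task by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical task embeddings for previously learned skills
library = {
    "whac_a_mole": [0.9, 0.1, 0.8],   # strike a protruding target
    "bottle_cap":  [0.1, 0.9, 0.2],   # twist to fasten
}
new_task = [0.8, 0.2, 0.7]            # wood + protruding nail + hammer

best = max(library, key=lambda name: cosine(library[name], new_task))
print(best)   # whac_a_mole
```

Nothing in the paper does this; it is just one plausible mechanism for the 'guess by similarity' step the comment imagines.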

------
jonnycowboy
It seems most of the code behind this effort is open source as well!
[http://lfd.readthedocs.org/en/latest/](http://lfd.readthedocs.org/en/latest/)
[https://github.com/cbfinn/caffe](https://github.com/cbfinn/caffe)

------
alkonaut
Wouldn't this benefit from simulation of the task (from the robot's
perspective)? Doing something physical over and over again on ONE single robot
must be very slow and inefficient compared with simulation. Even if the
simulated training isn't spot on, the physical robot could start off with
network weights from millions of attempts in a simulated environment.
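As a toy illustration of that warm-start idea: 'train' a one-parameter controller by gradient descent against a cheap simulator, then fine-tune it on a 'real' system whose dynamics differ slightly. The cost function and the gains are invented; the point is only that simulated practice leaves the real system far less work to do.

```python
def run_gd(dynamics_gain, w0, steps):
    """Fit a 1-D 'reach the target' controller by gradient descent.
    Cost: (gain * w - target)^2, a stand-in for physical rollouts."""
    target, lr = 1.0, 0.1
    w = w0
    for _ in range(steps):
        grad = 2 * dynamics_gain * (dynamics_gain * w - target)
        w -= lr * grad
    return w

sim_gain, real_gain = 1.0, 1.1   # the simulator is close but not exact

w_sim = run_gd(sim_gain, w0=0.0, steps=200)       # cheap simulated practice
w_scratch = run_gd(real_gain, w0=0.0, steps=5)    # 5 real trials, cold start
w_warm = run_gd(real_gain, w0=w_sim, steps=5)     # 5 real trials, warm start

err = lambda w: abs(real_gain * w - 1.0)
print(err(w_warm) < err(w_scratch))   # True: the warm start is much closer
```

The same logic is why sim-to-real pretraining is attractive: the physical trials are the expensive resource, so anything that reduces how many are needed wins.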

~~~
irl_zebra
I think that the efficiency of the specific task isn't actually the point
here, but rather how the robot is progressing on learning the task on its own.
In the future when this is put to practical application, I don't doubt that
what you're saying would be employed to make a robot's learning curve more
gentle.

------
beefman
I'm impressed it (apparently) learned to align screw caps with a short
backward turn at the start.

Then again, why do we make so many containers with these ungainly screw caps?
Ever use those caps (popular in Japan) with the locking track that only take a
quarter-turn to close? Examples:

[http://www.amazon.com/Yu-Be-Moisturizing-Skin-Cream-
Skin-1/d...](http://www.amazon.com/Yu-Be-Moisturizing-Skin-Cream-
Skin-1/dp/B0001UWRCI/)

[http://www.amazon.com/Biotene-PBF-Toothpaste-Ounce-
Pack/dp/B...](http://www.amazon.com/Biotene-PBF-Toothpaste-Ounce-
Pack/dp/B00JX73B2A)

------
adekok
While trial and error is how humans learn, there's more to human learning than
that. Babies are pre-wired to learn language, recognize shapes, determine
"intent", etc.

This means that the neural nets used by babies are pre-wired to be good at
specific tasks. Then, babies use those neural nets to do "deep learning" for
the final part of the process.

Starting from _nothing_ and learning how to do a job is a big step. But having
_something_ would be a better start position. What that something is, though,
is hard to define.

~~~
gldalmaso
>> _Babies are pre-wired to learn language, recognize shapes, determine
"intent", etc._

I tend to disagree. My perception (as a father of two, if it counts) is that
babies are very poorly wired, if wired at all.

They struggle with basic survival skills like breastfeeding. Some babies get
it in the first couple of days, others take weeks of "training" with the help
of adults. Awareness of needing sleep seems to be entirely absent (crying is
not the best strategy for an animal that needs sleep; a huge bug).

Things like language, shapes, and intent are all developed later, and can go
entirely undeveloped without stimulation and feedback, so I'd say they are
already a product of learning and not pre-wiring.

The only thing I can think of that is most certainly pre-wired is crying. They
nail that from day one.

Also, don't forget that they already have substantial sensory capabilities
several weeks before being born, and voice recognition, for one, is something
they learn around that time.

~~~
adekok
> babies are very poorly wired if at all.

As a father myself, I don't agree. I find it impossible to believe that babies
are wired _poorly_ , or randomly, or are just amorphous blobs of learning.
They're active and interactive from a very early age, even before they're
born.

Their brain is still growing connections and re-wiring itself based on
sensory input/feedback; e.g. blind people co-opt the visual centers to
process sound.

But there _is_ a vision center. There are portions of the brain which are pre-
wired to be good at certain activities.

If nothing else, look at the inputs. The nerves from the retina and ears go
_somewhere_. They don't just disappear into random parts of the brain. They're
pre-wired to certain areas. Those areas are in turn pre-wired to be good at
accepting certain inputs.

In contrast, many animals have much more hard-wired behavior. And insects are
little automatons. Are we really going to say that animals are pre-wired
with... nothing? And that they learn all of their behavior after they're born?

I find that even harder to believe than the idea that the brain is pre-wired
to be good at some things.

------
lowglow
If you're interested in this, I'm putting together a meetup/workshop/lab at
the Palace of Fine Arts in SF every weekend. Come out and share, learn, and
build with other people interested in this field.

Think of it as the Home Brew Computer Club for Robotics/AI :)

[https://www.facebook.com/groups/762335743881364/](https://www.facebook.com/groups/762335743881364/)

------
Qantourisc
It moves very organically. But I have to wonder if the organic motion is a
good thing. Wouldn't it be more efficient to control the arm using IK, and let
the robot "think" only about where the arm should be? I mean, I can easily
imagine a straight line, but I can't draw one.

This would also speed it up, imo, since some things can easily be solved using
regular algorithms. Our brains also come with some pre-wired functions.
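For reference, the IK being suggested really is cheap and closed-form for simple arms. A sketch for a planar 2-link arm (link lengths and the target are arbitrary; a real 7-DOF arm like PR2's would need numerical IK and redundancy resolution):

```python
import math

def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Analytic inverse kinematics for a planar 2-link arm:
    given a target (x, y), return joint angles (one elbow solution)."""
    r2 = x * x + y * y
    # law of cosines for the elbow angle
    c2 = (r2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))          # clamp to handle unreachable targets
    t2 = math.acos(c2)
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                       l1 + l2 * math.cos(t2))
    return t1, t2

def forward(t1, t2, l1=1.0, l2=1.0):
    """Forward kinematics: joint angles -> end-effector position."""
    x = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
    y = l1 * math.sin(t1) + l2 * math.sin(t1 + t2)
    return x, y

t1, t2 = two_link_ik(1.2, 0.5)
x, y = forward(t1, t2)
print(round(x, 6), round(y, 6))   # recovers the target: 1.2 0.5
```

The research bet in the paper is the opposite trade-off: skip hand-built modules like this and let the network discover its own motion strategies, at the cost of slower, more organic-looking movement.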

~~~
ansible
Yes. In the video, I was struck by how clumsy the robot was. Not clumsy in the
typical fashion of robots, but of young children.

Interesting research.

~~~
spiritplumber
Susan Calvin approves.

------
platz
There was recently a Talking Machines episode that included some discussion
(not apparent from the title) of the difficulties of modeling the world with
robots:

"We learn about the Markov decision process (and what happens when you use it
in the real world and it becomes a partially observable Markov decision
process)"

[http://www.thetalkingmachines.com/blog/2015/5/21/how-we-
thin...](http://www.thetalkingmachines.com/blog/2015/5/21/how-we-think-about-
privacy-and-finding-features-in-black-boxes)
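For readers new to the terminology: in an MDP the agent observes the true state and can plan with standard dynamic programming; in a POMDP it only gets noisy observations and must plan over beliefs, which is far harder. A tiny fully observed example (toy 2-state dynamics invented here), solved by value iteration:

```python
# Value iteration on a tiny deterministic 2-state MDP.
states, actions, gamma = [0, 1], [0, 1], 0.9

# P[s][a] = (next_state, reward): action 1 always pays 1 and
# leads to state 1; action 0 pays nothing and leads to state 0.
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 1.0)}}

V = [0.0, 0.0]
for _ in range(200):   # Bellman backups until (numerically) converged
    V = [max(P[s][a][1] + gamma * V[P[s][a][0]] for a in actions)
         for s in states]

# Optimal value is 1 / (1 - gamma) = 10 from both states.
print([round(v, 2) for v in V])   # [10.0, 10.0]
```

In the partially observable version the agent would not know which state it is in, so `V` would have to be defined over probability distributions ("beliefs") rather than the two states, which is what makes real-world robotics so much harder than this toy.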

------
rasz_pl
oh man, that bottle and shoe examples! :o

