‘Deep learning’ technique enables robot mastery of skills via trial and error (berkeley.edu)
220 points by joshwa 938 days ago | 50 comments

Sometimes it's hard to separate signal from noise when you're not part of a field and just hearing about projects/papers, so I wanted to quickly pitch in to say that this is a legitimately ground-breaking approach and line of work that you can expect to hear much more about in the future. It's probably the most exciting robotics/manipulation project I'm currently aware of.

What's exciting here is that the entire system is trained end-to-end (including the vision component). In other words, it's heading towards agents/robots that consist entirely of a single neural net and that's it; there is no software stack at all - it's just a GPU running a neural net "code base", from perception to actuators. In this respect this work is similar to the Atari game-playing agent that has to learn to see while also learning to play the game. Except this setting is quite a lot more difficult in some respects; in particular, the actions in the DeepMind Atari paper are few and discrete, while here the robot is an actual physical system with a high-dimensional, continuous action space (joint torques). Also, if you're new to the field you might think "why is the robot so slow?", while someone in the field is thinking "holy crap, how can it be so fast?"

Andrej -- thank you.[1]

What struck me the most is this number: 92,000. That's the total number of parameters in the neural net guiding the robot.

In other words, the robot learned to do this with a tiny toy neural net!


[1] Thank you not just for this helpful explanatory comment, but also for all the friendly explanatory lectures, presentations, blog posts and open-source code you have shared online over the past few years. You deserve recognition for it.

Here's the same guy (PR2) in action 7 years ago folding towels at 50x speed: https://www.youtube.com/watch?v=gy5g33S0Gzo

And I doubt it was trained. It seems to be following a predefined procedure instead of learning by itself.

It seems to have come a pretty long way if you ask me!

As one of the ex-leads on the PR2 laundry folding project, I can confirm that Pieter's group has come up with a framework that totally blows everything we used to do out of the water.

Any ideas on how the interaction shown here is directed, in other words what defines the 'goals' and what motivates the robot to perform the task?

I don't know enough robotics to follow the paper[1] entirely, but this seems the relevant part for your question:

We evaluated our method by training policies for hanging a coat hanger on a clothes rack, inserting a block into a shape sorting cube, fitting the claw of a toy hammer under a nail with various grasps, and screwing on a bottle cap. The cost function for these tasks encourages low distance between three points on the end-effector and corresponding target points, low torques, and, for the bottle task, spinning the wrist. The equations for these cost functions follow prior work.[23]

Reference 23 points at S. Levine, N. Wagener, and P. Abbeel. Learning contact-rich manipulation skills with guided policy search. In International Conference on Robotics and Automation (ICRA), 2015.

I haven't read that paper yet.
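To make the quoted cost description concrete, here is a minimal sketch of a per-timestep cost that rewards a small distance between three end-effector points and their targets and penalizes large torques. The function name, the weights, and the plain quadratic form are assumptions for illustration; the actual equations are in the cited prior work and are not reproduced here.

```python
def manipulation_cost(ee_points, target_points, torques,
                      w_dist=1.0, w_torque=1e-3):
    """Toy per-timestep cost (hypothetical form, not the paper's equations).

    ee_points / target_points: three (x, y, z) tuples each.
    torques: one value per joint.
    """
    # Sum of squared distances between each end-effector point and its target.
    dist_term = sum(
        sum((p - t) ** 2 for p, t in zip(point, target))
        for point, target in zip(ee_points, target_points)
    )
    # Quadratic torque penalty discourages large joint torques.
    torque_term = sum(u ** 2 for u in torques)
    return w_dist * dist_term + w_torque * torque_term
```

The cost is zero only when all three points sit on their targets with no torque applied, which is roughly what "the goal" means to the learner in this kind of setup.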

Interestingly, they initialise the visual learning model using the ImageNet images. Was it only 3 years ago that this was considered a pretty much intractable problem? Now the fact that a CNN can work on it well enough to be useful isn't even worth a complete sentence.

[1] http://arxiv.org/pdf/1504.00702v1.pdf

Thanks for confirming what my untrained SNR filters had flagged (seemingly somewhat correctly) as an incredible advance. The video was reminiscent of watching a child learn.

Of all the machine learning techniques, this one seems the best bet to be carried forward by Moore's law. If it currently takes about three hours to learn these simple tasks with no previous spatial data for the objects, then, as the article states, "In the next five to 10 years, we may see significant advances in robot learning capabilities through this line of work."

>video was reminiscent of watching a child learn

Yes! This story and yesterday's RNN one, with this comment: https://news.ycombinator.com/item?id=9584988 "It's like a child learning to talk"

make me believe we are very close to some kind of breakthrough using "boring" non-magical methods.

Your comment regarding the training of the NN to perceive and action the joint actuators reminded me of an old software project that's remotely related:


It's an ambitious project that wanted to "evolve" agents made of sticks + joints that ran in a 3D world with physics. The joints had signal inputs + activation functions that are similar to ANN activation functions.

Neuroscience suggests that the brain has many fixed-purpose computational units, some neural nets and others not. While doing everything in a single network is indeed impressive, I am concerned about whether this is the most efficient approach.

EDIT: Downvoter, care to explain the reason for downvoting?

This method actually does have fixed-purpose computational units if you will (that's why it's called guided policy search). During learning, it basically loops back and forth between a heuristic trajectory optimization (to keep a tight rein on the motor torques) and reinforcement learning with neural networks (to achieve high-level abstraction).
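The back-and-forth structure described here can be illustrated on a toy problem. This is not the paper's algorithm (which works on full trajectories with neural-network policies and a more careful constraint between the two parts); it is a hypothetical one-step, one-dimensional system x' = x + u with cost x'^2 + r*u^2, where a local optimizer is penalized for straying from the current policy and the policy is then refit to the optimizer's actions. All names and constants are assumptions.

```python
def local_optimizer(x0, k, r=0.1, rho=1.0):
    """Best action for one start state, kept close to the policy u = k * x0.

    Minimizes (x0 + u)**2 + r * u**2 + rho * (u - k * x0)**2 in closed form.
    """
    return x0 * (rho * k - 1.0) / (1.0 + r + rho)


def fit_policy(states, actions):
    """Supervised step: least-squares fit of a linear policy u = k * x."""
    num = sum(x * u for x, u in zip(states, actions))
    den = sum(x * x for x in states)
    return num / den


def alternate(states, iterations=50, r=0.1, rho=1.0):
    """Loop between local optimization and policy fitting; return the gain k."""
    k = 0.0  # initial policy does nothing
    for _ in range(iterations):
        actions = [local_optimizer(x, k, r, rho) for x in states]
        k = fit_policy(states, actions)
    return k
```

In this toy the gain settles at -1 / (1 + r), the action that is optimal without the penalty, so the penalty only shapes the path to the answer, not the answer itself.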

Why are you concerned? It's an approach. It seems to work pretty well. It's possible it could be improved; does that mean we should not display the results because they might not come from the most efficient approach?

I didn't downvote, but it's tiresome to hear "Yeah, that's all cool and everything, but I'm concerned that it might not be the absolute best." Nobody claimed it was the most efficient approach ever possible, only that it was cool, surprising, and reasonably groundbreaking.

You can't really tell if it's the most efficient approach until you try it, and compare it with a model that you think might be a more efficient approach.

I said it was impressive and I meant it.

The neocortex is over half of the human brain by mass or volume, and it plausibly uses a small number of distinct algorithms (believed by some to correspond very loosely to those of artificial neural networks) to learn and remember a vast diversity of knowledge and skills.

>some neural nets and others not.

Source for the brain having computational units that aren't neural nets? I'd love to read more on this.

"In vertebrates, inter-aural time differences are known to be calculated in the superior olivary nucleus of the brainstem. According to Jeffress,[1] this calculation relies on delay lines: neurons in the superior olive which accept innervation from each ear with different connecting axon lengths."
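The delay-line scheme in the quote is commonly modeled as coincidence detection, which amounts to finding the lag that maximizes the cross-correlation between the two ear signals. A minimal sketch, assuming discrete-time signals given as plain lists; the function name and the sample-based lag are illustrative assumptions, not a model of the actual neural circuit.

```python
def estimate_itd(left, right, max_lag):
    """Return the lag d (in samples) that maximizes sum_i left[i] * right[i + d].

    A positive result means the sound reached the right ear later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(left):
            j = i + lag
            # Each lag plays the role of one delay line feeding a coincidence detector.
            if 0 <= j < len(right):
                score += value * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For a click that appears at sample 2 in the left signal and sample 5 in the right, the best lag is 3 samples.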


How is that not a neural network? This system is basically exploiting the property of action potential propagation down the axon to capture relative timing differences.

As an aside, there are a number of useful properties seen in biological neural networks that aren't yet incorporated into multi-layer perceptrons. E.g. short-term plasticity, axonal delays, spiking neurons, etc. I expect that some of these will find their way into the MLP formulation when we can figure out an effective mathematical way of doing so.

I mean that it's not a neural network in the computational sense. Sorry about the terminology issue.

Thanks a lot for your comment.

Are there any prospects for integrating neural nets with more traditional code? For all their power, neural nets share the limitations of humans: their learned behaviors are approximate and not perfect. So for example if your deep learning robot wanted to play chess, it would be nice to have him switch to using a chess engine instead of learning to play chess. Or you might want to hardcode a few very efficient moves into your otherwise autonomous industrial robot.

Is that possible?

The core problem with the "just a neural net" idea is how to tell a robot what it is supposed to do; the whole AI currently relies on some sort of external harness either training the model or pushing the control algorithm into a desired operation. You may implement something very thin like artificial pain or hunger, but the odds that it will push the system out of a metastable stupor are likely negligible.

Do you see a plausible architecture for long term planning? Putting money in my 401(k) every month to get it a few decades from now isn't something that's accessible to Q-learning, direct policy search or forward simulations. How do we learn hierarchical / semantic representations of our own actions?

I think a similar problem needs to be solved in the language generation neural networks. While generated sentences are often syntactically correct, they don't make a lot of sense -- there's no overarching thought that it is trying to express. This is even more apparent when you read the subsequent sentences generated. It's clear there's no conceptual goal it's trying to convey.

In principle, adding more depth to the network can take care of that. If you are keeping track of higher and higher level abstractions, you can remember what you were saying.

But you can't remember everything, so ideally you'd also have an attention model that's capable of looking back at what you've written and making edits. When I balance parentheses, I don't maintain a counter of how many parentheses are open, I look back at what I've written and count them.
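The two strategies in this comment can be contrasted in a few lines. A minimal sketch with hypothetical function names: the first carries a running count as it goes (loosely, what a recurrent net's hidden state would have to hold), while the second keeps no state and re-reads what has been written (loosely, what an attention mechanism would allow).

```python
def open_count_streaming(text):
    """Track open parentheses with a running counter, one character at a time."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return depth


def open_count_lookback(text):
    """Keep no state; look back over everything written so far and recount."""
    return text.count("(") - text.count(")")
```

Both return the number of parentheses still to be closed; they differ only in whether the information lives in memory or in the text itself.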

Is the robot possible with the RNN you wrote about earlier?

Would love to see a YouTube channel describing these deep learning / machine learning / AI subjects.

Learning motor torques directly from vision is a very important result.


This talk by Sergey Levine, Pieter Abbeel's postdoc, outlines Berkeley's end-to-end deep training for visuomotor control in detail.

Here is the paper:

End-to-End Training of Deep Visuomotor Policies, Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel.



Learning Contact-Rich Manipulation Skills with Guided Policy Search, Levine, Wagener, Abbeel.



Please keep posting!

I probably made a career direction error in the early 1990s. I had been on DARPA's neural network tools advisory panel and written the SAIC Ansim product, but moved on because of a stronger interest in natural language processing. Now, I think deep learning is getting very interesting for NLP.

This UCB project looks awesome!

BTW, I took Hinton's Coursera neural network class a few years ago, and it was excellent. Take it if that course is still online.

Could someone explain in simple terms how the target is set for the robot so that it can learn to accomplish the task? For example, what inputs are provided in order for it to understand that it needs to put the cap on the bottle?

The robot learns to see its arm and the target using an ImageNet-trained CNN.

The robot is then shown the task a few times. A human controls the robot for a few minutes, performing the task.

Then an innovative policy search finds a robust policy so the robot can perform the task from any initial position and is robust to changes such as the addition of a shirt to the hanger task after training.

Potentially the robot can learn from videos of humans performing the task - i.e. by copying people.

This is some truly impressive work. I would expect the next step is to have the robot 'guess' what a new task is based on its similarity to previously completed tasks.

E.g. when given wood with a protruding nail + hammer, it relates the task to a previously trained Whac-A-Mole scenario and begins hammering the nail in.

It seems most of the code behind this effort is open source as well! http://lfd.readthedocs.org/en/latest/ https://github.com/cbfinn/caffe

Wouldn't this benefit from simulation of the task (from the robot's perspective)? Doing something physical over and over again on ONE single robot must be very slow and inefficient compared to if it could be simulated. Even if the simulated training isn't spot on, the physical robot could start off with network weights from millions of attempts in a simulated environment.

I think that the efficiency of the specific task isn't actually the point here, but rather how the robot is progressing on learning the task on its own. In the future when this is put to practical application, I don't doubt that what you're saying would be employed to make a robot's learning curve more gentle.

I'm impressed it (apparently) learned to align screw caps with a short backward turn at the start.

Then again, why do we make so many containers with these ungainly screw caps? Ever use those caps (popular in Japan) with the locking track that only take a quarter-turn to close? Examples



While it is how humans learn, there's more to human learning than that. Babies are pre-wired to learn language, recognize shapes, determine "intent", etc.

This means that the neural nets used by babies are pre-wired to be good at specific tasks. Then, babies use those neural nets to do "deep learning" for the final part of the process.

Starting from nothing and learning how to do a job is a big step. But having something would be a better start position. What that something is, though, is hard to define.

>> Babies are pre-wired to learn language, recognize shapes, determine "intent", etc.

I tend to disagree. My perception (as a father of two, if it counts) is that babies are very poorly wired if at all.

They struggle with basic survival skills like breastfeeding. Some babies get it in the first couple of days, others take weeks of "training" with the help of adults. Awareness of needing sleep seems to be entirely absent (crying is not the best strategy for animals to sleep, huge bug).

Things like language, shapes and intent, are all developed later, and can go entirely undeveloped without stimulation and feedback, so I'd say they are already a product of learning and not pre-wiring.

The only thing I can think of that is most certainly pre-wired is crying. They nail that from day one.

Also don't forget that they are already capable of a lot of sensing several weeks before being born, and voice recognition, for one thing, is something they learn around that time.

> babies are very poorly wired if at all.

As a father myself, I don't agree. I find it impossible to believe that babies are wired poorly, or randomly, or just are amorphous blobs of learning. They're active and inter-active from a very early age. Even before they're born.

Their brain is still growing connections and re-wiring itself based on sensory input / feedback, e.g. blind people co-opt the vision centers to process sound.

But there is a vision center. There are portions of the brain which are pre-wired to be good at certain activities.

If nothing else, look at the inputs. The nerves from the retina and ears go somewhere. They don't just disappear into random parts of the brain. They're pre-wired to certain areas. Those areas are in turn pre-wired to be good at accepting certain inputs.

In contrast, many animals have much more hard-wired behavior. And insects are little automatons. Are we really going to say that animals are pre-wired with... nothing? And that they learn all of their behavior after they're born?

I find that even harder to believe than the idea that the brain is pre-wired to be good at some things.

With single-bit output (OK, maybe 4 if you count volume), multiple confusing stimuli as input, and barely functional actuators, you should be amazed at how well babies manage to condition their parents into acting like parents right from the get-go!

Breastfeeding is a tough process. How long does it take a fighter pilot to learn how to properly dock to a supply plane for mid-air refuelling?

That's a complicated procedure with lots of bits and pieces that need to work just-so for milk delivery to take place, and quite often it is not just the infant that needs to learn.

Prewired doesn't necessarily mean wired at birth. Human development is largely genetically prescribed. With regards to language, most neuroscientists tend to agree that we have some innate capacity to acquire language very fast after a certain age, and assume that this is at least to a large part genetic. http://en.wikipedia.org/wiki/Language_module

> Awareness of needing sleep seems to be entirely absent

Indeed. And this can persist into adulthood...

Even then, I think the pre-wiring only benefits us humans in terms of efficiency at learning and doesn't change the learning process itself that much.

If you're interested in this, I'm putting together a meetup/workshop/lab at the Palace of Fine Arts in SF every weekend. Come out and share, learn, and build with other people interested in this field.

Think of it as the Homebrew Computer Club for Robotics/AI :)


It acts very organic. But I have to wonder if the organic motion is a good thing. Wouldn't it be more efficient to control the arm using inverse kinematics (IK), but let the robot "think" about where the arm should be? I mean, I can easily imagine a straight line, but I can't draw one.

This would also speed it up imo, since some things can easily be solved using regular algorithms. Our brains also come with some pre-wired functions.
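For readers unfamiliar with IK, this is the kind of "regular algorithm" being suggested: given where the hand should be, solve for the joint angles directly. A minimal sketch for a hypothetical planar arm with two links; the PR2 arm has more joints and works in 3D, so this is only a toy illustration, and the function names are assumptions.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Return (shoulder, elbow) angles in radians that place the tip at (x, y).

    Picks the solution with a non-negative elbow angle; a mirrored one also exists.
    """
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle from the target distance.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow


def forward(shoulder, elbow, l1, l2):
    """Forward kinematics, used here to check the IK solution."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y
```

With this split, a learned component would only have to output a target position, and the closed-form solver would turn it into joint angles.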

I think one lesson from the deep learning breakthrough in computer vision a few years ago was that deep nets can actually outperform humans in the engineering of features, and what you are suggesting is basically feature engineering in the output space.

After watching another video posted by deepnet (https://youtu.be/EtMyH_--vnU) there seems to be merit to this method.

Yes. In the video, I was struck by how clumsy the robot was. Not clumsy in the typical fashion of robots, but of young children.

Interesting research.

Susan Calvin approves.

There was recently a Talking Machines episode that included some information (not apparent in the title) about the difficulties of modeling the world with robots.

"We learn about the Markov decision process (and what happens when you use it in the real world and it becomes a partially observable Markov decision process)"


oh man, those bottle and shoe examples! :o
