I find that convincing, and a key insight. RL is going to be fundamental to AGI, but he's saying curiosity / unsupervised learning will also be necessary. And I say this as a big believer in the need for more work on RL.
This may seem like a naive question, but it's sincere: What makes a scalar reward less effective at modifying a Q function than a scalar error that's used in backprop and assigned to a neural network's coefficients?
reinforcement learning: (after the millions of operations you just performed) out of 1000 labels you predicted #136. That's not right, but I won't tell you what you should have done. Also, it could have been right, but maybe you screwed up something in one of your last 100 predictions and I won't tell you which one. Good luck.
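To make that contrast concrete, here's a minimal numpy sketch of the two feedback signals (the label numbers are hypothetical, just for illustration):

```python
import numpy as np

# --- Supervised learning: the error is a full gradient. ---
# Cross-entropy against the known true label tells every logit (and,
# via backprop, every weight) exactly which way to move, and by how much.
rng = np.random.default_rng(0)
logits = rng.normal(size=1000)                 # scores for 1000 classes
probs = np.exp(logits) / np.exp(logits).sum()  # softmax
true_label = 42                                # hypothetical correct answer
grad = probs.copy()
grad[true_label] -= 1.0   # d(cross-entropy)/d(logits): 1000 informative numbers

# --- Reinforcement learning: the feedback is one scalar. ---
# After a whole sequence of actions you get a single number back, with
# no pointer to which action (let alone which weight) deserves the blame.
reward = 0.0              # "that's not right" -- and that's all you learn
```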
Take the moment he decided to build a rocket. It was after he had a difficult meeting trying to buy a Russian missile. He quickly roughed out, from first principles, that he should be able to build one himself. The /drive/ to make that analysis was in pursuit of his goal (and required reinforcement learning).
But what lifelong process filled his brain with all the complex and nuanced information needed to rapidly draw that conclusion? His curiosity, aka unsupervised learning, which led him to learn so many things over the years that culminated in that moment.
For this reason, I might make the analogy that RL is the "conductor of the symphony", rather than calling it the "cherry on top" as LeCun does here.
EDIT: follow-up question: to what extent does DeepMind's deep Q-learning (DQN) address the problem of associating remote causes with final outcomes?
In the black-box RL setting, you only see whether what you did was right or wrong, not what the right thing to do would have been. And unlike a classification system, where the output space is relatively small (ImageNet has 1000 classes), an RL agent is searching over an exponentially large space of possible trajectories. That means that without some additional source of supervision you can spend a long time wandering in the wilderness with no idea whether what you're doing is reasonable or how to get any reward at all. And when you do get some reward, you have no idea which of the possibly hundreds or thousands of actions you took deserves the credit.
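To make the credit-assignment point concrete, here's a minimal sketch (pure Python, hypothetical episode) of how a vanilla REINFORCE-style update smears a single terminal reward over every action in a trajectory via discounted returns:

```python
# Minimal sketch of the credit-assignment problem (hypothetical episode).
# One terminal reward arrives after 200 steps; with nothing else to go
# on, a REINFORCE-style update credits every action with the discounted
# return that followed it -- including actions that had nothing to do
# with the outcome.
gamma = 0.99
rewards = [0.0] * 199 + [1.0]   # sparse: silence for 199 steps, then +1

returns, G = [], 0.0
for r in reversed(rewards):     # G_t = r_t + gamma * G_{t+1}
    G = r + gamma * G
    returns.append(G)
returns.reverse()

# Every action's update is scaled by the return that came after it; the
# very first action still gets 0.99**199 ~= 0.14 of the credit for a
# reward it may not have caused. Separating cause from coincidence takes
# many, many episodes.
print(returns[0], returns[-1])  # ~0.135 ... 1.0
```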
A lot of recent RL research is about finding additional sources of supervision, such as training an agent to mimic the policy of an "expert" (e.g., a search algorithm that runs in a simulator to find an optimal solution, but which requires too much computation to actually apply directly at test time, as in http://arxiv.org/abs/1504.00702), or coming up with proxy objectives like "empowerment" or "curiosity" (which you can define in terms of information-theoretic quantities, such as the mutual information between the agent's action sequence and future state, e.g. http://arxiv.org/abs/1509.08731) to supplement the actual reward signal. This latter path, the notion of "intrinsic reward" that paulsutter alluded to, is in some sense the merger of RL with unsupervised learning, and a lot of the power is that you're getting new reward signals constantly, not just when you finally manage to achieve some arbitrarily difficult task.
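As a concrete (and deliberately simplified) illustration of an intrinsic reward, here's a toy prediction-error curiosity bonus. This is a stand-in to show the mechanism, not the exact information-theoretic objective from the paper above; the model and dimensions are hypothetical:

```python
import numpy as np

# Toy prediction-error "curiosity" bonus. The agent fits a forward model
# f(state, action) -> next_state; its prediction error is paid out as
# reward, so poorly-modeled (novel) states stay interesting even while
# the extrinsic reward is still zero.
rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 1                 # hypothetical toy sizes
W = rng.normal(size=(STATE_DIM + ACTION_DIM, STATE_DIM)) * 0.1

def intrinsic_reward(state, action, next_state, lr=0.01):
    x = np.concatenate([state, action])      # forward-model input
    pred = x @ W                             # predicted next state
    err = next_state - pred
    W[:] += lr * np.outer(x, err)            # model improves, bonus shrinks
    return float(err @ err)                  # curiosity = squared pred. error

s, a = rng.normal(size=STATE_DIM), rng.normal(size=ACTION_DIM)
s_next = rng.normal(size=STATE_DIM)
# the agent then optimizes: extrinsic_reward + beta * intrinsic_reward(...)
print(intrinsic_reward(s, a, s_next))
```

The point is that the bonus is dense: every transition yields some signal, which shrinks as the world model improves, instead of everything hinging on one distant task reward.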
But then I get happy because at least I understand a little bit :)
It would be awesome to have a platform that recommends what to learn or read, or which online courses to take, in order to understand a given talk.
Last session was in 2013 though.