
End-to-End Deep Reinforcement Learning Without Reward Engineering - jonbaer
https://bair.berkeley.edu/blog/2019/05/28/end-to-end/
======
iandanforth
This is really important work. It's general and operates within a (non-
researcher) human attention budget.

Soft Actor-Critic (SAC) is a great method on its own, but this human-in-the-
loop training for real-world problems is a totally viable path for long-term
training of robots on challenging tasks.

The major roadblock we still face, though, is having a single model reach
competency on multiple tasks without losing competency on earlier ones. I'm
willing to train a robot once for each new task, but I'm not willing to
constantly retrain it on old ones, or to wait for it to start from scratch
after it's learned a few basic tasks.

We expect skill learning to be compositional and, like learning to ride a
bike, easily retained.

This problem (often referred to as lifelong learning, or overcoming
catastrophic forgetting) is a deep problem in modern ML.

~~~
so_tired
I never really tried multi-task learning. Is it so catastrophic?

Suppose I have several tasks, plenty of samples for each task, and a network
that converges well on each task.

Can't I just mix and task-label these samples, train a slightly bigger
network from scratch, and ta-da... a multi-modal network or whatever?

~~~
iandanforth
Yes, you can! The "mix" part is key. It's sequential learning that screws up
networks today. If you randomly sample from tasks you're fine, or if you can
replay older tasks while you're learning new ones (essentially another form
of random sampling), the network can learn multiple tasks. But the moment you
drop a task from the training distribution, you start losing competency on
it. By default, neural networks don't have mechanisms to protect learned
weights from being overwritten.
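The mixing idea above can be sketched in a few lines. This is a minimal,
hypothetical illustration (the `mixed_batches` helper and the toy task data
are my own invention, not from the post or any library): instead of training
on task A to completion and then task B, every batch is drawn uniformly from
the pooled, task-labeled samples of all tasks, so no task's data ever drops
out of the training distribution.

```python
import random

def mixed_batches(task_datasets, batch_size, steps, seed=0):
    """Yield training batches sampled uniformly across ALL tasks at once.

    task_datasets: dict mapping a task name to a list of samples.
    Each yielded batch is a list of (task_name, sample) pairs, so the
    task label travels with the sample, as suggested in the comment above.
    Contrast with sequential training, where finishing task A before
    starting task B lets A's competency get overwritten.
    """
    rng = random.Random(seed)
    # Pool every sample, tagged with its task label.
    pooled = [(task, x) for task, data in task_datasets.items() for x in data]
    for _ in range(steps):
        yield rng.sample(pooled, batch_size)

# Toy stand-in data for two hypothetical robot tasks.
tasks = {"reach": list(range(10)), "grasp": list(range(100, 110))}
first_batch = next(mixed_batches(tasks, batch_size=6, steps=1))
```

Replaying stored samples from old tasks while learning a new one is
effectively the same trick: the replay buffer keeps the old task inside the
sampling distribution.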

