
RL²: Fast Reinforcement Learning via Slow Reinforcement Learning (2016) - saycheese
https://arxiv.org/abs/1611.02779
======
iraphael
This seems to be the important bit, that describes what makes their learning
"fast":

"The objective (...) is to maximize the expected total discounted reward
accumulated during a single trial rather than a single episode. Maximizing
this objective is equivalent to minimizing the cumulative pseudo-regret
(Bubeck & Cesa-Bianchi, 2012). Since the underlying MDP changes across trials,
as long as different strategies are required for different MDPs, the agent
must act differently according to its belief over which MDP it is currently
in. Hence, the agent is forced to integrate all the information it has
received, including past actions, rewards, and termination flags, and adapt
its strategy continually. Hence, we have set up an end-to-end optimization
process, where the agent is encouraged to learn a “fast” reinforcement
learning algorithm"

However, one would note that this is still bounded by the training of the RNN
itself, so I don't really see how this approach makes the algorithm much
faster than "slow" RL, since loads of trials would still be required for any
real learning. Maybe someone more knowledgeable could pitch in.
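For concreteness, the mechanism in the quoted passage amounts to feeding the
RNN an augmented input at each step. Here's a minimal sketch of that input
construction (function and variable names are my own, not from the paper's
code):

```python
import numpy as np

def rl2_input(obs, prev_action, prev_reward, done, n_actions):
    """Build the per-step RNN input the paper describes: the current
    observation concatenated with the previous action (one-hot), the
    previous reward, and the termination flag. This is what lets the
    hidden state encode a belief over which MDP the agent is in."""
    action_onehot = np.zeros(n_actions)
    action_onehot[prev_action] = 1.0
    return np.concatenate([obs, action_onehot,
                           [prev_reward, float(done)]])

# A trial spans several episodes of the same (hidden) MDP; the RNN
# hidden state is reset only between trials, not between episodes,
# so information from earlier episodes can inform later ones.
x = rl2_input(obs=np.array([0.5, -0.2]), prev_action=1,
              prev_reward=0.0, done=False, n_actions=3)
print(x.shape)  # (7,) = 2 obs dims + 3 actions + reward + done flag
```

The "slow" part is training the RNN weights across many trials; the "fast"
part is that, at test time, adaptation happens purely in the hidden state
within a single trial, with no weight updates.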

~~~
dementrock
(Author here)

This is a great question! There are at least two scenarios where having a
meta-learning setup could help. The first is to learn the fast RL algorithm in
simulation, by having a distribution over real world environments (varying
physics, textures, etc), and then run it in the real world. The second is to
learn a fast RL algorithm over a wide range of tasks (for example, on a set of
training games in Universe), and (hopefully) generalize to unseen games,
analogous to generalization in supervised learning.

~~~
saycheese
Does OpenAI have a page listing all of their published papers?

------
joe_the_user
If this works, this seems like it could be very significant.

Broadly, slowness is a serious problem in current machine learning approaches
and so anything that speeds things up is significant.

The approach of learning the learning process would seem to get bonus points
for being interesting and general.

Edit: Published in November, this paper didn't seem to get any comments on the
machine learning Reddit, which is my go-to for informed commentary on this
stuff. I'd love to have someone who knew what they were doing comment here.

------
saycheese
Note: The original title noted OpenAI's role in the paper, which may be seen
by loading the PDF and reading the author credits.

[https://arxiv.org/pdf/1611.02779v2.pdf](https://arxiv.org/pdf/1611.02779v2.pdf)

------
lucidrains
So when are we going to see the paper where we use an RL net to speed up
another RL net for discovering a neural architecture for learning how to do
gradient descent (by gradient descent)?

~~~
bytefactory
So...you mean a grad student? ;)

~~~
Ar-Curunir
It's called Grad student descent (referring to the grad student's descent into
despair :P)

