
Meta-Reinforcement Learning - mtrazzi
https://blog.floydhub.com/meta-rl/
======
orasis
This seems to be a contextual bandit where the previous reward is included in
the context.

I can’t come up with real-world examples where the behavior of the reward
function changes like this, enough to warrant making different decisions based
on a previous reward.
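For what it’s worth, the setup being described can be sketched in a few lines. This is a minimal, hypothetical illustration (not the article’s code): a bandit episode where each observation is augmented with the previous action and reward. The epsilon-greedy rule here is just a stand-in for the learned recurrent policy a meta-RL agent would use.

```python
import random

def run_episode(arm_probs, n_steps=100, seed=0):
    """Bandit episode where each observation includes the previous
    action and reward -- the 'previous reward in the context' setup.
    A meta-RL agent would feed obs into an RNN; here we substitute a
    simple epsilon-greedy rule over empirical means for illustration."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    totals = [0.0] * len(arm_probs)
    prev_action, prev_reward = 0, 0.0
    history = []
    for _ in range(n_steps):
        obs = (prev_action, prev_reward)  # previous reward is part of the context
        if rng.random() < 0.1 or not any(counts):
            action = rng.randrange(len(arm_probs))  # explore
        else:
            action = max(range(len(arm_probs)),
                         key=lambda a: totals[a] / counts[a] if counts[a] else 0.0)
        reward = 1.0 if rng.random() < arm_probs[action] else 0.0
        counts[action] += 1
        totals[action] += reward
        history.append((obs, action, reward))
        prev_action, prev_reward = action, reward
    return history
```

The point of carrying `(prev_action, prev_reward)` in the observation is that, when the arm probabilities change between episodes, the policy can infer which task it is in from the reward stream alone.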

~~~
maldeh
While a contextual bandit could learn from more parameters about the
environment than the multi-armed bandit posed in the example, it still has the
goal of finding the best reward within a specific domain (while by and large
using the same strategy as the multi-armed bandit). Placed in a different
context, however, the state space and underlying model parameters could be
completely different, so the previous reward for any given state, or sequence
of states, could be irrelevant.

The goal here is to reward the agent for the search strategy it employed to
arrive at its answer, not for the quality of the answer itself.

One possible use case (directly related to their example with multi-armed
bandits, possibly learnt by a contextual bandit but requires a good deal more
modeling) could be retail pricing, where different categories of products have
drastically different demand curves. A meta-algorithm has the promise of
generalizing better and rapidly arriving at the optimal pricing across a wide
range of similar price curves.
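The pricing example can be made concrete with a toy sketch. Assuming (hypothetically) a linear demand curve per product category, each "task" draws its own curve parameters, and the revenue-maximizing price differs per task, which is why one fixed policy can't be optimal across the distribution:

```python
import random

def sample_pricing_task(rng):
    """Hypothetical product category: demand = max(0, a - b * price).
    Each task (category) draws its own curve parameters (a, b)."""
    a = rng.uniform(50, 150)
    b = rng.uniform(1, 5)
    return a, b

def optimal_price(a, b):
    # revenue = price * (a - b * price), maximized at price = a / (2b)
    return a / (2 * b)

rng = random.Random(0)
tasks = [sample_pricing_task(rng) for _ in range(5)]
prices = [optimal_price(a, b) for a, b in tasks]
# The optimal price varies across tasks, so a meta-learned policy
# would have to infer the current demand curve from observed rewards
# and adapt within the episode, rather than memorize one price.
```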

------
ReDeiPirati
We already know that deep RL is sample-inefficient... is meta-RL really useful
for anything non-trivial? This seems rather silly.

~~~
mtrazzi
The first applications appear really simple because they illustrate cognitive
abilities related to neuroscience/psychology, imitating planning and
model-based RL.

It doesn't mean that meta-RL won't scale up with more computation (see
[http://www.incompleteideas.net/IncIdeas/BitterLesson.html](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)).

------
Gayax
I don't really get the point of your article. You seem quite dogmatic and
don't even discuss other hypotheses.

~~~
math_monkey
This article's intended audience is clearly the subset of ML engineers who
have a practical rather than a theoretical/academic background in ML. I'd
argue it definitely has value as a practical guide to understanding an
original approach to RL, one that looks like it has good potential.

You could fairly argue that the article could use more mathematical grounding
for what's being explained, like AMS blog posts tend to have. However,
consider how unreasonably afraid of mathematics a lot of the CS crowd tends
to be.

The pedagogy of the article is noteworthy: it helps the reader get a hold of
the jargon and ideas of this burgeoning approach and prepares them for further
research, kind of like Quanta Magazine tries to do, but it allows itself more
technicality, in line with the blog post format. That's not an easy task given
how multidisciplinary meta-RL really is, and the author does a rather great
job, IMO.

------
houqp
Any example for real world application?

~~~
mtrazzi
For more information about why meta-learning (and in particular meta-RL) is
useful, see this post: [https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/](https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/)

