
Deep Reinforcement Learning is a waste of time - snaky
http://www.jtoy.net/blog/deep-reinforcement-learning-is-a-waste-of-time.html
======
beisner
In a lot of ways, the field has already come to this conclusion. At NeurIPS
this year some of the biggest topics in Deep RL were model-based RL and meta-
learning for RL, both of which aim to learn a generalized representation of an
environment that can be used in a variety of downstream tasks.

------
MasterScrat
If you are not familiar with RL, I recommend first reading the two articles
that the author links to:

- [https://www.alexirpan.com/2018/02/14/rl-hard.html](https://www.alexirpan.com/2018/02/14/rl-hard.html)

- [https://himanshusahni.github.io/2018/02/23/reinforcement-learning-never-worked.html](https://himanshusahni.github.io/2018/02/23/reinforcement-learning-never-worked.html)

They are not so recent anymore, but they still capture the problem well.

Long story short: RL doesn't work yet. We're not sure it'll ever work. Some
big companies are betting that it will.

> My own hypothesis is that the reward function for learning organisms is
> really driven from maintaining homeostasis and minimizing surprise.

Both directions are actively researched: maximizing surprise (to improve
exploration), and minimizing surprise (to improve exploitation).

See eg "Exploration by Random Network Distillation" for the first, "SURPRISE
MINIMIZING RL IN DYNAMIC ENVIRONMENTS" for the second.
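
For a sense of how the first works: Random Network Distillation rewards the
agent in proportion to how badly a trained predictor network imitates a fixed,
randomly initialized target network on the current observation; states the
predictor hasn't seen yet produce large errors, so the agent is pushed toward
them. A minimal PyTorch sketch of that intrinsic bonus (the network sizes and
observation dimension are made up for illustration, not taken from the paper):

    import torch
    import torch.nn as nn

    OBS_DIM = 64  # hypothetical flat observation size

    def make_net():
        return nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, 64))

    target = make_net()      # fixed, randomly initialized network
    predictor = make_net()   # trained to imitate the target
    for p in target.parameters():
        p.requires_grad_(False)

    opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

    def intrinsic_reward(obs_batch):
        # Bonus = prediction error of the predictor vs. the random target.
        # Novel states are predicted poorly, so they pay out more.
        with torch.no_grad():
            target_feat = target(obs_batch)
        pred_feat = predictor(obs_batch)
        error = (pred_feat - target_feat).pow(2).mean(dim=1)
        # Train the predictor on the same batch so familiar states stop paying out.
        loss = error.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        return error.detach()  # usually added to the extrinsic reward with a small coefficient

Surprise-minimizing methods flip the sign of that idea: the bonus goes to
states the agent can model well, which pushes it to keep its environment
predictable.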

------
w1nst0nsm1th
Sometimes, sending a letter is the best way to get something done.

Some systems fail to even implement the concept of reward (and punishment):
the agent is not even 'aware' of what a reward (or 'punishment') is, and so it
doesn't even know it is being rewarded (or 'punished') in the first place. The
system then has to be redesigned and the code reworked.

Sometimes AI is the least straightforward solution, the most expensive, and
the least efficient in terms of results.

