
Striving for Simplicity in Off-Policy Deep Reinforcement Learning - jonbaer
https://arxiv.org/abs/1907.04543
======
MasterScrat
I wrote a quick summary on /r/reinforcementlearning:
https://www.reddit.com/r/reinforcementlearning/comments/cc9gnh/striving_for_simplicity_in_offpolicy_deep/etlgebd/

------
sgt101
This is a dense read, and I will have to spend a lot of time on it, but is it
really surprising that Atari can be learned from 50M events? This isn't a real
domain, it's not a complex domain, and it's not a domain that synthesises
the challenges of real domains (for example, noise between the control input
and outputs, noise in the sensors, dynamism in the domain).

That's OK, but let's not conflate "learning Atari" with "reinforcement
learning".

