
Key Papers in Deep Reinforcement Learning - dsr12
https://spinningup.openai.com/en/latest/spinningup/keypapers.html
======
inarrears
Many of these papers have been featured on HN before:

Neural Episodic Control
[https://news.ycombinator.com/item?id=13843282](https://news.ycombinator.com/item?id=13843282)

Exploration by Random Network Distillation
[https://news.ycombinator.com/item?id=18346943](https://news.ycombinator.com/item?id=18346943)

Evolution Strategies as a Scalable Alternative to Reinforcement Learning
[https://news.ycombinator.com/item?id=13953980](https://news.ycombinator.com/item?id=13953980)

Recurrent World Models Facilitate Policy Evolution
[https://news.ycombinator.com/item?id=16860247](https://news.ycombinator.com/item?id=16860247)

Playing Atari with Deep Reinforcement Learning
[https://news.ycombinator.com/item?id=8484313](https://news.ycombinator.com/item?id=8484313)

------
marmaduke
The spinning up guide is neat, but it seems to assume access to fairly
expensive GPU resources for running models.

[https://cloud.google.com/blog/products/gcp/new-lower-prices-for-gpus-and-preemptible-local-ssds](https://cloud.google.com/blog/products/gcp/new-lower-prices-for-gpus-and-preemptible-local-ssds)

> In US regions, each K80 GPU attached to a VM is priced at $0.45 per hour
> while each P100 costs $1.46 per hour.

The $300 free tier gets you ~600 hours of K80. The spinning up guide suggests
iterating models in <5 min, so that's 7200 iterations.

> start with vanilla policy gradient (also called REINFORCE), DQN, A2C (the
> synchronous version of A3C), PPO (the variant with the clipped objective),
> and DDPG, ... VPG...

That's 6 algorithms; combined with half a dozen tasks to try, that whittles it
down to a few hundred iterations per task/algorithm combo.

That, combined with a lot of paper reading and perhaps some clever blogging, is
probably enough to get started.
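The back-of-the-envelope math above can be sketched as follows (the figures for credit, GPU price, and iteration time come from the comment; the comment rounds the GPU-hours down to ~600, giving its 7200 figure):

```python
# Back-of-the-envelope GPU budget, using the figures quoted above.
free_credit = 300.00         # USD, GCP free-tier credit
k80_per_hour = 0.45          # USD/hour per K80 GPU (US regions)
minutes_per_iteration = 5    # Spinning Up suggests iterating in <5 min

gpu_hours = free_credit / k80_per_hour                 # ~667 hours (~600 rounded down)
iterations = gpu_hours * 60 / minutes_per_iteration    # ~8000 iterations

algorithms = 6   # VPG/REINFORCE, DQN, A2C, PPO, DDPG, ...
tasks = 6        # "half a dozen tasks"
per_combo = iterations / (algorithms * tasks)          # a few hundred

print(f"{gpu_hours:.0f} GPU-hours -> {iterations:.0f} iterations, "
      f"~{per_combo:.0f} per task/algorithm combo")
```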

Still, it seems beneficial to democratize DL by making these 5-minute
iterations free, doesn't it?

~~~
wyattk
A great, free option is Colab notebooks from Google:
[https://colab.research.google.com/](https://colab.research.google.com/)

You can attach a GPU for free, and, if I recall, even a TPU. See
[https://colab.research.google.com/notebooks/gpu.ipynb](https://colab.research.google.com/notebooks/gpu.ipynb)
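Once a GPU runtime is selected in Colab (Runtime → Change runtime type), a quick way to confirm the GPU is actually attached is to check for `nvidia-smi`; this snippet is a minimal sketch that works in or outside Colab:

```python
import shutil
import subprocess

def gpu_attached() -> bool:
    """Return True if an NVIDIA GPU is visible to this runtime.

    In Colab, nvidia-smi is present whenever a GPU has been attached
    to the runtime; elsewhere it simply reports False.
    """
    if shutil.which("nvidia-smi") is None:
        return False
    # nvidia-smi exits 0 when it can talk to a GPU.
    return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0

print("GPU attached:", gpu_attached())
```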

------
rademacher
DRL has definitely shown some excellent results, but can someone in the field
comment on this paper: "Simple random search provides a competitive approach to
reinforcement learning" [1]?

[1] [https://arxiv.org/abs/1803.07055](https://arxiv.org/abs/1803.07055)
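For context, the paper's core idea is strikingly simple: perturb a linear policy's parameters in random directions, evaluate the return on both sides of each perturbation, and step along the better directions. Below is a toy sketch in that spirit (heavily simplified, not the paper's full ARS algorithm); the synthetic reward function stands in for environment rollouts, and all names and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = rng.normal(size=8)   # toy stand-in for the unknown optimum

def episode_return(theta):
    # Stand-in for rolling out a linear policy in an environment.
    return -np.sum((theta - theta_star) ** 2)

theta = np.zeros(8)
alpha, sigma, n_dirs = 0.05, 0.1, 16   # step size, noise scale, directions

for _ in range(300):
    deltas = rng.normal(size=(n_dirs, 8))
    r_plus = np.array([episode_return(theta + sigma * d) for d in deltas])
    r_minus = np.array([episode_return(theta - sigma * d) for d in deltas])
    # Finite-difference-style update along the sampled directions.
    theta += alpha / (n_dirs * sigma) * ((r_plus - r_minus) @ deltas)

print("final return:", episode_return(theta))
```

No gradients through the policy are needed, which is what makes the approach a competitive baseline against much more elaborate DRL methods in the paper's benchmarks.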

~~~
twtw
Not a comment on that paper, but for those interested I would highly recommend
Ben Recht's (one of the authors of that paper) blog series at
[http://www.argmin.net/2018/06/25/outsider-rl/](http://www.argmin.net/2018/06/25/outsider-rl/).

------
DrNuke
I appreciate a lot that they are thinking of beyond-human runs these days. For
example: put a few agents into big, number-crunching climate-change models and
see whether they can "win" (keep the Earth's ecosystem on this side of
divergence), how (say, by beating the one or two most critical opponents), and
at what cost in implementable actions for mankind (how many resources are
needed?). Basically, treat these scenarios as active strategy games with
rewards and a final goal, instead of as the passive, parametric Monte Carlo
runs a la think tanks or supranational agencies.

------
rocskipper
Are there curated guides like these for other fields (e.g. NLP)?

~~~
chrisseaton
Bibliographies? Yes, most subfields have one somewhere.

I maintain one for the Ruby programming language
[https://rubybib.org/](https://rubybib.org/).

------
anonymousDan
Does anyone know of examples where reinforcement learning has been applied to
IoT applications?

------
conjectures
I think curated reading lists like this add a lot of value. Otherwise, just
knowing where to start is an obstacle.

