
CS 294 Deep Reinforcement Learning, Spring 2017 - aaronjg
http://rll.berkeley.edu/deeprlcourse/
======
anuragramdasan
Quickly glanced through the syllabus, and it seems this course covers mostly the
advanced aspects of reinforcement learning and assumes you already know the
basic concepts such as MDPs, training models, etc.

For those interested in this, I would strongly recommend David Silver's intro
to RL [1] before beginning the above course.

[1] [http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)

~~~
kleiba
Everything you say (including the David Silver recommendation) is already
stated on the course website under "Prerequisites".

------
komaromy
Looks really cool.

I recently hit a roadblock while trying to implement the original DeepMind
Atari algorithm [0] in TensorFlow. They don't mention this in the paper, but
the network wasn't trained to convergence at each training step (maybe that
would be obvious to people better versed in deep learning, but it wasn't to
me, coming from a classical RL background).

As it turns out, TensorFlow's optimizers don't have a way to manually
terminate training before convergence. That meant I was getting through
several orders of magnitude fewer training steps than the DeepMind team did,
even when accounting for my inferior hardware. That might not matter in
settings where training longer on particular examples extracts more
information from them, but it's a real problem in games with sparse rewards.
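To be concrete about the loop structure I mean, here's a minimal Python sketch (purely illustrative, not the paper's code; `env_step` and `q_update` are hypothetical stand-ins for acting in the emulator and updating the Q-network):

```python
import random

def dqn_training_sketch(env_step, q_update, num_frames=10000, batch_size=32):
    """Illustrative skeleton of the DQN outer loop: one stochastic
    gradient step per environment transition, rather than training
    the network to convergence at each step."""
    replay = []                        # replay memory of transitions
    for t in range(num_frames):
        transition = env_step(t)       # act, observe (s, a, r, s')
        replay.append(transition)
        if len(replay) >= batch_size:
            batch = random.sample(replay, batch_size)
            q_update(batch)            # exactly ONE gradient step
```

The point is that `q_update` runs once per frame, so the total number of gradient steps scales with the number of frames seen, not with an inner optimization loop.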

Of course, TensorFlow does let you compute the gradients and apply the
updates by hand, but I wasn't prepared to go that far at the time. Maybe I'll
dive back into it in the next few weeks.

[0] [https://arxiv.org/pdf/1312.5602.pdf](https://arxiv.org/pdf/1312.5602.pdf)

~~~
tfgg
> As it turns out, TensorFlow's optimizers don't have a way to manually
> terminate training before convergence.

I don't know how you determined this, but the optimizer's minimize op
definitely performs only one step, equivalent to doing the gradient update
yourself.
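To make that equivalence concrete, here's a tiny dependency-free sketch (illustrative only, assuming a simple quadratic loss rather than anything TensorFlow-specific): a single call to `step` is one gradient update, and convergence only comes from repeating it.

```python
def grad(w, x, y):
    # Gradient of the squared error 0.5 * (w * x - y)**2 with respect to w
    return (w * x - y) * x

def step(w, x, y, lr=0.1):
    # One gradient update -- analogous to running a single minimize op
    return w - lr * grad(w, x, y)

w_one = step(0.0, 1.0, 2.0)    # a single step: w moves to 0.2, far from the optimum
w_many = 0.0
for _ in range(1000):          # repeated steps: w converges toward 2.0
    w_many = step(w_many, 1.0, 2.0)
```

So "terminating before convergence" isn't something you opt into; each run of the op is already just one step, and you decide how many steps to take.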

------
tw01
Will non-Berkeley students be able to participate in the discussions on
Piazza?

If not, for those interested in following this course online, we might want to
start a Slack study group around it to help each other out. PM me if
interested.

~~~
markovbling
Please post details on where to join the Slack :)

~~~
paulbaumgart
Seconded.

------
psb217
It would be great if cleaned-up demo code for many of these models/algorithms
could be shared in a single "deep RL quickstart" repo.

Various implementations (sometimes of dubious correctness) are already
scattered around Github, but having a single library of code to build from
when booting up a new research project would be a boon to people who don't
have such great access to collaborators' codebases.

Thanks for sharing these resources.

------
concilliatory
Will assignments be posted?

~~~
cbfinn
Yes.

