
Reinforcement Learning and DQN – learning to play from pixels - rubenfiszel
https://rubenfiszel.github.io/posts/rl4j/2016-08-24-Reinforcement-Learning-and-DQN.html#asynchronous-methods-for-deep-reinforcement-learning#
======
blueeyes44
Couple key points: This project[0] is going to keep developing. It has Deep Q
Learning now, A3C[1] is working and more features are being added.

Just as important: RL4J and DL4J run on a scientific computing framework
called ND4J[2] that integrates with Spark and trains on multiple GPUs.[3][4]

It's basically porting RL to the production stack of large organizations that
work with the JVM and need to scale.

The key thing to remember is that RL combines with other algorithms, like deep
convolutional nets or Monte Carlo tree search. DL4J has the ConvNets
already.[5]

[0]
[https://github.com/deeplearning4j/rl4j](https://github.com/deeplearning4j/rl4j)

[1] [https://arxiv.org/abs/1602.01783](https://arxiv.org/abs/1602.01783)

[2] [http://nd4j.org/](http://nd4j.org/)

[3] [http://deeplearning4j.org/spark](http://deeplearning4j.org/spark)

[4] [http://deeplearning4j.org/gpu](http://deeplearning4j.org/gpu)

[5]
[http://deeplearning4j.org/convolutionalnets.html](http://deeplearning4j.org/convolutionalnets.html)

------
white_oak8
Awesome read!

Given the Doom examples, another work to add to the list at the end is
[https://github.com/Ardavans/DSR](https://github.com/Ardavans/DSR). It extends
the idea of successor representations introduced by Peter Dayan[1] in the
1990s to successor features using a deep neural net. The learning algorithm is
demonstrated with Doom.

[1]
[http://www.gatsby.ucl.ac.uk/~dayan/papers/d93b.pdf](http://www.gatsby.ucl.ac.uk/~dayan/papers/d93b.pdf)

~~~
rubenfiszel
Thank you. It seems very interesting. I will add it as a reference once I am
done reading it :)

------
apathy
This is a terrific piece with a lot of depth, but it would be nice if you
defined what a DQN is (especially since it's an ad-hoc DeepMind term) before
going further. I recognized everything _but_ the specific meaning of the DQN
acronym and ended up looking up DQN itself... standard practice when writing
things up is to expand abbreviations on first use, show the reader what they'll
be abbreviated to, and use that after.

I searched the document and did not see an expansion of DQN to Deep Q-Network
anywhere in the body, nor could I find a link to a definition of the term,
which is kind of silly considering how much effort you put into the rest of
the document. (Why limit your readership and/or opportunities?)

~~~
rubenfiszel
Thanks a lot. You are right. I edited the post and expanded the acronym in the
introduction.

~~~
apathy
To reiterate -- it is a fantastic piece. I went back and re-read it just now,
and there's more meat in that one webpage than in many entire engineering
textbooks. Terrific piece.

------
sabertoothed
I didn't have time to read it yet (bookmarked it for later) but I must say I
LOVE the style of this blog and the integration of images and code with a
beautiful font.

~~~
rubenfiszel
Thank you. It is my first post, so I actually spent a decent chunk of time
learning some web design to make it look nice. I am using Hakyll as a static
site generator and it is really awesome! Highly recommend it.

------
wagonhelm
Awesome!

