
Deep Q-Learning: Space Invaders
http://maciejjaskowski.github.io/2016/03/09/space-invaders.html
======
mratzloff
I wasn't able to play the video for some reason, but I located it on YouTube:

[https://m.youtube.com/watch?v=ZisFfiEdQ_E](https://m.youtube.com/watch?v=ZisFfiEdQ_E)

For comparison, DeepMind:

[https://m.youtube.com/watch?v=ePv0Fs9cGgU](https://m.youtube.com/watch?v=ePv0Fs9cGgU)

Note that 550 is a very low score in Space Invaders.

~~~
nrmn
Not OP, but I believe the low score is due to insufficient training time and
incorrect parameters such as the frameskip. Space Invaders was mentioned as
one of the few games where they needed to lower the frameskip (from 4 to 3 or
2?) because of the flashing lasers. I'm assuming OP left the parameters as-is
from the implementation by Nathan Sprague, which has the frameskip at 4, and
trained for a few epochs.

~~~
alexlikeits1999
Changing the frameskip is not needed any more, since the implementation takes
the max of the last two frames before processing (the same as what DeepMind
does).

EDIT: talking about Sprague's implementation, btw, not necessarily OP's.
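
A minimal sketch of that preprocessing step, assuming NumPy frames
(illustrative only, not OP's or Sprague's actual code):

    import numpy as np

    # Element-wise max of the last two raw frames, so sprites that flicker
    # on alternate frames (like the lasers in Space Invaders) survive the
    # frame skip.
    def merge_frames(prev_frame: np.ndarray, curr_frame: np.ndarray):
        return np.maximum(prev_frame, curr_frame)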

~~~
mjaskowski
Actually I did both frame skipping and maxing over two frames, as reported in
the Nature letter:

[https://storage.googleapis.com/deepmind-data/assets/papers/D...](https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf)

------
2bitencryption
I learned of q-learning in the Berkeley AI course (the Pac-Man course), so I
sort of get that, but the course didn't touch neural networks.

What's the difference between q-learning with and without neural networks? Or
rather, in the process of doing q-learning, where does the neural network slot
in? What does it replace if there is no NN?

~~~
mjaskowski
Note that a neural network is just a very complex function.

You usually think of Q as a function (S, A) -> (expected accumulated future
reward),

which is equivalent to S -> A -> (expected accumulated future reward).

The neural network is S -> (A -> (expected accumulated future reward)), or, if
you wish, the output layer of the neural network consists of |A| neurons. Each
indicates the expected accumulated future reward for one action, given the
current experience.
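
For concreteness, a minimal sketch of that shape (my own illustration in
PyTorch, with made-up layer sizes; the implementations discussed here use
Theano):

    import torch
    import torch.nn as nn

    # The network maps a state S to |A| outputs: one expected-accumulated-
    # future-reward estimate per action.
    class QNetwork(nn.Module):
        def __init__(self, state_dim: int, num_actions: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64),
                nn.ReLU(),
                nn.Linear(64, num_actions),  # output layer: |A| neurons
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.net(state)  # shape: (batch, |A|)

    # Greedy play: pick the action with the highest predicted future reward.
    q_net = QNetwork(state_dim=4, num_actions=6)
    action = q_net(torch.randn(1, 4)).argmax(dim=1)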

~~~
2bitencryption
Thanks!

So what we are saying is that a neural network can be used as the
implementation for the q-function? I.e., a q-function is by definition only a
mapping of (S,A) pairs to an expected future reward. We can do this using a
traditional style like value iteration or back propagation, or we can use a
neural network? And it's just a matter of implementation?

~~~
mjaskowski
Yes, we try to approximate Q function with neural network. Which is basically
an enhanced version of gradient-descent Sarsa.

The main trick to notice is that you can't provide consecutive frames as mini-
batches as these would be highly correlated and would derail stochastic
gradient descent.

So we keep many frames (and all other necessary information) in memory and
draw these experiences uniformly to form a minibatch that becomes input to the
neural network
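
A minimal sketch of that replay memory (names and capacity are my own, not
from the article):

    import random
    from collections import deque

    # Store (state, action, reward, next_state, done) transitions and
    # sample them uniformly, so a minibatch is not a run of correlated
    # consecutive frames.
    class ReplayMemory:
        def __init__(self, capacity: int = 100000):
            self.buffer = deque(maxlen=capacity)  # oldest experiences drop off

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size: int):
            return random.sample(self.buffer, batch_size)

After each environment step you push the transition, and once enough
experience has accumulated you train on a random minibatch, e.g.
memory.sample(32).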

------
phatbyte
This is one of my favourite topics recently. ML and AI are amazing, and a very
nice skill to have. Good article, too.

------
karmapolice
As an amateur, I've always wondered if reinforcement learning could work with
games where there are some probabilities in place (e.g. poker). What happens
when the action taken is a good one but the outcome is negative due to bad
luck?

~~~
mjaskowski
Absolutely. Q-learning has this capability, and a shallow neural network was
used back in 1992 to play backgammon, which has a lot of stochasticity. See
[https://en.wikipedia.org/wiki/TD-Gammon](https://en.wikipedia.org/wiki/TD-Gammon)
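
A toy illustration of why that works (my own, not from the thread): the
incremental Q-learning update converges to an action's expected reward, so a
single unlucky outcome doesn't make the agent abandon a good action.

    import random

    alpha = 0.1   # learning rate
    q = 0.0       # Q-value estimate for a single one-step action

    for _ in range(10000):
        # A "good" action: loses 1 half the time, wins 3 otherwise,
        # so its expected reward is +1 despite frequent bad luck.
        r = -1.0 if random.random() < 0.5 else 3.0
        q += alpha * (r - q)   # one-step episode, so no bootstrap term

    print(round(q, 1))  # hovers around 1.0, the expected reward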

------
serge2k
I would like to learn more about the techniques used here. Can anyone
recommend some books, or online materials (though I generally find those
worse)? I have a moderately strong math background (undergraduate degree with
a double major in CS/Math).

~~~
fizixer
Someone posted this comment in another thread [1]. (Also read the two parent
comments of that comment.)

Essentially, read and do the exercises of ISLR (An Introduction to Statistical
Learning, with Applications in R). It will both give you a strong base and
increase your job prospects (according to the comment).

[1]
[https://news.ycombinator.com/item?id=11286980](https://news.ycombinator.com/item?id=11286980)

P.S.: my personal opinion: for every R-language exercise in that book, try to
do a similar exercise in Python. If you don't know Python, learn it (you'll
thank me later).

------
mathgenius
"I omit certain details for the sake of simplicity and I encourage you to read
the original paper."

This link to the original paper appears to point to Theano unit-test
documentation. Does anyone know which "original paper" to look at?

~~~
mjaskowski
Let me fix that. There are actually two papers:
[https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)

and a more recent and more detailed one: [https://storage.googleapis.com/deepmind-data/assets/papers/D...](https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf)

------
kotach
Now, train the network jointly over the whole game sequence. Or even better,
when given a chance to take an action, roll out each action and learn jointly
on the rest of the gameplay.

Reinforcement learning is very hard, especially when you create meaningful
games and then don't use the fact that a whole game is one long chain of
events, and instead force learning on windowed sequences.

The neural network has enough parameters to remember many of these windows and
will clearly perform well, but the training lasts too long, given that no
structured information is used.

