

Playing Atari with Deep Reinforcement Learning [pdf] - plurby
http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

======
JabavuAdams
Submitted previously, here:
[https://news.ycombinator.com/item?id=7802661](https://news.ycombinator.com/item?id=7802661)

... and here:
[https://news.ycombinator.com/item?id=8484313](https://news.ycombinator.com/item?id=8484313)

------
plurby
Here's the video of DeepMind's artificial intelligence talk at FDOT14:
[https://www.youtube.com/watch?v=EfGD2qveGdQ](https://www.youtube.com/watch?v=EfGD2qveGdQ)

------
rrtwo
Is there any example code available for this or a similar task being 'solved'
using deep reinforcement learning?

~~~
iandanforth
[http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo...](http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html)

~~~
rrtwo
Thanks!

------
zeidrich
Academic papers use a lot of formulas and equations that can be easily
described in English, but instead are described mathematically.

I often find that when I read them, it really slows me down.

"We refer to a neural network function approximator with weights θ as a
Q-network. A Q-network can be trained by minimising a sequence of loss
functions L_i(θ_i) that changes at each iteration i,
L_i(θ_i) = E_{s,a∼ρ(·)}[(y_i − Q(s, a; θ_i))^2]"

I get to a place like this and I really have to stop and let it process. OK,
theta is the set of weights for the network function approximator. L_i is the
loss function at iteration i, wait. What is rho? Something about expected
return. Wait, what is y? Oh, OK, it's just
E_{s'∼E}[r + γ max_{a'} Q(s', a'; θ_{i−1}) | s, a], so that's, I guess, the
equation before this one with respect to the weights from the previous
iteration. So I guess it's that, minus the function approximator's output
under the current iteration's weights, squared? Oh, wait, it says ρ(s, a) is
the behaviour distribution, i.e. the probability distribution over s and a.

So I guess I kind of get the point of the loss function? I think it measures
some difference between the current iteration's approximated expectation and
where it should be according to the previous iteration's approximation?
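
For what it's worth, the loss being quoted is just a squared TD error with the target held fixed at the previous iteration's weights. Here's a minimal toy sketch in Python/NumPy; the linear "Q-network", the state/action sizes, and the sample transition are all made up for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy Q-network: a table of weights indexed by (state, action),
# so "theta" is just an (n_states, n_actions) array.
n_states, n_actions = 4, 2
theta_prev = rng.normal(size=(n_states, n_actions))  # weights theta_{i-1}
theta = theta_prev.copy()                            # weights theta_i being trained

gamma = 0.99  # discount factor

def q(weights, s, a):
    """Q(s, a; theta) for the toy tabular 'network'."""
    return weights[s, a]

# One sampled transition (s, a, r, s') drawn from the behaviour distribution rho.
s, a, r, s_next = 0, 1, 1.0, 2

# Target y_i = r + gamma * max_a' Q(s', a'; theta_{i-1}); it uses the
# *previous* iteration's weights, so it is treated as a constant.
y = r + gamma * np.max(theta_prev[s_next])

# The inside of L_i(theta_i) = E[(y_i - Q(s, a; theta_i))^2] for this sample.
loss = (y - q(theta, s, a)) ** 2

# One gradient step on theta only (y is held fixed).
lr = 0.1
grad = -2.0 * (y - q(theta, s, a))  # d loss / d theta[s, a]
theta[s, a] -= lr * grad
```

After the update, theta[s, a] has moved a fraction of the way toward the target y, which is all the minimisation is doing, sample by sample.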

I guess I wonder if there are other people, even people who study in the same
domain who can read a paper like that and just understand the formulas more
easily than if it were explained a bit more in English.

I mean, I don't typically work with machine learning, and it's been a long
while since I've been in academia, but even when I was an undergrad, I mostly
included formulas because I felt that good papers include confusing formulas.
They were always valid and important, but presenting as little explanation as
necessary almost seemed like a challenge to other people, a way to show I was
smarter than them. But it also meant that when I read papers by peers, even
though I knew what all the parts meant, I still had to sit down and think
through the formulas when they were shown.

I don't mean that this is the intention of this paper. But since I'm not very
familiar with the notational conventions for artificial neural networks and
machine learning, it's especially apparent how challenging the formulas are to
understand; once I do sort through the jargon and process the intent, though,
I start to understand it.

I guess I just wonder whether someone reasonably familiar with the subject
matter can read those things without issue (in which case, that's great,
because that's the intended audience) or whether authors could communicate
their point more clearly in English, and use the formulas more as reference or
proof. Or whether it was kind of like me, and they like to leave it to the
reader as a sort of "if it takes you a while to get it, that's your problem,
we're smarter."

My guess is that it could be communicated more clearly, and that it's more a
result of academic style than anything else. But I do wonder if people who are
engrossed in the subject also stumble when it comes to those things.

~~~
CJefferson
I work in A.I., on a different area of search. This paper made no sense to me
either. I have a sneaky feeling the authors might be trying to fancy up
something very simple (that accusation is based on nothing more than reading
many academic papers and finding this one particularly unreadable, when I
feel this paper should lend itself to a particularly easy outline of their
technique, at least).

