
New Draft of “Reinforcement Learning: An Introduction, Second Edition” - Gimpei
https://www.dropbox.com/s/d6fyn4a5ag3atzk/bookdraft2016aug.pdf?dl=0
======
sridca
The submitted link is no longer working:

> _Dropbox Error (429) This account's links are generating too much traffic
> and have been temporarily disabled!_

------
nicklaf
I've uploaded a mirror of the PDF:

[https://instant.io/#678d0be07a0f2260ec6f9b134ec0a1d7c4325e99](https://instant.io/#678d0be07a0f2260ec6f9b134ec0a1d7c4325e99)

If you can, leave the tab open for a while to keep seeding the file! (It's a
torrent.)

~~~
nicklaf
Looks like the author has a new URL for the PDF, hosted by the university:

[https://webdocs.cs.ualberta.ca/~sutton/book/bookdraft2016sep...](https://webdocs.cs.ualberta.ca/~sutton/book/bookdraft2016sep.pdf)

Since it is a draft, the author's new link is the one to read; the torrent I
posted might be out of date by the time you read this.

~~~
davidmpaz
Thank you sir!!

------
Smerity
If anyone is interested in exploring this, I highly recommend skipping to the
blackjack exercise. That's not to say you have to read the whole book first;
blackjack is simple and compelling enough to get you addicted. Starting with
Monte Carlo sampling is quite approachable and, once you've done that,
extending it is relatively easy.
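
For a feel of what that looks like, here's a rough first-visit Monte Carlo
sketch in Python (the stick-on-20-or-21 policy is the book's example policy;
the infinite-deck simplification and everything else here are my own
illustrative choices, not the book's code):

    import random
    from collections import defaultdict

    def draw():
        # Infinite deck: 2-10 at face value, J/Q/K count as 10, ace drawn as 1.
        return min(random.randint(1, 13), 10)

    def hand_value(cards):
        total = sum(cards)
        if 1 in cards and total + 10 <= 21:
            return total + 10, True   # count one ace as 11 ("usable ace")
        return total, False

    def play_episode():
        # One hand under the example policy: hit until 20 or 21, then stick.
        player = [draw(), draw()]
        dealer_showing = draw()
        states = []
        while True:
            total, usable = hand_value(player)
            if total > 21:
                return states, -1                 # player busts
            states.append((total, dealer_showing, usable))
            if total >= 20:
                break                             # stick
            player.append(draw())                 # hit
        dealer = [dealer_showing, draw()]
        while hand_value(dealer)[0] < 17:         # dealer hits below 17
            dealer.append(draw())
        p, d = hand_value(player)[0], hand_value(dealer)[0]
        if d > 21 or p > d:
            return states, 1
        return states, 0 if p == d else -1

    # First-visit Monte Carlo prediction: V(s) = average return observed from s.
    returns, counts = defaultdict(float), defaultdict(int)
    for _ in range(100_000):
        states, reward = play_episode()
        for s in set(states):                     # count each state once per episode
            returns[s] += reward
            counts[s] += 1
    V = {s: returns[s] / counts[s] for s in counts}
    print(V.get((20, 10, False)))                 # estimated value of 20 vs a dealer 10

Once this works, extending it to Monte Carlo control (improving the policy
rather than just evaluating it) is the natural next step, and it's what the
book does.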

I also highly recommend reading about Edward O. Thorp[1]. He was a friend of
Claude Shannon, with whom he frequented Las Vegas, and he used the IBM 704
(the first mass-produced computer with floating-point operations) to explore
blackjack game theory in ~1956. To apply his research he borrowed $10,000 from
someone with mob connections and won $11,000 in a single weekend. He also
developed the first wearable computer (for a specific definition of computer).

[1]:
[https://en.wikipedia.org/wiki/Edward_O._Thorp](https://en.wikipedia.org/wiki/Edward_O._Thorp)

------
paulrd
Wonderful! I took his machine learning course at the University of Alberta and
we used the first edition in his class - though he didn't make it mandatory to
buy his book. In any case, it was well worth the asking price. The math isn't
too hard and progresses gently enough. I think the Lisp code he wrote for the
first edition is online somewhere; I'll have to check his web presence.

~~~
paulrd
The (mostly) Lisp code from the first edition can be found here:
[https://webdocs.cs.ualberta.ca/~sutton/book/code/code.html](https://webdocs.cs.ualberta.ca/~sutton/book/code/code.html)

------
apathy
Sweet, this is like finding out that the Bible has been revised and updated.
Except that I immediately went and read this.

Sutton & Barto is the Bible for RL. Having it freely available and updated is
wonderful!

------
coderunner
Here is the draft linked from the author's website:

[http://incompleteideas.net/sutton/book/bookdraft2016sep.pdf](http://incompleteideas.net/sutton/book/bookdraft2016sep.pdf)

------
piedradura
Just reading page 15: where it says "arg max = maximal ...", I think "global
maximum" or "local maximum" would be better than "maximal".

I would like to take in all the interesting fruits of RL in just one hour; can
someone suggest a short book for someone with advanced maths skills?

Thanks a lot to the authors; the book seems really interesting.

Edit: On page 25, in the extended tic-tac-toe example, the rule for updating
the value of each state, v(s) <- v(s) + alpha * (v(s') - v(s)), doesn't take
into account that if the policy has a winning strategy from s', then the
previous state is part of a winning strategy too. So if v(s') = 1 (a win),
then v(s) should be 1 (I can win). In my very humble opinion, the author
should devote a section to this very important point.
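
For reference, here is the update in question as a minimal Python sketch (the
0.5 default initial value and 0.1 step size follow the book's tic-tac-toe
example; the dictionary representation is just an illustrative assumption):

    def td_update(V, s, s_next, alpha=0.1):
        # v(s) <- v(s) + alpha * (v(s') - v(s)): nudge v(s) toward v(s').
        # In the book's setup, terminal states are pre-set in V to
        # 1.0 (win) or 0.0 (loss); unseen non-terminal states start at 0.5.
        vs = V.setdefault(s, 0.5)
        vs_next = V.setdefault(s_next, 0.5)
        V[s] = vs + alpha * (vs_next - vs)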

~~~
Eridrus
The book is hundreds of pages long; if he digressed to address everything in
Chapter 1, it would be a mess.

The scenario you describe is alpha = 1, and it would do poorly. Try thinking
about games where the opponent doesn't play optimally. Try thinking about
stochastic environments.

~~~
piedradura
What I suggest is the rule: if v(s') == 1, then set v(s) = 1; otherwise apply
the usual update.

~~~
Eridrus
Let's say alpha = 1 on a win and alpha = 0.1 on a loss.

Imagine a scenario where you play a game, the opponent plays poorly, and you
win; you then try to repeat the same thing, but this time the opponent has
learnt from their mistakes and beats you. You'll keep playing the same losing
move significantly more times because it worked that one time.
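
To put rough numbers on it (illustrative, not from the book): suppose the
clamping rule pinned v at 1.0 after that one lucky win. With alpha = 0.1, each
later loss only shrinks v by 10%:

    v, alpha = 1.0, 0.1
    for n in range(1, 7):
        v += alpha * (0.0 - v)       # a loss: move v toward a return of 0
        print(f"after loss {n}: v = {v:.3f}")
    # after loss 6: v = 0.531 -- still above a fresh state's 0.5 default,
    # so the move that won once keeps getting chosen.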

I don't know why everyone wants to second-guess the first chapter of the
standard textbook in this space with what seems like no experience even
thinking about this topic...

~~~
piedradura
When you lose, the value of v(s') changes, and so the value of v(s) changes
too.

------
danmaz74
I did my thesis on reinforcement learning. Unfortunately I didn't work with it
afterwards, but I really think it's one of the most interesting approaches to
machine learning for real-world applications.

~~~
jlas
What was your thesis about? And in what areas do you see RL being applied
(besides the somewhat contrived game solvers we've been seeing)?

------
nonickfx
Can someone please provide a mirror? Neither link works anymore! Thanks a lot,
guys.

------
davidmpaz
Is the link still available? Are there no seeders anymore?

I don't get anything from it.

Thanks

~~~
nicklaf
Try now.

There is also information about the book on the author's page for it on his
website:

[https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html](https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html)

------
InquilineKea
How is it different from the first edition?

~~~
_delirium
There's a "Preface to the Second Edition" near the front, which has a summary
of changes. Main points are: 1) notation was overhauled, 2) Chapters 2-8 were
reworked to only use tabular methods, with function approximation introduced
later; 3) the function approximation coverage is then greatly expanded in the
second section of the book (Chs. 9-13); and 4) new chapters 14-15 on
connections between RL and psychology and neuroscience.

The scope is generally about the same though, perhaps because it's intended to
be used as a single-semester textbook, so there isn't a big expansion into
areas of RL other than those covered in the first edition (e.g. POMDPs are
only briefly mentioned).

------
WmyEE0UsWAwC2i
Does any one has a mirror?

~~~
_RPM
Hey, I hope this doesn't come across as rude, but it seems like you're not a
native English speaker. The correct grammar for your question would be:

Does anyone have a mirror?

edit: Most people like it when natives correct their English.

~~~
ChristianGeek
OP is a cat. The grammar is correct.

~~~
T-A
Wait, shouldn't that be "Can I haz mirror?"?

------
zump
Will RL overtake deep convolutional networks as the approach with the best
results?!

~~~
qd6pwu4
I think they are complementary and can be used together, as in deep
reinforcement learning.

~~~
nicklaf
_I think they are complementary and can be used together_

Yep:

[https://en.wikipedia.org/wiki/AlphaGo#Algorithm](https://en.wikipedia.org/wiki/AlphaGo#Algorithm)
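
For concreteness, here's a rough sketch of the kind of selection rule
AlphaGo's MCTS uses to fold the network in (PUCT-style, per the paper; the
function name and the constant here are illustrative, not DeepMind's code):

    import math

    def select_action(Q, N_sa, P, N_s, c_puct=1.0):
        # PUCT: exploit the tree's value estimates Q, but let the policy
        # network's prior P steer exploration of rarely visited moves.
        def score(a):
            u = c_puct * P[a] * math.sqrt(N_s) / (1 + N_sa[a])
            return Q[a] + u
        return max(Q, key=score)

So the deep network isn't just a warm start: its prior keeps shaping the
search at every node.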

~~~
VodkaHaze
That's a sequential use: they got the initial strategy with DL, then made it
stronger with MCTS.

