> Dropbox Error (429) This account's links are generating too much traffic and have been temporarily disabled!
If you can, leave the tab open for a while to keep seeding the file! (It's a torrent.)
Because it is a draft, the author's new link is the right one to read; the torrent I posted might be out of date by the time you read this.
I also highly recommend reading about Edward O. Thorp. He was a friend of Claude Shannon, with whom he frequented Las Vegas, and he used the IBM 704 (the first mass-produced computer with floating-point arithmetic) to explore blackjack game theory in ~1956.
To apply his research he borrowed $10,000 from someone with mob connections and won $11,000 in a single weekend.
He also developed the first wearable computer (for a specific definition of computer).
Sutton & Barto is the Bible for RL. Having an updated edition freely available is wonderful!
I would like to read about all the interesting results of RL in just one hour. Can someone suggest a short book for someone with advanced maths skills?
Thanks a lot to the authors; the book seems really interesting.
Edit: On page 25, in "An Extended Example: Tic-Tac-Toe", the rule for updating the value of each state, v(s) = v(s) + a(v(s') - v(s)), doesn't take into account that if there is a winning strategy from s' under the policy, then the previous state is also part of a winning strategy. So if v(s') = 1 (win), then v(s) = 1 (I can win).
In my very humble opinion, the authors should devote a section to this very important point.
The scenario you describe is what you get with alpha = 1, and it would do poorly. Try thinking about games where the opponent doesn't play an optimal game. Try thinking of stochastic environments.
Imagine a scenario where you play a game and the opponent plays poorly and you win; you then try and repeat the same thing again, this time the opponent has learnt from their mistakes and beats you. You'll keep playing the same losing move significantly more times because it worked that one time.
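To make the alpha = 1 point concrete, here is a minimal sketch of the update rule quoted above, v(s) = v(s) + a(v(s') - v(s)). It is not the book's code: the function names `td_update` and `estimate` and the outcome sequence are made up for illustration, and the value of the successor state is simplified to the observed result (1 = win, 0 = loss).

```python
def td_update(v_s, v_next, alpha):
    """One backup: v(s) <- v(s) + alpha * (v(s') - v(s))."""
    return v_s + alpha * (v_next - v_s)


def estimate(outcomes, alpha, v0=0.5):
    """Apply the backup once per observed outcome, starting from v0."""
    v = v0
    for outcome in outcomes:
        v = td_update(v, outcome, alpha)
    return v


# Hypothetical results of repeating the same move against an opponent
# who sometimes blunders: 3 wins out of 10 games.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# alpha = 1 overwrites v(s) with the latest result every time, so the
# estimate just flips between 0 and 1 and forgets everything earlier.
v_alpha_one = estimate(outcomes, alpha=1.0)

# A small alpha moves v(s) only part of the way toward each result,
# so the estimate blends many games instead of trusting the last one.
v_alpha_small = estimate(outcomes, alpha=0.1)

print("alpha=1.0:", v_alpha_one)
print("alpha=0.1:", v_alpha_small)
```

With alpha = 1 a single lucky win sets v(s) straight to 1, which is exactly the "it worked that one time" trap; with a smaller step size one win only nudges the estimate.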
I don't know why everyone wants to second-guess the first chapter of the standard textbook in this space with what seems like no experience even thinking about this topic...
Don't get anything from it.
There is also information about the book on the author's website:
The scope is generally about the same though, perhaps because it's intended to be used as a single-semester textbook, so there isn't a big expansion into areas of RL other than those covered in the first edition (e.g. POMDPs are only briefly mentioned).
Does anyone have a mirror?
edit: Most people like when natives correct their English.