
Learning Reinforcement Learning, with Code, Exercises, and Solutions - dennybritz
http://www.wildml.com/2016/10/learning-reinforcement-learning/
======
joanderson
I remember reading a comment some time ago on this site saying that people on
HN seem to have a positive bias towards machine learning topics and always
upvote them. ML seems to be such a tech buzzword these days. This blog post is
at the top of this site, yet there is not a single comment of discussion. I am
not trying to sound negative, but rather was wondering if other people share
the same opinion.

Regarding the post: This seems like a useful resource. When I read many of
these papers, a code supplement makes understanding them so much easier. I do
a lot of research on RNNs for language modeling, and implementing various
models when I started out in this field was very useful for getting a better
understanding. I got great feedback on my implementations from people who
said they helped them understand.

~~~
dennybritz
Another reason may be that this is a pure "resource" post that doesn't make an
argument or represent personal opinions that people could easily comment on.
It's not a good basis to start a discussion, unlike many other HN posts.

However, I'd appreciate more comments of course ;)

~~~
joanderson
I guess you raise a good point since it is only a "resource" post. My comment
was geared toward all ML related posts in general.

I do research in ML so I love seeing these posts, but the fact that terms like
"neural networks" have become such buzzwords is somewhat disappointing. I
remember all the buzz when Swiftkey released their "neural network" keyboard
simply because of the name.

------
brockf
It's great to see more hacker-friendly introductions to reinforcement
learning. Like most facets of machine learning, there are so many interesting
applications of reinforcement learning (e.g., we're using RL to optimize email
marketing campaigns at Optimail), and we'll only find more as more
non-academic hackers discover it.

~~~
make3
... still, I think it does people a disservice to not make it clear that ML is
math and that it can only really be done well if you understand what is
happening, i.e., if you take the time to understand the maths, which is not
that hard btw.

Calling something cool/hacking because you don't want to take the time to
understand the maths is something Trump would do if he were a programmer.

~~~
spenuke
Can you expand on what math you're talking about that is both necessary and
not that hard? "Not that hard", to me, indicates that a reasonably intelligent
person could teach themselves without university instructors (present or
past).

------
notbigml
I find the Python code very clear, but I would prefer to see an interesting
real-life application that doesn't require a lot of computation. In a post on
wildml there is an example of using NLP and deep learning for a simple task,
but after 22 hours of computation the final result is a little disappointing,
to say the least.

I like to read the wildml.com and fastml.com blogs, but I would like to find
more simple applications that show real value without using lots of resources.
Perhaps there is a subfield of RL where, by applying some proper human
intelligence, one can hope to beat those giants equipped with unlimited
computational and financial resources.

~~~
mdda
I gave a talk at PyConSG this year[1], which included a demonstration of
training a Reinforcement Learning model on a 'Bubble Breaker' game. There's
also more detail available[2].

The Jupyter notebook is included in the GitHub repo[3], and includes a 'scaled
down version' that takes ~5 mins to train on a MacBook's CPU. There's also a
downloadable 'full scale' model that was trained in ~7 hours on a Titan X. It
plays the game (on average) better than me...

[1] http://blog.mdda.net/ai/2016/06/23/workshop-at-pycon-sg-2016 (has slides,
and YouTube link)

[2] http://redcatlabs.com/2016-07-30_FifthElephant-DeepLearning-Workshop/#/46

[3] https://github.com/mdda/deep-learning-workshop : have a look at
notebooks/7-Reinforcement-Learning.ipynb

------
orthoganol
> but RL is also widely used in Robotics, Image Processing and Natural
> Language Processing

RL for NLP? I would love to know about counterexamples, but I'm not aware of
a serious project using RL for NLP, let alone 'widely used.' However, I do
believe RL makes sense for a number of NLP problems.

Either way, well done. I appreciate a collection of the algos from Sutton's
book (great book), and in Python.

~~~
dennybritz
A lot of recent research uses RL to "fine-tune" NLP models. A practical
example would be Google's recently announced Machine Translation System
(https://arxiv.org/abs/1609.08144). It uses RL to directly optimize BLEU
scores on translated sentences.
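
To make the idea of optimizing a sequence-level score concrete, here is a
minimal sketch of a REINFORCE-style update. It is not Google's system: the
per-position softmax policy, the `overlap_reward` function (a toy stand-in for
BLEU), and all names and hyperparameters are assumptions chosen purely for
illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def overlap_reward(candidate, reference):
    """Toy stand-in for BLEU: fraction of positions that match the reference."""
    matches = sum(c == r for c, r in zip(candidate, reference))
    return matches / len(reference)


def train_reinforce(reference, vocab_size, steps=2000, lr=0.5, seed=0):
    """Fit a toy policy by sampling sequences and scoring each one as a whole."""
    rng = random.Random(seed)
    # Hypothetical policy: one independent row of logits per output position.
    logits = [[0.0] * vocab_size for _ in reference]
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        sample = [rng.choices(range(vocab_size), weights=p)[0] for p in probs]
        reward = overlap_reward(sample, reference)  # one score per sequence
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for pos, token in enumerate(sample):
            for v in range(vocab_size):
                # d log pi(token) / d logit_v for a softmax policy
                grad = (1.0 if v == token else 0.0) - probs[pos][v]
                logits[pos][v] += lr * advantage * grad
    return logits


def greedy_decode(logits):
    """Pick the highest-scoring token at each position."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]
```

Calling `greedy_decode(train_reinforce([2, 0, 3, 1], vocab_size=4))` trains
against a made-up four-token reference. The point of the sketch is that the
reward is computed on the whole sampled sequence, so the metric itself never
needs to be differentiable.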

You'll find similar applications in state-of-the-art chatbot models, for
example. Though I agree, "widely used" may be somewhat of an overstatement.
But it's becoming more common.

On a side note, I actually think RL makes a lot of sense for many NLP problems
and it would be super interesting to build a pure RL approach to language
modeling or translation. Nobody has managed to do that quite yet.

~~~
orthoganol
I agree, I would love to see or even work on pure RL approaches in NLP. I
would also like to see HMMs explored more, which I think also make a lot of
sense for NLP, if you think of the problem as sequences of hidden states
(represented by semantic frames or other hand-crafted language features)
producing phrases and sentences.
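
As a rough illustration of that framing, here is a minimal Viterbi sketch that
recovers the most likely sequence of hidden states from observed words. The
two "frame" states, the vocabulary, and every probability below are invented
for the example; a real model would learn them from data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    first = observations[0]
    best = [{s: (start_p[s] * emit_p[s].get(first, 0.0), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical toy model: two hidden "frames" that emit words.
STATES = ["GREETING", "REQUEST"]
START_P = {"GREETING": 0.8, "REQUEST": 0.2}
TRANS_P = {
    "GREETING": {"GREETING": 0.6, "REQUEST": 0.4},
    "REQUEST": {"GREETING": 0.1, "REQUEST": 0.9},
}
EMIT_P = {
    "GREETING": {"hello": 0.6, "there": 0.3, "please": 0.05, "help": 0.05},
    "REQUEST": {"hello": 0.05, "there": 0.05, "please": 0.5, "help": 0.4},
}

frames = viterbi(["hello", "there", "please", "help"],
                 STATES, START_P, TRANS_P, EMIT_P)
```

With these made-up numbers, `frames` labels the first two words as GREETING
and the last two as REQUEST.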

