

Pavlov.js – Reinforcement learning using Markov Decision Processes - nepstein
https://github.com/nathanEpstein/pavlov.js

======
pekk
The name is annoying since what Pavlov studied was more like supervised
learning (prediction given input/output)

~~~
nepstein
I can definitely understand wanting to classify Pavlov as supervised learning.
I think it's a murky issue because supervised and reinforcement learning are
very closely related (and it is frequently possible to reframe problems of one
type as problems of the other, depending on what kind of model one uses).

My two main reasons for going with this name are as follows (if people see
issues in my logic, I'm happy to be convinced):

1) Reinforcement learning gets its name from the behaviorist psychology
concept of reinforcement in which an agent's actions are met with rewards in
order to shape that agent's future behavior. This is precisely the kind of
response conditioning that Pavlov is well known for.

2) The key difference is what the training data look like.

In a supervised learning problem, the training data are input/output pairs (a
stimulus and an appropriate action).

In reinforcement learning, the training data are action/reward pairs (an
action taken in response to a stimulus and the reward applied to that action).

I would argue that Pavlov's experiments are more like the latter case: the
dogs are not shown 'this is the correct action for this stimulus'; they are
shown 'this is the reward for this stimulus'.
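
To make the difference in data shape concrete, here are some hypothetical,
purely illustrative TypeScript type definitions (these are not pavlov.js's
actual API, just a sketch of the two kinds of training data):

    // Hypothetical types for illustration only -- not pavlov.js's API.

    // Supervised learning: each example pairs an input with the correct output.
    interface SupervisedExample {
      stimulus: string;       // e.g. "bell"
      correctAction: string;  // e.g. "salivate"
    }

    // Reinforcement learning: each example pairs a state and an action with the
    // reward that followed; the correct action is never given directly.
    interface ReinforcementExample {
      state: string;   // e.g. "bell"
      action: string;  // e.g. "salivate"
      reward: number;  // e.g. 1 if food arrives, 0 otherwise
    }

In the second case the learner has to discover the stimulus-to-action mapping
from rewards alone, which is what makes it reinforcement rather than
supervised learning.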

~~~
graycat
The theory of Markov decision processes is an old, mature, at times deep, and
polished field. Names include R. Bellman, E. Dynkin, R. Rockafellar, and D.
Bertsekas.

There are connections with scenario aggregation, potentials, linear-quadratic-
Gaussian certainty equivalence, currents of sigma algebras, the strong Markov
property, stopping times, and much more.

Can we be more clear on just what the Markov processes involved actually are
and, then, how they are to be used?

~~~
nepstein
The README links to a blog post
(http://nepste.in/jekyll/update/2015/02/22/MDP.html) which details how the
library is implemented from the definition of an MDP.

For a more rigorous treatment, Andrew Ng's notes
(http://cs229.stanford.edu/notes/cs229-notes12.pdf) are an excellent resource.
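
If it helps to see the idea in code: a finite MDP is just a set of states, a
set of actions, transition probabilities P[s][a][s'], expected rewards
R[s][a], and a discount factor, and the textbook way to solve one is value
iteration on the Bellman optimality equation. The TypeScript sketch below is
that generic algorithm, written as an illustration; it is not necessarily how
the library is implemented internally.

    // Generic value iteration for a finite MDP -- a textbook sketch only,
    // not pavlov.js's internal implementation.

    type MDP = {
      states: number;      // states are 0..states-1
      actions: number;     // actions are 0..actions-1
      P: number[][][];     // P[s][a][s2] = probability of moving to s2
      R: number[][];       // R[s][a] = expected immediate reward
      gamma: number;       // discount factor in [0, 1)
    };

    function valueIteration(mdp: MDP, tol = 1e-6): { V: number[]; policy: number[] } {
      const { states, actions, P, R, gamma } = mdp;
      const V = new Array<number>(states).fill(0);

      // Repeatedly apply the Bellman optimality update until values stop changing.
      let delta = Infinity;
      while (delta > tol) {
        delta = 0;
        for (let s = 0; s < states; s++) {
          let best = -Infinity;
          for (let a = 0; a < actions; a++) {
            let q = R[s][a];
            for (let s2 = 0; s2 < states; s2++) q += gamma * P[s][a][s2] * V[s2];
            if (q > best) best = q;
          }
          delta = Math.max(delta, Math.abs(best - V[s]));
          V[s] = best;
        }
      }

      // Extract a greedy policy from the converged value function.
      const policy = new Array<number>(states).fill(0);
      for (let s = 0; s < states; s++) {
        let best = -Infinity;
        for (let a = 0; a < actions; a++) {
          let q = R[s][a];
          for (let s2 = 0; s2 < states; s2++) q += gamma * P[s][a][s2] * V[s2];
          if (q > best) { best = q; policy[s] = a; }
        }
      }
      return { V, policy };
    }

For gamma < 1 the Bellman update is a contraction, so this converges to the
optimal value function regardless of the starting values.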

~~~
graycat
Your first reference is good enough -- it's okay.

All I saw was the Github page of gibberish -- I don't use Github whatever the
heck it is. But your URL was fine.

So, the work is a relatively routine application of classic work from
optimization going way back, e.g., to Bellman.

The "Reinforcement learning" terminology looks like a new label for some quite
ancient wine.

I've wondered what _machine learning_ had that was good and new, and so far
I've seen some that is good but not new and some that is new but not good.

For an application, it would be good to justify the Markov assumption, that
is, that the past and future of the process are conditionally independent
given the present.
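
One informal way to sanity-check that assumption on observed data (a
heuristic, not a proper hypothesis test) is to compare the empirical
next-state distribution conditioned on the current state alone with the one
conditioned on the last two states; a large discrepancy suggests a
first-order Markov model is too coarse. A rough TypeScript sketch:

    // Heuristic check of the first-order Markov assumption for a sequence of
    // observed states: compare P(next | current) with P(next | previous, current).
    // Informal diagnostic only -- not a formal statistical test.

    function transitionCounts(seq: string[], order: 1 | 2): Map<string, Map<string, number>> {
      const counts = new Map<string, Map<string, number>>();
      for (let t = order; t < seq.length; t++) {
        const context = seq.slice(t - order, t).join(",");  // assumes no commas in state names
        const next = seq[t];
        if (!counts.has(context)) counts.set(context, new Map());
        const row = counts.get(context)!;
        row.set(next, (row.get(next) ?? 0) + 1);
      }
      return counts;
    }

    // Largest disagreement between P(next | current) and P(next | prev, current)
    // over all two-state contexts with enough observations.
    function markovDiscrepancy(seq: string[], minCount = 20): number {
      const first = transitionCounts(seq, 1);
      const second = transitionCounts(seq, 2);
      let worst = 0;
      for (const [context, row2] of second) {
        const current = context.split(",")[1];
        const row1 = first.get(current);
        const n2 = [...row2.values()].reduce((a, b) => a + b, 0);
        if (!row1 || n2 < minCount) continue;
        const n1 = [...row1.values()].reduce((a, b) => a + b, 0);
        for (const next of new Set([...row1.keys(), ...row2.keys()])) {
          const p1 = (row1.get(next) ?? 0) / n1;
          const p2 = (row2.get(next) ?? 0) / n2;
          worst = Math.max(worst, Math.abs(p1 - p2));
        }
      }
      return worst;  // values near 0 are consistent with the Markov assumption
    }

Values near zero do not prove the assumption, but a large value is a clear
warning that a plain first-order Markov model may be leaving structure on the
table.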

For a more detailed treatment, I'd recommend, say,

E. B. Dynkin and A. A. Yushkevich, 'Controlled Markov Processes'.

~~~
defen
Hi - in a previous comment you mention a paper you wrote that describes a
distribution-free multivariate anomaly detector (this is the comment:
https://news.ycombinator.com/item?id=9580929)

Would you mind emailing me a copy of it please? Address in profile. Thanks in
advance!

