
Comparing a recurrent neural network with a Markov chain - YeGoblynQueenne
http://www.r-bloggers.com/is-deep-learning-a-markov-chain-in-disguise/
======
canjobear
An RNN conditions the generation of each character on the entire previous
history of characters generated. A Markov chain only conditions on a fixed
window. Perhaps a particular RNN will learn to truncate its conditioning
context and behave as a Markov chain, or perhaps not; but RNNs in general
certainly can generate formal languages that Markov chains cannot.

For example, consider that the RNN in Karpathy's post (the one linked to in
this article) was capable of generating well-formed XML, generating matching
opening and closing tags with an apparently unbounded amount of material
between them. No Markov chain can do this.
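To make the fixed-window limitation concrete, here is a minimal character-level Markov chain (an illustrative sketch, not code from the article): every generated character is conditioned only on the previous `order` characters, so every (order+1)-gram the model emits must have appeared verbatim in its training text -- it can never invent a long-range dependency such as a matching close tag.

```python
import random
from collections import defaultdict

def train_markov(text, order=3):
    # Record which characters followed each fixed window of `order` chars.
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length=40):
    # Each step conditions only on the last `order` characters -- nothing else.
    out = seed
    order = len(seed)
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            break
        out += random.choice(followers)
    return out

text = "the cat sat on the mat. the cat sat on the hat. "
model = train_markov(text)
print(generate(model, "the"))
```

By construction, any 4-character window of the output above occurs somewhere in the training string; an RNN's hidden state is not restricted this way.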

------
llasram
No. Even in toy examples with little internal structure, consider the length
of the sequence generated. For both models we can include a synthetic "end of
sequence" token which the model can produce when run generatively. In the case
of the Markov model the length distribution must be essentially geometric --
at each point that it's possible to end the sequence, the model has some
possibility of ending the sequence, conditioned only on the previous n
characters and not on any other property. The RNN states can model any
arbitrary function, and thus are capable of generating sequences matching any
arbitrary length distribution.
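A quick simulation of that point (a sketch of the argument, collapsing the fixed windows to a single constant end-of-sequence probability p for simplicity): if the chance of emitting the end token depends only on the last n characters, the length distribution is essentially geometric, and the mean length is pinned to 1/p.

```python
import random

# Toy Markov-style generator: at every step the chance of emitting the
# end-of-sequence token is a constant p, because the model can condition
# only on a fixed window.  The resulting length is geometric, mean 1/p.
def sample_length(p, rng):
    steps = 1
    while rng.random() >= p:
        steps += 1
    return steps

rng = random.Random(0)
p = 0.1
lengths = [sample_length(p, rng) for _ in range(100_000)]
print(sum(lengths) / len(lengths))  # close to 1/p = 10
```

No choice of transition table escapes this shape; matching, say, a bimodal length distribution requires state that counts, which an RNN can learn.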

------
Dn_Ab
One can view RNNs as a sort of generalization of Markov chains. RNNs have the
advantage of memory and context tracking, and are not limited to learning
patterns of some fixed length. RNNs can apply these advantages to learn the
subtleties of grammars, balancing parentheses, the proper use of punctuation
and other things that a Markov chain might never learn (and certainly not
memory-efficiently). For any given piece of text, an RNN can be said to have
come closer to understanding what it consumed.

The other question is: are those difficult-to-learn things truly worth the
cost of training and running an RNN? If a fast and simple Markov chain serves,
as is likely the case in practical settings, then it is better to go with the
Markov chain. The RNN will still make obvious mistakes, all while correctly
applying subtle rules that trouble even humans. Unfortunately, this
combination is exactly the kind of thing that will leave observers less than
impressed: "Yes, I know it rambles insensibly, but look, it uses punctuation
far better than your average forum dweller!" Alas, anyone who has gone to the
trouble of making a Gouraud-shaded triangle spin in Mode X and proudly shown
it to their childhood friends can explain just what sort of reaction to
expect.

Eh, so, the moral here is pay attention to cost effectiveness and don't make
things any more complicated than they need to be.

Yoav Goldberg covers much the same ground as this blog post, but with far
more detail and attention to subtlety, here:
[http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139](http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139)

~~~
contravariant
Viewing RNNs as a generalisation of Markov chains is a bit confusing, because
what you're calling a Markov chain isn't really a Markov chain in its most
general form.

The one characteristic a Markov chain must have is that the transition
probabilities are completely determined by its current state. This property is
true for both RNNs and what you call Markov chains. The main difference is
that the state space for RNNs is a lot bigger and better at describing the
current context (it was designed to be).
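A toy illustration of that framing (hypothetical hand-picked weights, not a trained model): the RNN's next step depends only on its current hidden state -- the Markov property over a continuous state space -- but that state can encode context that no fixed window of recent observations distinguishes.

```python
import math

# One-unit "RNN": the next state is a function of the current state and
# the current input only -- Markov over a continuous state space.
def rnn_step(h, x, w_h=0.5, w_x=1.0):
    return math.tanh(w_h * h + w_x * x)

# Two histories that end with the same observation (x = 0.0) leave the
# unit in different states; a model conditioning only on that last
# observation could not tell the two contexts apart.
h_a = rnn_step(rnn_step(0.0, 1.0), 0.0)
h_b = rnn_step(rnn_step(0.0, -1.0), 0.0)
print(h_a, h_b)
```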

~~~
joe_the_user
Formally, there is no limit to the number of states in a Markov chain.

So in this sense, an RNN is actually a kind of hidden Markov model -- one
with more structure added to it. The structure might make an RNN better than
an HMM, but it doesn't make it more general; it makes it more specific.

------
curuinor
No, because the distributedness of the representation means that exponentially
large Markov chains would be needed to replace linearly growing deep models.
See this paper on the recurrent temporal RBM
([http://www.cs.utoronto.ca/~ilya/pubs/2008/rtrbm.pdf](http://www.cs.utoronto.ca/~ilya/pubs/2008/rtrbm.pdf)).

It's like saying, "aren't binary trees just linked lists?"
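Back-of-envelope arithmetic for the blowup (illustrative numbers, not from the paper): n binary features of remembered context require on the order of 2^n explicit Markov states, while a distributed representation stores one unit per feature.

```python
# n bits of remembered context: an explicit Markov chain must enumerate
# every combination as a separate state, while a distributed
# representation stores one unit per bit.
for n_units in (8, 16, 32):
    n_states = 2 ** n_units
    print(f"{n_units} units ~ {n_states} Markov states")
```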

------
gjm11
The evidence the author offers that Markov chains are as effective as RNNs is
that if you're given a tiny snippet of the output of each it can be hard to
tell which is which.

But the thing that (both in theory and in practice) distinguishes RNNs from
simpler constructs with very finite amounts of state is precisely what happens
with not-so-tiny amounts of output. RNNs can produce sizable syntactically
valid chunks of languages with "nested" structure -- open and close tags in
XML, parentheses in Lisp, curly braces in C, etc.; Markov chains can't. It
seems reasonable to guess that RNNs will also be able to produce better-
structured natural language text than Markov chains ever will.

Show us a _page_ of fake Shakespeare from the RNN and a page from the Markov
chain, and I don't think it will be so difficult to tell which is which.

------
hrrld
For one-dimensional problems (like the forward generation of word sequences)
there are definite comparisons to be made.

In my experience (having used both nets and Markov processes in computational
creativity), nets hold more promise for higher-dimensional problems (e.g., 2D
image generation).

I'm all for having more toys in the toybox though, so I hope work continues
on both/all fronts.

------
nimish
I wouldn't be surprised if someone discovered that an RNN is a way of
compactly representing a Markov chain with more than one history state, or
that it isn't in fact all that much better.

------
joe_the_user
Correct me if I'm wrong, but my impression is that convolutional neural
networks (CNNs) are the constructs that have demonstrated a unique
effectiveness for deep neural networks, and have done so primarily through
pattern recognition.

Recurrent NNs are a different beast: they predict sequential processes and so
operate in an area closer to hidden Markov models. They have had some
successes, but their operations and direction don't seem as well established
as convnets'.

Note that since a game of Go is a sequence of moves, one might train an RNN
to play Go. However, AlphaGo (very roughly) used a convnet to tell a Monte
Carlo tree search which positions looked good.

I'm going partly from:
[http://neuralnetworksanddeeplearning.com/chap6.html#other_ap...](http://neuralnetworksanddeeplearning.com/chap6.html#other_approaches_to_deep_neural_nets)

------
zitterbewegung
This is a clickbaity title; it should be "Can simpler models match the
performance of deep learning models?"

~~~
tosseraccount
Any link to current benchmarks?

------
kotach
This is a comparison on a single task: generating text.

It also depends on how the HMMs are trained. Are they trained by reducing a
loss over each document, or by just taking the frequencies of all the needed
transitions?

Training with a joint loss will be much more effective for this problem;
training conditional random fields and then sampling them will also work
extremely well here, and allows arbitrary features for each letter.

RNNs do have great representational power, but the lack of joint training
makes them as linear as CNNs. Their representational power saves the day, but
they will never really capture long-distance dependencies.

Both approaches suffer from the so-called label-bias problem (the HMM only if
it is trained by frequency).

CRFs wouldn't have that problem, and RNNs somehow seem to avoid it through
sheer representational power, although the way they are trained suggests that
label bias exists there too.

------
varelse
Wouldn't it have been nice if he had indicated which sequences came from the
RNN and which came from the HMM? My guess is that the HMM would generate more
feasible (though not necessarily better, and in fact quite constrained)
sequences, given the constraint that the last 5 characters must have occurred
at least once in the training set. So, shooting from the hip entirely: 1 and
4 are the HMM?

------
cttet
Hinton already mentioned this in his Coursera course in 2012?

------
dang
Since people are saying the title is linkbaity, we replaced it with a neutral
description of the article. The HN guidelines ask submitters not to use the
original title when it is misleading or linkbait:

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

~~~
YeGoblynQueenne
The title is provocative but I thought it was OK with the guidelines. My
apologies that it had to be changed.

~~~
dang
I didn't think it was that bad either, and didn't mean to imply that you
should have changed it. We just like to remind the community of the guideline
about titles, since there are some common misconceptions about it.

------
MrGando
No, I don't think it is (just reading the title here)

