Hacker News new | past | comments | ask | show | jobs | submit login
Comparing a recurrent neural network with a Markov chain (r-bloggers.com)
119 points by YeGoblynQueenne on Feb 27, 2016 | hide | past | favorite | 19 comments

An RNN conditions the generation of each character on the entire previous history of characters generated. A Markov chain only conditions on a fixed window. Perhaps a particular RNN will learn to truncate its conditioning context and behave as a Markov chain, or perhaps not; but RNNs in general certainly can generate formal languages that Markov chains cannot.

For example, consider that the RNN in Karpathy's post (the one linked to in this article) was capable of generating well-formed XML, generating matching opening and closing tags with an apparently unbounded amount of material between them. No Markov chain can do this.

No. Even in toy examples with little internal structure, consider the length of the sequence generated. For both models we can include a synthetic "end of sequence" token which the model can produce when run generatively. In the case of the Markov model the length distribution must be essentially geometric -- at each point that it's possible to end the sequence, the model has some possibility of ending the sequence, conditioned only on the previous n characters and not on any other property. The RNN states can model any arbitrary function, and thus are capable of generating sequences matching any arbitrary length distribution.

One can view RNNs as a sort of generalization to markov chains. RNNs have the advantage of a memory, context tracking and are not limited to learning patterns of some specific length. RNNs can apply these advantages to learn subtleties of grammars, balance parenthesis, the proper use of punctuation and other things that a markov chain might never learn (and certainly not memory efficiently). For any given piece of text, RNNs can be said to have gotten closer to understanding what was consumed.

The other question is, are those difficult to learn things truly worth the cost of training and running an RNN? If a fast and simple markov chain serves, as is likely the case in practical settings, then it is better to go with the markov chain. The RNN will still make obvious mistakes, all while correctly using subtle rules that trouble even humans. Unfortunately, this combination is exactly the kind of thing that will leave observers less than impressed: "Yes I know it rambles insensibly but look, it uses punctuation far better than your average forum dweller!" Alas, anyone who has gone through the trouble of making a gourad shaded triangle spin in Mode X and proudly showing their childhood friends, can explain just what sort of reaction to expect.

Eh, so, the moral here is pay attention to cost effectiveness and don't make things any more complicated than they need to be.

Yoav Goldberg treats much the same thing as this blog post but with far more detail and attention to subtlety here: http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139

Viewing RNNs as a generalisation of Markov chains ia a bit confusing, because what you're calling a Markov chain isn't really a Markov chain in its most general form.

The one characteristic a Markov chain must have is that the transition probabilities are completely determined by its current state. This property is true for both RNNs and what you call Markov chains. The main difference is that the state space for RNNs is a lot bigger and better at describing the current context (it was designed to be).

Formally, there is no limit to the number of states in a Markov chain.

So in this sense, actually a RNN is a kind of Hidden Markov Chain - one with more structures added to it. The structure might an RNN better than an HMM but it doesn't make it more general, it makes it more specific.

No, because the distributedness of the representation means that exponentially large Markov chains would be needed to replace linearly growing deep models. See this paper on the recurrent temporal RBM (http://www.cs.utoronto.ca/~ilya/pubs/2008/rtrbm.pdf).

It's like saying, "aren't binary trees just linked lists?"

The evidence the author offers that Markov chains are as effective as RNNs is that if you're given a tiny snippet of the output of each it can be hard to tell which is which.

But the thing that (both in theory and in practice) distinguishes RNNs from simpler constructs with very finite amounts of state is precisely what happens with not-so-tiny amounts of output. RNNs can produce sizable syntactically valid chunks of languages with "nested" structure -- open and close tags in XML, parentheses in Lisp, curly braces in C, etc.; Markov chains can't. It seems reasonable to guess that RNNs will also be able to produce better-structured natural language text than Markov chains ever will.

Show us a page of fake Shakespeare from the RNN and a page from the Markov chain, and I don't think it will be so difficult to tell which is which.

For one-dimensional problems (Like forward generation of word sequences) there are definite comparisons to be made.

In my experience (having used both nets and markov processes in computational creativity), nets hold more promise for higher-dimensional problems (e.g., 2d image generation).

I'm all for having more toys the toybox though, so I hope work continues on both/all fronts.

i wouldn't be surprised if someone discovered that a RNN was a way of compactly representing a markov chain with more than one history state or wasn't in fact all that much better

Correct me if I'm wrong but I get the impression is that Convolutional Neural Networks (cnn) are the constructs that have demonstrated a unique effectiveness for deep neural networks and have done this primarily through pattern recognition.

Recurrent NNs are a different beast, predicting sequential processes and so operating in an area closer to hidden Markov chain that have some successes but their operations and direction don't seem as convnets.

Note that since the game of go is a sequence of moves, one might training a RNN to play go. However, AlphaGo (very roughly) used a Convnet to tell a Monte Carlo Tree structure which final positions looked good.

I'm going partly from: http://neuralnetworksanddeeplearning.com/chap6.html#other_ap...

This is a clickbaity title when it should be can simpler models have the same performance of deep learning models.

Any link to current benchmarks?

Comparison on a task that is generating text.

It also depends on how the HMMs are trained. Are they trained by reducing the loss over each document, or are they trained by just taking the frequency of all the needed transitions?

Training using joint loss will be much more effective for this problem, training conditional random fields and then sampling them will also be extremely good for this problem and allows arbitrary features for each letter.

RNNs do have a great representational power but the lack of training jointly makes them as linear as CNNs. This representational power saves the day but will never really catch long distance dependencies.

Both approaches have the so called label-bias. (if HMM is trained by frequency)

CRFs wouldn't have that problem, and RNNs somehow seem to avoid it by great representational power although the training does say that label bias exists there.

Wouldn't it have been nice if he indicated which sequences came from the RNN and which came from the HMM? My guess is that the HMM would generate more feasible (though not necessarily better and in fact quite constrained) sequences given the constraint of the last 5 characters having happened at least once in the training set, so, shooting from the hip entirely, 1 and 4 are the HMM?

Hinton already mentioned this in his coursera course in 2012?

Since people are saying the title is linkbaity, we replaced it with a neutral description of the article. The HN guidelines ask submitters not to use the original title when it is misleading or linkbait:


The title is provocative but I thought it was OK with the guidelines. My apologies that it had to be changed.

I didn't think it was that bad either, and didn't mean to imply that you should have changed it. We just like to remind the community of the guideline about titles, since there are some common misconceptions about it.

No, I don't think it is (just reading the title here)

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact