Hacker News new | past | comments | ask | show | jobs | submit login
Generating Text with Markov Chains (healeycodes.com)
72 points by healeycodes 11 months ago | hide | past | favorite | 22 comments

I used to write Markov-based chat bots. Something I thought I observed, but tried and failed to show mathematically, is the possibility of well-connected neighborhoods of the graph (cliques?) leading to some outputs being more likely than their a priori likelihood in the input.

For example, once a simple Markov bot learns the input phrase "had had" it will also start to generate output phrases a human would assign 0% probability to like "had had had" and "had had had had". This in itself isn't a violation of the principles behind the model (it would have to look at more words to distinguish these).

The question is whether more complicated loops among related words can create "thickets" where the output generation can get "tangled up" and generate improbable output at a slightly higher rate than the formal analysis says a Markov model of that order should do for given input frequencies. An example of such a thicket would be something like, "have to have had had to have had".

Essentially, I'm hypothesizing that the percentage values of the weighted probabilities of transitions does not tell the whole story, because the high-level structure of the graph has an add-on effect. A weaker hypothesis is that the state-space of Markov models contains such pathological examples, but that these states are not reachable by normal learning.

Unfortunately I lacked the mathematical chops / expertise to formalize these ideas myself, nor do I personally know anyone who could help me explore these ideas.

I have observed the same phenomena with both simple Markov chains and whatever Apple used for autosuggest on the iPhone.

That said, your specific example reminded me of a specific joke sentence: https://en.m.wikipedia.org/wiki/James_while_John_had_had_had...

I actually hadn't seen that one (I did know of the "Buffalo buffalo" family of sentences).

Regarding observation of the phenomenon, in a way I first observed the inverse of it. My first proofs of concept didn't learn probabilities, only structure. (One might say all edges had the same probability.) Yet aesthetically they performed just as well as versions which included probability in the algorithm.

So basically to calculate the stationary distribution (distribution over a long time of evaluation) of a markov chain, you have to calculate the eigenvectors of the transition matrix. And these take into account the graph structure, so I think the stationary distribution will be exactly what is calculated this way, there isn't really any space for anything beyond that.

Incidentally, this is very similar to the original pagerank algorithm. The rank roughly corresponds to the probability you will be on any given page if you randomly follow links.

There's a non-zero probability of transitioning from "had" to "had", but I don't understand how there could be a non-zero probability of "had" following "had had" or of "had had" following "had had" unless the model is being trained on its own output.

Traditionally only the last output is known by a markov chain and when it comes to chat bot generation the output is usually one word at a time. The bot doesn't track that the last 2+ words it output are the same, all it knows is that the last word it said is 'had', so what next?

There are countless markov chain chat bots made though and plenty bend the rules, to some extent though they are all susceptible to this. In an old irc chat of mine we had a lot of fun abusing one such bot to get it stuck in these kinds of loops, in our case it was the word 'fuck'. You'll have to forgive our teenager sense of humor.

It's perfectly fine, theoretically for the Markov chain to depend on more than the previous token. When you start to do that though, unless you can write down a function (or approximation, like an NN) the number of states in your state space that you have to enumerate explodes very quickly. This is the simple motivation behind stuff like GPT or neural reinforcement learning.

I've done something similar without first learning about Markov Chains. One of my more interesting experiments was creating messed-up laws. I fed it the constitution and alice in wonderland, and it made the most surreal laws. The great thing about them is they don't need to know about language. You could make one to create images, another to create names for cities. I made one to create 'pronounceable passwords'. It took the top 1000 words of a dictionary, and then it would spit out things which could potentially be words, of any length. Of course, the pronounceability of a word like Shestatond is debatable.

I published a program to do basically this on Usenet in 1987.

Someone created a fixed version on github at


This is a neat introduction to the subject.

If you want to see more of what they can do, and you have had any exposure to Chomsky, you might also appreciate https://rubberducky.org/cgi-bin/chomsky.pl.

I wrote 'Bloviate' to mess around with Markov chains. From Goldilocks and the 3 bears it prodoces gems such as:

“Someone’s been sitting my porridge,” said the bedroom.

You can download it here: https://successfulsoftware.net/2019/04/02/bloviate/

8 years ago I implemented something like this for your tweets at https://thatcan.be/my/next/tweet/

It still causes a spike of traffic every now and then from Twitter.

thank you for this clear and instructive write up! i appreciate the high-level, concise breakdown.

For the Finns reading this, there's a Twitter bot "ylekov" [1] that combines headlines from different Finnish news outlets using Markov chains. Sometimes they come out pretty funny

> Suora lähetys virkaanastujaisista – ainakin kaksi tonnia kokaiinia

[1]: https://twitter.com/ylekov_uutiset

Yup, those are Markov Chains. Not to sound snotty or mean, but... so what? Why should we be interested?

Looks kinda like you followed a tutorial and did a write up.

I tried to code up Markov chains when I was first learning to program but I found that many resources didn't have clear/terse enough code snippets. Most articles also didn't walk you from zero to understanding.

So I'm trying to fill that gap and help out anyone who is trying to learn the basics.

It's a well-written practical demonstration of an interesting thing that [some people might not know about yet][0], with links for further reading. What's the problem?


No problem. Didn't say I had one. Just curious, wanted more information about why OP posted, what made their post special.

Still, didn't seem very novel or deep. Might as well have been someone posting their first timing circuit test, water/air tunnel simulation, or computer vision project. So, good for them. But I thought HN wasn't a glad handing cheerleader chorus.

Searching for Introduction to Markov Chains gets tons of hits at various levels of explanation that are nearly a decade old.

Dangit, that's the one I meant to link to! I copied the link without checking. Thanks!

The same result generated with "AI" doesn't get this question.


IMHO they probably should if they're this basic.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact