
Markov chain algorithm in Go - jgrant27
http://golang.org/doc/codewalk/markov/
======
mjn
As a side note if you want an "industrial-strength" applied version, at the
cost of losing the small/elegant implementation, there's been considerable
work in the past years on building robust Markov-language-modeling packages. I
mention it here mostly because I didn't realize until recently myself that
there was much to gain on top of the vanilla versions that are fairly
straightforward to DIY.

One active area of research is smoothing the models, so they capture
regularities without just memorizing the input text. The vanilla approach
chooses a fixed order of Markov chain (e.g. 2-grams or 3-grams), where you
choose n either ad-hoc, or (ideally) to balance underfitting (miss context you
could use) against overfitting (end up memorizing verbatim text strings due to
insufficient data). But the right "n" to choose can vary based on context as
well, since different words show up more often in the training data, and are
more or less variable in what kinds of words follow them. So there are a
number of approaches to choose n adaptively and/or to smooth the counts
("Katz's backoff model", "Kneser-Ney smoothing", etc.).

From a more engineering point of view, there's also considerable work on
pruning the models and representing them efficiently, if you want to work with
things of the size of, say, the Google Books ngram dataset.

A few free-software packages that implement a good portion of the recent work:

<http://kheafield.com/code/kenlm/>

<http://code.google.com/p/berkeleylm/>

~~~
kappaloris
I created a dumb bigram version that also memorizes leaps. Not much theory
behind it, but the fact that the resulting text has an uneven level of
delirium is somewhat funnier.

<https://github.com/kappaloris/markrov-text-generator>

~~~
sillysaurus
Thanks! I was hoping to find a bunch of examples of the text your solution
generates, but sadly there's only one. Installing it seems to require some
effort, too, so it's a bit hard to see a demo of your solution.

------
stiff
Various kinds of Markov models are pretty interesting and have many
interesting applications that are not too complicated, it's a pity this
particular example is being done to death, since it doesn't give much insight
into the topic. Shannon's classic paper on information theory is very readable
and shows the usefulness of the concept:

[http://cm.bell-
labs.com/cm/ms/what/shannonday/shannon1948.pd...](http://cm.bell-
labs.com/cm/ms/what/shannonday/shannon1948.pdf)

~~~
Isamu
Yes! - read the paper. Or if you are too "busy" on HN, at least skip to
sections I.3. THE SERIES OF APPROXIMATIONS TO ENGLISH and I.4. GRAPHICAL
REPRESENTATION OF A MARKOFF PROCESS.

See also the Usenet trolling program that indirectly inspired the Go example:
<http://en.wikipedia.org/wiki/Mark_V_Shaney>

------
johansch
I simply do not understand HN's fascination with this type of posts.
"Something that could have been implemented in any other turing complete
language NOW IN GO".

~~~
jwilliams
If you know Markov Chains it's an insight to see how it's implemented in (the
relatively new) Go. If you don't, you get to learn about Markov Chains.

Plus the codewalk itself is cool. I'm unsure of the UX, but certainly enough
for a discussion.

~~~
jlarocco
Not sure I believe that.

There's only insight to be gained if the Go version is particularly
interesting due to features of Go.

------
exratione
Here's a small Node.js state machine / Markov chain tool package that tries to
make what's going on under the hood a little more obvious, so as to allow for
tinkering. It contains an example for name generation.

<https://github.com/exratione/state-engines>

------
joshrotenberg
Meta, but codewalk looks pretty cool.

~~~
acc00
I was pretty sure this was the point of the submission.

I'd like to see more of this. A fresh interesting take on representing
programs, especially welcome as not being served as yet another eclipse plugin
or a multi-screen .vimrc.

------
gburt
I implemented this in PHP yesterday if anyone is interested:

[https://github.com/gburtini/Learning-Library-for-
PHP/blob/ma...](https://github.com/gburtini/Learning-Library-for-
PHP/blob/master/lib/unsupervised/markovchain.php)

Edit: the thread above me is talking about bigram implementations, mine
supports n-gram by setting the variable $degree when you train to anything you
like. By default, its 1-gram.

------
MostAwesomeDude
Huh. 85 lines, ignoring the explanation at the top. I wonder if this could be
golfed to be smaller; my Markov generator for Time Cube is less than half the
height (in Python):
[https://github.com/MostAwesomeDude/airbrush/blob/master/tips...](https://github.com/MostAwesomeDude/airbrush/blob/master/tipsum.py)

~~~
GhotiFish
I would be interested to see an excerpt from that!

~~~
MostAwesomeDude
<http://airbr.us/h/tipsum> has a running instance.

------
pjmlp
Oh man, one letter character variables. :(

~~~
thirsteh
..are good if they are descriptive/intuitive. It's idiomatic Go--long variable
names most certainly are not.

~~~
voidlogic
Unless they are loop counters, array indices, they never are
descriptive/intuitive outside of a 5 line function. I write a lot of Go
(understatement) and for all but the former, my min is three, rarely two
letters. I also buck the "system" and use "this" in my receivers :)

Nothing is worse than seeing a "r" in the body of some function or as a struct
member and not knowing if it is a reader, response, request so rather than
using "r", I'll go with rdr, resp, req or this (were applicable). That is
hardly verbose and our code is immensely more readable than something
"idiomatic" like the Go standard library IMHO. I think this really should not
be considered an issue of idiomaticness, if is was gofmt or go vet should
care.

The short is if you like Go and not single letter var names, great use Go with
longer variables names! I realize this is a religious issue so I can already
feel the flames burning :)

~~~
thirsteh
I don't consider req a long variable name. What I'm trying to communicate is
that req is better than request, and res better than response, since they
communicate the same thing and are easier to read and write. When it makes
sense, i.e. when it can't lead to confusion, the variables have one-letter
names--oftentimes this is e.g. i, j for loop variables or n for a number, and
so on. Also, it is very common for a method on a type Bar to bind the Bar to a
variable b, i.e. func (b Bar) Foo() { ... }

This is how the entire Go stdlib is written, and how most packages are
written.

