Markov chain paper title generator (vedantmisra.com)
36 points by vedant on Jan 6, 2012 | 10 comments

I'd also recommend a good game of arXiv vs. snarXiv: http://snarxiv.org/vs-arxiv/

As a non high-energy physicist, it's surprisingly hard! I usually do _worse_ than random chance, sometimes substantially so.

The author also has perhaps my favorite definition of a CFG: "The snarXiv is based on a context free grammar (CFG) — basically a set of rules for computer-generated mad libs."
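
To make the "mad libs" description concrete, here is a minimal Python sketch of a context free grammar expander. The grammar, the symbol names, and the function names are all made up for illustration; this is not the snarXiv's actual grammar or code.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of alternative
# productions, and each production is a list of symbols. Anything that is
# not a key in GRAMMAR is treated as a literal word (a terminal).
GRAMMAR = {
    "TITLE": [["ADJ", "NOUN", "in", "SETTING"], ["NOUN", "and", "NOUN"]],
    "ADJ": [["Holographic"], ["Supersymmetric"]],
    "NOUN": [["Branes"], ["Instantons"], ["Black Holes"]],
    "SETTING": [["Five Dimensions"], ["the Early Universe"]],
}


def expand(symbol, rng):
    """Recursively replace a symbol with words, picking productions at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_title(seed=None):
    """Fill in the TITLE template; the same seed always gives the same title."""
    rng = random.Random(seed)
    return " ".join(expand("TITLE", rng))


if __name__ == "__main__":
    for seed in range(3):
        print(generate_title(seed))
```

Unlike a Markov chain, nothing here is learned from a corpus: every possible output is determined by the hand-written rules.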

Ruby gem (without scraper): https://github.com/michaeldv/gabbler

When I ran it on my advisor's name, it gave the actual titles of two of his papers. Entertaining, though.

It seems to be a common pattern. When I ran it with the argument "A Einstein", I got "On the electrodynamics of moving bodies." as one of the results.

Edit: Also, for "P Erdos", I got "On a new law of large numbers."

From the article:

If you’re using a small corpus and long Markov chains, you’ll end up with lots of actual strings from the corpus, and no fake ones. If this happens, experiment with the second parameter to the constructor for the class “MarkovGenerator.”

For the authors you are using, the corpus is too small.

Since Paul Erdos is one of the most prolific mathematicians ever, the problem is more likely that the default chain length (4) is too long for even a relatively large corpus. A chain length of 2 does much better.
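
The effect of chain length can be seen in a small Python sketch. Everything here is an assumption for illustration: it is a word-level chain (the article's generator may work differently), the three titles are invented, and `build_model`, `generate`, and `verbatim_fraction` are hypothetical names rather than the article's `MarkovGenerator` API.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"

# Invented toy corpus, not real paper titles.
TITLES = [
    "On the theory of random graphs",
    "A note on the theory of prime numbers",
    "Remarks on the evolution of random graphs",
]


def build_model(titles, order):
    """Map each run of `order` consecutive words to the words seen after it."""
    model = defaultdict(list)
    for title in titles:
        tokens = [START] * order + title.split() + [END]
        for i in range(len(tokens) - order):
            state = tuple(tokens[i:i + order])
            model[state].append(tokens[i + order])
    return model


def generate(model, order, rng, max_words=30):
    """Walk the chain from the start state until END or max_words."""
    state = (START,) * order
    words = []
    while len(words) < max_words:
        nxt = rng.choice(model[state])
        if nxt == END:
            break
        words.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(words)


def verbatim_fraction(titles, order, samples=200, seed=0):
    """Fraction of generated titles that are exact copies of corpus titles."""
    rng = random.Random(seed)
    model = build_model(titles, order)
    known = set(titles)
    hits = sum(generate(model, order, rng) in known for _ in range(samples))
    return hits / samples


if __name__ == "__main__":
    for order in (4, 2):
        print(order, verbatim_fraction(TITLES, order))
```

With this corpus, no four-word state is shared between two titles, so an order of 4 can only replay the corpus. At order 2 the state ("the", "theory") occurs in two titles, which lets the chain cross from one title into another and produce strings such as "On the theory of prime numbers".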

Maybe you could keep actual paper names in a separate array, and check generated results against them before returning.
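
A rough Python sketch of that check, assuming the generator is available as a zero-argument callable. `generate_novel`, `normalize`, and the retry limit are hypothetical names and choices, not part of the article's code.

```python
def normalize(title):
    """Lowercase, collapse whitespace, and drop a trailing period for comparison."""
    return " ".join(title.lower().split()).rstrip(".")


def generate_novel(generate_fn, corpus_titles, max_attempts=100):
    """Call generate_fn until it returns a title that is not in the corpus.

    Returns None if every attempt reproduced a real title, which is a hint
    that the chain length is too long for the corpus.
    """
    known = {normalize(t) for t in corpus_titles}
    for _ in range(max_attempts):
        candidate = generate_fn()
        if normalize(candidate) not in known:
            return candidate
    return None


if __name__ == "__main__":
    corpus = ["On the Electrodynamics of Moving Bodies"]
    # Stand-in generator that yields one real title, then one fake one.
    outputs = iter([
        "On the electrodynamics of moving bodies.",
        "On the electrodynamics of large numbers",
    ])
    print(generate_novel(lambda: next(outputs), corpus))
```

Using a set instead of an array keeps each lookup cheap, and normalizing first means differences in case or a trailing period do not let a real title slip through.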

Also see http://pdos.csail.mit.edu/scigen/, an automatic CS paper generator. One of their papers was accepted to a conference, and they gave some hilarious speeches (also in link).

Lovely :)

Reminds me of my own playing with Markov chains: http://williamedwardscoder.tumblr.com/post/13292744100/the-s...

shameless plug:

I wrote a "Lorem ipsum" replacement based on Markov chains, with some public domain books as the corpus: http://wordum.net

It would be much funnier with a demo page.

