This is a lot like an app I wrote a few years ago for a programming contest. It uses the same technique to generate random invention descriptions using patent application abstracts: http://eurekaapp.com/ (Yes, I realize how unreasonably slow it is)
Hey that's cool! I imagine that you probably caught yourself, a time or two, actually trying to read and understand what the "invention" is. As I just did.
It seems that it needs to sample more tweets. I generated 5 tweets, and 2 of them were exact tweets that I had sent (they were only a few words long, and I guess they contained words that I rarely tweet, so the chain had few or no other ways to go once it started reproducing the tweet).
Edit: Then again, since there are only a few words in a tweet, you'd have to sample a really long way back to ensure that won't happen. Possibly farther back than Twitter will let you go.
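To make that failure mode concrete, here is a minimal sketch of a word-level, order-1 Markov chain over a handful of made-up tweets (not the site's actual code; the sample texts are hypothetical). Any word that occurs only once in the corpus has exactly one successor, so once the walk reaches it the chain can only replay the original tweet.

    import random
    from collections import defaultdict

    # Toy corpus standing in for a small batch of fetched tweets (hypothetical text).
    tweets = [
        "shipping the new build tonight",
        "coffee first then code",
        "coffee then more coffee",
    ]

    # Order-1 (word-level) chain: map each word to the words observed to follow it.
    chain = defaultdict(list)
    START, END = "<s>", "</s>"
    for tweet in tweets:
        words = [START] + tweet.split() + [END]
        for cur, nxt in zip(words, words[1:]):
            chain[cur].append(nxt)

    def generate():
        """Walk the chain from the start token until an end token is drawn."""
        word, out = START, []
        while True:
            word = random.choice(chain[word])
            if word == END:
                return " ".join(out)
            out.append(word)

    # Words seen only once (e.g. "shipping", "build") have a single successor,
    # so the walk reproduces that original tweet verbatim from there on.
    print(generate())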
Oslaka died 2 kilometers in order to come from seafaring folk, so I do it. I spend twenty minutes left! He is hideously scarred. My host removes his face a lot in college.
This reminds me of an old Perl script I made with Markov chains for a very similar sort of random nonsense text generation.
I think I fed it some text from a few Usenet kooks/conspiracy theorists and something like Alice in Wonderland and got quite a few laughs out of it a long time ago. It was made to allow you to combine arbitrary texts into a single chain.
Very apt, but what n-gram length is being used? n=1 is my guess, since "as often as" is a common English construct. Obvious feature request: tweakable lengths.
Edit: I'd make the fix myself and send a pull request, but I don't know Haskell and am too lazy to figure it out.
n=1 is indeed being used. The problem with a larger n is that you get original tweets really often, because the dataset is limited (only as many tweets as you can get in one request).
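For the curious, here is a rough Python sketch (not the app's Haskell code) of a chain with a tweakable n-gram length; the placeholder corpus and names are made up. With only one request's worth of tweets, most n-word states occur exactly once for n >= 2, which is why the output collapses back into original tweets.

    import random
    from collections import defaultdict

    def build_chain(text, n=2):
        """Order-n word chain: map each n-word state to the possible next words."""
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - n):
            state = tuple(words[i:i + n])
            chain[state].append(words[i + n])
        return chain

    def generate(chain, length=20):
        state = random.choice(list(chain))   # start from an arbitrary state
        out = list(state)
        for _ in range(length):
            choices = chain.get(state)
            if not choices:                  # dead end: state only seen at the corpus tail
                break
            out.append(random.choice(choices))
            state = tuple(out[-len(state):])
        return " ".join(out)

    corpus = "some concatenated tweets would go here"   # hypothetical input
    chain = build_chain(corpus, n=2)   # raise n for coherence, lower it for novelty
    print(generate(chain))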
Along those lines, I've been playing recently with using the Google ngram data for Markov chaining. The size of the corpus allows using 5-grams without the problem of seeing text that has actually been written before (mostly... it could just decide to spit out the complete works of Shakespeare any second!), and the results were more interesting to me than most Markov chains I've seen before (a rough sketch of the idea follows the examples below). http://kitenet.net/~joey/blog/entry/dadagoogoo/
Random example 1: nothing had pleased God to bestow upon you as to participation in physical activity and exercise . Don ' t rain .
Random example 2: sad and terrifying each time . After a quick nap .
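The post doesn't spell out the implementation, so this is only a rough Python sketch of count-weighted 5-gram sampling, assuming the raw Google ngram files have already been aggregated into simple "w1 w2 w3 w4 w5<TAB>count" lines (that pre-processing step and the file name are assumptions).

    import random
    from collections import defaultdict

    def load_chain(path):
        """Map each 4-word prefix to (candidate next words, their counts)."""
        chain = defaultdict(lambda: ([], []))
        with open(path) as f:
            for line in f:
                ngram, count = line.rstrip("\n").split("\t")
                words = ngram.split()
                if len(words) != 5:
                    continue
                nexts, weights = chain[tuple(words[:4])]
                nexts.append(words[4])
                weights.append(int(count))
        return chain

    def generate(chain, length=40):
        state = random.choice(list(chain))   # start from an arbitrary 4-gram
        out = list(state)
        for _ in range(length):
            nexts, weights = chain.get(state, ([], []))
            if not nexts:
                break
            out.append(random.choices(nexts, weights=weights)[0])  # count-weighted pick
            state = tuple(out[-4:])
        return " ".join(out)

    chain = load_chain("fivegrams.tsv")   # hypothetical pre-aggregated file
    print(generate(chain))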
I did not read the paper, but what does "more accurate" mean in this case? Likelihood of some unseen data? It seems pretty hard to define or measure to me, if the goal is sheer amusement.
I'll leave this here, since it's loosely on-topic (Markov chains): http://www.joshmillard.com/garkov/ is Garfield strips with Markov-chain-generated text instead of the original dialogue. I've had countless hours of fun with them.
Unfortunately, if you do as I did and tweet what it generated, you can't use it again, as it would be eating its own output; and generating nonsense from nonsense is not so entertaining.
http://twitter.com/markov_chains