
A Markov text generator that learns from the Twitter streaming API - pclark
http://twitter.com/TweetOvermind
======
mahmud
Here is a layman's introduction to Markov text generation:

<http://www.in-vacua.com/markov_text.html>

With some code:

[http://uswaretech.com/blog/2009/06/pseudo-random-text-markov...](http://uswaretech.com/blog/2009/06/pseudo-random-text-markov-chains-python/)
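
A minimal sketch of the technique those links describe, in Python. It maps each pair of consecutive words (an order-2 prefix) to the words that followed it in the training text, then takes a random walk through that table. The prefix length, the 140-character cap meant to mimic a tweet, and the names `build_chain` and `generate` are illustrative assumptions. They are not taken from the linked articles or from the OP's bot.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to every word seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return chain


def generate(chain, max_chars=140, rng=random):
    """Random walk over the chain until a dead end or the character cap."""
    if not chain:
        return ""
    prefix = rng.choice(list(chain))     # start from a random known prefix
    order = len(prefix)
    out = list(prefix)
    while prefix in chain:               # stop at a prefix that only ended the corpus
        nxt = rng.choice(chain[prefix])  # duplicates make frequent followers likelier
        if len(" ".join(out + [nxt])) > max_chars:
            break
        out.append(nxt)
        prefix = tuple(out[-order:])
    return " ".join(out)
```

Because followers are kept in a list with duplicates, a word that followed a prefix three times is three times as likely to be picked as one seen once. Raising `order` tends to make the output more coherent, but it also copies longer runs of the source verbatim.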

~~~
mootymoots
Thank god for this... I had no idea what Markov was!

------
NathanKP
Wow, I built a Markov text generator years ago, and made a PHP bot to crawl
web pages to gather text for it, but that was before Twitter even existed. One
suggestion might be to add a dictionary of English words and acronyms so that
you can weed out nonsense words and other languages. It would probably have a
big effect on the accuracy.
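
NathanKP's dictionary idea could be applied as a filter on each tweet before it reaches the chain. This is a sketch under assumptions: the vocabulary comes from a plain word list with one entry per line (many Unix systems ship one at `/usr/share/dict/words`, though the path varies), `load_words` and `clean_tweet` are hypothetical names, and the 0.8 "mostly recognised" threshold is arbitrary. Dropping a whole tweet when too few of its words are known is one crude way to weed out other languages.

```python
import re

# Hypothetical tokenizer: runs of letters, keeping apostrophes ("don't").
TOKEN = re.compile(r"[A-Za-z']+")


def load_words(lines):
    """Build a lowercase vocabulary set from an iterable of words, one per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def clean_tweet(tweet, vocab, min_known=0.8):
    """Keep only dictionary words; return None if too few words were recognised.

    Tweets that are mostly unknown tokens (other languages, spam, mangled text)
    are dropped entirely rather than trimmed.
    """
    tokens = TOKEN.findall(tweet)
    if not tokens:
        return None
    known = [t for t in tokens if t.lower() in vocab]
    if len(known) / len(tokens) < min_known:
        return None
    return " ".join(known)
```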

I wonder if the code is available anywhere? I'd love to experiment with it
myself and see if I can improve on it some.

Edit: I found the code: <http://github.com/OEP/markov>

------
mootothemax
This is a rather fun idea, I like it a lot :) So far I'm using a mix of PHP,
C, and more PHP and have just set the bugger live here:
<https://twitter.com/markov_chains>

~~~
cmelbye
Wow, yours looks like the quality is much better (especially so considering it
just started learning a few hours ago). The OP's is a bunch of meaningless
gibberish.

~~~
mootothemax
Thanks! To be honest, I'm really quite scared by how often its tweets kinda-
sorta make sense ;)

------
avar
There was a similar thread on reddit 2 days ago:
[http://www.reddit.com/r/programming/comments/c6o1t/i_created...](http://www.reddit.com/r/programming/comments/c6o1t/i_created_a_markov_text_generator_that_learns/)

I pointed out a bot that I run there: <http://twitter.com/twatterhose>

Here's the code: <http://github.com/avar/bot-twatterhose>

And the Markov engine powering it: <http://hailo.github.com/>

------
moss
This is delightful. What I'd really like to see now is one that could learn
from people's reactions to it: pay attention to which of its tweets were
retweeted or favorited and try to generate more like those.
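
One way moss's suggestion might work, sketched under assumptions: store a numeric weight on each word-to-word transition instead of a plain list, and when one of the bot's own tweets is retweeted or favorited, add a bonus to every transition that tweet used. The `WeightedChain` class, the order-1 chain, and the `bonus` scaling are hypothetical; fetching retweet and favorite counts from Twitter is left out entirely.

```python
import random
from collections import defaultdict


class WeightedChain:
    """Order-1 word chain whose transitions carry adjustable weights."""

    def __init__(self):
        # weights[prev][nxt] -> accumulated weight of seeing `nxt` after `prev`
        self.weights = defaultdict(lambda: defaultdict(float))

    def learn(self, text, weight=1.0):
        """Add `weight` to every adjacent word pair in `text`."""
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            self.weights[prev][nxt] += weight

    def reward(self, tweet, retweets=0, favorites=0, bonus=0.5):
        """Reinforce the transitions a tweet used, in proportion to its engagement."""
        engagement = retweets + favorites
        if engagement > 0:
            self.learn(tweet, weight=bonus * engagement)

    def next_word(self, prev, rng=random):
        """Pick a follower of `prev` with probability proportional to its weight."""
        followers = self.weights.get(prev)
        if not followers:
            return None
        candidates = list(followers)
        return rng.choices(candidates, weights=[followers[w] for w in candidates], k=1)[0]
```

Rewarding whole transitions rather than whole tweets means a popular phrase can resurface inside new sentences, which is closer to "more like those" than simply reposting the hit.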

------
thristian
A friend of mine runs a similar bot on Twitter, seeded with Twitter messages
it sees:

<http://twitter.com/x11r5>

...on Identica, seeded with Identica messages it sees:

<http://identi.ca/x11r5>

...and on the web, seeded by content from an IRC channel:

<http://www.x11r5.com/>

As the about page says, "X11R5 is an insane geek on Identica, X11R5 is an
insane 12-year-old on Twitter."

<http://www.x11r5.com/wtf>?

------
Mongoose
I built something like this for Sunlight Labs' Apps For America contest last
summer. It feeds US patent application abstracts to a Markov processor to
generate random invention descriptions. <http://eurekaapp.com/>

It's pretty slow (tail call optimization in Ruby would be nice), but what it
spits out tends to be pretty funny.
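
On the tail-call point: a generator written as a tail-recursive function can always be rewritten as a loop, which removes the need for tail-call optimization. A minimal Python illustration follows (CPython does not eliminate tail calls either, so the recursive version hits the recursion limit on long walks). The tiny chains and both function names are hypothetical; this says nothing about how the Eureka app itself was written.

```python
import random


def walk_recursive(chain, word, n, out, rng=random):
    """Tail-recursive walk: every generated word costs one more stack frame."""
    if n == 0 or word not in chain:
        return out
    nxt = rng.choice(chain[word])
    out.append(nxt)
    return walk_recursive(chain, nxt, n - 1, out, rng)


def walk_iterative(chain, word, n, rng=random):
    """The same walk as a loop: constant stack depth however long the output."""
    out = []
    while n > 0 and word in chain:
        word = rng.choice(chain[word])
        out.append(word)
        n -= 1
    return out
```

Both functions consume random numbers in the same order, so with the same seed they produce the same words; only the iterative one survives walks longer than the interpreter's recursion limit.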

------
Isamu
This is a reference to a classic hack, Mark V. Shaney, where a Markov-chain
generator was used to post to Usenet.
(<http://en.wikipedia.org/wiki/Mark_V_Shaney>)

Bonus tie-ins to HN topics:

Rob Pike was one of the perpetrators (an author of the Go language)

It was featured in A. K. Dewdney's Computer Recreations column in Scientific
American

(yeah, I wrote one too, after reading the article. I think it's catnip to
hackers, like the Game of Life)

------
owensmartin
I'm sorry, but I really don't see the benefit of having made this thing; the
tweets it puts out are useless!

Are we just playing around with how to program Markov text generation? Because
who cares; they figured that out ages ago.

~~~
spolsky
hmm. I can't tell the difference between that and any other twitter feed.
Mindless gibberish brain-hand-grenades that take 45 minutes to decipher and
turn out to be about a television series I've never seen.

