Twitter Bot Finds Anagrams of Twitter Statuses 113 points by nateguchi on Aug 13, 2014 | 22 comments

 "Mustache got thicker" vs "git checkout hamster". This bot is pretty funny. Anyone know how it works? I'm assuming it just sorts the string and puts it in a hashmap/table and looks for collisions.
 I found "another math genius" vs "He ain't smart enough." pretty funny
 Probably just takes tweets, canonicalizes them, and then hashes them based on a 26-length vector of character counts. For every new tweet, it looks for old tweets with the same character count.
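 A minimal sketch of the approach guessed at above, not the bot's actual code: reduce each tweet to a 26-element vector of letter counts and use that vector as a hash-table key, so any earlier tweet with the same key is an anagram candidate. The names `anagram_key` and `AnagramIndex` are hypothetical, and the canonicalization (lowercase, ASCII letters only) is an assumption.

```python
from collections import defaultdict
import string


def anagram_key(text):
    """Return a 26-tuple of letter counts, ignoring case, spaces and punctuation."""
    counts = [0] * 26
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            counts[ord(ch) - ord("a")] += 1
    return tuple(counts)


class AnagramIndex:
    """Hypothetical in-memory index: letter-count key -> tweets seen with that key."""

    def __init__(self):
        self._seen = defaultdict(list)

    def add(self, text):
        """Store a tweet and return earlier tweets that share its letter counts."""
        key = anagram_key(text)
        # Skip verbatim repeats; they share a key but are not interesting.
        matches = [old for old in self._seen[key] if old != text]
        self._seen[key].append(text)
        return matches


index = AnagramIndex()
index.add("Mustache got thicker")
print(index.add("git checkout hamster"))
```

 Sorting the letters and using the sorted string as the key, as suggested earlier in the thread, would find the same collisions; the count vector just avoids the sort.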
 The source is on GitHub: https://github.com/cmyr/anagramatron
 I thought this was deep: "Go All out or Die trying." vs "R u really going to do it?"
 This particular one had me in splits (:
 I'm a bit surprised that there are anagrams to be found. It's easy to find them if they exist, but there's no guarantee at all that there actually should be collisions.
 Fermi estimate time! Anagrams are just sentences with the same letter counts. The anagrams they're posting have 25ish letters... how many ways are there to distribute 25 balls into 26 bins? (25+26)!/25!/26! is ~250 trillion. The birthday paradox square roots that down to ~10 million, and the fact that we prefer some bins (fewer Zs, more Es) probably cuts it down even further to ~1 million. So one anagram per million short tweets; hundreds per day. Doesn't seem too unreasonable.
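 The arithmetic in that estimate can be checked with a few lines of Python. This is only a sketch of the Fermi calculation as stated, under the same simplifying assumption that all letter-count vectors are equally likely. One side note: the textbook stars-and-bars count for 25 balls in 26 bins is C(50, 25), roughly half of the figure quoted, which does not change the order of magnitude.

```python
import math

letters = 25  # "balls": letters in a short tweet
bins = 26     # one bin per letter of the alphabet

# The expression as written in the comment: (25+26)! / 25! / 26!
as_written = math.factorial(letters + bins) // (
    math.factorial(letters) * math.factorial(bins)
)

# Standard stars-and-bars count of distinct letter-count vectors: C(n + k - 1, n)
strict = math.comb(letters + bins - 1, letters)

# Birthday-paradox scaling: collisions become likely at around sqrt(N) samples.
tweets_needed = math.isqrt(as_written)

print(f"as written:     {as_written:.3e}")
print(f"stars and bars: {strict:.3e}")
print(f"sqrt estimate:  {tweets_needed:.3e}")
```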
 Quick sanity check: most of the anagrams there are from short tweets, as you'd predict.
 I'm not a statistician. Is it really that surprising? English has plenty of redundancy; Twitter statuses have limited length. What's surprising to me is the niceness of the found anagrams. "another math genius" / "he ain't smart enough".
 > What's surprising to me is the niceness of the found anagrams. That's because they are manually curated [0]: "Q: Is this manually curated? A: Mostly for issues of volume (there are a lot of variations of 'goooood mooornnniinng!', there are a lot of spam bots posting subtly different versions of the same message, etc) the bot doesn't automatically post every anagram it finds. Essentially there's an iPhone client that reviews matches, which are manually approved or rejected." [0] https://github.com/cmyr/anagramatron
 I am a statistician. Maybe I should sit down and actually do some calculations.
 It's actually extremely likely. The chance that any two statuses are anagrams is minuscule, and even the chance that a particular status has an anagram among all other statuses is probably small, but the chance that there are no collisions at all is tiny. See a description of the Birthday Paradox[1] for the mathematics behind this. For example, if you put 70 people in a room, there is a 99.9% chance that two people share a birthday.
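 The 70-person figure is easy to reproduce. A small sketch, assuming 365 equally likely birthdays and ignoring leap years; `p_shared` is a hypothetical helper name.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for n in (23, 70):
    print(n, f"{p_shared(n):.4f}")
```

 The same reasoning applies to tweets: with enough of them, the number of pairs grows quadratically, so at least one collision becomes close to certain even when each individual pair is very unlikely to match.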
 I find it interesting that they're manually approving the hits, because, as they indicate, most hits are (nearly) identical. It shouldn't be too difficult to solve this automatically though. Identical hits can be discarded very easily. The ones that only have a few words or letters reversed can be detected with some kind of similarity algorithm.
 I had a look at the source code, and it does quite a bit of filtering, particularly around making sure the words are unique, and there is a primitive character comparison algorithm. The code could be simplified by using Python's set() and improved by doing a copy'n'paste on a Levenshtein function.
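 A rough sketch of what that suggestion could look like, not taken from the anagramatron source: compare the two tweets' words with `set()` to reject pairs that are just the same words reordered, then use a Levenshtein distance to reject pairs that differ by only a few characters. The helper names and the `min_ratio` threshold of 0.25 are assumptions for illustration.

```python
import re


def words(text):
    """Set of lowercase words in a tweet (letters and apostrophes only)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def is_interesting(a, b, min_ratio=0.25):
    """Heuristic filter for an anagram pair; the threshold is a guess."""
    a, b = a.lower(), b.lower()
    if words(a) == words(b):
        return False  # same words, merely reordered
    distance = levenshtein(a, b)
    return distance / max(len(a), len(b)) >= min_ratio


print(is_interesting("Mustache got thicker", "git checkout hamster"))
print(is_interesting("good morning", "morning good"))
print(is_interesting("gooood morning", "goood moorning"))
```

 A filter like this would only reduce the review queue; judging whether a pair is actually funny still needs a human.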
 oh hey yea that would've been useful. ^_^
 Nice one :) I love when people use programming to play with words. Sad it's English only. I may work on a French version. But really nice idea.
 Hey, author here: it's English only mostly because of volume, and because I review results. Making a French-language version would mostly just be a matter of hosting. If you're interested let me know, I'd love to help you out.
 Maybe as a side project. But I have some friends who make rap, and there are some diamonds that I've found with your bot: "I want to see this world change." = "Let's see what I can do right now", and "you have destroyed me" = "do you deserve my hate?". I'll keep it in mind, but it will not be before six months. I will make a pull request or notify you via GitHub; I also code in Python. In French we have some software to find rhymes. Putting the findings in a database could be a nice addition, I could help to compose text. You have my admiration for the idea and the execution ...
 This is delightfully ironic
 Oily Shirted Filch Linguist
