Ah, very good! The other interesting point is they use a dictionary that include...

jacobharris · on April 1, 2013

Yes, to clarify: I started with the base CMUdict for syllable counts, but I had the program keep track of any terms it missed so I could augment its vocabulary. That also helped me find some tokenization bugs and try out some rules for dealing with compound words like "unsportsmanlike".
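
For anyone curious, here is a rough sketch of that idea, not the actual program. It relies on the fact that CMUdict marks every vowel phoneme with a stress digit (0, 1, or 2), so counting the phonemes that end in a digit gives the syllable count. The three-entry `SAMPLE_PRONUNCIATIONS` dictionary is a hypothetical stand-in for the full CMUdict, and the names `count_syllables` and `missed_terms` are made up for illustration.

```python
from collections import Counter

# Hypothetical stand-in for CMUdict: word -> ARPAbet phonemes.
# In CMUdict, vowel phonemes carry a trailing stress digit (0, 1, or 2).
SAMPLE_PRONUNCIATIONS = {
    "haiku": ["HH", "AY1", "K", "UW0"],
    "syllable": ["S", "IH1", "L", "AH0", "B", "AH0", "L"],
    "count": ["K", "AW1", "N", "T"],
}

# Words that were looked up but not found, with how often each was seen.
missed_terms = Counter()


def count_syllables(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return the syllable count for word, or None if it is not in the dictionary."""
    key = word.lower()
    phonemes = pronunciations.get(key)
    if phonemes is None:
        # Record the miss so the vocabulary can be augmented later.
        missed_terms[key] += 1
        return None
    # One syllable per vowel phoneme, i.e. per phoneme ending in a stress digit.
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


if __name__ == "__main__":
    for term in ["haiku", "syllable", "unsportsmanlike"]:
        print(term, count_syllables(term))
    print("misses:", dict(missed_terms))
```

Reviewing `missed_terms` after a run shows which words to add by hand, and unusually frequent or odd-looking misses tend to point at tokenization problems rather than real vocabulary gaps.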

moultano · on April 1, 2013

One approximate hack that works pretty well is to count the number of blocks of vowels separated by consonants. It breaks on some words, but it was close enough for something I was working on (data-mining rhymes from lyrics).
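
A minimal sketch of the vowel-block hack, for illustration only. Treating "y" as a vowel is my assumption here, and the function name `approx_syllables` is made up.

```python
import re

# Assumption: "y" is treated as a vowel; the heuristic does not require this.
VOWEL_BLOCK = re.compile(r"[aeiouy]+")


def approx_syllables(word):
    """Approximate the syllable count as the number of consecutive-vowel runs."""
    return len(VOWEL_BLOCK.findall(word.lower()))


if __name__ == "__main__":
    for term in ["banana", "queue", "rhyme", "unsportsmanlike"]:
        print(term, approx_syllables(term))
```

It gets "banana" (3) and "queue" (1) right, but a silent final "e" throws it off: "rhyme" comes out as 2 and "unsportsmanlike" as 5.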