See top entry: http://www.urbandictionary.com/define.php?term=fuzzy+wuzzy
Not to take anything away from the tech - that looks awesome and I can already think of a few uses for it.
The term is from a Kipling poem:
"A derogatory term for a black person, especially one with fuzzy hair... From... one of Rudyard Kipling's... poems, written in 1918. The poem is in the voice of an unsophisticated British soldier and expresses admiration rather than contempt, although expressed in terms that sound patronizing today."
Seriously people, Google your package names or you might end up looking like a divvy.
I had a set of products from Yahoo! that needed their equivalent product in a set of products from Amazon. I indexed all the Amazon products into Xapian and let the search functionality do its magic by using the Yahoo product title as the search keyword. It also had a scoring mechanism and worked flawlessly for my needs.
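For anyone who wants to try the idea without installing Xapian, here is a rough stdlib-only sketch of the same approach: index one catalogue's titles, then use each title from the other catalogue as the query and keep the top-scoring hit. This is not Xapian's API or its ranking; the tiny inverted index, the IDF-style weighting, and the product titles are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(titles):
    """Map each token to the set of title positions that contain it."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for token in set(tokenize(title)):
            index[token].add(doc_id)
    return index


def best_match(query, titles, index):
    """Return (best_title, score) for the query, or (None, 0.0) if nothing matches.

    Rarer tokens count for more (a simple IDF-style weight). This is a
    stand-in for a real search engine's ranking, not a copy of it.
    """
    scores = Counter()
    for token in set(tokenize(query)):
        postings = index.get(token, ())
        if not postings:
            continue
        idf = math.log(1 + len(titles) / len(postings))
        for doc_id in postings:
            scores[doc_id] += idf
    if not scores:
        return None, 0.0
    doc_id, score = scores.most_common(1)[0]
    return titles[doc_id], score


# Hypothetical data: titles from the catalogue being searched.
amazon_titles = [
    "Apple iPod Nano 8GB Silver",
    "Apple iPod Touch 32GB",
    "Sony Walkman NWZ 8GB Black",
]
index = build_index(amazon_titles)

# Hypothetical title from the other catalogue, used as the search query.
match, score = best_match("iPod Nano 8 GB silver (Apple)", amazon_titles, index)
print(match, round(score, 2))
```

The score is only useful for ranking and for picking a cut-off below which a human should look at the pair.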
Skim it once to collect vocabulary, then use it as a reference for IR algorithms.
Our eventual solution was to use a trained matcher, but obviously it was not ideal since human intervention was required :(
It's great to now have this in Python.
Anyway, this looks like a really useful library. Glad it's freely available.
From what I can see, this will also give 100 for 'NEW', 'KEES', 'YANK' - all of which could mean something completely different. How do they deal with this?
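To make the concern concrete, here is a small sketch of a substring-style "partial" score built on `difflib` from the standard library. It is a simplified stand-in, not the library's actual code, and the longer string `NEW YORK YANKEES` is my assumption about what those fragments would be compared against.

```python
from difflib import SequenceMatcher


def partial_ratio(a, b):
    """Score 0-100: the best match of the shorter string against any
    same-length window of the longer string (simplified sketch)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


# Assumed target string; every fragment below is an exact substring of it.
target = "NEW YORK YANKEES"
for fragment in ("NEW", "KEES", "YANK"):
    print(fragment, partial_ratio(fragment, target))
```

Any exact substring scores 100 under this kind of measure regardless of how short it is, which is exactly why very short fragments are a problem.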
On occasion there are false positives, in which case our algorithm is the Borg. They will be assimilated. Their grammatical and syntactical distinctiveness will be added to our own. Resistance is futile.
Purpose: Find duplicates in messy data sets with names and physical addresses