

Python NLTK Bayesian Classifier for word sense disambiguation - 92% accuracy - beagledude
http://www.litfuel.net/plush/?postid=200

======
terra_t
92% accuracy, unfortunately, isn't good enough.

Bag-of-words models perform pretty well at classification and search, and the
main thing you need to improve search is to boost scores when words are close
together.
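
A toy version of that kind of proximity boost (my own illustrative scoring,
not anything from the article):

    import re

    def search_score(query, doc, window=5, proximity_bonus=2.0):
        # Plain bag-of-words overlap, plus a bonus for each pair of adjacent
        # query-term hits that are different terms and lie within `window`
        # tokens of each other.
        q_terms = set(re.findall(r"\w+", query.lower()))
        tokens = re.findall(r"\w+", doc.lower())
        base = sum(1 for t in tokens if t in q_terms)
        hits = [(i, t) for i, t in enumerate(tokens) if t in q_terms]
        bonus = sum(proximity_bonus
                    for (i, a), (j, b) in zip(hits, hits[1:])
                    if j - i <= window and a != b)
        return base + bonus

    print(search_score("apple iphone", "the apple iphone keynote"))  # 4.0, terms adjacent
    print(search_score("apple iphone",
                       "apple pie recipes with a long digression before the iphone case"))  # 2.0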

You might think you could improve performance by using semantically better
defined features, but even 92% accuracy adds enough noise to foil your plans.

It's a big problem in A.I. systems that have multiple stages. You might have 5
steps in a chain which are each 90% accurate, but put them together and you've
got a system that sucks. Ultimately there's a need for a holistic approach
that can use higher-level information to fix mistakes and ambiguities at the
lower levels.
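
The compounding is easy to see with a back-of-the-envelope figure (made-up
numbers, just to make the point):

    # If each of 5 independent stages is right 90% of the time, the whole
    # pipeline is only right about 59% of the time end to end.
    stages = 5
    per_stage_accuracy = 0.90
    print(per_stage_accuracy ** stages)  # 0.59049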

~~~
endtime
92% in general would actually be really good for word sense disambiguation,
but..."Apple" is a really easy choice. I'd like to see how he does with a
trickier word like "right" (as in civil, vs. not wrong, vs. not left).

~~~
terra_t
Yes, it is good, but not good enough for many applications. You're also left
with the issue that one kind of "apple" is more common than the other kind of
"apple" so the baseline accuracy of something that always assumes it's one
kind of apple might be surprisingly good.
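
That baseline is trivial to compute; a quick sketch with made-up label
counts:

    from collections import Counter

    # Hypothetical labels for 100 sentences containing "apple".
    labels = ["company"] * 80 + ["fruit"] * 20

    majority_label, majority_count = Counter(labels).most_common(1)[0]
    print(majority_label, majority_count / len(labels))  # company 0.8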

That said, text-to-speech is a system where it's important to do
disambiguation of a particular set of words. For instance:

In "I read the news today, oh boy", "read" sounds like "red".

In "I read the news every day", "read" sounds like "reed".

You need to be able to disambiguate the word sense to be able to correctly
read the word "read". There are maybe 20 or so very common words that are
like this, so a modest amount of work in this area would be part of a good TTS
system.
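
A toy version of such a disambiguator is easy to throw together with NLTK's
naive Bayes classifier; the features and training sentences below are my own
guesses at useful tense cues, not anything from the article:

    import nltk

    def features(sentence):
        # Crude tense cues: habitual adverbs suggest present-tense "reed",
        # explicit past-time markers suggest past-tense "red".
        tokens = set(sentence.lower().split())
        return {
            "habitual": bool(tokens & {"every", "often", "usually", "daily"}),
            "past_marker": bool(tokens & {"yesterday", "ago", "already"}),
        }

    train = [
        (features("i read the news today oh boy"), "red"),
        (features("she already read that book"), "red"),
        (features("i read the news every day"), "reed"),
        (features("they usually read before bed"), "reed"),
    ]

    classifier = nltk.NaiveBayesClassifier.train(train)
    print(classifier.classify(features("we read the paper every morning")))  # reed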

------
alanman25
As someone who has spent a considerable amount of time studying NLP, I have to
say that this post outlines a pretty naive approach when it comes to
disambiguating words.

Here are some questions:

- What happens when we change the language model?

- What happens when we intersperse language models (English phrases within
Chinese)?

- What if someone were to just say "i love apple"?

This post title is also _very_ misleading. The 92% accuracy reflects only one
particular use case. How about attempting to disambiguate hundreds or
thousands of terms?

~~~
l0nwlf
> As someone who has spent a considerable amount of time studying NLP

I'm quite interested in how you would approach this problem.

------
vietor
A blinking favicon? Seriously? I couldn't finish the article because my eyes
jumped to the tab bar every 5 seconds.

Has anyone else seen this elsewhere? It's new to me and I was surprised by how
obnoxious it was given that the web isn't exactly a stranger to obnoxious
flashing content.

~~~
dkarl
It doesn't blink for me. (In case you're wondering why you're getting
downvoted -- though I didn't downvote you myself.)

~~~
vietor
[EDIT: Can't edit the top message anymore, but the problem was a broken
browser and an amusing 'blink' vs. 'blink' confusion, as detailed deeper in
the thread. Also, in Chrome it doesn't appear to animate, removing the
annoyance entirely.]

Interesting. I asked a few coworkers if it was just me and they confirmed it.
The actual favicon doesn't blink for you? <http://www.litfuel.net/favicon.ico>

It reports itself as a 6 frame gif for me. Maybe your browser is just more
sane than mine (Firefox 3.x) and refuses to honor animated favicons?

I figured I was getting rightly downvoted because I wasn't saying anything
about NLTK.

~~~
dkarl
I tried Firefox 3.6 out of curiosity, and it doesn't blink, even when I open
the favicon itself in a browser tab. I think you must have changed your
browser config at some point and forgotten about it.

~~~
vietor
So this is actually pretty funny.

In my local version of Firefox, the entire icon vanishes for about 5 seconds,
every 5 seconds, causing a notable visual disturbance. In Chrome on another
system the _eyes_ blink every 5 seconds or so. So yeah, my browser is the
broken one.

But when I asked people if it blinked ... they, of course, said yes.

EDIT: Also, as noted in an edit above, when viewed in Chrome it didn't animate
unless the image was accessed directly. So for you it may blink, not blink, or
really blink, depending on your browser and configuration...

------
natch
Terrible choice of words to use, since the capitalization (or not) of the word
carries so much information. It's hard to take this seriously given his
apparent obliviousness.

~~~
fauigerzigerk
It may not be the best choice of words, but capitalization plays no role in
his tests as everything is transformed to lower case on line 30:
<http://pastebin.com/4B1xHHht>

------
fibonacci1
He did it for one word. Bad article title.

~~~
beagledude
It was actually a method of using Wikipedia to build your corpus for any
ambiguous word, so you can automatically add some word sense disambiguation
to your application. One word was just a simple example of using that data.
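
The recipe is roughly: grab a plain-text Wikipedia page for each sense and
use it as labeled training text for that sense. A rough sketch of that step
(the API parameters and page titles here are just an illustration, not the
exact code from the post):

    import json
    import urllib.parse
    import urllib.request

    def wikipedia_text(title):
        # Fetch the plain-text extract of one article from the MediaWiki API.
        params = urllib.parse.urlencode({
            "action": "query", "prop": "extracts", "explaintext": 1,
            "format": "json", "titles": title,
        })
        req = urllib.request.Request(
            "https://en.wikipedia.org/w/api.php?" + params,
            headers={"User-Agent": "wsd-demo/0.1"},
        )
        with urllib.request.urlopen(req) as resp:
            pages = json.load(resp)["query"]["pages"]
        return next(iter(pages.values())).get("extract", "")

    # One seed page per sense; its text becomes that sense's training corpus.
    corpus = {
        "company": wikipedia_text("Apple Inc."),
        "fruit": wikipedia_text("Apple"),
    }
    print({sense: len(text) for sense, text in corpus.items()})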

~~~
danieldk
- The article does not add anything new. Using Wikipedia for word sense
disambiguation has been a hot topic for some years. [1]

- The article title implies that this is somehow a spectacular finding. Doing
word sense disambiguation for one word is not that interesting, and there is
no comparison with existing methods to show that this is actually a high
score. I suspect that it is not that spectacular, since 'Apple' is relatively
easy to disambiguate using a few context words.

[1] E.g. see:

- Using Wikipedia for Automatic Word Sense Disambiguation, R. Mihalcea, 2007,
for a discussion of using Wikipedia to train a word sense disambiguator.

- Integrating multiple knowledge sources to disambiguate word sense: An
exemplar-based approach, H.T. Ng and H.B. Lee, 1996, provide a good overview
of the types of features that can be used in disambiguation. They use
features that go beyond simple 'bag of words' and 'bag of n-grams' features,
e.g. by using syntactic patterns.

There is a whole lot more research, of course; these are just two examples
that describe far more sophisticated approaches.

------
tkahnoski
This is an interesting exercise in building a very specific word disambiguator
('apple' the company vs 'apple' the fruit).

It is a testament to NLTK that this can be accomplished in less than 100
lines.

~~~
danieldk
Apart from stemming, maybe, it's not hard to implement this in ~100 lines
without NLTK:

- In naive Bayes classification, model parameters can usually be estimated
using relative frequencies in the training data.

- WordPunctTokenizer is a very simple tokenizer that makes anything matching
\w+ and [^\w\s]+ a separate token.

- Extracting bigrams from a list of tokens is trivial.

Of course, using NLTK will be very helpful in many situations, but this is
hardly a testament to NLTK.
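
A rough sketch of those three pieces in plain Python, just to back up the
point (toy data, and add-one smoothing as a stand-in for whatever estimator
you prefer):

    import re
    from collections import Counter
    from math import log

    def tokenize(text):
        # Roughly what WordPunctTokenizer does: runs of word characters and
        # runs of punctuation become separate tokens.
        return re.findall(r"\w+|[^\w\s]+", text.lower())

    def bigrams(tokens):
        return list(zip(tokens, tokens[1:]))

    def train(labeled_texts):
        # Estimate naive Bayes parameters from relative frequencies.
        feature_counts, label_counts = {}, Counter()
        for text, label in labeled_texts:
            label_counts[label] += 1
            tokens = tokenize(text)
            feature_counts.setdefault(label, Counter()).update(tokens + bigrams(tokens))
        return feature_counts, label_counts

    def classify(text, feature_counts, label_counts):
        tokens = tokenize(text)
        feats = tokens + bigrams(tokens)
        total = sum(label_counts.values())
        best, best_score = None, float("-inf")
        for label, counts in feature_counts.items():
            n, vocab = sum(counts.values()), len(counts)
            # Log prior plus add-one smoothed log likelihoods.
            score = log(label_counts[label] / total)
            score += sum(log((counts[f] + 1) / (n + vocab)) for f in feats)
            if score > best_score:
                best, best_score = label, score
        return best

    model = train([("apple released a new iphone", "company"),
                   ("i baked an apple pie", "fruit")])
    print(classify("apple announced the iphone", *model))  # company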

------
jlees
I always prefer to see these things in context (how well does a naive rule-
based classifier do? What's the P/R/F-score, i.e. how many Apples are apples
and apples are Apples? What about Apple Records?)
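
For anyone who wants to compute them, P/R/F for the "company" sense fall
straight out of the confusion counts (invented figures, purely illustrative):

    # Hypothetical results for the "Apple the company" sense.
    tp = 45  # company correctly labeled company
    fp = 5   # fruit mislabeled as company
    fn = 3   # company mislabeled as fruit

    precision = tp / (tp + fp)  # 0.9
    recall = tp / (tp + fn)     # 0.9375
    f1 = 2 * precision * recall / (precision + recall)
    print(precision, recall, round(f1, 3))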

It's still fun to remember how quick and easy something like this is though.
Any interest in similar articles on named entity recognition, sentiment/topic
classification and spam filtering? I've been meaning to do a few for a while,
but you know how it is.

~~~
beagledude
I'd love to see some more articles on the subject out there, definitely take
the time to post something. Entity or sentiment would be my first choices :)

------
cybernytrix
I'm surprised that no one mentioned this paper that first evaluated this
approach to using Wikipedia data:
<http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-259.pdf> That said, the major
drawback of using Wikipedia is the size. If this approach is to be used for
all words (not just Apple) then the total training corpus will be several GBs.
Definitely not practical...

~~~
nl
What's not practical about it?

GBs of data are pretty easy to handle these days.

~~~
cybernytrix
GBs and TBs of data are common, but not for this task. All you are doing is
word sense disambiguation, and there are algorithms for WSD that work with
much, much smaller training sets. I just don't think that the exponential
increase in training data is justified...

