
Stanford researchers to open source model they say has nailed sentiment analysis - suprgeek
http://gigaom.com/2013/10/03/stanford-researchers-to-open-source-model-they-say-has-nailed-sentiment-analysis/
======
hooande
_Socher thinks his model could reach upward of 95 percent accuracy, but it
will never be completely perfect._

Ridiculous accuracy for something as complex as sentiment analysis. You don't
hear established researchers say something like this often. Moving any number
from 85% to 95% is the work of the gods.

I wonder if the code they release will include some version of the data from
the Mechanical Turk project. Code for this is great and many people (myself
included) will be able to learn a lot from it. But it won't have the same
level of reproducibility without the data.

If they do release it they will effectively be giving away the money they
spent on Mechanical Turk. 11,000 HITs ain't cheap and they probably had
redundant sampling. If they decide to make this data public as well it would
be a big win for research because labelled data is so important to machine
learning work.

Open sourcing the code associated with a research paper is already a huge
deal. It's great to see big name researchers like Andrew Ng pushing the trend
for publishing code. If nothing else this is a great example for computer
science papers going forward.

~~~
warmfuzzykitten
Yes, but a little realism here: They have actually moved it from 80% (using
single word analysis) to 85%. That's fantastic, but not the work of the gods.

~~~
ninjin
I agree that the article is sensational, and what really confuses me is that
the paper and their press release (found below) are far more humble and down
to earth (I really like the writing and the work in general). My personal
guess is that this is a classic case of a journalist going overboard (been
there...).

[http://engineering.stanford.edu/news/stanford-algorithm-analyzes-sentence-sentiment-advances-machine-learning](http://engineering.stanford.edu/news/stanford-algorithm-analyzes-sentence-sentiment-advances-machine-learning)

------
mrmaddog
Here is a link to the live demo:
[http://nlp.stanford.edu:8080/sentiment/rntnDemo.html](http://nlp.stanford.edu:8080/sentiment/rntnDemo.html)

This is really fun to play with, and I'm surprised how well it can parse the
sentiment of sample sentences I threw at it. I've tried a couple random
examples (like "I don't know what the artist was smoking, but the song made no
sense (though I liked the beat!)") and have not yet gotten a wrong analysis.
Even the phrase parsing is pretty spot-on.

As a side note, this is much more interesting than the "sediment" analysis I
expected after skimming the title. (Unfortunately though, the analyzer got
this final sentence wrong:
[http://cl.ly/image/301u1q46263m](http://cl.ly/image/301u1q46263m))

Edit: seems like this system could get significantly more robust with more
data. If you look in the comments section, you can see some comments from the
professor himself, i.e. "Possibly because the word "buying", only appears once
in the entire dataset and it's in a pretty negative context:
[http://nlp.stanford.edu/sentiment/treebank.html?w=buying](http://nlp.stanford.edu/sentiment/treebank.html?w=buying)"

If you gave it 100,000 phrases, I wouldn't be surprised if it could hit the
95% mark that Socher mentions.

~~~
cantrevealname
> I'm surprised how well it can parse the sentiment

I tried to fool it, but it took quite a bit of effort to contrive a sentence
that it got wrong.

The sentence I finally fooled it with:

--> I enjoy clipping my toenails to paying any more attention to this movie.

which it rated as highly positive.

~~~
eulerphi
Your sentence is malformed. It should be "I'd enjoy clipping my toenails to
paying any more attention to this movie."

Still, I'm not even sure if using the preposition "to" in that manner is
proper English.

~~~
telephonetemp
It isn't proper English. A proper version of this sentence could be "I'd
_prefer_ clipping my toenails to paying any more attention to this movie",
but even then the "paying any more attention" part would probably cause a
_human_ sentiment analyser to do a double take.

~~~
StavrosK
The software above correctly rates this as negative.

------
jfriedly
I read the paper when this first showed up on HN[1]. The most important thing
they did was to create a training set with higher granularity than anything
previously available. On their training set, their algorithm was able to
achieve 85% positive/negative accuracy on sentences, while the previous
state-of-the-art algorithms moved from 80% accuracy up to 83% accuracy when
adapted to the same training set. While their algorithm appears to be better
than anything they tested against, this is fundamentally an incremental
improvement, not groundbreaking research. The real win here came from using a
better dataset.

[1]
[http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf](http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)

Edit: formatting

~~~
vivekn
Will be interesting to play around with that dataset.

------
PaulHoule
I wouldn't quite say they "nailed" it.

One issue is that a system like this needs to be trained for the specific
kinds of documents you are processing. For instance, if you are looking at
people's opinions on stocks, there is specific terminology to look for, such
as "buy", "sell", "short" or "long", "missed earnings", price targets, etc.

This isn't so much a problem with their method, but it is a problem w/ the
specific model they are publishing.

I like that they are using "beyond bag of words" methods and I find it very
believable they could get much better results if they had a bigger training
set and more effort in tuning.

One advantage us commercial folks have is that we don't need to bet on every
hand. Reviews like that one of "The Room" are ambiguous at best and should be
filed as such.

------
lightsidelabs
First, let me say that this is really creative work and I'm glad it's being
presented at EMNLP.

"Sentiment analysis" is too broad of a category to really cover in a single
article like this. What they've done is taken a very difficult problem,
sentence-level binary sentiment, and made solid progress on it. The baseline
for this dataset using totally naive techniques is around 75%, and their
results are the state of the art.

The move from 85% to 95% isn't really an interesting one. What really matters
is exploring the numerous other open questions in the field of affect
recognition, notably two things:

* Sentiment at different granularities. Document level analysis has been far above 90% for years; this work is pushing forward sentence level. Other work is making great progress on targeted opinions even finer-grained than that, like looking at specific attributes of products. What if you like a movie's acting but not its plot? This structured nuance is not addressed here.

* Domain adaptation. You talk about movies in a different way from almost anything else. A movie review is positive if it's unpredictable; your opinion of the unpredictability of dishwashers or political candidates is probably different. For anything beyond movie reviews this method may work, but this particular dataset certainly won't.

Looking forward to seeing more from this group, as ever; Chris Manning's
research team has an excellent reputation in the field.

------
eli_gottlieb
_What makes the Sentiment Treebank so novel is that the team split those
nearly 11,000 sentences into more than 215,000 individual phrases and then
used human workers via Amazon Mechanical Turk to classify each phrase on a
scale from “very negative” to “very positive.”_

Can someone here please explain whether the use of Mechanical Turk here is a
cop-out from building a better computational model, or just an ordinary use of
supervised learning in place of unsupervised?

~~~
landongn
Seems straightforward:

- certain combinations of words within phrases score all over the place

- hand those to Mechanical Turk for human classification

- understand where the results differ from the model

- patch the model where necessary when it breaks down.

The example they gave with the "but..." at the apex of the sentence is
difficult primarily because it's ambiguous with respect to what precedes it.
It could be positive or could be negative, especially from a programmatic
standpoint.

Really fascinating stuff. Can't wait to see the code.

~~~
eli_gottlieb
No, wait. So as someone who _isn't_ remotely well-educated on machine
learning, what teaching model _would_ this be called? "Reinforcement learning"
is an AI term, but it's more what this sounds like: sometimes the model is
wrong, so we hand the data off to a human being who comes up with a definitive
Right Answer which is then used to fix the model.

~~~
feral
What "landongn" describes, where certain data 'score all over the place',
i.e. where the model tells you it is uncertain about some phrases and those
phrases are subsequently manually annotated by a human, sounds like 'Active
Learning'. [0]

If the model is telling us that it is uncertain about specific examples, and
would like more information on examples like those, that's active learning.

That sounds different from what you describe in your post, depending on what
you mean by 'sometimes the model is wrong, so we hand the data off to a human
being'.

It depends on how we know the model is wrong.

If we know it's wrong on a test datum, which is part of a big set of test
data humans labelled without any input from the model, then it's standard
'supervised learning'.

If, instead, the model is 'wrong' because it expresses uncertainty about
particular test data, and we go and have a human classify the data it was
uncertain about and retrain the model, then we are probably doing Active
Learning. In this case, the model/system is (at least partly) guiding the
learning process.

Reinforcement learning is neither of these things exactly - it describes a
more general framework, where the system is rewarded based on how well it's
performing.

Let's say you want to choose 1 of 5 labels for each datum. In supervised
learning, the system is given the right label for each training example. In
an RL setup, it might be shown an example, have to guess a label, and be told
whether the guess was right - but if it guessed wrong, not necessarily told
what the right answer was.

There's a little fuzziness to how all these terms are used in practice.

[0]
[http://en.wikipedia.org/wiki/Active_learning_(machine_learning)](http://en.wikipedia.org/wiki/Active_learning_\(machine_learning\))
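
For concreteness, here's a minimal sketch of the uncertainty-sampling flavor of active learning described above. Everything here (the toy model, the cue words, the function names) is made up for illustration and is not from the Stanford system:

```python
# Uncertainty sampling, the simplest form of active learning: score the
# unlabeled phrases, find the ones the model is least confident about,
# and send exactly those to human annotators (e.g. Mechanical Turk).

def uncertainty(prob_positive):
    """0.0 for a confident prediction, 1.0 at P(positive) = 0.5."""
    return 1.0 - abs(prob_positive - 0.5) * 2.0

def select_queries(model, unlabeled, k=2):
    """Pick the k phrases whose predictions 'score all over the place'."""
    ranked = sorted(unlabeled, key=lambda s: uncertainty(model(s)), reverse=True)
    return ranked[:k]

def toy_model(sentence):
    """Stand-in classifier: returns P(positive) from two cue-word lists."""
    words = sentence.lower().split()
    pos = sum(w in {"great", "love"} for w in words)
    neg = sum(w in {"awful", "hate"} for w in words)
    if pos + neg == 0:
        return 0.5  # no evidence at all: maximally uncertain
    return pos / (pos + neg)

pool = [
    "great movie , love it",
    "awful plot , hate it",
    "the film opens in october",      # no cue words
    "great acting but awful pacing",  # conflicting cues
]

# The two ambiguous phrases are the ones routed to human annotators; their
# labels then go back into the training set and the model is retrained.
queries = select_queries(toy_model, pool)
```

The contrast with plain supervised learning is just who picks the examples: here the model's own uncertainty chooses what gets labeled next.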

~~~
eli_gottlieb
Yeah, and this is why I desperately need to take a machine learning class. Why
oh why did I finish undergrad in seven semesters instead of eight?

~~~
MaxGabriel
The CalTech course _Learning from Data_ started on Monday. The professor has
excellent reviews:
[https://news.ycombinator.com/item?id=6385602](https://news.ycombinator.com/item?id=6385602)

So far I think they're accurate. Homework 1's due in a few days, but the
lowest 2 homeworks are dropped.

~~~
eli_gottlieb
Before signing up for an online course I'm pretty set on starting my new
semester at Technion and figuring out what my Real Life We Will Count This
Against You workload is.

------
vivekn
While 95% accuracy would be a really phenomenal achievement, an accuracy in
the range of 85-90% is achievable using methods simpler than deep neural nets.
I have done some work on sentiment analysis in the past. I used a Naive Bayes
model with some enhancements like n-grams, negation handling and information
filtering and was able to get more than 88% accuracy on a similar dataset
based on movie reviews.

You can find more details here:
[http://arxiv.org/ftp/arxiv/papers/1305/1305.6143.pdf](http://arxiv.org/ftp/arxiv/papers/1305/1305.6143.pdf)
and the code here:
[https://github.com/vivekn/sentiment/blob/master/info.py](https://github.com/vivekn/sentiment/blob/master/info.py)
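
For readers curious what "Naive Bayes with negation handling" looks like mechanically, here is a toy sketch in the same spirit. It is not the linked paper's actual code; that feature set (n-grams, information filtering) is considerably richer:

```python
import math
from collections import Counter

NEGATORS = {"not", "no", "never", "n't"}

def tokenize(text):
    """Lowercase tokens; words following a negator get a NOT_ prefix,
    so 'not good' produces the feature NOT_good instead of good."""
    out, negate = [], False
    for tok in text.lower().replace(".", " .").split():
        if tok in {".", ",", "!", "?"}:
            negate = False  # negation scope ends at punctuation
            continue
        out.append("NOT_" + tok if negate else tok)
        if tok in NEGATORS:
            negate = True
    return out

def train(docs):
    """docs: list of (text, label). Returns per-class token and doc counts."""
    counts = {"pos": Counter(), "neg": Counter()}
    totals = Counter()
    for text, label in docs:
        counts[label].update(tokenize(text))
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Multinomial Naive Bayes with Laplace smoothing."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for label in ("pos", "neg"):
        n = sum(counts[label].values())
        score = math.log(totals[label] / sum(totals.values()))  # log prior
        for w in tokenize(text):
            # +1 in the denominator leaves room for unseen words
            score += math.log((counts[label][w] + 1) / (n + len(vocab) + 1))
        scores[label] = score
    return max(scores, key=scores.get)

docs = [
    ("a good fun movie", "pos"),
    ("good acting , great fun", "pos"),
    ("a bad boring movie", "neg"),
    ("not good , not fun", "neg"),
]
counts, totals = train(docs)
```

The NOT_ prefixing is the key enhancement: without it, "not fun" would share the feature fun with the positive class.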

------
bambax
Just tried with this great phrase from the late Roger Ebert (slightly modified
to fit in one sentence; the original is four different sentences):

> _The movie has been signed by Michael Bay: this is the same man who directed
> "The Rock" in 1996; now he has made "Transformers: Revenge of the Fallen",
> and, well, Faust made a better deal._

It correctly identifies the sentence as negative, while all words taken
individually are either neutral or positive... I'm impressed.

------
joeblau
This is HUGE. So many companies are trying to use sentiment analysis as their
marketing tool for how they parse social media. With an open source tool, it
would be easier for regular developers who may not know much about NLP to tap
into that part of the industry.

As I'm reading through the article I see that it says the algorithm can
understand "Human Language." By this I'm guessing they mean English. One thing
I learned about sentiment analysis is that analyzing other languages may prove
to be a bit more difficult.

Another question I have is how it would fare against a very basic sentiment
analysis engine that my old manager built, which had just 13 positive words
and 13 negative words and was about 80% accurate as well: no neural networks,
AI or machine learning needed.
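
As a point of comparison, the word-list baseline described above can be reproduced in a few lines. The lexicons here are invented stand-ins, not the manager's actual words:

```python
# A word-list sentiment baseline: no training, no model, just two small
# hand-picked lexicons and a vote. The specific words are illustrative.
POSITIVE = {"good", "great", "love", "excellent", "fun", "best", "amazing",
            "wonderful", "enjoyed", "brilliant", "superb", "beautiful",
            "perfect"}
NEGATIVE = {"bad", "awful", "hate", "terrible", "boring", "worst",
            "horrible", "dull", "poor", "stupid", "waste", "mess",
            "disappointing"}

def word_list_sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

On review-style text, counting strong cue words gets you surprisingly far, which is why ~80% is a plausible baseline; the price is that it misclassifies anything compositional (it calls "not bad" negative), which is exactly where the treebank approach is aimed.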

~~~
vivekn
The model was trained on an English data set, but if you train the same model
on some other data, it can handle other languages as well.

~~~
joeblau
Okay, I'm asking because I know that other languages have nuances that English
doesn't. I didn't realize that the algorithm could perform at such a high
accuracy while still being language agnostic.

------
biot
The simple phrase "Not bad." results in a negative sentiment. This should be
at least neutral, if not slightly positive. Interestingly, omitting the period
gives a neutral result.

~~~
bkmartin
I would think this would be one of the easiest to get right... bad=negative,
not=negative... two negatives=positive
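
The intuition isn't quite "two negatives make a positive" in a multiplicative sense; in compositional models, a negator transforms its sibling's sentiment. A toy, made-up illustration of that bottom-up tree evaluation (the real model learns such transformations as tensor operations on word vectors rather than using hand-set rules):

```python
# Toy compositional sentiment over a binary parse tree. A node is either
# a word or a (left, right) pair; sentiment is computed bottom-up, and a
# negator flips and dampens its sibling's score. All values are invented.
LEXICON = {"bad": -1.0, "good": 1.0, "great": 2.0}
NEGATORS = {"not"}

def tree_sentiment(node):
    if isinstance(node, str):
        return LEXICON.get(node, 0.0)
    left, right = node
    if isinstance(left, str) and left in NEGATORS:
        # "not bad" comes out mildly positive, "not great" mildly
        # negative, rather than doubly extreme in either direction.
        return -0.5 * tree_sentiment(right)
    return tree_sentiment(left) + tree_sentiment(right)

tree_sentiment(("not", "bad"))    # mildly positive (0.5)
tree_sentiment(("not", "great"))  # mildly negative (-1.0)
```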

------
fauigerzigerk
_Over time and with more sample sentences, Socher thinks his model could reach
upward of 95 percent accuracy_

It would be interesting to read the paper to find out what accuracy really
means here. I doubt that human readers agree on the sentiment of movie reviews
95% of the time.

------
eadlam
_Stanford Ph.D. student Richard Socher appreciates the work Google and others
are doing to build neural networks that can understand human language. He just
thinks his work is more useful ..._

_"We’re actually able to put whole sentences and longer phrases into vector
spaces without ignoring the order of the words."_

Wait, didn't Mikolov et al. (Google) [just figure out][1] how to put entire
languages into vector spaces?

[1]: [http://arxiv.org/abs/1309.4168](http://arxiv.org/abs/1309.4168)

------
nl
As someone who has done some work in the sentiment analysis field, I present
this comment as the perfect example of why sentiment analysis is easy and the
linked research is clearly bunk.

------
utopkara
Just having the state of the art as open source is in itself fantastic. The
fact that their approach is a considerable improvement over the previous
approaches is icing on the cake.

~~~
PeterisP
Actually, for pretty much any NLP problem the state of the art is open
source. The methods often aren't packaged as convenient libraries, but the
actual best-in-field methods usually have both a detailed algorithm
description in the published paper (from which we can, and sometimes do,
produce a direct reimplementation) and a reference implementation with
available source, which the authors used to get the measurements proving that
it really is state of the art.

Sure, those research implementations tend to be 'not-production' level of
polish, often painful to install and to adapt to your particular data; but
they are available. In a few cases the best known method is a commercial
implementation; but then usually the #2 implementation is almost as good, and
that's available.

------
bkmartin
It got mine wrong...

"That makes about as much sense as a whale and a dolphin getting it on."

Keep working on it guys... I wish I understood sentiment trees well enough to
be able to train it properly for this statement... Is a sentiment tree able to
properly represent sarcasm and innuendo? <--- Honest question

~~~
yannyu
Highly idiomatic phrases are always going to be a problem for natural
language processing. Further, people aren't 100% accurate in language
processing either.

I'd say that if you posed that statement to 1000 English speakers from around
the world, at least 1% of them would be baffled by it.

All that is to say that non-conventional uses of language will always be hard
for natural language processing. If a certain kind of innuendo or sarcasm is
represented often in its training data, then the model SHOULD be able to
understand it when it sees it again.

------
aroman
I wonder what the accuracy for native English speakers is in doing ternary
sentiment analysis.

I also wonder about sentences which could be understood and defended as being
positive to one human reader and negative to another.

"That is the craziest thing I've ever heard." or simply "That is sick."

------
GFischer
Wow, it will open up a lot of possibilities for companies I know of (and some
projects of mine too :) ).

Off the top of my head, I know of a company that's trying to tackle online
complaints (VozDirecta.com), another that feeds "what they're saying about
your company"...

------
aantix
I feel like Borat for doing this, but I entered :

"I loved this movie.. NOT!"

and it classified it as positive. :)

~~~
koyote
I did the same!

It also doesn't seem to like swear words and web abbreviations (like j/k, for
example). Perhaps with more data (from Twitter or similar) it could go a long
way, though, as it definitely needs to learn about the more uncouth words.

------
Abundnce10
Are there any links to the code?

~~~
mtraven
Not yet, they said it would be released in late October. Paper is here:
[http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf](http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)

------
ciferkey
I'm taking an NLP class this semester and it's nice to finally be able to dig
into material like this rather than giving it a light read-through. Can't wait
until the code drops!

------
TallGuyShort
Very impressive! Not to detract from how impressed I am, but I did manage to
trick it once: "It could be better" was positive / very positive.

------
anaphor
I'll be interested to read Language Log's (namely Mark Liberman's) opinion of
this once it gets released.

