

Ask HN: How to determine temperature of text? - manch

Given a review of, say, a book, what is the best way to determine whether the reviewer likes the book or hates it? Just doing a keyword search for "good" or "best" doesn't seem to cut it. What if the review says something like "this is not as good as that"? Some semantic analysis seems necessary.

The harder question: once we know the review is positive, how can we assign a degree of positiveness? "The best since sliced bread" is much more positive than "good, but can be better".
======
mbrubeck
The usual approach for these problems is to admit that you don't know what the
rules are, and instead train a statistical classifier based on some sample
data:

<http://crm114.sourceforge.net/>

Depending on how hard your problem is, the accuracy of the classifier may not
be very good. You can try to come up with fancier "features" to extract from
the items and feed to the classifier, or you can use "artificial artificial
intelligence" and just farm out the work to Mechanical Turk at a penny per
review.
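
To make "train a statistical classifier" a bit more concrete, here is a
minimal sketch of one common choice, a naive Bayes classifier with add-one
smoothing. It is not how CRM114 works internally; the sample reviews, the
"pos"/"neg" labels, and all function names are made up for illustration, and a
real system would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny invented training set; real use needs many labeled reviews.
SAMPLE = [
    ("a wonderful, moving book", "pos"),
    ("great characters and a great ending", "pos"),
    ("dull plot and terrible writing", "neg"),
    ("a boring, terrible waste of time", "neg"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_reviews):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_reviews:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def log_scores(model, text):
    """Return the unnormalized log posterior of each label for the text."""
    word_counts, label_counts = model
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a label.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return scores


def classify(model, text):
    """Pick the label with the highest score."""
    scores = log_scores(model, text)
    return max(scores, key=scores.get)


def positiveness(model, text):
    """Posterior probability of "pos", a rough degree of positiveness."""
    scores = log_scores(model, text)
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    return weights["pos"] / sum(weights.values())
```

Calling `model = train(SAMPLE)` and then `classify(model, "wonderful
characters")` picks a label, while `positiveness(model, ...)` gives a number
between 0 and 1 that could serve as a crude answer to the "degree" question.
Naive Bayes probabilities tend to be overconfident, so treat that number as a
ranking signal rather than a calibrated estimate.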

------
gdp
Well outside my field, but I believe this is a reasonably well-established
problem in text processing/NLP, known as "sentiment analysis":

<http://en.wikipedia.org/wiki/Sentiment_analysis>

My very brief Google Scholar search on that key phrase turned up a whole lot
of results that look like they may provide fairly concrete algorithms and
techniques for doing what you describe.

------
keefe
As other posters have said, it's a genuinely hard problem that touches on
natural language quantification. You can find an excellent resource on the
relationships between English words here: <http://wordnet.princeton.edu/>. You
could manually rank a set of adjectives and then use WordNet to find synonyms
and generate some heuristic from there, but the bottom line is I think it's
going to be a huge time sink unless you get someone with NLP experience
involved.
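
A rough sketch of that heuristic, to show how little it takes to get started
and how crude the result is. The seed scores are invented, the `SYNONYMS`
table is a hand-written stand-in for a real WordNet lookup, and the negation
rule (flip the sign of the next scored word) is an assumption of this sketch,
not an established method.

```python
import re

# Hand-ranked seed adjectives (invented scores, for illustration only).
SEED_SCORES = {"best": 3.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}

# Stand-in for a WordNet synonym lookup; a real version would query WordNet.
SYNONYMS = {
    "best": ["finest"],
    "good": ["decent", "fine"],
    "bad": ["poor"],
    "terrible": ["awful", "dreadful"],
}

NEGATIONS = {"not", "never", "no"}


def build_lexicon(seed_scores, synonyms):
    """Copy each seed word's score onto its synonyms."""
    lexicon = dict(seed_scores)
    for word, score in seed_scores.items():
        for synonym in synonyms.get(word, []):
            # Keep an existing score if the synonym is already ranked.
            lexicon.setdefault(synonym, score)
    return lexicon


def score_review(text, lexicon):
    """Sum word scores, flipping the sign of the first scored word after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    negate = False
    for token in tokens:
        if token in NEGATIONS or token.endswith("n't"):
            negate = True
            continue
        if token in lexicon:
            total += -lexicon[token] if negate else lexicon[token]
            negate = False
    return total


LEXICON = build_lexicon(SEED_SCORES, SYNONYMS)
```

With these made-up numbers, `score_review("The best since sliced bread",
LEXICON)` comes out higher than `score_review("good, but can be better",
LEXICON)`, and "this is not as good as that" goes negative. It falls apart
quickly on sarcasm, comparisons, and words outside the lexicon, which is where
the time sink starts.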

------
manch
Thank you all for your wonderful comments. I now have some concrete directions
to look into. I must say the problem is much harder than I initially thought.

Are there companies employing any of these techniques to provide some service?
This seems to be the next step up from keyword-based search. I remember seeing
some startups providing a gauge of positive press coverage on stock symbols.
How do they do it?

~~~
mbrubeck
For one example, I've seen a couple of sites that classify Twitter posts about
movies as positive or negative. I don't know what techniques they're using.

<http://flixpulse.com/>

<http://www.twittercritic.com/>

~~~
manch
Just checked out the two sites. They don't seem to be active? The movies there
seem to be dated.

Actually flixpulse.com classified some reviews wrong. Here are some of the
"bad" reviews on "Bolt":

-- "Bolt" is bringing AWESOME back

-- @seanbonner awesome! BOLT loves you back.

Whatever algorithm they use needs some tuning. But I don't think anyone is
actively working on it anymore.

