

TextBlob – Finding Sentiments in Text - shekhargulati
https://www.openshift.com/blogs/day-9-textblob-finding-sentiments-in-text

======
languagehacker
TextBlob is nice, but for sentiment analysis it basically wraps Pattern
([https://github.com/clips/pattern](https://github.com/clips/pattern)).

Pattern uses a whitelist heuristic approach that's got poor precision and is
no longer state of the art.

Not that there's anything wrong with this code, but anyone looking to use
sentiment analysis in their work should consider researching more up-to-date
alternatives, which may not have Pythonic implementations just yet.
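For anyone curious what that lexicon-heuristic approach looks like in practice, here's a minimal, purely illustrative sketch — the tiny word list and scores are made up for the example and are not Pattern's actual lexicon:

```python
# Illustrative lexicon-style polarity heuristic (the general shape of
# what Pattern-style scorers do). The word list below is invented.
LEXICON = {"great": 0.8, "nice": 0.6, "good": 0.7,
           "bad": -0.7, "awful": -0.8, "terrible": -0.9}

def polarity(text):
    """Average the scores of known words; 0.0 if none match."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("this library is nice but the docs are bad"))  # roughly -0.05
```

The weakness the parent comment points at is visible even here: the scorer only sees isolated words, so negation, sarcasm, and context are invisible to it.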

~~~
rmrfrmrf
It's not state of the art, but I just ran this guy's code with about 100
samples and it was spot on. It should be noted that the "state of the art" in
sentiment analysis is actually pretty far advanced. For basic polarity
testing, this works fine.

~~~
mark_integerdsv
Would you sell the results?

That's basically my measure for real world vs academic.

~~~
rmrfrmrf
As a library, I'd have no problem selling a service that used this, with the
idea that for an MVP we could launch and then switch out with a more robust
library.

------
fintler
Does anyone know of any email clients that will bump potentially negative or
angry emails to the top of my inbox?

~~~
sandyshankar
We are working on a tool that will flag emails that might require urgent
attention. You can email me if you are interested in discussing further.

------
primaryobjects
An API like that is good for an introduction, but I think you'll get better
results with a machine learning approach (my pet project
[http://www.sentimentview.com](http://www.sentimentview.com)). When I ran
tests against a baseline algorithm (just matching against positive and
negative keywords), accuracy improved from 62% with the baseline to 80% with
an SVM [http://blog.sentimentview.com/post/59031004797/learning-curves-from-
twitter-sentiment-analysis](http://blog.sentimentview.com/post/59031004797/learning-curves-from-
twitter-sentiment-analysis).
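To make the comparison concrete, here's a hedged sketch of the kind of keyword-matching baseline described above, scored against hand-labelled samples — the word lists and samples are invented for the example, not the actual data behind those accuracy numbers:

```python
# Invented keyword lists and labelled samples, for illustration only.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "awful", "angry"}

def baseline_label(text):
    """Label by counting keyword hits; ties go to 'pos'."""
    words = set(text.lower().split())
    return "pos" if len(words & POSITIVE) >= len(words & NEGATIVE) else "neg"

samples = [("i love this phone", "pos"),
           ("awful battery life, i hate it", "neg"),
           ("great screen, very happy", "pos"),
           ("angry about the awful support", "neg")]

correct = sum(baseline_label(t) == y for t, y in samples)
print(f"accuracy: {correct / len(samples):.0%}")
```

A learned model (e.g. an SVM over n-gram features) improves on this mainly because it can weight words it has seen in context rather than relying on a fixed list.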

~~~
nl
I'm looking at your examples[1], and I'm seeing a _lot_ of analysis that seems
pretty wrong to me.

To be fair, I haven't counted and done the maths (and I understand that human
sentiment analysis often doesn't agree with other humans), but.. well, look at
these examples:

Positive: Thank you Urbana voter for pointing out what we knew. Cheat lie
steal the election. We will NEVER become an Islam nation as Obama wants.

Positive: Fun fact: @kbzeese got 1.5% of the vote in 2006 US Senate General
Election, but thinks he knows what the people want #headdesk

Negative: What is our present condition? We have just carried an election on
principles fairly stated to the people.

Does your model give a confidence score? It looks to me like you are making it
bimodal, but if you had a "neither positive nor negative" category it might
fix some of the issues.

E.g., this one is rated as positive, but I'd rate it as neither positive nor
negative: "Should ballot papers in Northern Ireland include photographs of the
candidates standing? Make your views known.
[http://t.co/NySCHsmecE](http://t.co/NySCHsmecE)"

Of course, tweets are a pretty difficult thing to run sentiment analysis on:
those random hashtags break a lot of machine learning models, whilst a human
can read them and realise something like "#headdesk" is probably negative
(although in that case the phrase "thinks he knows" is something a model could
probably use if it understood n-grams).

[1]
[http://www.sentimentview.com/#examples](http://www.sentimentview.com/#examples)
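One simple way to get a "neither positive nor negative" category out of a bimodal score is a neutral band around zero — a minimal sketch, where the 0.2 threshold is an arbitrary illustration, not a tuned value:

```python
# Map a continuous polarity score in [-1, 1] to three classes.
# The neutral_band width is arbitrary and would need tuning.
def classify(polarity, neutral_band=0.2):
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

print(classify(0.05))   # neutral
print(classify(-0.6))   # negative
```

This doesn't fix wrong scores, but it keeps low-confidence predictions out of the positive/negative buckets, which addresses the news-headline examples above.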

~~~
primaryobjects
Yes, all very good points. I didn't use a neutral category, as it's difficult
to gauge whether a tweet should be neutral. I've thought of detecting neutral
tweets based on news-speak, as you've pointed out. That would require its own
machine learning run, just to separate neutral tweets from sentiment-bearing
ones.

Also keep in mind that different topics work better. The term "election"
appears in a lot of news headlines, many of which are probably neutral,
skewing the results. More consumer-ish topics yield better results. But yes,
tweets are difficult to analyze. I've done another recent experiment with
tweet analysis, if you like this kind of stuff:
[http://primaryobjects.com/CMS/Article158.aspx](http://primaryobjects.com/CMS/Article158.aspx)

------
talleyrand
I've found TextBlob to be a great tool.

