

Twitter sentiment analysis using Python and NLTK - ananthrk
http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/

======
detour
The Pattern library has sentiment analysis built in. It's a pretty fun toolkit to
play around with.

<http://www.clips.ua.ac.be/pages/pattern-en#sentiment>

~~~
fdb
In Pattern, sentiment analysis is a one-liner:

    >>> from pattern.en import sentiment
    >>> print(sentiment(
    ...     "The movie attempts to be surreal by incorporating various time paradoxes, "
    ...     "but it's presented in such a ridiculous way it's seriously boring."))
    (-0.34, 1.0)

------
lrvick
Great write-up. My company (Tawlk) actually open-sourced a library to automate
this very thing. We typically get around 80% accuracy with about 2 million
samples.
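
The thread doesn't say how the 80% figure is measured. Accuracy is normally reported on held-out samples the classifier never saw during training. A minimal sketch of that split-and-score step, assuming (text, label) pairs and any `classify(text) -> label` function; `split` and `accuracy` are hypothetical helper names, not part of synt:

```python
import random


def split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of (text, label) samples and hold out test_fraction of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(classify, test_set):
    """Fraction of held-out (text, label) pairs for which classify(text) == label."""
    if not test_set:
        raise ValueError("test_set is empty")
    correct = sum(classify(text) == label for text, label in test_set)
    return correct / len(test_set)
```

Train only on the first list returned by `split`, then pass the trained model's classify function and the second list to `accuracy`.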

You can grab our sample set here:
<https://github.com/downloads/Tawlk/synt/sample_data.bz2>

And check out the project here: <http://github.com/Tawlk/synt>

It also ships with a full CLI if you just want to play with it without getting
knee-deep in the code.

Also, if you want to see a stripped-down, stand-alone code sample that steps
you through the process, I made this gist:

<https://gist.github.com/1266556>

Enjoy :)

------
denzil_correa
A better example is shown by Jacob Perkins on his blog:
<http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/>
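
That post's URL names a naive Bayes classifier. Below is a minimal, dependency-free sketch of the technique, assuming whitespace tokenization, word-presence features and add-one smoothing (a binarized multinomial naive Bayes). It is not the code from either post or from synt, and the toy training tweets are made up for illustration:

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Word-presence features: lowercase, split on whitespace, keep each word once."""
    return set(text.lower().split())


class NaiveBayes:
    """Binarized multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training texts
        self.word_counts = defaultdict(Counter)  # label -> word -> texts containing it
        self.vocab = set()

    def train(self, samples):
        """samples: iterable of (text, label) pairs."""
        for text, label in samples:
            words = features(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab |= words

    def log_scores(self, text):
        """Unnormalized log P(label) + sum of log P(word | label), per label."""
        total_docs = sum(self.doc_counts.values())
        words = features(text) & self.vocab  # words never seen in training are skipped
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up toy data, for illustration only.
TRAIN = [
    ("i love this phone", "pos"),
    ("what a great day", "pos"),
    ("this is awesome", "pos"),
    ("i hate this phone", "neg"),
    ("what a terrible day", "neg"),
    ("this is awful", "neg"),
]

nb = NaiveBayes()
nb.train(TRAIN)
print(nb.classify("what a great phone"))  # pos
print(nb.classify("awful day"))           # neg
```

Real tweets need a better tokenizer (punctuation, @mentions, URLs) and far more training data than this.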

------
abyssknight
Sounds like what Tawlk does. I wonder if their training data/method is better,
though.

~~~
lrvick
The method is mostly the same one used in our synt library
(<http://github.com/Tawlk/synt>). We built quite a bit on top of it, however.
That said, the author did a great job of explaining the process.

Good encouragement for me to better document synt.

------
jasonkolb
What are neutral tweets classified as?

~~~
lrvick
It is a binary classifier, so everything is at least slightly negative or
slightly positive on a scale from -1 to 1.

Think of it like the spirit level used in construction. Nothing is ever
_perfectly_ level. It is always tilting one way or the other, but there is an
acceptable range people will generally call 'level'. Neutral is the same.
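
How a two-class classifier produces a score from -1 to 1 isn't spelled out in the thread. One common mapping, assumed here rather than taken from synt, is P(positive) - P(negative). Given the two classes' log scores (for example, unnormalized naive Bayes log scores), that difference equals tanh((log_pos - log_neg) / 2), which stays numerically stable even for very lopsided scores. `polarity` is a hypothetical helper name:

```python
import math


def polarity(log_pos, log_neg):
    """Map two-class log scores to P(pos) - P(neg), a value in [-1, 1].

    P(pos) = sigmoid(log_pos - log_neg), and 2 * sigmoid(x) - 1 == tanh(x / 2).
    """
    return math.tanh((log_pos - log_neg) / 2)


print(round(polarity(-2.0, -3.0), 3))  # 0.462
```

Equal log scores map to exactly 0, and the sign follows whichever class the classifier prefers.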

If the classifier rates something as 0.001, it is probably safe to call it
'neutral'. It is up to the application to decide on a 'neutral range'; you
could, for instance, just flag anything between -0.2 and 0.2 as 'neutral'. It
is good to apply a cut-off like this as the last step, so you can adjust the
range by hand until false positives are at a minimum for your particular data
set.
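
A minimal sketch of such a neutral band, using 0.2 as the example width from the comment. `label` and `best_band` are hypothetical helpers, and `best_band` assumes you have some hand-labeled scores to tune against. It counts every disagreement rather than only false positives, because a band wide enough to call everything neutral would trivially have zero false positives:

```python
def label(score, neutral_band=0.2):
    """Bucket a polarity score in [-1, 1]; anything within +/- neutral_band is neutral."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def best_band(labeled, candidates):
    """Return the candidate band with the fewest disagreements against
    hand-labeled (score, expected_label) pairs; ties go to the first candidate."""
    def errors(band):
        return sum(label(score, band) != expected for score, expected in labeled)
    return min(candidates, key=errors)


print(label(0.001))  # neutral
print(label(-0.35))  # negative
```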

