Sentiment Analysis in Python (andybromberg.com)
117 points by abromberg on March 15, 2013 | 8 comments



Cool post, Andy. NLTK is a lot of fun, but it's not necessarily a production-ready solution -- for instance, scaling it out to other languages may pose some problems with respect to utf-8. NLTK's real purpose is more for pedagogy, and your blog post is a nice addition to teaching people Python and computational linguistics at the same time.

You might be interested in checking out pattern ( http://www.clips.ua.ac.be/pages/pattern ). It has a heuristic approach to sentiment analysis built right in that might be worth comparing your features against.

Finally, as far as classification goes, Python is pretty all right, but can be a tad slow working through large amounts of data. I've found that text classification at scale is best left to an external library, with Python doing feature extraction and managing the data pipeline. In the past I've built out feature sets with Python and then passed them to TADM ( http://tadm.sourceforge.net/ ). The advantage of TADM is that, being written in C++, it's meticulously optimized. Of course, you have fewer modeling options available to you. That's just one example; there are plenty of these kinds of services written in Java, too.
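To make that split concrete, here is a minimal sketch of the Python side: build bag-of-words counts and write them out as sparse text for some external trainer to consume. The `label index:count` layout is a generic SVMlight-style format chosen for illustration, not TADM's actual input format, and the function names (`tokenize`, `build_vocab`, `to_sparse_line`, `export`) are made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text.lower())


def build_vocab(docs):
    """Map each distinct token to a 1-based feature index, sorted for stable output."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens, start=1)}


def to_sparse_line(label, text, vocab):
    """Render one document as 'label index:count ...' with ascending indices.

    Assumption: a generic SVMlight-style layout, not any specific tool's format.
    Tokens missing from the vocabulary are skipped.
    """
    counts = Counter(vocab[tok] for tok in tokenize(text) if tok in vocab)
    feats = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"{label} {feats}".rstrip()


def export(labeled_docs, path):
    """Write a list of (label, text) pairs to `path`, one sparse line per document."""
    vocab = build_vocab(text for _, text in labeled_docs)
    with open(path, "w", encoding="utf-8") as out:
        for label, text in labeled_docs:
            out.write(to_sparse_line(label, text, vocab) + "\n")
    return vocab
```

Calling `export([("+1", "good good movie"), ("-1", "bad movie")], "train.txt")` would write one line per review. The external tool then does the heavy numerical work, and Python only has to stream text through the feature extractor.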

Thanks for a good read!


Thanks for the comment! I'll definitely check out Pattern and TADM and see what I can do with them.


Great to see our good friend Jacob Perkins, aka Streamhacker, referenced several times in the article. The man's a genius and an inspiration. See his post on turning Text Processing into a successful, self-sustaining API "project": http://streamhacker.com/2013/02/27/monetizing-textprocessing...


That was a great read. As I understand it, OP's main concern here is accuracy. What about performance? NLTK is a good starting point, but it is generally considered slow. I'd really like to hear how runtime performance compares for the same functionality in Python and R.


I think NLTK is pure Python because it's supposed to work as a teaching tool as well. Perhaps PyPy can come into play for that, although in real-world situations I've only seen marginal gains from PyPy so far.


I really enjoyed this. It's a good practical example that's helped me to make sense of a lot of the words that fly around in this space.


Really good stuff. Any other good links on sentiment analysis using Python? For example, for analyzing tweets?


Great write-up, and great analysis. I looked at this about a year ago; I wanted to do some playing in NLTK and figured Twitter data was a good place to learn to do sentiment analysis. My bright idea - movie reviews and tasting - has been done to death :)

Really great write-up, thanks for this.



