A comparison of open source tools for sentiment analysis

tomw1808 · on April 12, 2015

Interesting experiment. I have a couple of questions though:

Why use naive Bayes? Is it a personal preference? Have you ever tried an SVM with a linear kernel, which is perfectly suited to large text corpora?
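For anyone who hasn't seen one, here is a minimal sketch of what a linear-kernel SVM for this task could look like. It is pure Python and trains the primal SVM with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. The toy sentences, the `__bias__` feature, the whitespace tokenizer and the `lam`/`epochs` values are illustrative assumptions, not what the article used. In practice you would reach for an existing library such as LIBLINEAR or scikit-learn's `LinearSVC` instead.

```python
import random
from collections import Counter

BIAS = "__bias__"  # hypothetical constant feature standing in for the intercept


def featurize(text):
    """Bag-of-words counts over lowercased whitespace tokens, plus a bias feature."""
    x = Counter(text.lower().split())
    x[BIAS] = 1
    return x


def score(w, x):
    """Linear decision function w . x over sparse dicts."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(docs, labels, lam=0.1, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    labels are +1 (positive) or -1 (negative); returns a sparse weight dict.
    """
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)        # Pegasos step size
            margin = y * score(w, x)     # margin under the current weights
            for f in w:                  # regularizer gradient: shrink every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1:               # hinge loss active: step toward y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 for positive, -1 for negative."""
    return 1 if score(w, featurize(text)) >= 0 else -1


# Toy usage on made-up sentences.
docs = ["great movie loved it", "wonderful acting great plot",
        "terrible movie hated it", "awful plot boring acting"]
labels = [1, 1, -1, -1]
w = train_linear_svm(docs, labels)
print([predict(w, d) for d in docs])
```

The hinge loss only penalizes examples that fall inside the margin, which is the main practical difference from naive Bayes: the SVM fits one discriminative weight per word jointly instead of estimating independent per-class word probabilities.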

Have you considered removing words with too low an IDF? As far as I can see, the words with the highest discriminative power have a positive count of "1", so it is not really surprising that they strongly influence your results (IMHO; please correct me if I'm wrong).
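A minimal sketch of document-frequency pruning, in pure Python on the same kind of toy sentences. The helper names and the `min_df`/`max_df_ratio` thresholds are illustrative assumptions (scikit-learn's vectorizers expose similar `min_df`/`max_df` options). Words that occur in only one document get the largest IDF, log(N/1), so a `min_df` cutoff removes them; a `max_df_ratio` cutoff removes the very common, lowest-IDF words.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count how many documents each lowercased, whitespace-split token occurs in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # set(): count each token once per document
    return df


def idf(df, n_docs):
    """Unsmoothed inverse document frequency log(N / df): rarer tokens score higher."""
    return {tok: math.log(n_docs / count) for tok, count in df.items()}


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.9):
    """Keep tokens with min_df <= df <= max_df_ratio * N.

    min_df drops very rare tokens (the highest-IDF ones, e.g. words seen once);
    max_df_ratio drops very common tokens (the lowest-IDF ones).
    """
    df = document_frequencies(docs)
    max_df = max_df_ratio * len(docs)
    vocab = {tok for tok, count in df.items() if min_df <= count <= max_df}
    pruned = [[tok for tok in doc.lower().split() if tok in vocab] for doc in docs]
    return pruned, vocab


# Toy usage on made-up sentences.
docs = ["great movie loved it", "wonderful acting great plot",
        "terrible movie hated it", "awful plot boring acting"]
pruned, vocab = prune_vocabulary(docs)
weights = idf(document_frequencies(docs), len(docs))
print(sorted(vocab))  # ['acting', 'great', 'it', 'movie', 'plot']
print(pruned[0])      # ['great', 'movie', 'it']
```

With `min_df=2`, every word that appears in a single document ("loved", "wonderful", "terrible", ...) is gone, so the classifier can no longer lean on one-off words.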

Other than that, definitely an interesting read.

sfotiadis · on April 12, 2015

I've responded to you on Reddit, Tom.

tomw1808 · on April 12, 2015

Me too... for the rest of the HN folks:

http://www.reddit.com/r/MachineLearning/comments/32blqx/a_co...

or read it on http://reader.newscombinator.com

alok-g · on April 12, 2015

Beautiful website design! I see that it originated from a theme by Mark Reid, though you have probably made changes (e.g., print does not show the top bar in Mark's original theme).

Do you mind if I reuse it for my upcoming blog (with attribution, of course)?

sfotiadis · on April 13, 2015

I've used the Jekyll Bootstrap theme from Mark Reid (http://jekyllbootstrap.com/usage/jekyll-theming.html) and made only minor modifications. You can see them on my GitHub, https://github.com/sfotiadis/sfotiadis.github.io, and of course you can reuse anything you like.

ncza · on April 12, 2015

Can someone recommend an easy setup for analyzing German texts to see whether they are positive or negative? I was too stupid to find and use something myself.

rspeer · on April 12, 2015

The only thing that's easy about sentiment analysis is doing it badly.

ncza · on April 12, 2015

Are there no "solutions for managers" yet that require no background knowledge?

tomw1808 · on April 13, 2015

The solution for managers is outsourcing or hiring people who know what they are doing. Sentiment analysis touches on so many different topics in text analysis that it's seriously not easy. I wouldn't say it's as hard as quantum mechanics, but you need a fair share of maths and programming skills to do it properly and to know how to use the tools, IMHO.

tomw1808 · on April 12, 2015

As far as I know, you can start with the pre-trained models for the Apache OpenNLP library, and from there use a translator or something similar?