Interesting experiment. I have a couple of questions though:
Why use Naive Bayes? Is it a personal preference? Have you ever tried an SVM with a linear kernel, which is well suited to large text corpora?
Have you considered removing words with too low an IDF? As far as I can see, the words with the highest discriminative power have a positive count of "1", so it is not really surprising that they heavily influence your results. IMHO, please correct me if I'm wrong.
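To make the filtering question concrete, here is a minimal sketch of the kind of pruning I mean. It is not the post's code: the toy documents, the function names and the `min_df=2` threshold are all made up for illustration. One caveat on terminology: a word seen in only one document gets the highest IDF, not the lowest, so the filter is written as a minimum document frequency.

```python
import math
from collections import Counter

def document_frequencies(docs):
    """Count, for each term, the number of documents it appears in."""
    df = Counter()
    for doc in docs:
        # set() so a term repeated inside one document is counted once
        df.update(set(doc.lower().split()))
    return df

def idf(df, n_docs):
    """Plain (unsmoothed) inverse document frequency: log(N / df)."""
    return {term: math.log(n_docs / count) for term, count in df.items()}

def prune_vocabulary(df, min_df=2):
    """Keep only terms that occur in at least min_df documents."""
    return {term for term, count in df.items() if count >= min_df}

# Hypothetical toy corpus, only for illustration
docs = [
    "great movie great cast",
    "boring movie weak plot",
    "great plot masterpiece",
    "weak cast boring",
]

df = document_frequencies(docs)
scores = idf(df, len(docs))
vocab = prune_vocabulary(df, min_df=2)

# Terms seen in a single document carry the highest IDF and are dropped
singletons = sorted(term for term, count in df.items() if count == 1)
print(singletons)
print(sorted(vocab))
```

Whether 2 is a sensible cutoff depends on the corpus size; on a large corpus one would probably tune it against held-out data.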
Beautiful website design! I see that it originated from a theme by Mark Reid, though you have probably made changes (e.g., print does not show the top bar in the original theme from Mark).
Do you mind if I reuse it for my upcoming blog (with attribution, of course)?
Can someone recommend an easy setup for analyzing German texts to see whether they are positive or negative? I could not manage to find and use anything myself.
The solution for managers is outsourcing or hiring people who know what they are doing. Sentiment analysis touches so many different topics in the area of text analysis that it's seriously not easy. I wouldn't say it's as hard as quantum mechanics, but you need a fair share of maths and programming skills to do it properly and to know how to use the tools, IMHO.
Other than that, interesting read, definitely.