
So does this mean something like crowd-sourced scoring of sentiment does better than algorithmic detection? Or do people suck just as badly?



Short answer: yes, crowd-sourcing would work better.

Long answer: It's difficult to determine how good or bad people actually are at detecting the correct sentiment, because data sets of phrase/sentence <-> sentiment pairs are often created by majority decision among human taggers. E.g. 7 people are given the same training examples, and whatever most of them choose is then used as the "correct" answer (the gold standard). That might not be the truly correct answer, though. But even if we accept this gold standard as the absolute truth, individual humans only reach a correct detection rate of roughly 80% (a very rough number, since it depends strongly on the source material, e.g. tweets, product reviews, etc.). Still, this is way better than computers perform at the moment.
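
To make the gold-standard step concrete, here's a minimal Python sketch of that majority-vote labelling; the texts, labels, and 7-tagger setup are made up for illustration, not taken from any real data set:

    from collections import Counter

    # Hypothetical annotator labels: each list holds the sentiment picked by
    # 7 independent taggers for the same (made-up) text.
    annotations = {
        "the battery died after two days": ["neg", "neg", "neg", "neu", "neg", "neg", "pos"],
        "works exactly as described":      ["pos", "pos", "neu", "pos", "pos", "neu", "pos"],
        "well, that was interesting":      ["neu", "pos", "neg", "neu", "neu", "pos", "neg"],
    }

    def majority_label(labels):
        # The most frequent label becomes the "gold standard"; with 7 taggers
        # and 3 classes a tie is still possible, in which case Counter just
        # returns one of the tied labels.
        (label, count), = Counter(labels).most_common(1)
        return label, count / len(labels)

    gold = {}
    for text, labels in annotations.items():
        label, agreement = majority_label(labels)
        gold[text] = label
        print(f"{label:>3}  {agreement:.0%} agreement  {text!r}")

    # Per-annotator accuracy against that gold standard -- numbers in this
    # ballpark are where rough figures like "humans get ~80% right" come from.
    num_taggers = 7
    for i in range(num_taggers):
        hits = sum(labels[i] == gold[text] for text, labels in annotations.items())
        print(f"tagger {i}: {hits}/{len(annotations)} match the majority label")

The same script also shows why the gold standard is shaky: an annotator who disagrees with the majority on an ambiguous text is counted as "wrong" even if their reading is perfectly reasonable.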


Then again, I assume those texts are written by humans for humans. So isn't the "correct" sentiment exactly what humans tend to make of it? And if humans aren't very good at detecting the sentiment, maybe the writer is at fault, not the readers.

I think letting a number of people read the text and taking the majority vote as its sentiment might actually not be a bad way of determining it.


It might be correct to say that a group of humans is interpreting the sentiment "incorrectly" if they don't have all the relevant context / information.





