

Everybody loves a good snog – Finding sentiment in the UK via Twitter - robhawkes
http://rawkes.com/blog/2011/04/30/royal-wedding-twitter-sentiment

======
TamDenholm
As a commenter on the article has already asked, it'd be nice to have an
explanation on how the sentiment is actually determined.

~~~
robhawkes
I'll be writing this up in more detail soon, but the sentiment is basically
derived through the ANEW dataset. This dataset scores a few thousand English
words with sentiment values from 1 (very unhappy) to 9 (very happy). I look up
the sentiment value for each word in a tweet, then divide the total by the
number of words in the tweet to get the sentiment value for that tweet.
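
A minimal sketch of the word-averaging approach described above, not the actual code behind the graphs. The scores in `HYPOTHETICAL_SCORES` are invented placeholders on ANEW's 1 to 9 scale (they are not real ANEW entries), and the tokenizer and the `matched_only` switch are assumptions added for illustration. The description above could mean dividing by every word in the tweet or only by the words found in the dataset, so the sketch supports both.

```python
import re

# Placeholder scores on ANEW's 1-9 scale. These numbers are invented for
# illustration only; they are NOT real ANEW entries.
HYPOTHETICAL_SCORES = {
    "happy": 8.0,
    "proud": 7.0,
    "wedding": 7.5,
    "rain": 4.0,
    "sad": 2.0,
}


def tokenize(tweet):
    """Lowercase the tweet and split it into runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", tweet.lower())


def tweet_sentiment(tweet, scores=HYPOTHETICAL_SCORES, matched_only=False):
    """Average the scores of the words in a tweet.

    Returns None when no word in the tweet appears in the score table.
    With matched_only=False the sum is divided by the total word count
    (the literal reading of the description); with matched_only=True it
    is divided by the number of scored words only.
    """
    words = tokenize(tweet)
    found = [scores[w] for w in words if w in scores]
    if not found:
        return None
    denominator = len(found) if matched_only else len(words)
    return sum(found) / denominator
```

For example, `tweet_sentiment("so sad about the rain")` sums 2.0 and 4.0 and divides by five words, while passing `matched_only=True` divides by the two scored words instead. Which denominator the original analysis used is not stated in this thread.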

~~~
JonnieCache
This seems a rather simplistic approach when analysing sentiment in a country
known the world over for truly impenetrable sarcasm, and the application of
irony so blunt that it often boils down to simply stating the exact opposite
to what you actually mean, often with a total absence of syntactic cues that
this is happening.

The humour that Brits derive from this often seems like it is _defined_ by the
fact that these cues aren't there _at all_. In short, I think you'd have to do
a lot of quite complex higher order link analysis to determine people's true
opinions.

In this example, the royal wedding, imagine the following tweet: "This is a
fantastic and important day for britain, a fabulous use of public money, I
feel so proud to celebrate such a deserving couple!!"

Put it like this, I wouldn't bet money on a simple word scoring algorithm
getting the intended sentiment correct here.

~~~
robhawkes
Thanks for your input Jonnie, you've highlighted one of the main limitations
of text-based sentiment analysis. I've always asserted that the sentiment
derived from this study is not indicative of the "true" sentiment of an
individual person. Rather, the sentiment derived here is simply saying that
people on Twitter are using particular words that connote happiness or
sadness.

This is the same limitation that most other studies have found, but it doesn't
make the results any less interesting.

Another way to look at it is as signal-to-noise. Whatever stereotypes of the
UK exist, we don't all talk sarcastically (at least not all the time).
Because of this, the majority of tweets, which are probably not sarcastic,
average out the hard-to-read tweets. Again, this is something that other
studies have found as well.
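
A rough worked example of that averaging argument, using made-up numbers rather than anything measured in the study. It assumes a batch of tweets that all use happy words (so a word list scores them identically), of which a small share are sarcastic and were really meant as unhappy. The function name and every figure below are hypothetical.

```python
def aggregate_means(n_sincere, n_sarcastic, surface_score, intended_score):
    """Return (measured_mean, intended_mean) for a batch of tweets.

    Every tweet uses happy words, so the word list scores all of them at
    surface_score. Only the sarcastic tweets were meant as intended_score.
    """
    total = n_sincere + n_sarcastic
    # The word list cannot tell the two groups apart.
    measured = float(surface_score)
    intended = (n_sincere * surface_score + n_sarcastic * intended_score) / total
    return measured, intended


# Hypothetical batch: 90 sincere tweets, 10 sarcastic ones.
measured, intended = aggregate_means(90, 10, 7.0, 3.0)
print(f"measured {measured:.2f}, intended {intended:.2f}, gap {measured - intended:.2f}")
# prints "measured 7.00, intended 6.60, gap 0.40"
```

Under these assumptions the aggregate shifts by the sarcastic share times the per-tweet error (0.1 × 4 = 0.4), so the misread tweets are diluted in the average rather than removed from it.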

My aim here isn't to fault the method of sentiment analysis. I'll leave that
to the guys at the University of Florida who created it. :)

~~~
JonnieCache
Oh I didn't mean to criticise anything you're doing. There's no such thing as
bad data mining, only bad analysis of the results :)

I just worry that people in our governments and security services will seize
on work like this which is largely being carried out for the amusement and
satisfaction of academics and general geeks, and decide that it is appropriate
to make specific judgements about specific people based on the data.

It doesn't help that I read this article just after I woke up this morning:

[http://www.guardian.co.uk/uk/2011/may/03/protester-sue-polic...](http://www.guardian.co.uk/uk/2011/may/03/protester-sue-police-secret-surveillance)

Combine it with this recent episode: (the guy was eventually convicted and
fined)

[http://www.guardian.co.uk/world/2010/jan/18/robin-hood-airpo...](http://www.guardian.co.uk/world/2010/jan/18/robin-hood-airport-twitter-arrest)

...and with all the recent hype over 'cyberwar,' I start feeling a bit chilly.

~~~
robhawkes
I agree completely. These results shouldn't be simply taken at face value if
they're to be used for further analysis.

------
mattonweb
Really interesting Rob! Well done.

------
xnerdr
This information would be much more valuable with data for a "normal" day. For
example, is the main bump around noon just a regular lunch high?

~~~
robhawkes
You're right. I haven't been able to run every type of comparison yet, but in
the meantime you could check out the other graphs that I've made that cover
normal days:
[http://www.flickr.com/photos/robhawkes/sets/7215762573356079...](http://www.flickr.com/photos/robhawkes/sets/72157625733560795/)

