

Monitoring mood in the UK using Twitter - robhawkes
http://rawkes.com/articles/people-love-a-good-smooch-on-a-balcony

======
jsiarto
This is a very interesting study--nice work! Our company does social research
for all types of companies and we've found most automated sentiment analysis
to be subpar (at best, 60% accurate). The problem with just looking at words
is that there is no context of the whole Tweet and computers are generally bad
at picking up sarcasm, innuendo and turns of phrase that may contain negative
words in a positive manner (toward the brand or company).

I realize that this isn't the key focus of your paper, but we've found that
sampling and human analysis/tagging is far more accurate at judging the
sentiment around a brand, company or topic.

~~~
robhawkes
Thank you.

In the context of this study, I found that it was impossible to accurately
infer 'sentiment' of a single tweet or person (not just because of sarcasm and
other nuances). However, when you take the average of a group (wisdom of the
crowd), the results are much more promising. A trend noticed across thousands
of users is also more interesting than the potentially unreliable sentiment of
a single person.

In this case, I suppose it is definitely just the words that are being
analysed – not true 'sentiment.' I wouldn't rule it out as inaccurate, though;
it just depends on what you're looking for and how you use the results.
Compared to other sentiment data-sets, the ANEW approach seems much more
detailed (the original scorings were created from human tagging).
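To make the word-level approach concrete, here's a minimal sketch of ANEW-style scoring: score each tweet as the mean valence of its known words, then average across the whole sample. The valence values below are purely illustrative stand-ins (the real ANEW data-set assigns human-rated 1–9 valence per word and is not reproduced here), and the function names are my own.

```python
# Sketch of dictionary-based 'sentiment' scoring in the ANEW style.
# The valence values below are illustrative, NOT real ANEW scores;
# the real data-set assigns human-rated valence on a 1-9 scale.
import re

VALENCE = {"love": 8.7, "good": 7.5, "happy": 8.2, "sad": 1.6, "rain": 3.9}

def tweet_valence(text):
    """Mean valence of the known words in one tweet, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [VALENCE[w] for w in words if w in VALENCE]
    return sum(scores) / len(scores) if scores else None

def crowd_valence(tweets):
    """Average the per-tweet scores across the whole sample."""
    scores = [s for s in map(tweet_valence, tweets) if s is not None]
    return sum(scores) / len(scores) if scores else None
```

Note how a tweet with no dictionary words simply drops out of the aggregate, which is one reason this style of analysis skews neutral.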

I do agree though, that automated approaches can be inaccurate if you're
looking for fine-level analysis.

~~~
jsiarto
Completely agree--we always start our analysis with basic questions the client
wants answered. I also appreciate how in-depth your piece was--we need more
research and case studies like this. If I have to listen to one more social
monitoring tool salesperson tell me how amazing their sentiment analysis is
without ever showing concrete examples and results...

------
Nursie
While this is interesting, I am less than convinced that twitter users are a
representative cross-section of the UK population.

~~~
robhawkes
The point of this study and others like it is to research questions just like
that.

~~~
Nursie
Then... cool!

------
msy
Pick a random tweet from a public hashtag. Put aside all knowledge you have of
major sporting events, news & current affairs. Pick how 'happy' it is from
1-9.

~~~
robhawkes
Picking a single tweet will not result in an accurate reading of sentiment; I
talk about this in the study. In fact, the only way right now to gauge
anything near a useful analysis of 'sentiment' (it's just textual analysis
really, not true emotion) is to use the wisdom of the crowd and find trends.
Surprisingly, this shows some pretty interesting results, regardless of
whether it's accurate in the sense of showing a single person's emotion.

It's also worth pointing out that the data-set used in this study to gauge
'sentiment' is based on firm psychology and actually infers much more than
just perceived happiness.

~~~
msy
So a single tweet's analysis will be inaccurate, but an aggregation of many
inaccurate results becomes meaningful? Obviously you can find interesting
trends in large pools of data but how do you find an interesting and valid
trend in a large pool of inaccurate data? How do you know you've found a trend
and not an artifact of your algorithm?

~~~
robhawkes
"So a single tweet's analysis will be inaccurate, but an aggregation of many
inaccurate results becomes meaningful?"

Effectively, yes. It's called the Wisdom of the Crowd:
<http://en.wikipedia.org/wiki/Wisdom_of_the_crowd>

To rule out artefacts you need to work out a) what you're looking for, and b)
whether it is backed up by anything else. For example, in this study my
findings are backed up by other, different studies. The findings also
correlate with key public events.

Being able to infer true sentiment is not what is claimed here. Instead,
you're able to infer 'sentiment' trends that _are_ backed up in some way by
other studies and research.

I'm 100% sure that these approaches aren't perfect; however, they are proving
useful. For example, one group of people is using a very similar approach to
take average 'sentiment' on Twitter and use it to predict stock market
fluctuations three days in advance. It works, and it's proven not to be a
fluke. Something is in the results, however inaccurate a single tweet is.
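The statistical intuition behind the wisdom-of-the-crowd claim can be shown with a toy simulation (my own illustration, not from the study): treat each per-tweet reading as the true mood plus heavy noise, and the mean of many readings still lands near the true value.

```python
# Toy illustration of why noisy per-tweet readings can still yield a
# usable aggregate: each reading is the true mood plus Gaussian noise,
# but the mean of many readings converges towards the true value.
import random

def noisy_readings(true_mood, n, noise=2.0, seed=42):
    """Simulate n per-tweet 'sentiment' readings with heavy noise."""
    rng = random.Random(seed)
    return [true_mood + rng.gauss(0, noise) for _ in range(n)]

readings = noisy_readings(true_mood=6.0, n=10_000)
single = readings[0]                      # any one reading may be far off
average = sum(readings) / len(readings)   # the crowd average is close to 6.0
```

This only cancels noise that is roughly unbiased, which is exactly the caveat msy raises below: a systematic bias in the word list will survive any amount of averaging.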

~~~
msy
The only firm that I'm aware of that actually trades on Twitter data is
Derwent Capital Management. Strangely, for a company that claims to be sitting
on a crystal ball for the markets, they've chosen to provide a platform for
others to trade on rather than simply making all those billions themselves.
Odd, that.

I'm well aware of the concept of the wisdom of the crowd, but if your incoming
data is noise then your ability to build any kind of aggregate analysis on top
of it is going to be nil, and post-hoc analysis means you can find only things
you already know are there. ANEW/AFINN etc. are basically a white flag for any
kind of meaningful automated analysis of tweets, resorting to simple dumb word
counts instead. Yes, you can capture a broad pattern, but the shape and
strength of that? The contours are an artifact of the list you use; it's
utterly arbitrary. Throw a few more random phrases on the list, assign them
some arbitrary values and presto, new results! If a tweet goes round Twitter
in a minute with hundreds of thousands of RTs, this kind of analysis will miss
it completely unless it's lucky enough to be preloaded with the right
dictionary. To work in this kind of context an algorithm has to be able to
trim its own sails.

There's nothing particularly wrong with your work, I'm just sick of the cycles
wasted attempting to do an extreme version of the problems that NLP already
struggles with.

~~~
robhawkes
Don't worry yourself too much with my study if it bothers you that I wasted
cycles. This study was merely an undergraduate university dissertation that
attempted to take a look at one area of sentiment analysis. I am not an expert
in NLP, nor do I pretend to be. I'm sure there are better and more appropriate
ways to do this.

------
willvarfar
Very interesting paper. (A non-Scribd version would be nice.)

I'm surprised the corpus of words is so small and that there is no attempt to
provide context. It just wasn't what I was expecting — I was expecting more of
a Markov chain kind of setup.

In Swedish, for example, there are lots of negative words that are trendy to
use in a way that is extremely positive, rather like "wicked!" in English
slang.

~~~
robhawkes
Non-scribd version:
[https://www.dropbox.com/s/5r3nfol818s4qna/People%20love%20a%...](https://www.dropbox.com/s/5r3nfol818s4qna/People%20love%20a%20good%20smooch%20on%20a%20balcony%20-%20Monitoring%20mood%20in%20the%20UK%20via%20Twitter%20%28Rob%20Hawkes%29.pdf)

Context is sacrificed with the ANEW approach, though that's not to say that
accuracy and results are compromised. It just means that you need to look at
the output in a different way, after understanding the limitations of the
input.

In ANEW (the data-set used), words that imply multiple meanings are likely to
receive more neutral values (it's all coded by humans).

The SentiWordNet data-set is probably a little more like what you're
expecting: <http://sentiwordnet.isti.cnr.it/>

------
untog
Interesting--I've recently been trying to do something extremely similar for
the time period of Obama's inauguration. I downloaded 300,000 tweets, but the
overwhelming majority were positive.

Good news for the country perhaps, but it ruined my visualisation plans...

~~~
robhawkes
How did you analyse them? Most rough or basic approaches result in
overwhelmingly neutral or positive outcomes.

~~~
untog
I tried a sample with some existing 2012 Politics Twitter data:

<https://code.google.com/p/sasa-tool/>

and as you state, even the most negative tended to come out as neutral at
best. However, even a basic analysis of the hashtags used seemed to show that
most messages were positive. It seems that Twitter as a whole leans somewhat
to the left.

My original plan had been to create a heatmap (similar to one I made here:
<http://heatmapdemo.alastair.is/>), with red and blue 'clouds' to indicate
which parts of the country were happy and which were angry as the inauguration
went on, but the data just doesn't seem to be out there.

------
gmac
Related: <http://mappiness.org.uk> (my PhD, a longitudinal happiness study
using an iPhone app — thesis at <http://etheses.lse.ac.uk/383>).

------
brador
Could I get a download link on that article? Scribd is paywalling.

~~~
robhawkes
Sure can!

[https://www.dropbox.com/s/5r3nfol818s4qna/People%20love%20a%...](https://www.dropbox.com/s/5r3nfol818s4qna/People%20love%20a%20good%20smooch%20on%20a%20balcony%20-%20Monitoring%20mood%20in%20the%20UK%20via%20Twitter%20%28Rob%20Hawkes%29.pdf)

~~~
brador
Thanks Rob! Is the software open source? If yes, got a link? If not, what
tech/stack was used, please?

~~~
robhawkes
I'm going to be refining and updating the tech behind it (it's quite old) but
I wrote a little more here, at least for the Twitter scraping:
[http://rawkes.com/articles/how-i-scraped-and-stored-over-3-m...](http://rawkes.com/articles/how-i-scraped-and-stored-over-3-million-tweets)

As for the sentiment analysis… I'm unable to release the data-set as it's
owned by a university in Florida and only available to students. A quick
Google on sentiment analysis, or 'Affective Norms for English Words' will come
up with useful things. :)

------
Atomcan
Do you know Ireland isn't part of the UK?

~~~
robhawkes
You'll also notice that part of France is included in the Twitter corpus: the
geo-fence used to enclose the UK was a rectangle, and so it included some
unwanted areas at the edges. That's a restriction of the Twitter Streaming
API, unfortunately.
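For anyone curious why the rectangle leaks: the Streaming API's `locations` filter only accepts bounding boxes, so a box wide enough to cover the UK inevitably captures slices of France and Ireland. A common client-side workaround (my sketch, not the paper's code; the bounding-box coordinates are approximate) is to post-filter each tweet's GeoJSON coordinates:

```python
# The Streaming API's `locations` parameter only takes bounding boxes,
# so a UK-sized rectangle also captures parts of France and Ireland.
# One workaround is to post-filter each tweet's coordinates client-side
# (against the box here, or against a tighter polygon in practice).

UK_BBOX = (-8.65, 49.84, 1.77, 60.85)  # (west, south, east, north), approximate

def in_bbox(lon, lat, bbox=UK_BBOX):
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

def keep(tweet):
    """Drop tweets without coordinates or outside the box.

    Tweet objects carry a GeoJSON 'coordinates' field as [lon, lat]."""
    coords = (tweet.get("coordinates") or {}).get("coordinates")
    return coords is not None and in_bbox(*coords)
```

A rectangle check still keeps bits of Ireland, of course; excluding those properly needs a real polygon test rather than this box.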

~~~
Atomcan
Fair enough. Did you include that in the paper? I can't find it.

~~~
robhawkes
I'm not allowed to release the raw tweets due to Twitter API restrictions. I
wish I could!

