
Computer program uses Twitter to 'map mood of nation' - eplanit
http://www.bbc.co.uk/news/technology-24001692
======
msy
As a human, I've previously tried, out of curiosity, to classify random tweets
into emotions and failed. There just isn't enough context to keep the
judgements from being arbitrary. As a result, I'm deeply suspicious of software
that claims to be able to do so.

~~~
Shank
I worked with a lot of Twitter data over the summer as part of an internship.
I can say that the signal-to-noise ratio is so abysmally inverted that this is
almost certainly doing a lot of filtering just to get meaningful source data.

For instance, you have to filter out most posts with links (because they tend
to have very little associated data, or are spam), anything in languages you
don't have parsers or datasets for, etc. Then you can cut on character count
(because a surprising number of useless tweets are under 5-9
characters)...
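
A minimal sketch of the three filters described above (links, unsupported languages, very short tweets). The field names `text` and `lang`, the 10-character cutoff, and the English-only language set are assumptions made for illustration, not details of the pipeline used in the internship or in the article.

```python
import re

# Assumed pattern for spotting links; real tweets may need a broader match.
URL_PATTERN = re.compile(r"https?://\S+")

# Assumed cutoff, chosen to drop the "under 5-9 characters" tweets mentioned above.
MIN_CHARS = 10


def keep_tweet(text, lang, supported_langs=frozenset({"en"})):
    """Return True if a tweet survives the link, language, and length filters."""
    if URL_PATTERN.search(text):
        return False  # posts with links tend to be spam or carry little text
    if lang not in supported_langs:
        return False  # no parser or dataset for this language
    if len(text.strip()) < MIN_CHARS:
        return False  # too short to carry a usable signal
    return True


def filter_tweets(tweets):
    """Keep only the tweets (dicts with 'text' and 'lang') that pass every filter."""
    return [t for t in tweets if keep_tweet(t["text"], t["lang"])]


# Hypothetical sample data, not real tweets.
sample = [
    {"text": "Check this out http://example.com/deal", "lang": "en"},
    {"text": "lol", "lang": "en"},
    {"text": "Quel temps magnifique aujourd'hui", "lang": "fr"},
    {"text": "Stuck in traffic again, so frustrated", "lang": "en"},
]

kept = filter_tweets(sample)
print(len(kept), "of", len(sample), "tweets kept")
```

Even on this toy sample only one tweet in four survives, which is the kind of shrinkage the filtering implies.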

As others have stated and will continue to state, this further cuts down the sample
size of an already fairly small sample (500+ million in the world[0]). If you
want to be specific and only look at a single nation, you're cutting that
sample size based on who has location turned on and/or broadcasts it in their
profile. This is a significantly smaller number.

Edit: This is obviously a better sample size than a random poll of
individuals, but that doesn't mean that the cumulative quality will
be drastically higher.

[0] - [http://www.statisticbrain.com/twitter-statistics/](http://www.statisticbrain.com/twitter-statistics/)
(sources from three different input points)

------
TomGullen
Not map mood of nation, map mood of Twitter users in a nation. Big difference.

~~~
diminoten
It's a large overlap.

~~~
JoeAltmaier
It's a tiny, tiny overlap. Check this out for Twitter demographics:

[http://socialmediatoday.com/index.php?q=SMC/78505](http://socialmediatoday.com/index.php?q=SMC/78505)

~~~
diminoten
20% is hardly tiny.

~~~
JoeAltmaier
Read carefully. 20% of 20% == 4%

------
primaryobjects
This is a timely article, as my current project deals with Twitter sentiment
analysis [http://www.sentimentview.com](http://www.sentimentview.com).

I'm not classifying by specific emotion type as the article describes,
though. Overall, I'm seeing accuracy rates of around 80%, compared with 62%
for a brute-force word list. I'd be curious to see what kind of accuracy the
researchers in the article are achieving with so many sub-classifications.
Not to mention, how did they derive their initial model?
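
For reference, a brute-force word-list baseline and its accuracy measurement can be sketched roughly as follows. The word lists, the labeled examples, and the function names are all hypothetical, so the resulting figure says nothing about the 62% or 80% numbers above; it only shows how such a comparison is scored.

```python
# Hypothetical word lists; a real baseline would use a much larger lexicon.
POSITIVE = {"love", "great", "happy", "good"}
NEGATIVE = {"hate", "awful", "sad", "bad"}


def word_list_sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(classifier, labeled):
    """Fraction of (text, label) pairs the classifier labels correctly."""
    correct = sum(classifier(text) == label for text, label in labeled)
    return correct / len(labeled)


# Hypothetical hand-labeled examples, not real tweets.
labeled = [
    ("I love this phone", "positive"),
    ("What an awful day", "negative"),
    ("Oh great, another delay", "negative"),  # sarcastic: word list gets this wrong
    ("Feeling happy today!", "positive"),
]

print(accuracy(word_list_sentiment, labeled))
```

The sarcastic example is misread as positive because the word list only sees "great", which is one reason a plain word list tends to trail a trained model.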

~~~
nhebb
How do you detect sarcasm?

------
troebr
It's the first exercise in datasci-001 on Coursera... Not much of an
invention.
[https://class.coursera.org/datasci-001](https://class.coursera.org/datasci-001)

------
danso
Statistics 101...sheesh. Not only is it limited to users of Twitter (and not
the nation), but it's limited to users who geocode their tweets, which may be
as little as 1 to 2% of them.

After the presidential election, Gallup overhauled its polling methodology.
Why? Because they significantly overestimated Mitt Romney's popularity, and
one of the factors may have been relying too much on landline polling, as
older (i.e. more conservative) people are disproportionately likely to still
use landlines.

If Gallup's predictions are as sensitive to a factor like that, I imagine this
neophyte tech/science company is going to be way off in their measurements.

------
lukeh
What about We Feel Fine [1]? I saw a TED talk on this _years_ ago and loved it.

[1] [http://www.wefeelfine.org](http://www.wefeelfine.org)

------
tomrod
I love this. As an economist, I'm very interested in the correlation between
the distribution of rhetorical reactions that Twitter captures and economic
happenings. I first realized there might be a connection with the BP oil
spill. One would expect that oil company and possibly general energy stocks
would fall after such a negative happening. Can we predict the impact of such
circumstances in the Tokyo exchange too, based on the rhetoric we can measure
in each nation (US and Japan)? Oh man, the future is exciting as big data
collection and analysis mature.

