Geographical analysis tools should be used in these types of analyses, beyond just looking at blobs on a map. I used k-means-based cluster analysis to find groups of happy and sad areas, but again the groups showed nothing conclusive.
The web GIS company I ended up working for did sentiment analysis of tweets by aggregating them into regions, so as to find positive and negative areas during a specific timeframe (for example, US elections). The regions had demographics which could be used statistically, and in general some interesting patterns were observed.
When you're using things as short as Tweets, and as broad as "general sentiment", you're probably making accuracy even worse, to the point that simpler demographic analysis or bag-of-words clustering (i.e., cluster areas by diction rather than by sentiment) yields more reliable results, even for sentiment.
I've built a map that takes a geofenced stream of tweets and runs AFINN-111 sentiment analysis on them, and then displays them in real time on a map of London.
Negative tweets are displayed in red, happy tweets in blue.
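For anyone wondering what an AFINN-style pass amounts to: it is essentially a word-list lookup followed by a sum. Here's a rough Python sketch (the map itself runs on node.js, so this is not its actual code). The tiny word list is a stand-in rather than the real AFINN-111 file, and the function names and the cut-offs at zero are my own assumptions.

```python
import re

# Tiny illustrative word list in the AFINN style (word -> integer valence).
# This is a stand-in for illustration, NOT the real AFINN-111 file.
TOY_LEXICON = {"good": 3, "happy": 3, "love": 3, "bad": -3, "hate": -3, "sad": -2}

def score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word in the text; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0) for w in words)

def color(s):
    """Map a score to a marker color. Cutting at zero is an assumption."""
    if s < 0:
        return "red"
    if s > 0:
        return "blue"
    return "neutral"
```

Note that a tweet with no listed words scores 0, which is exactly how mention-only tweets end up in the middle bucket.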
The whole thing is built on Node.js using node-tweet-stream, node-sentiment and socket.io. The frontend map is Leaflet with Stamen Design's Toner tiles.
It's quite fun to watch, especially when there's a football match or a concert. If you click on the "follow tweets" checkbox, new tweets pop up as they arrive, although currently that makes the map pan north.
Maybe include a feature to select the colors for happy/sad/average with a button to return to defaults?
Black for sad, light grey for neutral, something like a medium bright green for happy would be my picks.
We did the same with Tweets and Surfing. http://devwax.herokuapp.com/ from the meetup: http://www.meetup.com/DevWax/. It was all done in a weekend with some drinking and surfing, so it's a bit rough. The trouble with surfing was that the locations are very disparate and hard to guess. Fun to have a go at though...
Are you in London? (We are)
I'm only using tweets on which users chose to publish their location, so it isn't all of the tweets, but a good chunk of them.
This approach only works when aggregating tweets over a larger area, e.g. comparing 10,000 tweets from each UK county, or perhaps from each city.
For even larger areas (think regions / countries) you could look through the user bios, or previous tweets to pull out any names or locations and do some analysis to work out which broader region they are in.
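A minimal sketch of the aggregation idea, in Python, assuming each tweet has already been scored and assigned to a region somehow. The function name and the `min_count` parameter are hypothetical; in practice the threshold would be far higher than the 3 used here to keep the example small.

```python
from collections import defaultdict

def mean_sentiment_by_region(scored_tweets, min_count=3):
    """scored_tweets: iterable of (region, score) pairs.

    Returns {region: mean score}, keeping only regions that have at
    least min_count tweets, so thinly sampled areas are dropped.
    """
    totals = defaultdict(lambda: [0, 0])  # region -> [sum of scores, tweet count]
    for region, s in scored_tweets:
        totals[region][0] += s
        totals[region][1] += 1
    return {r: total / n for r, (total, n) in totals.items() if n >= min_count}
```

Dropping under-sampled regions instead of plotting them is the design choice here: a mean over a handful of tweets says very little.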
One suggestion I would have is some sort of filtering based on the content of the tweets. This tweet returned "feeling good":
"New post featuring: @NewLookPRTeam @nextofficial @Matalan @hmunitedkingdom @uoeurope @Accessorize @ASOS http://t.co/nGsFj306xT #fbloggers"
This one returned "feeling average":
"@dannykobe17 @DanielRacheter @Khuds_ @shangambling @Umar_Wilshere19 @_mikenewell_ @BlueKay10 is that ollie?"
Yet there's absolutely no real sentiment to derive from this sort of thing.
I chose red and blue because they're quite perceptually separated, so it's easy to point them out on the map.
All in all, though, impressive. Sure some are misclassified but it seems like a significant majority are not, including a lot of the hard ones. Good work!
It's got a bit more than just twitter feelz. Mostly Boris Bike usage rate :D
The impressive part of this for me is the visualization. Really nice.
I would disagree but apply the caveat that there has to be heavy filtering of what is being analysed to derive anything of value from it.
A tweet like:
"“@BarkhamTaylor: #beatcameronathisowngame @aliceehoughton” let's get it trending #adrian"
Has no value for sentiment analysis, yet is lumped in with the rest of them, while something like:
"Sunny day in London :) @ Green Park http://t.co/S71RQR2luU"
Clearly has value for sentiment analysis. The current problem is that all of the junk gets marked as "average" or similar because no sentiment can be derived from it, which skews the overall set greatly.
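One way the filtering could look, as a Python sketch: strip URLs, @mentions and hashtags, then drop any tweet with no scoreable word left instead of counting it as "average". The word list is again a made-up stand-in (I have not checked which of these words are in AFINN-111), and the function names are hypothetical.

```python
import re

# Stand-in word list for illustration only, not the real AFINN-111 file.
TOY_LEXICON = {"sunny": 2, "good": 3, "sad": -2}

# URLs, @mentions and #hashtags carry no scoreable words for this purpose.
NOISE = re.compile(r"https?://\S+|[@#]\w+")

def content_words(text):
    """Lowercase the tweet, strip URLs/mentions/hashtags, return remaining words."""
    stripped = NOISE.sub(" ", text.lower())
    return re.findall(r"[a-z']+", stripped)

def has_signal(text, lexicon=TOY_LEXICON):
    """True if at least one remaining word appears in the lexicon."""
    return any(w in lexicon for w in content_words(text))
```

With this toy list, the Green Park tweet above passes the filter while the mention-only and hashtag-only examples are excluded outright. Emoticons like ":)" are ignored here, which throws away a fairly strong signal; handling them would be a natural next step.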
I also want to do NYC and SF.