

Looking At The World Through Twitter Data - arashdelijani
http://arashd.scripts.mit.edu/blog/?p=34

======
Anon84
If you're interested in how information diffuses through social networks like
Twitter, take a look at Truthy (one of my projects):

<http://truthy.indiana.edu>

    Truthy is a system to analyze and visualize the
    diffusion of information on Twitter. The Truthy system
    evaluates thousands of tweets an hour to identify new
    and emerging bursts of activity around memes of various
    flavors. The data and statistics provided by Truthy are
    designed to aid in the study of social epidemics: How do
    memes propagate through the Twittersphere? What causes a
    burst of popularity?

~~~
arashdelijani
sounds like a cool thing to tackle. We'll definitely look at it!

~~~
Anon84
Great. Let me know if you have any questions or suggestions. I'm nearby at
Northeastern.

------
TravisPe
A friend and I started playing around with Twitter data back in early 2010. We
currently have roughly 587 million tweets collected (we stopped collecting
earlier this year). We only pulled English tweets that described what someone
was feeling (I'm, I am, I feel, I am feeling, etc., along with the negatives:
I don't feel, I do not feel, etc.).
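
For anyone curious, a filter like the one described above might look roughly like this. It is only a sketch in Python: the regular expression is built from the example phrases listed in the comment, and the name `describes_feeling` is made up for illustration, not taken from the actual project.

```python
import re

# Assumed pattern: covers only the example phrases quoted above
# ("Im"/"I'm", "I am", "I feel", "I don't feel", "I do not feel").
# The real filter presumably used a longer list.
FEELING_PATTERN = re.compile(
    r"\bi(?:'?m|\s+am|\s+feel|\s+don'?t\s+feel|\s+do\s+not\s+feel)\b",
    re.IGNORECASE,
)


def describes_feeling(tweet: str) -> bool:
    """Return True if the tweet contains one of the first-person feeling phrases."""
    return FEELING_PATTERN.search(tweet) is not None


if __name__ == "__main__":
    for text in ["Im so anxious right now", "Time is money"]:
        print(text, "->", describes_feeling(text))
```

The word boundaries keep words such as "image" or "time" from matching. Curly apostrophes and language detection are left out of this sketch.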

We did see some interesting events during that time, though. This is a graph
of the anxiety levels on Twitter on March 11th; the bottom axis is the hour of
the day in EST. The earthquake hit Japan at 1:46 EST.

<http://i.imgur.com/BeBwa.jpg>

There is a strange dip around noon that we are unsure of how to account for as
our servers did not report any failures.

It was a fun project to play around with.

~~~
cpeterso
> There is a strange dip around noon that we are unsure of how to account for
> as our servers did not report any failures.

Maybe people are away from their computer at lunch.

What do the blue and green line colors indicate? It would also be interesting
to track emoticons. :)

~~~
TravisPe
The green line represents the number of tweets that were marked as being
anxious and the blue line represents tweets marked as calm.

You can see that after the tsunami hit there was a general spike in the
overall traffic, but a much larger spike for tweets where the user described
being anxious.

We also analyzed the tweets for emotions, flagging each as either "happy" or
"sad". We don't have that data in any consumable format at the moment,
though.

These are some logs for the day (totals)

    Date           Calm  Anxious    Happy      Sad
    2011-03-08     2034     8730    77032    94119
    2011-03-09     1349     5129    47708    59406
    2011-03-10     1614     6020    51623    72214
    2011-03-11     4126    20427    87763   126688
    2011-03-12     3251    13009   104434   136389

We had 96 adjectives we used to filter for anxiety and 3242 adjectives we used
for emotions (happy/sad).
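
One way to read the totals above is as day-over-day growth per column. The following is a small Python sketch that uses only the numbers from the table; the names `TOTALS` and `growth` are invented here for illustration.

```python
# Daily totals copied from the table above, in column order.
COLUMNS = ("calm", "anxious", "happy", "sad")
TOTALS = {
    "2011-03-08": (2034, 8730, 77032, 94119),
    "2011-03-09": (1349, 5129, 47708, 59406),
    "2011-03-10": (1614, 6020, 51623, 72214),
    "2011-03-11": (4126, 20427, 87763, 126688),
    "2011-03-12": (3251, 13009, 104434, 136389),
}


def growth(prev_day: str, day: str) -> dict:
    """Return each column's count on `day` divided by its count on `prev_day`."""
    return {
        name: cur / prev
        for name, prev, cur in zip(COLUMNS, TOTALS[prev_day], TOTALS[day])
    }


# Compare the day of the earthquake with the day before it.
quake_day = growth("2011-03-10", "2011-03-11")
for name, factor in quake_day.items():
    print(f"{name:8s} x{factor:.2f}")
```

On these numbers the anxious count grows by about 3.4x from March 10 to March 11, against about 2.6x for calm, which matches the larger spike described above.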

------
Permit
Out of curiosity, is Twitter data such as this freely available to anyone, or
was this specially acquired for this set of students? I can imagine a number
of interesting projects that might arise out of such a data set.

~~~
Anon84
You need to get whitelisted to have access to it. The only problem is that
since they partnered with Gnip, Twitter no longer gives whitelist access for
free
(<http://www.readwriteweb.com/archives/twitter_to_sell_50_of_all_tweets_for_360kyear_thro.php>).

I was lucky enough to get it more than two years ago and have been
accumulating data ever since.

------
tmostak
I've also been collecting Twitter data for a bit. I developed a heatmapping
application that runs on the GPU to produce time-animated heatmaps in real
time for any user-generated query over a Solr database of hundreds of millions
of geotagged tweets. You can see a rough demo at
<http://youtu.be/4_v2EZGiA7w>. Hopefully I'll release it as a web app when I
get time this summer.

------
seeingfurther
We've been working on a similar project since last year @
<http://smogfarm.com/> Feel free to get in touch!

------
akshaykarthik
Wow... This is awesome. I actually did a project for my high school science
fair that focused on analyzing Twitter. It was nowhere near as sophisticated,
but it really opened my eyes to the massive amount of data and the
availability of commodity hardware that can actually handle terabytes of data.

------
joejohnson
_But, this is because non-English tweets that we have discarded are much more
frequent during the night in our time zone, and they often don’t contain the
word ‘a’ as often as English tweets do._

This doesn't make sense; are they only discarding the non-English tweets
during certain times?

~~~
arashdelijani
We just mean that there's more tweeting going on in non-English speaking
countries when it's night-time here.

~~~
tmostak
Why don't you offer an option to normalize against the number of
English-language tweets (or tweets in any other language you can pick out) in
a given hour? You could build your own classifier or use something like
Apache Tika. Regardless, really really nice work. The graph is beautiful and
highly functional. Did you guys write the graph lib yourselves?
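
A rough sketch of the normalization suggested above, in Python. It assumes the per-hour counts are already available as plain dicts keyed by hour; the function name and the sample numbers are made up for illustration and are not from the blog post's data.

```python
def normalize_by_hour(keyword_counts: dict, english_totals: dict) -> dict:
    """Divide each hour's keyword count by that hour's total of English tweets.

    Hours with no English tweets map to None rather than raising an error.
    """
    rates = {}
    for hour, total in english_totals.items():
        if total == 0:
            rates[hour] = None
            continue
        rates[hour] = keyword_counts.get(hour, 0) / total
    return rates


if __name__ == "__main__":
    # Made-up numbers: fewer English tweets at 03:00 than at 15:00.
    keyword_counts = {3: 50, 15: 200}
    english_totals = {3: 1000, 15: 4000}
    print(normalize_by_hour(keyword_counts, english_totals))
```

With the raw counts the 15:00 bucket looks four times busier, but both hours have the same rate once divided by the hourly English total. That is the time-of-day effect this kind of normalization is meant to remove.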

------
jermaink
Hi, if you like that kind of stuff, I could give you an intro to Peter
Gloor, who is the author of swarmcreativity.net and is at the MIT Center for
Collective Intelligence. Tag #Twitter, Stock Prediction, Mood etc. You might
meet on campus :)

~~~
arashdelijani
We know Peter, actually. He's a great guy and we've been talking to him about
this. Thanks though! :)

------
grout
<http://topsy.com/> yeah. we do that.

------
akg
Interesting data. I would be curious to find out how the general sentiment
correlates with consumer behavior, e.g., financial market swings, purchases on
amazon.com, google searches, etc.

~~~
Anon84
Twitter Mood Predicts the Stock Market:
<http://www.sciencedirect.com/science/article/pii/S187775031100007X>
(and also on the arXiv: <http://arxiv.org/abs/1010.3003>)

------
roarktoohey
It would be cool, possibly profitable, to see stock symbols and their price
change mapped vs. mentions of the ticker (like IBM).
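
As a starting point for the mention side of that idea, here is a rough Python sketch that counts tweets per day mentioning a ticker. The `(date, text)` input shape and the function name are assumptions for illustration; joining the counts against price changes is left out.

```python
import re
from collections import Counter


def count_ticker_mentions(tweets, ticker: str) -> Counter:
    """Count tweets per day that mention `ticker` as a whole word or cashtag.

    `tweets` is an iterable of (date_string, text) pairs (a hypothetical shape).
    Matching is case-sensitive, so "IBM" and "$IBM" count but "ibm" does not.
    """
    pattern = re.compile(r"(?<!\w)\$?" + re.escape(ticker) + r"\b")
    per_day = Counter()
    for day, text in tweets:
        if pattern.search(text):
            per_day[day] += 1
    return per_day


if __name__ == "__main__":
    # Made-up sample tweets, for illustration only.
    sample = [
        ("2011-05-01", "Bought $IBM today"),
        ("2011-05-01", "IBM earnings look strong"),
        ("2011-05-02", "IBMers unite"),
    ]
    print(count_ticker_mentions(sample, "IBM"))
```

Short tickers that are also ordinary words would need extra filtering, which this sketch does not attempt.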

------
tzm
Great work. I'll be following your updates. I'm building a platform for
developers to crunch such APIs / data sets.

------
mrlinx
Is any of this data available? It would be great to have access to it.

------
christiangenco
A bit off-topic, but what did you use to draw the graphs?

~~~
arashdelijani
We used Flot after trying out quite a few other libraries
<http://code.google.com/p/flot/>

------
molsongolden
My friend Kang and I

Ahhhhhhh

