

Twitter Data Dump: InfoChimps Puts 1B Connections Up For Sale - ypavan
http://www.readwriteweb.com/archives/twitter_data_dump_infochimp_puts_1b_connections_up.php

======
physcab
Two thoughts about this:

1) It seems a little shady to be taking a scrape of another service and
putting it up for sale. It may be legal, but it's shady nonetheless.

2) More data is not always better. Your application has to be perfectly suited
for the data set and if it is not, then you run the risk of drawing false
conclusions. Things are not always as easy as they may seem.

------
ypavan
The original infochimps blog post is at
[http://blog.infochimps.org/2009/11/11/twitter-census-publish...](http://blog.infochimps.org/2009/11/11/twitter-census-publishing-the-first-of-many-datasets/)

Seems there is very interesting info out there:

* The packaged data contains almost the entire history of Twitter: 35 million users, one billion relationships, and half a billion Tweets, reaching back to March 2006.

* The first dataset has #hashtags, links, and smiley emoticons used across Twitter on an hour-by-hour basis.

* The second dataset contains @ messages, RTs, and favorites and who they came from: 1 billion relations making up what the company calls a "conversation metric."

~~~
petewarden
Reading the article, I thought the second data set was only mapping user id
numbers to screen names and vice versa? I'd love to be mistaken, though; the
data you describe would be gold dust.

------
ramanujan
This goes up there with the AOL dataset in terms of being a crucial asset for
anyone doing web research.

Who wants to bet that this will be available on torrents in days?

------
coderdude
This is surprising. It's a no-brainer that using someone's API to scrape their
entire site (and releasing the data!) is begging for a lawsuit. You don't even
have the somewhat weak defense of "it's public data because it's publicly
accessible" when you rip it from a service they provide. I'd be willing to bet
that at least in several places their TOS says "hey, we know what you're
thinking, but..."

~~~
coderdude
To anyone interested, I asked this question over on the InfoChimps blog.
The response I got clued me in to just how little I knew of Twitter's TOS:
[http://blog.infochimps.org/2009/11/11/twitter-census-publish...](http://blog.infochimps.org/2009/11/11/twitter-census-publishing-the-first-of-many-datasets/comment-page-1/#comment-233)

So disregard what I said in the context of Twitter data.

------
ajju
This cannot be legal if Twitter had a half-decent lawyer write their TOS.

