

Twitter to Sell 50% of All Tweets for $360,000 a Year Through Gnip - hornokplease
http://www.readwriteweb.com/archives/twitter_to_sell_50_of_all_tweets_for_360kyear_thro.php

======
smithbits
Let's see, at the 1000 tweets per second figure from the article that's 365 *
24 * 3600 * 500 = 15,768,000,000 tweets for $360,000. Or 0.002 cents per
tweet. So 500ish tweets are worth about a penny, and your $360K would be for
15 gigatweets which would weigh in at about 2.2 TB[1]. So my tweets aren't
worthless, they're just very very cheap.
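A quick way to re-run the back-of-envelope numbers above. This is a sketch under two assumptions: the article's average of 1000 tweets per second, and 140 bytes per tweet (the character limit at one byte per character, with no metadata). The ~438 tweets per penny it prints is the "500ish" above.

```python
# Assumptions: the article's 1000 tweets/s average, and 140 bytes per tweet
# (the character limit at one byte per character, ignoring metadata).
TWEETS_PER_SECOND = 1000
HALFHOSE_SHARE = 0.5              # the feed being sold is 50% of all tweets
PRICE_PER_YEAR = 360_000          # dollars
BYTES_PER_TWEET = 140
SECONDS_PER_YEAR = 365 * 24 * 3600

tweets_per_year = int(TWEETS_PER_SECOND * HALFHOSE_SHARE * SECONDS_PER_YEAR)
cents_per_tweet = PRICE_PER_YEAR * 100 / tweets_per_year
tweets_per_penny = 1 / cents_per_tweet
terabytes = tweets_per_year * BYTES_PER_TWEET / 10**12   # decimal ("drive maker") TB

print(f"{tweets_per_year:,} tweets/year")         # 15,768,000,000 tweets/year
print(f"{cents_per_tweet:.4f} cents per tweet")   # 0.0023 cents per tweet
print(f"{tweets_per_penny:.0f} tweets per penny") # 438 tweets per penny
print(f"{terabytes:.1f} TB")                      # 2.2 TB
```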

[1] That's a hard drive manufacturer terabyte, not a real one

~~~
bennysaurus
Sweet, so my account is worth about a whole $1 in tweets

~~~
invisible
That is $1 per sale :). Maybe your account is worth $10-$20 total.

------
aheilbut
We really need (and will inevitably get) an open, distributed protocol for
status updates. It's insane for everything to be routed through one (or 2, or
3) companies.

~~~
rottendevice
Who would host the servers though? A standards organization? I doubt they
could afford it.

That, or it will be like email, where everyone has @example.com appended to
their username.

Neither situation strikes me as desirable.

~~~
chunkbot
In the future, it's quite likely that a standards organization _could_ afford
it.

Processor speed, disk capacity, network bandwidth, and available software are
all growing much more rapidly than online populations.

In some years' time I might be able to run an operation like Google, Facebook,
or Twitter from my bedroom.

~~~
eru
The current Google perhaps, but not the Google of the future.

~~~
chunkbot
I'm not sure I _want_ the Google of the future.

------
arnabdotorg
The key value lies not so much in the data itself as in the _timeliness_ of
the data. Access to the halfhose allows you to answer a _very_ valuable
question:

"What's happening _right now_?"

This question is worth a lot of money, and it doesn't have a good algorithmic
solution (e.g. Google News). Twitter is probably the only company that has a
privacy-compliant solution to this, which makes it a very monetizable
product.

~~~
djb_hackernews
But Twitter already provides an API for that. You can ask for the current top
10 trending topics at any time.
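For comparison, a naive trending list can be computed from a raw stream by counting hashtags over a sliding time window. This is a minimal sketch, assuming tweets arrive in time order as (timestamp, text) pairs; the `TrendingWindow` class and the 300-second window are hypothetical, and Twitter's own trending algorithm is not described in this thread.

```python
import re
from collections import Counter, deque
from typing import Deque, List, Tuple

HASHTAG = re.compile(r"#(\w+)")


class TrendingWindow:
    """Counts hashtags seen in the last `window` seconds of a tweet stream."""

    def __init__(self, window: float):
        self.window = window
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, tag), oldest first
        self.counts: Counter = Counter()                 # tag -> count inside the window

    def add(self, timestamp: float, text: str) -> None:
        # Case-fold so #WorldCup and #worldcup count as the same tag.
        for tag in HASHTAG.findall(text.lower()):
            self.events.append((timestamp, tag))
            self.counts[tag] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events older than the window; assumes timestamps arrive in order.
        while self.events and self.events[0][0] <= now - self.window:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, n: int = 10) -> List[Tuple[str, int]]:
        return self.counts.most_common(n)


trends = TrendingWindow(window=300)
trends.add(0, "watching the #WorldCup final")
trends.add(10, "#worldcup goal!")
trends.add(20, "new #iPhone4 antenna trouble")
print(trends.top(3))   # [('worldcup', 2), ('iphone4', 1)]
trends.add(400, "still talking about the #iphone4")
print(trends.top(3))   # [('iphone4', 1)]
```

Raw counts like this are easily dominated by spam and perennial tags, which is part of why "what's happening right now" is harder than it looks.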

------
harscoat
_All these platforms,..., they all realize that the data they have is
extremely valuable to everyone from API partners to marketers .... I think all
these companies could see that there's more money in data services than there
could be for them in advertising._

The value would go back into the API and not into weird sponsored trending
topics. Twitter seems to be heading back toward Alex Payne's vision (data hose
platform) and away from Biz Stone's (Twitter as a medium with _celebrities_
etc...). They could also set up separate Twitter hoses: for instance, there
could be automatic sensor data input for the Internet of Things, kept separate
from human input. Are there any links where Twitter people talk about this?

------
paolomaffei
What technical measures, if any, prevent people from just scraping 100% of the
tweets?

~~~
dasil003
Simple rate-limiting / DOS prevention would be plenty given the volume of
tweets.
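A minimal sketch of the kind of per-client rate limiting described above: a token bucket keyed by client IP. The 150-requests-per-hour policy and the IP addresses are hypothetical, not Twitter's actual limits. Keying on IP is also exactly what the reply below points at: every new address starts with a fresh budget.

```python
from typing import Dict


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical policy: 150 requests per hour for each client IP.
RATE = 150 / 3600
BURST = 150
buckets: Dict[str, TokenBucket] = {}


def allow_request(ip: str, now: float) -> bool:
    """Look up (or create) the bucket for this IP and try to spend one token."""
    if ip not in buckets:
        buckets[ip] = TokenBucket(RATE, BURST, now)
    return buckets[ip].allow(now)


one_ip = sum(allow_request("203.0.113.7", 0.0) for _ in range(200))
print(one_ip)        # 150: the rest of the burst is refused
later = sum(allow_request("203.0.113.7", 60.0) for _ in range(10))
print(later)         # 2: about 2.5 tokens refilled in 60 seconds
second_ip = sum(allow_request("198.51.100.2", 0.0) for _ in range(200))
print(second_ip)     # 150: a fresh IP gets a fresh budget
```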

~~~
ahi
Not that hard to get around. IPs are easy to come by.

~~~
mrduncan
I disagree. Twitter is pushing upwards of 3000 tweets per second [1]; it'd be
nearly impossible to scrape at that rate without being noticed, regardless of
how many IPs you've got at your disposal.

[1] <http://mashable.com/2010/06/25/tps-record/>

~~~
ahi
You don't need to scrape for each tweet. Scrape every active user every couple
of hours or days, depending on their tweet frequency. There's not much point in
having a database that is up to date every second, since processing it in real
time is almost impossible anyway.

Twitter would notice a major scraping operation, but if it's done correctly
they wouldn't be able to distinguish between user IPs and bot IPs.

edit: Barracuda already did more than 10% of users just for a white paper:
[http://www.barracudanetworks.com/ns/news_and_events/index.ph...](http://www.barracudanetworks.com/ns/news_and_events/index.php?nid=387)

A first pass over 150,000,000 registered users takes only about 170 days at 10
users a second. Focus on frequent tweeters for subsequent scrapes. Even among
the ~20% of Twitter accounts that are active, most don't need to be scraped
daily, and the most active accounts are likely spam.

~~~
njharman
> Scrape every active user

translates to "Scrape every user" unless you know of some magical way to get a
list of "active" users.

Guess how many active users there are? Guess how many servers you need running
to get through those in 1 hour. My guess is something worth much more than
$360,000 a year.
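The "how many servers" question is simple division. This is a sketch under stated assumptions: ~30M active users (ahi's ~20% of 150M registered) and a hypothetical 10 fetches per second per crawler machine. It only counts machines and does not try to price them.

```python
import math


def crawlers_needed(users: int, window_seconds: float,
                    fetches_per_second_per_crawler: float) -> int:
    """Crawler machines required to fetch every user once per window."""
    required_rate = users / window_seconds        # fetches per second overall
    return math.ceil(required_rate / fetches_per_second_per_crawler)


# Assumptions: ~20% of 150M registered accounts are active (ahi's estimate),
# and each crawler sustains 10 fetches per second (hypothetical).
ACTIVE_USERS = 150_000_000 * 20 // 100
print(crawlers_needed(ACTIVE_USERS, 3600, 10))    # hourly pass: 834
print(crawlers_needed(ACTIVE_USERS, 86_400, 10))  # daily pass: 35
```

The count scales linearly with the refresh rate, so the two sides of this argument mostly disagree about how often each account really needs to be fetched.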

~~~
borism
_> unless you know of some magical way to get list of "active" users_

look for users who tweeted in the last X days? also look for their
repliers+buddies, since they too are likely to be active. doesn't seem hugely
complicated to me?
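The "start from recent tweeters and follow their repliers and buddies" idea is a breadth-first crawl that only expands users who turn out to be active. A minimal sketch; `is_active` and `neighbours` are hypothetical callbacks standing in for whatever fetches a user's recent tweets and contacts, and the toy graph is invented for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Set


def discover_active(seeds: Iterable[str],
                    is_active: Callable[[str], bool],
                    neighbours: Callable[[str], Iterable[str]],
                    limit: int = 1_000_000) -> Set[str]:
    """Breadth-first expansion from seed users.

    Only users who turn out to be active have their neighbours (people they
    reply to or follow) queued, so quiet regions of the graph are not expanded.
    """
    seen: Set[str] = set()
    queue: deque = deque()
    for s in seeds:
        if s not in seen:
            seen.add(s)
            queue.append(s)
    active: Set[str] = set()
    while queue and len(active) < limit:
        user = queue.popleft()
        if not is_active(user):      # still costs one fetch per visited user
            continue
        active.add(user)
        for other in neighbours(user):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return active


# Toy data: who each user talks to, and days since their last tweet.
graph = {"alice": ["bob", "carol"], "bob": ["alice", "dave"],
         "carol": [], "dave": ["erin"], "erin": []}
days_since_tweet = {"alice": 0, "bob": 2, "carol": 40, "dave": 1, "erin": 3}

found = discover_active(["alice"],
                        is_active=lambda u: days_since_tweet[u] <= 7,
                        neighbours=lambda u: graph[u])
print(sorted(found))  # ['alice', 'bob', 'dave', 'erin']
```

Inactive users such as `carol` still cost a fetch to rule out, and anyone unreachable from the seeds is never found, which is the objection raised in the reply.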

~~~
njharman
> look for users who tweeted in last X days?

Requires you to check, aka scrape, _every_ user to see which ones tweeted in
the last X days.

------
LiveTheDream
Does this include private tweets and DMs?

~~~
irons
In the past, firehose access has meant public tweets only. I presume Gnip is
getting the same data as the Library of Congress and other firehose consumers,
but details haven't been spelled out yet.

------
djb_hackernews
I work for a media aggregator, and we do a lot of different media (microblogs
like twitter included). We index close to 100M documents a month. We also
charge a fraction of 360k/yr. I'd like to know who the target market is.

Is anyone here genuinely ready to spend that kind of cash on 50% of
Twitter?

------
aresant
I'm trying to find my "privacy outrage" / "who actually owns your content"
soapbox but having trouble.

In fact, I find it kind of refreshing that Twitter is just flat out saying all
your base belong to us, and anybody that wants it gets access for $360,000 a
year.

This is in contrast to Google, Facebook, Yahoo, etc., which muddy the waters
whenever it comes to how they're actually using / sharing your data.

~~~
tptacek
I'm not sure how you'd manage "privacy outrage" over Twitter messages, since
they're almost always public.

~~~
aresant
The concept of "Privacy" means to me that the average user is aware of how
their profile, their content, etc. are going to be used by the host.

There's a difference between the perception of your feed being public, and
Twitter selling your feed data to a corporation to utilize for targeting,
advertising, reselling to employment agencies in the future long after your
stupid teenage profile is deleted, whatever.

When a solid 10% of the users are kids (and a much much larger percentage is
entirely clueless) it's worth questioning and the people that do know what's
actually going on have a responsibility to ask questions.

That's my best attempt to get fired up about privacy on twitter, you forced my
hand - oh won't you please think of the children?

Ref twitter age - [http://royal.pingdom.com/2010/02/16/study-ages-of-social-net...](http://royal.pingdom.com/2010/02/16/study-ages-of-social-network-users/)

~~~
andreyf
I'd rather not get into a discussion of what "privacy" means, but Twitter (and
Facebook) are almost certainly violating most of their users' perceptions of
what Twitter does with their tweets. That said, if they anonymize the tweets
and contractually obligate the buyers not to de-anonymize them, I think most
users would be OK with that. Each user owns their tweets, as does Twitter, but
only Twitter owns all of the tweets.

~~~
tptacek
It's a broadcast medium. People literally keep score with each other about how
many people they can get to follow them, and how many people they can get to
RT their messages. I think you're simply dead wrong about this.

~~~
pyre
1\. Reality and perceptions don't always coincide. Take the people who post
messages on Facebook about calling in sick to work when they aren't sick...
only to have their boss read the message and fire them over it.

2\. What about the people whose Twitter accounts are private?

3\. The people 'racing' with each other for followers or retweets are by
definition more public than most other people, unless you are going to claim
that all or most Twitter users fall into that category. Using them to
characterize the user base of Twitter as a whole seems a bit off.

4\. Twitter _is_ a broadcast medium, but what we are talking about are the
perceptions of the people using it, not the reality of the situation. There
are plenty of people that broadcast stuff publicly that they wouldn't want
their parents to read. Why would they do so? "My parents aren't on Twitter."
I'm sure the same thing applies to bosses and the workplace.

------
robryan
I think a distinction needs to be made. I agree that a company using this many
of the Tweets as part of a commercial effort should pay.

I also think that a research effort with the ability to process 100% of Tweets
can most likely afford to pay. For something like the 5% feed, though, I think
a great deal of research that could be done on the Twitter platform may be
prevented by the cost. For research, maybe you could charge just for the
bandwidth required to deliver the stream; I have no idea what kind of ballpark
that would be in.

~~~
neilc
_I also think that a research effort with the ability to process 100% of
Tweets can most likely afford to pay_

Processing 100% of all tweets is not actually very hard or expensive: 2000
messages/sec is small potatoes if you want to do it in realtime, and even
easier if you are just doing batch analysis queries. You could do it (with
reasonable performance) for much, much less than $360,000/year (let alone 2x
or 3x that for the 100% feed).
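To put "small potatoes" in numbers, here is the bandwidth and raw-storage footprint of 2000 messages per second. It is a sketch assuming 140 bytes per message (tweet text only, ignoring the metadata each tweet carries), so real payloads would be larger.

```python
MESSAGES_PER_SECOND = 2000   # figure from the comment above
BYTES_PER_MESSAGE = 140      # assumption: tweet text only, no metadata

bytes_per_second = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE
messages_per_day = MESSAGES_PER_SECOND * 86_400
gb_per_day = messages_per_day * BYTES_PER_MESSAGE / 10**9
tb_per_year = gb_per_day * 365 / 1000

print(f"{bytes_per_second / 1000:.0f} KB/s")   # 280 KB/s
print(f"{messages_per_day:,} messages/day")     # 172,800,000 messages/day
print(f"{gb_per_day:.1f} GB/day")               # 24.2 GB/day
print(f"{tb_per_year:.1f} TB/year")             # 8.8 TB/year
```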

~~~
robryan
I meant not that amount, but that they could afford to pay some amount for
access. Obviously, at the massive full price, little if any research that
can't be monetized will be conducted. It's a net loss for us all and no gain
for Twitter.

------
twymer
This is pretty interesting. A lot of projects and data analysis come from
analyzing tweets, but the price tag seems far beyond what any of them would be
willing to pay.

Previously you would spend some serious time scraping content from the feeds,
but the fact that it's only 2% of the content and would take months (at least)
to gather a significant amount makes that less than ideal. Still, I would
assume that in the vast majority of cases, months are worth less than $360k.

------
kin
isn't there already a free stream for new tweets that could be stored and
analyzed? is this just a sale of the 50% of tweets that have already passed?
why so pricey for historical market research data when current data is free?

~~~
Timothee
From my understanding, the current data is not free. Yes, it's fully
accessible, but to get access to the whole thing you would need to either
scrape the site, which Twitter will most likely disallow, or use the API,
access to which Twitter controls.

------
jrockway
jrockway to post 50% less content to Twitter for free.

------
xstaticdev
That's a lot of tweets about Justin Bieber!

------
zrgiu
i'm selling my gtalk, Y!m and Skype statuses. $10000/year. Anyone interested?

------
isaacsu
Anyone keen to discuss this issue further? Join the live chat at
<http://twich.me/twitter360k>

------
pointillistic
I can't believe anyone would pay coin for that crap. Might as well get one
year of the NYT crossword puzzles for the same price, it's a better value. Or
randomly generate "I am having a coffee with my cat" in all possible
combinations.

P.S. If this is just a "display", does it mean there is no value in the stale
links?

~~~
sp4rki
You missed the point of the whole thing. This is not about displaying tweets;
it's about the ability to analyze trends, news, popularity, etc. via tweets.
It's a huge deal to be able to get 50% (let alone 100%) of all the tweets at a
given moment.

~~~
steveklabnik
Not only is it not about displaying tweets, it's specifically disallowed:

> Customers will only be allowed to analyze the messages, not display them

