DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second

geuis · on Nov 30, 2011

I find the architecture really interesting and useful to learn from. However, they are way too expensive. 1000 tweets is not very much data. I'm building a realtime app now and am easily processing tens of thousands of relevant tweets a day. While a service like DataSift could take a lot of heavy lifting off my hands, the cost just doesn't make up for it. It feels like their business model is currently focused on use cases requiring highly specific targeting, not on services that need high volumes of certain types of data. Shame, that.

quipo · on Nov 30, 2011

@geuis: the cost for 1000 tweets is $0.10, and that's the Twitter license fee. You get $10 in free credits when you sign up, and On Demand plans start from $10. That should make it very affordable for anyone wanting to play with the Twitter firehose; the barrier to entry has never been so low.
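To put those figures side by side with the volumes mentioned upthread, here is a rough back-of-the-envelope sketch. It uses only the $0.10 per 1000 tweets license fee and the $10 credit quoted above; the 30,000 tweets/day figure is an assumed stand-in for "tens of thousands", and any DataSift processing charges on top of the license fee are not covered, since the thread does not state them.

```python
# Figures quoted in the comment above, kept in cents to avoid float drift.
LICENSE_FEE_CENTS_PER_1000 = 10   # $0.10 Twitter license fee per 1000 tweets
FREE_CREDIT_CENTS = 1000          # $10 sign-up credit

def license_cost_usd(tweets):
    """License fee in dollars for a given number of delivered tweets."""
    return tweets * LICENSE_FEE_CENTS_PER_1000 / 1000 / 100

def tweets_covered(credit_cents):
    """How many tweets a credit pays for, counting the license fee only."""
    return credit_cents * 1000 // LICENSE_FEE_CENTS_PER_1000

# Assumed volume, standing in for "tens of thousands of tweets a day".
ASSUMED_TWEETS_PER_DAY = 30_000

if __name__ == "__main__":
    print("Tweets covered by free credit:", tweets_covered(FREE_CREDIT_CENTS))
    daily = license_cost_usd(ASSUMED_TWEETS_PER_DAY)
    print("License fee per day at assumed volume: $%.2f" % daily)
    print("License fee per 30 days at assumed volume: $%.2f" % (daily * 30))
```

Under those assumptions the free credit covers 100,000 tweets, and the assumed volume works out to about $3 a day in license fees alone.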

nirvana · on Nov 30, 2011

Where are you getting your tweet feed? My initial interest in DataSift was because they let you get a feed from the firehose. Twitter doesn't seem to let you do this.

All I want is tweets relevant to a particular subject, and in the early days I don't want to be paying hundreds of dollars for it... I've got keywords and phrases I can use to find them, if I just had access to an API that would let me. (Maybe Twitter offers this; I couldn't find it in the past.)

geuis · on Nov 30, 2011

I'm using the stream with track filtering. So far it's working very well for my purposes. You are right that different use cases might need the firehose, and that's where services like Gnip and DataSift really come in handy. It's too bad that there's no middle ground.

Titanous · on Nov 30, 2011

If you are just looking for keyword streams, they're available for free: https://dev.twitter.com/docs/streaming-api/methods
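For anyone wondering what keyword ("track") filtering amounts to, here is a minimal local sketch of the matching logic. It assumes the commonly described semantics of the `track` parameter (commas act as OR between phrases, spaces act as AND between terms within a phrase, matching is case-insensitive). It does not call the Twitter API, the function names are made up for illustration, and the real service's tokenization rules are more involved than this regex.

```python
import re

def parse_track(track):
    """Split a track string into phrases (comma = OR),
    each phrase being a list of lowercase terms (space = AND)."""
    phrases = []
    for phrase in track.split(","):
        terms = phrase.lower().split()
        if terms:  # skip empty phrases such as a trailing comma
            phrases.append(terms)
    return phrases

def tokenize(text):
    """Very rough tokenizer: lowercase words, keeping # and @ prefixes."""
    return set(re.findall(r"[a-z0-9_#@]+", text.lower()))

def matches(text, phrases):
    """True if any phrase has all of its terms present in the text."""
    tokens = tokenize(text)
    return any(all(term in tokens for term in terms) for terms in phrases)

if __name__ == "__main__":
    phrases = parse_track("datasift, twitter firehose")
    for tweet in ("Trying out DataSift today",
                  "The Twitter firehose is huge",
                  "Twitter is down"):
        print(matches(tweet, phrases), "-", tweet)
```

The first two sample tweets match (one per phrase) and the third does not, because "twitter firehose" requires both terms to appear.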

alexro · on Nov 30, 2011

We have recently seen a number of startups (including YC-funded ones, AFAIK) looking for ways to make Twitter data useful. But I don't remember hearing of any such successes.

While DataSift's realtime capabilities look really impressive, I'm afraid there aren't that many use cases worth paying for data mined that way. Even DataSift's own list of possible use cases looks bleak.

In any case, DataSift should be fine applying their expertise to other sources of data, which don't bear the same cost as Twitter's firehose.

djb_hackernews · on Nov 30, 2011

Excellent article.

I worked on a project that I integrated with DataSift.

Lorenzo and everyone else I emailed with were very quick to respond, and the product performed as expected (besides some hiccups that I can easily put down to growing pains).

Compared to Gnip (which we also integrated with), DataSift won hands down on both quality of product and customer relations.

However, I'm still skeptical of the usefulness of Twitter analytics/data mining.

kalkat · on Nov 30, 2011

There is a lot of value in mining data from social media, but only if you can convert it into simple and easy use cases. We are doing something like that, and we are very happy about the way it is shaping up.

on Nov 29, 2011

[deleted]

aespinoza · on Nov 30, 2011

I find it interesting, especially the tools/languages they use and how their architecture is designed to process such a large number of tweets.

stavros · on Nov 30, 2011

Ugh, sorry, I finished reading right at "Information Sources", thinking that was the end of the article, and missed everything after that. I'll delete my previous comment, thanks.