Ask HN: How to graph total tweets about "bitcoin" in given timeframe?
3 points by vjvj on Dec 6, 2013 | 5 comments
I'm not a coder (undergrad economics) but can learn fast.

What I want to do is find a way of tracking tweets about bitcoin on an hourly basis.

The idea is to compare this with a graph of the price of bitcoin over time and see if there is any relationship between the two.

Why am I doing this? Mainly curiosity and improving my technical skills.

Any pointers appreciated. So far I have started learning about APIs and JSON and am trying to figure out how to use JSON feeds in a program that is good at manipulating data (i.e. not my browser).




Use the Twitter Search API. Yes, I know people say it's not as complete as the firehose, but it gets quite close to complete if you query it constantly without sleeping between requests. Just keep track of the max tweet ID you have seen, and on every request ask only for tweets with an ID greater than that.
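A minimal sketch of that max-ID bookkeeping, in Python. `fetch_page` is a hypothetical stand-in for whatever function actually calls the Search API with a "newer than this ID" parameter; it is not a real library call, and tweets are assumed to be dicts with an integer `id` field.

```python
from typing import Callable, Dict, List, Tuple

Tweet = Dict[str, object]


def poll_once(
    fetch_page: Callable[[int], List[Tweet]], since_id: int
) -> Tuple[List[Tweet], int]:
    """Fetch tweets newer than since_id; return them with the new max ID."""
    # Filter defensively in case the source returns tweets we already have.
    tweets = [t for t in fetch_page(since_id) if int(t["id"]) > since_id]
    # If nothing new came back, keep the old high-water mark.
    new_max = max((int(t["id"]) for t in tweets), default=since_id)
    return tweets, new_max


def fake_fetch(since_id: int) -> List[Tweet]:
    # Hypothetical stand-in: a real version would call the Search API here.
    data = [{"id": 101, "text": "bitcoin up"}, {"id": 105, "text": "bitcoin down"}]
    return [t for t in data if int(t["id"]) > since_id]


if __name__ == "__main__":
    tweets, max_id = poll_once(fake_fetch, since_id=100)
    print(len(tweets), max_id)
    # A second poll passes the updated max ID, so nothing is fetched twice.
    tweets, max_id = poll_once(fake_fetch, since_id=max_id)
    print(len(tweets), max_id)
```

In a real poller you would call `poll_once` in a loop and persist `max_id` somewhere, so a restart does not re-fetch old tweets.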

This works especially well for queries that don't return more than XXX results per minute (yes, even for terms like Bitcoin, as you can see here: https://twitter.com/search?q=bitcoin&src=typd&f=realtime)

Then map those tweets to the specific hour (in epoch time or something similar), or even the specific minute or second if you want more granular detail. Index them in a search index like ElasticSearch or Solr, and then do a faceted search, which will return the number of results for each time period. You can then graph this fairly easily.
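If you want to see the bucketing step without setting up a search index, here is a rough pure-Python stand-in for the faceted count. It assumes each tweet carries a `created_at` string in the layout the v1.1 API returns; the function names are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# Timestamp layout of the v1.1 API's created_at field,
# e.g. "Fri Dec 06 14:23:11 +0000 2013".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def hour_bucket(created_at: str) -> int:
    """Return the epoch second at which the tweet's hour starts (UTC)."""
    ts = int(datetime.strptime(created_at, CREATED_AT_FORMAT).timestamp())
    return ts - ts % 3600


def count_per_hour(created_ats):
    """Count tweets per hour bucket, keyed by the hour's starting epoch second."""
    return Counter(hour_bucket(c) for c in created_ats)


if __name__ == "__main__":
    sample = [
        "Fri Dec 06 14:23:11 +0000 2013",
        "Fri Dec 06 14:59:59 +0000 2013",
        "Fri Dec 06 15:00:00 +0000 2013",
    ]
    for start, n in sorted(count_per_hour(sample).items()):
        # Convert the bucket back to a readable UTC time for display.
        print(datetime.fromtimestamp(start, tz=timezone.utc).isoformat(), n)
```

Swapping 3600 for 60 gives per-minute buckets. The resulting (hour, count) pairs are what you would feed to a plotting tool.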

The flaw I see in your experiment is that there's a TON of noise on Twitter. Maybe filter out tweets from obvious spam bots (e.g. by looking at the follower/following ratio, or ignoring users who only tweet hashtags or URLs).
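A rough sketch of what such a filter could look like. The threshold of 20 is an arbitrary assumption, not a tested value, and the check looks at a single tweet's text rather than a user's whole history, which is a simplification. `followers_count` and `friends_count` are the user fields the v1.1 API returns.

```python
import re

# A token counts as noise if it is just a hashtag or just a URL.
NOISE_TOKEN = re.compile(r"^(#\w+|https?://\S+)$")


def only_hashtags_or_urls(text: str) -> bool:
    """True if every whitespace-separated token is a hashtag or a URL."""
    tokens = text.split()
    return bool(tokens) and all(NOISE_TOKEN.match(t) for t in tokens)


def looks_like_spam(tweet: dict, max_following_ratio: float = 20.0) -> bool:
    """Heuristic only: flags lopsided accounts and content-free tweets."""
    user = tweet["user"]
    followers = user.get("followers_count", 0)
    following = user.get("friends_count", 0)
    # Accounts that follow far more people than follow them back.
    ratio = following / max(followers, 1)
    return ratio > max_following_ratio or only_hashtags_or_urls(tweet["text"])


if __name__ == "__main__":
    tweet = {
        "text": "#bitcoin http://example.com/x",
        "user": {"followers_count": 500, "friends_count": 100},
    }
    print(looks_like_spam(tweet))
```

Expect false positives either way; it is worth graphing both the filtered and unfiltered counts to see how much the filter changes the picture.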


You'll want to use the Twitter public timeline firehose - https://dev.twitter.com/docs/api/1.1/get/statuses/firehose

Insert any tweet matching bitcoin into your database with the associated time. Then you can sum/avg/aggregate it by time (within [epoch seconds] - [epoch seconds + 3600])
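As an illustration of the insert-then-aggregate idea, here is a small sketch using SQLite from Python's standard library. The table and column names are made up, and an in-memory database stands in for whatever database you actually use.

```python
import sqlite3


def hourly_counts(rows):
    """rows: iterable of (tweet_id, created_epoch_seconds) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, created_epoch INTEGER NOT NULL)"
    )
    # OR IGNORE skips tweets whose ID has already been stored.
    conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?, ?)", rows)
    # Round each timestamp down to the start of its hour, then count per hour.
    cur = conn.execute(
        "SELECT created_epoch - created_epoch % 3600 AS hour_start, COUNT(*) "
        "FROM tweets GROUP BY hour_start ORDER BY hour_start"
    )
    result = cur.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, 7200), (2, 7205), (3, 10800), (3, 10800)]
    for hour_start, n in hourly_counts(sample):
        print(hour_start, n)
```

Each returned pair is the epoch second at which an hour starts and the number of tweets stored for that hour.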


I think it's hard/expensive to access the firehose, no?


It wasn't much of an issue for me, but since he is somewhat new, I imagine reading a stream and basically doing an infinite foreach would be a tad easier than adding queuing, API rate limits (per hour? I forget which), timed polling, etc.


There is no mechanism for search "completeness" without access to the firehose.

http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data...

^ that is a good resource if you ever get access to the firehose.

Shoot me an email at anthony@makerops.com, and when I get home I can send you some example code that I've written in the past.



