

Ask HN: How to graph total tweets about "bitcoin" in given timeframe?  - vjvj

I'm not a coder (undergrad economics) but can learn fast.

What I want to do is find a way of tracking tweets about bitcoin on an hourly basis.

The idea is to compare this with a graph of the price of bitcoin over time and see if there is any relationship between the two.

Why am I doing this? Mainly curiosity and improving my technical skills.

Any pointers appreciated. So far I have started learning about APIs and JSON and am trying to figure out how to use JSON feeds in a program that is good at manipulating data (i.e. not my browser).
======
AznHisoka
Use the Twitter Search API. Yes, I know people say it's not as complete as
the firehose, but it's actually quite close to complete if you query it
constantly without sleeping between requests. Just keep track of the max tweet
ID you have seen, and on every request ask only for tweets with an ID greater
than that.

This works especially well for queries where there aren't more than XXX
results per minute (yes, even for terms like Bitcoin, as you can see here:
[https://twitter.com/search?q=bitcoin&src=typd&f=realtime](https://twitter.com/search?q=bitcoin&src=typd&f=realtime)).
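
A rough sketch of that "remember the max ID" loop, in Python. The `fetch` argument is a hypothetical stand-in for whatever function actually calls the Search API (it is not a real library call); it is assumed to take a query and an ID and return tweets as dicts with an integer `id` field.

```python
from typing import Callable, Iterable, List, Tuple

Tweet = dict


def poll_once(
    fetch: Callable[[str, int], Iterable[Tweet]],
    query: str,
    since_id: int,
) -> Tuple[List[Tweet], int]:
    """Run one poll and return (new_tweets, updated_since_id).

    `fetch` is a hypothetical callable wrapping the real API request.
    """
    # Guard against the API returning tweets we have already seen.
    new_tweets = [t for t in fetch(query, since_id) if t["id"] > since_id]
    if new_tweets:
        # Remember the highest ID so the next poll only asks for newer tweets.
        since_id = max(t["id"] for t in new_tweets)
    return new_tweets, since_id
```

You would call `poll_once` in a loop, feeding the returned ID back in each time and saving the new tweets somewhere before the next request.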

Then map those tweets to the specific hour (in epoch time or something), or
even the specific minute or second if you want more granular detail. Index
them in a search index like Elasticsearch or Solr, and then do a faceted
search, which will return the number of results for each time period. You can
then graph this fairly easily.
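
If a search index feels like too much to start with, the hourly bucketing itself can be tried in plain Python. This is only a sketch: it swaps the faceted search for an in-memory `Counter`, and it assumes each tweet is a dict whose `created_at` string looks like `Wed Aug 27 13:08:45 +0000 2008` (the format used by the v1.1 API).

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout, e.g. "Wed Aug 27 13:08:45 +0000 2008".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def hour_bucket(created_at: str) -> int:
    """Return the epoch second at which the tweet's hour starts."""
    epoch = int(datetime.strptime(created_at, TWITTER_TIME_FORMAT).timestamp())
    return epoch - epoch % 3600


def tweets_per_hour(tweets) -> Counter:
    """Count tweets per hour, keyed by the hour's starting epoch second."""
    return Counter(hour_bucket(t["created_at"]) for t in tweets)
```

The resulting mapping of hour to count is what you would plot next to the hourly bitcoin price.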

The flaw I see in your experiment is that there's a TON of noise in Twitter.
Maybe filter out tweets by obvious spam bots (e.g. by looking at the
follower/following ratio, or ignoring users who only tweet hashtags or URLs).
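
A very crude version of those two checks might look like this. The `min_ratio` threshold is made up for illustration, and the field names (`user.followers_count`, `user.friends_count`, `text`) assume the v1.1 tweet JSON layout.

```python
import re

# Matches URLs and hashtags so they can be stripped from the tweet text.
URL_OR_HASHTAG = re.compile(r"https?://\S+|#\w+")


def looks_like_spam(tweet: dict, min_ratio: float = 0.1) -> bool:
    """Heuristic spam check; min_ratio is an arbitrary illustrative threshold."""
    user = tweet["user"]
    followers = user["followers_count"]
    following = user["friends_count"]
    # Accounts following many people but followed by almost nobody.
    if following > 0 and followers / following < min_ratio:
        return True
    # Tweets with nothing left once hashtags and URLs are removed.
    return URL_OR_HASHTAG.sub("", tweet["text"]).strip() == ""
```

Expect false positives either way; it is worth graphing both the filtered and unfiltered counts to see how much the filter changes the picture.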

------
dylz
You'll want to use the Twitter public timeline firehose -
[https://dev.twitter.com/docs/api/1.1/get/statuses/firehose](https://dev.twitter.com/docs/api/1.1/get/statuses/firehose)

Insert every tweet that matches "bitcoin" into your database along with its
timestamp. Then you can sum/avg/aggregate by time (within [epoch seconds] to
[epoch seconds + 3600]).
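
As a sketch of the database side, here is the same idea with SQLite from Python's standard library. The table and column names are made up for the example, and `created_at` is assumed to be stored as integer epoch seconds.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a minimal, hypothetical tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)"
    )
    return conn


def insert_tweet(conn: sqlite3.Connection, tweet_id: int, epoch_seconds: int) -> None:
    # OR IGNORE skips tweets that were already stored under the same ID.
    conn.execute(
        "INSERT OR IGNORE INTO tweets (id, created_at) VALUES (?, ?)",
        (tweet_id, epoch_seconds),
    )


def hourly_counts(conn: sqlite3.Connection):
    """Return [(hour_start_epoch, tweet_count), ...] ordered by hour."""
    # Integer division by 3600 puts every tweet in its [t, t + 3600) window.
    return conn.execute(
        "SELECT (created_at / 3600) * 3600 AS hour_start, COUNT(*) "
        "FROM tweets GROUP BY hour_start ORDER BY hour_start"
    ).fetchall()
```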

~~~
makerops
I think it's hard/expensive to access the firehose, no?

~~~
dylz
Wasn't much of an issue for me, but since he is somewhat new, I imagine reading
a stream and basically doing an infinite foreach would be a tad easier than
adding queuing, API rate limits (per hour? forget which), timed polling, etc.
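
Roughly what that "infinite foreach" looks like, as a sketch: it assumes the stream arrives as one JSON object per line with blank keep-alive lines in between, and `lines` stands for any iterable of such lines (for example a streaming HTTP response body). Opening the actual connection and authenticating are left out.

```python
import json
from typing import Iterable, Iterator


def matching_tweets(lines: Iterable[str], keyword: str = "bitcoin") -> Iterator[dict]:
    """Yield tweets from a line-delimited JSON stream whose text mentions keyword."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line, nothing to parse
        tweet = json.loads(line)
        # Non-tweet messages without a "text" field are skipped by the default.
        if keyword in tweet.get("text", "").lower():
            yield tweet
```

Each tweet it yields is what you would insert into the database.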

------
makerops
There is no mechanism for search "completeness" without access to the
firehose.

[http://blog.cloudera.com/blog/2012/09/analyzing-twitter-
data...](http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-
hadoop/)

^ that is a good resource if you ever get access to the firehose.

Shoot me an email at anthony@makerops.com, and when I get home I can send you
some example code that I've written in the past.

