Hacker News | majia's comments

What we used for counting is slightly different from what you see in the Twitter widgets (yeah, those tweets are from Twitter directly). In our backend, we have a pretty conservative filter that matches a bag of phrases, such as "voted for barrack obama", "voted for pres obama", etc. The accuracy is over 95%. Of course, political tweets are full of sarcasm and humor, and Twitter is full of demographic bias. This is just a fun project for us.
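For readers wondering what a conservative phrase filter like this might look like, here is a minimal sketch. It is not the actual backend code. Only the first two phrases come from the comment above; the other phrases, the `normalize` helper, and the function names are assumptions made for illustration.

```python
import re

# Only the first two phrases are quoted in the comment above; the rest are
# hypothetical additions for illustration.
OBAMA_PHRASES = [
    "voted for barrack obama",
    "voted for pres obama",
    "voted for barack obama",      # hypothetical
    "voted for president obama",   # hypothetical
]


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def matches_obama_vote(tweet: str) -> bool:
    """Return True only if the tweet contains one of the listed phrases.

    Padding with spaces makes the match respect word boundaries, so
    "devoted for pres obama" does not count.
    """
    cleaned = f" {normalize(tweet)} "
    return any(f" {phrase} " in cleaned for phrase in OBAMA_PHRASES)
```

A filter like this trades recall for precision: most tweets containing "voted" are simply ignored, and sarcasm that happens to use an exact phrase still slips through.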

-----


How many of the votes are from the 25,000+ people that retweeted Michelle Obama saying 'voted for President Obama'?

https://twitter.com/MichelleObama/status/265906946530496513

You'd probably clean up a whole lot by ignoring tweets containing "RT". That seems to be much of the stream.

-----


We don't remove RT tweets; instead, we count each user only once. If a user retweeted Michelle, s/he will probably vote for Obama. But if a user has several tweets in favor of Obama, that user is still counted only once.
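A rough sketch of the "count each user once" idea, assuming each matched tweet has already been reduced to a `(user_id, candidate)` pair. The function name and the candidate labels are hypothetical, and how a user who tweets for more than one candidate should be handled is not stated in the thread; this sketch simply counts such a user once per candidate.

```python
from collections import defaultdict


def count_unique_voters(tweets):
    """Count distinct users per candidate.

    `tweets` is an iterable of (user_id, candidate) pairs that already
    passed the phrase filter. Retweets and repeated tweets from the same
    user collapse into a single entry because user ids are kept in a set.
    """
    voters = defaultdict(set)
    for user_id, candidate in tweets:
        voters[candidate].add(user_id)
    return {candidate: len(users) for candidate, users in voters.items()}
```

With this approach, 25,000 retweets of a single tweet add at most 25,000 distinct users, and any of those users who also tweeted on their own are not counted a second time.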

-----


The accuracy is over 95%

Citation needed...

How can you draw this conclusion at this point in the process? I'm genuinely curious about your filtering scheme and how it is able to extract information out of such a noisy data stream.

-----


This is not scientific research, so I didn't compute std, t-stats, etc. But I did pull a few hundred tweets from our database and counted how many wrong ones we had. That's where the number comes from. The filtering scheme is very simple: classify only if we're confident. There are many tweets containing "voted", but we kept only the ones we had strong confidence in and threw away the rest. For the complete set of keywords used for filtering, please feel free to email.
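For anyone who wants to put an error bar on a spot check like this, here is a minimal sketch. It was not part of the original analysis. It computes the accuracy of a hand-checked sample together with a Wilson score interval; the function name is hypothetical, and the sample size and error count have to come from your own manual check.

```python
import math


def sample_accuracy(n_checked: int, n_wrong: int, z: float = 1.96):
    """Return (accuracy, low, high) for a manually checked sample.

    The interval is a Wilson score interval; z = 1.96 corresponds to
    roughly 95% confidence.
    """
    if n_checked <= 0 or not 0 <= n_wrong <= n_checked:
        raise ValueError("need n_checked > 0 and 0 <= n_wrong <= n_checked")
    p = (n_checked - n_wrong) / n_checked
    denom = 1 + z**2 / n_checked
    center = (p + z**2 / (2 * n_checked)) / denom
    half = z * math.sqrt(
        p * (1 - p) / n_checked + z**2 / (4 * n_checked**2)
    ) / denom
    return p, center - half, center + half
```

Note that this measures precision only (how many of the counted tweets were right), not how many genuine "I voted" tweets the filter threw away.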

-----


Good observation! It is actually a seven-day smoothing to offset some weekly periodicity, but the backward influence is very limited. We tested this at an hourly granularity without smoothing (the curves become really fuzzy), and it showed very similar results.
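A minimal sketch of seven-day smoothing, for illustration only. The comment does not say whether the window was trailing or centered; this version assumes a trailing window, so each point averages the current day and up to six earlier days. The function name is hypothetical.

```python
def trailing_average(values, window=7):
    """Smooth a daily series with a trailing mean.

    `values` must be a list (or other indexable sequence). The first
    `window - 1` points average over however many days are available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

Because a seven-day window always covers exactly one of each weekday, weekly periodicity averages out, at the cost of each point carrying some influence from the preceding six days.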

-----


There was huge hype around Twitter data one or two years back. It was then followed by suspicion and criticism. Now we should take a more balanced view. Twitter data, or social media data in general, isn't all-powerful, but I believe that if we look at it from the right perspectives, we can still learn something meaningful, despite its demographic bias. But certainly we have to be very cautious about any conclusion we draw. This is still a work in progress, and we'll try to extract unbiased information from the biased source.

-----


Social media data and elections are both very subjective topics. We tried to generate some insights, but we thought it would be better to let readers interpret the results themselves.

-----

