
I heard one of the "big data social scientists" claim that they can now replace the population with the sample (which is what Twitter really is) because of, you know, big data.



If you have a demographic model of Twitter users and re-weight appropriately, why is this a bad approach to take?
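To make "re-weight" concrete, here is a minimal sketch of post-stratification, assuming you know each person's demographic group and the true population share of each group. The function name, the group labels, and all the numbers are made up for illustration; nothing here comes from real Twitter data.

```python
def poststratified_mean(sample, population_shares):
    """Re-weight per-group sample means by known population shares.

    sample: list of (group, value) pairs (hypothetical survey-like data).
    population_shares: dict mapping group -> share of the target population.
    """
    sums, counts = {}, {}
    for group, value in sample:
        sums[group] = sums.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1

    # A group that never appears in the sample cannot be re-weighted at all.
    missing = [g for g in population_shares if g not in counts]
    if missing:
        raise ValueError(f"no sample data for groups: {missing}")

    return sum(share * sums[g] / counts[g]
               for g, share in population_shares.items())


# Hypothetical sample: 80 "young" users (60 say yes), 20 "old" users (5 say yes).
sample = ([("young", 1)] * 60 + [("young", 0)] * 20
          + [("old", 1)] * 5 + [("old", 0)] * 15)

# Assumed true population shares, which differ from the sample's 80/20 split.
population_shares = {"young": 0.4, "old": 0.6}

raw = sum(value for _, value in sample) / len(sample)
weighted = poststratified_mean(sample, population_shares)
print(f"raw={raw:.2f} weighted={weighted:.2f}")
```

The estimate moves a long way once the groups are weighted to their assumed population shares, and the function refuses to answer when a group has no sample data at all. The result is only as good as the group assignments and the shares you feed in.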


Because you don't (and probably can't) know that this model is accurate? Also, there are a lot of populations which are not present on Twitter at all, and more which are not present in meaningful numbers.


Such models presumably can be tested in some sense. E.g. build some sort of predictive model of elections in each state using Twitter users as a surrogate, see how well you do.

I'm not claiming that such a model is guaranteed to be perfect or even as good as a pollster, but it sounds in principle like a reasonable thing to do, if you take appropriate precautions to reduce bias.
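A rough sketch of what such a test could look like: score the Twitter-based predictions against the actual per-state results and compare the error with a baseline such as a pollster. The state names, vote shares, and the choice of mean absolute error are all assumptions for illustration, not real data or an established method.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual vote shares per state."""
    states = sorted(actual)
    return sum(abs(predicted[s] - actual[s]) for s in states) / len(states)


# Hypothetical vote shares for one candidate in three made-up states.
actual = {"A": 0.52, "B": 0.47, "C": 0.61}
twitter_model = {"A": 0.55, "B": 0.50, "C": 0.58}  # assumed model output
poll_baseline = {"A": 0.53, "B": 0.46, "C": 0.60}  # assumed pollster numbers

twitter_error = mean_absolute_error(twitter_model, actual)
poll_error = mean_absolute_error(poll_baseline, actual)
print(f"twitter={twitter_error:.3f} polls={poll_error:.3f}")
```

A check like this is only meaningful on elections that were not used to fit the model, and a low error on one election does not show the re-weighting will hold for the next one.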


One can definitely try, at least.


If I understood correctly, this was considered an unnecessary step.

Now, given Twitter's nature, is there such a model that is even approximately accurate?

Or in other words, what relevant data does Twitter collect, and how accurate is it?



