I'd like to understand the significance of this. I'm extremely interested in comments as big data (to try to understand social networking patterns better from studying the structure of nested comments) and I'd like to get a handle on the opposite point of view - what is it that you prefer about this approach?
While this might be interesting to you, it is in the best interest of site owners and commenters worldwide that not everything they write is instantly correlatable by anyone who happens to work for Facebook, or who is in a position to buy or otherwise demand this data from them.
I guess collection, processing and stockpiling of user data should be regulated for the benefit of consumers, moderators and world peace.
I was confused by the original post because a) I'm not sure why using SQLite makes comments into 'not big data' (is it extraordinarily hard to get statistical information out of SQLite or something?) and b) even if a blog or website is completely private, might not the owner want to analyze what's going on within the private social network they've created?
Also, I question your blanket assertion about what is or isn't in the interest of other people. You seem to have taken my prior comment as expressing an intent to exploit comment data for financial gain.
Then you might get a license from me. And if you haven't tested with the HN dataset, that is where I would start.
I was confused by the original post because a) I'm not sure why using SQLite makes comments into 'not big data' (is it extraordinarily hard to get statistical information out of SQLite or something?)
No, getting data out of SQLite is easy. But I have yet to hear of anyone using SQLite for anything that could sensibly be called big data. (Yes, my previous employer's file share, where we could upload 5GB files, doesn't qualify.)
I think the comment on the website is supposed to be funny in a bitter way.
and b) even if a blog or website is completely private, might not the owner want to analyze what's going on within the private social network they've created?
I have yet to manage to think of a private website that might contain enough non-spam comments to qualify as big data.
And for small and medium-sized data, plain SQLite is wonderful :)
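To illustrate the point that getting statistics out of SQLite is easy, here is a minimal sketch using Python's built-in sqlite3 module. The `comments` table, its columns (`id`, `parent_id`, `author`) and the `comment_depth_stats` helper are made up for this example and are not the schema of any particular commenting system. It counts how many comments sit at each nesting depth using a recursive query:

```python
import sqlite3

def comment_depth_stats(rows):
    """Return [(depth, count), ...] for a nested comment tree.

    rows: iterable of (id, parent_id, author) tuples; parent_id is None
    for top-level comments. The schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, author TEXT)"
    )
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)

    # Walk the tree from top-level comments downwards, tracking depth.
    query = """
        WITH RECURSIVE tree(id, depth) AS (
            SELECT id, 0 FROM comments WHERE parent_id IS NULL
            UNION ALL
            SELECT c.id, t.depth + 1
            FROM comments AS c
            JOIN tree AS t ON c.parent_id = t.id
        )
        SELECT depth, COUNT(*) FROM tree GROUP BY depth ORDER BY depth
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Made-up sample thread: two top-level comments, two replies, one nested reply.
sample = [(1, None, "a"), (2, 1, "b"), (3, 1, "c"), (4, 2, "a"), (5, None, "d")]
print(comment_depth_stats(sample))
```

For a single site's worth of comments, a query like this runs directly against the database file with no extra infrastructure.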
What I and others are skeptical of is REAL big data, where certain companies scoop up everything they can get their hands on. This is an often dirty, potentially really harmful raw material, as we have seen in articles about the moderation teams that make sure Facebook and others stay reasonably clean.
With a little bit of processing, this "nuclear waste" can also be weaponized. Two ideas off the top of my head:
- Facebook and others can easily name thousands of people who waste their employers' time during the day.
- Or they could post a list of people who run pseudonymous accounts for various reasons but have failed to maintain proper opsec.