
You seem to have me confused with someone else.



Care to explain?


I don't work for FB, nor am I in a position to demand big data flows from someone else. I'm just intensely curious about the dynamics of commented discussions and think it's an under-studied phenomenon. I'm not interested in making money out of it, I just want to understand how ideas propagate and what factors drive virtual communities.

I was confused by the original post because a) I'm not sure why using SQLite makes comments into 'not big data' (is it extraordinarily hard to get statistical information out of SQLite or something?) and b) even if a blog or website is completely private, might not the owner want to analyze what's going on within the private social network they've created?

Also, I question your blanket assertion about what is or isn't in the interest of other people. You seem to have taken my prior comment as an intent to exploit comment data for financial gain.


Aha.

Then you might get a license from me. And if you haven't tested with the HN dataset that is where I would start.

I was confused by the original post because a) I'm not sure why using SQLite makes comments into 'not big data' (is it extraordinarily hard to get statistical information out of SQLite or something?)

No, getting data out of SQLite is easy. But I have yet to hear of anyone using SQLite for anything that would be called big data in a way that makes sense. (Yes, my previous employer's file share where we could upload 5GB files doesn't qualify.)
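To illustrate how easy it is, here is a minimal sketch using Python's standard `sqlite3` module. The `comments` table, its columns, and the `comment_stats` helper are all made up for this example and are not the schema of any particular comment system.

```python
import sqlite3

def comment_stats(rows):
    """Return (author, comment_count, average_body_length) per author.

    Assumes a hypothetical schema: comments(author, thread_id, body).
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE comments (author TEXT, thread_id INTEGER, body TEXT)"
    )
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
    stats = conn.execute(
        "SELECT author, COUNT(*) AS n, AVG(LENGTH(body)) AS avg_len "
        "FROM comments GROUP BY author ORDER BY n DESC, author"
    ).fetchall()
    conn.close()
    return stats

# Invented sample data, only for demonstration.
sample = [
    ("alice", 1, "SQLite is fine"),
    ("bob", 1, "Agreed"),
    ("alice", 2, "Still fine"),
]

for author, n, avg_len in comment_stats(sample):
    print(author, n, avg_len)
```

A single `GROUP BY` query like this covers most of the basic statistics one would want from a small or medium comment database.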

I think the comment on the website is supposed to be funny in a bitter way.

and b) even if a blog or website is completely private, might not the owner want to analyze what's going on within the private social network they've created?

I have yet to think of a private website that might contain enough non-spam comments to qualify as big data.

And for small and medium-sized data, plain SQLite is wonderful : )

What I and others are skeptical of is REAL big data, where certain companies scoop up everything they can get their hands on. This is an often dirty, potentially really harmful raw material, as we have seen in articles about the moderation teams that make sure Facebook and others stay reasonably clean.

With a little bit of processing, this "nuclear waste" can also be weaponized. Two ideas off the top of my head:

- Facebook and others can easily name thousands of people who waste their employers' time during the day.

- Or they could post a list of people who run pseudonymous accounts for various reasons but have failed to maintain proper opsec.



