Then you might get a license from me. And if you haven't tested with the HN dataset, that is where I would start.

I was confused by the original post because a) I'm not sure why using SQLite makes comments into 'not big data' (is it extraordinarily hard to get statistical information out of SQLite or something?)

No, getting data out of SQLite is easy. But I have yet to hear of anyone using SQLite for anything that would be called big data in a way that makes sense. (Yes, my previous employer's file share, where we could upload 5GB files, doesn't qualify.)

I think the comment on the website is supposed to be funny in a bitter way.

and b) even if a blog or website is completely private, might not the owner want to analyze what's going on within the private social network they've created?

I have yet to manage to think of a private website that might contain enough non-spam comments to qualify as big data.

And for small and medium-sized data, plain SQLite is wonderful : )
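
To illustrate how little effort basic statistics take in SQLite, here is a minimal sketch using Python's built-in sqlite3 module. The `comments` table, its columns, the sample rows and the `comment_stats` helper are all made up for the example; a real blog or forum schema will look different.

```python
import sqlite3

# Hypothetical schema and sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO comments (author, body) VALUES (?, ?)",
    [
        ("alice", "First!"),
        ("bob", "Nice post, thanks."),
        ("alice", "I disagree, and here is why."),
    ],
)


def comment_stats(conn):
    """Return (author, comment_count, avg_body_length) rows, most active first."""
    return conn.execute(
        "SELECT author, COUNT(*), AVG(LENGTH(body)) "
        "FROM comments "
        "GROUP BY author "
        "ORDER BY COUNT(*) DESC, author"
    ).fetchall()


stats = comment_stats(conn)
for author, count, avg_len in stats:
    print(author, count, avg_len)
```

One GROUP BY query gives per-author counts and average comment length, which is about as much analysis as most small sites need.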

What I and others are skeptical of is REAL big data, where certain companies scoop up everything they can get their hands on. This is often a dirty, potentially really harmful raw material, as we have seen in articles about the moderation teams that make sure Facebook and others stay reasonably clean.

With a little bit of processing, this "nuclear waste" can also be weaponized. Two ideas off the top of my head:

- Facebook and others can easily name thousands of people who waste their employers' time during the day.

- Or they could publish a list of people who run pseudonymous accounts for various reasons but have failed to maintain proper opsec.
