One time I met a company that insisted they were sending tens of TB of data per day and would need multiple PB of compressed storage per year. Took one look at the data: all JSON, all GUIDs and bools. If we just pre-parse it, the entire dataset for a year fits in a few hundred GB uncompressed -- it could literally fit on a MacBook Air for most of the year.

The funny thing about "big data," in my experience, is just how small it actually becomes when you start using the right tools. And yet so much energy goes into just getting the wrong tools to do more...
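For illustration, a minimal sketch of what that kind of pre-parsing can look like, assuming records shaped roughly like {"id": "<guid>", "flag": true} (the field names and the 16-byte-UUID-plus-1-byte-bool layout are my assumptions, not the actual schema):

    import json
    import struct
    import uuid

    def pack_record(line: str) -> bytes:
        # Turn one JSON line into 17 bytes: 16-byte UUID + 1-byte bool.
        rec = json.loads(line)
        guid = uuid.UUID(rec["id"]).bytes       # 16 bytes instead of a 36-char string
        flag = struct.pack("?", rec["flag"])    # 1 byte instead of "true"/"false"
        return guid + flag

    def unpack_record(blob: bytes) -> dict:
        # Inverse of pack_record, for reading the compact form back.
        return {
            "id": str(uuid.UUID(bytes=blob[:16])),
            "flag": struct.unpack("?", blob[16:17])[0],
        }

A raw JSON line like that is 60+ bytes of text; packed, it's 17 bytes per record, before compression even enters the picture.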

> The funny thing about "big data," in my experience, is just how small it actually becomes when you start using the right tools.

Rings way too true for me atm.

My current workplace is struggling because one of our applications stores something like a combined 300 GB of analytics data in the same database as the application data. Modifying the table causes hours of downtime, because everyone claims that backwards-compatible DB changes are too hard. And everyone is scared because with more users there's "so much more analytics data" incoming. Yes, with 300 GB across 3-4 years.

And I'm just wondering why it's not an option to move all of that into one decently sized MySQL/Postgres instance. Give it SSDs and 30-60 GB of RAM for the hot dataset (1-2 months) and it would just solve our problems. But apparently "that's too hard to do and takes too much time", with no further reasoning given.
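Back-of-envelope on those numbers, assuming the data accumulated roughly evenly over the period:

    total_gb = 300                        # ~300 GB of analytics data
    months = 42                           # ~3.5 years
    per_month_gb = total_gb / months      # ~7 GB of new data per month
    hot_set_gb = 2 * per_month_gb         # "hot" 1-2 months: ~14 GB
    print(per_month_gb, hot_set_gb)       # -> ~7.1, ~14.3

So a box with 30-60 GB of RAM keeps the entire hot set cached with room to spare, which is why the single-instance option looks so reasonable.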


All of the source text for Google+ Communities posts comes to a few hundred GB. That's for 8.1 million communities and ~10 million active users.

Add in images and the rest of the Web payload (800 KiB per page), and that swells into the petabyte range. But the actual scale of the user-entered text is stunningly small.
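Rough per-user math from those figures, taking "a few hundred GB" as ~300 GB (that exact number is my assumption):

    text_bytes = 300e9             # "a few hundred GB" of source text
    users = 10e6                   # ~10 million active users
    page_payload_kib = 800         # full Web payload per page

    per_user_kb = text_bytes / users / 1e3    # ~30 KB of text per active user
    ratio = per_user_kb / page_payload_kib    # ~0.04: one user's text is ~4% of one page load
    print(per_user_kb, ratio)

In other words, a single page load ships more bytes than the average user ever typed.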
