Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

How the heck does anyone have that much data? I once built myself a compressed plaintext library from one of those data-hoarder sources that had almost every fiction book in existence, and that was like 4TB compressed (but would've been much less if I bothered hunting for duplicates and dropped non-English).

I suspect the only way you could have 20PB is if you have metrics you don't aggregate or keep ancient logs (why do you need to know your auth service had a transient timeout a year ago?)



Lots of things can get to that much data, especially in aggregate. Off the top of my head: video/image hosting, scientific applications (genomics, high energy physics, the latter of which can generate PBs of data in a single experiment), finance (granular historic market/order data), etc.


In addition to what others have mentioned, before the "AI bubble", there was a "data science bubble" where every little signal about your users/everything had to be saved so that it could be analyzed later.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: