Hacker News

Stack Overflow is about 10GB: https://archive.org/details/stackexchange

English Wikipedia is about 15GB: https://en.wikipedia.org/wiki/Wikipedia:Database_download#En...

Reddit is about 1TB: https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_eve...

Common Crawl (May/June 2020):

"It contains 2.75 billion web pages or 255 TiB of uncompressed content". About 50TB compressed.

https://commoncrawl.org/2020/06/may-june-2020-crawl-archive-...

I think 100TB would be plenty. We're not quite there yet but I wouldn't be surprised if you could get "The Internet" on one consumer drive in a few years. This would be a cool product.
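
For anyone who wants to check the arithmetic, here is a rough sketch that adds up the figures quoted above. It assumes decimal units (1 TB = 10^12 bytes) for everything except the Common Crawl uncompressed figure, which is quoted in TiB, and it takes the "about 50TB compressed" number at face value. The variable names are just illustrative.

```python
# Back-of-envelope total of the dataset sizes quoted in the comment.
# All figures are approximate and copied from the comment, not measured.
GB = 10**9    # decimal gigabyte
TB = 10**12   # decimal terabyte
TIB = 2**40   # binary tebibyte, used for the Common Crawl uncompressed size

sizes_bytes = {
    "Stack Overflow": 10 * GB,
    "English Wikipedia": 15 * GB,
    "Reddit comments": 1 * TB,
    "Common Crawl (compressed)": 50 * TB,
}

# Sum of everything listed, expressed in TB.
total_tb = sum(sizes_bytes.values()) / TB

# How much of a hypothetical 100 TB budget would be left over.
headroom_tb = 100 - total_tb

# Common Crawl: 255 TiB uncompressed versus roughly 50 TB compressed.
common_crawl_uncompressed_tb = 255 * TIB / TB
compression_ratio = common_crawl_uncompressed_tb / 50

print(f"total: {total_tb:.3f} TB")
print(f"headroom in 100 TB: {headroom_tb:.3f} TB")
print(f"Common Crawl uncompressed: {common_crawl_uncompressed_tb:.1f} TB")
print(f"implied compression ratio: {compression_ratio:.1f}x")
```

The total comes out at roughly 51 TB, almost all of it Common Crawl, so a 100TB budget leaves about half the space free under these assumptions.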




Something tells me that 50TB doesn't include video media. There are dozens of Chinese pirate streaming sites with more movies and TV than the whole of Netflix and all its competitors. Then there is YouTube.


Or pictures.

If you look at that Reddit link, it's about "every single comment", and that's about 1 TB. I'd bet that the pictures from one fairly active, if niche, sub would exceed that, e.g. r/battlecars or r/breaddit.



