Google's paper on Percolator from 2010 says there are more than 1 trillion web pages. Nine years later, there are surely far more than that.

https://ai.google/research/pubs/pub36726

The real issue would be crawling and indexing all those pages. How long would it take an average user's computer on a 10 Mbit/s connection to crawl the entire web? It's not as easy a problem as you make it seem.
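
A rough back-of-envelope sketch of that question: the page count is the Percolator figure mentioned above, but the average page size and the resulting estimate are assumptions of mine.

    # Rough estimate: crawl ~1T pages over a 10 Mbit/s link.
    # The 100 KB average page size is an assumption, not a measured figure.
    PAGES = 1e12            # ~1 trillion pages (the Percolator figure)
    AVG_PAGE_BYTES = 100e3  # assumed average page size, headers included
    LINK_BITS_PER_S = 10e6  # 10 Mbit/s

    seconds = PAGES * AVG_PAGE_BYTES * 8 / LINK_BITS_PER_S
    years = seconds / (3600 * 24 * 365)
    print(f"~{years:,.0f} years")  # on the order of 2,500 years

Even if the average page were ten times smaller, a single 10 Mbit/s link would still need centuries.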



I'm not saying it's easy, it's not, but people tend to think that because Google is so huge, you have to be that huge to do what Google does. My argument is that Google needs expensive hardware because they have so many users, not because what they do requires that hardware to deliver the service for one or a few users.

I have a gigabit link to my apartment (go Swedish infrastructure!). At that theoretical speed I get 450 GB an hour, so I could download ten terabytes in a day. We can easily slow that down by an order of magnitude and it's still a very viable thing to do. If someone wrote the software to do this, one could imagine some kind of federated solution for downloading the data, so that not every user has to hit every web server.
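
Those figures check out; a quick sketch of the arithmetic (the link speed is the commenter's, the rest is just unit conversion):

    # 1 Gbit/s expressed as bytes moved per hour and per day
    BYTES_PER_S = 1e9 / 8                    # 125 MB/s
    per_hour_gb = BYTES_PER_S * 3600 / 1e9   # ~450 GB/hour
    per_day_tb = BYTES_PER_S * 86400 / 1e12  # ~10.8 TB/day
    print(f"{per_hour_gb:.0f} GB/hour, {per_day_tb:.1f} TB/day")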


Could be done with a p2p "swarm". Peers get assigned pages to index and then share the results.
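
A minimal sketch of the assignment step, assuming peers agree on a shared peer list; the peer names, the hash choice, and the result-sharing protocol are hypothetical, just to illustrate deterministic work partitioning:

    import hashlib

    def assign_peer(url: str, peers: list[str]) -> str:
        """Pick one peer per URL with rendezvous (highest-random-weight) hashing,
        so adding or removing a peer only reshuffles a small fraction of URLs."""
        def score(peer: str) -> int:
            return int.from_bytes(hashlib.sha256(f"{peer}|{url}".encode()).digest(), "big")
        return max(peers, key=score)

    peers = ["peer-a", "peer-b", "peer-c"]  # hypothetical peer IDs
    print(assign_peer("https://example.com/page", peers))

Each peer would crawl and index only the URLs that hash to it, then publish the resulting index shard back to the swarm.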



