This is the other end of extrapolation. There is a lot of stuff in the world, but it's finite stuff. And information, like lots of things, has its own inverse power law. When you are building a search engine, folks say "Gee, you don't have nearly the hardware that Google does, how can you ever hope to compete?" and the answer is the inverse power law. Compare the queries that are served out of the index with the ones served off the long tail, and the volume drops off dramatically.
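To make the inverse power law point concrete, here is a rough sketch that assumes query popularity follows a Zipf distribution with exponent 1. The exponent, the count of distinct queries, and the head size are all hypothetical numbers picked for illustration, not measurements from any real search engine.

```python
def harmonic(n: int, s: float = 1.0) -> float:
    """Generalized harmonic number: sum of 1/k**s for k = 1..n."""
    return sum(1.0 / k**s for k in range(1, n + 1))


def head_coverage(head: int, distinct: int, s: float = 1.0) -> float:
    """Fraction of query traffic answered by the `head` most popular
    queries, if popularity of the k-th query is proportional to 1/k**s."""
    return harmonic(head, s) / harmonic(distinct, s)


# Hypothetical numbers, chosen only to show the shape of the curve.
DISTINCT_QUERIES = 1_000_000   # assumed number of distinct queries
HEAD = 10_000                  # the most popular 1% of them

coverage = head_coverage(HEAD, DISTINCT_QUERIES)
print(f"Top {HEAD:,} of {DISTINCT_QUERIES:,} queries cover {coverage:.0%} of traffic")
```

Under those assumptions a small head of popular queries carries most of the traffic, which is why a modest index can go a long way. A steeper or shallower real-world distribution would shift the number.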
How many companies are there in the world? 100M? Give them each a megabyte for pictures and their information; that is 100 TB of data, 50 disk drives, 150 if they are triply replicated. And what are the net new businesses in a given year? At a 2% annual growth rate for the world's economies over the long scale, that's maybe 2M new businesses a year, or 2 TB of data, or 1 (or 3) new disks a year?
A billion people on Facebook? 1/7th of the world population? A megabyte for each of them is a PB of data. The Backblaze guys can put a petabyte of storage in a single cabinet.
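The back-of-envelope numbers above can be checked with a few lines. This is only a sketch: the 2 TB disk size is an assumption inferred from "100 TB of data, 50 disk drives", and decimal units (1 TB = 10^12 bytes) are assumed throughout.

```python
import math

MB = 10**6
TB = 10**12
PB = 10**15
DISK_BYTES = 2 * TB  # assumption: implied by 100 TB fitting on 50 drives


def disks_needed(total_bytes: int, replicas: int = 1) -> int:
    """Whole disks required to hold `total_bytes`, stored `replicas` times."""
    return math.ceil(total_bytes * replicas / DISK_BYTES)


RECORD_BYTES = 1 * MB            # one megabyte per company or person
COMPANIES = 100_000_000          # the 100M guess from the text
GROWTH_PERCENT = 2               # long-run annual growth guess

company_bytes = COMPANIES * RECORD_BYTES
new_companies = COMPANIES * GROWTH_PERCENT // 100
new_bytes = new_companies * RECORD_BYTES

PEOPLE = 1_000_000_000           # a billion Facebook users
people_bytes = PEOPLE * RECORD_BYTES

print(company_bytes // TB, "TB for all companies")
print(disks_needed(company_bytes), "disks,", disks_needed(company_bytes, 3), "replicated")
print(new_bytes // TB, "TB of new companies per year")
print(disks_needed(new_bytes), "new disk(s),", disks_needed(new_bytes, 3), "replicated")
print(people_bytes // PB, "PB for a billion people")
```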
For a long time the Internet was playing 'catch-up'; now it is asymptotically approaching 'caught up.'
Different problems and different opportunities for folks now who are basing their endeavors on the net.
Storing data and using data are two completely different things. You can store all of FB and lots of other companies in a fairly small volume these days. But as soon as you want to access that data to process or update it, the game changes rapidly. Suddenly that one cabinet explodes into a data center full of cabinets, or even several data centers.
Storage is a solved problem for just about any amount that an ordinary company might need. Getting that data delivered to a CPU at speeds that are still practical, if you want to say something about all of that data, is a completely unrelated problem. It changes the amount of technology and funds required from the easy level to the extremely hard level and beyond.
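A rough way to see the gap between storing and using, as a sketch only: the 150 MB/s sequential read rate per disk and the 500-disk cabinet are assumed figures for illustration, and the model ignores network, CPU, and seek costs entirely.

```python
import math

MB = 10**6
PB = 10**15
DISK_READ_BPS = 150 * MB  # assumption: sequential read rate of one disk


def scan_seconds(total_bytes: int, disks: int, bps: int = DISK_READ_BPS) -> float:
    """Time to read everything once with all disks streaming in parallel."""
    return total_bytes / (disks * bps)


def disks_for_deadline(total_bytes: int, deadline_s: int, bps: int = DISK_READ_BPS) -> int:
    """Disks needed purely for read bandwidth to finish a scan in time."""
    return math.ceil(total_bytes / (deadline_s * bps))


CABINET_DISKS = 500  # assumption: one petabyte cabinet of 2 TB disks

hours = scan_seconds(PB, CABINET_DISKS) / 3600
print(f"One pass over 1 PB from a single cabinet: about {hours:.1f} hours")
print(f"Disks needed to do it in a minute: {disks_for_deadline(PB, 60):,}")
```

Under these assumptions capacity is cheap, but a deadline on touching all of the data multiplies the hardware by orders of magnitude, which is the cabinet turning into a data center.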
"Storing data and using data are two completely different things."
That is so true. When people ask "How hard can it really be to write a search engine these days?" I have been known to ask them to speculate on how they might go about it, and then point out the challenges of knowing what data you have vs. what data is asked for. Search is particularly interesting because the more time you spend, the better your answer can be, and it's always challenging to 'draw the line' between fast and relevant. But that is also what makes it so fun :-)