
Google: We knew the web was big... - gaika
http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html
======
sratner
"Except it'd be a map about 50,000 times as big as the U.S., with 50,000 times
as many roads and intersections..."

Great! So now all you have to do is give some sort of comparison that lets
people visualise exactly how big an area 50,000 times the size of the U.S. is.
May I suggest a metric based on whales?

~~~
jmatt
Average whale length = 12.8m (from the top Google search result for average whale length)

Area of the US = 9.826630x10^6 km^2 or 9.826630x10^12 m^2

One whale^2 = (12.8 m)^2 = 163.84 m^2, so the area of the US in whales = 9.826630x10^12 m^2 / 163.84 m^2 = 5.99769897x10^10 whales^2

So that would be 50,000 times 6 x 10^10 whales^2, or about 3 x 10^15 whales^2!
That's a lot of whale!
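
A quick sanity check of the arithmetic in a few lines of Python, taking one
whale^2 to be a square one whale-length on a side:

    whale_m = 12.8                    # average whale length, metres
    us_area_m2 = 9.826630e12          # area of the US, square metres
    whale2_m2 = whale_m ** 2          # one whale^2 = 163.84 m^2
    us_area_whales2 = us_area_m2 / whale2_m2
    print(us_area_whales2)            # ~5.9977e10 whales^2
    print(50000 * us_area_whales2)    # ~3.0e15 whales^2 for the whole map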

------
hhm
"Strictly speaking, the number of pages out there is infinite -- for example,
web calendars may have a "next day" link, and we could follow that link
forever, each time finding a "new" page."

Strictly speaking, even that would keep the number of pages finite, if very,
very large. Any amount of data you can produce with any number of finite
computers (with finite, bounded memory: there are practical limits on memory
size) will always be finite.

~~~
ntoshev
Given infinite time, the number of pages that could be generated _is_
infinite.

~~~
hhm
With a finite amount of memory, the amount of information you can generate in
infinite time is finite.

~~~
ntoshev
You don't have to store every page you generate.

~~~
hhm
No, but I'd suppose pages are bounded in size by the memory of the generator,
or of the viewer. If the maximum size of a page you can build is N, then there
is also a finite maximum number of distinct pages / URLs that fit in N. So
even with infinite time, your finite computers will give you a finite number
of pages.

Now, if you allow pages and URLs bigger than the memory of even the largest
computers in the world, so that they can never be held in memory at all, then
the point above isn't useful (and in that case you are right). But URLs bigger
than memory are probably useless, even if web pages bigger than memory aren't.

And if URLs have to be finite... then my point holds: even in infinite time,
you won't have an infinite number of URLs.
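
A back-of-the-envelope count in Python, assuming a page is just a byte string
of at most N bytes:

    # Distinct pages of at most N bytes: sum over lengths k of 256^k.
    # Astronomically large for realistic N, but finite for any finite N.
    def max_distinct_pages(n_bytes):
        return sum(256 ** k for k in range(n_bytes + 1))

    print(max_distinct_pages(3))  # 16843009 distinct "pages" of <= 3 bytes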

~~~
ntoshev
You don't need to keep in memory the page you generate either.

~~~
hhm
Well, if you never keep the page or the URL you generate in memory (either on
the publisher or the viewer side), can it be used in any meaningful way? But
if you don't keep URLs in memory, then yes, you could generate as many URLs as
you wish in infinite time...
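
That's essentially the calendar example from the post. A sketch in Python (the
URL pattern is made up for illustration): a generator that yields a "new" URL
forever in constant memory, because nothing is ever stored:

    import itertools

    def calendar_urls():
        # Counts up forever; no previous URL is ever remembered.
        for day in itertools.count():
            yield "http://example.com/calendar?day=%d" % day

    for url in itertools.islice(calendar_urls(), 3):
        print(url)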

------
aneesh
TechCrunch is being cryptic (http://www.techcrunch.com/2008/07/25/googles-misleading-blog-post-on-the-size-of-the-web/):

"Google also says 'But we're proud to have the most comprehensive index of any
search engine.'

That may be true today, but _it probably won't be true next week_ (check back
here then). _Google knows that as well as we do, and that's why they posted
this today._"

So, is it Yahoo or Live? If it's one of them, why would Google know it so
well? Any thoughts on what this could be? One of the TC commenters thinks it's
MSFT indexing Facebook ...

~~~
jrockway
_TechCrunch is being cryptic_

Yeah, this is surprising... especially because Arrington would never lie to
drive traffic to his site. Oh wait. _TechCrunch_.

------
cypress-hill
Well, allow me to modify my robots.txt so you can be at infinity-1. Since you
pricks decided to hijack your role as impartial algorithmic search to walled-
garden the web with Knol, I want nothing to do with you.
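
For reference, that change is all of two lines in robots.txt (this blocks
Google's crawler from the entire site):

    User-agent: Googlebot
    Disallow: /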

Every major website that handles referrals seems to bite the poison fruit of
capturing traffic... now Google has to. Oh well, maybe Clusty search will have
to do.

~~~
jrockway
I don't think you understand what Google is. Google is a company that owns a
popular search engine called "Google Search". They also have other properties,
like GMail and Knol. What exactly is the problem with that?

