
Exploring a ‘Deep Web’ That Google Can't Grasp - nickb
http://www.nytimes.com/2009/02/23/technology/internet/23search.html?partner=rss&emc=rss
======
smanek
_That haystack is infinitely large._

Gah. I really hate it when people misuse "infinite".

~~~
kristiandupont
While it is not really infinite, I think the term is justified in this
situation, as it is semantically very close to infinite.

~~~
smanek
It really isn't ... the semantics of infinity are completely different from
the semantics of 'really, really big.'

An infinite data store would, by definition, contain my DNA, correct (and
incorrect) predictions about the universe (down to the molecule) for all time,
the true value of Pi, and every piece of knowledge that has ever existed or
ever will exist. An actual infinite is just a ludicrous concept.

------
rozim
Greg Linden's (MSFT) comments on a recent Google paper on this:

[http://glinden.blogspot.com/2009/01/how-google-crawls-deep-w...](http://glinden.blogspot.com/2009/01/how-google-crawls-deep-web.html)

------
jaspertheghost
There are many startups attempting to do this, including pipl.com and
<http://cazoodle.com/>, among others. Here's some research about it:
<http://www-sal.cs.uiuc.edu/~kcchang/>

------
sam_in_nyc
I believe I've seen this type of crawling in action in request logs. For
example, Yahoo might try to request
"news.ycombinator.com/user?id=britney_spears", even though it's not linked to
from anywhere.
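
To make the kind of request described above concrete, here is a minimal sketch of how a crawler might build probe URLs by filling one query parameter with guessed values. This is only an illustration under stated assumptions: the helper name `candidate_urls` and the list of guesses are hypothetical, and nothing here is known to reflect how Yahoo or any other crawler actually picks its values.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def candidate_urls(base_url, param, guesses):
    """Build one probe URL per guessed value for a single query parameter.

    Hypothetical helper: a real crawler would also need a source of
    guesses, rate limiting, and robots.txt handling.
    """
    scheme, netloc, path, _query, _fragment = urlsplit(base_url)
    return [
        urlunsplit((scheme, netloc, path, urlencode({param: guess}), ""))
        for guess in guesses
    ]


# Illustrative guesses only; the values are assumptions, not observed data.
urls = candidate_urls(
    "http://news.ycombinator.com/user",
    "id",
    ["britney_spears", "nickb"],
)
for url in urls:
    print(url)
```

The guessed values never need to appear in any link on the site, which would explain why such requests show up in logs for pages nothing links to.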

