
How Google crawls the deep web - prakash
http://glinden.blogspot.com/2009/01/how-google-crawls-deep-web.html
======
mickt
I was at NEDBDay (New England Database Day) on Friday where Alon Halevy from
Google was talking about this.

Basically there's a lot of information on the web that is in tables and can
be mined. As the data is already ordered or structured, you can use it to
look for or mine information. The problem is figuring out what this data is,
how it's organised, and what types of data are in a table.
One example he gave of someone doing something like this is www.kosmix.com.
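
To make the "understanding the types of data in a table" part concrete, here's a rough sketch of the simplest version of the idea. This is my own toy illustration, not what Google or Kosmix actually do: it assumes the first row of the table is the header, and `TableRows`, `guess_type` and `describe_table` are names I made up. It only uses Python's standard library.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in a page as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def guess_type(values):
    """Guess a column type by trying the strictest parse first."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def describe_table(html):
    """Map each header name to a guessed type (assumes row 1 is the header)."""
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows
    return {name: guess_type(col) for name, col in zip(header, zip(*body))}
```

On a made-up table with columns `item`, `count` and `price`, `describe_table` would label them text, int and float. The hard part he was talking about is everything this skips: tables with no header row, tables used purely for layout, and working out what a column means rather than just whether it parses as a number.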

An interesting snippet he pointed out, that's kinda unrelated to this, is that
"the format of a schema can be classed as intellectual property".

NEDB link: <http://db.csail.mit.edu/nedbday09/>

