How Google crawls the deep web (glinden.blogspot.com)
24 points by prakash on Jan 31, 2009 | 1 comment


I was at NEDBDay (New England Database Day) on Friday, where Alon Halevy from Google was talking about this.

Basically, there's a lot of information on the web that sits in tables and can be mined. Because the data is already ordered or structured, you can use it to look for or mine information. The problem is figuring out what the data is, how it's organised, and what types of data a table contains. One example he gave of someone doing something like this is www.kosmix.com.
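
To make the "understanding the types of data in a table" part concrete, here is a rough sketch of guessing a type for each column of a table that has already been pulled out of a page. This is only my own illustration, not anything shown in the talk: the function names, the type labels and the date formats are all made up for the example, and a real system would need far more than this.

```python
from collections import Counter
from datetime import datetime

# Hypothetical date formats to try; real pages use many more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def guess_cell_type(value: str) -> str:
    """Guess a coarse type label for a single table cell."""
    text = value.strip()
    if not text:
        return "empty"
    number = text.replace(",", "")  # tolerate thousands separators
    try:
        int(number)
        return "integer"
    except ValueError:
        pass
    try:
        float(number)
        return "float"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            pass
    return "text"


def guess_column_types(rows):
    """Return the most common non-empty cell type for each column."""
    types = []
    for column in zip(*rows):
        counts = Counter(guess_cell_type(cell) for cell in column)
        counts.pop("empty", None)  # blank cells say nothing about the type
        types.append(counts.most_common(1)[0][0] if counts else "empty")
    return types


rows = [
    ["Boston", "617,594", "2009-01-31"],
    ["Cambridge", "105,162", "2009-02-01"],
    ["Somerville", "", "n/a"],
]
print(guess_column_types(rows))
```

Taking a majority vote per column means one odd cell such as "n/a" doesn't flip the guess for the whole column, which matters because tables scraped from the web are rarely clean.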

An interesting snippet he pointed out, kind of unrelated to this, is that "the format of a schema can be classed as intellectual property".

NEDB link: http://db.csail.mit.edu/nedbday09/



