

To Yahoo: Open up your search index to gain market share - angie
http://ihobbes.wordpress.com/2008/06/23/yahoo-open-up-your-search-index-to-gain-market-share/

======
pina
I would use this for my people search application. It would definitely bring
costs down significantly. The economics actually work the same way: if you want
to do a really good job (be comprehensive) with your search application, you
need to crawl the entire web to find the documents (the part of the vertical)
your search application is interested in. You could use a whitelist of sites,
but then it would not be comprehensive.

So, building a good vertical search engine is really hard: you have to crawl
the entire web first to find documents that look like the ones you are
interested in. At 100 billion documents, 20k bytes each:

Bandwidth = 100 billion documents * 20k bytes each = 1.8PB. To download the
documents at wire speed (gigabit speed), it would take over 230 days. And of
course, processing, analysis, etc. would take more compute time.
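The bandwidth arithmetic above can be checked with a few lines of Python. This is a rough sketch under stated assumptions: "20k bytes" is read as 20 KiB (20 * 1024 bytes), and the link is assumed to sustain about 100 MB/s, roughly what a gigabit link delivers in practice. That assumed rate is what makes the "over 230 days" figure come out.

```python
# Back-of-envelope crawl estimate using the figures from the comment.
# Assumptions (not from the original): 20 KiB per document, ~100 MB/s
# sustained throughput (approximately usable gigabit Ethernet).

SECONDS_PER_DAY = 86400


def total_bytes(docs: float, doc_bytes: float) -> float:
    """Total bytes to download for `docs` documents of `doc_bytes` each."""
    return docs * doc_bytes


def download_days(nbytes: float, bytes_per_sec: float) -> float:
    """Days needed to move `nbytes` at a constant `bytes_per_sec`."""
    return nbytes / bytes_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    size = total_bytes(100e9, 20 * 1024)  # 100 billion docs, ~20 KiB each
    print(f"total: {size:.3e} bytes = {size / 2**50:.2f} PiB")
    print(f"download time: {download_days(size, 100e6):.0f} days")
```

Running it prints about 2.048e+15 bytes (1.82 PiB) and 237 days, in line with the 1.8PB and "over 230 days" quoted above.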

It would cost over $1M a month to process all web data on AWS. This is
assuming we go at wire speed and ignore any kind of politeness.
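"Politeness" usually means not hammering any single site, for example by enforcing a minimum delay between requests to the same host. The minimal sketch below illustrates why that only adds time to the estimate. The function name `polite_schedule` and the fixed per-host delay are illustrative assumptions, not anything the commenter specified: with a per-host delay, the crawl can never finish before the busiest host's queue drains, no matter how fast the link is.

```python
from urllib.parse import urlsplit


def polite_schedule(urls: list[str], delay_sec: float) -> list[tuple[float, str]]:
    """Hypothetical helper: give each URL the earliest start time (in seconds)
    such that requests to the same host are at least `delay_sec` apart.
    Different hosts are assumed to be fetched in parallel."""
    next_free: dict[str, float] = {}  # host -> earliest allowed start time
    schedule = []
    for url in urls:
        host = urlsplit(url).hostname
        start = next_free.get(host, 0.0)
        schedule.append((start, url))
        next_free[host] = start + delay_sec
    return sorted(schedule)


if __name__ == "__main__":
    urls = ["http://a.com/1", "http://a.com/2", "http://b.com/1", "http://a.com/3"]
    for start, url in polite_schedule(urls, delay_sec=1.0):
        print(f"{start:4.1f}s  {url}")
```

Here a.com's three pages start at 0, 1 and 2 seconds while b.com's page starts at 0, so the schedule is bounded by the host with the most pages rather than by bandwidth.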

If Yahoo were to open its crawl, we would be able to just write our specific
applications on top of Yahoo's crawl without needing to download the entire
web onto our servers.

~~~
Anon84
You might want to check out "Nutch"/"Wikia Search"
(<http://re.search.wikia.com/about/get_involved.html>). They let you
download their copy of the web (<http://search.isc.org/download/>). It's not
exactly Yahoo quality, but it might be enough to get started on some
smaller-scale projects.

------
sadiq
This actually isn't a half-bad idea.

Letting interested developers play with a subset of their index and then
giving access to the full index for promising applications could potentially
help them innovate against Google.

------
kola
I couldn't agree with you more, sadiq. Yahoo has an opportunity to disrupt the
vertical search space by opening up. New, interesting apps could be better
integrated with Yahoo's search platform. I think it would definitely make a
dent in Google's search market share. Facebook, AWS, and Android are testament
to this. Opening up would definitely not hurt Yahoo, and would very likely help
it earn developers' trust and become part of their ecosystem.

