

Using external APIs to improve search - jordanmessina
http://www.gabrielweinberg.com/blog/2011/01/using-external-apis-to-improve-search.html

======
coderdude
@Gabriel: How long do you think this will be a sustainable approach to
powering Web search? What I mean is: past a certain amount of traffic, will
you be able to keep using all these external APIs, or do you expect that in
the future you will have to augment some or all of this data with something
in-house, like moving to full-scale Web crawls?

~~~
epi0Bauqu
It is quite sustainable. These APIs are central to the business models of the
companies that provide them, and they should generally scale up as needed. For
example, SeatGeek already powers WSJ and Yahoo Sports.

~~~
coderdude
Good to know. What are your plans for crawling in the future? Do you think
that you will eventually move to crawling as a primary source of result data,
and if so, is that time drawing near or do you plan on continuing along the
current path for some time?

~~~
epi0Bauqu
We never stopped crawling; we just scaled it back for specific purposes,
mainly now for spam detection and zero-click info. For the foreseeable future
this status quo works well because we can concentrate on our value-adds.

------
danielh
Thanks for that post. I didn't know about Qwerly; that might come in handy.

I wonder how DuckDuckGo decides which API to query. Gabriel, could you shed
some light on this?

~~~
epi0Bauqu
Generally we have some kind of entity detection, be it a list or some regex or
something more fuzzy. For example, check out the WolframAlpha stuff:
<http://weinbergalpha.com/>
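
To make the "list, regex, or something more fuzzy" idea concrete, here is a
minimal sketch of routing a query to a source by entity detection. It is not
DuckDuckGo's actual code: the source names, the tiny team list, the regex, and
the 0.8 similarity cutoff are all assumptions picked for illustration.

```python
import re
from difflib import get_close_matches

# Hypothetical entity list; a real one would be far larger.
KNOWN_TEAMS = ["lakers", "red sox", "yankees"]

# Hypothetical pattern: queries made only of digits, spaces, and arithmetic symbols.
MATH_RE = re.compile(r"^[\d\s.+\-*/()]+$")


def detect_source(query):
    """Return the name of a (hypothetical) source to query, or None."""
    q = query.strip().lower()
    if not q:
        return None

    # 1. Exact list lookup.
    if q in KNOWN_TEAMS:
        return "tickets"

    # 2. Regex match, requiring at least one digit so "+" alone does not count.
    if MATH_RE.match(q) and any(ch.isdigit() for ch in q):
        return "calculator"

    # 3. Fuzzy match against the list to tolerate small typos.
    if get_close_matches(q, KNOWN_TEAMS, n=1, cutoff=0.8):
        return "tickets"

    return None
```

With these assumptions, `detect_source("yankes")` falls through the exact and
regex checks and is caught by the fuzzy step, while an unrelated query returns
`None` and no external API is called.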

~~~
lionheart
That's great. What's your algorithm for deciding which zero-click info to
display when there seem to be multiple options?

For example, I notice that <http://duckduckgo.com/?q=fathers+day+2011> shows
me an Amazon result rather than a WolframAlpha one.

~~~
epi0Bauqu
There is a precedence list that can be tweaked per query, and Amazon is at the
bottom, so that is a bug :). In some cases, multiple things are called and the
first to answer gets the spot.

Edit: bug fixed.
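
The two selection strategies described here (a precedence list that can be
tweaked per query, and "first to answer gets the spot") can be sketched
roughly as follows. This is an illustration under stated assumptions, not the
real implementation: the source names, the `overrides` parameter, the shape of
the fetcher callables, and the timeout are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout

# Hypothetical default order; Amazon sits at the bottom, as in the comment above.
DEFAULT_PRECEDENCE = ["wolframalpha", "seatgeek", "amazon"]


def pick_by_precedence(answers, precedence=DEFAULT_PRECEDENCE, overrides=None):
    """Pick the highest-precedence source that returned a non-empty answer.

    `answers` maps source name -> answer (or None). `overrides` is a
    hypothetical per-query list of sources to try before the default order.
    """
    overrides = list(overrides or [])
    order = overrides + [s for s in precedence if s not in overrides]
    for source in order:
        if answers.get(source):
            return source, answers[source]
    return None


def first_to_answer(fetchers, query, timeout=1.0):
    """Call every fetcher concurrently; return the first non-empty answer.

    `fetchers` maps source name -> callable taking the query string.
    """
    if not fetchers:
        return None
    pool = ThreadPoolExecutor(max_workers=len(fetchers))
    futures = {pool.submit(fn, query): name for name, fn in fetchers.items()}
    try:
        for future in as_completed(futures, timeout=timeout):
            try:
                result = future.result()
            except Exception:
                continue  # a failing source simply loses the race
            if result:
                return futures[future], result
    except FuturesTimeout:
        pass  # nobody answered in time
    finally:
        pool.shutdown(wait=False)  # do not block on the slower sources
    return None
```

Under this sketch, a query for which both WolframAlpha and Amazon return
something resolves to WolframAlpha unless a per-query override says otherwise,
which matches the behavior expected after the bug fix.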

------
reubenyeah
Gabriel, do you have any stats on the proportion of zero-click usage (so
probably page views where they don't click any results, plus clickthroughs
from zero-click boxes) vs. use of standard search?

~~~
epi0Bauqu
I do not, sorry.

