

We're all guinea pigs in Google's search experiment - markbao
http://news.cnet.com/8301-10784_3-9954972-7.html

======
neovive
One of the interesting take-aways from the story is that what your users
"say" they want and what they actually do are two different things. I wonder
if this further proves the diminishing value of old-fashioned surveys in the
context of web usability. Beyond the very interesting eye-tracking tools,
being able to run live A/B tests seems very effective -- especially if users
never even know they are being surveyed.

~~~
neilc
Yeah, I think this approach makes a lot of sense (and it is another instance
of "more data beats better algorithms", in a sense). It's also similar to some
of the themes Paul Buchheit talked about in his Startup School talk -- you want
listen to your users, but don't just blindly do what they ask for.

------
DougBTX
An interesting comment, particularly wrt the "natural language" search engines
trying to out-algorithm Google:

 _"The learning curve on search is really fast," she said. "People go from
'Where can I get spaghetti and meatballs in Silicon Valley' to 'italian food
san jose' really fast," she said._

------
redorb
Each query uses 700-1000 machines. Am I the only one who finds that (1) hard
to believe and (2) amazing?

~~~
diego
Why is it hard to believe? You are searching billions of web pages. A typical
index containing millions of web pages takes up gigabytes of space and fits in
RAM on a commodity server. In order to run one search really fast over
billions of pages, you need an instance of a distributed index that is three
orders of magnitude larger than what fits into one machine.
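A rough back-of-envelope sketch of that scaling argument (the figures below are illustrative assumptions, not Google's actual numbers):

```python
# Back-of-envelope: how many commodity servers does a sharded web
# index need? Both figures here are illustrative assumptions.
pages_total = 10_000_000_000     # assume ~10 billion pages in the full index
pages_per_machine = 10_000_000   # assume ~10 million pages fit in one server's RAM

shards = pages_total // pages_per_machine
print(shards)  # 1000 machines, i.e. three orders of magnitude
```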

------
neilc
_Google found that when the results increased to 30 per page ... it took about
twice as long to display the longer results list for the user._

I'm surprised that increasing from 10 to 30 results increased total latency by
~100%. I would have expected that other factors (one-time costs like session
establishment) would dominate the overall latency, and the marginal cost of
fetching 20 additional query results would be fairly small.

~~~
cpr
The clue is probably in the 700-1000 machine interaction. If they have to
display 200% more results (10 vs 30), they're probably interacting with a lot
more machines in their cloud.

~~~
neilc
Perhaps -- but even if that is true, interacting with more machines is
presumably a trivially parallelizable operation, so it shouldn't double the
overall response latency.
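A minimal sketch of that intuition, with hypothetical per-shard latencies: in a parallel fan-out the overall latency is the slowest shard's response, not the sum, so tripling the fan-out should add far less than 3x the time.

```python
import random

random.seed(42)

def shard_latency_ms():
    # Hypothetical per-shard response time: 20ms base plus up to 10ms jitter.
    return 20 + random.random() * 10

# Parallel fan-out: the query waits for the slowest shard, not the sum.
fanout_10 = max(shard_latency_ms() for _ in range(10))
fanout_30 = max(shard_latency_ms() for _ in range(30))
print(f"10 shards: {fanout_10:.1f} ms, 30 shards: {fanout_30:.1f} ms")
```

With these assumptions the 30-shard fan-out is bounded by the worst-case 30 ms, so it can never be anywhere near double the 10-shard case; only a fatter latency tail (or extra per-result work) would explain a 2x slowdown.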

