

Evaluating Bing with Mechanical Turk - lukas
http://blog.doloreslabs.com/2009/06/bing-an-improvement-over-live-but-still-not-google-quality-evaluating-bing-with-mechanical-turk/

======
mikeliu
As the article concluded, the differences were small (to me they're pretty
much insignificant). It doesn't really matter to me whether the target is the
first link, as long as it's on the first page.

I think better comparisons would be: who has the better infrastructure, and
who can deliver results faster, serve more queries, use less energy, and crawl
faster? I think Google is hard to beat here, and I think that's where the
others have to think harder.

------
whughes
To me, the most interesting aspect of the post was the comparison between Bing
and the old Live. I'd have liked to see a Live-Google comparison and perhaps
Yahoo! as well, since it's often thrown around as a comparable alternative.

This seems to confirm my idea that Bing is less a revamp to the search engine
and more a rebranding for Microsoft. Bing is certainly more memorable than
Live or MSN were, and it's replacing Live and other MS brands in several non-
search areas (Virtual Earth -> Bing Maps for Enterprise, for example). That's
certainly nothing new for Microsoft, but this time they seem to be marketing
it as a search engine change to get people to try the engine and hopefully
switch from the big G.

~~~
jeremymcanally
I think they realized after Project Mojave (i.e., calling Vista something
other than Vista made people like it at least a little bit more for 30
seconds) that branding things is kinda important. Of course, if the tech
sucks, people will still hate it, but at least they can say they gave it a
really half-hearted effort if anyone ever says they didn't try.

~~~
mahmud
Microsoft had better find a brand other than "Microsoft". A few holdings,
subsidiaries and spin-offs couldn't hurt.

Search, Gaming and Advertising could be the first ones to benefit from
distancing themselves from Redmond.

------
shalmanese
I thought this was an interesting study but, after spending a few minutes
trying to find patterns in the data, I finally hacked up a quick null
hypothesis graph in Excel and it looked virtually indistinguishable:

[http://blog.figuringshitout.com/another-way-to-lie-with-stat...](http://blog.figuringshitout.com/another-way-to-lie-with-statistics)

~~~
etal
The shape of this kind of graph usually isn't very meaningful by itself for
showing subtle patterns -- you're right that the extremes of the graph always
look like that. In the original article, if you line up the Bing-vs-Google and
Bing-vs-Live graphs, the intercept near the middle of the graph is a little
further left. That's all I got from it. I'm assuming there's an ANOVA table
associated with this study that we're not seeing, and the probabilities there
must be a little more compelling.

A two-phase Turking might be interesting, too: first have the turkers come up
with a few parameters for each query (e.g. current events, general info,
celebrities, pr0n), then compare search engine results for those queries. That
would help pick out the more subtle patterns that you were looking for in the
original graph.

~~~
shalmanese
With only 6 to 8 participants, the study definitely didn't have the power to
find anything but the grossest differences between query types. It's not about
study design so much as the need to recruit more people.
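
To put a rough number on that, here is a back-of-the-envelope sketch for a
single query, not the between-query-type comparison itself. It assumes each
rater independently prefers one engine with probability `p_true` and that a
two-sided exact sign test at alpha = 0.05 is used; the function name, the
`p_true` values and the 50-rater comparison are hypothetical.

```python
from math import comb


def power_sign_test(n, p_true, alpha=0.05):
    """Chance that a two-sided exact sign test on n raters rejects
    "no preference" when each rater prefers engine A with probability p_true.
    """
    def null_p(k):
        # Two-sided p-value for k of n raters preferring A under a 50/50 null.
        extreme = max(k, n - k)
        tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
        return min(1.0, 2 * tail)

    # Vote counts that would be declared significant at level alpha.
    rejecting = [k for k in range(n + 1) if null_p(k) <= alpha]
    # Probability of landing on one of those counts under the assumed p_true.
    return sum(
        comb(n, k) * p_true ** k * (1 - p_true) ** (n - k) for k in rejecting
    )


if __name__ == "__main__":
    for p_true in (0.6, 0.75, 0.9):  # assumed true preference rates
        row = [round(power_sign_test(n, p_true), 3) for n in (6, 8, 50)]
        print(p_true, row)
```

With 6 raters the test only rejects on a unanimous vote, so a true 60/40
preference is detected about 5% of the time (0.6^6 + 0.4^6 ≈ 0.051).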

------
gojomo
I trust their statistical significance calculations... but at a glance, the
distribution barely looks different from what I'd expect if the turkers picked
at random (either on purpose or because search tastes are at some point
arbitrary).

It'd be interesting to include such a graph, where all ratings are drawn at
random (but with the same slightly-vs.-much proportions) for visual
comparison.
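
For what it's worth, a graph like that is cheap to generate. The sketch below
is only an illustration: the 4-point scale (-2 to +2, no neutral option), the
200 queries, the 7 workers per query and the 30% share of "much" ratings are
all made-up placeholders, not numbers from the article. It draws every rating
at random and sorts the per-query mean scores, which is roughly the curve
you'd compare against.

```python
import random

# Assumed 4-point scale (not taken from the article):
#   -2 = much prefers engine A, -1 = slightly prefers A,
#   +1 = slightly prefers B,    +2 = much prefers B.


def simulate_null_scores(n_queries, n_workers, p_much, seed=0):
    """Return sorted per-query mean scores when every rating is random.

    p_much is the assumed share of "much" ratings; the direction of each
    rating is a fair coin flip.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_queries):
        total = 0
        for _ in range(n_workers):
            magnitude = 2 if rng.random() < p_much else 1
            sign = 1 if rng.random() < 0.5 else -1
            total += sign * magnitude
        means.append(total / n_workers)
    return sorted(means)


if __name__ == "__main__":
    scores = simulate_null_scores(n_queries=200, n_workers=7, p_much=0.3)
    # Crude text plot: every 20th query of the sorted list, one bar each.
    for i in range(0, len(scores), len(scores) // 10):
        bar = "#" * round((scores[i] + 2) * 10)
        print(f"{i:4d} {scores[i]:+.2f} {bar}")
```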

Also: what would happen if all the individual queries on which the preference
isn't statistically significant were discarded, or repeated until the
preference becomes significant? (Or is "6-8 workers" enough for significance?)
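
As a rough check on that last question, here is a minimal sketch of an exact
two-sided sign test per query. It assumes each worker simply picks one engine
or the other (ignoring the slightly/much distinction and any ties), which may
not be the test the article's authors used; `sign_test_p` is a name made up
for this example.

```python
from math import comb


def sign_test_p(k, n):
    """Two-sided exact p-value for k of n workers preferring one engine,
    under the null that each worker picks either engine with probability 0.5.
    """
    extreme = max(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    for n in (6, 7, 8):
        # Walk down from a unanimous vote to a bare majority.
        for k in range(n, n // 2, -1):
            print(f"{k}/{n} agree: p = {sign_test_p(k, n):.4f}")
```

Under those assumptions only a unanimous verdict gets below p = 0.05 with 6, 7
or 8 workers (6 of 6 gives 2/64, about 0.031, while 7 of 8 gives 18/256, about
0.070), and that is before correcting for the number of queries tested.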

~~~
sker
You have a point there. Also, what would happen if they showed the turkers two
lists that both contained results from Google? They would probably get similar
results. Same for Bing, same for Yahoo.

From the test site that compared results from G, B and Y a few days ago, I
felt the results were so similar that I couldn't bring myself to click one as
the best engine because I didn't think I was making an objective choice.

