

A surprising opinion on the ‘Bing is copying Google’ controversy - bensummers
http://fury.com/2011/02/a-surprising-opinion-on-the-bing-is-copying-google-controversy/

======
grovulent
"Bing isn’t mining the first pages on Google search result pages as Google
claims; they’re mining the pages that users click on the most."

Hasn't this been at the core of the controversy all along?

Counterargument:

It's user action on a page created by google. Say I had nothing but this user
data - and say I had enough of a sample. I could build a pretty damn good
search engine out of that. But I know nothing about how to provide search
results. I haven't even got a crawler. Yet here I am with a search engine.
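
To make the thought experiment concrete, here is a minimal sketch of ranking purely from click logs, with no crawler and no ranking knowledge of its own. The log format, the function names `build_click_index` and `search`, and the sample data are all hypothetical; this is not a description of how Bing actually uses clickstream data.

```python
from collections import Counter, defaultdict


def build_click_index(click_log):
    """Map each normalized query to URLs ordered by click count (most clicked first).

    click_log is an iterable of (query, clicked_url) pairs, which is an
    assumed format for this illustration.
    """
    counts = defaultdict(Counter)
    for query, url in click_log:
        counts[query.strip().lower()][url] += 1
    return {q: [url for url, _ in c.most_common()] for q, c in counts.items()}


def search(index, query, limit=10):
    """Return the most-clicked URLs for a query, or an empty list if unseen."""
    return index.get(query.strip().lower(), [])[:limit]


if __name__ == "__main__":
    # Hypothetical sample data: three clicks observed on someone else's results page.
    log = [
        ("rare term", "http://example.com/b"),
        ("Rare Term", "http://example.com/a"),
        ("rare term", "http://example.com/b"),
    ]
    print(search(build_click_index(log), "rare term"))
```

Everything the index "knows" about relevance comes from what the observed results page chose to show and what users picked from it, which is the point of the argument.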

edit (to finish the follow through):

If I can with this method build something about which I have no actual
knowledge - then it proves the method is piggy backing on someone else's hard
work. And THAT is what matters. It doesn't matter if it is only one of many
signals - it doesn't matter if it's not copying Google's algo directly. What
matters is that it is still relying on someone else's knowledge to improve
Bing's product. That's shitty.

This argument clinches it for me.

The fact that Bing can't even acknowledge the opposing side - given that at
the very least intuitions are going both ways here - is a clear sign of bad
faith to me.

~~~
dilap
Oh, come on! The whole idea of search engines is to rely on someone else's
knowledge to make a product, isn't it? Isn't this how Google makes, like, all
its money?

What's skeevy in the MS case is that the relied-upon-knowledge and the product
are so similar...but still, Google's uber-umbrage strikes me as odd and tone-
deaf, especially given its dominant position in the market.

~~~
ghshephard
Re: using someone else's knowledge - Google honors robots.txt, so if you
don't want them to use your link knowledge, you can tell them not to.

I'm not sure there is any way for Google to tell MSFT not to use their search
results in the Bing search engine.

If Microsoft had just been a bit more upfront with the fact that they were
using IE users' Google click behavior to improve their search engine, I
suspect there would have been much less furor.

~~~
dilap
I don't know, doesn't Google harvest the content of anything I send to an
@gmail.com address? Or how about the massive scanning of books over the
objections of book publishers a while back (though I think this ultimately got
resolved)? My general impression of Google, and part of why I find their
crying foul so jarring, is that they will harvest as much information as they
possibly can, to whatever end, until restrained by public outcry. (Which, BTW,
I'm fine with!)

But it seems very hypocritical to get butt-hurt on behalf of Bing toolbar
users who are having their movements tracked.

(BTW, I don't find Google's observance of robots.txt to be particularly
compelling or telling, because they've never been in a position where ignoring
it would be significantly beneficial to them, as far as I know.)

~~~
ghshephard
Google was pretty clear that they would be harvesting the content of your
gmail to target ads. The difference here is that MSFT did this
"Google-search-click-tracking" thing on the down low.

I agree with you, though: if I, as an IE9 user, wish to submit my click track
results to MSFT for analysis so they can improve search results, that's fair
game.

But it's not clear to me that MSFT should be able to review what the user was
searching for before they clicked on a result. At that point they are actually
using Google's search data + the user's click traffic. I think they cross a line
there, particularly if they aren't willing to come clean and admit that's what
they are doing, and make it clear that they are sending your Google search
queries + your click traffic back to Redmond.

How many people on HN were aware that Microsoft was doing that with IE? Click
traffic, sure, but I didn't know they were sending my Google search queries
back to HQ.

------
scottmp10
The fact that Microsoft is able to take only the relevant results from Google
doesn't change the fact that the results are coming from Google. The problem
is that if search engines start using the technique that Microsoft is using,
then there is substantially less value to Google, Microsoft and other search
engines in innovating on rare search term relevancy, since the other search
engines get the benefits for free.

~~~
notahacker
Meta search depending entirely on other search engines' output has been around
in the form of Dogpile since the days Google was a student project.

During that time Google has managed to innovate fast enough and implement well
enough to grow into a multi-billion dollar corporation.

------
dhruvasagar
Monitoring users' clicks over the web is not the same as monitoring users'
clicks on a competitor's web site (Google search results).

The whole point of generating 100 unlikely search terms is the fact that these
wouldn't be items that exist in any web page on the web. Hence if one searches
for them it should not return any results!

How is monitoring people's clicks on Google's search results not copying?

On a different note, I am of the opinion that this practice is acceptable
only as long as one acknowledges it. The fact that M$ is not doing so is, to
me, appalling.

------
mooism2
> This morning I had a different thought though, one that’s completely
> reversed my thinking on the issue: People aren’t robots, and they don’t
> always click on the first result.

But people do have a bias to click on the first result _because_ it's the
first result. From <http://www.useit.com/alertbox/defaults.html> :

> 42% of users clicked the top search hit, and 8% of users clicked the second
> hit. ...

> [When the researchers] swapped the order of the top two search hits ...
> users still clicked on the top entry 34% of the time and on the second hit
> 12% of the time.

------
wippler
"If Bing is using the techniques described in this Microsoft technical paper
then they should stop. Full stop."
<http://aclweb.org/anthology/P/P10/P10-1028.pdf>

I didn't read the paper (just read the abstract, I know..), could anyone
explain what they are doing in it that is so bad? Thanks.

------
InclinedPlane
The whole point of crawling and indexing websites to begin with is to simulate
what users can and do click on. The fact that it is now technologically
possible to actually track users' navigation of the web doesn't change the
intent of the robots.txt convention in controlling and limiting the way that
search engines collect data from a site.

Bing/MS can choose to dishonor that convention, either directly or indirectly
as they are now with the Bing toolbar, but I think that's a mistake on their
part. Google (and Bing.com for that matter) has used the most widely accepted
mechanism, robots.txt, for ensuring that content it does not believe should be
indexed is not. It abides by that convention with other sites and expects
other search engines to abide by that convention with respect to google.com in
return.

From here the following things can happen:

1\. Microsoft agrees that robots.txt should govern clickstream data from the
Bing toolbar. The world returns to sanity once again.

2\. The search engine community (MS, Google, etc.) agrees on a new standard
similar to robots.txt specifically for governing the use of clickstream data,
sites are updated with new directives allowing/disallowing such use, and the
world returns to sanity.

3\. Microsoft specifically denies that they should be limited in using
clickstream data from _any_ source, through convention or through any means.
Search companies and other sites are forced to fall back on other means to
achieve the same results and things get messy (for example, google blocks all
users who have the Bing toolbar installed, they prevent the Bing toolbar from
being installed in Chrome, MS retaliates, etc.)

For the life of me I cannot think of a sane reason why MS would choose option
number 3 other than that they are insane, horribly myopic, or just plain dumb.
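
Option 1 above could be sketched roughly like this, using Python's standard `urllib.robotparser`. The robots.txt contents, the `hypothetical-toolbar` agent name, and the event format are assumptions for illustration, not anything MS or Google has published.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for the site being browsed; it disallows its results pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def make_parser(robots_txt):
    """Build a parser from robots.txt text instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_clickstream(events, parser, agent="hypothetical-toolbar"):
    """Keep only click events whose source page a crawler with this agent may fetch.

    Each event is a dict with a "page" key (the page the click happened on)
    and a "clicked" key (the link that was followed); both are assumed names.
    """
    return [e for e in events if parser.can_fetch(agent, e["page"])]


if __name__ == "__main__":
    events = [
        {"page": "https://www.example.com/search?q=foo", "clicked": "http://a.example/"},
        {"page": "https://www.example.com/about", "clicked": "http://b.example/"},
    ]
    for event in filter_clickstream(events, make_parser(ROBOTS_TXT)):
        print(event["page"])
```

Under this convention a click made on a disallowed results page would simply never enter the toolbar's data set, the same way a well-behaved crawler would never fetch that page.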

~~~
kd5bjo
I was always under the impression that robots.txt was intended not to control
search results, but to help prevent overzealous crawlers from taking down your
website.

If you want to control what a search engine does with a page, you should use
<meta name="ROBOTS"> and rel=nofollow, neither of which Google includes in its
results pages. It would be reasonable for Bing to look for and honor these
directives when collecting their clickstream data, and Google isn't availing
itself of the option.

~~~
beoba
It's for controlling search results; blocking overzealous crawlers is a
convenient side-effect.

robots.txt also blocks indexing of non-html files.

------
gojomo
Kevin Fox, the author, makes the interesting point that what Microsoft is
mining via clicks isn't directly Google's results, but the users' estimation
of those results, when they choose which ones to visit. This data is to some
extent a new creation, although definitely derived from the Google results.
Custom analysis of such user activity doesn't necessarily just port over
results, but could also result in an even better ranking than the original
Google presentation.

Fox also suggests there could be a robots.txt-like standard where sites
declare they want to opt their users' activity out of any such analysis. That
strikes me as a bad idea: users ought to own their own interaction trails.

