

Matt Cutts's thoughts on Google Bing Debate - ohashi
http://www.mattcutts.com/blog/google-bing/

======
kenjackson
_Let’s take that thought to its conclusion. If clicks on Google really account
for only 1/1000th (or some other trivial fraction) of Microsoft’s relevancy,
why not just stop using those clicks and reduce the negative coverage and
perception of this? And if Microsoft is unwilling to stop incorporating
Google’s clicks in Bing’s rankings, doesn’t that argue that Google’s clicks
account for much more than 1/1000th of Bing’s rankings?_

My take on this is quite the opposite. If MS thinks they're doing nothing
wrong, they shouldn't stop. They should do what is best for their customers.

Reversing what they do now because Google "caught" them would, IMO, imply they
were doing something wrong and got caught.

I personally think what Bing is doing is great. I regularly use Bing and
Google. I wish there were a streamlined way for me to broadcast to both of
them... "for search term X this is the relevant link!"

MS has found a way to do this, and as long as it's opt-in, I like it. I wish
Google had the same (although I do realize that since Google owns so much of
the traffic it is less of an issue for them -- it's a net loss in terms of the
flow of information).

But maybe a question for Matt... Can you and MS work together for a toolbar
that does just this for both engines?

As a user this isn't a matter of Bing copying Google or not. It's about the
fact that search relevance still kind of sucks. I feel like I can make the
search experience better, especially on those occasions when I go to your
competitor's site.

~~~
pessimist
I wonder if you would be ok if Bing did the same thing to amazon. That is,
imagine they used toolbar/IE logs to infer that people went to amazon,
searched for LCD TVs and then purchased model X. Then they could boost pages
about X in search, or implement a "bestselling" feature. After all, they are
"just" using clickstream data here. Similarly, they could track Netflix, and
so on.

IMHO, there's a loophole in Bing's argument that the user click data is free
for them to use. What about the site - do they have no ability to
consent/dissent to being tracked?

Maybe robots.txt needs to be updated for the toolbar.

~~~
kenjackson
Absolutely on all of those sites. And Wikipedia. Why would I not want them to?
The only reason I could think of that I would not want them to is if I thought
it would create worse search links as a result.

Although being able to personalize the search queries would be incredible. So
when I search for "Movie XYZ" -- it can also look at my clickstream and see
that I spend a lot of time in Netflix and Netflix has that movie, and they can
point me to that movie directly in my search results ("You can stream Movie
XYZ now!") -- kind of like a built in Clicker -- I'd love it.

But the problem is search relevance sucks. I'm in my freakin web browser all
day, yet it contributes nothing to my search relevance. WTF!?

But again, key to me is opt-in. If anything, the one takeaway I would take if
I were MS is to make it more apparent. But besides that. Everything you said,
and more, please. And from Google too -- Bing shouldn't be the only one to
benefit.

~~~
pessimist
I'm not talking about using just visits to improve search though, I'm talking
about using the pattern of interaction on a site to basically replicate a
site's data. That is, imagine netflix recommends movie X to you, now Bing
infers that Netflix has done so using custom code to parse their IE logs, and
then recommend movie X to you.

IMHO, this would be completely unethical.

That's exactly what Bing has admitted to doing with google:
[http://online.wsj.com/article/SB1000142405274870412450457611...](http://online.wsj.com/article/SB10001424052748704124504576118510340787364.html)

"Stefan Weitz, director of the Bing search engine at Microsoft, said in an
interview the company studies how certain users interact with Google in order
to improve Bing. ..."

~~~
kenjackson
I'm not sure I get your example (not clear when Bing would recommend a movie
to me). So let me give one that I do understand.

I go to Netflix and search for the movie "Network". I end up clicking on "The
Social Network". Later on Bing, if I search for "movie Network" -- I'd hope
that one of the movies that comes back is "The Social Network", based on that
clickthrough data from Netflix.

In my mind the only thing that is borderline unethical is that we've
artificially limited this clickstream data to one company. I'd like to give it
out more broadly. It's my data right?

~~~
pessimist
I was considering the case when you go to Netflix, and they recommend the
movie "Social Network" to you, and you click on it. Surely the URL will have
sufficient info to claim that it's a recommendation.

Now bing implements a movie recommendation service, which also magically
recommends for you "Social Network", based on clicktracking of Netflix.

IMHO, what Bing is doing with Google is exactly the same scenario as above
(extract click data from a competitor's service, and surface the exact same
data in their own product).

~~~
kenjackson
OK, you're talking about a new service. Gotcha.

Why wouldn't I want Bing to recommend the movie? The first thing Bing should
do is use Netflix's APIs to suck down all of my ratings of movies. Then it
should use my clickstream data for Netflix, IMDB, Hulu, Amazon, Facebook
etc... And if I'm using Media Center or SageTV, I want those things to also
get consolidated together.

Are you saying you want your Netflix data in one silo, your Hulu data
elsewhere, your TV data elsewhere? Don't get me wrong, I want this to all be
opt-in, but when I opt-in on MY behavior, I want them to use it.

------
webwright
"Copying" was a pretty brutal word to use-- not surprising that it raised
MSFT's hackles a bit.

MS clearly uses toolbar users' clickstreams (on and off Google) to improve
their own search efforts. Google created an artificial scenario where the ONLY
input was Google search behavior and lo, the search results are exactly the
same. Whether or not that steps over a line (I don't feel that it does), it's
not "copying" in my book.

Another interesting point is that Google has been beating the "open" drum for
a long time now. No walled gardens, right? If a Facebook user should have the
right to take his data with him wherever he wants to go, shouldn't a Google
user be able to fork over their behavior data to Bing?

Matt's point about MS' lack of clarity when getting folks' permission to grab
their clickstream was dead-on. THAT is pretty outrageous and MS should be
ashamed of that.

Regardless of all that, hats off to Matt for keeping a cool head and stating
his position in a respectful way.

~~~
boredguy8
I don't understand how this isn't a settled issue. If clickstream data is 1 of
1000 signals, and you create clickstream data for a specialized query that
will never trigger off another signal, then your created data will be
reflected. That sounds exactly like what happened.
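
A minimal sketch of that reasoning, assuming a toy ranker that just sums weighted signals. The signal names, weights, and helper functions are hypothetical and say nothing about how Bing actually combines its inputs:

```python
# Hypothetical linear ranker: score = sum of weight * signal value.
# Signal names and weights are illustrative assumptions only.
WEIGHTS = {"text_match": 0.5, "link_analysis": 0.4, "clickstream": 0.001}


def score(signals):
    """Weighted sum over the signals observed for one (query, page) pair."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def click_share(signals):
    """Fraction of the total score contributed by the clickstream signal."""
    total = score(signals)
    if total == 0:
        return 0.0
    return WEIGHTS["clickstream"] * signals["clickstream"] / total


# Ordinary query: clickstream is drowned out by the other signals.
ordinary = {"text_match": 0.9, "link_analysis": 0.8, "clickstream": 1.0}

# Synthetic nonsense query: no page contains the term and no link mentions it,
# so the planted click is the only non-zero signal for the honeypot page.
honeypot_page = {"text_match": 0.0, "link_analysis": 0.0, "clickstream": 1.0}
other_page = {"text_match": 0.0, "link_analysis": 0.0, "clickstream": 0.0}

print(click_share(ordinary))       # tiny fraction of the score
print(click_share(honeypot_page))  # the whole score
```

Even with a tiny weight, the click decides the ranking for the nonsense query because nothing else fires.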

You'd have to make the argument that using this data is wrong, somehow. But to
make that argument, you'd basically have to argue that users shouldn't be able
to share their habits with whomever they want to. I doubt that argument can be
made in a compelling way.

I'm surprised Google is pushing this further, and a little disappointed.

~~~
thezilch
Except in the "original" [torsoraphy] search. Forget Bing having to compete
with Google's "spell correction team;" Bing need only use Google [copy] as a
high frequency signal on tailing queries.

That sounds like exactly what happened, and it's wrong.

~~~
DenisM
Look at it this way:

Google is setting trends on tail terms due to its massive market share. Once
the trend is set, Bing captures the trend by monitoring user's behavior and
provides that to the other users. By reflecting user's behavior in its own
index, Bing seems to replicate the Google's index structure which created that
behavior. Had there been no Google, there would have been different trends and
Bing would capture those instead.

It's not Bing's fault that Google is so influential, and certainly not a
reason for them to stop deducing relevance from user behavior. Just because
Google creates trends, does not mean they own trends. The users own their own
behavior and are free to share it.

~~~
hackinthebochs
This comment really gets to the meat of the issue here. Search is about
discovering and then predicting user trends. Google in fact is creating user
trends through its position in the market. Should Bing be locked out of the
market because Google is currently in the dominant position?

If Bing is allowed to discover user trends then it will necessarily end up
being a replica of Google's results in some cases. There's just no way around
it.

------
angusgr
Matt's assertion about "Suggested Sites" sending this data seems to be
conjecture. I ran some packet captures and didn't find anything of that kind.

However, if you install the Bing Toolbar then it does send URL clickstream
data. It explicitly asks you beforehand if you want to send info about "the
searches you do, websites you visit..." though.

Full post: [http://projectgus.com/2011/02/bing-google-finding-some-
facts...](http://projectgus.com/2011/02/bing-google-finding-some-facts/)

~~~
mayank
Excellent analysis, thank you, especially in the distinction between
"suggested sites" and the Bing toolbar's behavior. I can't argue with your
methods. However, I do differ with some of your conclusions:

> The behaviour I’ve seen explains Google’s experiments, but does not support
> the accusation that Bing set out to copy Google.

I don't think it's so much about "set out to copy Google" necessarily, as it
is that they are explicitly parsing Google queries and results from the
clickthrough data and using it (quite directly) for their own results. What
they set out to do is immaterial given what is provably happening.

> Bing Toolbar is tracking user clicks and Bing could use the result to
> improve search results. I don’t personally see any great distinction between
> this behaviour and Google’s many tracking, indexing and scraping endeavours
> which they use to improve their own search results.

The difference is that Google has _proven_ that the results of certain queries
are being directly fed as Bing results. If Microsoft does the same with Google
rankings, I'd see your point, but right now the evidence only points in one
direction.

> While I personally dislike the privacy implications, Bing Toolbar is pretty
> upfront about it when it gets installed (unlike much web page user
> tracking.) The fact that the tracking is plain HTTP not HTTPS, with the
> content in plaintext, would seem to indicate that they weren’t seeking to
> hide anything.

I'd be interested to see if Google over HTTPS queries are being transmitted by
the toolbar over HTTP. That _would_ be a pretty serious privacy violation IMO,
especially when you pair that with unencrypted wifi at Starbucks. See the AOL
search log fiasco: [http://www.somethingawful.com/d/weekend-web/aol-search-
log.p...](http://www.somethingawful.com/d/weekend-web/aol-search-log.php)

~~~
angusgr
_The difference is that Google has proven that the results of certain queries
are being directly fed as Bing results. If Microsoft does the same with Google
rankings, I'd see your point, but right now the evidence only points in one
direction._

However, these are also the only experiments that have been run - outlier
data, gaming the algorithms. If you only test one possible outlier scenario
and don't control against any other, it's fairly ambitious to stand up and say
"this is exactly what is happening!"

IMHO it would have been better for Microsoft to respond along these lines,
instead of going into counterspin mode, though.

~~~
magicalist
what control are you suggesting? I read a suggestion elsewhere of using only
Firefox for some queries and seeing if they show up, which is silly, of course.

without a possible mechanism of action there's no point of using something as
a control; I might as well wait to see if the files on my disabled usb drive
show up on bing. they were "gaming the algorithms" because they suspected that
the algorithms existed, and it wouldn't have worked if there was nothing to
game.

~~~
angusgr
Set up some other dynamic sites (not indexed or linked anywhere) with random
nonsense words in their URL strings, and a single link to some unrelated site.
Click the link. Repeat, wait two weeks.

Same as what google did, just not on google.com.
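
A rough sketch of how such control URLs could be generated, assuming Python and a placeholder host (`control.example` and both helper names are hypothetical):

```python
import secrets
import string


def nonsense_word(length=12):
    """Random lowercase string that is very unlikely to appear anywhere on the web."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def control_urls(base, count):
    """Build `count` control-page URLs, each carrying a unique nonsense token."""
    tokens = set()
    while len(tokens) < count:
        tokens.add(nonsense_word())
    return [f"{base}?q={token}" for token in sorted(tokens)]


for url in control_urls("http://control.example/page", 3):
    print(url)
```

Each generated page would carry the single outbound link; the tokens are what you would search for on Bing later.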

Of course, if Bing has other "relevance" indicators (ala BingPageRank) then
this might not work because it will rule the nonsense control site irrelevant
linkspam, and not put it in (even Google's experiment only got a 9% success
rate.) It would be better if someone like Facebook injected the test nonsense
strings in their URLs.

Something of this sort would at least show they tried to differentiate between
"scrapes all clickstream data" and "uses Google's search results".

------
user24
Oh man. That research paper he quoted strongly indicates that MS is
specifically targeting Google.

I was totally wrong. Bing _are_ copying Google. I'm sorry (I am the "What on
earth are Google thinking" author).

I still think the honeypot experiments didn't support the conclusion. But this
paper coupled with Bing's lacklustre pseudo-denial strongly indicates that my
view of events was not accurate and that Bing were indeed blatantly
piggybacking off Google's hard work.

I'm really disappointed. I gave Bing the benefit of the doubt, saw that there
was another conclusion which explained Google's observations and I advocated
that. But now it really looks like MS overstepped the mark and were
deliberately picking out Google URLs in order to use signals from Google's
algorithms to improve Bing.

That really is cheating.

edit: although still it's not conclusive. I don't know what to think now. I
think I should stick to writing code :)

~~~
kenjackson
In the spirit of completeness, the paper says they use Google, Bing, and
Yahoo.

But the paper doesn't say this is what Bing uses. Looks like it, but we should
be clear. This is MSR, not Bing.

Good paper though. But it is another example of MS actually not thinking it's
bad. They published a paper on this where they spell out how MSR would do
this. Did people in the academic community protest that this would be copying
if implemented?

~~~
user24
Yeah, I mean you could make the argument that it's just one piece of MS
research among thousands of others, etc etc, but the fact that they've sunk
resource into literally spying on Google (et al) is just too much. It could
still turn out that Bing don't use that technique... but it's looking a lot
less likely. Especially given this whole thing started based on Google
noticing their spelling corrections popping up on Bing, and stealing Google's
spelling corrections is _exactly what this paper was about_.

------
tristanperry
I haven't got much to add since I've expressed my views previously, plus this
blog post does a good job of calmly pointing out the issues. It's worth
emphasising the penultimate paragraph from the blog post since I feel it's key
here though:

 _"Since people at Microsoft might not like this post, I want to reiterate
that I know the people (especially the engineers) at Bing work incredibly hard
to compete with Google, and I have huge respect for that. It's because of how
hard those engineers work that I think Microsoft should stop using clicks on
Google in Bing's rankings. If Bing does better on a search query than Google
does, that’s fantastic. But an asterisk that says "we don't know how much of
this win came from Google" does a disservice to everyone. I think Bing's
engineers deserve to know that when they beat Google on a query, it's due
entirely to their hard work. Unless Microsoft changes its practices, there
will always be a question mark"_

I agree with this entirely. Given Bing's resources, it seems bizarre that they
were relying on Google's results even a little bit. And until they sort this
issue out, I definitely agree that it's difficult to know when to give credit
to Bing.

(Plus it's made all the more difficult considering that Bing keep sort of
denying this whole mess, despite the quite conclusive proof. I suspect that if
they do stop using Google's results, we won't know about it considering the
hole that Bing have dug themselves with their denials. Ah well, que sera
sera).

------
whiletruefork
Matt Cutts should know better. Being part of '1000 signals' does not mean all
signals are weighted evenly. It does not even mean the signals are weighted
the same across all query types. This is machine learning - the actual
weighting is learned and dynamic (always shifting) and not controlled. And
there is absolutely no reason for Microsoft to take out a particular signal
just because Google asked. There needs to be proof of unethical behavior, of
which there is none.

The Chrome and Gmail EULAs are as bad as the IE8 Suggested Sites bit
mentioned in this article. GMail basically reads your email to provide
contextually relevant ads. Does my mom know that? No.

Not a plug, but I blogged about this @ [http://roshank.posterous.com/google-
versus-bing-no-one-is-be...](http://roshank.posterous.com/google-versus-bing-
no-one-is-behaving-unethic) . I believe this should be a discussion on ethics
- and feel it is ethical for a company to do whatever it wants with data
contained in its own software application.

~~~
andrewljohnson
I don't think you read the whole article, because Matt quotes Nate Silver in
saying that exact thing

You said: " -Being part of '1000 signals' does not mean all signals are
weighted evenly."

Matt quotes Nate: "First, not all of the inputs are necessarily equal. It
could be, for instance, that the Google results are weighted so heavily that
they are as important as the other 999 inputs combined."
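
To put numbers on Nate's hypothetical (the weights below are invented for illustration and are not a claim about Bing's real weighting):

```python
def signal_share(weights, index):
    """Fraction of the total weight carried by the signal at `index`."""
    return weights[index] / sum(weights)


# 1000 equally weighted signals: each one carries 1/1000 of the total.
equal = [1.0] * 1000

# One signal weighted as heavily as the other 999 combined.
skewed = [999.0] + [1.0] * 999

print(signal_share(equal, 0))
print(signal_share(skewed, 0))
```

Both cases are "1 of 1000 signals", but the first signal's share of the total goes from a thousandth to a half.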

~~~
ac2u
"It could be" is not evidence.

------
zaidf
Google should also quit trying to build the invite-your-facebook-friends
feature in order to get virality for their services. Because, you know, it
does a big disservice to their engineers and marketing people that they must
rely on facebook.

And if they are not relying on facebook, why not just get rid of it all
together?

"Because it helps the users". Oh I see. But so does Bing's actions.

"Because its individual users giving permission" Same with Bing.

------
davidu
I think Matt lays out a strong argument for why they have made such a stink.
And it sounds like a pretty compelling defense for their actions to go public.

But...

1) I think they will regret it bigtime when all the attention they are causing
makes the bored government officials poke their head in and realize that at a
macro level, neither _Bing nor Google_ does anything to protect user search
privacy.

2) I think Google has more to lose by bringing this to light than they have to
win. Despite Matt's defense, it's hard to see it as anything other than being
petty and pedantic. But this is their response to anything that threatens
their search pageviews, and that's understandable even if erroneous.

They should focus on trying to be innovative again, that was the Google I
respected.

~~~
moultano
> _They should focus on trying to be innovative again, that was the Google I
> respected._

I work in search quality at Google. I'm busting my ass every day working on
fundamental reimaginings of how results get ranked. I'm going to keep doing
that regardless of what bing does, because it makes the world a better place,
and it's fun.

But, suppose the stuff I'm working on works out, and tomorrow Google shows up
with a wholly new set of awesome results. This is very possible, there is a
ton of headroom left in search quality, I've seen the experiments myself.

Then after a few weeks of sniffing clicks, Bing comes up with the same set of
revolutionary results, but they have no idea how they got there, they have no
idea what the evidence is to rank them there, all they know is that people
like those results on Google. Is this fair? Is this ethical? Is this even
legal?

People in this whole debate have the idea that the user is creating this
association between the result and the query, like the user searched through
the whole web and came back saying "hey microsoft, check out this great result
for this query! I found it! Isn't it awesome?" They aren't. Users click on
whatever results you put in front of them, generally starting with the top and
working their way down.

Ranking results is not a science with some objective optimal conclusion.
Ranking results is fundamentally subjective, and while data-driven, is
ultimately an opinion. The user does have _some_ discriminating power in this
whole feedback loop, but it's minuscule compared to Google figuring out how to
show them that result in the first place. Bing is taking the closest proxy
that they can practically acquire for Google's opinion, and using it directly
in their ranking.

~~~
kenjackson
_Then after a few weeks of sniffing clicks, Bing comes up with the same set of
revolutionary results, but they have no idea how they got there, they have no
idea what the evidence is to rank them there, all they know is that people
like those results on Google._

But this whole "no idea how they got there" is kind of bogus. The whole notion
of link analysis is to infer from a link that a page is important. Now they're
inferring from a click that a page is important. The relative importance of that
page is a function of the search query, like the relative importance of a link
is a function of the page from which it originated.

Link analysis and click throughs are both second order effects, right? If
CNN.com starts linking to a page, you don't really know why that page is
important, except for the fact that CNN.com now points to it.

So let's be clear: MS knows how it got there as much as link analysis does. It's
because the second order effect of a click implies relevance, just as a link
does.

~~~
moultano
> _It's because the second order effect of a click implies relevance, just as a
> link does._

There are two pieces of information in a click, one provided by google, and
the other provided by the user.

Google says, "'foo.com' is a good result for [foo]."

The user says, "yup."

Which is providing more information here?

Everyone at google is happy to admit that if the internet didn't exist, we
would have a very hard time ranking it. Is bing ready to admit that if google
didn't exist, they'd have a hard time ranking the internet?

~~~
kenjackson
Your first statement is wrong.

The user provides at least two pieces of information.

1) The search query (which you attempted to hide in Google's information)

2) A URL they retrieve from Google

Google provides one piece:

1) An ordered list

By FAR the most important piece of information is the search query.

In fact, given a choice between having the search query or an arbitrary
ordered list, I'm sure Bing would value the search query the most.

Clickstream data from Google is almost certainly but one piece of clickstream
data (even if you are special-cased). Just as Microsoft.com is just one page
you index. Now you wouldn't say that you'd have a hard time ranking the internet if
Microsoft.com didn't exist would you? Of course not.

And I think this is where Google is going astray. It's fine to call a spade a
spade, but you're overreaching. Trying to argue that MS couldn't index w/o
Google is simply absurd. And if Matt was trying to be diplomatic, you're
certainly not being by implying that MS would have a hard time indexing w/o
Google.

~~~
moultano
> _Google provides one piece: 1) An ordered list_

That's not an accurate view of how search works. The user provides the query,
but the results that are returned don't need to have anything obvious to do
with the query. The terms might not be on the page, and might not be found in
any data remotely associated with the page. We don't just intersect posting
lists here.

Furthermore, with suggest and instant, often the query itself is provided by
google.
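
For readers unfamiliar with the term, here is a toy sketch of what "just intersecting posting lists" would look like; the documents and function names are made up for illustration and are not Google's implementation:

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of doc ids containing it (a posting list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(index, query):
    """Return the doc ids that contain every query term."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {1: "tarsorrhaphy eyelid surgery", 2: "eyelid care"}
index = build_index(docs)

print(intersect(index, "eyelid surgery"))
print(intersect(index, "torsoraphy"))
```

With this naive approach a query like [torsoraphy] matches nothing, because no document contains that exact term; returning a useful result anyway is the part that takes more than intersection.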

~~~
kenjackson
I didn't say the ordered list was obvious. But it is an ordered list (at least
to the user -- it may not be ordered all the way down or partially so).

Fair enough point about suggest and instant. In those cases Google is adding
value to the query, but I think we'd both agree that it is incrementally so.
Instant generally just helps you get to the eventual query faster. Spell
checking does add value, but it's a +epsilon. In either case, you're right, we
should add this to our function.

In any case I think my point still holds. The query is the most important
aspect. The browsed page, for indexing purposes is the second most important
aspect. The association between the query and the URL is the least important
aspect.

IOW, the least value for Bing is the search algorithm. The query and page to
add to the index (as a browsed page) are the most important.

~~~
moultano
> _The association between the query and the URL is the least important
> aspect._

I don't understand why you believe this. This is the entire basis for why
search is hard. It is _the_ hard problem that search tries to solve. Many
companies have spent in aggregate, billions of dollars in R&D trying to solve
this, and all but a few have folded. It's an "AI-complete" problem in that
solving it perfectly would be a sufficient demonstration of strong AI. It's
the whole reason why Google exists.

~~~
kenjackson
The reason is that with the URL and the query, there's a decent chance you can
derive the association.

Now the reason I say this is that to my best approximation the key difference
that makes Google better than Bing (when it is) is the index size, and not the
search algorithm.

So the two key pieces of value are: A) queries that people don't do on your
site, which would have generated poor relevance

B) Pages that are actually browsed that you don't have indexed or up to date.

Now don't get me wrong. There is value in the association, but I think I'd
capture 80% of the value with the above. Now as you point out, these lists are
often non-obvious, but Bing does a near equal job of creating the lists. And
if you give them the search terms where they have gaps, and fill the index,
then I think the gap closes most of the way.

And let's be clear, because they clicked the link doesn't mean it's a good link.
See ehow.com or expertsexchange. But it does have some value.

And frankly I think MS would have been willing to let it go if Google had gone
straight to MS with it and said, "We have this data. Even though it's not
unethical we can spin it to the media to make it look bad. Kill the
association." MS probably would have as the net win isn't that huge. Now the
PR loss in pulling it would be worse than the PR loss in keeping it (with no
integrity loss, since they [and I] think it is perfectly ethical).

~~~
moultano
>is the index size, and not the search algorithm

I'm pretty thoroughly sure this is false. I'm sorry I can't show data to back
it up, but Google has many many systems that are years ahead of bing's
technology. I'm kinda hamstrung here by being unable to reveal anything about
them. To a decent approximation, both Google and Bing likely have the whole
internet that matters (and that they are allowed to crawl) in their indexes.
Bing is just unable to return this data for as many queries as Google is.

>but Bing does a near equal job of creating the lists.

To the extent that this is true, how much of it is due to data harvested from
Google search? This is something that google can't really demonstrate with
evidence, and what I would have hoped that bing would clarify in a public
statement if they had anything defensible to say.

~~~
kenjackson
_To a decent approximation, both Google and Bing likely have the whole
internet that matters (and that they are allowed to crawl) in their indexes.
Bing is just unable to return this data for as many queries as Google is._

Maybe this is true, but it's not apparent. I had commented on this a month or
so back that I thought Bing was better at "vague" queries, where I don't know
exactly what I'm looking for. But I'll know it when I see it.

Whereas Google is really good at targeted queries. I need info on the HP
battery model number 003D434F90. These searches in Bing will often bring back
literally nothing, while Google will often have one or two links, but they
happen to be the link that I want. The text of the query is almost always
found in these pages.

From that I infer it is index size, since the text is in the page.

 _To the extent that this is true, how much of it is due to data harvested
from Google search?_

While I find the quality of searches similar I don't find the results to be
similar, if that makes sense. If Bing is harvesting your results, they're
still using other very clever methods to surface other equally good, yet
different results.

------
noibl
I found the discussion video on that page fascinating, in how Harry Shum's
take on the other topics echoed his defence of Bing on this issue.

On the copying: Bing has dramatically improved over recent history because we
use lots of inputs and we would have preferred if Google had talked to us
privately so we could figure out how to make this less obvious on the long
tail.

On web spam: Google is the industry leader and needs to share more with us
little guys so we can all work together to beat this.

On search quality: Google needs to disclose their quality metrics so the
industry as a whole can understand what users want. Then we can all make
search better for everyone.

I thought Cutts was very gracious in his responses despite looking incredulous
at hearing some of this stuff.

Look, this "let's all get along and work together as an industry to fix
problems for the user"... it's bullshit. Either compete fairly or don't, but
don't pretend that Google owes you data so you can get a leg up.

------
KevBurnsJr
2 minutes of hearing Matt Cutts talk and I understood the problem.

5 minutes of hearing Bing's VP of Search talk and all I hear is distancing,
deflection and double speak.

Bing does copy Google's results indirectly through click data gathered by the
Bing toolbar.

------
alain94040
That shows a lot of bad faith on the part of Matt Cutts. It has been explained
to death, here and elsewhere, what most likely happened. The fact that he
continues to make the same accusation... well frankly he lost a lot of
credibility with me. One more manipulative corporate drone, one less genuine
hacker.

~~~
Matt_Cutts
Sorry you felt that way; was there a part of the post that struck you as
especially off-base?

------
amalcon
Bravo.

This is, by far, the best treatment of the issue I've seen anywhere, on
account of how it's the only level-headed one apart from Nate Silver's.
There's a little needless inflation at the beginning with the screenshot
comparison. It does have the obvious and expected slant. On the other hand,
there's no enormous hyperbole or anything like that. There are no vacuous
statements. The word "copying" appears a few times, but that's at least
descriptive, and there's no use of other loaded terms like "cheating",
"stealing", and "unethical".

Also, for what it's worth, it moved me from 80/20 certain that nothing fishy
is going on to 50/50: the spell correction paper shows to a certainty that
this has been _considered_. I'm sure Googlers know as well as I do that not
everything in a research paper finds its way into the product, and this one in
particular may have been nixed by management types for this very reason. It's
still awfully suspicious.

I still think the original experiment completely fails to demonstrate anything
unethical, and I still think the original info release was both hyperbolic and
needlessly inflammatory. It does demonstrate a need for some more information,
which seems to be all this post is asking for. If it had looked more like this
post, I think the 'net could have been spared a lot of controversy. Maybe Matt
Cutts should be writing these things, though far be it from me to decide that.

------
lamdanman
I'm shocked at the amount of support on HN (thus far) for Bing's attitude on
this issue.

I understand that they may well not target Google's SERPs specifically in
their clickstream analysis but they should certainly have _excluded_ Google
from it, for ethical reasons.

Google state that Bing created associations from clickstreams through Google's
SERPs on common queries (e.g. the tarsorrhaphy spell check test), not just
long tail queries. Given that Google is extremely popular, this must have
given a lot of weight to clickstream signals resulting from Google SERPs on
many occasions, for common queries. That is entirely unethical and I'm shocked
so many here don't find a problem with this.

Take the case of a highly-ranked great result on Google for a particular term,
which Bing rates lower due to inferior algorithms. Bing's analysis of Google
users would send that result higher in the Bing SERPs, mainly due to Google's
expertise in highlighting that site, and only to a small degree due to the
user's choice of clicking on it. That, to me, does fall under "copying" Google's
results. It may not be illegal, or intentional, but copying it remains.

Bing should have excluded Google from their clickstreams, and I certainly hope
Google exclude Bing from theirs. (Matt Cutts stated they do in the video.)

~~~
alain94040
Let's see. You're ok with Bing doing click analysis, but you want them to
specifically throw the clicks away when the user stopped by Google, because it
would be unethical. And also probably remove clickstreams that involve Yahoo's
directory. And StackOverflow answers, because it would be cheating. And blogs
referencing interesting links. And... or just accept that clickstream analysis
is a great source of data that will be an aggregation of many, many other
things on the Internet. Including Google.

------
greendestiny
A quick thought experiment on this subject: say I search for a term on Bing,
find a link I want and then put that link on my blog - when Google indexes
that blog have they copied Bing?

Sure it's different, but is it meaningfully different? I made the link between
the two terms, I also consented for that data to be used in both cases
(assuming the data comes from the Bing toolbar and an agreeable robots.txt). I
just don't see how the data that Microsoft is using is off limits.

~~~
zmmmmm
Just putting the link on your blog doesn't establish the connection between
the keywords and the link. If you put both the keywords and the link on your
blog then it allows Google to independently establish that connection using
the merits of their own algorithms.

The "incriminating" part here is that Microsoft _appears_ to be intentionally
parsing the keywords out of the search. Which means they are intentionally
looking at a click and saying a) this is a google search and here are the
terms used and then b) this is a result that Google returned, and they are
then using that to fill their own index. If they generically parsed the URL
for terms then you might argue they are not giving Google special treatment,
they are just doing this for every page the user goes to. However that's a bit
hard to buy - if they really did that they would end up with all kinds of
garbage associations from opaque URLs. So they must have a signal saying "this
was a google search, treat it better than the others". Or they are somewhere
in between the two. It's not clear to me where they fall on this scale - it's
generally murky. At worst, I'd say they are copying, at best, I'd say it's
sneaky but clever and fair game. The minute they single out Google and say
"hey, this must be a good result" I think they crossed a line.

~~~
greendestiny
Yeah you'd have to title the link with the search term. Personally I think the
analogy someone else made to web directories is perfect, and that discussion
is more interesting.

It seems entirely appropriate if you have a legitimate right to use that link
to parse it for structure.
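
To make the distinction in this subthread concrete, here is a minimal sketch of the two ways a toolbar could pull terms out of a visited URL. Nothing here is known to be what Bing actually does. The host table, the function names, and the choice of the `q` parameter are assumptions for illustration only.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical table: which query-string parameter holds the search terms
# on a given host. These entries are illustrative assumptions.
SEARCH_PARAM_BY_HOST = {
    "www.google.com": "q",
    "www.bing.com": "q",
}


def terms_special_cased(url):
    """Extract search terms only from hosts that are explicitly recognised."""
    parts = urlparse(url)
    param = SEARCH_PARAM_BY_HOST.get(parts.netloc)
    if param is None:
        return []
    values = parse_qs(parts.query).get(param, [])
    return values[0].split() if values else []


def terms_generic(url):
    """Treat every query-string value on any URL as candidate terms."""
    parts = urlparse(url)
    terms = []
    for values in parse_qs(parts.query).values():
        for value in values:
            terms.extend(value.split())
    return terms


print(terms_special_cased("http://www.google.com/search?q=how+old+are+you"))
print(terms_generic("http://news.ycombinator.com/item?id=2169585"))
```

The generic version returns an opaque ID as a "term" for the second URL, which is the garbage-association problem described above. The special-cased version avoids that only by knowing in advance which sites are search engines.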

------
_flag
I still don't understand why Google hasn't done the exact same test with a
control variable. Run the exact same test again on a domain that isn't
google.com.

~~~
blahedo
Do we know they haven't? We only know they haven't _reported_ on such a test.

~~~
_flag
Why would they purposefully make their case inconclusive?

~~~
blahedo
There are three reasons for Google not to report results of a control-case
test:

1) It didn't occur to them to run a control in their experiment.

2) It did occur to them but they decided not to do it for some reason.

3) They did run the control but it did not bolster their case.

As you point out, if they ran it and it bolstered their case there would be no
reason not to report it. Of the above reasons, #2 seems the least likely
(because it's easy to set up a control and it would better clarify the
situation). I do not have enough evidence to judge whether #1 or #3 is more
likely.

------
blahedo
Most of my thoughts have already been echoed elsewhere in this thread, but:

> _"To me, what the experiment proved was that clicks on Google are being
> incorporated in Bing’s rankings."_

It proved that clicks on other websites are being incorporated in Bing's
rankings, which had already been public knowledge, I think. It didn't prove
that _only_ or _disproportionately_ clicks on Google are being thus
incorporated, although that is what Google is repeatedly claiming.

> _"If clicks on Google really account for only 1/1000th (or some other
> trivial fraction) of Microsoft’s relevancy, why not just stop using those
> clicks and reduce the negative coverage and perception of this?"_

Is Matt Cutts suggesting that Bing special-case an exclusion for Google
results?

~~~
pkamb
Special-case an exclusion for any site that contains robots.txt?

------
tel
The technical aspect is plain. Copying Google is a great idea. If you're trying
to predict the weather and are given a general bag of indicators, the one that
is a well-reasoned expert opinion of the weather is probably a very important
source. It may even account for a majority of your own predictive power. It
would be stupid to ignore it if your goal is to simply make the best
prediction.

Likewise, aggressively scraping Google is a smart move. Then you add some new
innovation atop it and have a real opportunity to return more informed
responses. This is done all the time in science.

In some sense Google simply has to acknowledge that they are a pretty
important segment of the web, not some separate entity from it.

So only the legal/ethical question remains. In science it's unethical to work
atop someone else's project without crediting them. I doubt Bing would be
interested in adding a Powered by Google bar. Moreover, since Bing could
directly profit off Google's work undercutting actual algorithmic progress
through pure marketing competition (hypothetically, anyway, I am sure that
Bing has added tech too) I feel like it's better to restrict this sort of
thing.

I think it's fair to say that much like commercial images on Flickr or sample
songs, it is unethical and illegal to copy digital services and goods then
either claim them as your own or profit off of them. I think Google results are
suitably close in spirit to this.

So maybe Bing and Girl Talk need to team up and discover and defend the
ethical rights of sampling digital goods.

------
o_nate
I totally agree with this. There's not really any good reason why Microsoft
should piggyback off of Google's search results. The fact that they're able to
observe Google's results by observing user clicks in a browser rather than by
simply harvesting results directly off of Google's website confuses the
matter, but ultimately it's irrelevant. This is just a sneaky way for
Microsoft to observe a user's interaction with an information service
(Google), to see what information the user obtained, and then offer the same
information themselves.

------
dminor
Seems like Bing is having trouble explaining exactly what they are doing, and
Google is having trouble explaining why they should stop.

Is it illegal? If not, why should Bing stop doing it?

> I think Bing’s engineers deserve to know that when they beat Google on a
> query, it’s due entirely to their hard work. Unless Microsoft changes its
> practices, there will always be a question mark.

Kind of rings hollow to me. If I were Bing, I'd want to do what's best for my
users.

~~~
lallysingh
Well, the whole thing's kinda meta, isn't it? The content isn't on either
site, it's just, really, the content identifiers. Not even that, this is about
a sorting of identifiers against arbitrary strings of text. Whatever
distinctions we have here are gonna appear incredibly small (and petty), but
it's really the bread and butter of search.

Both are map makers in a sense, providing guides to what they didn't create,
but still spent plenty of effort to make that guide. In the literal, map-
making world, the artifact is the printed map. In that world, the way to check
for copyright infringement is to see if mapping errors are duplicated as well.

The presumption being that if the errors were copied, then so was the good
data.

Short-term, the benefit was that users would get discounts on high-quality
data, as they only had to pay for the efforts of the map-copy, not the
original map data acquisition. Of course, then you're just waiting for the
quality to drop, as there's less and less incentive to actually do the map-
making work. The margins go down, and the original sources have to update
their maps less often to keep their costs low enough to be competitive.

There isn't a printed page with web search; the product is the output of a
continuously-running dataset & algorithm.

But, I'm gonna ask, in the web-search world, how do you define copying and how
do you test for it? If you don't think there is a valid definition, please
don't count yourself the same as the group who thinks that there is a valid
definition and this isn't it. They're two separate things.

(I've framed the question how I see it, and I work for Google, but I'm
obviously no official speaker -- I've only been here a few months, and don't
work in search quality. This is (almost definitionally) a fanboi war of sorts,
and I wanted to stay out. I probably should have :( )

------
spaghetti
Google should do more testing. For example: repeat the same tests (different
nonsense search terms and results just to be sure) where none of the data sent
to/from the browser contains any google specific terminology. If the Bing
toolbar isn't special-casing google then I think the test results should be
the same.

------
Keyframe
That video was way more interesting than this semi-fabricated drama. Search
result spam, adsense role in it, and how to (not) tackle it is what I'd like
to hear more about. Google search results degraded over the years to the point
where I have a bing search on my wp7 phone as default search and I actually
don't mind/don't care, because search results from google are not an
imperative for me anymore (and I'm not cheerleading for anyone).

------
ot
As I already said in my reply to Matt
(<http://news.ycombinator.com/item?id=2169585>)

> Well, no, that's a research paper that says that they have made experiments
> in that direction, but this doesn't imply that this is currently done in
> Bing.

I answered Matt's questions and raised some other points, still waiting for an
answer. Matt, could you please comment?

------
c2
Either bring a lawsuit, or be quiet. You aren't going to shame Microsoft into
changing their current policy, and even if it does improve results by only
1/1000, as long as they are legally within their right to do it, why wouldn't
they?

It's laughable to think this "controversy" is putting a black eye on to Bing
in any way shape or form, except among perhaps the most technical and anal
retentive among the population.

~~~
c2
I understand the down votes, my comment wasn't all that constructive. But
seriously, the amount of attention paid to this issue from Google might be
better spent making their own products better. At best this is an issue for
lawyers. The low road Google is taking by making a big stink in the
'blogosphere' is asinine to me, but clearly many of you disagree by
continually voting up these stories.

------
jarin
As to how strongly Bing uses Google as an "indicator", I think the results
from this definitely-not-long-tail search are pretty telling:

<http://www.google.com/search?q=how+old+are+you>

<http://www.bing.com/search?q=how+old+are+you>

~~~
angusgr
How is that telling, exactly?

The top 4 results (including one which has the domain name of the query) are
the same but in a different order. The rest of the results appear totally
different.

Even if one random query you ran returned an identical first 10 results,
exactly what would that prove? Statistically significant how?

~~~
jarin
How likely is it that completely different ranking algorithms would turn up
the solarviews.com and ruthannzaroff.com results on the first page?

~~~
angusgr
I'd start with the fact that they both have the keywords of the search string
in the page title and in the URL string.

Look, I'm not saying it's impossible, I'm just saying that two different
algorithms can yield the same results without one having used the other as an
"indicator". To trot out a tired phrase, correlation does not imply causation
(and I'd argue you don't even have correlation at this point.)
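
One way to get past single-query anecdotes is to measure overlap across many queries. The sketch below is a rough illustration, not a test anyone in this thread ran. The function names are hypothetical, and the result lists would have to come from real queries.

```python
def top_k_overlap(results_a, results_b, k=10):
    """Jaccard overlap between the top-k URLs of two result lists."""
    a = set(results_a[:k])
    b = set(results_b[:k])
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mean_overlap(result_pairs, k=10):
    """Average top-k overlap over many (engine_a, engine_b) result pairs."""
    scores = [top_k_overlap(a, b, k) for a, b in result_pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Even a high average would only show correlation. Two independent rankers that both weight title and URL keywords would be expected to overlap to some degree, so a baseline from a third engine would be needed for comparison.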

~~~
jarin
That's a good point, I'm downgrading my opinion from "suspicious" to
"curious".

------
BlazingFrog
Am I the only one getting tired of hearing Google whine about this? In my view
it's simple, just do something about it or shut up.

~~~
Kylekramer
While I don't like how Google handled this situation, whining about it is
pretty much all they can do.

------
l0nwlf
A lot of people are talking about user freedom here. How many users knew they
are giving their data to Microsoft?

------
timtim
I am still amazed that Google even looks at or cares about Bing results. That
is not a good sign. It makes them look vulnerable.

PS: I haven't checked out our competitors' pages for months, too many ideas of
my own.

~~~
kenjackson
Check your competitors' pages right now. It will take you less time than reading
the next post on Hacker News, and what you find may (or may not) surprise you.
That consulting tip is on the house. :-)

------
ddemchuk
My two favorite takeaways from this blog post are:

1) You can increase your site's rankings by installing the Bing Toolbar
amongst a group of people and have said group search google for your target
keyword and click through on your result.

2) Google is able and willing to manually screw around with search results in
a seemingly easy manner.

As to the issue of Bing slurping up google's data for their own purposes:
Scraping the web is a double edged sword. They aren't doing anything illegal.
If anything, this whole ordeal has made both Search Engines look like a bunch
of teenagers fighting over who stole whose boyfriend.

~~~
kenjackson
_You can increase your site's rankings by installing the Bing Toolbar amongst
a group of people and have said group search google for your target keyword
and click through on your result._

But you can just skip the toolbar altogether. Just go to Bing and do the
search and click your site. Likewise, if you go to Google and do the same,
you'll increase your ranking on Google. There is nothing new here, except if
you install the toolbar and go to Google you can increase it on Bing also.

This is something people have been doing for a while.

_Google is able and willing to manually screw around with search results in a
seemingly easy manner._

Google can do this even if Bing stops this current practice. Just go to Bing,
do a search, go to the last page and click a link.

Fundamentally, with respect to fraud, nothing has really changed.

~~~
msg
Unless Bing weighs clicks on Google SERPs more strongly than clicks on Bing
SERPs! After all, Google is the market leader...

This is the problem, we really don't know how hollow Bing is any more. If it's
a thin UI over results that grow closer to Google, it's intellectually
shallow. I wouldn't trust Bing search to combat, e.g., click fraud and SEO
better than Google. If anything I would expect the edge cases and tricky
behavior to derange Bing further.

------
Charuru
Just watched the video. It's evident that Bing uses clicks as one of its
signals. So a possible SEO tactic now would be to spam clicks on your link?

~~~
whiletruefork
Easy to catch and filter. But yes, in theory, this will improve results. The
proportion of users clicking the 10th link on a page is much lower than those
who click the 1st. If more users click the 10th than the 1st, there is an assumed
ranking problem. So clicks help fix ranking.
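
A minimal sketch of that idea, assuming you already have a baseline for how clicks normally fall off by position. The function name, the baseline shares, and the `tolerance` threshold are all hypothetical, not anything Bing or Google has described.

```python
def flag_ranking_problems(clicks_by_position, expected_share, tolerance=2.0):
    """Return 1-based positions whose click share far exceeds the baseline.

    clicks_by_position: observed click counts, index 0 is the top result.
    expected_share: assumed baseline fraction of clicks per position.
    tolerance: how many times the baseline counts as suspicious (assumed).
    """
    total = sum(clicks_by_position)
    if total == 0:
        return []
    flagged = []
    for index, (clicks, expected) in enumerate(
        zip(clicks_by_position, expected_share)
    ):
        observed = clicks / total
        if observed > tolerance * expected:
            flagged.append(index + 1)
    return flagged


# Placeholder numbers: position 4 draws far more clicks than its baseline.
print(flag_ranking_problems([50, 10, 5, 35], [0.6, 0.2, 0.1, 0.1]))
```

The same signal is what a click spammer would try to fake, so a real system would also need the filtering mentioned above, for example discounting bursts of clicks from a small set of users.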

------
drivebyacct2
My question is... did Google install the Bing toolbar and then perform
these searches on Google and click the links? If not, is Bing suggesting that
users searched for those convoluted strings, in IE, with the Bing toolbar, on
Google's website, and then clicked those links?

I hope this doesn't sound hyperbolic, that is really my understanding of the
issue. Sorry, I'm not a big SEO guy, it's very possible my understanding or
comprehension here is just flawed.

~~~
drivebyacct2
Really, -1? When I asked a question and prefaced it with a huge disclaimer
stating that I might not understand the full implications of how Bing is using
their toolbar metrics?

I just don't understand how Bing can claim they were using toolbar metrics
when it seems highly unlikely that any user would be searching for these
keywords. Let alone on Google, when they have the Bing toolbar installed.

But really, thanks for the anonymous downvotes.

