
Bing sets the record straight on recent accusations - thankuz
http://www.bing.com/community/site_blogs/b/search/archive/2011/02/02/setting-the-record-straight.aspx
======
axiom
From the Google blog post:

"This [torsorophy] example opened our eyes, and over the next few months we
noticed that URLs from Google search results would later appear in Bing with
increasing frequency for all kinds of queries: popular queries, rare or
unusual queries and misspelled queries. Even search results that we would
consider mistakes of our algorithms started showing up on Bing."

That's the key. Not the honeypot. The honeypot was just a test to see if they
could catch them red-handed.

Bottom line is that Google looked at the statistics and found that Bing
results were improbably similar to Google results. Maybe they're lying about
this, but I doubt it.

There's not enough information to figure out exactly to what extent Google
impacts Bing results, but I would bet a lot of money that in the Bing code
base there is Google specific code that behaves along the lines of "if on
google, do x" rather than some generic code that just targets all sites.

~~~
blahedo
I don't think anything about this is clear-cut. I can imagine a very simple
algorithm, as follows:

* When query Q gets made more than N times at bing.com,

* Mine clickstream data for the next M urls requested after searches for Q

* Any url that appears more than T times (possibly spread across some number of users) is presumed to have been found relevant to Q, and derived either from later searches (corrected spelling) or curated sites or other search engines. Add to mapping of valid responses to Q.

It's not a very _good_ algorithm, of course, and if you have _any_ other
source of information about Q you're probably better off using that instead.
But it or something like it could explain the torsorophy example and every
other part of Google's narrative, and it's not particularly suspicious or
questionable, and it certainly doesn't involve targeting Google.
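
A minimal Python sketch of blahedo's three steps, assuming clickstream sessions arrive as (query, URLs requested afterwards) pairs. N, M and T are the comment's own placeholders. The values below, the function name `mine_query_mappings` and the sample data are made up for illustration and say nothing about what Bing actually runs.

```python
from collections import Counter, defaultdict

# Illustrative thresholds only; N, M and T are placeholders, not Bing values.
N = 2  # a query must be made more than N times
M = 3  # mine the next M URLs requested after each search
T = 1  # a URL must follow the query more than T times


def mine_query_mappings(sessions):
    """sessions: iterable of (query, urls_requested_afterwards) pairs."""
    sessions = list(sessions)
    query_counts = Counter(query for query, _ in sessions)
    follow_counts = defaultdict(Counter)
    for query, next_urls in sessions:
        if query_counts[query] > N:
            # Count each URL at most once per session, so one user
            # revisiting a page does not inflate its count.
            follow_counts[query].update(set(next_urls[:M]))
    return {
        query: sorted(url for url, count in counts.items() if count > T)
        for query, counts in follow_counts.items()
    }


sessions = [
    ("torsorophy", ["en.wikipedia.org/wiki/Tarsorrhaphy", "example.com/a"]),
    ("torsorophy", ["en.wikipedia.org/wiki/Tarsorrhaphy"]),
    ("torsorophy", ["en.wikipedia.org/wiki/Tarsorrhaphy", "example.com/b"]),
    ("one-off typo", ["example.com/c"]),
]
print(mine_query_mappings(sessions))
# {'torsorophy': ['en.wikipedia.org/wiki/Tarsorrhaphy']}
```

Counting each URL once per session is one rough way to honor the "spread across some number of users" caveat; a real system would presumably count distinct users instead.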

~~~
moultano
Except that in this case, the honeypot searches were not done at bing.com.

~~~
blahedo
No, but they must have also been doing regular searches at bing.com in order
to test their claim. Where do you think they got their bing results from?

~~~
tghw
It doesn't matter, because the results (from the honeypot) had nothing to do
with the searches, so it would have been impossible to clue Bing in directly
by searching on Bing a lot.

~~~
blahedo
Are you claiming that they made the queries without actually making the
queries? Reread the first line of my algorithm: once you identify a query Q as
more than just a one-off mistake—whether it's an actual new item or a common
misspelling or, maybe, a trap—then you decide it's worth looking into.

Put another way, it's impossible _not_ to clue Bing in on at _least_ the fact
that you are making these searches.

~~~
broofa
That is _exactly_ what's being claimed. The queries were not made on bing.com,
they were made on google.com. The only way Bing can become aware of the
results of these google.com queries is if they're "spying" on the user's
activity via the Bing Toolbar and IE8 suggested search features.

From <http://goo.gl/Bi0JH> (Google blog):

"We gave 20 of our engineers laptops with a fresh install of Microsoft Windows
running Internet Explorer 8 with Bing Toolbar installed. As part of the
install process, we opted in to the “Suggested Sites” feature of IE8, and we
accepted the default options for the Bing Toolbar.

We asked these engineers to enter the synthetic queries into the search box on
the Google home page, and click on the results, i.e., the results we inserted.
We were surprised that within a couple weeks of starting this experiment, our
inserted results started appearing in Bing. Below is an example: a search for
[hiybbprqag] on Bing returned a page about seating at a theater in Los
Angeles. As far as we know, the only connection between the query and result
is Google’s result page (shown above)."

------
moultano
This seems to be mostly ad-hominem arguments in the classic definition of it:
"You're wrong because there's something bad about you," rather than addressing
any particular point. I don't see how questioning Google's motivations for
uncovering this is doing anything to "set the record straight."

A useful post would have addressed at least 3 things:

- What are the specifics of the mechanism by which they ended up obviously
copying google's results?

- Do they handle clicks from google differently than any other clicks?

- How different would their results be if they didn't use clicks from
google's search as a signal?

~~~
earl
"- How different would their results be if they didn't use clicks from
google's search as a signal?"

That's the important one. These Bing people are happy to run their mouths
about 1k signals blah blah blah. Who gives a fuck? What is the _weight_ on
those signals?

Further, this case was made obvious because Bing couldn't create an answer to
a query so they copied Google's answer. Dude admits as much. Why should we
give Bing a pass on copying Google's results just because in this instance it
was too hard to find their own results?

~~~
brudgers
> _"What is the weight on those signals."_

Low enough that it took Google nearly three engineers per successfully
injected honeypot (7 honeypots per 20 engineers) and Google was only able to
achieve a 7% success rate despite their extensive in-house knowledge of SEO.

~~~
teach
I don't understand what you're trying to say here. The Google blog post says
that they inserted 100 honeypots into Google. Then it says "within a couple
weeks of starting this experiment, our inserted results started appearing in
Bing."

Where are you getting the 7% number?

~~~
contextfree
The more detailed searchengineland.com post mentioned that they only got 7-9
of the 100 nonsense terms they tried to inject to show up.

~~~
brudgers
Link: [http://searchengineland.com/google-bing-is-cheating-copying-...](http://searchengineland.com/google-bing-is-cheating-copying-our-search-results-62914)

------
trustfundbaby
> Google’s “experiment” was rigged to manipulate Bing search results through a
> type of attack also known as “click fraud.”

hunh?

> So big and noticeable that we are told Google took notice and began to
> worry. Then a short time later, here come the honeypot attacks. Is the
> timing purely coincidence? Are industry discussions about search quality to
> be ignored? Is this simply a response to the fact that some people in the
> industry are beginning to ask whether Bing is as good or in some cases
> better than Google on core web relevance?

I can't believe this was written by someone with 'Vice President' in their
title ... how childish.

~~~
vash_stampy
Read Google's "Bing Sting" experiment on Google's blog. Their test had no
control variable (i.e., running the same test on another website that wasn't
Google). If they had done their experiment on another site and Bing results
didn't change, then Google would have a very plausible case against Bing.

Google fell into the classic trap of confirmation bias with bad scientific
method. And thus the test was 'rigged', they never had a control variable.

~~~
moultano
Their assertion was not "bing is using clicks on google search and only google
search." The assertion was just "bing is using clicks on google search."
They've demonstrated that pretty conclusively.

~~~
amalcon
Bing has never denied using clicks on google search. They have, in fact,
admitted to using clicks in general before Google even began their
experiments. Given that information, why should it be surprising that Google
Search, as one of the most-clicked sites on the Internet, has a big impact on
that?

~~~
moultano
Surprising or no, it's clearly pretty controversial. If you are using click
data like this, you have to know that you'll end up essentially copying
Google. People only click on the results that are there, and Google puts them
there.

~~~
amalcon
Of course it's controversial. Someone made a deliberate decision to stir up
controversy over this. It's easy to make some controversy if you just use the
right words. Words like "Cheating" and "Copying" are great for that.

If you really want examples, just watch Fox News or MSNBC for fifteen minutes.
You'll probably see at least one or two examples in there somewhere.

------
cryptoz
> We have been clear about this for a couple of years (see Directions on
> Microsoft report, June 15, 2009).

That's about as unclear as anyone can be. Seriously Microsoft, this is _the
Web_. When you want to point your readers to a document, don't fucking cite it
like some printed academic paper! LINK TO IT.

~~~
pygy_
It was published in _Directions on Microsoft_ [1], which is supposed to be an
_"Independent Analysis of Microsoft Technology & Strategy"_.

The yearly subscription is $1500.

If the pricing in [2] still applies, non-subscribers can also get the report
for a whopping $750.

I couldn't find that report in their archive, though. The only article
published on June 15, 2009 is [3], which is unrelated.

[1] <https://www.directionsonmicrosoft.com/>

[2] [http://www.reuters.com/article/2009/04/21/idUS223966+21-Apr-...](http://www.reuters.com/article/2009/04/21/idUS223966+21-Apr-2009+PRN20090421)

[3] [http://www.directionsonmicrosoft.com/update/50-june-2009/678...](http://www.directionsonmicrosoft.com/update/50-june-2009/678-hyper-v-r2-gains-live-migration.html)

------
InclinedPlane
Here's the problem in a nutshell:

The Bing toolbar uses clickstream data to extract information about the
relatedness of URLs, much as a search engine crawler infers relatedness from
links (PageRank works this way). When a user clicks from URL A to URL B, the
Bing toolbar sends information back to MS about the relatedness of URLs A and
B, including all the metadata involved. If URL A is ".../search?q=torsorophy",
Bing will make a note of that and will start showing B as a result when you
search for "torsorophy".

In principle this isn't necessarily a bad thing: it allows Bing to index sites
that it wouldn't otherwise, letting its search index grow more organically.
However, when search engines come into play, things get problematic, because
now the Bing toolbar is little more than an automated method for scraping
search results piecemeal. Given that a great deal of modern web surfing falls
into the "search for X, click links for X" pattern, this should have been
something that the Bing engineers anticipated (if they didn't anticipate it,
that's bad enough; if they did and ignored the problem, that's much worse).
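
The A-to-B mechanism InclinedPlane describes can be sketched as follows, assuming the toolbar reports (from URL, to URL) pairs and that search result pages carry the query in a `q` parameter. Where the parsing happens (toolbar or server) and what Bing actually stores are unknown; `record_click` and `query_evidence` are hypothetical names.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qs, urlparse

# Hypothetical store: query -> Counter of URLs clicked from that query's results.
query_evidence = defaultdict(Counter)


def record_click(from_url, to_url):
    """Record one A -> B navigation reported by a toolbar-like client."""
    parsed = urlparse(from_url)
    # Assumption: result pages look like ".../search?q=torsorophy".
    if parsed.path.endswith("/search"):
        for query in parse_qs(parsed.query).get("q", []):
            query_evidence[query.lower()][to_url] += 1


record_click("http://www.google.com/search?q=torsorophy",
             "http://en.wikipedia.org/wiki/Tarsorrhaphy")
record_click("http://example.com/page", "http://example.com/other")  # ignored
print(query_evidence["torsorophy"].most_common(1))
# [('http://en.wikipedia.org/wiki/Tarsorrhaphy', 1)]
```

Nothing in this sketch is specific to Google: any site whose URLs expose the query this way would feed the same table.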

Worse yet, the Bing toolbar is effectively a search indexer which _does not
respect robots.txt_. Let that sink in for a while. Google.com/robots.txt has
this line: "Disallow: /search", and yet apparently the Bing toolbar has
absolutely no compunction about effectively ignoring that.

tl;dr: MS has created a search indexer which ignores robots.txt, this is bad.

~~~
pedrocr
Yours is the first explanation I've seen that makes any case for Microsoft not
being intentionally cheating. All the responses I've seen from Microsoft are
in the form "we're not copying Google we're just using user click information"
which is poor since what Google showed is that they're associating click
information with the search terms that were put into Google. But since those
search terms are in the referrer URL it could conceivably be an innocent
general algorithm weighing the words in the URL.

As for ignoring robots.txt, that may not be the case. Conceivably you could
get the url A->B link, save the metadata for both, signaling them as related,
and then check both URLs against robots.txt to see if you should have them in
the index. Then if url A is ".../search?q=torsorophy" Google's robots.txt
disallows it from being indexed and only url B gets in but the link to
"torsorophy" is still there from the metadata.

~~~
InclinedPlane
Indeed. Certainly it is technologically possible for clickstream based
indexing to still abide by robots.txt rules. However, the Bing toolbar does
not. That is the key issue here.
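
A robots.txt-respecting variant, along the lines pedrocr suggests, can be sketched with Python's standard `urllib.robotparser`. The robots.txt text is inlined so the sketch runs offline and keeps only the `Disallow: /search` rule quoted earlier in the thread; `allowed`, `ingest_click`, the `index` dict and the `examplebot` user agent are hypothetical.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical, one-rule stand-in for google.com's robots.txt, inlined so the
# sketch runs offline. Hosts missing from this dict are treated as allowed.
ROBOTS_TXT = {
    "www.google.com": "User-agent: *\nDisallow: /search\n",
}

index = {}  # hypothetical index: URL -> set of associated terms


def allowed(url, agent="examplebot"):
    """Check a URL against the (inlined) robots.txt for its host."""
    rules = ROBOTS_TXT.get(urlparse(url).netloc)
    if rules is None:
        return True
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


def ingest_click(url_a, url_b, terms):
    """Record an A -> B click, indexing each URL only if robots.txt permits it."""
    if allowed(url_a):
        index.setdefault(url_a, set())
    if allowed(url_b):
        # Terms taken from A's URL stay attached to B as metadata,
        # whether or not A itself is allowed into the index.
        index.setdefault(url_b, set()).update(terms)


ingest_click("http://www.google.com/search?q=torsorophy",
             "http://en.wikipedia.org/wiki/Tarsorrhaphy",
             {"torsorophy"})
print(index)  # {'http://en.wikipedia.org/wiki/Tarsorrhaphy': {'torsorophy'}}
```

The search page never enters the index, yet the term it contained still ends up attached to the Wikipedia page, which is exactly the gap pedrocr and InclinedPlane are arguing about.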

~~~
pedrocr
How do you know it does not? I'm assuming robots.txt is about preventing the
page contents from being crawled and added to the index. If all they use the
click info for is to associate referrers (google URLs in this case) to pages
in the index and they don't crawl the google search itself I don't see how
that breaks the robots.txt contract.

~~~
las3rjock
The page contents are being crawled and added to the index, but by Bing
Toolbar users, not a computer program. I consider that to be an underhanded
way to circumvent robots.txt, but others might not.

~~~
pedrocr
What makes you say that? We haven't seen anything to indicate Google search
pages are in the Bing index.

------
lysium
This is the usual FUD we are used to from Microsoft.

First just claim that the allegations are wrong (use 'Period.' and 'Full
Stop.'). Don't try to explain!

Then recast the allegations as a personal attack (say it is 'insulting'). Don't
try to explain why, either!

Refer to a document that requires paid membership, so only very few people if
any can check (don't use a direct link). Gives you cheap credibility.

Claim that the experiment is fraud, although no one benefits financially.
Don't back up the claim with data.

Counterclaim that your competitor is actually copying from you, ignoring the
difference between ideas and data.

Now that the reader envisions you as the poor underdog that is insulted, is a
fraud victim, and is copied, spread the doubt that Google is doing this
because it starts to 'worry' (pose an open question like 'is it a
coincidence'?).

End with the pose that you are not affected by this and concentrate on your
business.

Never, ever answer the allegations with a simple explanation of why Bing shows
bogus results: some Google employees searched while the Bing Toolbar was
active.

{Overlook that you admitted that you are prone to click fraud.}

~~~
billybob
Thank you for summing up why posts like this one from MS make me queasy. Lots
of spin and posturing and insinuation when a clear, technical answer is
needed.

It's like, "Son, did you steal those cookies?" "Mom, Justin always accuses me
of stealing cookies, and I've been getting good grades lately, and last week
Justin hit me for no reason, and don't you think he should be in trouble too?"

Answer the question, son.

------
trunnell
I have a much lower opinion of the Bing team after reading this non-denial
denial. It's disingenuous to act surprised that using clickstream data from
your main competitor might be called copying.

~~~
hackinthebochs
And I have a much lower opinion of the Google team (and they had much further to
fall). They intentionally use ambiguous and loaded words like "copying"
specifically to stir up drama. Google is smart enough to use precise words
when they want, and not to when they want. The ambiguity was intentional.

~~~
moultano
I don't think it's ambiguous. They are contending that bing is intentionally
using this to recreate results similar to google's. I'm guessing you feel this
is ambiguous because you don't think it is supported by the data, not because
they are saying anything unclear.

~~~
hackinthebochs
Characterizing it as "copying" is what's ambiguous. This can easily be seen by
reading most discussions about this. People are arguing past each other, armed
with their own personal definition of "copying". The facts are that make-
believe search results showed up on Bing after Google intentionally tried to
train Bing to pick up the association. Declaring this to be "copying" is
begging the question.

~~~
Natsu
> The facts are that make-believe search results showed up on Bing after
> Google intentionally tried to train Bing to pick up the association.
> Declaring this to be "copying" is begging the question.

Well, it was trained by having the Bing toolbar see people clicking those
results on Google's search engine.

But I do think you're right that people are talking past each other and what
outrages one person could very well be something that another simply doesn't
care about.

------
thought_alarm
This is the record:

Searching for "torsorophy" on Google brings up the Wikipedia page for
Tarsorrhaphy because of Google's advanced spelling and error correction
algorithms.

Searching for "torsorophy" on Bing will (or used to) bring up the Wikipedia
page for Tarsorrhaphy because of Google's advanced spelling and error
correction algorithms, because Bing watches how people use Google.

Two different kinds of innovation at play, one much more legitimate and
impressive than the other. But it's not like Microsoft is in unfamiliar
territory here.

~~~
kenjackson
So does "fast forier twansform". Did Google honeypot that term too? Or the
other 20 terms I just tried with arbitrary misspellings?

~~~
true_religion
When Bing adjusts the results it shows due to user click-ranking, it does so
without any notification. That's because it's a ranking issue.

When Bing adjusts the results due to a spellcheck done by their engine, it
notifies the user so they can correct it.

For your example, Bing search does exactly this:

> We're including results for fast fourier transform. Do you want results only
> for fast forier twansform?

~~~
kenjackson
I completely agree, but to quote the person I was replying to: "Searching for
"torsorophy" on Google brings up the Wikipedia page for Tarsorrhaphy because
of Google's advanced spelling and error correction algorithms."

Both companies have good algorithms for spelling and error correction, but
neither appears to be a superset of the other. (For example, Bing corrects
"Kransas" to "Kansas", but Google does not).

I was addressing the fact that Bing has at its disposal some advanced
techniques too. It's not just Google that does. And I illustrated this by
pointing out that there are a lot of queries that do the spell correction on
Bing that obviously weren't honeypotted -- as seemed to be implied by the
original comment.

~~~
true_religion
The difference there is probably a result of varying internationalization.

"Kransas" is Swedish for wreath, which is likely why Google's spellcheck
doesn't automatically correct it.

Bing however won't even show me proper results for Kransas even if I tell it
"+Kransas".

One example of where they both 'fail' on a misspelling is "munday", which isn't
autocorrected to Monday because munday is a proper noun existing in the
English corpus, e.g. the City of Munday, the Munday family, etc.

------
bertil
Now that we have confirmation that Bing uses clickstream data on sites that
they do not control, I'm surprised that no one mentioned this makes them
sensitive to click-fraud.

Copying a leader isn’t as objectionable as denying it and I would argue that
Google could split their search engine into independent entities—say:
crawling, algorithm, interface. That way, competitors could try to outrun them
and innovate on one or the other, separately: Firefox, Siri and many others on
the interface; Cloudera or Amazon on the crawl, DuckDuckGo and physics labs on
the algorithm (just spitballing there). What Bing did wasn’t that different
from a Greasemonkey script that would supplement bing.com results with
Google's, it seems; or at least, they could have had the humility to present
it that way.

Now, their pride made them confess to being easily gamed.

~~~
gojomo
Do we have any information about whether Google uses clickstream data from
users traveling between non-Google sites?

As I noted in a previous thread
(<http://news.ycombinator.com/item?id=2166256>), Googler Amit Singhal's
wording was a bit vague about whether URL trail data from things like the
Google Toolbar, Analytics, Ads, or other systems ever affects rankings. (His
careful wording, "put any results on Google’s results page", could mean merely
that URL trail data never _adds_ results to the set of all possible results.
It could still affect rankings of pages found through other crawling.)

The use of such data is clearly allowed by Google's very broad privacy policy.
I would love a clear answer from Google on this. It should be easy to give.

~~~
mhansen
<http://twitter.com/mattcutts/status/32568287384571904>

~~~
gojomo
Interesting, thanks, but doesn't address my question. To be clear, I'm
wondering: are clicktrails from Google Toolbar, Google Analytics, Ad programs,
or other sources (beyond just Google Search outclicks) ever used to help
calculate search rankings?

Robots.txt-sensitive crawling is irrelevant to this question, and whether
their toolbar tracks clicks from other search engine result pages is only
tangentially relevant, as one small example of the general idea.

And I'm not asking if they've ever done exactly the click inference Bing has
done. Rather, I wonder if they're doing vaguely analogous indirect mining of
revealed web user preferences via clicktrails. For example, noticing which
sites were visited together or in certain order even without crawler-visible
links between them. Or noticing which pages were viewed for the
longest/shortest times. Or which pages seemed to 'end' a purposeful session.
Or other deep-science stuff I can't even imagine.

I don't want them to reveal any proprietary secrets – just whether they have
ever used (or would consider it legitimate to use) all the clicktrail data
from all their many non-search tools to help with search quality.

Because I've long assumed that they do, and would be surprised if they didn't.

------
tristanperry
>> "As we have said before and again in this post, we use click stream
optionally provided by consumers in an anonymous fashion as one of 1,000
signals to try and _determine whether a site might make sense to be in our
index_."

Erm, the Google test took pretty nonsensical keywords and put some completely
unrelated website at #1.

And then Bing copied this unrelated website result from Google.

The above quote from Bing (especially the bit in italics) is therefore a
little confusing, since, again, the entire point here is that Bing showed
completely unrelated websites taken straight from Google. The results weren't
relevant, which makes the quote fairly contradictory in my opinion.

Yet another 'reply' by Bing, and yet again they seem to be trying to insult
their way out of it via ad hominem and strawman arguments.

And yet again, Bing still haven't addressed the key point here: why Google
results were taken and used by Bing.

~~~
zepolen
Bing sees that I searched for a term, 'iwejhoihfe', on Google and ended up
clicking a link to foobar.com/page there; this gives foobar.com/page relevance
for the term 'iwejhoihfe'. Since 'iwejhoihfe' doesn't show up anywhere else
on the web, that single relevant site gains a lot of weight for that term.
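
zepolen's point about a lone observation carrying a lot of weight can be made concrete with a toy rule, assuming relevance is just each URL's share of the clicks recorded for a term. The scoring rule, `click_share` and the sample log are hypothetical, not Bing's.

```python
from collections import Counter


def click_share(click_log, term):
    """Fraction of all recorded clicks for `term` that went to each URL."""
    counts = Counter(url for t, url in click_log if t == term)
    total = sum(counts.values())
    return {url: n / total for url, n in counts.items()}


click_log = [
    ("iwejhoihfe", "foobar.com/page"),  # a single, lonely observation
    ("python tutorial", "docs.python.org"),
    ("python tutorial", "docs.python.org"),
    ("python tutorial", "foobar.com/page"),
    ("python tutorial", "example.com/learn"),
]
print(click_share(click_log, "iwejhoihfe"))       # {'foobar.com/page': 1.0}
print(click_share(click_log, "python tutorial"))  # foobar.com/page gets 0.25
```

One click on a made-up term captures the whole share, while the same click on a common term is diluted by everyone else's clicks.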

This is how Bing's algorithm may have worked, and I wouldn't call it
intentional copying; flawed maybe, and easily manipulated, just like Google
bombing in the old days.

------
bhavin
One difference between Google's and MS's approaches: while Google's first post
was much more rational (with proofs) and convincing to tech minds, this post
seems more like a politician's attack on Google, with vague and scary-sounding
words like "click fraud". If only some evidence for this "click fraud" had
been supplied!!

------
Kylekramer
This whole debacle is dragging both companies down. Overstatement and
intentionally framing the story by Google, vague answers and changing the
topic by Microsoft.

Though in the end, Google probably gets a net win because they got "Microsoft
copied us" headlines from some major sources with probably little follow up
from the average reader. But the whole situation could have been handled
better.

~~~
georgemcbay
I can't speak for the masses, but this is a net loss for Google in my eyes
because it paints them as being extremely hypocritical IMO.

Keep in mind that we're talking about Google here. The Google of Google News,
the web-corpus assisted translation engine, the Google book scanning project,
wifi scanning, etc. Not that I'm negative on any of those projects, but based
on the amazing degree to which their business piggybacks off the work of
others it does make it difficult for me to take them seriously when they come
out swinging at somebody else for piggybacking off of them, especially when
the piggybacking going on here appears to be fairly minor and incidental.

------
lysium
The post indirectly confirms Google's claims. Otherwise, it would have been
easy for Microsoft to point out where Google is wrong and everybody would
blame Google for being stupid.

Instead, Microsoft chose to attack Google 'personally' with non-substantiated
claims.

------
naner
Just once it would be great if tech companies could lay off the hyperbole and
posturing and just give it to us straight. Bing copying Google? Not really.
Honeypot _attack_? Give me a break.

Here's what appears to be the truth:

The Bing toolbar keeps track of what people click on after doing a search.
Even when it is from sites that aren't Bing. If it is a search for something
obscure (or made up) that Bing has no other data for, then that clickstream
data can affect Bing results for that term. Most of the time, however, that is
just one of thousands of inputs. It is useful data, but not a defining part of
the Bing engine.

This was obviously a huge PR move on Google's part. The information was
released to a massive search engine blog right before a big search engine
event with both Google and Bing in attendance. Bing could have come out of
this shooting straight and looking like the more mature party; instead they
come across like they are unsure of themselves and have been backed into a
corner.

Score one for the bullshit artists at Google.

------
draz
First, I have friends working for Microsoft as well as Google. I like and hate
both companies, for different reasons. Now, to the meat of what I wanted to
get to: "We have brought a number of things to market that we are very proud
of -- our daily home page photos, infinite scroll in image search, great
travel and shopping experiences, a new and more useful visual approach to
search, and partnerships..." Seriously?? Are you going to put "daily home page
photos" as the (first!) point to be proud of? Is it such a technological feat?
Kudos for the rest of the items, but I just found it funny.

------
jmaygarden
The first comment to the blog post is humorous.

    
    
      Bing, "Powered by Google". It does have a nice ring to it.

~~~
deno
Another, quite tongue-in-cheek, humorous app found in comments:

<http://createasearchengine.appspot.com/>

~~~
zzleeper
That's a nice one!!! Over 1000 thousand!

------
BoppreH
"anonymous click stream data"

It can't be only that. The page where the clicks happen must be _parsed_ for
the click to be linked to the query and reused by their search engine.

They transformed: "from google.com/q=X, people tend to click on example.com"

Into: "when searching for X, people tend to click on example.com"

~~~
andyv
They not only parsed the search queries, they also copied Google's results and
added them to their own results.

~~~
BoppreH
Monitoring the clicks and parsing the data should be enough to account for the
results we see.

------
gvb
One really interesting thing about Bing watching Google searches and changing
their page ranking based on users clicking links: it implies that one way to
"SEO" (game) Bing page ranking would be to use Amazon Mechanical Turk, having
lots of Mechanical Turk workers do Google searches and click through to the
target page.

It would be an interesting experiment to estimate how heavily weighted the
clicking is vs. the other 999 inputs. I would think a very obscure page would
be fairly easily manipulated (Google basically showed that with their
"honeypot"), but a more popular page with lots of non-zero entries in the
other 999 inputs would be a lot harder to game.
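
gvb's intuition can be illustrated with a toy linear model, assuming the click signal is simply added to a weighted sum of the other 999 inputs. The weights, signal values and function names are invented; nothing here reflects Bing's real ranking.

```python
# Hypothetical scoring model: a linear blend of one click signal and 999 other
# signals. The weights and signal values are invented for illustration only.
CLICK_WEIGHT = 1.0
OTHER_WEIGHT = 0.1


def score(click_signal, other_signals):
    return CLICK_WEIGHT * click_signal + OTHER_WEIGHT * sum(other_signals)


def click_share_of_score(click_signal, other_signals):
    """Fraction of a page's total score that comes from the click signal."""
    total = score(click_signal, other_signals)
    return CLICK_WEIGHT * click_signal / total if total else 0.0


obscure = [0.0] * 999                # nothing else is known about this page
popular = [0.0] * 899 + [1.0] * 100  # plenty of non-zero evidence elsewhere
boost = 5.0                          # the same click boost applied to both

print(click_share_of_score(boost, obscure))  # 1.0: the clicks are the whole score
print(click_share_of_score(boost, popular))  # about 0.33: diluted by other signals
```

With identical click boosts, the obscure page's score is entirely click-driven, while the popular page's existing evidence shrinks the same boost to about a third of its score.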

------
brudgers
Whether it was _a type of attack also known as “click fraud,”_ or not, it does
appear to be an attack. One can argue that the attack was justified - and one
can argue that it's not - but it still was a deliberate effort to improve page
rank in a way similar to one which Google actively discourages.

The fact that only 7/100 of the honeypots were successful may mean that
Google's motives were more complex and slightly less akin to righteous
indignation, such as determining how and to what degree Bing handles "click
fraud" type attacks. I just have a hard time believing that Google would turn
20 engineers loose on a task that can be so easily automated.

~~~
rflrob
>Whether it was a type of attack also known as “click fraud,” or not, it does
appear to be an attack.

I disagree with describing this as an attack. Based on the kinds of search
terms Google has said they used, they aren't attempting to make the Bing
results any worse. There is simply no good search result for "hiybbprqag", so
Google pointing to the Wiltern Seating Chart is just as good as pointing to
nothing at all, for the purposes of what any user who happened to Google
hiybbprqag randomly would see.

At best, you could make the argument that by revealing that Bing is
susceptible to these kinds of attacks, they are making it easier for others to
attack Bing. It's certainly illegal to rob a bank, but what about walking down
the street yelling, "The bank guards are all gone! The combination to the
vault is 1-2-3-4-5!"

~~~
brudgers
_"what any user who happened to Google hiybbprqag randomly"_

But it wasn't any user, it was a Google engineer and the circumstances were
not ordinary. Those engineers were actively trying to get the results into
Bing over the course of at least two weeks (12/17 to 12/31), according to
searchengineland.com.

~~~
rflrob
So Google, by doing this, has created up to a hundred or so bogus entries in
the Bing database, to which I say, "big whoop". If Google had to pay damages
for the extra infrastructure that Microsoft requires, the stamp to mail the
check would be more than the check itself.

What I would classify as an attack is if, through a similar process, Google
introduced bogus entries for _real_ search terms, which they don't claim to
have done.

~~~
brudgers
> _"which they don't claim to have done."_

IANAL, but if they have attacked Microsoft, that is probably a prudent legal
strategy. Keep in mind that Google claims to have discovered this episode by
noticing similarities between their results and Bing's, which, although it
suggests an active program to monitor Bing, is hardly surprising.

On the other hand, however plausible it appears, their claim about how they
discovered it smells a bit of BS. It strains belief that Google never looked
at the packets the Bing toolbar was sending home during browser compatibility
testing. They lost their virginity a long time ago.

------
thankuz
Not only that, there's too many hypothetical, open ended questions - they
don't really address anything. Instead they leave it to the reader to imagine.
Not impressed.

------
dooq
Bing is admitting their results are affected by click fraud then??

~~~
pessimist
Not quite - they are admitting that click fraud on google.com affects their
results!

~~~
beej71
And those of us with our own DNS servers know exactly where google.com is.

I wonder how smart the Bing Toolbar really is.

~~~
Natsu
You probably don't need a DNS server. /etc/hosts should be plenty. Maybe
they're smarter than that, and I think they are if you try to mask Microsoft's
own domains, but I doubt they check to see if Google.com has been changed,
though that could certainly change.

------
425
To me, it seems quite simple. Someone at Google used the Bing toolbar and IE8
in the following way:

a) Search from the Bing toolbar for hiybbprqag.

b) Bing does not display anything.

c) The Bing toolbar remembers that term internally.

d) Go to www.google.com and type hiybbprqag.

e) Google displays one intentionally seeded link.

f) The Google engineer clicks on that link.

g) The Bing toolbar notices that shortly after the user searched for hiybbprqag,
s/he clicked on the seeded link.

h) The Bing toolbar sends that piece of data to the mothership: there is a
relationship between hiybbprqag and the seeded link.

...

i) After some time, the Google engineer searches for hiybbprqag from the Bing
toolbar.

j) Bing looks up its index, and there is only one piece of evidence regarding
the term 'hiybbprqag'. It is not much, but it is all it has, so it presents it
to the user.
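The steps above can be sketched as a toy server-side aggregator. This is a minimal illustration under stated assumptions, not Bing's implementation: `report_click`, `lookup`, and the in-memory `associations` store are hypothetical names, and the seeded URL is a placeholder. It shows why a single reported click is enough to surface a result when nothing else is known about the query.

```python
from collections import Counter, defaultdict

# Hypothetical store: normalized query -> how often each URL was clicked
# shortly after that query was searched (steps g-h). Not Bing's real design.
associations = defaultdict(Counter)


def report_click(query, clicked_url):
    """Record one (query, clicked URL) observation sent home by the toolbar."""
    associations[query.lower()][clicked_url] += 1


def lookup(query, k=10):
    """Steps i-j: return whatever evidence exists, most-clicked URLs first."""
    counts = associations.get(query.lower())
    if counts is None:
        return []  # no evidence at all for this query
    return [url for url, _ in counts.most_common(k)]


# One seeded click is the only evidence for a nonsense term, so it wins.
report_click("hiybbprqag", "http://example.com/seeded-page")
print(lookup("hiybbprqag"))  # prints ['http://example.com/seeded-page']
```

Nothing in this sketch knows or cares which site produced the clicked link; any Google-specific handling, if it exists, would have to be an extra step.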

Google's accusations strongly imply (using words like 'stealing') that Bing
simply scrapes google.com for results, while the reality is not so simple.

Now, the bigger problem for Microsoft is that it employs VPs like Yusuf, who
cannot express simple facts and easily fall into corporate-speak.

UPDATE: Here is a good summary of what I wanted to say:
<http://directmatchmedia.com/google-proves-bing.php>

------
ecaron
Article appears to be down; fortunately, you can still access it from Google's
cache:
[http://webcache.googleusercontent.com/search?q=cache:www.bin...](http://webcache.googleusercontent.com/search?q=cache:www.bing.com/community/site_blogs/b/search/archive/2011/02/02/setting-the-record-straight.aspx)

------
blauwbilgorgel
From a logical debating point of view, this post is really weak. If I were an
engineer at Google, and I caught Bing at this, I would be upset, to the point
I couldn't sleep. This is one of the lowest things you can do in academia, and
even though this is business, the accomplishments of the engineering team at
Google are at the highest point of information technology. As a businessman I
could have accepted this; as a hacker/problem solver, I could not.

"It was interesting to watch the level of protest and feigned outrage from
Google. One wonders what brought them to a place where they would level these
kinds of accusations." Feigned outrage... one wonders... If you really do not
understand this, you are not a hacker, you are not an engineer. Maybe it was
your actions that prompted two valued Google engineers (Singhal and Cutts) to
make these accusations?

"Before we explore that (so are you going to prove this point later on?), let me
clear up a few things once and for all.

We do not copy results from any of our competitors. Period. Full stop." The
end result is a copy of the results of a competitor. If you borrow it, steal
it from your users, copy it from a guy in a raincoat with binoculars,
observing Google results and jotting them down -- it doesn't matter! OK, let's
say you don't copy results. Your index still contains copied results _for a
fact_.

"We have some of the best minds in the world at work on search quality and
relevance, and for a competitor to accuse any one of these people of such
activity is just insulting." Bing copied Google's fake results. Your engineers
either deliberately or accidentally made Bing copy those results. If you want
less insulting: Google claims your minds are the best at copying others.

" We do look at anonymous click stream data as one of more than a thousand
inputs into our ranking algorithm. We learn from our customers as they
traverse the web, a common practice in helping to improve a wide array of
online services. We have been clear about this for a couple of years (see
Directions on Microsoft report, June 15, 2009). " That will make me trust your
index a lot less. If engineers clicking manually on some results can make a
difference in ranking, imagine what a botnet or dedicated spammer can do.

" Google engaged in a “honeypot” attack to trick Bing. In simple terms,
Google’s “experiment” was rigged to manipulate Bing search results through a
type of attack also known as “click fraud.” That’s right, the same type of
attack employed by spammers on the web to trick consumers and produce bogus
search results. What does all this cloak and dagger click fraud prove? Nothing
anyone in the industry doesn’t already know. As we have said before and again
in this post, we use click stream optionally provided by consumers in an
anonymous fashion as one of 1,000 signals to try and determine whether a site
might make sense to be in our index. " Ad hominems: honeypot, trick,
"experiment" (in scare quotes), rigged, manipulate, attack, click fraud,
attack, spammers, trick consumers, bogus, cloak and dagger, click fraud,
"prove? Nothing." If you need those words in a paragraph to describe how you
got caught with your hand in the cookie jar, then you have already lost.

" Now let’s move the conversation to what might really be going on behind the
scenes. " Yes, move the conversation away from your duplicate results, the key
issue at hand.

" Bing was launched nearly two years ago to break new ground and help move
the search industry in new directions. We have brought a number of things to
market that we are very proud of -- our daily home page photos, infinite
scroll in image search, great travel and shopping experiences, a new and more
useful visual approach to search, and partnerships with key leaders like
Facebook and Twitter. If you are keeping tabs, you will notice Google has
“copied” a few of these. Whether they have done it well we leave to customers.
But more importantly, we take no issue and are glad we could help move the
industry to adopt some good ideas. " _copies marketing paragraph_. Rebuttal:
Google didn't use its toolbar to "copy" your pictures and add those to their
own background.

" At the same time, we have been making steady, quiet progress on core search
relevance. In October 2010 we released a series of big, noticeable
improvements to Bing’s relevance. So big and noticeable that we are told
Google took notice and began to worry. Then a short time later, here come the
honeypot attacks. Is the timing purely coincidence? Are industry discussions
about search quality to be ignored? Is this simply a response to the fact that
some people in the industry are beginning to ask whether Bing is as good or in
some cases better than Google on core web relevance? " Bing is absolutely
horrible for international searches. You serve up Wikipedia pages in 40
different languages for some terms. But you are saying that Google
deliberately timed your outing? That is like saying your brother saw you
stealing cookies, but waited till he got a bad report card before telling mom.
And I thought a bruised academic ego was childish. So your rebuttal amounts
to: But mom! He got a bad report card! And that while you are struggling to
make your own grades.

" Clearly that’s a question that will continue in heated debate as long as
there is a search industry. Here at Bing we will continue to focus on our
customers, and try to provide some great innovation for consumers and the
industry. " That's not the question and certainly not core to this case, which
is your duplicate index. This is unprecedented in 10 years of search engine
competition, and you want to make it about the timing? Let's make it about the
1000 ranking factors vs. the 200 ranking factors from Google. Let's make it
about the million other results this could be happening for, but that are
impossible to prove with a "honeypot attack" and a specific keyword. Let's make
it about search relevance, how we are spoiled with relevant results, yet
complain about the most relevant index on the web, one you use to calibrate
long tail terms and spelling corrections.

~~~
hackinthebochs
>The end result is a copy of the results of a competitor. If you borrow it,
steal it from your users, ...Your index still contains copied results for a
fact.

This is pure nonsense, and so is the rest of your post. [edit: The business
of] search is not about academia, or algorithms, or engineering; it's about
providing users with relevant responses to their queries. The best way
currently to do that is through crowdsourcing--which is exactly what Google
does through Pagerank. Now Bing took that a step further and creates an
association between what a user searches for and what they end up finding
relevant: This is _exactly_ what we currently call "search". Whether that
initial directory of sites came from Google or from some hand culled list is
inconsequential. This is not "copying", this is doing exactly what they should
be doing.

The point is Google doesn't own the link, or the form input, or the fact that
many users clicked on that link after posting that term. The only thing they
could be reasonably construed to "own" are the relative rankings themselves.
Bing did not "copy" this.

~~~
moultano
>Bing did not "copy" this.

People only click on the results that are there. Google put them there. People
tend to click on results in roughly the order they appear on the page, which
Google also determined. Getting the data through an indirect means does not
insulate them from culpability.

~~~
hackinthebochs
In my opinion it does. The whole purpose of this technique is to get better
results than just an algorithmically generated list. If people are simply
clicking on the first result, then it's not accomplishing what they want.
Having a mirror image of Google's results is _not_ what they want (otherwise
why would anyone switch?). We all know how gamed the first few results on
Google are these days anyways.

~~~
moultano
> _otherwise why would anyone switch?_

Marketing and flashy features, two areas where bing has been investing a lot
of money.

~~~
hackinthebochs
The value of a search engine isn't any particular result, or any set of
results. It's the quality of _all_ the results over time. If Microsoft's
algorithm picks up a _tiny_ amount of signal (ahem, 1 of 1000) indirectly
from Google's results, this does nothing to artificially inflate their
position off of Google's back. There's nothing inherently wrong about using
user signal for this.

There are many sites on the internet that generate a set of links based on
form data. Google is one of many in that respect. This technique is effective
in gathering search information on this "deep web". Special-casing Google
positively or negatively is the wrong approach here.

~~~
stanleydrew
The problem is that the small amount of signal that Bing picks up from Google
carries more weight for the more rare associations that Google has worked so
hard to help users find. Spelling mistakes stand out here quite a bit.

~~~
hackinthebochs
I don't really see this as a problem. This is in fact what search is all
about. Some human makes an association between A and B, and a major search
engine picks up on that. Google has algorithms for picking up on these
associations, and so does Bing. It just so happens that some of those
associations have passed through Google's servers before reaching Bing. The
point is that Google is not _creating_ these associations. They're
algorithmically picking them up from others, just as Bing is doing.

Btw good job with the disagree downvotes guys.

------
contextfree
Apart from his flustered tone, Mehdi said nothing that Harry Shum didn't say
in his post yesterday. I don't know why he bothered posting this.

------
bluesnowmonkey
Why are people suddenly using the word "honeypot"? A honeypot is something
intended to be broken into. Nobody broke into anything in this story. _There
were no honeypots involved._ You might call the search terms "tracers" or
something.

Another thing that did not occur is click fraud. That term already means
something, it involves pay-per-click advertising, and it doesn't have much to
do with Google's experiment.

Don't let this guy distort the language we use to discuss the issue.

------
bmccormack
Regardless of whether or not one agrees with Yusuf Mehdi's conclusions, I can
certainly understand his passion in defending his team. Google's accusations
yesterday _were_ very accusatory and polemic (e.g. "We look forward to
competing with genuinely new search algorithms out there—algorithms built on
core innovation, and not on recycled search results from a competitor."
ouch).

------
sonoffett
I mentioned this in another article, but I have yet to see evidence that
demonstrates that bing is parsing google's queries either from a) the url of
the referrer, or b) directly on the google results page, yet many people on HN
have asserted as much, generally exhibiting a false dilemma fallacy.

An alternative is that the bing toolbar is collecting 2-tuples
"<search_string_in_toolbar, next_href_clicked>" and sending these back to
microsoft (regardless of the search provider selected). I would consider this
"click stream data", and it seems to agree with statements in the above article.
Additionally, since it doesn't involve parsing hrefs it seems like the easier
solution (and the one I'm going to _tentatively_ assume by Occam's razor).
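The 2-tuple hypothesis can be sketched as client-side logic. This is a hypothetical model of the toolbar, not its actual code: `ToolbarSession`, `on_toolbar_search`, `on_link_clicked`, and `outbox` are made-up names. The point it illustrates is that the pairing never inspects which search provider served the page or parses any result URL; it just pairs the last toolbar query with the next href clicked.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolbarSession:
    """Hypothetical toolbar state; collects (toolbar query, next href) pairs."""

    pending_query: Optional[str] = None
    outbox: list[tuple[str, str]] = field(default_factory=list)

    def on_toolbar_search(self, query: str) -> None:
        # Remember the query typed into the toolbar box, whatever the provider.
        self.pending_query = query

    def on_link_clicked(self, href: str) -> None:
        # Pair only the first click after a toolbar search, then reset.
        if self.pending_query is not None:
            self.outbox.append((self.pending_query, href))
            self.pending_query = None
```

Under this model, a control group that typed queries directly into the google.com box (bypassing the toolbar) would produce no tuples at all, which is exactly the distinction raised in the next paragraph.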

Just to be clear, it does _not_ have to be the case that the bing toolbar is
collecting data entered directly into the google search box. Indeed, google
could have tested for this by having a control group where they entered search
queries without using the toolbar search box; however, from the details
released so far, we cannot conclude they did this (read: google hasn't released
enough specifics about the test they conducted, and it's far from being
reproducible with the current details).

Additionally, why do many in the HN community think this is google-specific? I
understand that google isn't claiming "bing copied _just_ google", but that
seems to be the consensus within HN and the basis of the arguments of foul play
(see the countless posts asking if there exists code that special-cases google:
"if string contains 'google' then..."). I'd like to see a test where users entered a
pathological string in the toolbar with an alternative search provider
specified (EDIT: not bing or google), clicked on a low ranking result (low
ranking, or not appearing at all on bing for those keywords), and see if it
pops up higher on bing at a later time.

------
fady
after reading the first few paragraphs, i closed the tab.. a waste of space.
all PR crap.

------
lazugod
It isn't clear whether Bing was taking special interest in click data off of
Google searches, or giving it special weight in its own results.

Google seemed to think so, but even if Microsoft was treating the data they
received as neutrally as possible, it doesn't change the request made of them.
Google wants an exemption.

------
yalogin
Going by what MS says, it's clear why Google's experiment worked. They came up
with very unusual queries for which, by definition, there will be no data to
generate search results. So out of the 1000 inputs, only the one input from
Google is present, hence the output.
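This argument can be made concrete with a toy linear ranker. The three signal names and their weights are invented for illustration; Microsoft has not published how its roughly 1,000 inputs are combined, and a real ranker is unlikely to be a plain weighted sum. The sketch shows that a tiny-weight signal decides the ranking when every other signal is zero, while it barely matters for a query with plenty of other evidence.

```python
# Hypothetical weights: the clickstream signal is deliberately tiny.
WEIGHTS = {"text_match": 5.0, "link_graph": 3.0, "clickstream": 0.1}


def score(signals, weights):
    """Weighted sum of a page's signal values (unknown signals weigh 0)."""
    return sum(weights.get(name, 0.0) * value for name, value in signals.items())


def rank(candidates, weights):
    """Order candidate URLs by score, dropping pages with no evidence at all."""
    scored = {url: score(sig, weights) for url, sig in candidates.items()}
    ordered = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return [url for url, s in ordered if s > 0]


# Nonsense query: the clickstream signal is the only one present, so it wins.
rare = {"http://example.com/seeded-page": {"clickstream": 1.0}}

# Ordinary query: text and link signals dominate the same clickstream value.
common = {
    "http://example.com/a": {"text_match": 1.0, "link_graph": 1.0},
    "http://example.com/b": {"clickstream": 1.0},
}
```

`rank(rare, WEIGHTS)` returns only the seeded page, while `rank(common, WEIGHTS)` puts `/a` (score 8.0) ahead of `/b` (score 0.1).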

~~~
blauwbilgorgel
Yes, but then this also means that for _every_ term in Bing's index, "how it
ranks on Google" is 1 of the 1000 factors. It takes a very obscure term to
show it, but we shouldn't conclude from that that it happens _only_ for obscure
terms.

BING!! uses GoogleRank as a ranking factor. Perhaps they even adjust the
weights of this factor, according to what the press/blogosphere is writing
about Google's index quality...

------
nigelsampson
What I'm interested in: does the Google Toolbar send back the click stream
when you use Bing?

~~~
noibl
Feel like experimenting?

------
bluelu
All I can say is that google engineers must be really jealous of not coming
up with the same idea of ranking sites based on what people enter into search
boxes and what links they click on. Sounds like Pagerank 2.0 to me.

~~~
slig
Yes, but from what other relevant search engine would they copy that?

ps: FYI, Google does record the clickstream within their own search results.

------
jpwagner
Great response. I wouldn't have used the term "click fraud," but the analogy
is easy to grasp (i.e., that the "test" purposely attempts to strengthen an
association between things that are not associated.)

~~~
lysium
How is that a great response? It does not address the allegations at all, just
claims the allegations are insulting.

The only great thing I see is that the blog post actually confirms what Google
said. Because otherwise, it would have been easy for Microsoft to put its
finger on the point where Google is wrong.

~~~
jpwagner
I think this response is convincing. Period. Full stop.

~~~
jongraehl
Why would anyone upvote a completely ambiguous comment (what is "this
response")? You thought the ambiguity was intentional?

~~~
jpwagner
Were you also confused by the previous comment when it asked "How is that a
great response?"

~~~
memetichazard
"How is that" clearly refers to Bing's response. I'll admit that I interpreted
your post "I think this response is convincing" as referring to lysium's
response to your post. It may have been clearer if you had written 'the
response', or more directly, 'Microsoft/Bing's response'.

~~~
lysium
Maybe jpwagner was just joking from the very beginning and we didn't get it?

------
hexis
Bing seems to want me to sign in to read this. Am I doing something wrong?

------
skarayan
It would be interesting to compare the search results of a million or so
keywords between Google and Bing using their APIs. I would like to see some
numbers behind the overlap.
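The proposed comparison needs some overlap metric. The sketch below assumes you have already fetched a ranked list of result URLs per keyword from each engine (no particular search API is assumed here) and uses Jaccard similarity of the top k results, averaged over the keywords both engines answered. That is one reasonable choice among several; a rank-aware measure such as rank-biased overlap would also account for position.

```python
def jaccard_at_k(a, b, k=10):
    """Jaccard similarity of the top-k URLs from two ranked result lists."""
    top_a, top_b = set(a[:k]), set(b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


def mean_overlap(results_a, results_b, k=10):
    """Average top-k Jaccard over keywords present in both result dicts.

    results_a / results_b map keyword -> ranked list of URLs.
    """
    shared = results_a.keys() & results_b.keys()
    if not shared:
        return 0.0
    return sum(jaccard_at_k(results_a[q], results_b[q], k) for q in shared) / len(shared)
```

For a million keywords you would also want the per-keyword distribution, not just the mean, since the controversy is about rare queries where overlap may be unusually high.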

------
motters
This doesn't seem like a terribly convincing response from Bing. The
"honeypot" test seems like a pretty conclusive result to me.

------
jeisc
'Imitation is the sincerest form of flattery.' But, 'Flattery is the worst and
falsest way of showing our esteem.' Jonathan Swift

------
ItsBilly
The article is titled "Setting the record straight", but it is not fair to say
that "Bing sets the record straight".

------
thenonhacker
Google, Google. Stop crying. Just continue copying Bing. Infinite scroll
images. And large image backgrounds on the search page. Copy the Maps
features, too.

~~~
Groxx
I boggle that people think infinite scroll came from (X). People have been
hacking infinite scroll onto pages that don't support it for _years_,
frequently through Greasemonkey scripts, and there are a few very powerful
extensions for essentially every browser out there. If you want to claim
Google copied it from Bing, Bing copied it from creative users. I highly doubt
there has been _any_ significant amount (if any _at all_) of original UI/UX
creativity on _any_ large website before someone else made a quick demo on
their personal site or hacked it on top of existing sites through browsing
tools.

