
Bing: Why Google’s Wrong In Its Accusations - HardyLeung
http://searchengineland.com/bing-why-googles-wrong-in-its-accusations-63279
======
ChuckMcM
Given that Bing clearly uses the clickstream to influence its results pages,
at what point do SEO folks run farms of machines in a network which resolves
'google.com' to a process that returns a fixed set of results for a given
query?

Imagine this scenario: I create 10,000 VM instances of Windows running IE8
with the Bing toolbar. I create a local host to 'stand in' for Google such
that it emulates the actual Google site (one could even use scraped Google
content) and for my 'target' query it returns my spammy results, and then my
VM machine clicks on one of the spammy links.

It seems that Google's sting worked because their queries had small return
rates, but with some resources it would seem a viable way to inject SEO love
right into the bloodstream of Bing's ranking algorithm. As I see it, a whole
new front just opened up for spammers.

--Chuck

~~~
encoderer
This is like gaming Google Suggest. You're going to need a lot of IPs because
they're not idiots in Redmond OR Mountain View and it's pretty simple to
detect behavior like that coming from a single IP.

If the spammer gets 10,000 IPs then more power to him. But there are probably
more cost-effective ways of black-hat SEO.
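
As a rough illustration of the single-IP detection mentioned above, here is a minimal sketch. It assumes a hypothetical click log of `(ip, query, url)` tuples; the function name and both thresholds are invented for the example and do not describe how Bing or Google actually filter clicks.

```python
from collections import Counter


def suspicious_ips(clicks, max_share=0.2, min_clicks=5):
    """Flag IPs that supply an outsized share of clicks for one (query, url) pair.

    clicks: iterable of (ip, query, url) tuples (hypothetical log format).
    max_share and min_clicks are arbitrary illustrative thresholds.
    """
    per_pair = Counter()      # total clicks per (query, url)
    per_pair_ip = Counter()   # clicks per (query, url, ip)
    for ip, query, url in clicks:
        per_pair[(query, url)] += 1
        per_pair_ip[(query, url, ip)] += 1

    flagged = set()
    for (query, url, ip), n in per_pair_ip.items():
        share = n / per_pair[(query, url)]
        if n >= min_clicks and share > max_share:
            flagged.add(ip)
    return flagged
```

A check like this only catches concentration on one address. Clicks spread thinly across many IPs would pass it, which is the objection raised in the replies.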

~~~
Natsu
The spammers already control botnets of that size. They could duplicate the
clickstream data with a few lines of code (it's a simple HTTP request feeding
Bing the clicks; the only sticky parts are deciding which Bing IP to send them
to and harvesting the per-computer ID it sends to Bing).

In short, this gives them a new service to sell when they rent out their
existing botnet.

~~~
encoderer
Then by your logic, they're already gaming Google Suggest. And with Instant
Search, that's the holy grail. Who needs to optimize placement on a given
keyword if you can just nudge users towards your desired keyword?

Or, what's more likely, is that the respective companies are aware of this
very obvious vector and have attended to it.

~~~
Natsu
What makes you assume that no one is attempting that?

Botnets are run off of the computers of ordinary, clueless folks, who might be
real Bing users submitting real data in addition to whatever the botnet sends.

I've already linked to an analysis of the actual protocol data submitted by
the toolbar and I can see obvious ways to copy and fake it.

If you use only the computers in your set that already have Bing, copy that
unique ID and figure out what IP they're sending it to, your data will be
identical to that sent by a real user.

At that point, you have to harden the protocol and hope it stands up to
reverse engineering, or start spam filtering it (if they weren't already).
Maybe they can do a good job of that, but it really lowers the quality of the
data they're getting once enough people are feeding them garbage.

SEO types already set up thousands of spam websites to game PageRank. I don't
see why this would be any different.

~~~
ChuckMcM
I suspect that they are, and when you can simulate a million users doing
whatever activity you want them to be doing, then Google and lots of other
stuff will be affected.

Of course my favorite was the Amazon hack of repeatedly putting some item in a
shopping cart and then adding in a bit of pr0n (or some other weird
combination) so that Amazon would put the other item in their auto suggest
product combination.

------
jdp23
Very clear writeup. I thought this was an excellent point:

"PR is not leading this dispute. It’s following behind. This dispute is
happening because real engineers at Google felt there was a deep injustice
going on — as reflected in the quote from Google’s Amit Singhal in my original
article. I’ve known Singhal for years. I’ve never seen him speak like this
before. It’s not because Google PR told him to. It’s because he’s
fundamentally bothered by what he’s seen — as are members of his team.

This dispute is also happening because real engineers at Bing feel there’s a
deep injustice going on — as reflected in the quote from Harry Shum above.
Bing’s worked incredibly hard to build a search engine that’s worthy of
respect. Now here’s Google suggesting that Bing has simply cheated its way to
relevancy."

The conversation with moultano on a thread a couple days ago was a good
example of this. <http://news.ycombinator.com/item?id=2177354>

------
flatline
From this article:

"We’re not going to stop using that signal, unless it messes up relevancy. It
doesn’t make sense to exclude that large amount of traffic from our usage
set," Weitz said.

From the TechCrunch article a few days back [1]:

"Google had employees log onto ms customer feedback system and send results to
Microsoft."

(to which Matt Cutts replied: normal people call that "IE8")

Unlike many others, I do not think this is a cut-and-dried issue, but the
squirrelly responses from the MS folks on this have really made me think they
are just up to no good, true to the form of so much of their corporate
history. Arguing with vague technical terms and ad-hominem attacks is not a
good way to convince a highly technical crowd of your virtues.

[1] <http://techcrunch.com/2011/02/01/bing-google-fight/>

~~~
vash_stampy
From the article: "Meanwhile, I’m on my third day of waiting to hear back from
Google about just what exactly it does with its own toolbar. Now that the
company has fired off accusations against Bing about data collection, Google
loses the right to stay as tight-lipped as it has been in the past about how
the toolbar may be used in search results."

And you don't find Google's sudden silence suspicious? This is the same
company that invited the author of the original article to their headquarters
the day after he wrote it.

~~~
varjag
You suggest Google Toolbar collects Bing search results?

~~~
vash_stampy
Inadvertently like the Bing toolbar. It has been shown to log every pageview
like Bing toolbar. Whether they filter out the pageviews when a user goes to
Bing no one knows besides Google. (i.e., the article below shows they definitely
do not filter it out at the application level as a Yahoo query is sent back by
the Google Toolbar and Yahoo now runs on Bing).

<http://www.benedelman.org/news/012610-1.html>

~~~
Herring
Well MS now knows how to perform this sting, so I'm sure we'll hear if their
results show up on google. Until then, you have zero evidence.

~~~
vash_stampy
Sigh....Two things: 1) Performing this sting and it working or failing
wouldn't prove if they did it in the past...The cat's out of the bag. Bing or
Google could have easily changed their systems. (and I'm sure the way the two
incorporate clickstreams into their search engine is VASTLY different)

2) More importantly, why would Microsoft need to prove google is also using
clickstreams? Microsoft does not believe using clickstreams is wrong.

~~~
zacharypinter
> 2) More importantly, why would Microsoft need to prove google is also using
> clickstreams? Microsoft does not believe using clickstreams is wrong.

If they can make Google a hypocrite, the issue goes away. That's motivation
enough for Bing to investigate the Google side (though I doubt they'd find
much).

------
dave777
I would wager most of the people commenting don't work in search or on how to
use data mining to improve search. I think it is useful to see the perspective
of someone that actually thinks long and hard about building these types of
systems before commenting on whether it is fair or innovative or just plain
stealing. <http://hunch.net/?p=1660>

------
joblessjunkie
So Microsoft's defense is that they are not copying Google in particular --
they are copying _every_ search engine?

~~~
gniv
Exactly. <http://www.bonkersworld.net/2011/02/04/copying/>

------
moultano
So it somehow eluded them that most of the searches done anywhere are done on
Google? I don't buy it. If you have a "search signal" it's going to be
effectively a "Google signal," and they aren't being honest if they contend
otherwise.

~~~
nollidge
They didn't say "web search", they said "search". Nearly every website has a
search box. Most of my searches in any given day are not on Google - they're
on Amazon, Wikipedia, Netflix, StackOverflow, Facebook, etc. Bing's toolbar is
watching all of those.

~~~
rbanffy
Most of the searches I do are in the Firefox search field. If I want to search
within a specific site, I use the "site:" parameter to restrict results by
domain.

~~~
endtime
The number of people who use that feature is not of statistical significance.

~~~
nowarninglabel
And your data showing is....where?

I can of course say, I use this feature, and of course point to others who do,
but I don't have data to say it is significant. However, you have stipulated
it is not, but have not provided any data to prove it.

~~~
endtime
I don't have any data showing that, but come on. Think about alllll the people
out there who log into Facebook by Googling "www.facebook.com". That's your
average internet user, not you and me.

~~~
nowarninglabel
I'm not sure how much you interact with the "average internet user", but when
I do help desk roles, as well as my role working for a floating university, I
interact with hundreds of non-technical end users (Given my job history, I've
interacted with thousands over the years). What I can tell you is that basing
any theory about statistical anomalies on your preconceived notions without
any hard data to back it up is usually an exercise in failure.

~~~
endtime
And how many of the people you talk to do you think use the site: operator?

------
rwaliany
I worked on the Bing AI team (pre-launch); they didn't copy Google. If
anything, both search engines copy Wikipedia. I believe that this was the
click-stream research done by an intern researcher from UBC.

------
zzleeper
I think there is a DEEP flaw in this argument:

"Here’s another one. This time, it’s a misspelling of “bombilate,” a rare word
I cited above. I searched for “bombilete,” instead.."

In essence, they say that Google only pointed to the typo, but Bing redirected
to it. Thus, "it’s very unlikely it figured this out from Google". For me,
making that argument is insulting to the reader's intelligence.

~~~
jeroen
You missed part of the argument:

"But it’s very unlikely it figured this out from Google, given that for the
misspelling, Google doesn’t auto-correct the word nor provide the same
answer."

Bing's nr 1 result is not in Google's results at all.

~~~
moultano
Google has the spell correction. Microsoft could certainly be harvesting that
data directly. (Which appears to be what they are doing on that query.)

~~~
indyank
Very true... they might be doing this as part of their "cost cutting"
exercises. Hasheem had mentioned this, and I am not sure what made him
mention "cost cutting" in that panel discussion where Matt and the Blekko CEO
participated.

------
vash_stampy
Round and round the speculation wheel goes! Where it stops, no one knows!

------
grhino
Using what people click on from search results from competing search engines
sounds like copying the competition to me.

It sounds like they automated the process of piggy backing off the work of
other search engines, not just Google.

They should exclude all competing search engines from this process.

------
muro
From the article (about spell correction):

> Well, above is the same situation where Bing gets a misspelled word right —
> a link to a definition of the correctly spelled word at the top of the list.
> But it’s very unlikely it figured this out from Google, given that for the
> misspelling, Google doesn’t auto-correct the word nor provide the same
> answer.

Perhaps that is because people first click on the spell correction and then on
the result, so maybe they don't yet copy the spell correction as well, only the
results. I think this is even stronger evidence that they copy the results, not
weaker.

------
gojomo
Doesn't this passage suggest that Google ignores robots.txt in its own cross-
comparisons of search relevancy:

 _Google said in October that it found statistical evidence that Bing suddenly
became more Google-like. More listings in the first page of results of both
search engines seemed to match, as did more of the number one results._

How would you get statistically significant results for such things, over
time, without constant automated probe queries against Bing?
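
To make the kind of comparison described in that passage concrete, here is a minimal sketch of how first-page overlap and matching number-one results could be measured between two engines. The input format (a dict from query to ranked URL list) and the function name are assumptions for illustration; nothing here reflects how Google actually ran its analysis.

```python
def overlap_stats(results_a, results_b, k=10):
    """Compare two engines' rankings over the queries they have in common.

    results_a, results_b: dict mapping query -> ranked list of URLs
    (hypothetical input format). k is the first-page cutoff.
    """
    shared = [q for q in results_a if q in results_b]
    if not shared:
        return {"queries": 0, "top1_match_rate": 0.0, "mean_topk_overlap": 0.0}

    top1_matches = 0
    overlap_total = 0.0
    for q in shared:
        a, b = results_a[q][:k], results_b[q][:k]
        if a and b and a[0] == b[0]:
            top1_matches += 1
        # Fraction of the top-k slots that point at a URL both engines returned.
        overlap_total += len(set(a) & set(b)) / k

    n = len(shared)
    return {
        "queries": n,
        "top1_match_rate": top1_matches / n,
        "mean_topk_overlap": overlap_total / n,
    }
```

Tracking these two numbers over time for a fixed query sample is one way a sudden shift could show up, and it does require issuing the same queries to both engines repeatedly.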

I think such probes are both legal and wise... but Google should drop the
pretense that robots.txt is a sacred barrier across which no analysis can be
done, no matter how indirect or for what purpose.

Also, I'd wager at some time in its history – if not constantly even today –
Google has shown panels of users results from Google and its competitors in
various combinations – side-by-side, with and without branding, intermixed
randomly – and used their reactions to detect areas where the competitors are
doing well, and Google could improve.

Further, either human eyes or algorithms then tried to determine adjustments
to close any gaps in user satisfaction. The net effect of any such process is
– surprise, surprise! – leveraging strengths of _other_ engines to patch
weaknesses in Google. This is normal, expected behavior by any serious search
competitor.

~~~
Jabbles
I think there is a major difference between identifying areas for improvement
and copying a specific search result from a competitor.

~~~
gojomo
I don't see it as that different. If the panel suggests a competitor's top-10
result is strongly preferred, and then Google examines why they didn't have
that in their top-10, and tinkers with their weightings until they do, then
they've used a competitor's output signals to train their own systems.

~~~
moultano
>I don't see it as that different.

It is _very very_ different. You can't tinker with the weighting of something
that doesn't exist, and if you don't have the necessary data, no amount of
reweighting is going to improve things. Microsoft in this case no longer needs
to come up with any data of their own, they can just use clicks on Google as a
proxy for any combination of signals.

~~~
gojomo
Does Google use clicks from anywhere on the web other than their own sites as
a ranking signal?

~~~
moultano
I don't have an exhaustive knowledge of everything we do, and I couldn't tell
you if I did.

Amit has made it _very_ clear however, that we will _never_ do anything that
would cause us to directly or indirectly copy a competitor's search results.

~~~
gojomo
It's clear to me Google wouldn't intentionally do something identical to what
Bing's done.

But given what Googlers can't say, and don't even know about what other groups
within Google are doing, it's not ' _very_ clear' to me that you aren't
already doing _very very_ analogous things, with regard to every other site on
the internet.

You've got the data; you're allowed to use it by your privacy policy; you've
got the rationalizations handy. ("Sites didn't block us; fully-informed users
opted-in; this is a crucial way to fight the manipulators; it's only helping
us weight things we already found by other means; etc.")

Amit's not made anything clear to me, with his finessed "put any results"
wording. Danny Sullivan picked this up too, as he remarks in the headlined
article:

 _Google’s initial denial that it has never used toolbar data “to put any
results on Google’s results pages” immediately took a blow given that site
speed measurements done by the toolbar DO play a role in this. So what else
might the toolbar do?_

There's wiggle room in the definitions of 'copy' and 'competitor' in your
'never' promise, too. Is it OK if Google Toolbar data hoovers up implied
editorial-quality signals from user navigation on every site that isn't a
'competitor'? (And given Google's size, what site isn't a competitor in some
respects for audience share?) Is it 'copying' if your use of clicktrails makes
a preexisting result move from #11 to #9 after you observe it satisfying
people in other browsing sessions? Move from #99 to #2?

(Has the effect of any of Google's competitive analysis ever resulted in a
single result moving closer to the position, higher or lower, that it had at a
studied competitor? Some people could call that 'copying'.)

Maybe none of the clickstream sources Google uses stick out as a dominating
factor because Google has so effectively "commoditized its complements" – and
no one other entity (except maybe Facebook) has access to as much clickstream
data as Google does, simply from its own sites.

Given that, it seems a little convenient that Google's standard is "every
aggressively creative use of behavioral trails that led up to our 70%-90%
share dominance was OK, but from now on let's be really rigorous about letting
others observe our info-effluents."

~~~
moultano
For more official policy you'd have to ask either Amit or Matt. I can't speak
for the company here.

Speaking only for myself and only on the ethics, I generally feel that any
site that allows itself to be indexed is pretty happy with Google (or bing)
doing whatever they can to rank it better. Even with the link data that sites
provide, you can add rel=nofollow to the links if you don't want search
engines to use them but still want your pages indexed (yelp has done this for
instance.)

For me that's the ethical boundary. Sites have various ways of indicating
their wishes, and that ought to be respected in spirit beyond the technical
details.

Legally, the technologies that make the internet work all rely on the idea of
fair use, so it is very important whether something is "fair."

~~~
gojomo
I've seen no statement that Google throws out Toolbar (and other) clickstream
data for sites/pages that Googlebot can't visit (which includes not just
robots-precluded but also login-required pages). Not that I think you should
throw such data out; that's not what robots.txt was meant for, and the user
arguably has more claim to that interaction trail than the site. But that
seems the standard you're suggesting.

If Google doesn't want IE features or the Bing Toolbar observing its site
interactions, it can disallow such visitors. A steep price to pay, at too
coarse a level of control? Yes, just like a site deciding to bar Googlebot.

I would agree that a 'fair use'-like analysis makes sense.

I would further agree that any site solely, or predominantly, powered by
indirect observations of Google users would be an unfair taking. You'd crush
such a site in court.

Meanwhile, a site that tallies Google referrer inclicks for itself, or for a
network of participating sites (as with analytics inserts), even republishing
summaries of Google source URLs and search terms as public data, is almost
certainly fair use. It's taking data you're dropping freely onto third-party
site logs, and making a transformative report of it.
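
A minimal sketch of the referrer tally described above, assuming the search terms arrive in a `q` query parameter of the referring URL and that referrers come from a site's own access log. The function name and defaults are illustrative, not taken from any real analytics product.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def tally_search_referrers(referrers, host_suffix="google.com", param="q"):
    """Count search terms found in referrer URLs from one search engine.

    referrers: iterable of referrer strings from a site's own logs.
    host_suffix and param are assumptions about the referring URLs.
    """
    terms = Counter()
    for ref in referrers:
        parsed = urlparse(ref)
        host = parsed.hostname or ""
        if host == host_suffix or host.endswith("." + host_suffix):
            for term in parse_qs(parsed.query).get(param, []):
                terms[term.lower()] += 1
    return terms
```

For example, `tally_search_referrers(log_referrers).most_common(20)` would list the twenty most frequent terms that brought visitors to the site.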

What Bing is doing seems to me somewhere in-between. The mechanism avoids
literal copying of specific artifacts but the net effect in some cases
approaches the same result. As with other 'fair use' analysis, it's rarely
black-and-white. The magnitude of the information used, its effects on the
market, and the value-added transformation afterward are all important. I
don't know how a court would rule in such a suit but the discovery process
would surely be fun for spectators like myself!

------
joelhaus
I found this very strange... Did Bing increase clickstream data usage in their
algorithm when Google saw Bing results get more Google-like in October of 2010
or not?

 _Shrum:_

    
    
      Not so, Bing told me. In October, Bing says it rolled out a new 
      ranking algorithm plus a new experimental system called “Aether” 
      that allows them to test changes in their ranking methodology. 
      That’s what caused the bump that Google saw, not some sudden use 
      of the surfstream, Bing said.
    

_Web (during October 2010):_

[http://www.google.com/search?q=Bing+new+ranking+algorithm...](http://www.google.com/search?q=Bing+new+ranking+algorithm&&tbs=cdr:1,cd_min:10/1/2010,cd_max:10/31/2010)

------
kenjackson
Good article (a surprisingly heartfelt article -- odd for a tech blog). Key
paragraphs: _Bing says it does NOT do this. It says there is no Google
specific search signal that it being used, no list of all the popular pages as
selected just by Google users. Instead, it has a 'search signal' based on
searching activity observed across a range of sites.

For example, if you did a search on Amazon, Bing might detect that. A search
on eBay might get spotted. A search on Yahoo, that also might get extracted.
Any number of searches might be identified. Bing would associate the next page
you went to after doing those searches as being a possible 'answer' to those
searches._
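
The association described in that quote can be sketched roughly as follows. This is only an illustration under stated assumptions: the session format, the `SEARCH_PARAMS` table, and the example hosts are all hypothetical, and Bing's real pipeline is not public.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse, parse_qs

# Hypothetical table: host -> query-string parameter that carries the search term.
SEARCH_PARAMS = {"search.example": "q", "shop.example": "keywords"}


def query_answer_counts(sessions, search_params=SEARCH_PARAMS):
    """Pair each detected search with the page visited immediately after it.

    sessions: iterable of URL lists, each in visit order (hypothetical format).
    Returns {query: Counter({next_url: count})}.
    """
    counts = defaultdict(Counter)
    for urls in sessions:
        for current, nxt in zip(urls, urls[1:]):
            parsed = urlparse(current)
            param = search_params.get(parsed.netloc)
            if param is None:
                continue  # not a recognized search site
            terms = parse_qs(parsed.query).get(param)
            if terms:
                counts[terms[0].lower()][nxt] += 1
    return counts
```

Because the table is keyed by site, the same code treats a shopping-site search and a web-search engine identically, which is the crux of the disagreement in the replies.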

~~~
o_nate
I think comparisons to Amazon or eBay search are misleading. Amazon only
indexes and searches its own site - the same goes for eBay. So of course these
companies wouldn't mind if Bing copied some of their search results - after
all, this would only end up driving more traffic to their sites - since their
search results link back to their own sites.

It's a much different case when Microsoft is copying search results from
Google or another whole-web search site. There Microsoft is directly competing
with their product, and the bottom-line is that it's fundamentally fishy for
Microsoft to be using their own search results against them.

~~~
encoderer
Copying search results. I love the notion of that.

Somewhere in the world at this moment there's a person with the Bing Toolbar
installed searching for something using Google. One of his results is to a
website; for the sake of narrative let's suppose it's one of our own: A YC
startup.

The searcher clicks to the YC startup, it's exactly what he's looking for, he
converts.

Now, another searcher, using Bing by choice, searches that same term. That YC
startup is in the results. The click is made, another conversion happens.

The first user consented to his click analysis by installing the toolbar.

The startup will surely agree that yes, we are a very good result for that
term! We should show up on Bing, DDG, Google, Whatever! And we don't care how
we get there!

The second user gets a result that's maybe ranked higher because the first
user's click was noticed by Bing.

Google doesn't lose a customer because the second searcher was already on Bing
to begin with.

Bing does nothing but analyze their own users' behavior (users of their
toolbar) to deliver better results.

I find controversy over this beyond absurd.

~~~
joesb
> The first user consented to his click analysis by installing the toolbar.

The toolbar says nothing about taking his data to improve Bing, only for
Site-Suggestion which, to the average user, is clearly a browser feature.

> Google doesn't lose a customer because the second searcher was already on
> Bing to begin with.

What about one who could have converted to Google if he searched on Bing and
didn't find a result?

Google might not lose a customer, but there's no way Google would gain a
conversion from Bing in this situation.

> Bing does nothing but analyze their own users' behavior

Assuming that this is a search result that Bing wouldn't have found by itself,
then this "user behavior" wouldn't have occurred for Bing to capture had
Google not existed.

Without Google, each user's behavior would probably amount to clicking the same
old web sites they collected long ago, since there would be no search engine to
help them discover new relevant sites easily.

~~~
encoderer
Don't take it personally that I'm not going to "line by line" your comment the
way you did mine -- I'm pretty busy today.

But the issue is, my clickstream is _mine_. If I chose to share it with
somebody -- Bing, Google, My Mom -- it's my choice.

If I, as a user, don't want Bing to have access to my clickstream, I won't
install their toolbar.

But if I do, it's not Google's business.

~~~
indyank
The issue is simple... Google uses all its heavy resources to rank that startup
as #1 (good or bad). But Bing just uses their toolbar to capture that
information and rank them. They call this "cost cutting". Doesn't this sound
cheap?

------
indyank
It is known to all that both use user data in whatever way possible. But what
Bing seems to be doing is using it on users searching on Google (through IE
etc. or maybe even Windows) to find the more relevant results. Did you guys
notice Harry Shum mentioning "cost cutting" in the video where Matt, he
and the Blekko CEO discuss this issue?

So, it looks like Bing is indirectly using Google's data at a lower
cost, and this is how they seem to do it: spy on Google searchers and build the
right database, instead of spending on innovation and new ways to improve the
algorithm. Let Google do that part while we piggyback on their good ones.

This is what has really irritated google.

------
treelovinhippie
Remember when the polarizing search battle was between Google and Yahoo?
...yeah neither do I.

These battles do nothing more than establish the two dominant players.

