
Why is Google so hysterically hypocritical about Bing using its public data? - HardyLeung
http://www.roughlydrafted.com/2011/02/01/why-is-google-so-hysterically-hypocritical-about-bing-using-its-public-data/
======
geuis
I read half of this. It's late, and I don't have the time or energy to refute
point by point every single wrong thing that is written in that half.

It's easy to sum up. Yes, Google indexes hundreds of millions of sites. It does
so in order for other people to be able to search and find those sites, which
is important to their owners. It's a symbiotic relationship, not theft of
intellectual property.

Google has spent billions of dollars in manpower and physical capacity in
order to be able to do that. Also, each and every one of those sites can
_very_ easily say "don't index me" with a simple robots.txt file on their site.
Some do, most don't, because most sites find it valuable for Google to index
them.

Meanwhile, Microsoft is trying to compete in the same space as Google. They
are also spending lots of money and manpower to build their search engine.
But, by using Google's search results to improve their own product, they are
acting as a parasite. Google gets no benefit from Microsoft using their data.

So to sum up my own summary: Google is a mass symbiote, Microsoft is a
parasite.

~~~
greyman
Microsoft do NOT steal Google's search results. They use clickstream data from
Bing toolbar users to improve their search results.

What happened is that Google engineers engineered a use case in which the
signal from that clickstream looks like stealing search results. Or in other
words, Microsoft uses Google _users' behaviour_ in its ranking algorithm. But
that's not unethical, and Google is doing the same thing.

~~~
luigi
But the effect is the same as intentional scraping and outright stealing:
search results that could have only come from Google are appearing as Bing
results.

Bing needs to blacklist Google from its clickstream. Simple.

~~~
nlogn
I don't even think they should just blacklist Google. They should just respect
robots.txt.

edit: I should have clarified. I know that the Bing crawler likely respects
robots.txt, but if they are using clickstream info to build their index, it
seems right that they should respect robots.txt there as well, no?

~~~
stanleydrew
I'm pretty sure the Bing Crawler does respect robots.txt. The data Bing
collected didn't come from spidering Google.

------
iqster
I had to reread the original Google article to verify a point. The signal Bing
is alleged to be using is that users CLICKED on the result that came up ...
not the results that just came up first in Google upon a search query. And
these are users who agreed to install the Bing toolbar!

Here's a hypothesis. I suspect Bing is making use of data that is typed in the
toolbar or browser's search box ... not data captured from text typed directly
into Google's search page. This might explain why their "honeypot-take" rate
was only 8%. This is a subtle point. So ... imagine if you are the coder who
wrote this signal collection feature. Would you capture "term in search
box,next URL clicked" OR would you capture "if search box search engine ==
BING or Yahoo or something else then capture term in search box, next URL
clicked".
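
To make the two options concrete, here is a minimal sketch. It is purely
illustrative and not Bing's actual code: the event fields, the function names
and the allow-list of engines are all assumptions made up for this example.

```python
# Hypothetical allow-list of engines whose searches may be recorded.
ALLOWED_ENGINES = {"bing", "yahoo"}


def capture_all(event):
    """Option 1: record (term, next URL clicked) for every search."""
    return (event["term"], event["clicked_url"])


def capture_filtered(event):
    """Option 2: record the pair only when the engine is on the allow-list."""
    if event["engine"] in ALLOWED_ENGINES:
        return (event["term"], event["clicked_url"])
    return None


# Hypothetical toolbar event: a search made on Google, then a click.
event = {
    "engine": "google",
    "term": "synthetic query",
    "clicked_url": "http://example.com/faked",
}

print(capture_all(event))       # the pair is recorded
print(capture_filtered(event))  # nothing is recorded
```

The first option is the simpler one to write, which is the point: it picks up
searches made on Google without anyone having to target Google.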

Let me propose an alternative experiment. The test clickers should have
clicked on the second or third (or some position N where N > 1) link. This
would have demonstrated if Bing is using the actual click information or the
search results themselves directly. The former seems completely fair on Bing's
part. The latter _might_ be debatable. My point is this ... the Google
geniuses fail to distinguish these two cases and are muddling it up for the PR
drones. This wastes everyone's time and productivity. Moreover, they belittle
the hard work of engineers and scientists. It is sad that this is what it has
come to.

~~~
jamesgeck0
From your comment: I suspect Bing is making use of data that is typed in the
toolbar or browser's search box ... not data captured from text typed directly
into Google's search page.

From the official Google blog: We asked these engineers to enter the synthetic
queries _into the search box on the Google home page,_ and click on the
results, i.e., the results we inserted.

~~~
dangrossman
The search text is in the referring URL no matter where it was typed in. It's
captured without ever looking at Google.
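
A rough sketch of what that looks like, assuming the search term travels in a
`q` query parameter of the referrer (the example URL and the function name are
made up for illustration):

```python
from urllib.parse import urlparse, parse_qs


def query_from_referrer(referrer):
    """Return the search term carried in a referring URL, or None."""
    parsed = urlparse(referrer)
    # Assumption: the search term is in the "q" query parameter.
    terms = parse_qs(parsed.query).get("q")
    return terms[0] if terms else None


# Hypothetical referrer seen when a user clicks through from a results page.
referrer = "http://www.google.com/search?q=synthetic+query&hl=en"
print(query_from_referrer(referrer))
```

Nothing here fetches or parses the results page itself. The term comes along
with the click.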

------
coffeemug
Oh jeez, not again. _This is the Microsoft we know and love to hate, less so
now that it is falling apart and must be pitied as an underdog._

Let's see:

Market cap: Microsoft - 239.49B, Google - 195.50B. Revenue: Microsoft -
66.69B, Google - 29.32B. Profit margin: Microsoft - 30.84%, Google - 29.01%.

Falling apart. Right...

~~~
andrenotgiant
All it took to piece together the author's ignorance was the

"Daniel Eran Dilger is the author of “Snow Leopard Server (Developer
Reference),”

quote down at the end. I'm sure his line of thought was: "All my friends at
Starbucks have MacBook Pros, so Microsoft must be doing really badly."

Probably the same reason half his argument seemed to involve Android
jealousy...

~~~
j79
Goodness, your comment seems to imply you have an issue with Apple products,
their policies, and/or Steve Jobs. Yeah, the guy writes about OS X and Apple,
but he's been a tech consultant for nearly two decades. He's worked with
various technologies, from various companies. A quick Google search (heh)
would have revealed this. Or, what? Did you believe he was some 20-something
year old blogger regurgitating the same "FUD" you believe all Apple fanboys
are spewing regarding their "Android jealousy"?

------
sajidnizami
First article I've read in a while that's taking Microsoft's side in the whole
situation. Every paragraph of this screams publicity. I am just wondering here
if this is a PR tactic. Here I would be happy to see a balanced review of both
companies' behavior but this is just one-sided bashing.

<quote>Shame on your pretentious, obnoxious, indefensibly egregious double
standard in the field of using public information to turn a profit.</quote>
<-- Ironically, Bing is doing the same, and unlike content creators, who can
opt out by putting up a robots.txt file, Google cannot opt out of this because,
like it or not, the toolbar is always sending back information.

Google indexes public content, but how it aggregates and displays it is
their deed. The ordering, etc. is Google's work, not the content creators'.
Copying it is just like copying IP. I would love to see this go to court and
be fought.

~~~
rbanffy
I think that the Phineas Barnum assumption is pretty much valid and this
article proves it.

------
va_coder
This is starting to get interesting.

Google makes money aggregating other people's content. What happens when
people aggregate Google's content? What's fair?

~~~
InclinedPlane
Explain what you mean when you say that Google makes money by aggregating
other people's content? Google makes money by indexing other people's content
and driving traffic to other people's sites. That hardly seems like a bad
thing.

~~~
nhebb
<http://books.google.com> is considered one of the more egregious examples,
with many authors and publicists having complained about Google taking samples
of the content and posting it online. I did a Google search a while back on a
graphics issue, and got the answer I needed from a Google book search result.
You could argue (as Google does) that they are really just helping promote the
book. But I have no need to buy - I got the answer for free.

~~~
beoba
Would you have bought that book just to get that one answer? Would you have
known the book even had the information you needed without Google's help?

------
tristanperry
Hmm. It might just be me, but I get the feeling that they aren't exactly fans
of Google?!

~~~
latch
Which doesn't make his point less or more valid. Just because you dislike
something doesn't mean you can't provide insight and valid opinions about it.

------
tallanvor
I find it amusing that Dilger mentions Overture, which bought alltheweb from
FAST in 2003. FAST, of course, was later acquired by Microsoft in 2008. The
number of connections among the major search companies is rather amusing, if
not necessarily surprising.

------
fosk
I have a feeling that people's reaction on this is biased by the fact that one
of the two companies is Microsoft. What happened here is not ethically
correct, but come on guys, we must be honest. This is business, it's about
billions, it's about market share. It's not news that money is not ethical.

Google collects data too, Google does its own shit like everybody else, like
Microsoft. Google has its own toolbar too. And all this story seems more a
marketing move against Bing.

What I want to happen now is more competition. Instead of crying, Google
should work hard to be unbeatable and competitive. History teaches us that a
copy is never better than the original.

------
tungwaiyip
I'll try to make my point again. There is one school of thinking that Google
is profiting by farming information from the Internet, literally on the back
of many people's labor. You can prove this by running a "Google Sting", which
is known as "Google bombing". By having many people set up honeypots that
link an arbitrary keyword to a certain URL, you seem to be able to fool Google
into artificially associating that keyword with the URL, thus "proving" Google
is profiting from your labor. I don't really buy this logic. But Google's
accusation that Bing steals from them seems very close to this line of framing.

------
tnorthcutt
Is anyone else bothered by the fact that the article switches back and forth
between serif and sans-serif font?

~~~
droz
Nope. Just you.

------
1010011010
<http://createasearchengine.appspot.com>

------
Joakal
<http://www.google.com/robots.txt>

First two lines:

User-agent: *

Disallow: /search
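
For what it's worth, those two lines can be checked with Python's standard
`urllib.robotparser`. This is only a sketch: the rules are pasted in rather
than fetched, and `ExampleBot` is a made-up user agent name.

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
# Only the two lines quoted above, not the full file.
rules.parse(["User-agent: *", "Disallow: /search"])

# "ExampleBot" is a hypothetical crawler name; it falls under "User-agent: *".
print(rules.can_fetch("ExampleBot", "http://www.google.com/search?q=test"))  # False
print(rules.can_fetch("ExampleBot", "http://www.google.com/about"))          # True
```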

~~~
patio11
There is no robot involved. An actual human instructs their user agent to
retrieve a document on Google. A piece of code, running in that user agent,
does something. If anyone has a complaint here, it is the user, not Google.
(Google really really doesn't want to spin this as "MS is spying on our users"
because as soon as they say that some person is going to point out "Google is
spying on the entire world" and that person _will be right_.)

~~~
motti
Although strictly speaking you're right, it breaks the _intent_ of robots.txt
as the users' clickstream is fed into the server-side indexing engine
alongside robot crawled data.

Arguing that according to the strict letter of the spec, robots.txt only has
to be obeyed by true crawlers does hold water as strictly speaking Bing could
disregard robots.txt altogether - it's certainly not legally enforceable.
The intent of robots.txt is clear and Bing should be trying to obey it
wherever possible.

In this case it appears the keyword was associated because it was typed into
the Google search box, and from there to the faked destination. IMHO when
the clickstream was analysed it should have disregarded clickstreams that pass
a robots.txt-excluded page as these could establish associations that were not
supposed to be revealed by crawlers.
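
As a sketch of that filtering step, assuming a clickstream is just a list of
URLs and that each host's robots.txt rules are already at hand (the `ROBOTS`
table, the `ExampleBot` agent and the helper names are all hypothetical):

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical per-host rules; a real system would fetch each robots.txt.
ROBOTS = {"www.google.com": ["User-agent: *", "Disallow: /search"]}


def is_excluded(url, agent="ExampleBot"):
    """True if the host's robots.txt rules disallow this URL for the agent."""
    lines = ROBOTS.get(urlparse(url).netloc)
    if lines is None:
        return False  # no known rules for this host
    rules = RobotFileParser()
    rules.parse(lines)
    return not rules.can_fetch(agent, url)


def usable(clickstream):
    """Keep a clickstream only if no page in it is robots.txt-excluded."""
    return not any(is_excluded(url) for url in clickstream)


streams = [
    ["http://www.google.com/search?q=synthetic+query", "http://example.com/faked"],
    ["http://example.org/a", "http://example.org/b"],
]
kept = [s for s in streams if usable(s)]
print(kept)  # only the second stream survives
```

Dropping the whole stream, rather than just the excluded page, is what stops
the keyword from being tied to the destination.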

We ought to hold Bing and any search engine to the highest standard when
talking about crawling etiquette.

Also, see <http://news.ycombinator.com/item?id=2169817>

~~~
Dylan16807
If I'm google and I block /search I want no company to collect the data.

If I'm a site that blocks /dynamic because I want to make sure my stats are
accurate, not factoring bots, and I want to keep the load down on slow-
generating pages, then I'm perfectly happy with the data being collected.

Is it the intent of robots.txt _itself_ to block the clickstream data, or is
it just google's intent?

------
zerd
Dilger first asks "Bing is using Google's search results to improve its own.
But what’s wrong with that?"

If Bing uses Google's search results, then we effectively only have one major
search engine. Yahoo has switched to the Bing engine, and Bing gets a big
chunk of its results from Google. So we have a search engine monopoly. That's
what's wrong with that.

> "This is the company that indexes blogs [...], and then makes all this
> information available without consent"

He first asks what's wrong with Bing copying Google's results, because it's
public content, then says it's immoral to index public content. Double morals?
Logic fail?

~~~
tree_of_item
Where did he say it is immoral to index public content?

Google complains about Bing using public content. Google uses public content.

Snarky question? Fail?

------
andrewcooke
Imagine a playground. Young, pre-alpha nerds writhing in chaos. Little arms
flailing. Faces red. "Poopy face". "Your mommy smells".

Nerd fight.

------
ddemchuk
The world's largest scraper is complaining about being scraped...when Yelp and
all the rest of the local businesses directories realized they were being
scraped and the data was being used to google's advantage, it doesn't appear
that Google gave two shits. Tables have turned, and it's like someone stole
the handball from the recess yard.

~~~
SeanLuke
> when Yelp and all the rest of the local businesses directories realized they
> were being scraped and the data was being used to google's advantage,

<http://yelp.com/robots.txt>

Not a mention of GoogleBot.

~~~
ddemchuk
well, they're presented with either 1) killing their organic search traffic by
blocking Google or 2) allowing the crawlers but not being able to stop their
data from being aggregated and included in local results.

Yelp's API specifically says their data cannot be aggregated with other
sources. When presented with that, Google simply said they weren't using the
API, they were scraping.

Apparently scraping is a free pass to do whatever the hell you want if you're
Google.

------
zohebv
I think Google is being disingenuous with their "bing is copying my data"
story.

However, please avoid linking to roughlydrafted.com. At the risk of going
ad hominem, I just wanted to point out that if you have read any of the material
on the site, you will know that it is just full of flamebait articles with an
absurd Apple fixation.

Everything Android does is copying Apple. Everything Microsoft does is wrong
and stupid. If Flash is not supported on the iPhone, it is because it is the
morally correct thing to do. When Apple announced the iPhone without an SDK,
roughlydrafted called developers stupid for asking for an SDK and said that
JavaScript on the web is the new SDK. This post, however, takes the cake: you
must side with Bing because Google is a greater threat to Apple than
Microsoft. Seriously, how does Overture even enter into this discussion? It is
tiresome to see this site linked to all over the web.

------
earl
Responding to merely two points, one hopes Daniel isn't as stupid as his
writing. "This is the company that indexes blogs, newspapers, and both digital
and physical books, and then makes all this information available without
consent in the contexts of its ads and paid search space, and is dismissal of
anyone who objects to Google’s ultra liberal sense of copyright. "

robots.txt and noindex
[http://www.google.com/support/webmasters/bin/answer.py?hl=en...](http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449)

And "Install the Google Toolbar and do a search of Bing, and Google actually
directs your clickstream back for its own analysis"

Google/Matt Cutts have been very open about what they do and don't use that
information for, and they aren't using click tracking.

------
napierzaza
Not really. If Google wants to publicly shame Microsoft it can and is. They're
basically saying that Bing is the new "Let me Google that for you". Since
they're not currently proceeding with legal motions it's basically just a
smear campaign to make people think twice about Bing.

And there's truth to it.

------
known
Microsoft did nothing wrong. <http://en.wikipedia.org/wiki/Deep_linking>

------
gildur
Google makes money out of being the #1 search engine. Competition from others
is one thing. Competition from someone stealing your source code (open source
or not) is a whole different story. That should be reason enough, in my humble
opinion.

~~~
gildur
Oh I obviously misunderstood the article. I'm so sorry.

