This can easily be demonstrated. Google can set up a second honeypot but instruct its engineers not to click on the link, ever. If it shows up in Bing's results, then Bing is watching what Google returns and scraping its results.
But if the second honeypot doesn't show up in Bing's results, then clearly Bing isn't copying Google's results; it's copying its toolbar's preference for links.
The entire thing is moot to me. The takeaway isn't whether Bing copies Google. The takeaway is that Bing's toolbar is spyware :-)
I'd question even that hyperbolic interpretation. Let's say that Google sets up a script that sends queries to websites, then records the results and incorporates whatever links are shown on those sites into its search rankings. Is that clearly copying? No, that's just PageRank.
If you have a web directory, a link page, a blogroll--isn't Google "copying" your work by using it to improve its search results? How is that any different from what Bing's doing?
This is my first thought as well. Google's pagerank analyzes the link structure of the web as one of the inputs to its search ranking. Apparently, Bing's toolbar analyzes page content coupled with user click behavior as one of the inputs to its search ranking.
These two things don't seem very different to me. Both of them are relying heavily on the value provided to them by tracking and analyzing the behavior of users on the web to drive search results.
Bing will be at fault if they specifically target Google. But if you consider entering a keyword and then clicking a link to be essentially targeting Google search, then it only exposes another problem: Google's monopoly on the search market.
Microsoft is collecting the same sort of information on Google queries that it collects on Bing queries and that Google collects on Google queries. All this is happening at the long tail, where both companies are most likely using something other than web crawling to tailor search results. After all, the whole experiment is only possible because Google can seed page rankings at will to link arbitrary terms to specific search results.
(Given Google's near-monopoly of the market, Microsoft and DDG have some amusing competitive synergy going on, don't they? DDG can criticize Google all they please for retaining user data, because DDG doesn't and isn't in a position to benefit from it. Microsoft, which certainly is in a position to benefit from it, doesn't need to worry about Google calling them on it, because Google is the only search engine that can actually lose market share over the issue.)
It could be something a registered user could set from a browser toggle, and DuckDuckGo is a very good project, of course. My point was: data portability and user control are in Google's long-term interest, not being evasive about their data cache.
Where you'll be more limited is that it's apparently not IE but the Bing Bar that is at stake; the connection is getting thinner.
(C.A.R. Hoare's billion-dollar mistake, for example).
Just because you're copying the data indirectly through a third party doesn't mean you're not breaching the copyright.
Very murky waters. If Google starts complaining that other people are tracking their users, they might end up educating users about how much they and their advertisers track.
...I hope DuckDuckGo figures out a way to capitalize on this brouhaha...
Call Google the market leader all you want, but let's not forget that Microsoft's market cap is around 40 billion dollars greater than Google's.
That's more than Research in Motion's total value!
These are outliers. They don't exist on the real internet; or at least, where pages exist without any other data (anchor text, inlinks, outlinks, words from the query in the document), they never get surfaced by a search engine.
Absent fake click data, there is no way Google could surface these documents for the specific queries. In fact, Google states this openly in their "attack piece": before they manually changed the rank of these documents, they didn't surface them either.
The only evidence of "cheating" is that Bing surfaces documents for which there is no known relationship between the query and the document, except for spam created by Google engineers. This is evidence only of a bug in Bing's ranking algorithm. Clearly it is using signals from Google, just as Google uses signals from CNN (keywords, inlinks, outlinks, anchor text, etc.).
I'm sure Bing is thankful to Google for helping find this defect in their system and is hard at work fixing it.
People talk about Bing copying search results as if Google invented search results and put a lot of hard work into them. In this case, the only hard work they put in was designed to spam Bing.
I can only conclude that Google is getting worried about Bing's quality and has run out of ideas on how to fix their own problems.
I get why Google is upset, but this doesn't strike me as unethical behaviour in a free market.
This should help establish if it's the toolbar that is sniffing.
If so, while it may be questionable behavior, Bing would not be copying Google's results.
There is no victim here. They are not taking your 1st result and copying it. They are taking the result the user clicked. Obviously you didn't predict that with your algorithm or you'd have always made that the 1st result. Instead, what they're tracking is user behavior, not your raw ranking.
Obviously users give Google implicit permission to track their behavior by using your product. And similarly, by installing the Bing toolbar, they're giving Bing that permission.
This is beneath you Matt and it's beneath Google.
This is disgraceful attention-whoring on Google's part. Quite surprising, too, as I don't remember them ever stooping that low.
I wouldn't be surprised if they're more interested in data from domain-specific sites like epicurious than generic search sites like Google.
I'd also guess that this won't work once the SEO guys figure out they can feed fake clickstreams to MS.
We really don't know how much value Bing puts on clicks made on Google. Perhaps a lot?
What's Google's official stance on this?
If the case described above were true, then all Google has done here is to make inconclusive accusations and use the occasion to highlight its own dominance over search.
It seems to me this is just a cheap and slightly seedy PR stunt.
Why? This seems like a great idea.
They played dirty with Netscape/IE in the 90s and look what happened.
It is kind of embarrassing for Google, but if it is real and it continues, it's better to address it now rather than after MS becomes a titan of search and Google's market has eroded. At times, it seems like MS has changed in ways, but fundamentally they're still run by the same guys. Remember that when you play your Xbox or use Bing or any MS products, they don't like to see other successful software companies.
I remember that when Bing went out, everyone was wondering how close to Google the results were (and talked about it as a good thing).
Would it have been better if Google had jumped straight to the questionable lawsuit part, like every other company seems to do when threatened on its own turf?
Those bogus results made it to Bing's results eventually.
Ok... So it proves Microsoft analyzes the toolbar behavior and when it has no other data, it will therefore look like a copy of Google search.
Sounds fair to me. Do you want to get into a discussion on how exactly Google tracks you online?
... in 7-9% of cases.
I think Google has a right to complain. Microsoft has resorted to these less than innovative tactics to monopolize themselves for a long time now, and it isn't fair to companies like Google who have worked their butts off (and gave 1.8 million shares - $336M in 2005 - to Stanford for the PageRank algorithm) to develop their superior product.
Let's put it this way... if Google hadn't bought the PageRank algorithm from Stanford and put years of work into perfecting their search results, Microsoft wouldn't have any way to track which Google search results users click. It's an unfair tactic that clearly demonstrates Microsoft's sketchiness and desire to monopolize themselves (by any means necessary, "evil" or not) wherever there's a computer.
As for the fanboy comment... I'm certainly not a fanboy but I'll let the following speak for itself: Microsoft Internet Explorer vs Google Chrome
From a search engine user's point of view, I believe this whole fiasco is ridiculous. First, it's ridiculous because Google is handling this situation very immaturely. Matt Cutts should not have confronted the VP of Bing the way he did. Second, if I were a user of the Bing Toolbar, I would have given the Bing Toolbar permission to use my behavior to polish my search results. I have no problem with that. Lastly, the experiments they did have more to do with "guessing what the user wanted" than with "what PageRank does".
I've used Bing fairly often over the past 6 months because of all the spam Google's search results were returning. Now that Google has fixed (or is still working on) the spam problem, I'm starting to use Google again. However, what I noticed over the past 6 months is that Google search isn't that much better than Bing. This Bing Toolbar fiasco only applies to synthetic queries that I would never make.
Is Bing cheating? I don't think so. To me, they are just using another signal, with the user's permission. However, the definition of cheating will be different for everyone.
Perspective changes things here, which means no one is "right" or "wrong".
It's more like Linus saying you can't use the code, but Google using it anyway.
Btw, Google contributed a lot to open source projects.
By installing the Bing Toolbar, users are giving permission to track their clicks. If Bing's server farm is searching Google and parsing the results then it is more like your example.
In other words, there's a relationship between Page A and Page B if there exists a link between them (==PageRank). But the strength of the relationship is increased based on how many users click on that link. I think that's the information Bing were trying to capture (or if they weren't, they should have been).
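A minimal sketch of that idea, assuming a link's strength starts at a flat base value and grows linearly with observed clicks. The function name `link_strengths` and the `click_weight` parameter are hypothetical illustrations, not anything Bing or Google is known to use:

```python
from collections import Counter


def link_strengths(links, clicks, click_weight=0.5):
    """Score each crawled link, boosted by how often users were seen clicking it.

    links: iterable of (source, target) pairs found by crawling (the link graph).
    clicks: iterable of (source, target) pairs reported from user behavior.
    click_weight: hypothetical bonus per observed click (an assumption).
    """
    click_counts = Counter(clicks)
    # A link that exists gets base strength 1.0; every click on it adds click_weight.
    # Clicks on pairs that are not in the link graph are ignored.
    return {edge: 1.0 + click_weight * click_counts[edge] for edge in set(links)}
```

Keeping the link graph as the gate means clicks only reweight relationships that already exist; a system that also let clicks create new edges would be the riskier design the honeypot experiment appears to probe.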
What I'm saying is that it's probably an unintentional side-effect. At scale though, the effect is that Bing gradually uses Google as a signal, simply because Google is a popular site.
edit: Yet another way of saying it: I think it's not just clicks on Google searches that are captured by Bing, but clicks anywhere. Google is a large site, so its influence on Bing can be measured. This is what we're seeing. My theory. I don't work in search.
Now the interesting thing to reverse engineer is what other information might be passed along to give relevance to the search term/click pair. If Google could establish that there was a third piece of info in the tuple, such as "originating search domain", and that Bing used this to weight term/click pairs based on the authority of the source, Google's claims would hold more water. I suspect that Bing has to apply some kind of validation to the term/click pairs (for instance, only sending pairs that appear on the same results page from accredited engines); otherwise they would be subject to "Bing bomb" attacks, where users or botnets vote up lower-ranked (or even unranked) clicks for a given term. (And if they don't validate or detect gaming, then there would be ample opportunity to inject all kinds of synthetic behavior into Bing's search results. Based on the relatively small number of users and clicks it took to own a long-tail term, it seems like the protection they have is very weak or simple.)
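A rough sketch of the kind of validation speculated about above, assuming each report carries the user ID, the query term, the clicked URL and the list of URLs on the results page. The field names and the `min_distinct_users` threshold are hypothetical; nothing here describes what Bing actually does:

```python
from collections import defaultdict


def aggregate_votes(reports, min_distinct_users=3):
    """Count distinct users per (term, clicked URL) pair that pass two sanity checks.

    reports: iterable of dicts with hypothetical keys
             'user', 'term', 'clicked', 'results_page' (list of URLs shown).
    Returns {(term, url): distinct_user_count}.
    """
    voters = defaultdict(set)
    for r in reports:
        # Check 1: the clicked URL must appear on the results page the client reported.
        if r["clicked"] not in r["results_page"]:
            continue
        voters[(r["term"], r["clicked"])].add(r["user"])
    # Check 2: drop pairs backed by too few distinct users (a crude anti-botnet floor).
    return {
        pair: len(users)
        for pair, users in voters.items()
        if len(users) >= min_distinct_users
    }
```

Both checks are weak in the way the comment suspects: a client can fabricate `results_page`, and a handful of accounts clears a small threshold, which is roughly how a small group of engineers could own a long-tail term.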
edit: I'm not even sure if it's only search engines that are being analysed by Bing or all pages, but it's possible that it is just SEs - they could be capturing query terms distinctly.
I'm interested in another experiment. If you set up a honeypot, search for the term, but never click on the link, does the honeypot start showing up in Bing? The article doesn't say whether you tried this. Did you try it? Are Bing scraping your results from the page or only tracking their users clicks?
Google's search results are blocked in robots.txt, so I don't believe Bing has been able to crawl our search results directly. All the evidence points to users' clicks on Google, which are then sent to Microsoft.
Microsoft has (so far) declined to admit whether our allegation is true. Getting them to talk about exactly what they do and what software they use or don't use would be the easiest way. I'd like them to confirm or deny, which is why I wanted to go to this search panel later today and ask them.
Isn't compliance with robots.txt more of a voluntary thing?
I'm not accusing MS of ignoring it when convenient, but if you/we/someone is accusing them of acting unethically wrt search results in the first place, telling the crawler to ignore robots.txt wouldn't be that far away, would it? (And likewise faking the user-agent, etc.)
For better or for worse, UA identification, robots.txt compliance - all those things are voluntary. I'm not suggesting they shouldn't be, but it certainly makes a difference in terms of whether something's possible or not. (And, if you ask me, it places an even higher obligation on the actors to behave ethically, lest trust completely evaporate and the whole thing go to hell in a handbasket.)
It would take a pretty big leap to go from robots.txt is advisory to ignoring it constitutes a criminal action.
Google has managed to demonstrate one way MS appears to be using the data. What does Google do with their trove of data? That's a lot of data to collect and not do anything with.
If they want to make it perfectly clear, they should add it to their privacy policies and EULAs.
But the article clearly covers the available public statements on this issue and patio11 dug up a post from Matt Cutts in his comment below that directly addresses this: http://www.mattcutts.com/blog/toolbar-indexing-debunk-post/.
Again, if you actually read the article, you will come across the section titled "What About The Google Toolbar & Chrome?" I encourage you to read it.
 Also, see this comment and patio11's subcomment further down the page, both of which were written an hour before yours: http://news.ycombinator.com/item?id=2165469#score_2165578.
I'm pretty positive that's not true. If you run Fiddler when browsing with Chrome you will see constant hits to toolbarqueries.clients.google.com whether you're using Google or not. I could be browsing some MS site and toolbarqueries.clients.google.com gets hit. Chromium doesn't do this.
Edit: You can uncheck everything under privacy and it will still send those requests.
Edit2: What it sends back looks something like this:
<?xml version="1.0" encoding="UTF-8"?><autofillquery clientversion="6.1.1715.1442/en (GGLL)"><form signature="8551191143090325242"><field signature="620769395"/><field signature="2995202485"/><field signature="2175865763"/><field signature="904516291"/><field signature="2953051246"/><field signature="2649047790"/><field signature="2308153337"/><field signature="1003471793"/><field signature="3255484099"/><field signature="1305698505"/><field signature="3676143819"/><field signature="1275502930"/></form></autofillquery>
Looks like auto-fill data, but this happens when I click around a site, NOT when searching Google or typing something in the address bar. For some sites (interestingly, not all) it sends 3 requests for each page load.
I would guess that Chrome is sending a hash of the <form> (perhaps URL + method?), plus a hash of each of the <input> tags, and Google returns some sort of information about what kind of form it is?
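To make that guess concrete, here is a sketch of how a client could derive a stable 64-bit form signature plus 32-bit field signatures, similar in size to the numbers in the XML above. This is an assumption for illustration only: the inputs and the use of truncated SHA-1 are hypothetical and are not Chrome's actual scheme.

```python
import hashlib


def _hash_bits(text, bits):
    # Read the leading bytes of a SHA-1 digest as an unsigned big-endian integer.
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def form_signature(action_url, method, field_names):
    """Hypothetical signature scheme: one hash for the whole form, one per field."""
    form_sig = _hash_bits(f"{action_url}&{method}&" + "&".join(field_names), 64)
    field_sigs = [_hash_bits(name, 32) for name in field_names]
    return form_sig, field_sigs
```

Because the output is deterministic, the same form on the same page always produces the same numbers, which is exactly why a server receiving them could recognize which site you are on without ever seeing the URL.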
If so, it would mean it's pretty easy for Google to determine which sites you're on from the pattern of hashes sent for each site. e.g. I see this data sent in the clear for pretty much every page on https://www.facebook.com/
and this: http://code.google.com/p/chromium/issues/detail?id=60422
Please. Adding my own SSL cert to my own laptop is not harder than I'd expect. Certainly not harder than many other things you did in setting up this experiment.
I was gonna call out Matt for crawling bing's search results but I'm guessing Microsoft hasn't realized they return results from the /Search/ folder. ;)
 I have always suspected that the real value of Bing for Microsoft is to prevent Google's data mining of queries originating in Redmond.
If I'm in the business of giving horse racing tips and I read your tips to see what your strike rate is compared to mine, that's one thing. If I start tipping the same horses as you, purely because you tipped them, that's quite another thing.
edit to expand: If the measure by which a search engine evaluated itself was advertising revenues, they'd all have massive intrusive adverts, and no users. The only viable measure can be the quality of the search results themselves. As a happy coincidence, if you build something capable of delivering high quality results, you can very easily use that to produce highly relevant adverts. Imagine that each advert is like a little webpage, and rank them just the same as you do for normal webpages. (caveat: there's no link graph for adverts, so we're reduced to using a simpler text mining approach, eg bag of words vector space la-di-da).
This whole episode points to the sort of counter-espionage operations the two companies are engaged in. See how important a propaganda victory is for Google? It strains credulity to believe that the release of this information on the day of the panel discussion is pure coincidence.
MS are desperate to regain control. Google will soon launch their own web-centric OS properly, and bam, MS will have no business apart from selling to an ever-dwindling number of companies who can't believe MS don't rule the roost any more. In 20 years they will simply cease to exist if they can't come up with a world-beating online product and win back control of people's computing lives.
Notice how they're diversifying into games and search in order to prepare for the worst case; that their core OS and 'boxed software' business fails.
It's ugly and immoral and probably legal. Good job catching Microsoft at it (and I think the really really unethical and scary bit is that Microsoft is cheerfully stealing info from users via their browser). I also realize that Google got where it is in Search by innovation and iteration, and that Google's search team has nothing to do with Android per se, but you might see how Apple people feel about the business empires built on stealing their ideas.
Then iPhone came out and it looked like this: http://km.support.apple.com/library/APPLE/APPLECARE_ALLGEOS/...
Now Android phones look like this:
You don't see any signs of copying here?
Why shouldn't Microsoft be using this kind of data? Google search result pages are part of the internet just like any other publicly available web site. Microsoft monitors what the users are clicking on Google and probably on Bing and other sites. So what? Monitoring users is not a new thing. It may be unethical and I may personally hate it, but almost everyone is doing it.
Google should stop whining about this and make their search result the best they can. If they had the best search engine, Bing could come close, but never overcome Google by just copying part of it.
My question for you Matt... is there any way for Google to build a toolbar that effectively does what the Bing toolbar does (or even a joint one?). I jump to use the various search engines because no single search engine is sufficient. But clearly when Google isn't sufficient, you don't get the value of when I go to Bing. And vice-versa (as I don't use the Bing toolbar currently, but that may change now). Or do you feel that with 65% of the market, you don't need this info?
I bet it's just recording general searches (input query) + clicked links. A pretty good idea.
And don't tell me you are not using the results from the google toolbar to rank the sites in google search.
The temptation to abuse that power is pretty big.
> The day after that, Bing contacted me. They were hosting an event on February 1 to talk about the state of search and wanted to make sure I had the date saved, in case I wanted to come up for it. I said I’d make it. I later learned that the event was being organized by Wadhwa, author of that TechCrunch article. [emphasis mine]
So the supposedly independent author of an article on TechCrunch that kicked off a massive wave of Google criticism is, less than a month later, organizing events specifically for a Google competitor? Boy, that sure seems above-board.
But the article definitely gets a few things wrong. For example, having worked at Bing I can tell you this: in general "obvious" misspellings are autocorrected without comment. It's not some sort of magical copying procedure, it's actually a policy. Want proof? Here's an example query you can repeat: http://fayr.am/4KdG (direct query link: http://fayr.am/4JZD)
But otherwise, shit yes everyone is scrutinizing google trying to figure out what they're doing. That doesn't mean other players aren't doing their own optimizations, or even running relevancy metrics against other search engines. Relevancy is not a concept with fixed metrics, and every player in the search market does everything they can to figure out what their competitor is doing.
And even the raw results leakage is fairly par for the course. It's not like Bing searches are a crawl of google searches; Microsoft gets this data from browsers running this toolbar and uses it to help shore up queries where they don't return good results.
Edit: The same appears to be true for mbzrxpgjys and indoswiftjobinproduction
Edit 2: Hey, that's weird. Adding a comma, semicolon, period, or other symbol to the beginning or end of the query makes the gamed results show up on top at thor and www as well. Seems to work for all the terms at issue:
Note that some of those servers don't get updated often.
Just happy that my current SE favourite does not seem to copy and to actually have their very own results: http://entireweb.com/#q=hiybbprqag
vanessafoxnude.com has been redirecting to my current site for several years now, but back when the original site was active, much of the incoming anchor text was related to Google and search.
In this case, the customers don't get relevant results unless other potential customers use the competition! In short, Bing's results are only good if Google is popular.
Why would you invest time relying on your competition? Shouldn't you be striving to match or beat them, rather than trying to piggy-back on them?
If you have a bunch of users searching for "XYZ" on a different search engine and consistently going to link A -- wouldn't that imply it was relevant? You'd do the exact same thing for searches on your own search engine. The only difference is people have opted in to allowing you to have this info _implicitly_ by going to your search engine vs giving you this permission _explicitly_ by clicking through the EULA for the toolbar.
As a practical matter, I doubt customers will care so long as they've always had the option of turning off that part of IE's behavior. I mean, when did you last care about the authenticity of your phone directory's information?
Headline: Google's trap for Microsoft
PR "wins" can become PR nightmares in a blink.
Firstly because I think they originally misunderstood the manner in which Bing's results were being influenced by their own, and secondly because if they are going to complain about Microsoft collecting information about their users' usage patterns, well, that's really, really hypocritical coming from Google. Lastly because the whole thing smacks of high-school-level gossip. If Microsoft is really doing something out of line, handle it in some other way than engaging in a gossipy blog war.
To reiterate, I'm actually a Google fan, I'm OK with trading some privacy for useful services, but if they are going to bang on Microsoft for collecting user usage information, well that's about the worst case of the pot calling the kettle black I've ever heard of in the tech industry.
Discussing the hypotheticals of the situation as others are doing is interesting (serious), but irrelevant. The court of public opinion isn't going to care about that nuance and will find against Bing if this goes viral. All of the other defenses won't matter either, "everyone is doing this" and so on. Public opinion won't care.
I think the same thing applies to Bing here: if they have a generic algorithm that ranks results based on toolbar (or other data) it could be easy to see how their data is skewed by Google given the amount of traffic Google search gets compared to the rest of the internets. This seems fine to me.
But if their algorithm does stuff with activity on google.com because it is google.com, then this is a pretty clear foul: it is both essentially copying and the equivalent of manually ranking results (specifically, Google results).
The corollary of this is that if their algorithm is generic, then it will still work if Google were to cease to exist. If it's not generic, it would be useless without Google.
Absolutely not. The PageRank feature sends back URLs, but we’ve never used those URLs or data to put any results on Google’s results page. We do not do that, and we will not do that.
Matt Cutts, can you clarify if Singhal in fact meant the 'narrow' or 'general' interpretation above?
And, if the 'general' meaning, then is there any statement about the use of clicktrail data in Google's published privacy policies that is as strong as Singhal's?
Wow, it almost seems that is exactly what they are doing, which is some pretty dirty stuff. MS has always had a shady track record, but I thought the company had gotten a lot better recently.
Piggybacking like this should really be copyright infringement or something, as there's nothing morally right about it.
On the other hand, Google should be more quiet about this; after all, they've built their business database for Google Maps / Google Places by piggybacking on third-party services like Yelp and TripAdvisor. And now all of a sudden, when I'm searching for "restaurants", I have to scroll the page to get past Google's own crap.
Seems like a no-brainer, unless I missed something.
I also really like this for some reason. It's very ... gangster. Shows that Bing is scrappy and willing to bend the rules.
That being said, I will still continue using Google.
it's very discouraging
Right now the post has +186. So assuming that half of those people read your comment, we have something like 93 people who read your comment, and there are just two more downmods than upmods.
Unfortunately, HN shows people a comment's score before they read it, which skews results. People aren't objective, they tend to upmod comments that are already upmodded and downmod questions that are already negative.
So if you are unlucky and the first one or two people to read your comment are constipated, you get a negative result through no fault of your own. Bad luck, try again.
Please don't be discouraged. Try to be helpful and constructive and to present a point of view that others may have missed in the conversation. The upmods will eventually follow. Cast your seeds and let a thousand flowers bloom.
Of course your competitors are going to copy you. It's not innovative, and you might consider it 'cheating' if you forget that each and every one of us are building off of a foundation laid by other people. But it works, and that's why it happens and will continue to happen.
Copy == getting inspired by the brilliance of an idea and implementing it on your own, in the first case.
Copy == stealing in the latter case.
Copying (Ctrl+C, Ctrl+V) Google search results is theft, not "getting inspired" by a previous body of work.
Edit - Made the PoV clearer.
Here, it means 'we have evidence, given to us by our users who agreed to share their web traffic with us, that showing this result for this query is a great idea -- so let's do that.'
People obtain data on their competitors' performance all the time and tailor their products accordingly. It's not theft, it's competitive intelligence.
Yes, but in this case, it's more like claiming a competitor's product (the search results) as your own, directly in your product. I would consider it theft.
This is a bunch of microsoft haters making hay over nothing at all. Quit whining. It's not theft, it's not any more privacy-offensive than anything Google does, get over it.
1. User does a search in a Microsoft toolbar, using Google as his search engine. User is searching for $terms.
2. User gets a results page. User clicks on the entry in the results for $site.
3. Toolbar sends back to Microsoft that the $site was the first result the user chose for $terms.
4. Bing uses this to increase $site's placement in searches for $terms.
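The four steps above can be sketched as a small click-signal store. This is a minimal illustration under stated assumptions: the class name `ClickSignal`, the lower-casing of terms, and the "most clicked first" reordering rule are all hypothetical, not Microsoft's implementation.

```python
from collections import Counter, defaultdict


class ClickSignal:
    """Hypothetical store of (query terms -> clicked site) reports from a toolbar."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def report(self, terms, site):
        # Step 3: the toolbar reports which site the user chose for these terms.
        self.counts[terms.lower()][site] += 1

    def boost(self, terms, candidates):
        # Step 4: reorder candidate results so that more-clicked sites come first.
        # sorted() is stable, so sites with equal click counts keep their original order.
        seen = self.counts.get(terms.lower(), Counter())
        return sorted(candidates, key=lambda site: -seen[site])
```

Note that nothing in this sketch records which search engine the click came from. That is one way to read the question below: a signal built like this would treat clicks on Bing's own results pages exactly the same as clicks on Google's.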
An interesting question then would be whether or not Microsoft also "copies" from Bing? That is, if you are using Bing as your search engine, do they still use the fact that you went to $site after searching for $terms to adjust the rankings?
@Matt Cutts - I'd love it if you could confirm exactly which user data you DO and DO NOT use to influence rankings. Or, at the very least say on record that you don't do what Bing are doing and use data from bing.com
Overall, I'm not surprised that Bing are doing this for some keywords - all the major search engines use a massive number of different signals. I'll be more surprised if it turns out this is happening at a large scale or for competitive terms.
I see this Microsoft tactic as similar. They're considering search terms that resulted in a visit to the page from other search engines as being important indicators of the page content. If they have that URL-to-URL-trail data legally, and the signal works well, and they are not singling out Google's URLs as the only source of such a signal, I'm not sure what the problem is.
Google didn't get where they are by throwing out legally-collected useful data, and Bing won't catch up to a leader who has clicktrail sensors everywhere, via analytics/toolbar/ads/mobile/etc., by throwing away legally-collected useful data.
1. Bing is inferring search results from user behavior, collected via Bing Toolbar
2. Google team makes an experiment: using Bing Toolbar to feed Bing particular behavior. Namely, they all go from a search result page on Google.com laden with a unique word to a particular target site.
3. Bing infers connection between the unique word and the target site.
4. Google cries cheating.
But what I think is more important is all of the flak that Google has been catching for supposedly slipping in the quality of its search results. If its quality is so poor, then why is Bing stealing its results? It's a great method of striking back at the negative PR they've been receiving.
Bing is apparently using toolbar click data (AFAICT it hasn't been shown that this is specifically targeted at Google or even at search engines in general) when it has no other information for the given search term. That has very little relation to the quality of Google's search in general.
Both Bing and Google are targeted towards mass market and I think people expect the same from both. If Google does it right, there is nothing more to invent. And even if there is, it is probably pretty expensive. It is so much easier to copy than to invent from scratch, just to get something almost exactly the same as Google :)
I am really interested in what Bing could do to be REALLY different from or better than Google. And if they did, Google would most likely do something very similar :)
 - http://en.wikipedia.org/wiki/Fictitious_entry
IANAL, but in certain jurisdictions, most certainly yes. Many countries have copyright laws that protect compilations of things that are individually not worthy of copyright, for example telephone books. Copying down an individual telephone book entry is of course not a copyright violation, but copying the whole listing in a systematic fashion is.
I'd guess that this law applies to search engine rankings as well - rankings/listings of individual items that are not protected by copyright, but where a lot of effort goes into producing the listing itself.
- generally speaking, the conclusion seems to be that for regular queries, Bing uses mostly other clues to figure out relevance, so this is basically a storm in a teacup. Regardless, since both Google's and Bing's algos are closed-source, we're going on faith when either company says data gathered from one of their products doesn't affect search quality.
- the whole thing about making a ranking overrider and talking about it publicly seems like a stupid move. Why in the world would you say you developed such code and then "deleted it" in an all-code-is-version-controlled-these-days world? This won't play well given the claims that Google gives preferential treatment to its own services (e.g. email, maps) over competitors.
- The experiment reportedly was triggered because Bing results were getting better for misspelled searches. But, seriously, returning Wikipedia as the top result for something with a low Levenshtein distance to a rare word is not exactly rocket science...
- if Google feels that its SERPs are the most relevant possible, shouldn't it make sense that competitors trying to improve relevance will inevitably end up showing the same results as Google on at least a subset of queries?
- if you're saying Bing has just as good results as Google, regardless of the means to the goal, then how does publicizing that help the whole "Google's overrun by spam" meme going on?
Actually, that is Google's core business, and judging from the revenue it has generated, it is most likely harder than rocket science. The mapping between keyword and website content/results is what a search engine is all about, and what Google does (arguably) well.
I believe Google is crying foul because it is the only source of the mapping between the keywords they made up and the website results. Bing did not have these mappings until it evaluated user clickthroughs that went through Google's results, with the users' browsing history going something like:
Now Bing is using the users' click history to generate the mapping keyword<=>http://website.com; this is the shady part: if Google had not generated its results, that mapping would never have taken place. The users could never have told Bing that the two are related unless Google existed.
You're talking about all of the work needed to make a search engine good, I'm talking about the specific algorithm needed for that particular type of query (rare, obscure, easily misspelled word). Different scopes.
And again, my observation in that bullet point is that, imho, the "torsoraphy" type of query could have been improved by something like the "close enough to rare word? + does wiki page exist?" algorithm, rather than copying.
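The "close enough to rare word? + does wiki page exist?" idea can be sketched in a few lines. This is only an illustration under assumptions: the set of known titles stands in for a real "does a Wikipedia page exist?" lookup, and the distance cutoff of 3 is arbitrary. The Levenshtein distance itself is the standard dynamic-programming edit distance (insertions, deletions and substitutions each cost 1).

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, keeping one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def suggest_wiki_page(query, titles, max_distance=3):
    """Return the closest known title within max_distance edits, else None.

    titles is a hypothetical stand-in for "pages that exist on the wiki".
    """
    query = query.strip().lower()
    best = min(titles, key=lambda t: levenshtein(query, t), default=None)
    if best is not None and levenshtein(query, best) <= max_distance:
        return best
    return None
```

For example, with a toy title set, the misspelling from the experiment maps to the real word, while an unrelated query does not:

```python
titles = {"tarsorrhaphy", "blepharoplasty"}
suggest_wiki_page("torsoraphy", titles)  # 'tarsorrhaphy' (distance 3)
suggest_wiki_page("poodle", titles)      # None
```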
Re: recording click history being shady: I don't really see what's so fundamentally different between that and recording surfing habits via ads. It goes back to the first point: Google could say they don't use that data to improve SERP relevance, but we're going on faith on that claim.
Edit: I did some digging to see if I could find an authoritative source on this, and found that Matt Cutts specifically denies this particular usage of user data for expanding the crawl set on his blog. Mea maxima culpa.
Edit the second: An amusing note on this general subject: Google will fuzz test certain search forms on, e.g., high value government websites to get at the juicy data behind them which would not otherwise be reachable from just traversing the link graph.
Isn't that basic classroom solidarity?
My guess is that they have a relevance metric from their own algorithm, but some queries return poor relevance scores. Then they may go to this set of secondary results and ask, "Are there any results for this query that a higher-than-expected percentage of users clicked on?" and add those results to the list.
IOW, this data probably isn't the common case.
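The guess above can be sketched as a fallback that only consults click data when the engine's own ranking is weak. Everything here is hypothetical: the 0..1 relevance scale, the `min_score` cutoff, the baseline click-through rate `expected_ctr`, the `lift` factor, and the choice to place click-derived results ahead of the weak primary ones (the comment only says they get added to the list).

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    score: float  # relevance from the engine's own ranking (hypothetical 0..1)


def rank_with_click_fallback(primary, click_stats, min_score=0.5,
                             expected_ctr=0.05, lift=2.0):
    """Return URLs from the primary ranking, plus click-derived extras if weak.

    primary: list of Result from the engine's own algorithm.
    click_stats: url -> (clicks, impressions) for this query, e.g. from
        toolbar logs (hypothetical structure).
    """
    ranked = [r.url for r in sorted(primary, key=lambda r: r.score, reverse=True)]
    # Common case: the engine's own algorithm is confident enough.
    if any(r.score >= min_score for r in primary):
        return ranked

    # Rare case: add URLs users clicked far more often than expected.
    extras = []
    for url, (clicks, impressions) in click_stats.items():
        if not impressions or url in ranked:
            continue
        ctr = clicks / impressions
        if ctr > expected_ctr * lift:
            extras.append((ctr, url))
    extras.sort(reverse=True)
    return [url for _, url in extras] + ranked
```

With a confident primary result the click data is never consulted; with only weak results, a heavily clicked URL gets pulled in:

```python
stats = {"x.com": (50, 100), "y.com": (1, 100)}
rank_with_click_fallback([Result("a.com", 0.9)], stats)  # ['a.com']
rank_with_click_fallback([Result("b.com", 0.1), Result("c.com", 0.2)], stats)
# ['x.com', 'c.com', 'b.com']
```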
I don't think it's cheating; nowhere in the article does it claim that they aren't doing their own search. They are just using Google's results as part of their own search algorithm. Is that really such a crime?
Using signals from user behavior on the toolbar on ANY search engine seems to make a lot of sense when it comes to improving search results. MS employees are the biggest QA group for Bing. Internal tools allow employees to tag queries and results that are superior/inferior to Google. Both are displayed side by side and employees provide active feedback to help improve the algorithm and identify more systemic underlying ranking issues.
Figure out which requests are coming from Microsoft and return a different set of search results (e.g. XXX stuff) so that they don't show up in organic Google results. Set the trap, and once Bing has incorporated those results for a key term, spam TC and LOL at Steve Ballmer getting worked up.
This would discredit the relevancy of Bing, and all those PR dollars spent on rebranding would go down the drain. Imagine searching for a harmless search term like 'poodle' and getting hardcore triple-X results.
Oh well, don't do evil, right?
If that's accurate, that's a precedent I'd rather not have seen.
(a little help on the grammar here, anyone?)
If Bing was really copying results, they would have reflected the spam sites, because people click on those when they are highly ranked just as often as they click on the originator site. After all, the problem is that the content is identical.
Suffice to say, Google’s pretty unhappy with the whole situation, which does raise a number of issues. For one, is what Bing seems to be doing illegal? Singhal was “hesitant” to say that since Google technically hasn’t lost anything. It still has its own results, even if it feels Bing is mimicking them.
Funny... that's the exact same argument people often make for software/music piracy.
Google are clearly monitoring Bing (and others) as a matter of course. I'm interested to know what they'd have done if they'd found Bing providing better quality results. Would they have spent resources trying to figure out what Bing were doing right, or would that be "copying" too?
disclaimer: this is not my acknowledgement that I agree with the practice.
An interesting side-effect is that Bing has in its logs the home IPs of the Googlers involved in this research (i.e., anyone who searched for "hiybbprqag" in Dec. '10).
"These searches returned no matches on Google or Bing — or a tiny number of poor quality matches, in a few cases — before the experiment went live. [...] Only a small number of the test searches produced this result, about 7 to 9 (depending on when exactly Google checked) out of the 100. Google says it doesn’t know why they didn’t all work, [...]"
The writer apparently thinks these results justify concluding the article with this takeaway:
"When Bing launched in 2009, the joke was that Bing stood for either “Because It’s Not Google” or “But It’s Not Google.” Mining Google’s searches makes me wonder if the joke should change to “Bing Is Now Google.”"
By any chance, is Bing named after Chandler Bing?
"DuckDuckGo" has become my default. It's awesome.
Google states 9 of 100 planted queries showed up on Bing. You think Amazon, GoDaddy, and AOL could make similar claims?
Probably...but those examples aren't worried about their market share evaporating.
ps -- there's a word for what MS's software appears to be doing: spyware.
If Bing can't find it for you, it will google it for you.
In a business environment, the relevant questions are:
1) does it break any laws?
2) is it profitable?
2b) is it consistent with the image MS wishes to project to its customers and/or regulatory overseers?
One person's unfair competition is another's brilliant hack.
This is a fundamental difference between student reward-schedules and rest-of-life reward schedules.
There's no credit for effort, unless you can spin it that way in your marketing / branding.
Google has worked out their algorithms for processing the incoming data and generating an answer. Bing has apparently used Google's answer as a comparison to whether they are getting it right or not and when not, subbing in the other answer.
While this may not be illegal in business, it is still very much 'not knowing the correct answer to the test'.
Perhaps Google has patented a few of its search algorithm components.