Hmmm. Let's say that Bing sets up a script that sends queries to Google and then records the results. That's clearly copying. But what Bing does is when you use its toolbar, it watches what you do and uses that information to rank results. Is that really copying? It showed Google's Honeypot page because Google's engineers were clicking on the Honeypot page with the toolbar installed. That isn't copying Google's results, that's copying the actions of Bing toolbar users.
This can easily be demonstrated. Google can set up a second honeypot but instruct its engineers not to click on the link, ever. If it shows up in Bing's results, then Bing is watching what Google returns and scraping its results.
But if the second Honeypot doesn't show up in Bing's results, then clearly Bing isn't copying Google's results, it's copying its toolbar's preference for links.
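The two-honeypot argument boils down to a small decision table. A minimal sketch in Python, where the function name and the wording of each conclusion are my own illustration, not anything Google or Bing published:

```python
def interpret_honeypot(engineers_clicked: bool, appeared_in_bing: bool) -> str:
    """Map one honeypot outcome to what it suggests (hypothetical helper)."""
    if appeared_in_bing and not engineers_clicked:
        # Nobody clicked, yet the result showed up: clicks cannot explain it.
        return "consistent with scraping result pages"
    if appeared_in_bing and engineers_clicked:
        # Clicks happened, so this outcome cannot separate the two theories.
        return "ambiguous: clicks or scraping"
    if engineers_clicked:
        # Clicks happened but nothing surfaced.
        return "clicks were not picked up"
    # No clicks and no appearance: what a click-only signal would predict.
    return "no evidence of scraping"


for clicked in (True, False):
    for appeared in (True, False):
        print(clicked, appeared, "->", interpret_honeypot(clicked, appeared))
```

Only the "no clicks, still appears" row distinguishes scraping from click tracking, which is why the second honeypot matters.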
The entire thing is moot to me. The takeaway isn't whether Bing copies Google. The takeaway is that Bing's toolbar is spyware :-)
> Let's say that Bing sets up a script that sends queries to Google and then records the results. That's clearly copying.
I'd question even that hyperbolic interpretation. Let's say that Google sets up a script that sends queries to websites, records the results, and incorporates the links shown on those sites into its search rankings. Is that clearly copying? No, that's just PageRank.
If you have a web directory, a link page, a blogroll--isn't Google "copying" your work by using it to improve its search results? How is that any different from what Bing's doing?
This is my first thought as well. Google's pagerank analyzes the link structure of the web as one of the inputs to its search ranking. Apparently, Bing's toolbar analyzes page content coupled with user click behavior as one of the inputs to its search ranking.
These two things don't seem very different to me. Both of them are relying heavily on the value provided to them by tracking and analyzing the behavior of users on the web to drive search results.
I have the same thought. It is more a matter of framing. A while ago some people accused Google of unethically profiting because it was farming the link structure of the Internet, which is the labor of many people (was it Nicholas Carr?). I don't really buy this framing. But Google's accusation seems to fall along a similar line of argument. You can also set up a "Google sting" to prove they are copying from the Internet. It is called "Google Bombing".
Bing will be at fault if they specifically target Google. But if you consider that entering a keyword and then clicking a link is essentially targeting Google search, then it only exposes another problem: Google's monopoly on the search market.
but the link between the obscure query and the click on the page wouldn't have been made without bing knowing the user first searched for the query on google, no? if it were simply boosting page clicks that would be one thing, but how else could rim.com rank 1st for "mbzrxpgjys"?
What the experiment shows is that where no other data is available Bing will use what it has and that Google can successfully seed Bing on the long tail. What it doesn't show is that in typical circumstances Bing is relying on data gathered from Google searches.
Microsoft is collecting the same sort of information on Google queries that it collects on Bing queries and that Google collects on Google queries. All this is happening at the long tail where both companies are most likely using something other than web crawling to tailor search results - after all, the whole experiment is only possible because Google can seed page rankings at will to link arbitrary terms to specific search results.
Well yes, there is a hard link happening between a google search result and a link being clicked; however, google's argument isn't as strong if it turns out bing is doing this for all search engines. It might be that they aren't targeting google specifically, but instead they're targeting all search sites generically.
Google doesn't really want to get into a heated discussion about the evils of a search engine knowing everything you've ever searched for. Stones, glass houses, etc.
(Given Google's near-monopoly of the market, Microsoft and DDG have some amusing competitive synergy going on, don't they. DDG can criticize Google all they please for retaining user data because DDG doesn't and isn't in a position to benefit from it. Microsoft, which certainly is in a position to benefit from it, doesn't need to worry about Google calling them on it because Google is the only search engine that can actually lose market share over the issue.)
For all Google’s sins, there is a Dashboard that lets you erase everything you'd rather they not know: Google could promote that heavily if it came to stones and glass houses. I always assumed this would be a great way to learn more about queries: spotting what words people are ashamed to have searched for.
It could be something a registered user could set from a browser toggle, and DuckDuckGo is a very good project, of course. My point was: data portability and user control are within Google's long term interest, not being evasive about their data cache.
This is what all toolbars do, and is largely the point of why big companies offer them and pay little software companies to make them optional installs (see Corel's WinZip, which installs the Google toolbar)
I think this would still leave Google with a fairly strong argument - if Bing does it for all search engines, then they're effectively copying whoever is most popular. Since it's done through Internet Explorer, which is still bundled with Windows in most places, they could try to make the argument that Microsoft is using their position in the OS market to crush competition in other markets.
Interesting angle to go through the tied market and competition policy: that's a type of authority that is far more intelligent, and one that has only just prosecuted IE in Windows. However, you'd have to either have a US court acknowledge that a European was right to disagree with them in the first place, or have a European court admit that their previous decision wasn’t enough. It’s feasible, but hard.
Where you’ll be more limited with it, is that it’s apparently not IE, but the Bing Bar that is at stake—the connection is getting thinner.
If a page contains a unique word, and people who were on that page universally go to a different page, that could be enough evidence for bing to assume there's a link between the unique word and the target page.
It might be a good exercise to chase down old posts of people who wrote the first browser toolbars, as well as the browser infrastructure that made them possible. We can contrast the speculation on why they might have been a good idea with the actual result. Not as a way to trash them, but as an exercise in how smart people can miss the mark.
As it happens there's an exact historical precedent for this. Post code <-> long/lat data is copyrighted in the UK, but users were using Google Maps (and others) to do conversion and supply them to open source databases, the end result is that Google had to change their licensing/API to restrict this sort of behaviour.
Just because you're copying the data indirectly through a third party doesn't mean you're not breaching the copyright.
Interesting case to raise. Reminds me of the NFL terms voiced after every game about the broadcast being for private use only. I imagine that if their terms don't already include a clause like this, they can try to suggest that a toolbar tracking user clicks is violating the terms.
Very murky waters. If Google starts complaining that other people are tracking their users, they might end up educating users about how much they and their advertisers track.
...I hope DuckDuckGo figures out a way to capitalize on this brouhaha...
how would you expect a search engine to be able to surface a web-page for a specific query when there is no data to create the relationship except the signal from google engineers spamming bing with click data?
these are cases of outliers. they don't exist on the real internet, or at least where pages exist without any other data (anchor text, inlinks, outlinks, words from the query in the document) they never get surfaced from a search engine.
absent fake click-data there is no way google could surface these documents for the specific queries. in fact google states this openly in their "attack piece". before they manually changed the rank of these documents they didn't surface these either.
the only evidence of "cheating" is that bing surfaces documents for which there is no known relationship between the query and the document, except for spam created by google engineers. this is evidence only of a bug in bing's ranking algorithm. clearly it is using signals from google. just like google uses signals from CNN (keywords, inlinks, outlinks, anchor text, etc).
i'm sure bing is thankful to google for helping find this defect in their system and are hard at work to fix it.
people talk about bing copying search results like google invented search results and put a lot of hard work into them. in this case the only hard work they put in was designed to spam bing.
i can only conclude that google is getting worried about bing quality and has run out of ideas on how to fix their own problems.
all search engines make use of a variety of signals. Bing decided to use what users click on as a signal. Google spotted it and thought it was 'zomg bing are stealing our results'. I don't understand why you think taking advantage of a new signal to improve search is not a smart move by Bing?
I don't see this as being any different from what Microsoft has been doing for 20+ years. They let a competitor put the work into figuring something out, then make a reasonably accurate facsimile thereof. I think it's lazy, but not particularly unethical. If Google were Benz, would they be complaining that Ford was making 4-wheeled vehicles with an engine? More appropriately, and given that I've been on a Top Gear bender lately, if Google were Cadillac would they be complaining that everyone else was copying their method for operating vehicles, with three pedals, a gear shift, a steering wheel and a handbrake?
I get why Google is upset, but this doesn't strike me as unethical behaviour in a free market.
It's embarrassing for Google to complain about this. You FINALLY get a little competition on your turf and you try to make some big issue that, as a market leader, the product you produce is being watched, analyzed and in some ways incorporated by your competitor.
There is no victim here. They are not taking your 1st result and copying it. They are taking the result the user clicked. Obviously you didn't predict that with your algorithm or you'd have always made that the 1st result. Instead, what they're tracking is user behavior, not your raw ranking.
Obviously users give Google implicit permission to track their behavior by using your product. And similarly, by installing the Bing toolbar, they're giving Bing that permission.
In short, Bing Toolbar infers a relationship between words on the page and the next page the user clicks on. Google's team purposefully confined the Bing Toolbar behavior-tracking algorithm to their use of the google.com search result page, and then cried foul about "bing stealing Google's search results".
This is disgraceful attention-whoring on Google's part. Quite surprising, too, as I don't remember them ever stooping that low.
Referring URL? Oh right, Google intercepts all search result clicks through a redirector, and includes the search term in the intercepting URL. Yeah, the unique words in that URL could figure into describing the content of the final destination page (after redirect), the same way anchor text figures as well.
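To make the referrer idea concrete, here is a rough sketch of pulling search terms out of a URL's query string with Python's standard library. The host, the `q` parameter and the `dest` parameter are made up for illustration; I am not claiming this is the actual format of Google's redirector or what the Bing toolbar does.

```python
from urllib.parse import urlparse, parse_qs


def query_terms_from_url(url: str, param: str = "q") -> list[str]:
    """Return the lowercased words found in one query-string parameter."""
    params = parse_qs(urlparse(url).query)  # percent-decoding is handled here
    terms: list[str] = []
    for value in params.get(param, []):
        terms.extend(value.lower().split())
    return terms


# Hypothetical redirector URL: host and parameter names are assumptions.
redirect = (
    "https://search.example/url"
    "?q=mbzrxpgjys&dest=https%3A%2F%2Ftarget.example%2F"
)
print(query_terms_from_url(redirect))
```

Anything that sees the URL a click came from could, in principle, treat those words the way a crawler treats anchor text.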
Even Matt's allegation is softened by "I believe" here: there appears to be nothing that conclusively indicates Bing is solely targeting Google. For example, the observed behaviour could be a side-effect of a generic algorithm to extract and associate search queries with a user's click stream, which is only a minor variant of what Google itself does with its own toolbar.
If the case described above were true, then all Google has done here is to make inconclusive accusations and use the occasion to highlight its own dominance over search.
It seems to me this is just a cheap and slightly seedy PR stunt.
In the hypothetical situation above, it has almost nothing to do with the search engine - it is both the user providing the query and selecting the result: this is the data of value, not which intermediary provided the list of results to select from.
They have a very long history of playing dirty. Lotus learned it, MS took information Lotus shared with them and then shared it with Excel and Office and they supposedly kept Lotus on an API changing treadmill. Digital Research learned it, MS wrote code that made Windows 3.x crash if it detected DR-DOS. Netscape learned it. Arguably, IBM(OS/2) and any other operating system vendor learned it in the 1990s as well, MS charged premiums if hardware vendors wanted to install non-Windows operating systems. They sort of tried to do it to Intuit, they made a competitor and then effectively gave it away for free. Enough so that a lot of folks avoid Mono like it's, well, actually mono. They've established that reputation, and most of the time, by the time it became clear what was going on, MS had already done irreparable damage.
It is kind of embarrassing for Google, but if it is real and it continues, it's better to address it now rather than after MS becomes a titan of search and Google's market has eroded. At times, it seems like MS has changed in ways, but fundamentally they're still run by the same guys. Remember that when you play your Xbox or use Bing or any MS products, they don't like to see other successful software companies.
Also there's the strange fact that Bing and their whole online division makes gigantic losses. They're not in it for the money, they're in it to stifle competition and hold back progress so they can milk their cashcow some more.
I think they are complaining because it could be far more widespread: it would actually be easier for the head than the long tail. Where it's harder is for News, and Bing appears to lag for recent results.
I remember that when Bing went out, everyone was wondering how close to Google the results were (and talked about it as a good thing).
If it's beneath Google to complain about Microsoft riding on its coattails for the highly valuable "long tail" of queries, surely it's beneath Microsoft to sue Android manufacturers for competing in the smartphone space?
in 7-9% of the times google tried to spam bing. not in 7-9% of search results. this indicates that maybe google isn't that great at figuring out how to spam bing or that bing is pretty good at defending against spam. maybe google could take some lessons from bing on cleaning up spam, a problem that seems all too prevalent on google these days.
There is, however, the legitimate complaint that they apparently do not finally have a little competition on their turf. Their competition is cheating, not innovating. That helps who, how? At best, for "competition's" sake, Bing nabs a big share of the market; now there are two big dogs who make it hard to enter into the search realm with new ideas.
As we all know, this isn't the first time Microsoft has copied someone else. And I'm sure it won't be the last.
I think Google has a right to complain. Microsoft has resorted to these less than innovative tactics to monopolize themselves for a long time now, and it isn't fair to companies like Google who have worked their butts off (and gave 1.8 million shares - $336M in 2005 - to Stanford for the PageRank algorithm) to develop their superior product.
Oh I definitely did read it in its entirety and understood it perfectly. What you're failing to do is see the whole picture.
Let's put it this way... if Google hadn't bought the PageRank algorithm from Stanford and put years of work into perfecting their search results, Microsoft wouldn't have any way to track which Google search results users click. It's an unfair tactic that clearly demonstrates Microsoft's sketchiness and desire to monopolize themselves (by any means necessary, "evil" or not) wherever there's a computer.
As for the fanboy comment... I'm certainly not a fanboy but I'll let the following speak for itself: Microsoft Internet Explorer vs Google Chrome
All your comments are coming from your assumption that Microsoft is trying to monopolize in something - in this case, search. Hence, your comments (although you will disagree) are biased and irrational. Microsoft isn't trying to monopolize in anything nowadays. In fact, they can't, so they aren't even trying.
From a search engine user's point of view, I believe this whole fiasco is ridiculous. First, it's ridiculous because Google is handling this situation very immaturely. Matt Cutts should not have confronted the VP of Bing in the way he did. Second, if I were a user of the Bing Toolbar, I gave permission to the Bing Toolbar to use my behaviors to polish my search results. I have no problem with that. Lastly, the experiments they did have more to do with "guessing what the user wanted" than "what PageRank does".
I've used Bing fairly often over the past 6 months because of how much spam Google's search results were giving back. Now that Google has fixed (or is still working on) the spam problem, I'm starting to use Google again. However, what I noticed from the past 6 months is that Google search isn't so much better than Bing. This Bing Toolbar fiasco only applies to synthetic queries that I would never make.
Is Bing cheating? I don't think so. To me, they are just using another signal from user's permission. However, the definition of cheating will be different for everyone else.
Err, by that line of thinking, Google leveraged Linux (the hard work of volunteers) to earn tens of billions and does not release the modified code for use of the volunteers. Of course they are not required to, but it isn't fair to Linux developers who have worked their butts off to develop Linux.
I have a suspicion what you'll find is that Bing use the toolbar to match $current_page_content with $clicked_page_content. When $current_page_content contains obscure words, that becomes the only signal, and so bing's engine will naturally associate it with $clicked_page.
In other words, there's a relationship between Page A and Page B if there exists a link between them (==PageRank). But the strength of the relationship is increased based on how many users click on that link. I think that's the information Bing were trying to capture (or if they weren't, they should have been).
What I'm saying is that it's probably an unintentional side-effect. At scale though, the effect is that Bing gradually uses Google as a signal, simply because Google is a popular site.
edit: Yet another way of saying it: I think it's not just clicks on Google searches that are captured by Bing, but clicks anywhere. Google is a large site, so its influence on Bing can be measured. This is what we're seeing. My theory. I don't work in search.
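Here is a toy version of that theory, just to show why an obscure word would end up as the dominant signal. Everything in it is an assumption for illustration: the function name, the observation format (words on the current page, URL clicked next), and the IDF-style weighting. Nobody outside Microsoft knows what the toolbar pipeline really computes.

```python
import math
from collections import Counter, defaultdict


def association_scores(observations, doc_freq, total_docs):
    """Score word -> clicked-page associations, weighting rare words more.

    observations: iterable of (words_on_current_page, clicked_url)
    doc_freq:     how many known pages contain each word (missing = 0)
    """
    scores = defaultdict(Counter)
    for words, clicked_url in observations:
        for word in set(words):
            # Smoothed inverse document frequency: near 0 for words that are
            # everywhere, large for words seen on almost no other page.
            idf = math.log((1 + total_docs) / (1 + doc_freq.get(word, 0)))
            scores[word][clicked_url] += idf
    return scores


# Hypothetical data: a nonsense word appears only on the honeypot page.
obs = [(["the", "search", "mbzrxpgjys"], "https://target.example/")] * 5
obs += [(["the", "search"], "https://other.example/")] * 50
freq = {"the": 900_000, "search": 200_000}

scores = association_scores(obs, freq, total_docs=1_000_000)
print(scores["mbzrxpgjys"].most_common(1))
print(scores["the"].most_common(1))
```

With only five clicks, the nonsense word points firmly at one page, while a common word like "the" carries almost no weight per click. That matches the idea that this only matters where no other data exists.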
Exactly. If they are just matching (even more simply) $search_term_entered to $clicked_link then you would expect that they are "copying" from any search engine configured in the toolbar.
Now the interesting thing to reverse engineer is what other information might be passed along to give relevance to the search term/click pair. If Google could establish that there was a third piece of info in the tuple, such as "originating search domain" and that Bing used this to weight term/click pairs based on the authority of the source, Google's claims would hold more water. I suspect that Bing has to apply some kind of validation of the term/click pairs (for instance, only sending pairs that appear on the same results page from accredited engines), otherwise they would be subject to "Bing bomb" attacks where users or botnets vote up lower ranked (or even unranked) clicks for a given term. (And if they don't validate or detect gaming, then there would be ample opportunity to inject all kinds of synthetic behavior into Bing's search results. Based on the relatively small number of users and clicks it took to own a long tail term, it seems like the protection they have is very weak or simple.)
This makes a lot of sense, and would have been easy enough for Google to test as well, creating some tiny, brand new, never before heard of test search engine that Bing would have no reason to copy, see if the same thing happened.
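For what it's worth, the kind of validation described here could be as simple as the sketch below. The event tuple layout, the `min_users` threshold and the `trusted_sources` filter are all hypothetical; this is one guess at a defence against "Bing bombing", not a description of Bing's system.

```python
from collections import defaultdict


def accepted_pairs(events, min_users=3, trusted_sources=None):
    """Keep (term, url) pairs backed by enough distinct users.

    events: iterable of (user_id, source_domain, term, clicked_url)
    trusted_sources: optional set of domains whose result pages count
    """
    users_per_pair = defaultdict(set)
    for user_id, source, term, url in events:
        if trusted_sources is not None and source not in trusted_sources:
            continue  # ignore pairs from sources we do not accredit
        # A set means one user clicking 100 times still counts once.
        users_per_pair[(term, url)].add(user_id)
    return {
        pair for pair, users in users_per_pair.items()
        if len(users) >= min_users
    }


# One user hammering a link is rejected; three distinct users get through.
spam = [("u1", "search.example", "mbzrxpgjys", "https://target.example/")] * 100
real = [(f"u{i}", "search.example", "term", "https://a.example/") for i in range(3)]
print(accepted_pairs(spam + real))
```

Counting distinct users stops a single machine, but a botnet or a couple of dozen engineers would still clear a low threshold, which fits the observation that the protection looks weak on long tail terms.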
The article suggests that the Bing toolbar monitors what its users click and uses that information to improve Bing search results. Is that what you have conclusively proved?
I'm interested in another experiment. If you set up a honeypot, search for the term, but never click on the link, does the honeypot start showing up in Bing? The article doesn't say whether you tried this. Did you try it? Are Bing scraping your results from the page or only tracking their users' clicks?
Anyone can test that Microsoft's software sends the clicks back to Microsoft, although I believe Microsoft sends the data back by SSL, so it's harder to verify even that than you'd expect.
Google's search results are blocked in robots.txt, so I don't believe Bing has been able to crawl our search results directly. All the evidence points to users' clicks on Google, which are then sent to Microsoft.
Microsoft has (so far) declined to admit whether our allegation is true. Getting them to talk about exactly what they do and what software they use or don't use would be the easiest way. I'd like them to confirm or deny, which is why I wanted to go to this search panel later today and ask them.
> so I don't believe Bing has been able to crawl our search results directly
Isn't compliance with robots.txt more of a voluntary thing?
I'm not accusing MS of ignoring it when convenient, but if you/we/someone is accusing them of acting unethically wrt search results in the first place, telling the crawler to ignore robots.txt wouldn't be that far away, would it? (And likewise faking the user-agent, etc.)
For better or for worse, UA identification, robots.txt compliance - all those things are voluntary. I'm not suggesting they shouldn't be, but it certainly makes a difference in terms of whether something's possible or not. (And, if you ask me, places an even higher obligation on the actors to behave ethically, lest trust completely evaporates and the whole thing goes to hell in a handbasket).
I am not a lawyer, but as I understand it there is some precedent in the US of intentionally ignoring robots.txt being unauthorized computer access, exposing you to all the liability that entails (possibly criminal).
I did not say "similar data" because "similar" is a bit too slippery a word in a technical context. There's too much plausible deniability. What I am asking is if Google's tools send data back to Googleplex to be mined for the sake of search engine improvements.
Quote from the article: "In fact, Google stressed that the only information that flows back at all from Chrome is what people are searching for from within the browser, if they are using Google as their search engine."
I'm pretty positive that's not true. If you run Fiddler when browsing with Chrome you will see constant hits to toolbarqueries.clients.google.com whether you're using Google or not. I could be browsing some MS site and toolbarqueries.clients.google.com gets hit. Chromium doesn't do this.
Edit: You can uncheck everything under privacy and it will still send those requests.
Edit2: What it sends back looks something like this:
Looks like auto-fill data, but this happens when I click around a site, NOT when searching Google or typing something in the address bar. For some sites (interestingly, not all) it sends 3 requests for each page load.
That's troubling. I'd be very interested in seeing a response from Google about this. Are you aware of any? Also, can you use Fiddler to inspect the content of the requests? I'm not familiar with the tool.
I see this too, if I have autofill enabled, and at least one autofill address entry.
I would guess that Chrome is sending a hash of the <form> (perhaps URL + method?), plus a hash of each of the <input> tags, and Google returns some sort of information about what kind of form it is?
If so, it would mean it's pretty easy for Google to determine which sites you're on from the pattern of hashes sent for each site. e.g. I see this data sent in the clear for pretty much every page on https://www.facebook.com/
poacher69, we crawl the public web. Anyone that blocks us out with robots.txt, we won't crawl. If you check bing.com/robots.txt, it has "Disallow: /search" . So no, we won't crawl Bing's search results pages. If anything, users tend to complain when search results from Lycos or wherever show up in Google.
From my experience, Googlebot doesn't crawl pages that are blocked in robots.txt files. Check out Bing's robots.txt: http://bing.com/robots.txt - notice how /search is disallowed. That typically means that Googlebot isn't able to access that page. The same for the other search engines, it's more down to if they specify (through robots.txt) that Googlebot isn't allowed to crawl those results.
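For anyone who wants to check a rule like this themselves, Python ships a robots.txt parser in the standard library. A minimal sketch: the `Disallow: /search` line is the one quoted above, while the `User-agent: *` line and the `ExampleBot` name are my assumptions, so check the live file for the real contents.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines: list[str], user_agent: str, url: str) -> bool:
    """Ask whether robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules given as a list of lines
    return parser.can_fetch(user_agent, url)


# Assumed rule set: only the Disallow line comes from the thread.
rules = ["User-agent: *", "Disallow: /search"]

print(allowed(rules, "ExampleBot", "http://bing.com/search?q=test"))
print(allowed(rules, "ExampleBot", "http://bing.com/"))
```

Note that this only answers "is it permitted?". As pointed out elsewhere in the thread, compliance is voluntary: a crawler has to choose to run this check and respect the answer.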
Given that Google appears to have an active program to monitor the results of search queries on Bing and to track Bing's page rankings and the ways in which they change over time (how else is this likely to have come to their attention?), they should hardly be shocked, shocked to find Microsoft doing something similar.
I have always suspected that the real value of Bing for Microsoft is to prevent Google's data mining of queries originating in Redmond.
... and feeding that data back into their own search results, rather than just using it for analysis to see how your competitors are performing?
If I'm in the business of giving horse racing tips and I read your tips to see what your strike rate is compared to mine, that's one thing. If I start tipping the same horses as you, purely because you tipped them, that's quite another thing.
Microsoft isn't using strike rate data in the important sense which you imply - the strike rate for Google is advertising revenue. It's not as if Microsoft is collecting info on the advertisements displayed and then soliciting those advertisers to spend their dollars on Bing (at least that's not part of the allegations). I am pretty confident that Google feeds every bit of legally collected relevant data back into their search algorithms.
no, the "strike rate" is precision/recall. Good advertising CTRs is a side-effect of relevancy. Relevancy is measured by precision and recall.
edit to expand: If the measure by which a search engine evaluated itself was advertising revenues, they'd all have massive intrusive adverts, and no users. The only viable measure can be the quality of the search results themselves. As a happy coincidence, if you build something capable of delivering high quality results, you can very easily use that to produce highly relevant adverts. Imagine that each advert is like a little webpage, and rank them just the same as you do for normal webpages. (caveat: there's no link graph for adverts, so we're reduced to using a simpler text mining approach, eg bag of words vector space la-di-da).
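Since precision/recall and the bag-of-words idea are doing a lot of work in this comment, here is a small self-contained sketch of both. The function names and sample strings are invented for illustration, and real engines use far richer features than raw word counts.

```python
import math
from collections import Counter


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant.

    Assumes each document ID appears at most once in `retrieved`.
    """
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts as bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank_ads(query: str, ads: list[str]) -> list[str]:
    """Order ad texts by similarity to the query, best match first."""
    return sorted(ads, key=lambda ad: cosine(query, ad), reverse=True)


print(precision_recall(["a", "b", "c", "d"], {"a", "c", "e"}))
print(rank_ads("cheap flights", ["luxury watches", "cheap flights to paris"]))
```

In the first call, two of the four returned results are relevant and two of the three relevant documents were found, which is the kind of "strike rate" being argued about. The second call treats each advert like a little webpage and ranks it with plain text similarity, since adverts have no link graph.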
I believe that Google and Microsoft evaluate their search engines by entirely different measures. Google primarily by advertising revenue, Microsoft primarily by preventing searches using Google. Sure the ad money is nice for Microsoft, but they don't need it for their business to be profitable and they would be doing major research into search anyway because of its importance to businesses - they sell databases after all.
This whole episode points to the sort of counter-espionage operations the two companies are engaged in. Look how important a propaganda victory is for Google? It strains credulity to believe that the release of this information on the day of the panel discussion is pure coincidence.
Microsoft, historically, have been kings of the desktop. With the rise of the web (which post-dated MS's rise), the desktop has become less and less relevant. Google is fast replacing them - my email, documents, search, advertising, analytics is all handled by Google. I don't use Microsoft for anything in my day-to-day life. Even on my main windows machine, my files are in my dropbox, outside of MS's control.
MS are desperate to regain control. Google will soon launch their own web-centric OS properly, and bam, MS will have no business apart from selling to an ever-dwindling number of companies who can't believe MS don't rule the roost any more. In 20 years they will simply cease to exist if they can't come up with a world-beating online product and win back control of people's computing lives.
Notice how they're diversifying into games and search in order to prepare for the worst case; that their core OS and 'boxed software' business fails.
I think you may misunderstand my position. Bing is basically a research project for Microsoft. The ad revenues don't really matter, the data they collect does - it's learning v. earning. It makes sense for Microsoft to spend a billion dollars because better search algorithms have application for their B2B products and services. If they recover some of their R&D costs directly through advertising revenue, that's a windfall to the overall bottom line.
Bing copying Google's search results is just evil. Not like copying Apple's iPhone design and user interface and giving it away to Apple's competitors, which is good. Right?
It's ugly and immoral and probably legal. Good job catching Microsoft at it (and I think the really really unethical and scary bit is that Microsoft is cheerfully stealing info from users via their browser). I also realize that Google got where it is in Search by innovation and iteration, and that Google's search team has nothing to do with Android per se, but you might see how Apple people feel about the business empires built on stealing their ideas.
I don't think Android devices being similar in some ways to the iPhone is even remotely analogous to what Bing is doing here. One is called healthy competition (and I don't see how Android is copying iOS). The other is literally just copying data.
They are similar in that they both are big touchscreens with no physical keyboard. Apple did not invent that by any means. That is also a shot of the Android App Drawer and not an actual homescreen. Most Android homescreens I've seen have a few widgets on them and do not look anything like a big grid of app icons like the iPhone. Also, that "original" Android phone design you point out looks a hell of a lot like a Blackberry. Android must have blatantly copied RIM by your logic.
I usually criticize Microsoft, but I am on their side in this case.
Why shouldn't Microsoft be using this kind of data? Google search result pages are part of the internet just like any other publicly available web site. Microsoft monitors what the users are clicking on Google and probably on Bing and other sites. So what? Monitoring users is not a new thing. It may be unethical and I may personally hate it, but almost everyone is doing it.
Google should stop whining about this and make their search result the best they can. If they had the best search engine, Bing could come close, but never overcome Google by just copying part of it.
Great panel session. One of the best I've seen. I think you made your point, although I think MS did a good job neutralizing it too.
My question for you Matt... is there any way for Google to build a toolbar that effectively does what the Bing toolbar does (or even a joint one?). I jump to use the various search engines because no single search engine is sufficient. But clearly when Google isn't sufficient, you don't get the value of when I go to Bing. And vice-versa (as I don't use the Bing toolbar currently, but that may change now). Or do you feel that with 65% of the market, you don't need this info?
I can't argue about legal vs. illegal. I sure would like to tell the Bing team that they should try to better Google in their attempts. This strategy is mere copying to say 'look, I am as good as Google'. Microsoft never learns from the past, do they?
I can't help but feel a little surprised that Google found it ethical to lie on their search results page for any reason; and clearly this trojan horse page was a lie. Certainly at every search engine I've worked at we always said, "We can put up ads and help, but we can never outright lie." I suppose that's ameliorated by the fact that this was an internal experiment, but still...
> The day after that, Bing contacted me. They were hosting an event on February 1 to talk about the state of search and wanted to make sure I had the date saved, in case I wanted to come up for it. I said I’d make it. I later learned that the event was being organized by Wadhwa, author of that TechCrunch article. [emphasis mine]
So the supposedly independent author of an article on TechCrunch that kicked off a massive wave of Google criticism is, less than a month later, organizing events specifically for a Google competitor? Boy, that sure seems above-board.
Matt Cutts from Google and Rich Skrenta from Blekko are also speaking at that event (not to mention Peter Thiel, Esther Dyson, and Malcolm Gladwell). It's an industry event that happens to be sponsored by Bing, it's not really an MS event though.
Uhh... Yeah? Everyone in search does this. I've worked at and with 3 major search engine initiatives, and we all tested heavily against Google in a variety of ways.
But the article definitely gets a few things wrong. For example, having worked at Bing I can tell you this: in general "obvious" misspellings are autocorrected without comment. It's not some sort of magical copying procedure, it's actually a policy. Want proof? Here's an example query you can repeat: http://fayr.am/4KdG (direct query link: http://fayr.am/4JZD)
But otherwise, shit yes everyone is scrutinizing google trying to figure out what they're doing. That doesn't mean other players aren't doing their own optimizations, or even running relevancy metrics against other search engines. Relevancy is not a concept with fixed metrics, and every player in the search market does everything they can to figure out what their competitor is doing.
And even the raw results leakage is fairly par for the course. It's not like Bing searches are a crawl of google searches; Microsoft gets this data from browsers running this toolbar and uses it to help shore up queries where they don't return good results.
Edit: The same appears to be true for mbzrxpgjys and indoswiftjobinproduction
Edit 2: Hey, that's weird. Adding a comma, semicolon, period, or other symbol to the beginning or end of the query makes the gamed results show up on top at thor and www as well. Seems to work for all the terms at issue:
There were a number of bugs around the first result that this whole thing uncovered, so it could have been intermittent. Our backfill in the case of no results can vary. It shouldn't have been showing anything ever beyond the first result though.
Quotes provide you with exact match results. Putting quotes around an entire phrase isn't what I intend sometimes, and quoting each term in the query is just excessive. In that case, I'll just click the extra link...
Well, I'm not seeing that result now, likely because the results are now filled with actual news about hiybbprqag. But if Vanessa Fox Nude was ranking then that likely means that some signal was associating hiybbprqag with Google and that either they're using (at least in part) a really old index or the crawler they're using doesn't follow redirects very well.
vanessafoxnude.com has been redirecting to my current site for several years now, but back when the original site was active, much of the incoming anchor text was related to Google and search.
Isn't this just the McDonald's v Subway/Burger King example? McDonald's has the research and foot traffic. Rather than do your own, watch where the successful McDonald's go and then put your restaurant across the street.
If you have a bunch of users searching for "XYZ" on a different search engine and consistently going to link A -- wouldn't that imply it was relevant? You'd do the exact same thing for searches on your own search engine. The only difference is people have opted in to allowing you to have this info _implicitly_ by going to your search engine vs giving you this permission _explicitly_ by clicking through the EULA for the toolbar.
Indeed. Drucker somewhere makes the point that a business's key activities are innovation and marketing -- which implies you don't necessarily have to be innovative in your marketing, just good at it. Hmmm... I was just thinking that rather than do my own exhaustive search for startup companies to invest in, I'll just check who has gotten support from Y Combinator and offer them a deal, piggybacking off of Paul Graham's work. Wait, it's been done? Oh, never mind.
As a practical matter, I doubt customers will care so long as they've always had the option of turning off that part of IE's behavior. I mean, when did you last care about the authenticity of your phone directory's information?
That sounds all well and good, but in terms of cost vs. benefit, it's way easier to make a good application by stealing the years of hard work of the industry leaders than it is to reinvent everything and try to come up with your own clever tweaks to improve it. If you could do it without getting caught, it would practically be a no-brainer.
Yes, but it is one thing if a small startup does it, and it's another if a giant like Microsoft does it. In a way it's applying a double standard, but I think Microsoft has enough money to invest in innovating in the search space. Ultimately you can't really prove it's illegal, so it's a matter of ethics. Strangely we would all encourage a startup to copy what it can, so it can focus on the core innovation and not be bogged down, but we look down on Microsoft. Why is that?
Kind of indicative of where the internet is heading these days: a few big platform players which everyone else is piggybacking on. Like Windows Live Messenger attempting to stay relevant by combining Facebook chat with its own network.
Not a PR win for Google from my point of view. I'm a huge Google fanboy (daily user of the search engine, Gmail, Google Apps, Android, Google Voice, etc), but this whole situation is a PR stain on them for me.
Firstly because I think they originally misunderstood the manner in which Bing's results were being influenced by their own, and then secondly because if they are going to complain about Microsoft collecting information about their users' usage patterns -- well, that's really, really hypocritical coming from Google. Lastly because the whole thing smacks of high school level gossip. If Microsoft is really doing something out of line, handle it in some other way than engaging in a gossipy blog war.
To reiterate, I'm actually a Google fan, I'm OK with trading some privacy for useful services, but if they are going to bang on Microsoft for collecting user usage information, well that's about the worst case of the pot calling the kettle black I've ever heard of in the tech industry.
I'm happy for Google. Now can they get back to improving their search results? The changes they made last week (http://news.ycombinator.com/item?id=2152286) did help some technical searches, but many results are still being overwhelmed by SEO spammer crap.
I agree in principle, disagree in detail; this is a loss for Bing. Google isn't going to get much further benefit out of this but Bing is going to have egg on its face.
Discussing the hypotheticals of the situation as others are doing is interesting (serious), but irrelevant. The court of public opinion isn't going to care about that nuance and will find against Bing if this goes viral. All of the other defenses won't matter either, "everyone is doing this" and so on. Public opinion won't care.
I have to agree with you. After reading all the comments in the thread it seems like the best bet for Google is to use this to make MS look bad. To use the cheating-on-a-school-test analogy: if you get caught cheating, it is a blow to your reputation.
My take is this: the whole Google ethos is that they are trying to have the best algorithm to give the best results. Outside of this sting they have always been at pains to put forward the view that nothing is manually ranked.
I think the same thing applies to Bing here: if they have a generic algorithm that ranks results based on toolbar (or other data) it could be easy to see how their data is skewed by Google given the amount of traffic Google search gets compared to the rest of the internets. This seems fine to me.
But if their algorithm does stuff with activity on google.com because it is google.com then this is a pretty clear foul - it is both essentially copying, and the equivalent of manually ranking results (specifically, Google results)
The corollary of this is that if their algorithm is generic, then it will still work if Google were to cease to exist. If it's not generic, it would be useless without Google.
When asked by SearchEngineLand, Google's Singhal seems to imply Google Toolbar clicktrail data is never used for ranking, but his wording is actually a bit vague:
Absolutely not. The PageRank feature sends back URLs, but we’ve never used those URLs or data to put any results on Google’s results page. We do not do that, and we will not do that.
Matt Cutts, can you clarify if Singhal in fact meant the 'narrow' or 'general' interpretation above?
And, if the 'general' meaning, then is there any statement about the use of clicktrail data in Google's published privacy policies that is as strong as Singhal's?
Like companies give a crap about "user experience" unless that means earning more and more money.
Piggybacking like this should really be copyright infringement or something, as there's nothing morally right about it.
On the other hand Google should be quieter about this; after all, they've built their business database for Google Maps / Google Places by piggybacking on third-party services like Yelp and TripAdvisor. And now all of a sudden when I'm searching for "restaurants" I have to scroll the page to get past Google's own crap.
Your comment began with +1. It is now at -1. So of all the people that looked at it, some modded it, and of all those people, there are two more people who downmodded it than upmodded it. Two.
Right now the post has +186. So assuming that half of those people read your comment, we have something like 93 people who read your comment, and there are just two more downmods than upmods.
Unfortunately, HN shows people a comment's score before they read it, which skews results. People aren't objective, they tend to upmod comments that are already upmodded and downmod questions that are already negative.
So if you are unlucky and the first one or two people to read your comment are constipated, you get a negative result for no fault of your own. Bad luck, try again.
Please don't be discouraged. Try to be helpful and constructive and to present a point of view that others may have missed in the conversation. The upmods will eventually follow. Cast your seeds and let a thousand flowers bloom.
Sort of petulant on Google's part to release this, no?
Of course your competitors are going to copy you. It's not innovative, and you might consider it 'cheating' if you forget that each and every one of us are building off of a foundation laid by other people. But it works, and that's why it happens and will continue to happen.
No it's not. That's absurd. A clicked search result is a successful product. Bing is taking note of a competitor's successful product and using that information in its own decisions on the products (SERPs) it produces for its users.
This is a bunch of Microsoft haters making hay over nothing at all. Quit whining. It's not theft, it's not any more privacy-offensive than anything Google does, get over it.
I agree with you 100% if that's the case (and Matt Cutts' comment suggests it is). My argument was for the case where Bing crawls Google's results pages (the programming equivalent of searching on Google and feeding the results into its own DB as results for the term), not considering user click information.
It's not a wrapper. Bing isn't passing requests to the Google API and then returning the results on the Bing page. Keep in mind that 93/100 of the seeds Google injected into Bing's database were filtered out.
"3-d movies are a great idea, lets take scenes from Avatar and put them in our movie" I don't think excessive simplification lends justice to the issue. Just like how piracy isn't necessarily theft, what Bing is doing isn't necessarily copying.
1. User does a search in a Microsoft toolbar, using Google as his search engine. User is searching for $terms.
2. User gets a results page. User clicks on the entry in the results for $site.
3. Toolbar sends back to Microsoft that the $site was the first result the user chose for $terms.
4. Bing uses this to increase $site's placement in searches for $terms.
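For anyone who wants steps 3 and 4 spelled out, here is a minimal sketch in Python. It is only an illustration of the flow as described in this thread, not Microsoft's actual pipeline: the in-memory store, the function names, the `min_clicks` threshold, and the example URLs are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: normalized query terms -> clicked-site counts.
click_counts = defaultdict(Counter)

def record_click(terms, site):
    """Step 3: the toolbar reports which site the user chose for the terms."""
    click_counts[terms.strip().lower()][site] += 1

def boosted_sites(terms, min_clicks=2):
    """Step 4: sites clicked at least min_clicks times, most-clicked first."""
    counts = click_counts.get(terms.strip().lower(), Counter())
    return [site for site, n in counts.most_common() if n >= min_clicks]

# Placeholder URLs; two users click the same result, one clicks another.
record_click("hiybbprqag", "example.com/seating")
record_click("hiybbprqag", "example.com/seating")
record_click("hiybbprqag", "example.org/other")
print(boosted_sites("hiybbprqag"))
```

Note that nothing in this sketch looks at which search engine produced the results page. Whether the real system treats google.com specially is exactly the open question.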
An interesting question then would be whether or not Microsoft also "copies" from Bing? That is, if you are using Bing as your search engine, do they still use the fact that you went to $site after searching for $terms to adjust the rankings?
So in an effort to be as good as a competitor, MS is watching what you do when you interact with that competitor's website and sending that information home. Seems like a really big reason to suggest to anybody you know that they uninstall the Bing toolbar.
Google gathers lots of user data on 3rd party websites via services such as (to name a few):
- Google analytics (opted in for data sharing)
- Google toolbar
@Matt Cutts - I'd love it if you could confirm exactly which user data you DO and DO NOT use to influence rankings. Or, at the very least say on record that you don't do what Bing are doing and use data from bing.com
Overall, I'm not surprised that Bing are doing this for some keywords - all the major search engines use a massive number of different signals. I'll be more surprised if it turns out this is happening at a large scale or for competitive terms.
It's a little late for me to elaborate but here's what I meant: Microsoft has a long history of copying its competitors and calling it "innovation". Absent any other evidence, that tends to put the burden of proof on them.
It's instructive to think of the cases where Google can return a search result, even though the searched word doesn't appear on the page. Most often, this occurs because another site includes an outlink to the page, with the searched word. That is, they're 'copying' a publicly-available source that indicates that word is associated with that page.
I see this Microsoft tactic as similar. They're considering search terms that resulted in a visit to the page from other search engines as being important indicators of the page content. If they have that URL-to-URL-trail data legally, and the signal works well, and they are not singling out Google's URLs as the only source of such a signal, I'm not sure what the problem is.
Google didn't get where they are by throwing out legally-collected useful data, and Bing won't catch up to a leader who has clicktrail sensors everywhere, via analytics/toolbar/ads/mobile/etc., by throwing away legally-collected useful data.
1. Bing is inferring search results from user behavior, collected via Bing Toolbar.
2. Google team makes an experiment: using Bing Toolbar to feed Bing particular behavior. Namely, they all go from a search result page on Google.com laden with a unique word to a particular target site.
3. Bing infers connection between the unique word and the target site.
Wow, I'm surprised by all the developers on Microsoft's side on this one. Google spends a lot of money developing proprietary algorithms for determining search results. Microsoft is then stepping in and taking advantage of the money Google spent by copying some of their results. It's rather like someone taking the results of a Consumer Reports list and publishing it themselves. It borders on illegal, and it's definitely shady.
But what I think is more important is all of the flak that Google has been catching for supposedly slipping in its quality of search results. If its quality is so poor, then why is Bing stealing its results? It's a great method of striking back at the negative PR they've been receiving.
"If it's quality is so poor, then why is Bing stealing its results?"
Bing is apparently using toolbar click data (AFAICT it hasn't been shown that this is specifically targeted at Google or even at search engines in general) when it has no other information for the given search term. That has very little relation to the quality of Google's search in general.
So what? Is it a scandal that Walmart and Target both send employees into each others stores and actively monitor prices on items? It's called being competitive, and to be competitive you have to at least match what your competitor is doing, then beat them.
It's interesting. A little bit like browser wars, isn't it? Browsers are really similar between themselves. If any new noteworthy feature appears in one, it is very likely to be copied to another, which is a very good thing for end users and is a reason for which competitiveness is good. At the end of the day, users want more or less the same functionality, no matter which browser they use. There are some differences in details and quality, but rather minor.
Both Bing and Google are targeted towards mass market and I think people expect the same from both. If Google does it right, there is nothing more to invent. And even if there is, it is probably pretty expensive. It is so much easier to copy than to invent from scratch, just to get something almost exactly the same as Google :)
I am really interested in what Bing could do to be REALLY different or better than Google. And if they did, Google would most likely do something very similar :)
IANAL, but in certain jurisdictions, most certainly yes. Many countries have copyright laws that protect compilations of things that are individually not worthy of copyright, for example telephone books. Copying down an individual telephone book entry is of course not a copyright violation, but copying the whole listing in a systematic fashion is.
I'd guess that this law applies to search engine rankings as well - rankings/listings of individual items that are not protected by copyright, but where a lot of effort goes into producing the listing itself.
- generally speaking, the conclusion seems to be that for regular queries, Bing uses mostly other clues to figure out relevance, so this is basically a storm in a cup of water. Regardless, since both Google's and Bing's algos are closed-source, we're going on faith when either company says data gathered from one of their products doesn't affect search quality.
- the whole thing about making a ranking overrider and talking about it publicly seems like a stupid move. Why in the world would you say you developed such code and then "deleted it" in an all-code-is-version-controlled-these-days world? This won't go very well against the claims that Google gives preferential treatment to its own services (e.g. email, maps) vs competitors.
- The experiment reportedly was triggered because Bing results were getting better for misspelled searches. But, seriously, returning wikipedia as the top result for something with low levenshtein distance to a rare word is not exactly rocket science...
- if Google feels that its SERPs are the most relevant possible, shouldn't it make sense that competitors trying to improve relevance will inevitably end up showing the same results as Google on at least a subset of queries?
- if you're saying Bing has just as good results as Google, regardless of the means to the goal, then how does publicizing that help the whole "Google's overrun by spam" meme going on?
<quote>But, seriously, returning wikipedia as the top result for something with low levenshtein distance to a rare word is not exactly rocket science...</quote>
Actually, that is Google's core business, and from the amount of revenue it's generated most likely harder than rocket science. The generation between keyword and website content/results is what a search engine is all about, and what Google does (arguably) well.
I believe why Google is crying foul is because it is the only reference to generate the mapping between the keywords they made up and the website results. Bing did not have these mappings until they evaluated user clickthroughs that went through Google's results, with their browsing history going something like :
Now Bing is using the users' click history to generate the mapping from keyword<=>http://website.com ; this is the shady part: if Google did not generate its results, that mapping would never have taken place: the user would never have been able to tell Bing that there is relevance between the two unless Google existed.
>> Actually, that is Google's core business, and from the amount of revenue it's generated most likely harder than rocket science.
You're talking about all of the work needed to make a search engine good, I'm talking about the specific algorithm needed for that particular type of query (rare, obscure, easily misspelled word). Different scopes.
And again, my observation in that bullet point is that, imho, the "torsoraphy" type of query could have been improved by something like the "close enough to rare word? + does wiki page exist?" algorithm, rather than copying.
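To make the "close enough to rare word? + does wiki page exist?" idea concrete, here is a rough sketch. The edit distance is the standard dynamic-programming Levenshtein; everything else (the tiny title set standing in for a real index of wiki pages, the function names, the distance cutoff of 3) is an assumption for illustration, not anything Google or Bing is known to do.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

# Hypothetical stand-in for "does a wiki page exist?": a set of known titles.
WIKI_TITLES = {"tarsorrhaphy", "levenshtein distance"}

def suggest_wiki_page(query, max_distance=3):
    """Return the closest known title within max_distance edits, else None."""
    q = query.lower()
    best = min(WIKI_TITLES, key=lambda title: levenshtein(q, title))
    return best if levenshtein(q, best) <= max_distance else None

print(suggest_wiki_page("torsoraphy"))
print(suggest_wiki_page("hiybbprqag"))
```

A real system would need an index rather than a linear scan over every title, plus some notion of how rare the target word is, but the core check really is this small.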
Re: recording click history being shady: I don't really see what's so fundamentally different between that and recording surfing habits via ads. It goes back to the first point: Google could say they don't use that data to improve SERP relevance, but we're going on faith on that claim.
It is widely assumed in SEO circles that Google uses toolbar data, among other sources, for finding new URLs to crawl. They're very enthusiastic about getting all pages on the public Internet into the crawl set. User data gets them there faster and more reliably than a hypothetical competing crawler using only e.g. the observed link graph.
Edit: I did some digging to see if I could find an authoritative source on this, and found that Matt Cutts specifically denies this particular usage of user data for expanding the crawl set on his blog. Mea maxima culpa.
Edit the second: An amusing note on this general subject: Google will fuzz test certain search forms on, e.g., high value government websites to get at the juicy data behind them which would not otherwise be reachable from just traversing the link graph.
I seriously doubt that is the only thing they use the information for, and unless someone can find somewhere in the Google toolbar/Chrome EULAs that specifically asserts what they will/won't do with the data, I'd assume they use it for all sorts of things.
I don't think that's true - at least, I've not seen any evidence. There's a good chance the Google toolbar sends back browser data, but I think it's extraordinarily unlikely that Google snoops on what people are searching for on Bing, and what results they end up clicking on, and then adjusts their search results on the back of that.
I suspect that this isn't the only way they determine search results :-)
My guess is that they have a relevancy metric from their own algorithm, but some queries return poor relevancy results -- then they may go to this set of secondary results and say, "Are there any results for this query that a higher-than-expected percentage of users clicked on?" and then add those results to the list.
I'm really curious as to how this is different than Bing using Google's search results in some form of aggregate pageranking. If we assume that some arbitrary metric of "authenticity" exists for searches and a search for mbzrxpgjys produces a low (<0.1%) score for authenticity, but Google suddenly declares that www.page.com is the foremost authority on mbzrxpgjys, it stands to reason that a good page-ranking scheme would take that into account and bump it to the front of the line.
I don't think it's cheating. Nowhere in the article does it claim that they aren't doing their own search; they are just using Google's results as part of their own search algorithm. Is that really such a crime?
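Here is that guess as a small sketch. To be clear, this is speculation turned into code: the function name, the score scale, both thresholds, and the choice to append the click-derived results after the engine's own are all assumptions of mine, not a description of Bing's ranker.

```python
def rank_with_backfill(own_results, click_share, min_score=0.5, expected_share=0.2):
    """
    own_results: list of (url, relevancy_score) from the engine's own algorithm.
    click_share: url -> fraction of observed clicks for this query (secondary data).
    Returns URLs in ranked order, adding click-derived results only when the
    engine's own best score is weak.
    """
    ranked = sorted(own_results, key=lambda r: r[1], reverse=True)
    urls = [url for url, _ in ranked]
    best_score = ranked[0][1] if ranked else 0.0
    if best_score >= min_score:
        return urls  # own results are good enough; ignore the click data
    by_share = sorted(click_share.items(), key=lambda kv: kv[1], reverse=True)
    extras = [url for url, share in by_share
              if share > expected_share and url not in urls]
    return urls + extras

# Weak own results for a nonsense query, so the heavily clicked URL is added.
print(rank_with_backfill([("a.com", 0.1)], {"b.com": 0.9, "c.com": 0.05}))
```

Under this model the honeypot terms are the ideal case for backfill: no good results of the engine's own, and a small number of clicks all pointing at one URL.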
While the "cheating" angle on this seems hugely overblown, I do think that companies that harvest data through toolbars etc. should be obligated to explain upfront in clear language how they use the data. Not bury it in the legalese of a vast impenetrable ToS.
Give me a break, MS has always played the fast follower game which means they will ride on the work and investment done by the market leader and it's worked out well for them in other parts of their business.
Using signals from user behavior on the toolbar on ANY search engine seems to make a lot of sense when it comes to improving search results. MS employees are the biggest QA group for Bing. Internal tools allow employees to tag queries and results that are superior/inferior to Google. Both are displayed side by side and employees provide active feedback to help improve the algorithm and identify more systemic underlying ranking issues.
Instead of whining, I would have gone on the offensive.
So we have a competitor copying our search results. Great. Now how can we fuck with that?
Figure out the requests coming from Microsoft and return a different set of search results (e.g. XXX stuff) so that it doesn't show up for organic Google results. Set the trap and once Bing has incorporated those results for a key term, spam TC and LOL at Steve Ballmer getting worked up.
This discredits the relevancy of Bing and all those PR dollars spent rebranding would have gone down the drain. Imagine searching for a harmless search term like 'poodle' and getting hardcore triple xxx results.
This is an ultimate opportunity for Google - can't they somehow spoof the results that are sent back to Bing? I know if someone was cheating off me in an exam, I would try and give them the wrong answer.
I'm not sure about this. It almost sounds like Google is posturing. The reason I say this is that while Google was getting bombed up until last week with scraper sites, Bing wasn't.
If Bing was really copying results, they would have reflected the spam sites, because people click on those when they are highly ranked just as often as they click on the originator site. After all, the problem is that the content is identical.
Well, the article suggests that there is more to it than just mere copying. The way I see it, Bing is basically taking a look at what the user does when searching for a term that is unknown to Bing. Once they have this initial data, they are able to filter it as they see fit to avoid getting spam results.
The allegation is that Bing is/was copying results for long-tail queries. The specific example being some very misspelled words. So in no way is the claim made that Google and Bing results are identical (even for most of those long tail results, given that it's only the top ranking pages).
Suffice to say, Google’s pretty unhappy with the whole situation, which does raise a number of issues. For one, is what Bing seems to be doing illegal? Singhal was “hesitant” to say that since Google technically hasn’t lost anything. It still has its own results, even if it feels Bing is mimicking them.
Funny... that's the exact same argument software / music piracy often makes.
Google wasting time embarrassing Bing? I think this is more interesting than learning what Bing are doing.
Google are clearly monitoring Bing (and others) as a matter of course. I'm interested to know what they'd have done if they'd found Bing providing better quality results. Would they have spent resources trying to figure out what Bing were doing right, or would that be "copying" too?
The biggest article surprise for me was Google's claim they don't use the toolbar or Chrome directly to improve search queries. I assumed measuring bounce rates and patterns in link graph traversal across the entire web was part of their raison d'etre, as with Google Analytics
Since when has reverse engineering been cheating? If the article is correct, there still is no allegation that Google's algorithms have been used. I don't think Google is in much of any position to cry foul over any company using data mining to tailor search results.
"These searches returned no matches on Google or Bing — or a tiny number of poor quality matches, in a few cases — before the experiment went live. [...] Only a small number of the test searches produced this result, about 7 to 9 (depending on when exactly Google checked) out of the 100. Google says it doesn’t know why they didn’t all work, [...]"
The writer apparently thinks these results justify concluding the article with this takeaway:
"When Bing launched in 2009, the joke was that Bing stood for either “Because It’s Not Google” or “But It’s Not Google.” Mining Google’s searches makes me wonder if the joke should change to “Bing Is Now Google.”
The balls on MS people are fucking amazing: "Harry Shum, VP of search development at Bing, responded by admitting that Google had uncovered a new form of search fraud, and said he wished Google had spoken to Microsoft about it before taking it to the press". So bing is either (a) scraping all web behavior out of ie, or (b) scraping G's search engine results, or (c) both -- and dude is pissy because G didn't give them time to get their lies together in private? Amazing.
ps -- there's a word for what MS's software appears to be doing: spyware.
Look at it like a math problem... Google takes all numbers and figures out the answer itself. Bing writes down the answer from Google's paper. They are hardly the same. Your comment is somewhat like saying "Hey, we're all writing down numbers"
Think of it this way: Your professor asks you for the name of the imaginary friend that the professor's son always talks about. You have no idea what the answer could possibly be, so in your head you choose "John," a reasonably common name with a very low probability of being right. Google answers with a strong conviction in its voice that the answer is "Mark," another common name. You have no evidence to believe that your own guess is correct, so in the face of the appearance of belief on Google's behalf, aren't you the least bit tempted to say that it's Mark and not John?
Actually it is a test, and how well you answer it is directly responsible for whether your business will or won't succeed.
Google has worked out their algorithms for processing the incoming data and generating an answer. Bing has apparently used Google's answer as a comparison to whether they are getting it right or not and when not, subbing in the other answer.
While in business this may not be illegal, it is still very much 'Not knowing the correct answer to the test'.
No, it's a product. Come on. Every product is a "test" of the market so in that one definition of "test", yes. But not in an academic sense, which is what you meant. Because only in an academic sense does the concept of "cheating" exist.
Look at it like a machine learning problem. If you slavishly match the test set you are doing something called overfitting. Your performance on the rest of the web will decrease, because you inherit bias from the test set.
I personally don't mind if one is copying the search of another. The whole idea is to get the BEST search results possible. And that's what I use a search engine for: getting by far the best results possible. I don't mind how they do it and as far as I can tell they aren't breaking any copyright laws...
In any case, if Bing is indeed copying Google's results, it is utterly unethical even if users are receiving accurate search results. As mentioned in the article, what Bing is doing is analogous to copying Google's "exam," which by any reasonable standard is wrong, even if a "proctor" didn't specifically state this rule.
Perhaps Google has patented a few of its search algorithm components.