Here's an alternative hypothesis: the Bing Toolbar might look for explicit search queries (either strings entered into a textbox, or q= and query= parameters) and for navigation from such pages to external domains. This would match all "search engines" in the most relaxed meaning of the term: product search, thesauri, lexicons, dictionaries, everything; and I'd argue that this is a legit signal for a "general search engine" to use.
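
To make the hypothesis concrete, here is a minimal sketch of such a heuristic. It is only an illustration of the idea, not a description of what the toolbar actually does. The names QUERY_PARAMS, extract_query and search_click_signal are made up for this example, and treating "external domain" as "different hostname" is a simplifying assumption.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical parameter names; the comment above only mentions q= and query=.
QUERY_PARAMS = ("q", "query")


def extract_query(url):
    """Return the first explicit search query found in the URL, or None."""
    params = parse_qs(urlparse(url).query)
    for name in QUERY_PARAMS:
        values = params.get(name)
        if values and values[0].strip():
            return values[0].strip()
    return None


def search_click_signal(from_url, to_url):
    """Return (query, destination) when a page carrying a search query
    leads to a page on a different host; otherwise return None."""
    query = extract_query(from_url)
    if query is None:
        return None
    src_host = urlparse(from_url).hostname
    dst_host = urlparse(to_url).hostname
    # Assumption: "external" simply means the hostname differs.
    if not src_host or not dst_host or src_host == dst_host:
        return None
    return (query, to_url)
```

Note that nothing in this sketch mentions any particular search engine: a dictionary site with a ?q= parameter would produce the same kind of (query, destination) pair as a general web search page.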

(Legit sidenote: Google has, via the use of Analytics data, mass clickstream coverage of the whole web, which is opt-in by default, follows you everywhere, and can identify you uniquely. The Bing Toolbar at least asks first.)

If this is the case, Google isn't being picked on; rather, they are merely the first to have figured this out externally. A cookie for the scientific rigor, but no cigar for the way they PR'd the story. Correlation, after all, does not equal causation.

"Legit sidenote: Google has, via the use of Analytics data, mass clickstream coverage of the whole web, which is opt-in by default, follows you everywhere, and can identify you uniquely. The Bing Toolbar at least asks first."

Google does not use Google Analytics data in any way in our rankings. I've said that plenty of times before, but it's worth mentioning.

Honest question: why not? Surely identifying sites that have disproportionate organic traffic relative to search engine referrals can only be good in identifying places people actually want to visit online?

As a webmaster I would opt in to this sort of thing in a heartbeat if I thought it would help your algorithms understand my site. I'm sure Joel Spolsky and most other legitimate online publishers would do so too.

Google can already calculate this ratio of organic traffic to search engine traffic using the Google Toolbar stats - no need for Google Analytics.
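
For illustration, the ratio being discussed could be computed from a list of referrers roughly like this. It is a sketch under stated assumptions: SEARCH_HOSTS is a hypothetical, incomplete list, organic_to_search_ratio is a made-up name, and "organic" follows the usage in this thread (any visit that did not arrive from a search engine).

```python
from urllib.parse import urlparse

# Hypothetical list of search-engine hosts, for illustration only.
SEARCH_HOSTS = {"www.google.com", "www.bing.com", "search.yahoo.com"}


def organic_to_search_ratio(referrers):
    """Ratio of non-search visits to search-referred visits.

    referrers: iterable of referrer URLs, with None or "" for direct visits.
    Returns None when there are no search-referred visits.
    """
    search = 0
    organic = 0
    for ref in referrers:
        host = urlparse(ref).hostname if ref else None
        if host in SEARCH_HOSTS:
            search += 1
        else:
            organic += 1
    if search == 0:
        return None
    return organic / search
```

A site with a high ratio gets most of its visitors without help from a search engine, which is the signal the question above is asking about.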

It will result in a positive feedback loop.

Though Analytics data IS used for things like DoubleClick Ad Planner.

However, Google does use Google Toolbar data for rankings.

Well, while doing this they clearly knew that Google is by far the biggest search engine in the world, and naturally most of the data would be coming directly from Google. Right?

Why does that matter? They're only "stealing" results from Google insofar as people (/Google employees) used Google in order to illustrate what pages they want returned given a certain search query. I have a sneaking suspicion that the exact same thing would have happened if they were in the position to do the same query bombing on Ask Jeeves or crappywebsearch.net as it did for Google.

Search engines to me are an obvious case of a means to an end. If a search engine better than Google were to come out tomorrow I would switch to it (from Google) instantly with no regrets. Google's sense of propriety about their results (or, more accurately, what users clicked on after searching via Google), especially given the fact that they are well-known for their penchant for sucking in user data like a black hole (not that I care-- I want them to use it if it means better searches), to me seems 9 parts hypocritical and 1 part prima donna.

Need people be reminded that this is the same company that "accidentally" logged users' WiFi browsing habits while driving StreetView cars around Europe? Give me a break. Everyone is guilty, and no one is going to do anything differently now than they did before.

It matters because if you use your main competitor's results and ranking to change your results and ranking, you can no longer claim you are original, innovative or better. It just taints everything you did.

They didn't use either. Certainly not the ranking. Also, you're obviously better if you do-- in fact, you're better by definition. :) (innovative and original, perhaps not so much)

Why does the user's click from the results page suddenly belong to Google (apart from the fact that in this specific case they actually artificially created a fake long-tail result)? If I Google Bing, and then Bing's ranking of Bing goes up as a result (not that it's not already #1, but whatever), can you actually say that it's Google's result and ranking? What if it's nytimes, or any number of extraordinarily common searches where you're really just doing a domain lookup for a name you already know?

What if I didn't click on anything until the 30th page of results because that was the only useful result, and it causes the Bing rank to go higher? Does Google have any ownership over the rank then, even if the useful page was ranked lower than much more useful results? Couldn't Google then just return a list of every page on the internet in response to every query and then claim that their results are being stolen?

To be honest, I'm not really convinced that either side is in the right here. I just think that it should be made clear that there is a large distinction between stealing results and tracking clickthrough behavior. One would be laughably shortsighted and of dubious ethics; the other is basically common practice, and is being made out to be a bit more than it is because of its superficial appearance.

They definitely used results from Google.

They use the results for terms users entered into Google to crawl pages that are not in their index (the torsorophy example, which is not an artificial one), thereby enriching their index based on Google's results and increasing its depth.

As for ranking, it is blurrier, but when you record users' clicks, which correlate directly with ranking, it starts to stink.

Google uses the Google Toolbar to record users' clicks, which correlate directly with ranking.

This, in fact, could be quite wrong. Consider that most modern websites include a search engine of one form or another, and that the usage graph of the web has an extremely long tail.

Even considering the top 100 most visited websites on Alexa: all of them have a search form, and only 20 or so belong to Google; it's easy to see how the aggregated usage of the other 80 could be much, much higher than the aggregated usage of Google properties.

Therefore, while Google might be the single most impacted organization in the world, most of the data comes from non-Google properties. And none of this has anything to do with my original argument that the algorithm itself is benign.

I'm guessing they're only looking at the Referrer field, and they're doing it across the board. (For me, Google's result pages all point back to Google, so they can leak this info via the Referrer field.)

I'd also guess that the data from domain-specific sites is more valuable than the data from generic search sites. (User selects appropriate site, does search, selects appropriate result.)
