
As many others have said, I don't think that using click data from the browser/toolbar as one of thousands of signals can be considered "copying". When you query a nonexistent word, all the other signals are zero because nothing is known about the query, so the only remaining one is the history of clicks "sniffed" from the Google/etc. SERP. OTOH, on real-world queries the signal probably has a relatively low weight.
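To make the point concrete, here's a toy sketch (not Bing's ranker; all the signal names and weights are made up) of how a clickstream signal is just one term in a weighted sum, and why it's the only non-zero term for a made-up query:

    # Toy sketch: hypothetical signals and weights, not any real ranker.
    WEIGHTS = {"content": 0.6, "links": 0.3, "clickstream": 0.1}

    def content_match(doc, query):
        return 1.0 if query in doc["text"] else 0.0

    def link_score(doc, query):
        return doc["link_authority"] if query in doc["anchor_text"] else 0.0

    def clickstream_score(doc, query, click_log):
        # click_log maps a query to the set of URLs users clicked after issuing it
        return 1.0 if doc["url"] in click_log.get(query, set()) else 0.0

    def rank(doc, query, click_log):
        return (WEIGHTS["content"] * content_match(doc, query)
                + WEIGHTS["links"] * link_score(doc, query)
                + WEIGHTS["clickstream"] * clickstream_score(doc, query, click_log))

    doc = {"url": "http://example.com/", "text": "some page text",
           "anchor_text": "", "link_authority": 0.5}
    clicks = {"qwzxyblah": {"http://example.com/"}}
    print(rank(doc, "qwzxyblah", clicks))  # only the clickstream term is non-zero

On a real query the first two terms dominate; on a nonsense word they are zero by construction.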

I don't think it is a secret that Bing uses click data from the browser/toolbar as a signal; it's just not a well-known fact. For example, in the paper "Learning Phrase-Based Spelling Error Models from Clickthrough Data" (http://aclweb.org/anthology/P/P10/P10-1028.pdf) by Microsoft Research, they explain how to improve spelling corrections by using click data from "other search engines".
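The idea is easy to sketch even without the paper's phrase-based model: count how often a query gets reformulated (via a spelling-suggestion click) into another query, and suggest the most frequent reformulation as the correction. This is a toy version, not what the paper actually trains:

    from collections import Counter, defaultdict

    # Toy illustration of learning corrections from clickthrough reformulations;
    # the paper builds a phrase-based error model, this just counts pairs.
    correction_counts = defaultdict(Counter)

    def observe(misspelled, corrected):
        correction_counts[misspelled][corrected] += 1

    def suggest(query):
        candidates = correction_counts.get(query)
        return candidates.most_common(1)[0][0] if candidates else None

    observe("machne learnig", "machine learning")
    observe("machne learnig", "machine learning")
    print(suggest("machne learnig"))  # machine learning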




The paper you mentioned appears to be saying that Microsoft is extracting spell corrections via clicks on Google. That's pretty surprising news.

I just pulled down the paper and noticed this: "The clickthrough data of the second type consists of a set of query reformulation sessions extracted from 3 months of log files from a commercial Web browser .... In our experiments, we "reverse-engineer" the parameters from the URLs of these sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired"

Some of the recent discussion has been about whether Microsoft looks at lots of different sites vs. doing something special or different for Google. This paper very much makes it sound like Microsoft reverse-engineered which specific URL parameters on Google correspond to a spelling correction. Figure 1 of that paper suggests Microsoft is using specific Google URL parameters such as "&spell=1" to extract spell corrections from Google.
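Concretely, the deduction Figure 1 points at would only need something like this. A sketch: only "q" and "spell=1" come from the paper/Google; treating a session as an ordered list of result-page URLs from the browser log is my assumption:

    from urllib.parse import urlparse, parse_qs

    def query(url):
        return parse_qs(urlparse(url).query).get("q", [None])[0]

    def spelling_pairs(session_urls):
        # A URL carrying spell=1 means the user clicked the spelling suggestion
        # for the query they issued just before it.
        pairs = []
        for prev, cur in zip(session_urls, session_urls[1:]):
            if parse_qs(urlparse(cur).query).get("spell") == ["1"]:
                pairs.append((query(prev), query(cur)))
        return pairs

    session = [
        "http://www.google.com/search?q=machne+learnig",
        "http://www.google.com/search?q=machine+learning&spell=1",
    ]
    print(spelling_pairs(session))  # [('machne learnig', 'machine learning')]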

Targeting Google specifically is quite different from using lots of clicks from different places. It looks like you work at Microsoft--can you say any more about this?


> The paper you mentioned appears to be saying that Microsoft is extracting spell corrections via clicks on Google.

Well, no, that's a research paper saying that they ran experiments in that direction, but it doesn't imply that this is currently done in Bing. It does give a hint, though, about what kind of data is available from the "log files from a commercial Web browser".

> Targeting Google specifically is quite different than using lots of clicks from different places.

From the article, they have handcrafted rules for both Google and Yahoo, which together with Bing hold (I think) 95% of the market. I'd say they are not targeting Google; they are targeting the majority of search engine users. There just happen to be only 3 major search engines, so a few handcrafted regexes are sufficient.
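For scale, the kind of handcrafted rule we're talking about is a line or two per engine. A sketch: Google and Bing use "q"; Yahoo's "p" is from memory, so treat the exact parameter names as assumptions:

    import re

    # One small regex per major engine to pull the query out of a SERP URL.
    SERP_QUERY = {
        "www.google.com":   re.compile(r"[?&]q=([^&]+)"),
        "search.yahoo.com": re.compile(r"[?&]p=([^&]+)"),
        "www.bing.com":     re.compile(r"[?&]q=([^&]+)"),
    }

    def serp_query(host, url):
        rule = SERP_QUERY.get(host)
        m = rule.search(url) if rule else None
        return m.group(1) if m else None

    print(serp_query("www.google.com", "http://www.google.com/search?q=foo+bar"))  # foo+bar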

I wouldn't be surprised if Google Maps had handcrafted (or manually tuned) scraping code to extract reviews from Yelp and other major review sites, and the same for Google News to extract article bodies from the major online news sources. How is this different?

> It looks like you work at Microsoft--can you say any more about this?

Yeah, I should have been clearer about this. I am interning at MSR and have some involvement with Bing (I actually worked there last year), but my comments are personal and about facts that are public.

BTW, IMHO using the click logs can't be considered "copying"; it's more like "a way to discover new sites to crawl and the keywords that lead to them". That is not copying the SERP results.
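In code terms, I mean something closer to this (all names hypothetical) than to copying a results page:

    from collections import deque

    class Frontier:
        def __init__(self):
            self.seen = set()
            self.queue = deque()
        def enqueue(self, url):
            self.seen.add(url)
            self.queue.append(url)

    def ingest_click(query, clicked_url, frontier, keyword_hints):
        # Treat the (query, clicked URL) pair as a discovery hint,
        # not as a ranked result to copy.
        if clicked_url not in frontier.seen:
            frontier.enqueue(clicked_url)                        # new site to crawl
        keyword_hints.setdefault(clicked_url, set()).add(query)  # keywords that lead to it

    frontier, hints = Frontier(), {}
    ingest_click("some rare query", "http://example.org/page", frontier, hints)
    print(list(frontier.queue), hints)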

Since it "looks like" you work at Google :) can you answer this question (it was also asked here: http://news.ycombinator.com/item?id=2165963)? Doesn't Google use Chrome to get traffic statistics, through the opt-in "send usage statistics" and the malicious site protection?


>I wouldn't be surprised if Google Maps has handcrafted (or manually tuned) scraping code to extract reviews from Yelp and other major review sites, and same for Google News for the extraction of the news body from the major online news sources. How is this different?

Sorry, but Google drives traffic to their sites. That's what a search engine is supposed to do. Msft just scrapes Google's results and presents the data as its own.


> Sorry, but Google drives traffic to their sites. That's what a search engine is supposed to do.

Then why are newspapers not so happy about it? http://www.guardian.co.uk/media/2009/nov/09/murdoch-google

And, BTW, just to be clear, Msft can't "scrape". That would violate robots.txt.
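That's just the standard check any polite crawler runs before fetching; last I checked, Google's robots.txt disallows /search, which is exactly what rules out scraping the result pages:

    from urllib import robotparser

    # Sketch: ask robots.txt whether a crawler may fetch a Google results page.
    rp = robotparser.RobotFileParser("http://www.google.com/robots.txt")
    rp.read()  # fetch and parse robots.txt
    print(rp.can_fetch("SomeBot", "http://www.google.com/search?q=foo"))  # False while /search is disallowed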


> Then why are newspapers not so happy about it?

Rupert Murdoch and his kin are shortsighted, blustering fools when it comes to the 'net. Relying on their attitude to make your point is counterproductive at best.


"Doesn't Google use Chrome to get traffic statistics, through the opt-in "send usage statistics" and the malicious site protection?"

I saw that Peter Kasting from the Chrome team commented on this question at http://www.mattcutts.com/blog/google-bing/#comment-712619. Here's what he said: "I work on Chrome and we absolutely do NOT collect clickstream data through Chrome. Not even when you turn on the off-by-default “anonymous usage statistics”."



