
If you read the article and other comments here it's been made perfectly clear that the Google toolbar and Chrome browser are not sending similar data back to Google.

Ah, at least the Google Toolbar does. If you enable PageRank on the Google Toolbar, it sends back all the URLs you visit, just like the Bing toolbar.

From the toolbar privacy policy: "Toolbar's enhanced features, such as PageRank and Sidewiki, operate by sending Google the addresses and other information about sites at the time you visit them."

Google has managed to demonstrate one way MS appears to be using the data. What does Google do with its trove of data? That's a lot of data to collect and not do anything with.

If they want to make it perfectly clear, they should state it in their privacy policies and EULAs.

Yes, absolutely. I don't think anyone in this thread or in the article denied that the Google Toolbar sends data to Google. And you are absolutely right that Google's use of the data collected should be clearly stated in a privacy policy and EULA. It might be; I haven't read them.

But the article clearly covers the available public statements on this issue and patio11 dug up a post from Matt Cutts in his comment below that directly addresses this: http://www.mattcutts.com/blog/toolbar-indexing-debunk-post/.

I did not say "similar data" because "similar" is a bit too slippery a word in a technical context. There's too much plausible deniability. What I am asking is whether Google's tools send data back to the Googleplex to be mined for the sake of search engine improvements.

Then what use is the word "similarly" in your comment? Similarly send? As in via HTTP requests? I think that's either obvious or irrelevant or both.

Again, if you actually read the article, you will come across the section titled "What About The Google Toolbar & Chrome?" I encourage you to read it.

[edit] Also, see this comment and patio11's subcomment further down the page, both of which were written an hour before yours: http://news.ycombinator.com/item?id=2165469#score_2165578.

Quote from the article: "In fact, Google stressed that the only information that flows back at all from Chrome is what people are searching for from within the browser, if they are using Google as their search engine."

I'm pretty positive that's not true. If you run Fiddler when browsing with Chrome you will see constant hits to toolbarqueries.clients.google.com whether you're using Google or not. I could be browsing some MS site and toolbarqueries.clients.google.com gets hit. Chromium doesn't do this.

Edit: You can uncheck everything under privacy and it will still send those requests.

Edit2: What it sends back looks something like this:

<?xml version="1.0" encoding="UTF-8"?><autofillquery clientversion="6.1.1715.1442/en (GGLL)"><form signature="8551191143090325242"><field signature="620769395"/><field signature="2995202485"/><field signature="2175865763"/><field signature="904516291"/><field signature="2953051246"/><field signature="2649047790"/><field signature="2308153337"/><field signature="1003471793"/><field signature="3255484099"/><field signature="1305698505"/><field signature="3676143819"/><field signature="1275502930"/></form></autofillquery>

Looks like auto-fill data, but this happens when I click around a site, NOT when searching Google or typing something in the address bar. For some sites (interestingly, not all) it sends 3 requests for each page load.
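
For anyone who wants to pick a captured request body apart, here is a minimal Python sketch that pulls the form and field signatures out of a payload shaped like the one quoted above. The payload is a shortened copy (first three fields only), and the function name `summarize_autofill_query` is my own, not anything from Chrome. It only restates what is in the XML; it says nothing about what the signatures mean.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the payload quoted in this thread (first three fields only).
PAYLOAD = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<autofillquery clientversion="6.1.1715.1442/en (GGLL)">'
    b'<form signature="8551191143090325242">'
    b'<field signature="620769395"/>'
    b'<field signature="2995202485"/>'
    b'<field signature="2175865763"/>'
    b'</form></autofillquery>'
)


def summarize_autofill_query(payload: bytes) -> dict:
    """Return the client version plus each form's signature and field signatures."""
    root = ET.fromstring(payload)
    forms = []
    for form in root.findall("form"):
        forms.append({
            "form_signature": form.get("signature"),
            "field_signatures": [f.get("signature") for f in form.findall("field")],
        })
    return {"client_version": root.get("clientversion"), "forms": forms}


if __name__ == "__main__":
    print(summarize_autofill_query(PAYLOAD))
```

Running this over several captured bodies would make it easy to compare which signatures repeat from page to page.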

That's troubling. I'd be very interested in seeing a response from Google about this. Are you aware of any? Also, can you use Fiddler to inspect the content of the requests? I'm not familiar with the tool.

I see this too, if I have autofill enabled and at least one autofill address entry saved.

I would guess that Chrome is sending a hash of the <form> (perhaps URL + method?), plus a hash of each of the <input> tags, and Google returns some sort of information about what kind of form it is?
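
To make that guess concrete, here is a rough Python sketch of what such a scheme could look like. Everything in it is an assumption for illustration: the function names, the choice of truncated SHA-1, and the inputs to each hash (URL + method + field names for the form, name + type for each input). I have no idea what Chrome actually hashes or how.

```python
import hashlib
from typing import Sequence


def _truncated_hash(text: str, num_bytes: int) -> int:
    """Return the first num_bytes of SHA-1(text) as an unsigned integer."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:num_bytes], "big")


def field_signature(name: str, input_type: str) -> int:
    # Hypothetical: a 32-bit hash over the <input>'s name and type.
    return _truncated_hash(f"{name}&{input_type}", 4)


def form_signature(action_url: str, method: str, field_names: Sequence[str]) -> int:
    # Hypothetical: a 64-bit hash over the form's URL, method, and field names.
    parts = [action_url, method.upper(), *field_names]
    return _truncated_hash("&".join(parts), 8)


if __name__ == "__main__":
    fields = [("email", "text"), ("pass", "password")]
    print(form_signature("https://example.com/login", "post", [n for n, _ in fields]))
    print([field_signature(n, t) for n, t in fields])
```

The point is only that any deterministic hash of the page's form structure gives the same numbers every time the same form is loaded, which is what makes the next question matter.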

If so, it would mean it's pretty easy for Google to determine which sites you're on from the pattern of hashes sent for each site. e.g. I see this data sent in the clear for pretty much every page on https://www.facebook.com/
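
A minimal sketch of why the pattern alone could be identifying, assuming the signatures are stable per form: whoever receives them only needs a lookup table from signature sets to sites. The site labels and the mapping below are made up for illustration (the numbers are borrowed from the payload quoted earlier in the thread), and `build_index` and `identify_site` are hypothetical names.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical observations: site label -> field signatures seen in its query.
KNOWN_PATTERNS = {
    "site-a.example": ["620769395", "2995202485", "2175865763"],
    "site-b.example": ["904516291", "2953051246"],
}


def build_index(patterns: Dict[str, Iterable[str]]) -> Dict[FrozenSet[str], str]:
    """Map each unordered set of field signatures to the site it was seen on."""
    return {frozenset(sigs): site for site, sigs in patterns.items()}


def identify_site(index: Dict[FrozenSet[str], str],
                  observed: Iterable[str]) -> Optional[str]:
    """Return the site whose signature set matches exactly, or None."""
    return index.get(frozenset(observed))


if __name__ == "__main__":
    index = build_index(KNOWN_PATTERNS)
    # Field order does not matter because the sets are unordered.
    print(identify_site(index, ["2995202485", "620769395", "2175865763"]))
    print(identify_site(index, ["1", "2"]))
```

An exact-match lookup is the crudest version; it would miss forms that vary slightly between pages, but it shows that no URL needs to be sent for the site to be recoverable.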

Is this malicious site detection by any chance, or does that use a different mechanism?
