Search leakage is not FUD. Google et al., please fix it. (gabrielweinberg.com)
114 points by bjplink on Jan 24, 2011 | 159 comments



(I'm in all-day training today, so I can't participate on this thread much. Also, this is all my personal opinion.)

While Gabe's most recent post was a well-worded statement of his position, my guess is that Google's response was based on the billboard, which says "Google tracks you. We don't." On the website the billboard points to, Google employees are portrayed wearing ski masks and trying to spy on you. That does strike me as trying to encourage a bit of fear?

This is a browser issue that's not specific to Google or even to search engines, but Google is the only company mentioned on donttrack.us until you get to the "more tools" section at the very bottom. Meanwhile, Google is the first (and only) large search engine to offer https to the best of my knowledge. It's a one-character addition to http://www.google.com for anyone that feels strongly about this topic.


Keep fighting the good fight. I've been annoyed by this DDG campaign since they debuted donttrack.us. Google never took the low road in order to gain market share and I don't see why DDG feels the need to spread FUD. Gabriel is incredibly fortunate to have received a warm welcome and avid following from the developer community, but he is squandering that good fortune by engaging in what is essentially a smear campaign that many of us see right through. He should stick to what makes people want to use a search engine in the first place: Outstanding search results.


Uh, Gabriel made a point by point analysis of the issue, and your response is "fud" and "many of us see right through"? Can you please stick to responding to the issue and not engaging in your own contentless smear?


Gabriel has been avoiding the 2 most important points though.

1. Most of this is theoretical

2. Theoretically advertisers can build profiles of you based on the web pages you visit anyway. Referer leakage or not


The only part that's theoretical is the extent we're being tracked online but there's plenty to suggest it's "as much as possible".

Google targets ads based on your browsing history - that could include Search, Analytics, AdSense, DoubleClick, and a whole ton of other data they have and collect. [1]

Rapleaf tracks so much that it's actually "a challenge" not to identify people. [2]

QuantCast settled for $2.4 million for making sure their precious tracking cookies were recreated if you delete them. [3]

Facebook just tried to give apps your fricking phone number and address. [4]

Google and their referral leaking is not really "the bad guy" in all of this - there's no bad guy, just a bunch of companies consuming and analyzing as much data as they can get their hands on.

It's not evil, but is it necessary?

[1] http://googleblog.blogspot.com/2009/03/making-ads-more-inter...

[2] http://blog.rapleaf.com/dev/2010/07/20/anonymouse/

[3] http://www.wired.com/epicenter/2010/12/zombie-cookie-settlem...

[4] http://technolog.msnbc.msn.com/_news/2011/01/18/5868697-face...


It's not evil, but is it necessary?

Gathering all this data is necessary and essential for any company whose primary product is their users' personal information. (see: Google, Facebook, et al.)

And, corporations being what they are, people within that corporation will optimize to try and make as much money as possible off of that information, because it is in the corporation's best interest.


"More money" for them doesn't make it necessary for any of us.


It does if you feel like using their services for free. Otherwise, feel free to opt-out entirely by not using their resources.


You don't need to use Google services to be profiled - their ads and analytics are virtually everywhere.


If you don't use any of Google's services which tie an account to you then they aren't tracking you. They are tracking a browser session and once that cookie expires the trail stops. Even if the cookie never expires, Google doesn't know who you are.


You don't need an account to be tracked efficiently - if that was the case Google would be screwed since most people don't use them beyond search.


I'm starting to think we're arguing over something entirely different. Maybe some over-stuffing of phrases. I'm talking about Google knowing that I, James Simmons, uniquely identified individual, am searching for [something undesirable to have others know about]. I don't personally care either way really, because I've given them this information. Are we talking about the same thing or are you just talking about Google knowing from site A to site B that your browser's owner likes Korean pop music and chocolate cake mix?


My argument or question is why does Google (and not specifically Google, plenty of others) need to know all of that stuff?

You or the other guy mentioned "to make money" which is imo a really weak defense - we wouldn't accept that from anyone who kills, prostitutes, sells drugs, smuggles immigrants, or even legal-but-tasteless stuff like RIAA lawyer.

The data they gather is probably almost always "I like x music" or other innocuous stuff. But it's not always. What if you're searching about a rash on your junk (Google Search), or clocking up lots of views on bondage stuff on RedTube (Google Analytics), or you were browsing a forum for suicidal people and clicked a link to a site that had AdSense? Or pirating a ton of stuff?

Google and god knows who else has a lot of deeply personal information that "making money" doesn't justify - we say, search and browse very intimate stuff on the web, and we don't even know who it's being shared with.

My stance is most of it is just none of their business, even if they've chosen to build a business around it. Although I'm singling out Google they're just the easiest example.


>My argument or question is why does Google (and not specifically Google, plenty of others) need to know all of that stuff?

Alright, but do you assume that they know it is you? By name, by your identity? If not then why do you care if they know your user-agent likes bondage? Are you worried about seeing ads for bondage movies while you're not searching for bondage, while perhaps someone is looking over your shoulder?

>we wouldn't accept that from anyone who kills, prostitutes, sells drugs, smuggles immigrants, or even legal-but-tasteless stuff like RIAA lawyer.

I've noticed something about people arguing on your side of the fence. They keep dragging in ridiculous extremes to try and prove their point; Comparing those extremes to Google (or whoever) knowing information about you. Another guy in here was comparing this to being watched in your home with video cameras against your own will.

>Google and god knows who else has a lot of deeply personal information that "making money" doesn't justify

So stop giving it to them. They won't know who is searching this information unless you link your identity to an account and therefore enable them to link it to your human identity. As for whether or not it's justified I think is subjective. It's a moral issue and it's entirely subjective.

>we say, search and browse very intimate stuff on the web, and we don't even know who it's being shared with.

There was a time when you could search and browse anything on the Web without the repercussion of someone else finding out about it. But that era is over. No amount of arguing this fact will bring it back. In the Web of 2011 and beyond if you search for bondage videos and you are logged into your Google account then Google will have a way to map it to you. If you don't log in, don't create an account, then they won't. It's that simple.

In the Web of today, if you search for it, you just have to be aware that people have the technological means to know about it and there is no good reason for them to not want to know about this information. It helps them make the decisions they need to make to generate more money -- the entire purpose of a business. No altruistic causes here.


>In the Web of 2011 and beyond if you search for bondage videos and you are logged into your Google account then Google will have a way to map it to you. If you don't log in, don't create an account, then they won't. It's that simple.

No it's not. They're likely going to track IP addresses and make assumptions there. And even if limited IPv4 addresses give some protection for a while, browsers can be fingerprinted.

I'm sorry, but people like you are way too complacent. People like us know a little history. Things can get nasty.


>No it's not. They're likely going to track IP addresses and make assumptions there.

So? Unless Google is getting direct data from ISPs and requesting personal information about your identity from them then your IP address is just as useless to them as it is to me for linking your tastes to your human identity. Browser fingerprinting, in the Panopticlick sense, is just as useless to that end. At Google's scale the Panopticlick method doesn't even work because of how many people they come across. Once the number of browsers that match your "fingerprint" is > 1 it becomes useless to them.
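The scale argument above can be made concrete with a little arithmetic. This is only a rough sketch: it assumes fingerprints are spread uniformly, and the function name `expected_matches` and all numbers used with it are hypothetical, not measurements from Panopticlick or Google.

```python
def expected_matches(population, entropy_bits):
    """Expected number of browsers sharing one fingerprint.

    Assumes (hypothetically) that fingerprints are spread uniformly
    over 2**entropy_bits possible values.
    """
    return population / (2 ** entropy_bits)


# Hypothetical figures: one billion browsers, 20 bits of fingerprint entropy.
print(expected_matches(1_000_000_000, 20))
```

Under those assumed numbers, roughly 950 browsers would share a fingerprint. Whether that makes the fingerprint "useless" depends on what other signals (IP address, cookies, referrers) it is combined with.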

>I'm sorry, but people like you are way too complacent.

I'm not complacent, I just know what I'm talking about. Take the tin-foil hat off.

>People like us know a little history. Things can get nasty.

You are so full of yourself. You have no idea what you're talking about.



You could at least add some commentary to your comment. I've read that article and I don't see the implications to our discussion here. They linked together all the information that women (and everyone else) put out there for others to scrape. That is not the same as the keyword/referrers/logged into Google argument that we are having.


Just thought I'd show you how identifiable you really are, without your knowledge or consent or even knowing who the companies tracking you are.

And it's exactly the same as what we're discussing - Rapleaf was even one of the companies I specifically mentioned earlier. These companies don't look at any single piece of data individually, they collate as much as they can and the result is ... what I linked to.

Referrers and search terms are just two easily ended streams of data.


Yes, you are correct. It is very easy to link all this data together to create a profile about a person. They probably have more data about each of us than any of us realizes. I think even I might be surprised by how much a company like Rapleaf has put together about me.


you guys have officially crossed the threshold from disagreeing to flaming-without-purpose. dial it back a bit, or take it outside.


Well, it may be a little heated but aside from that "you'll be fucked, who cares" comment I don't think we're going too far. I doubt anyone here has had their feelings hurt.


Maybe so. If you're right, that will be great. If I'm right, well, I hope I'll be OK. You'll be fucked, but who cares about that? Not me.


NB: re Quantcast -- Quantcast and Clearspring paid a combined $2.4MM [1]

[1] http://www.research-live.com/news/legal/quantcast-and-clears...


1) actually, to my understanding it's a pretty common practice. i know that i at least have been served ads based on queries i just made, or pages that i just visited. i'd be very surprised if i was the only one here on HN who has had that experience.

2) that's certainly true, but i'd say that means we should be concerned about those alternate methods as well, not that we shouldn't be concerned about this one.


1. I don't doubt that google (who owns 60% of ad space online) has some kind of profiling on you already, but they don't need http referers to build that data. Hell, even the example email in the blog has nothing to do with referer tracking, unless gabriel is suggesting that wikipedia is running hidden ad trackers on their pages.


well, i think Gabriel's point (though he's had some trouble getting it across) is not that Google is building the profile on you, but rather that other ad networks are.

you do raise an excellent point about the email example, though. hopefully Gabriel or Matt could shed some light on that?


Profiles based on the web pages you visit is different from the search terms used to find those web sites. People are more honest with their search engine than they are with their family.

I do not place as high a value on privacy as others. I hope that doesn't bite me in the butt someday.


I'm not spreading any fear, uncertainty, or doubt with my response. In what way should this blog post rectify the view of donttrack.us and that billboard in the eyes of the people who think it's a load of crap?


Not considering the specifics of DDG billboard and the Referer angle, does it not bother you in the slightest that Google does in fact track you all over the Internet?

One is basically checking in with Google every time he visits any site that uses Analytics, embeds a YouTube video or a font from a Font Directory, or a Google-hosted Javascript snippet. Not to mention GMail. Don't know about other people but this bothers the hell out of me. So while I don't think DDG should've focused on the Referer issue, they got the core issue absolutely right. Google does in fact track everyone.


So is Facebook with their like button, AddThis with their widgets and probably dozens or even hundreds of other services people use. At this point no, I'm not worried about it.


You're not being fair in answering the question. You're answering the specific question and ignoring the intent.

You say you're not worried because other people track you as well. Do you mean if other people didn't track you, it wouldn't be okay?


To be perfectly clear: I am not worried because tracking is a part of life on the Web. This isn't going to change. This isn't specific to Google. I do not care about it and I'm not worried about it.


Would you be against tracking in real life? Cameras everywhere out in public? In your home?

To what degree is privacy not an issue for you?

I only ask because your answers are essentially this:

Tracking is okay because you are being tracked.

Which, in practical terms, means little as far as opinions go. You are simply apathetic to privacy on the web.


>Tracking is okay because you are being tracked.

Wrong. I never said tracking was "ok" on the merit that it's already happening. It can't be stopped, basically because we fuel it. I give the Web a lot of information about myself. From typing queries into a search engine to storing my photos, thoughts, and relationships in Facebook. I, like millions (billions, really) of other people have willingly sacrificed a certain level of privacy by using these free services who make money off our data.

>Would you be against tracking in real life? Cameras everywhere out in public? In your home?

Tracking people with cameras, or in their homes (get real, dude) is so far removed from the kind of opt-in tracking we're talking about. I call it opt-in because you wouldn't be tracked if you didn't participate. Being watched by some government agency in your home, presumably against your will, is a far leap from here. You're really grasping at straws.


> Wrong. I never said tracking was "ok" on the merit that it's already happening. ... snip ...

Sorry. That was clearly my interpretation. Thank you for clarifying. However, I disagree on several points. First, it can be stopped to a certain degree. Just because it can't be stopped across the board doesn't mean we shouldn't fight against it.

An example in context to this discussion: You give Google your data. No one is arguing that choosing to give Google your data is wrong. Rather, it's that the data you gave to Google is now being given to other sites without your knowledge.

> Tracking people with cameras, or in their homes (get real, dude) is so far removed from the kind of opt-in tracking we're talking about.

First, "(get real, dude)". I am. Maybe you didn't hear about the case about the school district spying on students by accessing their web cams in their laptops? So these things do happen in real life.

Next...

> is so far removed from the kind of opt-in tracking we're talking about.

You think people are aware that they are sending the resulting site specific data about their searches? Or do you think they are just assuming Google is tracking this? Do you really think people understand what companies like RapLeaf can do?

Did they really opt-in?

> Being watched by some government agency in your home, presumably against your will, is a far leap from here.

But being tracked by businesses you never knew existed without ever visiting their site isn't a far leap.

> You're really grasping at straws.

I'm not, really. My questions were merely that. If you took them as anything more than honest questions, it's your fault. Mostly a result of you not being clear about what you're okay with being tracked.

Finally, stop being antagonistic. Your insults are childish. You can disagree, but you can do so being less rude.

And if you weren't aware of being rude, you were.


>Rather, it's that the data you gave to Google is now being given to other sites without your knowledge

That is a completely unsubstantiated claim. Given to who?

>Maybe you didn't hear about the case about the school district spying on students by accessing their web cams in their laptops?

I did hear about that. It was a huge controversy and when people found out about what the school district was doing people got in trouble. That wasn't ok and people were up in arms about it. That still isn't even an example of what you were stating because it was a one-off thing that people weren't aware of and when they became aware of it the problem was fixed.

>You think people are aware that they are sending the resulting site specific data about their searches?

Perhaps not, but I did clarify in the following sentence my exact meaning of "opt-in" in this case, which you seem to be ignoring. So what if sites get the search queries? I have sites too and my Google Analytics shows what people are searching for to reach my site. Do I know who those people are? Not at all, not in any way shape or form.

>But being tracked by businesses you never knew existed without ever visiting their site isn't a far leap.

Yes it is. Some Website knowing that some browser searched for some keyword is still an incredibly far leap from being tracked in your home by video cameras.

>Finally, stop being antagonistic. Your insults are childish. You can disagree, but you can do so being less rude.

I have not insulted you once. Telling you to get real is not an insult, not under any stretch of the imagination. Especially given the context of the remark. Perhaps I am being a little harsh, but only because your argument went from worthy of a response to completely off-the-wall.


> That is a completely unsubstantiated claim. Given to who?

Google allows the forwarding of your search information to other sites. It's not unsubstantiated, and well established. It's the point of all of this. Yes, it's being done in your browser. But it can be fixed on the search engine end.

> That still isn't even an example of what you were stating because it was a one-off thing that people weren't aware of and when they became aware of it the problem was fixed.

Yes, when people are made aware of it, they fixed it. Like in this case, educating people on the problems pushes for fixes. The problem here is people aren't aware that the information they are sharing with specific sites is being shared with other sites.

Basically, the issue at hand isn't you sharing your data with specific sites; it's that you are sharing it with sites you aren't aware of.

> Perhaps not, but I did clarify in the following sentence my exact meaning of "opt-in" in this case, which you seem to be ignoring. So what if sites get the search queries? I have sites too and my Google Analytics shows what people are searching for to reach my site. Do I know who those people are? Not at all, not in any way shape or form.

I'm aware of your 'opt-in' remark, and I didn't ignore it. However, no one is 'opting-in' to sending their search queries to your site. Oh, I'm sure there are people who don't care. And for your site, it's not a problem. You don't know who these people are.

But these people aren't saying "Yes, send my search data to coderdude's website." They are searching in Google, and the net result is coderdude gets this information. The next step is what you can do with that information when coupled with other tracking information.

We've already seen how specific Facebook ads can get in the past when you have a lot of specific information about a user, and we already know RapLeaf and others store a lot of information.

> Yes it is. Some Website knowing that some browser searched for some keyword is still an incredibly far leap from being tracked in your home by video cameras.

Sorry, I really didn't mean to tie the two so closely together. My intent in asking the question was really just to gauge how important privacy is to you. I know people who are quite fine with the whole "tracking everyone if it fights terrorism." Anyways, the point of the last statement was merely to say that to some, spying on your activity at home is essentially spying on your personal internet traffic. Indeed, for many, the latter is more revealing than the former.

Even still, a poor direct comparison, and one I really hadn't intended to make in the first place.

> Especially given the context of the remark.

It was a question, not an assertion. You inferred more from the question than intended. I'll blame myself for not being more clear to distance the question from being more than it was, but it was just a question.

> I have not insulted you once. Telling you to get real is not an insult, not under any stretch of the imagination. > Perhaps I am being a little harsh

Let me just say your attitude was insulting then. Relax. I'm not crazy. =) I probably just failed at being specific enough, though I hate constantly hedging and making assumptions about how people will read things.


>Google allows the forwarding of your search information to other sites. It's not unsubstantiated, and well established. It's the point of all of this. Yes, it's being done in your browser. But it can be fixed on the search engine end.

They forward along the keywords. Useless for identification purposes.

There isn't a point to arguing this any further with each other, as I think we've both made our points clear. Good Game.


> They forward along the keywords. Useless for identification purposes.

Identification isn't the point. If you think identification is at issue, you don't understand.

> as I think we've both made our points clear.

I can't help but think I failed.


Yes, but the real problem is you are blaming Google but forgetting all the other sites that can track.


Like blaming Microsoft for all viruses targeting it.

Just because you didn't create the problem doesn't mean you can't be part of the solution.


I agree. Attacking the market leader seems desperate to me, and I think it'll lose a lot of goodwill.

You don't win customers/users by saying "Our competitor is rubbish and here's why". You win them by saying "Our product is awesome and here's why".


what he's doing is highlighting a key differentiator between their products, and trying to educate the market on why his implementation is more awesome.

in other words, he's taken a page out of your "how to win customers" playbook.


Not sure that worked so well for Diaspora.

Can you give some examples of companies that have successfully "smeared" the market leader with FUD and ended up ahead?


> Can you give some examples of companies that have successfully "smeared" the market leader with FUD and ended up ahead?

Apple. See the Mac vs. PC TV campaign.


In what way are they ahead that had anything to do with that campaign? The iPod, iPhone, and iPad are all markets that they are ahead in but those all have nothing to do with the Mac vs PC campaign. They still are not "ahead" in the general computer marketplace.


> In what way are they ahead that had anything to do with that campaign?

Because the campaign was more than just Mac vs. PC. It was also the Apple ecosystem.

> The iPod, iPhone, and iPad are all markets that they are ahead in but those all have nothing to do with the Mac vs PC campaign.

This statement is not true. The campaigns did impact the perception of the Apple brand. To assume the campaign had nothing to do with the success of these devices is wishful thinking.

> They still are not "ahead" in the general computer marketplace.

In terms of pure numbers, no. But in numerous other ways to measure success they are.

Just look at the growth of Apple after a campaign where it attacked the PC. Discounting this isn't smart.


> The campaign was more than just Mac vs. PC. It was also the Apple ecosystem.

It was mostly about the Mac OS X vs. Windows. And I'll repeat, they are still not ahead there.

> Just look at the growth of Apple after a campaign where it attacked the PC. Discounting this isn't smart.

Correlation != causation. I would say instead, "Just look at the growth of Apple after it introduced several really amazing products that had no equal in the marketplace".


> Correlation != causation. I would say instead, "Just look at the growth of Apple after it introduced several really amazing products that had no equal in the marketplace".

Which is beside the point. Apple's ads and growth are relevant to the question. The question didn't ask for an example where attack advertising directly led to growth.

I'm not implying that the ads led directly to Apple's amazing growth and dethroning Microsoft at the revenue level (amongst other areas).

Don't get me wrong, I'm not arguing for or against attack ads. I'm also not arguing against your points in this comment. Merely that they don't matter in context.


> The question didn't ask for an example where attack advertising directly led to growth.

I think that's an awfully pedantic interpretation of the original question. To me, 'Can you give some examples of companies that have successfully "smeared" the market leader with FUD and ended up ahead?' very clearly implies just that.


> I think that's an awfully pedantic interpretation of the original question

Nevertheless, that's the question. It's a simple question, and a better one than what you imply it is. "Can you give some examples of companies that have successfully smeared a market leader with FUD and ended up ahead as a direct result of that smear campaign" is a loaded question.

Rather, the question as asked is simple: Despite having a smear campaign, have companies succeeded? Are there success stories? Yes, there are. There are countless success stories of companies and organizations directly attacking their competitor and being successful, if not market leaders.

Apple is just one example.


You probably missed the original wave of Mac vs PC ads.


Strange definition of ahead.


Considering iOS devices, revenue and growth in various markets, they are ahead.

Edit: Apparently someone disagrees, and thinks Apple is losing to Microsoft. I'd love to have a company lose to a rival in the same way Apple has.


> Google never took the low road in order to gain market share

Excuse me? You forgot the unwanted automatic opt-in for Google Buzz? Or doing a u-turn on net-neutrality in cahoots with Verizon? Or "accidentally" collecting wifi data?

This particular issue may not fall into that category, but the days of "don't be evil" are behind us. Like most major corporations, Google puts profit before ethics.


Can you list points on why it's FUD?


Can we move past the billboard issue and concentrate on the meat? After all, that may have been a silly move (and silly, not mean, google is large enough to take a friendly poke in the ribs like that without pretending to be seriously hurt) but it does not detract from the point.

Google does have a tremendous amount of data on the connected part of the online population and it is a point in DDGs favor that they do not track their visitors.

Search leakage really is an issue, even if the majority of the people do not care they probably should and google could easily fix this, so is it going to or not?


If they break the way HTTP_REFERER etc work, I would stop using them and find something else.

Stop trying to spin this as "something that needs fixing".


Stopping search terms leaking to 3rd parties is not 'breaking the HTTP_REFERER', after all doing a POST instead of a GET already does that.

The referrer was meant to indicate the source of a click, not to indicate the terms a user used to search for content on the web.

Stop using words like 'spin' when you simply disagree with someone.
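The GET-versus-POST point above can be illustrated with a small sketch. It assumes a hypothetical search engine at `http://search.example/search` that takes its terms in a `q` parameter; the helper names are made up for illustration and are not anyone's actual implementation. The idea is that the destination site only sees the results page URL in the Referer, so terms that travel in a POST body never reach it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def results_page_url(base, terms, method):
    """Return the URL of the results page for a search submitted via `method`."""
    if method == "GET":
        # With GET the terms become part of the URL itself.
        return f"{base}?{urlencode({'q': terms})}"
    # With POST the terms travel in the request body, not in the URL.
    return base


def terms_visible_in_referer(referer):
    """Return the search terms a destination site could read from a Referer."""
    query = parse_qs(urlsplit(referer).query)
    return query.get("q", [])


base = "http://search.example/search"  # hypothetical search engine
print(terms_visible_in_referer(results_page_url(base, "knee rash", "GET")))
print(terms_visible_in_referer(results_page_url(base, "knee rash", "POST")))
```

In both cases the Referer still identifies the search engine as the source of the click; only the search terms differ.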


Thx Matt. At issue here is the search leakage, and I tried to keep this post to that topic. I included the quote from Google because it mentions the Referrer string aspect; I included the full quote because I didn't want to be accused of taking it out of context.

The part in particular that is relevant is "All search engines and websites use referrer terms as part of the architecture of the web, but we recognize our responsibility to protect the data that users entrust to us and we give them meaningful choices to protect their privacy."

It is not a choice that you need to give a user -- you can just do it for them as it should not affect search results. The bottom line of the post is that Google can easily control this particular leaking of personal information, so why not do it? I cannot seem to get a straight answer to that question from you or anyone.


BTW, DDG still leaks search terms:

  * Disable javascript
  * Do a search
  * Click on an https result
Your search terms are sent to the website as referrer.

  Referer	https://duckduckgo.com/html?q=https
If you're going to claim to be whiter than white, you better cover all bases.
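The reproduction steps above depend on when a browser sends the Referer header at all. As a rough sketch, the classic rule (an HTTPS page does not send a Referer to a plain-HTTP destination, but does send it to another HTTPS page) can be modeled as below. The function name `referer_header` and the destination URLs are hypothetical, and real browsers may apply other referrer policies.

```python
from urllib.parse import urlsplit


def referer_header(from_url, to_url):
    """Model the classic downgrade rule for the Referer header.

    An HTTPS page sends no Referer to a plain-HTTP destination;
    in every other case the full URL of the source page is sent.
    """
    src_scheme = urlsplit(from_url).scheme
    dst_scheme = urlsplit(to_url).scheme
    if src_scheme == "https" and dst_scheme == "http":
        return None
    return from_url


search_page = "https://duckduckgo.com/html?q=https"
print(referer_header(search_page, "https://site.example/"))  # terms are sent
print(referer_header(search_page, "http://site.example/"))   # nothing is sent
```

This is why the leak only shows up when clicking an https result: the search page itself was loaded over HTTPS with the terms in its URL.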


If there is a leak I'm of course happy to fix it. Ahh, this is a bug I just committed yesterday fixing something else for someone. Will fix right away -- the html version is supposed to be using POST by default.

Edit: fixed. The non-JS versions are somewhat new and there are a lot more bugs floating around them. Please report whenever you see something awry.


I just tried this right now from the http: version and it still has the same problem.


I don't understand this at all because for the non-JS version you should never be on https. If you go to the http homepage, the default form submit is POST and HTTPS. JS changes the action, so if you have JS off it should submit to the non-JS version (via HTTPS and POST). Also, the non-JS plugins are also HTTPS. Please email me to sort out what is going on: http://duckduckgo.com/feedback.html


My guess is that a significant portion of business website technical infrastructure has some dependence on being able to access the search terms. Tons of commerce websites would scream if they lost this information.

It's probably a complicated web of business relationships and strategies that is the answer for why Google would be reluctant to make such a change. It would cause a lot of upheaval and potentially ill will from companies affected by the change.

It would be an extremely bold move by Google to make to prevent search leakage by default. Does Google believe that encrypted.google.com appeases the users who are concerned? Does Google believe that it is only a minority of users that are concerned?


I have a gut feeling that google should be on the side of the users in this case, not the websites. Even if the users 'don't care' it's mostly because they are not aware of issues like this.

Typically someone that types a query in to a searchbox on www.google.com thinks they are having a conversation with google, not with all of the potential links they'll click after having completed their query.

Try explaining to a few non-computer literate people what happens behind the scenes when you do something as simple as a keyword search, most of them will have their eyes solidly glazed over long before you get to the HTTP_REFERER header.


Nothing against what you've said but search terms are hardly my bank account details, I'm well aware that they are being 'leaked' (although, I'm not sure that's even the correct term given that it's quite deliberate) and I doubt my whole search history would reveal more than which language I was working on and what error message had me stumped. Oh, and maybe which celebrity I have the hots for too.


It seems to me that Google used the same kind of consumer-pandering rhetoric when they entered into the Net Neutrality confrontation with the carriers. Google branded itself as a company that would lead initiatives to protect consumer access to information. Now, it feels like they have stopped leading this charge. The DuckDuckGo campaign may be a little too consumer-pandering, but I think it's perfectly appropriate for how Google has positioned itself to consumers.


Well, all this FUD has ensured that I'll never try DDG. It's sad really - it's not even a real search engine, just some hack using APIs. And the best marketing Whiny-berg can come up with is half-truths about referer linking.

I'll stick with the search engine that has a legion of brilliant programmers working hard to bring me great search results.


I'm sorry you feel that way. All I can say is that I tried my hardest in this post to do the opposite of FUD. I believe this is a serious issue and I am trying to get it solved for everyone.


You haven't begun to establish people want it "solved".

There are already solutions for the people who want them (stop sending referer, adblock, etc).

FWIW, I do not want Google to change their behavior in this area.


> You haven't begun to establish people want it "solved".

I think we'll all agree that "people" in general (NB: not us) don't even understand the problem, so it's a little premature to claim a problem based on not wanting it solved. At this point I see it largely as an education campaign.


While I agree that people don't understand the technical aspects of ensuring privacy on the web, they do understand the broad nature of the issue (i.e. my information is being shared, sold, or otherwise used for profiling and profit).

Maybe some other HN members would like to chime in here, but was anyone working during the late 90s/early 2000s and remember the "cookie scare" that went around? Some departments made it their policy to turn off cookies in the browser by default, other users refused to even go through a shopping cart (requesting instead to phone in orders) because they heard cookies steal their information.

That scare is probably why Google is being very proactive here because if users glean the incorrect information, it can be very difficult to convince them otherwise.


People want it solved.

See the outrage over other privacy issues in the past regarding Google, Facebook, and countless other services. Specifically, see the outrage over the Google Buzz fiasco as an example.

Privacy issues are a concern. People aren't interested in the specifics, but they make the assumption that their information is kept private.

> There are already solutions for the people who want them (stop sending referer, adblock, etc).

There were solutions for Google Buzz at the time. Just because they exist doesn't mean they are implemented in a way that people can use them easily. The problem isn't the tools, it's the ability to use those tools by default.

Let me make this clear: people assume their information is secure by default. So, the reason you don't see a bigger uproar over this is because of assumptions people make.


People are not used to tracking, but they will get used to it. Web companies aren't going to stop doing it, as that would mean giving up an enormous competitive advantage and quality of service.

DDG's "not tracking you" for example means Gabriel can never offer you personalized search based on your search history (he wouldn't be able to anyway as it's not in the APIs and it is impossible to be implemented properly on top of existing APIs). This is a tangible hit on the quality of your search results right there.


Privacy issues are a concern, but is this privacy issue a concern?


I don't think it's necessary for a promotional campaign to convince people who don't want your product that other people do. Your comment could point to a flaw in DDG's advertising, or it could just mean you're not their audience.


Ski masks?


What image would you choose for that point? I'm happy to switch it -- I actually already switched it once at the request of someone else.


someone with binoculars would probably work better. the ski-masks have a connotation of criminal behavior, and while i think this is undesirable behavior, it certainly isn't criminal.

as an aside, i don't think i've ever seen someone wearing a ski-mask while skiing, or for that matter, for any legitimate purpose. at what point will we stop calling them "ski masks"?


Done.


I would suggest you stick to blog posts that make logical arguments, instead of illustrative FUD.

And then when people call you out on your FUD, I suggest you don't deny it.


Google employees snooping personal data is a real issue that has actually happened. I don't think an illustrative guide in and of itself is a bad way to convey information.


Jason masks!


And the best marketing Whiny-berg can come up with is half-truths about referer linking.

Dude, not cool. This is Hacker News. Please refrain from ad hominem attacks and stick to making intelligent comments that contribute to the conversation, please.


You seem to be forgetting that "some hack using APIs" describes a very large amount of YC startups.


And they all should be judged on their own merit.


I fully agree. But in the original post the commenter is insinuating that DDG is bad because it is "just a hack using APIs" instead of a search engine based on the work of a "legion of brilliant programmers", whose results DDG is also using.


The only sad thing is when people come to the defense of corporations that would chew them up and spit them out in a second if it came to it.


Have you worked with Google and had a bad experience? The people I've worked with there have been extremely helpful and refreshingly 'good'.

It's very easy to wrongly assume that a big company that makes profit will treat everyone poorly, but that hasn't been my experience at all.


But how is it relevant to corporate policies if the people are nice? Google is a corporation that's making money by selling ads and those ads are targeted based on user's data.

No matter how good the Google employees are, the incentives are perverse and will twist Google's decisions in favor of eroding customer privacy even more. See Schmidt's "privacy is dead" quote. Privacy is not dead yet, but the likes of Facebook and Google are trying really hard to kill it because there's more money for them if they do.


Privacy is dead.

People are posting intimate details of their lives on the net. Sending naked pics around their phones. They killed privacy off themselves because it gives them freedom in other ways.


As B.Schneier said, if people post private details, it doesn't mean they don't care about their privacy at all.

Even for those people, it's important that they have some control over what they post. For instance they expect their phone pic to be private to them and the person they sent it to, or they expect their Facebook info to be available only to friends and they expect that they can be able to delete it or change it.

Even if your premise were correct, we have to ask ourselves if it is desirable for society that privacy is dead and I can't see how the answer would ever be yes.


Gabriel mentions in this post that https does not fully solve the problem, as referrers are still sent when using google in https mode. Can you confirm whether that's correct?


I'm pretty sure it's correct. Browsers specifically block sending HTTPS referrers to HTTP sites, but not HTTPS to HTTPS. So using HTTPS mode wouldn't stop sending referrers to HTTPS sites unless Google has a referrer-avoidance mechanism on the results page itself.
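
To make the rule above concrete, here is a rough sketch of that default browser behavior as a plain function. It models only the protocol check described in this comment; real browsers have more cases, and the function name is made up for illustration.

```javascript
// Sketch of the rule described above: the browser omits the Referer header
// only when navigating from an HTTPS page to a plain HTTP page.
// wouldSendReferrer is a hypothetical helper, not a browser API.
function wouldSendReferrer(fromUrl, toUrl) {
  const from = new URL(fromUrl);
  const to = new URL(toUrl);
  const isDowngrade = from.protocol === "https:" && to.protocol === "http:";
  return !isDowngrade;
}

const searchPage = "https://encrypted.google.com/search?q=gout";
console.log(wouldSendReferrer(searchPage, "http://example.com/"));  // HTTPS to HTTP
console.log(wouldSendReferrer(searchPage, "https://example.com/")); // HTTPS to HTTPS
```

So an HTTPS search page protects the query only for clicks that land on plain HTTP sites.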


how many search engine results are https?


"The only reason I've heard to not prevent search leakage is that marketers use Referrer info to do better search engine optimization (SEO). But the information doesn't have to disappear, just the current mechanism of transferring the information in a personally identifiable way."

I struggle to see how this could work in a way that's a fraction as useful to webmasters as the current system. Sites that sell things like to tie keywords to conversions. They can learn, for example, that keyword X drives sales, but keyword Y doesn't, and assign resources accordingly. Online businesses become more efficient, and searchers get more of what they want. I think it's largely a good thing all round.

My respect goes to DuckDuckGo for coming up with a clever way to differentiate themselves from their competition. However, if the problem is that sites are inadvertently sharing keywords with third-party ad networks, then point the finger at those ad networks, not at Google. Blaming Google makes about as much sense as blaming Firefox, Safari, Internet Explorer and the web in general for sending referrers in the first place.


Thank you for pointing this out bromley. I've been reading Gabriel's responses and in no place does he mention that keyword level data is needed for understanding differences in user behavior based on entry keyword. This might not make a difference if you run a content site, but for any type of commerce site, I don't want to know how "Google" does, I want to break it down to the keyword level and better understand where efforts should be focused.

I also don't think that the keywords are shared with 3rd party networks explicitly. I think what's happening instead is that you search for something, you land on a page relevant to that something, and the ad network code is reading the content on the page and assigning a keyword target or theme to your search. For example, you might search for a Ford F-150 and you get to Edmunds and the ads are sold on a "by make/model" basis using ad segmentation so the ad network now can assign your cookie a "Pickup trucks" behavioral tag but it never had to read the referrer header, it was implied.


Another issue with this is that it gives Google Analytics a pretty dominant advantage. Suddenly all other analytics packages are shut out of Google search query data, unless Google is benevolent enough to create an API for it.


Referrers aren't passed on to all the elements loading on the page. If you click to nytimes.com from a Google search, the referrer is sent once in the HTTP request to nytimes.com and then your browser makes all the other required requests separately once it gets back the HTML page. When the ads are loaded they don't get your Google referrer, they'll either get nothing or a nytimes.com referrer. You can work around this with JS (which is how Google Analytics works), but it's completely unnecessary for the problems Gabe's talking about...

Referrers aren't needed for targeting. On that Gout example, Google knows you researched gout so they can target you with gout ads on sites that run AdSense or DoubleClick (which is a lot of ads). If you visited another site about gout that ran ads from a different network, then they too could target you. The referrer has nothing to do with it, it's what you're requesting.

If you don't want targeted advertisements, it's far more effective to use adblock or modify your /etc/hosts file than it is to use DDG.


Couldn't the ads access it via JavaScript?

http://www.w3schools.com/jsref/prop_doc_referrer.asp

Edit: AdSense at least does:

    var ua=document

    "&ref=",P(ua.referrer.substring(0,512))


While it is true that the requests that load the ads don't include the referrer, any Javascript loaded directly into the page can access the referrer (document.referrer). A lot of ad networks work this way, meaning they do have access to those search terms.


In many cases, ads are loaded in iframes, which hide the referrer from the ad network. It's not in all cases, but most big publishers don't want malformed ad network JavaScript to destroy their entire page, so they wrap ad calls in iframes to protect against that.


I know that Javascript can pick up referrers, since Google Analytics gets referrers and uses Javascript and nothing else.

Can't advertising networks insert code to pick up referrers in the same way?


If the js is running on the main page itself, yes. If it's in an iframe then no.


Javascript exposes document.referrer, so ad networks can and do still grab your search query.
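
As a rough illustration of how little code that takes, here is a sketch of pulling the query out of a referrer string. In a page you would pass `document.referrer`; the `q` parameter name matches the Google URLs quoted in this thread, and the function name is invented for the example.

```javascript
// Sketch: extract search terms from a referrer URL.
// In a browser, a script on the main page could call
// searchTermsFromReferrer(document.referrer).
function searchTermsFromReferrer(referrer) {
  if (!referrer) return null; // direct visit or referrer suppressed
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // not a parseable URL
  }
  // searchParams.get decodes "+" and percent-escapes; null if "q" is absent.
  return url.searchParams.get("q");
}

console.log(searchTermsFromReferrer("http://www.google.com/search?q=gout+symptoms"));
console.log(searchTermsFromReferrer("http://www.nytimes.com/"));
```

As noted above, a script inside an iframe would see the embedding page as its referrer rather than the search URL, so this only applies to code running on the main page.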


"It's unfortunate that DuckDuckGo is preying on people's fears and offering incomplete information in order to garner attention," a company spokeswoman said in an e-mailed statement.

It really is impressive that you're on their radar enough to warrant a reaction like that.


I fear he's losing his goodwill and credibility by taking this angle of attack though. His whole argument really boils down to the fact that advertisers are retarded, not that privacy is being exposed.


Agreed. Especially since most cases similar to his argument are retargeting, not query leakage.

In his example, if the user actually clicked directly from Google to Wikipedia, Google would be the only one who knew about the user's interest in gout. Google isn't in the business of sharing this information (believe me, I wish they were ;).

In most cases that people might assume to be related to this, you search for Timbuk2 bags, click through to their site, then are bombarded with ads all over the internet for Timbuk2 bags. This has nothing to do with search leakage, this is retargeting. Timbuk2 drops advertiser pixels on their site so they can later target those users with advertising.

Most advertisers are stupid. They don't have the fancy tech to handle and parse search terms, target users, and display ads. They're probably using RMX or DoubleClick, where you only have the ability to retarget users that have seen certain pixels. They may be using AdSense or AdWords to target queries, but those are using Google's own data, which has nothing to do with search leakage.

I think DuckDuckGo rocks, but as someone working in the online advertising industry today, this issue seems manufactured for publicity. This information is useful in theory (and I'm sure a small number of companies are using it) but there are much bigger issues that are getting exploited by everyone.


I imagine putting up an anti-Google billboard in SF had something to do with it.


Yup, but it still would have been very easy for them to ignore him and wait to see if there's any non-tech-blog reaction to it.


Constantly attacking Google over something the vast majority of users don't care about seems like a bad idea.

I was ready to try duckduckgo if it could give me the results I wanted (Despite the hugely irritating UI and infinite scroll).

But the constant attacking Google seems bad business to me. It IS FUD. Google doesn't track you. Your browser sends a referer header, which it has done since the dawn of time. Who cares?

flagged.

I think you're going to lose a lot of goodwill Gabriel.


This is my last post on the subject. I felt that my position was being read unfairly, and I wanted to set the record straight. I apologize if it did not come off that way as it was clearly not the intention.

I truly believe this is an unnecessary leaking of personal information. And I address the browser argument directly in the post, as well as the argument that no one cares.


Gabriel, there seems to be some confusion here in the thread about "breaking HTTP_REFERRER" or in some way changing the current referrer behavior, which was not how i understood your post.

can you confirm quickly that you are proposing that search engines sanitize/anonymize referrer data, and not that they somehow change the referrer behavior?


I guess it depends what you mean by "break." I meant what you just said, i.e. just drop the search terms.


Looking forward to some posts from you on improved UI, removal of infinite scroll, and improved search results :)


"Google doesn't track you. Your browser sends a referer header..."

that's missing the forest for the trees, i think. while it's true that the browser is what sends the referrer, Google has some say in 1) what the referrer is (ie, what url is sent as the referrer) 2) and what Google itself does with that information once they (Google) have received it.

Google could have a stated policy of discarding or anonymizing referrer data. to my knowledge, they do not (i will gladly be corrected on this). they could also structure their results pages urls in such a way that the search terms are not sent as part of the referrer.

for the record, i think that Gabriel has seriously messed up here. i think he's approached this issue in entirely the wrong way, that he is going to lose a lot of good will over it. however, that Gabriel has done something foolish does not detract from the technical merits of either his search engine, or his argument with regards to search leakage. either they're sound, or they're not.

i don't use duckduckgo (yet?). but his argument with regards to search leakage seems to be sound, even if his ad campaign is not.


Google doesn't track you.

I'm having a hard time coming up with a reasonable context in which this statement is true.

I think you're going to lose a lot of goodwill Gabriel.

FWIW, I switched my default search in Firefox to DDG last week, partially as a result of the recent "FUD". It's working well and I've already used it for queries that I would prefer Google didn't associate with my IP address. (I'm not logged in to any Google accounts in Firefox, but I am in Chrome, and I'd expect Google to be able to correlate them).


The Google TOS clearly states that they have the right to analyze your content in order for them to provide ads. This includes Gmail, Docs and whatever else you may use. When it comes to search they save your search history for customized searches.

Let me put it this way: they don't target those ads so precisely by not having any information on you. On the contrary, they have lots of info. How safe and how anonymous that info is, that's up for debate.


If you don't want to wait for other people to fix this, there is a handy Firefox addon called No Referrer. It can block referers selectively, either when you click a link on a certain URL, or when you click a link that points to a certain URL. It uses regular expressions so it should be flexible enough.

It also blocks referers being sent from localhost/local URLs. I would be interested in trying out an option that only allows referers to be sent to the same domain or its subdomains. The interesting part is seeing how many things that option would break.

EDIT: Forgot the link.

https://addons.mozilla.org/en-US/firefox/addon/no-referrer-m...
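
The same-domain-or-subdomain option mentioned above could look roughly like this. It is a naive hostname comparison written for illustration, not how the addon actually works, and both function names are made up.

```javascript
// Sketch: only allow a referer when the target is the source's own domain
// or one of its subdomains. Naive: compares hostnames as strings and does
// not treat example.com as "same site" when coming from www.example.com.
function isSameDomainOrSubdomain(fromUrl, toUrl) {
  const fromHost = new URL(fromUrl).hostname;
  const toHost = new URL(toUrl).hostname;
  return toHost === fromHost || toHost.endsWith("." + fromHost);
}

// Returns the referer to send, or null to send none.
function refererToSend(fromUrl, toUrl) {
  return isSameDomainOrSubdomain(fromUrl, toUrl) ? fromUrl : null;
}

console.log(refererToSend("http://example.com/page", "http://img.example.com/a.png"));
console.log(refererToSend("http://example.com/page", "http://ads.example.net/pixel"));
```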


If you care about search leakage, turn the Referer header off in your browser. Problem solved. Why is it any website's job to change the way HTTP is designed to work?


Ok. Now how do I turn off the data mining?


Don't send requests to domains you believe are mining your data.


Or to domains that might log your requests and later be bought out by a company that mines your data.

Or to domains that might log your requests and turn around and sell the info to others.

Or to domains that might log your requests and then be cracked.

Oh wait. That describes half the fucking sites on the web.


I try to avoid it, though it is hard these days since so many companies do it. So besides that I'm also trying to convince people that they shouldn't put up with it.


https://encrypted.google.com

SSL pages prevent referer headers from being sent to non-SSL sites.

Easy.

The country specific pages don't have equivalents, so no encrypted.google.co.uk, but you can get the same effect using the gl parameter in the URL, so the url for a UK search would be:

https://encrypted.google.com/search?gl=uk&q=foo

Get your list of valid country codes here: http://www.google.com/cse/docs/resultsxml.html#countryCodes
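
If you want to script that, a small sketch of building the URL. The host and the `gl` and `q` parameters are the ones given above; the helper name is made up.

```javascript
// Sketch: build an encrypted.google.com search URL with an optional
// country code in the gl parameter, as described above.
function encryptedSearchUrl(query, countryCode) {
  const url = new URL("https://encrypted.google.com/search");
  if (countryCode) url.searchParams.set("gl", countryCode);
  url.searchParams.set("q", query); // searchParams handles the escaping
  return url.toString();
}

console.log(encryptedSearchUrl("foo", "uk"));
```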


Too late to edit but I should point out that this also handily prevents governments or ISPs from viewing your queries, or anyone else for that matter. Except in cases of MITM attacks obviously.


Perhaps I'm not thinking malevolently enough, but in what situation would the search terms that I used be enough to invade my privacy? Presumably, the content of the site is related to whatever you searched for (otherwise you wouldn't click on the link).

If you are willing to click the link and go to the site, the site will most likely have some idea of why you are there, and what you are interested in, regardless of the referrer headers (because, you know, the site is hosting the content that you are reading).

It seems that if I am willing to visit the site at all, I should also be willing to disclose trivial information like this. So I'm not sure why I should care.

Saying that this is not disclosed also seems a little disingenuous. Referrer headers are pretty standard. If you have a problem with Google doing this, you also have a problem with pretty much every other site that uses hyper-links. It seems that there is a lot of useful semantic information that could be gathered by being able to identify which documents reference your document. Eliminating referrer headers seems like it would be a net loss (pun not intended).


yep, you're not being malicious enough :)

the issue is not whether the destination site receives the search terms (and indeed, Gabriel suggests that they should continue to do so, either through the GWT, or some other method).

the issue is that currently, any advertising networks in use by the destination site also receive the search terms, via the same mechanism: the referrer. that's the crux of the issue.

while the destination site can't follow your traffic once you leave it, the ad networks, because of their large user base, frequently can. they can begin to build a much more thorough profile of who you are and what you are searching for than any single destination site could. whether that's an invasion of privacy is your call, but to many people it is. currently, they're simply unaware that it's happening.


Thanks for explaining, I can now see how this might be a problem.

Through this same mechanism wouldn't the advertising networks be privy to the content of the sites that I am visiting? It seems that even if we eliminate this, we still have issues with advertisers being able to track and create a profile based on the content of the websites you are visiting.

The headers do seem to create a direct link between a given search and a set of visited sites, but can't things like cookies and tracking pixels be used to the same effect? Possibly then using NLP to figure out the most important words on the page? Or the SEO terms that the website uses to get picked up by the search engine?

If you are going to let an advertiser post content on your site, it seems to me that it would be very difficult to keep said advertiser from tracking your users.

If the user uses Adblocking software or otherwise blocks the advertisers' sub-domains, does the advertiser still receive the referral headers?


yes, that's right: this is just one arrow out of the quiver. sanitizing the search terms out of the referrer does not fix the full problem.

whether the advertisers are receiving the headers or not depends on how the ads are served. to my knowledge, most of the time, they're not receiving the header directly, they're using JS to access the referrer indirectly (it is exposed via the document object). some adblockers merely hide the ads, which would still allow the advertiser to access the referrer. others prevent the ads from loading which i believe would prevent any access.


Just to avoid confusion, the GWT acronym is usually used to reference the Google Web Toolkit (http://code.google.com/webtoolkit/), not their Webmaster Tools product.


In Gabriel's defense, I'd say there's something very antithetical between FUD and a detailed description on how to fix the (alleged) problem. I don't know of any other FUD campaign in which a simple solution was provided -- one which won't directly benefit the entity raising the objection. It's Google, et al.'s decision which way they want to go -- whether they take his advice or ignore it.

One could potentially excuse the billboard as a person trying to highlight that this is a big issue. But again, this seems quite different from an incumbent that is trying to cast doubt via obfuscation, which has usually been the case in FUD...


I see the billboard as nothing more than a prank that is now pulled out to beat down the discussion about the actual subject matter.


Also posted on site:

I agree that the amount of information a well-tagged website can collect on users is frightening, but I don't think that stripping search keyword data from the referrer is the solution. I think Gabriel is going after the wrong thing.

Here's why: A good Search Engine will never send a user to a page that isn't textually relevant to the search they entered. In 99.9% of cases, the text they entered is ON the page they hit. So if a user searches for: [SOMETHING CREEPY] they will be hitting a page that already has [SOMETHING CREEPY] published.

To put it another way: "Your Keyword data is never going to give a website something it didn't already have. It's just going to reveal what pieces of its content are of interest to you."


It's more complicated than that. See the part of the article about ad networks.

On the other hand, I think the gout example was google ads, not an ad network on wikipedia, so hiding referrer info wouldn't have helped.


But in this case it's the site choosing to share the search term with the ad network -- and www.medicinenet.com/gout/article.htm (my first result for gout) has a pretty good idea that you searched for gout. It's also a rare enough term (this is my first time ever typing it) that any calculation on statistically improbable phrases will know I searched for "gout", not "article".

In most cases the "leakage" is pretty minor from search engine to page, the big leak is from page to ad network.


You are affected if you see ads. If you are using https, using noscript, blocking ads and erasing most of your cookies, you are somewhat less vulnerable.

Using google the first link that came with 'gout' was wikipedia.


Still, as an advertising demographic "someone researching gout for whatever reason" is more valuable to target than a viewer you know nothing about -- even if half of them are researching Henry VIII, the other half have a new condition and they need to buy something if only they knew what.


"But in this case it's the site choosing to share the search term with the ad network"

in the same way that most users probably aren't aware of this, most publishers probably aren't aware as well. i, for one, never considered that the referrer might be passed along to the ad networks i use on my sites. it simply never occurred to me. i definitely didn't choose to share that information with them.


agreed. I guess I left out my suggestion for what the problem really is:

If you are concerned about Ad Networks having so much data on you, clear your cookies and block cookies from them. Then they will never be able to string together more than one piece of data.


You're only bumping the problem from a per-browser (cookie) aggregation to a per-machine or per-household (IP address) aggregation.


Let's not slip into a false dichotomy. We don't need to argue about whether to block cookies OR block search engine leakage.


The author is singling out one company and saying they should be doing things differently than the rest of the web, because of what 3rd parties can do as a result. Why not go after the advertisers if they are the real miscreants? FUD!

There is absolutely no reason that Google should break the web to pacify this guy.


Taking his suggestions would no more "break the web" than Craigslist broke paper. It would harm a common business model, sure, but I have yet to see a good reason for me to care about the business models of web companies any more than I care about the business models of newspapers or record companies.


I wish there was a way to flag comments that obviously did not read the article, as there's an entire paragraph devoted to "omg you're just attacking Google you meaniehead!"


While I appreciate that DDG would want to differentiate itself from its competition, if the actual goal is improving user privacy on the internet I don't understand why this is being treated as a Google issue rather than a browser one. Google is an important site, but just one site of many. Wouldn't it make more sense to try to convince browser makers to have HTTP_REFERER turned off by default, either in entirety or for cross-site purposes?

It also seems worth noting that if for some reason you wish to continue using Google instead of DDG, and if you are concerned about the potential privacy issues, you can already change your browser not to send the referer header:

http://kb.mozillazine.org/Network.http.sendRefererHeader

https://chrome.google.com/extensions/detail/dkpkjedlegmelkog...


Not sending the Referer header at all can break some sites, probably more than is acceptable to do by default. But stripping the query part of the Referer header might be reasonable. Probably the main side effect of that would be to make Google mad. (P.S. the header is called "Referer", not "HTTP_REFERER").
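
A minimal sketch of what stripping the query part would mean in practice. This is only an illustration of the idea, not something any browser is known to do from this thread; the function name is hypothetical.

```javascript
// Sketch: keep the scheme, host and path of a referrer but drop the
// query string (and fragment), so the destination still learns which
// page sent the visitor but not the search terms.
function stripReferrerQuery(referrer) {
  const url = new URL(referrer);
  url.search = ""; // removes "?q=..." entirely
  url.hash = "";
  return url.toString();
}

console.log(stripReferrerQuery("http://www.google.com/search?q=gout&hl=en"));
```

Sites that only check which host a visitor came from would keep working; anything that parses keywords out of the referrer would not.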


Which sites would break, and why? (genuine question, not contrariness) I've always assumed that cross-site Referer is sufficiently brittle that one cannot depend on it, since it isn't there if one enters a URL by hand or arrives via a redirect. And while I like it for use within a site, it seems that Cookies have taken over for most uses.

Sorry about the sloppiness with HTTP_REFERER vs Referer. You are correct --- I tend to think of it from the CGI point of view rather than browser. Browser sends Referer as an HTTP header, which web servers commonly set in the environment as HTTP_REFERER. Thus the question should be "Why not have browsers default to not sending the HTTP Referer header?"
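
For reference, the naming relationship between the two is mechanical. A small sketch of the CGI convention (uppercase the header name, turn hyphens into underscores, prefix with `HTTP_`); the helper name is made up.

```javascript
// Sketch: derive the CGI environment variable name for an HTTP request
// header, following the usual CGI convention.
function cgiVariableName(headerName) {
  return "HTTP_" + headerName.toUpperCase().replace(/-/g, "_");
}

console.log(cgiVariableName("Referer"));
console.log(cgiVariableName("User-Agent"));
```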


Even if search engines stop sending keywords in referral data, the ad networks and webmasters will still be able to piece together your browsing history. Every day there is less and less anonymity on the web, and for the most part people are ok with it. Ten years ago very few people would willingly use their real name online. Facebook changed that.

The web as we know it has been built on the assumption that search engines pass along keywords in referrer data. Changing this would have a significant negative impact on a lot of businesses. Considering that most users don't really seem to care about privacy, at least if you judge by actions and not what they say, I don't see why a company like Google would ever stop sending along keyword data to webmasters. They'll piss off webmasters who buy ads from them, and it won't help them increase their share of the search market.


I use DDG as my primary search engine and plan to continue, but I am really getting tired of this aggressive/attacking marketing approach of Gabriel's.


There aren't enough people at the FTC to read the complaints that would flood in if Google changed this and Google Analytics became the only tracking platform that could do SEO keyword analysis.


I don't see any reason why analytics should have access to that information when other tracking platforms would not.

Chinese walls should take care of that.

And if those are not in place then google has bigger problems.


I don't feel qualified to say whether this is FUD or not, but it certainly imposed some fear on me. I don't know what a Gout is and now I'm afraid to search for it.


The gout example in the article strikes me as very odd, since the only site that should've received the referrer with the search terms was Wikipedia. Therefore we're left to conclude that either Google is targeting ads in their AdSense network based on search terms (which is outside the scope of Gabriel's argument) or Wikipedia is passing search terms to ad networks that for some reason it doesn't display ads from.

Equally likely is that this concerned user clicked on a different health-related site with an ad network that classified him according to that site's content or stated category -- there's simply no evidence that the aggregation of search terms happened.

Note: I'm not saying the story couldn't be true, just pointing out that no technical evidence has been presented to rule out the other possibilities.


From the American Heritage Dictionary: "Gout: (n) A disturbance of uric-acid metabolism occurring chiefly in males, characterized by painful inflammation of the joints, especially of the feet and hands, and arthritic attacks resulting from elevated levels of uric acid in the blood and the deposition of urate crystals around the joints. The condition can become chronic and result in deformity."


I'm using Firefox add-on RefControl, which can remove referrer for 3rd party requests


Anybody paying attention to http://www.techmeme.com/110124/p25#a110124p25 ?



