Google personalizes search results even when you’re logged out, new study finds (theverge.com)
350 points by dcu 10 days ago | 165 comments





Welcome to 2009, I guess

"Today we're helping people get better search results by extending Personalized Search to signed-out users worldwide (...)"

https://googleblog.blogspot.com/2009/12/personalized-search-...


I don't think this is DuckDuckGo being sloppy, just The Verge. The real headline here is that the effect extends to Incognito mode.

In the 2009 post, Google said:

> ...customize search results for you based upon 180 days of search activity linked to an anonymous cookie in your browser... You'll know when we customize results because a "View customizations" link will appear... Clicking the link will let you... turn off this type of customization

None of that appears to match the study results here. Incognito mode isn't supposed to pick up session cookies from normal browsing, no customization notice appears, and an option to turn the behavior off certainly isn't provided.


Honestly, Google is almost certainly using canvas fingerprinting to track you across accounts, webpages, and more regardless of incognito.

There should be zero expectation of privacy when using a Google service. I've switched to searx.me, which wraps Google and (optionally) a bunch of other search engines.


Do you have a citation that they do canvas fingerprinting or are you just guessing?

They don't. But welcome to HN, where conspiracy is more prevalent than truth.

I don’t know dude, it would take Google the whole of 5 minutes to publish how they track people. It’s not like they haven’t already prepared a contingency, e.g. for Congress.

You’re being really flippant about speculation, which is a total red herring. The point is they’re not honest.


You mean they don't already explain it? Please.

Probably not, hence "almost certainly".

Unfortunately: "How canvas fingerprint blockers make you easily trackable"

https://multilogin.com/how-canvas-fingerprint-blockers-make-...


Found this with Firefox Focus.

Using a minority browser with a lot of stuff blocked makes you very unique.

It's good for blocking 3rd party cookies but you stand out like a sore thumb to Google.


Is there a chrome extension or easy way to randomize canvas properties programmatically to avoid canvas fingerprinting when opening a new chrome tab?

Canvas fingerprinting is highly concerning to say the least, thanks in advance to anyone with working solutions.


> Google is almost certainly using canvas fingerprinting

Since that's a client side mechanism it should be rather simple to find out, no? (eg. an addon that injects tracking into the DOM accessors necessary to get at the canvas)


wow, we took that instance down: but here is their stat list if people still wanna try the service.

https://stats.searx.xyz/


There's a difference between Signed out, and Incognito (no cookies)!

+ People generally tend to miss the point that Incognito doesn't prevent sharing the IP of the user.

+ I think DuckDuckGo's study missed out using VPN in their analysis. i.e., SignedIn vs Incognito vs (Incognito+VPN)


The "even in Incognito" part of this is certainly the biggest result I see. And I agree on the study limitation; attempting to clean up localization effects after the fact doesn't feel like a strong fix. It should be possible to isolate device and location effects by using multiple devices in one location, then VPN-ing one device to multiple 'locations'.

One thing that caught my eye was Google's response about Incognito:

> The company did confirm that it does not personalize results for incognito searches using signed-in search history, and it also confirmed that it does not personalize results for the Top Stories row or the News tab in search.

Since it's a corporate reply, the standard question is what's not present: a statement that Incognito isn't personalized, or isn't personalized beyond device type and location. Perhaps I'm too cynical, but "we don't personalize using X" parses as "we do personalize in other ways".


it does not personalize results for incognito searches using signed-in search history

To me this sounds reasonable. A very large number of searches are locality based, and it is entirely reasonable to localize them based on IP address (and - as you note - the device type).

It's also reasonable to customize based on recent (session based) search history (refinements, spelling corrections, etc).

The difference between this and personalization seems mostly about semantics IMHO.


> To me this sounds reasonable. A very large number of searches are locality based, and it is entirely reasonable to localize them based on IP address (and - as you note - the device type).

I wish this was trivial to disable. I regularly search for things where I want the global result, and instead get weird local results that I don't care about. It's much easier to narrow a global search to a local one by adding an appropriate region name to the search than it is to expand a local search to a global one via search terms.


Overall I agree with this, I definitely see why Googlers are frustrated at having all of this framed as 'personalization'.

But broadly, I see three bases for objecting to these changes.

First is the lack of user control. Like many other people in this thread I often want to turn off or 'rehome' localization, not just for weird developer use cases but for obvious stuff like "I'm about to travel and want results for that location". Disabling session-based changes is a rarer desire, but comes up sometimes when a correction or topic change is interpreted as a refinement that's biasing results. Fortunately, resetting Incognito should manage that. (I've never actually wanted to bypass device type adjustments except for dev work.)

Second is inadvertent bubbles. It's easy to imagine content-neutral rules like "show fast and mobile-friendly pages to smartphones" correlating with a meaningful content difference, and the same for location. Hard to really blame Google here, but again it'd be really nice to have the option of a "stop helping" setting.

Third is Google-driven bubbles. Some of the DuckDuckGo examples showed effects like national newspaper articles on a search for 'immigration' getting reordered, or pushing above and below non-news sources. (We can't know if that was caused by location or device type, but let's look at the case where it was.) That doesn't look like basic localization, it looks like non-local results being adjusted based on user location.

This wouldn't have to be anything purposeful; if you add location into your training set and reinforce on the usual 'success' metrics (e.g. first result clicked, final result clicked), you could easily learn that people in NYC and Houston have different behavior patterns and display accordingly. It's open to debate whether this is a bad thing, but it's certainly not what most people (including the Googler who responded to the article) mean when they say "localization".


Google definitely personalizes based on geoIP location, that's not exactly a secret.

It's not a secret, but I don't think we're doing enough to keep this on the forefront of people's minds. Every time I hear this, I am shocked! Only to then remember I already knew this but somehow let it slide...

I mean, what's the difference between this and customizing billboard ads based on where the billboard is?

IMO the issue is not google using IP-based location info. The issue (if there really is one) is people assuming/believing the internet hides their location.


hmm, that's a very interesting point. I guess that famous saying of "if you're not paying, then YOU are the product" kinda falls into play here, huh?

> There's a difference between Signed out, and Incognito (no cookies)!

Is there? I was under the impression that Incognito and its cousins generally still accept and preserve cookies for the duration of the temporary session. This means that for this purpose, there isn't really a difference.


Incognito is supposed to give you a completely clean session that doesn't carry over the cache, cookies, etc. from your normal browsing context.

In this study, my understanding is that result personalization carried over from a normal browsing session into the clean, Incognito session, likely due to IP correlation or possibly through User-Agent strings. So while Incognito has its own context that is wiped once the session has ended, the result personalization didn't need anything saved in the browser to recognize who you are.


Hmmm, I wonder if using a second browser, maybe based on a different renderer, in incognito with a different user-agent set up, would be enough, or if it would still get enough info. At that point it would still have your outgoing IP address at least. Is there anything else they could still match to a signed-in session? I guess you could also route all traffic in the signed-out session through a VPN too.

Quite a few variables are used to track you, many of which do not change between different browsers.

Try this: https://panopticlick.eff.org/

Specifically, check out the "fingerprinting" details.


Incognito will accept new cookies but to my understanding it won't serve existing cookies that pre-date the Incognito session. A fresh Incognito window is supposed to be like you cleared your history and cookies before opening it, and then cleared them again when you close it. But yeah, in between the browser acts as normal.

In a normal (not Incognito) browser window, you don't have to be logged into Google for Google to read Google cookies. Logging out doesn't make you anonymous; they still know who you are.


Even with a VPN it's possible to somewhat reliably fingerprint browsers, no? You can check user agent, screen size, installed plug-ins, etc.
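To make the fingerprinting idea concrete, here is a minimal sketch of how a handful of cookie-independent attributes could be combined into one identifier. The attribute names and values are hypothetical, and nothing here is claimed to be what Google or any real tracker does; it only illustrates why a VPN or a private window alone would not change such an identifier.

```python
import hashlib


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sort the keys so the same attributes always produce the same hash,
    # regardless of the order they were collected in.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
normal_window = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "plugins": "pdf-viewer,video-codec",
    "timezone": "UTC-5",
}
# None of these attributes depend on cookies or on the IP address,
# so a private window behind a VPN would report the same values.
incognito_window = dict(normal_window)

print(fingerprint(normal_window) == fingerprint(incognito_window))  # True
```

Changing any single attribute (a different screen size, say) yields a different hash, which is why a real tracker would more likely use fuzzy matching than an exact hash like this one.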

Yes, see antsar's comment below:

Try this: https://panopticlick.eff.org/


Disable javascript by default!

Google has said they aren't using "much" personalization these days: https://searchengineland.com/google-admits-its-using-very-li...

and:

https://www.seroundtable.com/google-personalized-search-is-v...

So it comes as a bit of a surprise that it isn't as light as they have been saying recently.


Sounds like the "study" should have been just a google search.

Because "they do X" makes looking into "what do the results of X look like" a completely pointless question?

Yeah, the problem is more with the headline than the study.

To be fair, I have noticed more and more "creep" (take that how you like) related to logged out behavior from Google.

To be fair, I have noticed more and more "creep" from Google full stop. I was hoping GDPR was going to put some sort of dent into this behaviour from large corporations, alas.

It's interesting the tone of the blog as well, I bet if that article was written today it would've had a lot more rigor around the language being used. And it would be assuring users that their security and privacy isn't being compromised, as opposed to this blog which is simply describing what Google thinks is a great feature that the public would benefit from.

I remember Paul Graham gave a keynote at one of the Pycons some years ago. It was about interesting ideas that you might want to work on in the future.

One idea was to build a search engine that returns unpersonalized results. He talked about how Google will be moving into a "it's true, if it's true for you" kind of world. His idea was that it will open new opportunities. I think DuckDuckGo is one example, and they've grown and are doing pretty well. I think a lot of it comes as a reaction to Google, Facebook and other such things.

"It's true, if it's true for you" is also a great phrase worth remembering. It describes so much about the current world and where things are headed, and why some things seemed to have gone off rails.


It used to be that DuckDuckGo would let you contribute to the search engine years back, providing interfaces to various 3rd party data. I am not sure if that is still the case?

Has anyone written anything for DuckDuckGo?


You used to be able to contribute to those info cards at the top of searches but about a year ago they closed it off from external contributions.

They stopped August 31st, 2017 [0], however they still have several open source efforts [1]

[0] https://duckduckhack.com/

[1] https://github.com/duckduckgo/


That's the Bang system

While I have switched to DuckDuckGo a long time ago, there basically is no proof they're actually not doing exactly what Google does.

You're asking for proof of a negative.

DuckDuckGo just put out a post criticizing Google for personalizing search results. They're a small company without many resources, and privacy has been their primary selling point from the get-go. Yes, DuckDuckGo could also be personalizing their own results, but if they were and people found out, there's risk of a significant outcry. Until I see evidence to the contrary, I'm inclined to assume good faith.

See also: discussions around "but how do we know for sure that Apple has better data privacy than Google?"


They are VC-funded and make money selling ads. Stop idolising DDG as the true and only fighter for privacy. Sooner or later they will bow down to Ad Cash or fuck up otherwise.

It's not about company size or profitability, it's about words + past actions. As I mentioned, I made a similar argument recently about Apple.

If you assume all organizations are equally ready to ignore user privacy, what do you use? Do you program your own everything or do you only use FSF-approved software whose source code you have personally examined and compiled?

It's all about degrees.


They sell ads based on your current search terms, not your search and browsing history. Your browsing history is easier for Google to summarize based on all your searches, and URLs you click which get redirected from google.com to other sites with a unique hash.

The funny thing is that what DuckDuckGo is doing is actually working... By limiting ads to the search words they manage to provide relevant ads (insofar as any ads are ever relevant). It turns out that when I search for "office chair" I'm actually not shopping for the Macbook Pro I was looking at a month ago... :O

Do you mean personalizing search results? IF so, that's incredibly easy to check: run the same searches on computers in separate physical locations and compare results.
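A rough sketch of what "compare results" could look like in practice: measure how much two result lists overlap, and separately whether they are in the same order. The result lists below are made up, and collecting them from real machines is left out entirely.

```python
def jaccard(results_a: list, results_b: list) -> float:
    """Share of distinct URLs that both result lists have in common."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result pages are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


def same_order(results_a: list, results_b: list) -> bool:
    """True only if both lists contain the same URLs in the same positions."""
    return results_a == results_b


# Hypothetical result lists from two machines running the same query.
location_1 = ["a.example", "b.example", "c.example", "d.example"]
location_2 = ["b.example", "a.example", "c.example", "e.example"]

print(jaccard(location_1, location_2))     # 0.6
print(same_order(location_1, location_2))  # False
```

An overlap below 1.0 only shows that the results differ. Attributing the difference to personalization rather than location, device type, or experiments would still need a control group.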

The article mentioned Google tracking users even inside browser sessions. So I'm not sure how to replicate this individually without a significant amount of effort.

I meant tracking you and selling the outcome to the highest bidder.

Tracking is not invisible, see Ghostery or Disconnect or... .

So your "there basically is no proof" is both wrong and a red herring.


The problem is, these "personalized results" have become mostly useless. For more technical (or simply specific) queries, Google seems to vomit useless only vaguely related links instead of, you know, pages that cover what i searched for.

This does more to encourage me switching to some other search engine than any privacy concern.


That's a separate issue, but a real one.

Practically speaking, many non-technical users _do_ want to formulate their query in an imprecise, sloppy manner and have the search engine figure it out.

Programmers are capable of and accustomed to putting a lot of care and precision into exactly how they communicate (especially when communicating with a computer), but regular people don't usually do that. They may not even be capable of it.

The only solution I can see is for a search engine to support both styles of communication. It could either learn/guess (on a per-person or per-query basis) or just let you tell it how to behave.

But yes, Google web search could definitely use improvement here. There are times when there are VERY obvious clues that I'm being precise, and Google totally misses it. For example, my printer is a DCP-L2550DW. If I search for "DCP-L2550DW margins", it will include results that don't have "margins". I could have just typed the model number, but I went out of my way to keep typing. If that's not a strong enough signal that I definitely want results related to margins, I don't know what is.


Likewise matching DCP-L2550DW to pages with DCP-L2550BW or something similar. It is so obviously specific that a no-results answer is better than a misleading one.

I think this is due to people more often than not finding the top page useful regardless which just reinforces the link.

But it's super annoying to find you've just waited for a 300 MB download of the wrong driver.


I think this is a lot more defensible. Eg (real example) if I search for 74HCT350, I would probably be okay, if not actively happy, to get results for 74ACT350 (a socket-equivalent chip in a somewhat faster logic family). It's much less defensible if "DCP-L2550DW" is quoted, since that's a clear indication that you want the exact string, but it's still better than outright ignoring half your query the way google does.

I agree very strongly with this. I don't understand why people complain about having to read the search results.

In so many cases slightly different product numbers are equivalent and differ in things like the market they are for, or the color, or something else you may or may not care about.

It seems to me that hiding these results would require Google both to do mind reading and to decode spec sheets to see if they match what you had in mind.. all from a single string of semi-random characters.


Even worse: "<model number> manual". I don't google that because I like reading manuals in general.

You can try putting double-quotes around the model number. This is Google’s syntax for exact matches. For a good example, try searching for NYC both with and without quotes around it. Without quotes, you will get a much larger set of results where Google is trying to figure out what you mean when you say NYC… including results for New York City.

This doesn’t always work exactly but in general if you are looking for exact matches for certain terms, put those terms in quotes.


Double quotes keep getting less and less effective -- Google is progressively ignoring them more often when similar results are available. Worse, I've seen Google tell me that a term in double quotes simply doesn't exist on the web, when other engines will return many thousands of results. I seriously doubt that the competition has a bigger index!

This is what finally put me on using DDG.


So DDG aka (bing / yandex) is better for double quotes.. maybe, but it could also be just an illusion, try the same queries there and see.

I have, that's actually why I switched. I was getting better results with DDG.

I think what I've noticed with technical searches is that Google tries too hard to "predict" what I am looking for. It always ends up showing me irrelevant (but surely highly ranked) fluff related to prior searches. The thing is, I don't need to see something related to my prior searches. I need to see what I am looking for TODAY. For me the "personalization" is often actively harmful.


If you put the word "margins" in quotes in your query, all of your results will contain that word. Similarly, you can force words not to appear by prefixing them with a "-".

> The only solution I can see is for a search engine to support both styles of communication.

Isn't this exactly what you're asking for? Non-technical users can be imprecise and let Google figure it out, while technical or power users can get more precise results with the various search operators?


If you put the word "margins" in quotes in your query, all of your results will contain that word.

This seemed to be true at one time, but I noticed last night that it doesn't always work.

These days it seems to include results without the word, then provide a tiny link below each deficient result to "must include" the missing word in a new search.

That used to be the previous behavior, but now it's inconsistent.


Do you have an example where adding quotes doesn't work? As far as I can tell, it always works for me (yeah, we may be getting different personalized results, but still...)

It’s kind of hard to share links because google personalizes results against our wishes. But a search for ‘"monkeys" burritos san diego “fries”’[0] returns a first link [1] that doesn’t include ‘monkeys.’

There are also other results [2] that include ‘monkey’ not what I asked for.

It’s infuriating. I just want to grep the internet. Is that too much to ask?

[0] https://www.google.com/search?client=safari&hl=en-us&ei=P1oH...

[1] https://www.pinterest.com/pin/481463016388903842/

[2] https://www.tripadvisor.com/ShowUserReviews-g55543-d4580159-...


In your first example, >90% of the text on the page is in the "More Like This" section which is heavily personalized by Pinterest. So I'm seeing different content than you are, and Googlebot saw different content than either of us. Given the complete randomness of the content in there, I wouldn't be surprised at all if "monkeys" appeared on the version that was indexed.

In your second example, the word "monkeys" does appear on the page, in the "Show reviews that mention" section.

More importantly, did you have a result in mind for this query that isn't showing up? It seems somewhat nonsensical.


Thanks for the reply. I’m on mobile and searched on mobile and I don’t see “show reviews that mention.” I do see “read reviews that mention” but that includes other stuff, not monkeys [0].

The Pinterest page perhaps has monkeys at some point, but doesn’t now.

This isn’t a real search, I just added something whimsical (and I miss San Diego burritos). This is behavior I remember but don’t typically track so it’s hard to remember on command.

Note you can use search tools | verbatim and it will return pages with all terms. The Pinterest page is still there, but the TripAdvisor page is missing [1].

[0] https://imgur.com/gallery/Xlp8lsk [1] https://www.google.com/search?q=%22monkeys%22+burritos+san+d...


As loath as I am to invoke Inception, having an advertising platform index advertisements and then sell my traffic to advertisers feels like a recursive death I don't want to die.

All that link appears to do is add the quotation marks around that keyword.

That used to be the case, and it was fine, but they started breaking those operators a few years back (maybe when they tried to make “+” search for Google+ pages, but I’m not sure).

I'm aware of quotes, but I have two issues:

1. It's tiring to type quotes around every word every time.

2. It takes you to the other extreme. Just because I'm being more precise than Google thinks I am doesn't mean if I have 8 words in my query that all of them are mandatory. There is potential for a useful middle ground that quotes don't get me to.


Quotes are one of many operators google has for search (1). You can build very powerful search arguments with a handful of easy to use operators. Sometimes taking the time to type in the proper operators can save you much more time by bringing you to the content you are looking for faster.

https://support.google.com/websearch/answer/2466433?hl=en


Please quit helping. The thing is, until like 4 years ago, Google worked just fine without any operators. Then they improved it to the point you can't find anything specific any more.

Edit: Actually you could help. Show me where I set "Verbatim" as a permanent option :)


The only solution I can see is for a search engine to support both styles of communication.

It would be great if Google had some kind of, let’s call it “Code Search” feature that searched all open source repositories.


> They may not even be capable of it.

Come on now...


For a lot of topics I just find myself with a lot of fake answer sites these days. About once a week I’ll get almost an entire page of results that way.

Seems like dark SEO is winning.


I had actually forgotten about those until just now. I use Bing for most of my searching (I fall back to Google in the semi-rare cases where Bing doesn't work), and fake answer sites don't seem to be as prevalent on Bing. Maybe those sites tailor their SEO to Google's algorithms?

Or maybe I've just grown numb to their presence.


Fake answers are another problem, I'm talking about stuff like Google dropping a word from my query because it feels there aren't enough results that include it.

Edit because I can't reply any more: As dsfyu404ed says below, it tends to drop the central term of the query, making the results 100% useless. Not that they're particularly useful even if it doesn't do that...


> I'm talking about stuff like Google dropping a word from my query because it feels there aren't enough results that include it.

It's particularly infuriating when it drops the location word from a search query for info on law and regulation.

I don't care if broad form automotive insurance is available in some non-specific state, I care about if it's available in the state that I put in the query.

Edit: Why is this downvoted?


Edit: Why is this downvoted?

It's lunch hour at Google, and people there are hitting their phones and HN.

You can see a lot of interesting trends on HN by day and time of day. Watch what happens to any story critical of insert nation here after 6am in that nation.

Or notice how on Fridays (NA time), HN is pretty much a ghost town.


I've observed that too. Then I force the words to be there with quotes, finding that I still get plenty of results. Which seems weird to me, because most people don't travel to the second page. I'd rather have an exact result than more pages.

But then again, it doesn't seem like a simple problem, and it's one that has compromises either way.


I suspect this is one of those features that helps the less technically inclined majority, while hurting those who know what they're doing. I can see the average Googler including words that aren't actually relevant to their query.

As long as I can force it with quotes, I'm happy. Google has even started suggesting next to the results "include only results with the word XXX", to remind me the quotes might be needed.


I have noticed this too. I tend to include all the key words and the problem domain in my search and get bad results on Google, and then other people just type a whole sentence exactly like you would say it to a human and get back good results.

Sadly DDG has recently adopted this antifeature too. A page of irrelevant results is served with a small disclaimer that "Not many results include ${DROPPED_SEARCH_WORD}"

I should investigate how to write a search plugin for Firefox that "puts" "every" "word" "in" "quotes"
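The query-rewriting part of that idea is small. A minimal sketch follows; it is not a Firefox plugin, just the string transformation such a plugin would need, and the function name is made up for illustration.

```python
def quote_every_word(query: str) -> str:
    """Wrap each whitespace-separated term in double quotes."""
    terms = []
    for term in query.split():
        # Drop any quotes the user already typed so terms are not double-quoted.
        stripped = term.strip('"')
        if stripped:
            terms.append(f'"{stripped}"')
    return " ".join(terms)


print(quote_every_word("DCP-L2550DW margins"))  # "DCP-L2550DW" "margins"
```

One limitation of this sketch: a phrase that was already quoted as a unit, such as "san diego", gets split into separately quoted words. Whether the search engine then honors the quotes is a separate question, as the rest of this thread shows.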


I hear this sometimes but have never seen anyone giving proper examples of it. Which queries were answered better before, or by other engines? It is always vague.

I think I agree with the complaints about this study. Unique doesn't mean personalized. IP geo-location, ISP... Don't forget browser and OS versions... there are lots of things you can sample over without actually representing any info leak from your logged in session.

Someone searching "at the same time" could technically be different times (remember time zones!) for the purposes of the algorithm (is the search during "work hours" or not, etc).

Users checking results on mobile phones compared to desktops... Without more details of how they controlled for these factors, the conclusion doesn't really follow.

Edit: I think to really measure the conclusion, they'd need 87 people in the SAME geo location, perhaps on fresh out of the box devices. That would be the best way to create the "placebo" group for their test, which they don't seem to have done.


containers

I'm the last one to say you shouldn't scrutinize what Google does, but this is complete non-news.

In the beginning, I was suspecting actively used logged-out cookies like Facebook infamously uses for example (try it, they're showing your face and keep tracking you all over the web). Reading on about differing search results in private mode, I was then expecting something like Google actually using IP + Fingerprint matching, which would be way more devious.

In the end, this was purely about Google showing a different page to everyone. Playing the devil's advocate, this is about the only way to escape the exploration/exploitation dilemma.

Is DuckDuckGo seriously complaining that Google is basically A/B testing everything, all the time? Because if that's the case, their data scientists should take some notes here.


Maybe it's complete non-news to you. Facts aren't simply pointed out once and ubiquitously committed to the public consciousness. There's nothing wrong with a healthy reminder.

Well, this has been public for the past 9 years...

https://googleblog.blogspot.com/2009/12/personalized-search-...


God forbid somebody joined the internet or became aware of privacy issues in the past 9 years.

Am I missing something? Google gives me the creeps as much as it does to any sane person, but if I understand correctly, DDG is comparing just the variation of results on each page. This doesn't mean that you're still in the (same) bubble when you log out.

How about comparing the logged-in data with logged-out / private tab data? Did they find these two sets related? If not, G could be just implementing some sort of A/B testing on grand scale (learning from clicks and making search algorithm better).


I always assumed they would, just like Youtube 'recommends' videos for you whether you're logged in or not. This is one of the reasons why I don't use Google / Bing / etc directly anymore, instead I use them through a meta-search engine (using a local instance of Searx [1] with some extra code to have it search my local content as well [2]).

[1] https://github.com/asciimoo/searx

[2] https://github.com/asciimoo/searx/pull/1257


The study’s results seem to be that users often get unique results. That’s not the same as “personalized”, and it certainly isn’t evidence of “bias” as the spreadprivacy.org-link suggests.

A good faith interpretation would point to google running learning algorithms on their results. That would also seem to be a far better explanation for Google changing parts of the page layout, such as the position of news and video results.

The use of the term “bias” for describing differences in search results also trips my conspiracy theory detectors.


I think this should be considered the original source instead?

https://spreadprivacy.com/google-filter-bubble-study/


So much Vox media blog spam on HN - yesterday their Tumblr story (wrapping Tumblr's official announcement and adding very little) hit front page, and the official announcement (posted earlier and modded up first) was marked as dupe.

I'm not sure I'm comfortable with a critique written by an economic competitor (DuckDuckGo) that obscures its identity and doesn't disclose its conflict of interest.

Well it does say that it's the DuckDuckGo Blog, and it has the DuckDuckGo logo at the top. That's not very obscure.

https://spreadprivacy.com/google-filter-bubble-study/


Well I mean what's wrong with the study itself?

If you think DDG might have biased the structure or even falsified results, fine - make that claim. But otherwise this shouldn't really matter. Otherwise we're going to go down a rabbit hole of arguing on subjective matters, like why trust DDG, rather than pointing out objectively what's wrong with the research.


How exactly is DuckDuckGo’s identity “obscured”? There’s the DDG duck up in the corner for heaven’s sake.

Here's where Google (barely) exposes an option called "Signed-out search activity" retention: https://www.google.com/history/privacyadvisor/search

Make sure to access it while not signed in to Google, but using a browser mode which persists cookies (i.e., not incognito mode). The actual control is https://www.google.com/history/optout

Here's the equivalent for YouTube signed-out watch and search history: https://www.youtube.com/feed/history . Click "Clear All Watch History," then click "Pause Watch History," then choose "Search history" and repeat those 2 steps again.

Do all of this from each device you use Google or YouTube from.


Here’s Google’s search liaison clarifying things 2 hours ago on Twitter. Localization shouldn’t be confused with personalization. Disclaimer: I work at Google, but not in search

https://twitter.com/searchliaison/status/1070027261376491520...


I can buy the "eventual consistency" argument, and maaaybe even the desktop/mobile argument, but location and language are part of what people mean by "personalization" (even if not technically correct), and frankly, they're the most annoying part.

I suppose that Google wants to optimize for people typing "football" in their search engine, at the expense of those trying to type "football results $countryname", or something more precise. But this seems ultra-annoying for anything more substantial than random trivia searches.

As for language, well, years ago I set my default search engine in Chrome to explicit "https://www.google.com/search?hl=en&q=%s", because language optimization made 80+% of my searches return wrong answers. I admit I might be in a minority here, insofar as English-speaking people with work to do are a minority.
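For anyone who wants to script that instead of setting it in Chrome, here is a minimal sketch that builds the same kind of URL. The `hl` and `q` parameters come from the URL quoted above; the function name `google_search_url` is made up for illustration, and this only pins the interface language, it makes no claim about what else Google varies.

```python
from urllib.parse import urlencode


def google_search_url(query: str, language: str = "en") -> str:
    """Build a search URL with an explicit interface language (hl)."""
    # Parameter order mirrors the URL in the comment: hl first, then q.
    params = {"hl": language, "q": query}
    # urlencode escapes spaces as "+" and percent-encodes reserved characters.
    return "https://www.google.com/search?" + urlencode(params)


url = google_search_url("football results")
print(url)
```

Passing a different `language` value swaps the `hl` parameter without touching the query.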


> location and language are... the most annoying part

This is a constant frustration - I fight with this feature every time I search for info about some place I'm not, for instance to plan travel. Searching with a country/state/zip alongside the query helps, but it's a kludge that's much worse than actual location-specificity, especially if one is going to an ambiguous place like Portland or Washington. I'd very much prefer it if Google offered the same sort of mechanics as weather sites: defaulting to my location, then offering a little box saying "search as if I'm located in:". Instead, the only option is "use precise location", which is no more helpful than basing location on my IP and history.

In this sense, it's the same standard-uses-only paternalism that plagues Android. Maps on desktop has a great feature of "leave at" or "arrive by", which was removed from mobile in favor of a 'helpful' integration that can notify you about when to leave. It's terrible for anyone who wants to check a route for odd travel hours, or even do something as simple as decide when to set an alarm based on likely morning traffic. And so, of course, people have been requesting the more flexible option for years while consistently being told that the existing version is "more helpful".

Increasingly, this stuff doesn't just bother me on a privacy or information-neutrality level. It's actively damaging, to the point where DuckDuckGo is actually more usable for certain types of search.


> It's terrible for anyone who wants to check a route for odd travel hours, or even do something as simple as decide when to set an alarm based on likely morning traffic.

Yup. It's a great example of what I complain about when I say that software is being dumbed down and loses its usefulness. As another recent thread[0] pointed out, the handling of street labels is ridiculously broken. It makes Google Maps useless as a map - i.e. as a tool I could use to look around the place and find my bearings, or plan a route in advance, or scout the traffic across many roads without putting in a destination.

I too am bothered by this more than by privacy issues, to be honest. The latter are maybe more important, but the former are causing me frustration daily.

-

[0] - https://news.ycombinator.com/item?id=18358902


> Maps on desktop has a great feature of "leave at" or "arrive by", which was removed from mobile in favor of a 'helpful' integration that can notify you about when to leave.

I use this feature all the time and it's still there on the mobile versions, as far as I know. Ask for directions to a place, you'll see a list of routes, above the routes is a dark blue bar saying "depart at HH:MM". Touch that and it will let you select departure time, arrival time, or the last train.

Maybe this feature has moved around but it’s definitely there on mobile. If you touch an individual route it will take you to a different screen which doesn’t have this button, so go back to the list of routes and it will be there.


I believe the feature may only be missing on Android? At least, that's where I found the forum discussions with Google support stating it had been removed.

At any rate, I just tried to enter a destination from the main screen, hit directions, and provide a source. I also tried hitting 'Go' and then entering the destination, and tried each with "my location" and a specific address. In every case, I get several routes. The shortest is already highlighted, they all list times if I leave now, and there's a top bar asking if I want to drive, walk, etc. Touching 'back' returns me to the destination, with no departure source, so there's no list of routes which doesn't have one preselected.

From there, all I can do is pick a route and start, which begins navigation. None of the available menus before or after starting let me set a "depart at" time.

If you're willing, could you tell me where our flows differed? I'm always curious about this sort of fragmentation, or if this is available I'd love to know where.

(Poking around, I did just discover "remember monthly driving stats", which is new since I last opened that menu and on by default. So thank you for letting me turn that off - and another way in which I'm irritated by Maps.)


That clarification is appreciated, but it seems to contradict or ignore several of the results listed?

In particular, DuckDuckGo attempted to check for the effects of fresh browser windows, changes from localization, and A/B testing or incomplete rollouts.

- The study results found that the 'distance' between two people's incognito results was 2.8 times larger than the distance between one person's normal and incognito results. If this is correct, the tweet's claim that you can use Incognito to see the impact of personalization for yourself is seriously misleading.

- The study attempted to control for localization by tokenizing all local results, so that two result sets which each had a local story in the same spot would be treated as identical.

- The study attempted to control for rollout and testing effects by running all searches at the same time and assuming that those forces would lead to most users seeing similar results, with a few differing. Instead, they found substantial variance across all users.
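To make the 'distance' and tokenizing points concrete, here is a rough sketch of how such a comparison could work. This is not the study's code: the exact metric isn't given in this thread, so plain Levenshtein edit distance over result lists is swapped in, and the domain lists, the `local_domains` set and the `<LOCAL>` placeholder are all invented for illustration.

```python
def tokenize_local(results, local_domains):
    """Replace every local result with one shared placeholder token."""
    return ["<LOCAL>" if r in local_domains else r for r in results]


def edit_distance(a, b):
    """Levenshtein distance between two result lists (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical result lists for two users in different cities.
user_a = ["wikipedia.org", "seattletimes.com", "cdc.gov"]
user_b = ["wikipedia.org", "chicagotribune.com", "cdc.gov"]
local_domains = {"seattletimes.com", "chicagotribune.com"}

raw = edit_distance(user_a, user_b)
controlled = edit_distance(tokenize_local(user_a, local_domains),
                           tokenize_local(user_b, local_domains))
print(raw, controlled)
```

With this kind of control, two lists that differ only by which local outlet sits in a given slot compare as identical, while reordering or swapping non-local results still counts as distance.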

I can certainly come up with explanations for how each of these changes could be non-personalized. Incognito variance could be a consequence of region and device effects which are user-independent. Tests might be scaled up near 50/50, and rollouts may not hit all datacenters at once, invalidating the "changes for a few users" assumption. And most significantly, DuckDuckGo's localization control looks completely inadequate to me. It would completely fail the given "football" example, and might fail for the given search terms: 'vaccine' produced highly-localized results they adjusted for, but 'gun control' could produce pseudo-local results like a preference for national stories with the user's state as a keyword, and that wouldn't have been identified.

A better version of this study would presumably try to alter its variables separately, for instance by using multiple devices in one location and one device in multiple locations via VPN. As is, the controls seem seriously lacking.

But I'm just making up reasons, and frankly they don't seem likely to explain the sheer size of the differences. When I run those searches, I don't get "football" and "Paris" level customization, I get a bunch of national-scope results that have no obvious reason to vary between incognito users. I wish the tweets here had touched on any of that. As is, they explain general search mechanics while frustratingly bypassing the most significant claims.


In comparing two people's incognito results, you have no idea if they hit the same search clusters. Were they in the same room? Same ISP? Same geographical region? Same continent? If I issue the same search back to back in different windows, I have a good chance of hitting the same cluster. But I still have no guarantee. Many years ago, SEO people figured that some Google IPs would serve new versions of the ranking algorithms earlier than the rest. Even those were not stable and definitely not a guarantee, given how GSLB works: https://searchengineland.com/google-caffeine-now-live-on-one...

I have no clue how DDG works, but I am familiar with the Google search infrastructure as of several years ago and if you told me that the same query should always return the same identical results, I would tell you that you just don't know how Google works, even if you ignore personal results. That made monitoring and catching regressions not fun. It's better than running searches days or weeks apart, but running queries at the same time as they did does not guarantee anything, sorry.

Keep also in mind that the average query goes through thousands of machines, according to public statements. Behavior at the long tail can differ for obvious reasons. From a Jeff Dean talk, https://static.googleusercontent.com/media/research.google.c...

| Some issues:

| –Variance: query touches 1000s of machines, not dozens

| • e.g. randomized cron jobs caused us trouble for a while

| –Availability: 1 or few replicas of each doc’s index data

| • Availability of index data when machine failed (esp for important docs): replicate important docs

(see also the slides about Universal search, as well as patterns such as the elastic systems and the different tiers)


Frankly, everyone always misunderstands and underestimates the impact of location. Location affects all queries, not just "local" queries.

That's literally what is being demonstrated here.


Yes, and I responded quite directly to that.

DuckDuckGo's control definitely wasn't adequate, but in return Google's examples ('football' and 'Paris') have unambiguous localization value that's not present for two different Americans searching "immigration".

Quite a lot of the variance was link reordering and result changes that didn't have any clear regional aspects, whether at the domain or story level. If localization is the active factor, it's still pretty interesting to know that where I live determines whether to show a Wikipedia link and how to order HuffPo against the Tribune.


You just have to find out the one unique website a particular user always goes to and you can identify that the person is using a computer.

Or it is 20 bits of entropy when you go to a website only a thousand other people in the world visit.
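As a back-of-the-envelope check on that figure: the information gained from learning that someone belongs to a group is log2(population / group size). The population numbers below are assumptions picked for illustration, not anything stated in this thread.

```python
import math


def identifying_bits(population: int, matching: int) -> float:
    """Bits of identifying information from learning that a user is one of
    `matching` people out of `population` (i.e. the surprisal)."""
    return math.log2(population / matching)


# Assumed populations: about one billion users, and about 7.6 billion people.
bits_one_billion = identifying_bits(1_000_000_000, 1_000)
bits_world = identifying_bits(7_600_000_000, 1_000)
print(round(bits_one_billion, 1), round(bits_world, 1))
```

So "20 bits" is about right if the pool is a billion people, and somewhat more if the pool is everyone on the planet.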


It sounds like they found a lot of variation but not that the differences are biased in any particular direction? Could this be random?

The use of "filter bubble" doesn't seem justified if it looks like random variation.


Being hooked on Rust the programming language and Rust the game at the same time has been interesting Google-wise. I used to take it for granted that I'd get Rust programming results. And now Google seems really confused. And I'm not getting good results for either. "Personalized search matters". And also. Fuck privacy invasion.

I've observed this a lot when working on SEO for my webpage. I'll be like, "Cool, I'm the top result for my name!" And, yes, this will be true for people searching from Berkeley, where I live. But if I go to an IP address over in SF, I'm not even on the front page.

This is honestly the worst thing ever, I've started noticing this too while testing SEO for different projects.

What if google becomes like Netflix? Only shows you results you expect, honestly a progression towards that has already rendered google search quite useless for most of my searching. I prefer searching HN on Algolia, Reddit, Medium and other websites (not at the top of my head) to find unexpected resources that I expect google "search engine" to give me.


Why is a study required to know this? Google, surely, should make this known in their privacy policy?

They literally announced it on their blog in 2009.

https://googleblog.blogspot.com/2009/12/personalized-search-...

I'm a DDG fan, but they doth protest too much sometimes.


That 2009 announcement makes several very specific promises which conflict with these results. It promised signed-out customization based on prior search history (which Google has confirmed Incognito doesn't use), with a notification and option to disable that customization.

This result finds substantial variance with no notification and no toggle, which extends almost unchanged into Incognito. I think Occam's Razor says this is more about localization and device type than fingerprinting, but it's definitely not just the thing they announced in 2009.


Maybe that explains why Google gave me 1st-page results about "shrooms" a few days ago while my search term was specifically asking for "Champignons" and "pizza" and whether to pre-boil or use them raw.

I'm pretty certain, based on experience in my household, that Google's display network, Facebook, and Amazon are all doing some sort of targeting outside of cookies/pixels. My assumption is that it's based on your IP address.

Youtube does that as well, and it has for a long time I guess, since I have been consistently able to reproduce the following:

1. open Youtube's home page in your main browser, while logged out and after clearing cookies

2. open the same page in a "virgin" browser (e.g. a newly created VM or even just using an incognito window)

observe that 1 has some amount of "personalisation".

When I saw this the first time I was baffled, so I did some research and found out that it's because of local storage. As per step 1, I was clearing just the cookies but not local storage.

Lesson learned: don't just clear cookies, remember to clear local storage as well


Localization != personalization.

I agree, they are completely orthogonal features. The only personalization that's happening is inside the new session that the person creates when using incognito mode, which is understandable.

What's been a shame is that there is still no open source Search Engine despite this being a "solved" problem. There isn't even something like a docker image that you throw at your cluster to get faster and faster search results. That's the real shame.

We should've commoditized the core of the search engine by now, with programmatic and API access as commonplace, and yet here we are where search engine software is still dominated by proprietary services.


I’m guessing that it’s impractical to self-host something big enough to be useful.

I don't believe that is a good enough limitation in this day and age of docker and kubernetes.

You take the search engine docker image and keep scaling the number of nodes that are running it in the cluster till your search result time is fast enough for you.


I am sure this already exists even as I type it, but this sounds like a great job for a distributed system running a federated search protocol.

The hard part is already solved, you don't even have to crawl the web to build the index. There is already a periodically refreshed index of the web that you can download: commoncrawl.org

Now someone just needs to configure Apache Lucene as a proper docker image that can consume this index.
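For a sense of what the indexing core involves, here is a toy inverted index, which is the kind of data structure Lucene is built around. To be clear, this is neither Lucene's API nor Common Crawl's data format. The documents are made up, and ranking, tokenization beyond whitespace splitting, and distribution across nodes are all left out.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased term to the set of URLs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in docs.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return URLs containing every query term (boolean AND, no ranking)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical pages standing in for crawled text.
docs = {
    "https://example.org/a": "rust programming language tutorial",
    "https://example.org/b": "rust game server setup",
    "https://example.org/c": "python programming tutorial",
}
index = build_index(docs)
hits = search(index, "rust programming")
print(hits)
```

The lookup itself is the easy part. Ranking, freshness and serving it at web scale are where the real cost sits.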


To me, constantly complaining about sites/companies doing this stuff or wanting laws to make them stop... It just feels silly and pointless.

In the long term, companies will gather and use what data they have access to. Companies will tailor their UI, product, etc. in order to keep that data flowing. A ruleset based on permission and consent is not practical, unless the goal is "better paperwork."

The solution (imho) has to come from browser software or w3c. The browser should control permissions, in broadly the same way mobile OSs/appstore control permissions and login state.

ATM Google de-anonymizes you. This should just be impossible, unless the browser tells it who you are.

^I know gdpr is popular with a lot of people here. I think it has some good parts, but I disagree with other parts. We can still be friends and disagree :)


> To me, constantly complaining about sites/companies doing this stuff or wanting laws to make them stop... It just feels silly and pointless. ... companies will gather and use what data they have access to

Yet every time there's an equifax or quora data breach at the top of the front page the consensus is almost always that companies should get less data, should keep it for less time, and end users should push back against the constant data grab.

How do we square that circle?

How do we avoid FaceGooBook when it has its fingers everywhere? Even if you avoid their front end services there's countless sites using FaceGooBook for logins, captcha, fonts, frameworks that can't simply be uBlocked away without breaking something.

Are we meant to simply trust FaceGooBook that data from those sources isn't added to the others?

Without the pushback of things like GDPR (I think it has parts that won't go remotely far enough - mainly national budgets for ICO enforcement), end users seem to have neared a point of "tough, you lost".


Safety/Quality regulations reduce choices and quantity. This hits the poor the most, as they can no longer buy inferior but cheap products and services. See China.

Instead regulations should focus on reducing scams and misleading offerings. If someone wants to buy/use despite knowing the risks, let him.


> Safety/Quality regulations reduce choices and quantity.

They also prevent people dying when they plug in their appliance, or dying because there's lead paint on the product, or buying flour that's adulterated with alum, plaster of Paris or chalk.

The history of food and electrical regulations, on both sides of the Atlantic, is enough to convince me that rich and poor (I have been both) are better served by enough regulation to ensure basic standards are met. In the case of the US and food, prior to such regulation, adulteration was more common than not[0].

> If someone wants to buy/use despite knowing the risks, let him.

This is never the case. The risks are hidden, the product masquerades as a genuine iPhone charger, or contains unsafe substances that cannot be known without laboratory testing. Data is taken "to provide a better service", without mention of the 206 places it's sold to, or other uses for which it is mined, or the fun psychological experiments staff might run on their users.

To relate it back to the original discussion about data, I am fully in favour of regulations that demand adequate safeguards and protections of personal data, and high expectations of diligence from companies that must use such data. It goes without saying that I am in favour of severe penalties for egregious breach of such regulations.

[0] https://www.abc.net.au/news/2018-11-04/harvey-wiley-us-chemi...


... self reporting data leaks is "the good part" of gdpr that I mentioned in my caveat. :) I think that's important, effective and unbureaucratic.

The attempt to increase formal contract making between website and user... that's the part I find frustrating.


This isn't necessarily de-anonymizing, it is just taking your recent history into account. From 2009:

For example, since I always search for [recipes] and often click on results from epicurious.com, Google might rank epicurious.com higher on the results page the next time I look for recipes. Other times, when I'm looking for news about Cornell University's sports teams, I search for [big red]. Because I frequently click on www.cornellbigred.com, Google might show me this result first, instead of the Big Red soda company or others.

Previously, we only offered Personalized Search for signed-in users, and only when they had Web History enabled on their Google Accounts. What we're doing today is expanding Personalized Search so that we can provide it to signed-out users as well. This addition enables us to customize search results for you based upon 180 days of search activity linked to an anonymous cookie in your browser. It's completely separate from your Google Account and Web History (which are only available to signed-in users).


> To me, constantly complaining about sites/companies doing this stuff or wanting laws to make them stop... It just feels silly and pointless.

A significant percentage of HN content is posts about "facefriend does this with your data" or "google does that with your data." I suspect this might be why it feels especially silly and pointless, because currently the noise to signal ratio is quite high if you spend any time on HN. For the vast majority of the rest of the population, it's all noise and no signal. So here we are with one group that works "in the industry" that is largely desensitized to it, and another group that doesn't care, or know they should care.


> wanting laws to make them stop... It just feels silly and pointless.

> The solution (imho) has to come from browser software or w3c.

Generally in society when there is a natural motive for one party to wrong another, we use law to mitigate it. Technical solutions make sense, as there are always bad actors; people still lock their doors to deter burglars. But we don't expect to need to wear a suit of armour outside our houses, because highly motivated assailants act undeterred.

Saying the solution rests entirely on the client side invites a kind of arms race, where wrongdoing is fair game.


I'm not saying wrongdoing is fair game (obviously) or that laws are pointless. They just don't work for everything.

Laws' side effects can be as important as the straightforward intention. For example, the gdpr reporting requirement for data leaks is a good idea. It's nice that people can know data about them has leaked, but the bigger reason is the side effect... the transparency will improve security.

The side effect of the consent/permission requirement is an army of compliance lawyers and vendors. They make newspapers compliant by tweaking paperwork, popups and such... not by actually doing anything that improves privacy.


> They make newspapers compliant [...]

That's debatable. Almost nothing actually meets the GDPR's requirements for freely given consent (that non-technically required consent can't be tied to / traded for the service, and is opt in; no pre-ticked boxes.)


>The solution (imho) has to come from browser software or w3c

What is this “w3c”? I thought Web standards are written by browser vendors like Google, Apple, and ummm... there are some others too but I don’t remember their names.



My search results have been personalized for years without being logged in. I didn't realize this was surprising.

It's a little more shocking when they can identify 3 different users on 1 pc behind one ip in roughly 1 search.

Seriously, freaks me out. I search on youtube for airplanes on the computer tv and suddenly a ton of music I like pops up. Kid searches for some kids show, and many more pop up. Wife searches for reviews or some shitty trash tv shows, and she gets all her interests recommended next. They know us on one computer on one ip without being logged in.


Study finds in 2018. I found out google was 'listening' when I got the same youtube recommendations on my friend's laptop in 2012, when I wasn't signed in, that I get on mine. Btw our youtube habits are different, given that his playstation had different recommendations.

So if I search for the value of PI while visiting Indiana it could show "3?"

we have known this and they state it publicly... who funded this?

Well .. they certainly do a shitty job at personalizing my results. These days I can never find what I want on Google. I've resorted to using other search tools (github search, reddit search, stack overflow search, ddg, etc)

I feel like this has happened everywhere. YouTube included.

Same experience. Oh well. All good things come to an end. I wonder what happened?

Safari 12 is impressive when it comes to preventing browser fingerprinting and cross-site tracking. I wish it was cross platform. Not sure though how much it can do against Google.

New study? You just have to open two different browsers and use them for a couple of days and it's quite obvious. Question is, what's the problem with this?

my own private Idaho['google']

Even if I use an addon that deletes my cookies from Google every time I close the tab?

Do you switch IPs, browsers, computers, etc every time you close the tab too? If not, Google tracks you.

Even if you did all that Google will track you. Whether they match up all your identities or not is another story.

Bring a laptop from your home back to your folks place and I bet they can make the connection...

No doubt whatsoever. A lot of those "Facebook is listening to my conversation" claims come from the fact that they'll associate people based on network locations, IPs, etc.

For example: You're at your friend's house with the Facebook app, your friend clicks on an ad for socks. The next day you start seeing ads for socks despite not having any sock-related activity. This is just an oversimplified example.


Google has more than cookies deployed to get after you.

There are a myriad number of tracking vectors they can use. (And to be fair, so can any of the other companies out there tracking everything.)

Deleting cookies is certainly a start, but it's not sufficient to maintain privacy.


i know there's a bunch of ways to track people without cookies (supercookies, browser fingerprinting, etc.), but is there any evidence that they're being deployed by google or any of the big western advertising companies?

I can say for certain that I worked for a company that wasn't Google and employed plenty of ways to track people that didn't involve cookies.

We used it mostly to try and track malicious users (banned users, etc.) but it's still tracking nonetheless. It was reasonably effective.

I'm sure a company of Google's size and sophistication has many more advanced methods than we did.


Nobody is questioning their ability. It’s their willingness that’s relevant, and considering they are liable for a far more damaging PR nightmare than almost any other company, it’s not out of the question that they might not actually use every trick in their arsenal.

This would all be fine if it were opt-in but all these companies who insist on excessive reliance on algorithms just end up making their service worse.

Strange as it is, I was just researching whether it was possible to search Google truly anonymously - stumbled upon a Firefox add-on called Searchonymous [0]. Planning on trying it out today.

If all else fails, can always use another search engine ;)

[0] https://addons.mozilla.org/en-US/firefox/addon/searchonymous...


Is anybody the least bit surprised? It’s been obvious forever that google has absolutely zero respect for privacy.

Yes, this can be rationalized as ‘improving search results’, and indeed the results may be better.

But, it’s also building a personal profile without consent, or indeed with implied lack of consent.

If google cared about privacy, they would simply offer people the option not to be tracked, and respect it.

They don’t.



