"Today we're helping people get better search results by extending Personalized Search to signed-out users worldwide (...)"
In the 2009 post, Google said:
> ...customize search results for you based upon 180 days of search activity linked to an anonymous cookie in your browser... You'll know when we customize results because a "View customizations" link will appear... Clicking the link will let you... turn off this type of customization
None of that appears to match the study results here. Incognito mode isn't supposed to pick up session cookies from normal browsing, no customization notice appears, and an option to turn the behavior off certainly isn't provided.
There should be zero expectation of privacy when using a Google service. I've switched to searx.me, which wraps Google and (optionally) a bunch of other search engines.
You’re being really flippant in dismissing this as speculation, which is a total red herring. The point is they’re not honest.
Using a minority browser with a lot of stuff blocked makes you very unique.
It's good for blocking 3rd party cookies but you stand out like a sore thumb to Google.
Canvas fingerprinting is highly concerning, to say the least. Thanks in advance to anyone with working solutions.
Since that's a client-side mechanism, it should be rather simple to find out, no? (e.g., an addon that instruments the DOM accessors necessary to get at the canvas)
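Something like this could work as a starting point: a minimal sketch of a content script that wraps the canvas read methods and flags callers. The console warning stands in for whatever reporting a real addon would do.

```typescript
// Minimal sketch: wrap the APIs a fingerprinter must call to read pixels
// back out of a canvas, and log every caller. A real addon would report
// this to its background page instead of just warning.
function wrapCanvasRead(proto: object, name: string): void {
  const original = (proto as any)[name] as Function;
  (proto as any)[name] = function (...args: unknown[]) {
    console.warn(`canvas read via ${name}() - possible fingerprinting`, new Error().stack);
    return original.apply(this, args);
  };
}

wrapCanvasRead(HTMLCanvasElement.prototype, "toDataURL");
wrapCanvasRead(HTMLCanvasElement.prototype, "toBlob");
wrapCanvasRead(CanvasRenderingContext2D.prototype, "getImageData");
```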
+ People generally tend to miss the point that Incognito doesn't prevent sharing the user's IP.
+ I think DuckDuckGo's study missed out on testing a VPN in their analysis, i.e., Signed-In vs. Incognito vs. (Incognito + VPN).
One thing that caught my eye was Google's response about Incognito:
> The company did confirm that it does not personalize results for incognito searches using signed-in search history, and it also confirmed that it does not personalize results for the Top Stories row or the News tab in search.
Since it's a corporate reply, the standard question is what's not present: a statement that Incognito isn't personalized, or isn't personalized beyond device type and location. Perhaps I'm too cynical, but "we don't personalize using X" parses as "we do personalize in other ways".
To me this sounds reasonable. A very large number of searches are locality based, and it is entirely reasonable to localize them based on IP address (and - as you note - the device type).
It's also reasonable to customize based on recent (session based) search history (refinements, spelling corrections, etc).
The difference between this and personalization seems mostly about semantics IMHO.
I wish this was trivial to disable. I regularly search for things where I want the global result, and instead get weird local results that I don't care about. It's much easier to narrow a global search to a local one by adding an appropriate region name to the search than it is to expand a local search to a global one via search terms.
But broadly, I see three bases for objecting to these changes.
First is the lack of user control. Like many other people in this thread I often want to turn off or 'rehome' localization, not just for weird developer use cases but for obvious stuff like "I'm about to travel and want results for that location". Disabling session-based changes is a rarer desire, but comes up sometimes when a correction or topic change is interpreted as a refinement that's biasing results. Fortunately, resetting Incognito should manage that. (I've never actually wanted to bypass device type adjustments except for dev work.)
Second is inadvertent bubbles. It's easy to imagine content-neutral rules like "show fast and mobile-friendly pages to smartphones" correlating with a meaningful content difference, and the same for location. Hard to really blame Google here, but again it'd be really nice to have the option of a "stop helping" setting.
Third is Google-driven bubbles. Some of the DuckDuckGo examples showed effects like national newspaper articles on a search for 'immigration' getting reordered, or pushed above or below non-news sources. (We can't know whether that was caused by location or device type, but let's consider the case where it was.) That doesn't look like basic localization; it looks like non-local results being adjusted based on user location.
This wouldn't have to be anything purposeful; if you add location into your training set and reinforce on the usual 'success' metrics (e.g. first result clicked, final result clicked), you could easily learn that people in NYC and Houston have different behavior patterns and display accordingly. It's open to debate whether this is a bad thing, but it's certainly not what most people (including the Googler who responded to the article) mean when they say "localization".
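A sketch of how that happens, purely illustrative (no real system or field names implied): location becomes a ranking signal simply by being one more feature column fed to a learned ranker.

```typescript
// Illustrative only: a click log row and the feature vector a ranker might
// be trained on. Nothing here names a real system.
interface ClickLogRow {
  query: string;
  resultUrl: string;
  userMetro: string; // e.g. "NYC" or "Houston", derived from IP
  clicked: boolean;  // the "success" metric being reinforced
}

// Any systematic difference in what NYC vs. Houston users click becomes a
// learned, metro-dependent reordering -- nobody decided to "localize".
function toFeatureVector(row: ClickLogRow): number[] {
  return [
    hash(row.query) % 1000,     // query bucket
    hash(row.resultUrl) % 1000, // result bucket
    hash(row.userMetro) % 100,  // location bucket: the quiet culprit
  ];
}

function hash(s: string): number {
  let h = 0;
  for (const c of s) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h;
}
```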
IMO the issue is not google using IP-based location info. The issue (if there really is one) is people assuming/believing the internet hides their location.
Is there? I was under the impression that Incognito and its cousins generally still accept and preserve cookies for the duration of the temporary session. This means that for this purpose, there isn't really a difference.
In this study, my understanding is that result personalization carried over from a normal browsing session into the clean, Incognito session, likely due to IP correlation or possibly through User-Agent strings. So while Incognito has its own context that is wiped once the session has ended, the result personalization didn't need anything saved in the browser to recognize who you are.
Try this: https://panopticlick.eff.org/
Specifically, check out the "fingerprinting" details.
In a normal (not Incognito) browser window, you don't have to be logged into Google for Google to read Google cookies. Logging out doesn't make you anonymous; they still know who you are.
So it comes as a bit of a surprise that it isn't as light as they have been saying recently.
One idea was to build a search engine that returns unpersonalized results. He talked about how Google will be moving into a "it's true, if it's true for you" kind of world. His idea was that it will open new opportunities. I think DuckDuckGo is one example, and they've grown and are doing pretty well. I think a lot of it comes as a reaction to Google, Facebook and other such things.
"It's true, if it's true for you" is also a great phrase worth remembering. It describes so much about the current world and where things are headed, and why some things seemed to have gone off rails.
Has anyone written anything for DuckDuckGo?
DuckDuckGo just put out a post criticizing Google for personalizing search results. They're a small company without many resources, and privacy has been their primary selling point from the get-go. Yes, DuckDuckGo could also be personalizing their own results, but if they were and people found out, there's risk of a significant outcry. Until I see evidence to the contrary, I'm inclined to assume good faith.
See also: discussions around "but how do we know for sure that Apple has better data privacy than Google?"
If you assume all organizations are equally ready to ignore user privacy, what do you use? Do you program your own everything or do you only use FSF-approved software whose source code you have personally examined and compiled?
It's all about degrees.
So your "there basically is no proof" is both wrong and a red herring.
This does more to encourage me to switch to some other search engine than any privacy concern does.
Practically speaking, many non-technical users _do_ want to formulate their query in an imprecise, sloppy manner and have the search engine figure it out.
Programmers are capable of and accustomed to putting a lot of care and precision into exactly how they communicate (especially when communicating with a computer), but regular people don't usually do that. They may not even be capable of it.
The only solution I can see is for a search engine to support both styles of communication. It could either learn/guess (on a per-person or per-query basis) or just let you tell it how to behave.
But yes, Google web search could definitely use improvement here. There are times when there are VERY obvious clues that I'm being precise, and Google totally misses it. For example, my printer is a DCP-L2550DW. If I search for "DCP-L2550DW margins", it will include results that don't have "margins". I could have just typed the model number, but I went out of my way to keep typing. If that's not a strong enough signal that I definitely want results related to margins, I don't know what is.
I think this is because, more often than not, people find the top result useful regardless, which just reinforces that link.
But it's super annoying to find you've just waited for a 300 MB download of the wrong driver.
In so many cases slightly different product numbers are equivalent and differ in things like the market they are for, or the color, or something else you may or may not care about.
It seems to me that hiding these results would require Google both to read minds and to decode spec sheets to see if they match what you had in mind... all from a single string of semi-random characters.
This doesn’t always work exactly but in general if you are looking for exact matches for certain terms, put those terms in quotes.
This is what finally put me on using DDG.
I think what I've noticed with technical searches is that Google tries too hard to "predict" what I am looking for. It always ends up showing me irrelevant (but surely highly ranked) fluff related to prior searches. The thing is, I don't need to see something related to my prior searches. I need to see what I am looking for TODAY. For me the "personalization" is often actively harmful.
> The only solution I can see is for a search engine to support both styles of communication.
Isn't this exactly what you're asking for? Non-technical users can be imprecise and let Google figure it out, while technical or power users can get more precise results with the various search operators?
This seemed to be true at one time, but I noticed last night that it doesn't always work.
These days it seems to include results without the word, then provide a tiny "must include" link below each deficient result that reruns the search with the missing word required.
That used to be the behavior, but now it's inconsistent.
There are also other results that include ‘monkey’, which is not what I asked for.
It’s infuriating. I just want to grep the internet. Is that too much to ask?
In your second example, the word "monkeys" does appear on the page, in the "Show reviews that mention" section.
More importantly, did you have a result in mind for this query that isn't showing up? It seems somewhat nonsensical.
The Pinterest page perhaps had monkeys at some point, but doesn’t now.
This isn’t a real search, I just added something whimsical (and I miss San Diego burritos). This is behavior I remember but don’t typically track so it’s hard to remember on command.
Note you can use search tools | verbatim and it will return pages with all terms. The Pinterest page is still there, but the TripAdvisor page is missing.
1. It's tiring to type quotes around every word every time.
2. It takes you to the other extreme. Just because I'm being more precise than Google thinks doesn't mean that if I have 8 words in my query, all of them are mandatory. There is potential for a useful middle ground that quotes don't get me to.
Edit: Actually you could help. Show me where I set "Verbatim" as a permanent option :)
It would be great if Google had some kind of, let’s call it “Code Search” feature that searched all open source repositories.
Come on now...
Seems like dark SEO is winning.
Or maybe I've just grown numb to their presence.
Edit because I can't reply any more: As dsfyu404ed says below, it tends to drop the central term of the query, making the results 100% useless. Not that they're particularly useful even if it doesn't do that...
It's particularly infuriating when it drops the location word from a search query for info on law and regulation.
I don't care if broad form automotive insurance is available in some non-specific state, I care about if it's available in the state that I put in the query.
Edit: Why is this downvoted?
It's lunch hour at Google, and people there are hitting their phones and HN.
You can see a lot of interesting trends on HN by day and time of day. Watch what happens to any story critical of insert nation here after 6am in that nation.
Or notice how on Fridays (NA time), HN is pretty much a ghost town.
But then again, it doesn't seem like a simple problem, and it's one that pulls companies in either direction.
As long as I can force it with quotes, I'm happy. Google has even started suggesting next to the results "include only results with the word XXX", to remind me the quotes might be needed.
I should investigate how to write a search plugin for Firefox that "puts" "every" "word" "in" "quotes"
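A sketch of what that plugin might look like, assuming a Firefox WebExtension with an omnibox keyword declared in its manifest (all names here are illustrative, not a finished addon):

```typescript
// Sketch of a WebExtension omnibox handler that quotes every word before
// searching. Assumes a manifest.json with an "omnibox" keyword declared.
declare const browser: any; // Firefox WebExtension global

function quoteEveryWord(query: string): string {
  return query
    .trim()
    .split(/\s+/)
    .map((word) => `"${word}"`)
    .join(" ");
}

browser.omnibox.onInputEntered.addListener((text: string) => {
  const url =
    "https://www.google.com/search?q=" +
    encodeURIComponent(quoteEveryWord(text));
  browser.tabs.update({ url });
});
```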
Someone searching "at the same time" could technically be different times (remember time zones!) for the purposes of the algorithm (is the search during "work hours" or not, etc).
Users checking results on mobile phones compared to desktops... Without more details of how they controlled for these factors, the conclusion doesn't really follow.
Edit: I think to really measure the conclusion, they'd need 87 people in the SAME geo location, perhaps on fresh out of the box devices. That would be the best way to create the "placebo" group for their test, which they don't seem to have done.
In the beginning, I suspected actively-used logged-out cookies, like Facebook infamously uses (try it: they show your face and keep tracking you all over the web). Reading on about differing search results in private mode, I then expected something like Google actually using IP + fingerprint matching, which would be way more devious.
In the end, this was purely about Google showing a different page to everyone. Playing the devil's advocate, this is about the only way to escape the exploration/exploitation dilemma.
Is DuckDuckGo seriously complaining that Google is basically A/B testing everything, all the time? Because if that's the case, their data scientists should take some notes here.
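To make the devil's-advocate point concrete: even the simplest exploration strategy produces a "different page for everyone" effect. A toy epsilon-greedy rule, my example and not anything Google has disclosed:

```typescript
// Mostly serve the known-best ranking, occasionally serve a variant so the
// system keeps learning. That alone guarantees users see different pages.
function chooseRanking<T>(best: T, variants: T[], epsilon = 0.05): T {
  if (variants.length > 0 && Math.random() < epsilon) {
    return variants[Math.floor(Math.random() * variants.length)]; // explore
  }
  return best; // exploit
}
```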
How about comparing the logged-in data with logged-out / private tab data? Did they find these two sets related? If not, G could just be implementing some sort of A/B testing on a grand scale (learning from clicks and making the search algorithm better).
A good faith interpretation would point to google running learning algorithms on their results. That would also seem to be a far better explanation for Google changing parts of the page layout, such as the position of news and video results.
The use of the term “bias” for describing differences in search results also trips my conspiracy theory detectors.
If you think DDG might have biased the structure or even falsified results, fine - make that claim. But otherwise this shouldn't really matter. Otherwise we're going to go down a rabbit hole of arguing on subjective matters, like why trust DDG, rather than pointing out objectively what's wrong with the research.
Make sure to access it while not signed in to Google, but using a browser mode which persists cookies (ie, not incognito mode). The actual control is https://www.google.com/history/optout
Here's the equivalent for YouTube signed-out watch and search history: https://www.youtube.com/feed/history . Click "Clear All Watch History," then click "Pause Watch History," then switch to "Search history" and repeat those two steps.
Do all of this from each device you use Google or YouTube from.
I suppose that Google wants to optimize for people typing "football" in their search engine, at the expense of those trying to type "football results $countryname", or something more precise. But this seems ultra-annoying for anything more substantial than random trivia searches.
As for language, well, years ago I set my default search engine in Chrome to explicit "https://www.google.com/search?hl=en&q=%s", because language optimization made 80+% of my searches return wrong answers. I admit I might be in a minority here, insofar as English-speaking people with work to do are a minority.
This is a constant frustration - I fight with this feature every time I search for info about some place I'm not, for instance to plan travel. Searching with a country/state/zip alongside the query helps, but it's a kludge that's much worse than actual location-specificity, especially if one is going to an ambiguous place like Portland or Washington. I'd very much prefer it if Google offered the same sort of mechanics as weather sites: defaulting to my location, then offering a little box saying "search as if I'm located in:". Instead, the only option is "use precise location", which is no more helpful than basing location on my IP and history.
In this sense, it's the same standard-uses-only paternalism that plagues Android. Maps on desktop has a great feature of "leave at" or "arrive by", which was removed from mobile in favor of a 'helpful' integration that can notify you about when to leave. It's terrible for anyone who wants to check a route for odd travel hours, or even do something as simple as decide when to set an alarm based on likely morning traffic. And so, of course, people have been requesting the more flexible option for years while consistently being told that the existing version is "more helpful".
Increasingly, this stuff doesn't just bother me on a privacy or information-neutrality level. It's actively damaging, to the point where DuckDuckGo is actually more usable for certain types of search.
Yup. It's a great example of what I complain about when I say that software is being dumbed down and loses its usefulness. As another recent thread pointed out, the handling of street labels is ridiculously broken. It makes Google Maps useless as a map - i.e. as a tool I could use to look around the place and find my bearings, or plan a route in advance, or scout the traffic across many roads without putting in a destination.
I too am bothered by this more than by privacy issues, to be honest. The latter are maybe more important, but the former are causing me frustration daily.
 - https://news.ycombinator.com/item?id=18358902
I use this feature all the time and it's still there on the mobile versions, as far as I know. Ask for directions to a place, you'll see a list of routes, above the routes is a dark blue bar saying "depart at HH:MM". Touch that and it will let you select departure time, arrival time, or the last train.
Maybe this feature has moved around but it’s definitely there on mobile. If you touch an individual route it will take you to a different screen which doesn’t have this button, so go back to the list of routes and it will be there.
At any rate, I just tried to enter a destination from the main screen, hit directions, and provide a source. I also tried hitting 'Go' and then entering the destination, and tried each with "my location" and a specific address. In every case, I get several routes. The shortest is already highlighted, they all list times if I leave now, and there's a top bar asking if I want to drive, walk, etc. Touching 'back' returns me to the destination, with no departure source, so there's no list of routes which doesn't have one preselected.
From there, all I can do is pick a route and start, which begins navigation. None of the available menus, before or after starting, let me set a "depart at" time.
If you're willing, could you tell me where our flows differed? I'm always curious about this sort of fragmentation, or if this is available I'd love to know where.
(Poking around, I did just discover "remember monthly driving stats", which is new since I last opened that menu and on by default. So thank you for letting me turn that off - and another way in which I'm irritated by Maps.)
In particular, DuckDuckGo attempted to check for the effects of fresh browser windows, changes from localization, and A/B testing or incomplete rollouts.
- The study results found that the 'distance' between two people's incognito results was 2.8 times larger than the distance between one person's normal and incognito results. If this is correct, the tweet's claim that you can use Incognito to see the impact of personalization for yourself is seriously misleading.
- The study attempted to control for localization by tokenizing all local results, so that two result sets which each had a local story in the same spot would be treated as identical (see the sketch after this list).
- The study attempted to control for rollout and testing effects by running all searches at the same time and assuming that those forces would lead to most users seeing similar results, with a few differing. Instead, they found substantial variance across all users.
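To make that localization control concrete, here's a minimal reconstruction under my own assumptions; the isLocal classification and the position-wise distance are placeholders, not the study's actual code:

```typescript
// Replace any result judged "local" with a fixed token, so two pages that
// each show some local story in the same slot compare as identical.
interface SearchResult {
  url: string;
  isLocal: boolean; // however the study classified local content
}

function tokenize(results: SearchResult[]): string[] {
  return results.map((r) => (r.isLocal ? "<LOCAL>" : r.url));
}

// Crude distance: the number of positions where the tokenized lists differ.
function resultDistance(a: SearchResult[], b: SearchResult[]): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  const n = Math.max(ta.length, tb.length);
  let diff = 0;
  for (let i = 0; i < n; i++) {
    if (ta[i] !== tb[i]) diff++;
  }
  return diff;
}
```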
I can certainly come up with explanations for how each of these changes could be non-personalized. Incognito variance could be a consequence of region and device effects which are user-independent. Tests might be scaled up near 50/50, and rollouts may not hit all datacenters at once, invalidating the "changes for a few users" assumption. And most significantly, DuckDuckGo's localization control looks completely inadequate to me. It would completely fail the given "football" example, and might fail for the given search terms: 'vaccine' produced highly-localized results they adjusted for, but 'gun control' could produce pseudo-local results like a preference for national stories with the user's state as a keyword, and that wouldn't have been identified.
A better version of this study would presumably try to alter its variables separately, for instance by using multiple devices in one location and one device in multiple locations via VPN. As is, the controls seem seriously lacking.
But I'm just making up reasons, and frankly they don't seem likely to explain the sheer size of the differences. When I run those searches, I don't get "football" and "Paris" level customization, I get a bunch of national-scope results that have no obvious reason to vary between incognito users. I wish the tweets here had touched on any of that. As is, they explain general search mechanics while frustratingly bypassing the most significant claims.
I have no clue how DDG works, but I am familiar with the Google search infrastructure as of several years ago and if you told me that the same query should always return the same identical results, I would tell you that you just don't know how Google works, even if you ignore personal results. That made monitoring and catching regressions not fun. It's better than running searches days or weeks apart, but running queries at the same time as they did does not guarantee anything, sorry.
Keep also in mind that the average query goes through thousands of machines, according to public statements. Behavior at the long tail can differ for obvious reasons. From a Jeff Dean talk, https://static.googleusercontent.com/media/research.google.c...
| Some issues:
| –Variance: query touches 1000s of machines, not dozens
| • e.g. randomized cron jobs caused us trouble for a while
| –Availability: 1 or few replicas of each doc’s index data
| • Availability of index data when machine failed (esp for important docs): replicate important docs
(see also the slides about Universal search, as well as patterns such as the elastic systems and the different tiers)
That's literally what is being demonstrated here.
DuckDuckGo's control definitely wasn't adequate, but in return Google's examples ('football' and 'Paris') have unambiguous localization value that's not present for two different Americans searching "immigration".
Quite a lot of the variance was link reordering and result changes that didn't have any clear regional aspects, whether at the domain or story level. If localization is the active factor, it's still pretty interesting to know that where I live determines whether to show a Wikipedia link and how to order HuffPo against the Tribune.
Or it is 20 bits of entropy when you go to a website only a thousand other people in the world visit.
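Back-of-the-envelope for that figure, with my own assumption of roughly a billion users to distinguish among:

```typescript
// Narrowing ~1e9 plausible users down to the ~1,000 who visit a niche site
// reveals about log2(1e9 / 1e3) bits of identifying information.
const bits = Math.log2(1e9 / 1e3);
console.log(bits.toFixed(1)); // ≈ 19.9, i.e. about 20 bits
```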
The use of "filter bubble" doesn't seem justified if it looks like random variation.
What if Google becomes like Netflix and only shows you results you expect? Honestly, a progression towards that has already rendered Google Search quite useless for most of my searching. I prefer searching HN on Algolia, Reddit, Medium, and other sites (none come to mind off the top of my head) to find the unexpected resources I expect a Google "search engine" to give me.
I'm a DDG fan, but they doth protest too much sometimes.
This result finds substantial variance with no notification and no toggle, which extends almost unchanged into Incognito. I think Occam's Razor says this is more about localization and device type than fingerprinting, but it's definitely not just the thing they announced in 2009.
1. open Youtube's home page in your main browser, while logged out and after clearing cookies
2. open the same page in a "virgin" browser (e.g. a newly created VM or even just using an incognito window)
Observe that (1) has some amount of "personalisation".
When I saw this the first time I was baffled, so I did some research and found out that it's because of local storage. As per step 1, I was clearing just the cookies but not local storage.
Lesson learned: don't just clear cookies, remember to clear local storage as well
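For completeness, a minimal sketch of clearing both from the devtools console; these are standard Web APIs, though note that HttpOnly cookies are invisible to scripts and need the browser's own clear-site-data UI:

```typescript
// Clear client-side storage for the current origin.
localStorage.clear();
sessionStorage.clear();

// Expire each script-visible cookie (HttpOnly cookies won't appear here).
for (const cookie of document.cookie.split(";")) {
  const name = cookie.split("=")[0].trim();
  if (!name) continue;
  document.cookie = `${name}=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/`;
}
```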
We should've commoditized the core search engine by now, with programmatic and API access commonplace, and yet here we are: search engine software is still dominated by proprietary services.
You take the search engine Docker image and keep scaling the number of nodes running it in the cluster till your search result time is fast enough for you.
Now someone just needs to configure Apache Lucene as a proper Docker image that can consume this index.
In the long term, companies will gather and use what data they have access to. Companies will tailor their UI, product, etc. in order to keep that data flowing. A ruleset based on permission and consent is not practical, unless the goal is "better paperwork."
The solution (imho) has to come from browser software or w3c. The browser should control permissions, in broadly the same way mobile OSs/appstore control permissions and login state.
At the moment, Google de-anonymizes you. This should just be impossible unless the browser tells it who you are.
^I know GDPR is popular with a lot of people here. I think it has some good parts, but I disagree with other parts. We can still be friends and disagree :)
Yet every time there's an Equifax or Quora data breach at the top of the front page, the consensus is almost always that companies should get less data, should keep it for less time, and end users should push back against the constant data grab.
How do we square that circle?
How do we avoid FaceGooBook when it has its fingers everywhere? Even if you avoid their front end services there's countless sites using FaceGooBook for logins, captcha, fonts, frameworks that can't simply be uBlocked away without breaking something.
Are we meant to simply trust FaceGooBook that data from those sources isn't added to the others?
Without the pushback of things like GDPR (I think parts of it don't go remotely far enough - mainly the national budgets for ICO enforcement), end users seem to have neared a point of "tough, you lost".
Instead, regulations should focus on reducing scams and misleading offerings. If someone wants to buy/use despite knowing the risks, let him.
They also prevent people dying when they plug in their appliance, or dying because there's lead paint on the product, or buying flour that's adulterated with alum, plaster of Paris or chalk.
The history of food and electrical regulations, on both sides of the Atlantic, are enough to convince me that rich and poor (I have been both) are better served by enough regulation to ensure basic standards are met. In the case of the US and food, prior to such regulation, adulteration was more common than not.
> If someone wants to buy/use despite knowing the risks, let him.
This is never the case. The risks are hidden, the product masquerades as a genuine iPhone charger, or contains unsafe substances that cannot be known without laboratory testing. Data is taken "to provide a better service", without mention of the 206 places it's sold to, or other uses for which it is mined, or the fun psychological experiments staff might run on their users.
To relate it back to the original discussion about data, I am fully in favour of regulations that demand adequate safeguards and protections of personal data, and high expectations of diligence from companies that must use such data. It goes without saying that I am in favour of severe penalties for egregious breach of such regulations.
The attempt to increase formal contract making between website and user... that's the part I find frustrating.
> For example, since I always search for [recipes] and often click on results from epicurious.com, Google might rank epicurious.com higher on the results page the next time I look for recipes. Other times, when I'm looking for news about Cornell University's sports teams, I search for [big red]. Because I frequently click on www.cornellbigred.com, Google might show me this result first, instead of the Big Red soda company or others.
> Previously, we only offered Personalized Search for signed-in users, and only when they had Web History enabled on their Google Accounts. What we're doing today is expanding Personalized Search so that we can provide it to signed-out users as well. This addition enables us to customize search results for you based upon 180 days of search activity linked to an anonymous cookie in your browser. It's completely separate from your Google Account and Web History (which are only available to signed-in users).
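The mechanism that quote describes is easy to picture: a small per-user score bonus for domains you click often. A toy sketch, my own illustration rather than Google's actual ranking (names and weights are made up):

```typescript
// Toy click-history boost: domains this user clicks often get a small,
// diminishing-returns bonus on later queries. The 0.1 weight is arbitrary.
const clickCounts = new Map<string, number>([
  ["epicurious.com", 12],
  ["cornellbigred.com", 30],
]);

function personalizedScore(baseScore: number, domain: string): number {
  const clicks = clickCounts.get(domain) ?? 0;
  return baseScore + 0.1 * Math.log1p(clicks);
}
```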
A significant percentage of HN content is posts about "facefriend does this with your data" or "google does that with your data." I suspect this might be why it feels especially silly and pointless, because currently the noise to signal ratio is quite high if you spend any time on HN. For the vast majority of the rest of the population, it's all noise and no signal. So here we are with one group that works "in the industry" that is largely desensitized to it, and another group that doesn't care, or know they should care.
> The solution (imho) has to come from browser software or w3c.
Generally in society, when there is a natural motive for one party to wrong another, we use law to mitigate it. Technical solutions make sense, as there are always bad actors; people still lock their doors to deter burglars. But we don't expect to need to wear a suit of armour outside our houses just because a highly motivated assailant wouldn't be deterred.
Saying the solution rests entirely on the client side invites a kind of arms race, where wrongdoing is fair game.
Laws' side effects can be as important as their straightforward intention. For example, the GDPR reporting requirements for data leaks are a good idea. It's nice that people can know data about them has leaked, but the bigger reason is the side effect: the transparency will improve security.
The side effect of consent/permission requirements is an army of compliance lawyers and vendors. They make newspapers compliant by tweaking paperwork, popups, and such... not by actually doing anything that improves privacy.
That's debatable. Almost nothing actually meets the GDPR's requirements for freely given consent (that non-technically required consent can't be tied to / traded for the service, and is opt in; no pre-ticked boxes.)
What is this “w3c”? I thought Web standards are written by browser vendors like Google, Apple, and ummm... there are some others too but I don’t remember their names.
Seriously, it freaks me out. I search YouTube for airplanes on the computer hooked up to the TV, and suddenly a ton of music I like pops up. My kid searches for some kids' show, and many more pop up. My wife searches for reviews of some shitty trash TV shows, and she gets all her interests recommended next. They know us, on one computer, on one IP, without being logged in.
For example: you're at your friend's house with the Facebook app, and your friend clicks on an ad for socks. The next day you start seeing ads for socks despite not having any sock-related activity. This is an oversimplified example.
There are myriad tracking vectors they can use. (And to be fair, so can any of the other companies out there tracking everything.)
Deleting cookies is certainly a start, but it's not sufficient to maintain privacy.
We used it mostly to try and track malicious users (banned users, etc.) but it's still tracking nonetheless. It was reasonably effective.
I'm sure a company of Google's size and sophistication has many more advanced methods than we did.
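For anyone curious what even a basic version looks like, here's a rough sketch of the general idea (my own composition, not the commenter's system or anything Google-specific): hash a handful of relatively stable browser traits into one identifier.

```typescript
// Combine stable, script-readable browser traits into a single hash.
// All of these are standard Web APIs; a serious fingerprinter would add
// canvas, fonts, audio, and more.
async function fingerprint(): Promise<string> {
  const traits = [
    navigator.userAgent,
    navigator.language,
    `${screen.width}x${screen.height}x${screen.colorDepth}`,
    Intl.DateTimeFormat().resolvedOptions().timeZone,
    String(navigator.hardwareConcurrency),
  ].join("|");
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(traits)
  );
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

fingerprint().then((id) => console.log("fingerprint:", id));
```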
If all else fails, you can always use another search engine ;)
Yes, this can be rationalized as ‘improving search results’, and indeed the results may be better.
But, it’s also building a personal profile without consent, or indeed with implied lack of consent.
If google cared about privacy, they would simply offer people the option not to be tracked, and respect it.