Hacker News
Measuring the “Filter Bubble”: How Google is influencing what you click (spreadprivacy.com)
357 points by The_Reto 44 days ago | 127 comments

Google can and should customize search results based on location, and it's not just about "local articles" as the article suggests. If enough people from the same general location click on certain results more frequently, then those results should rank higher for others who search from that same general location.

We use Google because the results are useful, not because they are "unbiased". Ranking implies some sort of "bias" and is what makes search results generally useful. We don't want a search engine that does nothing clever and just spits back unranked results. Otherwise, we would be inundated with results containing credit card scams, porn, Bitcoin scams, Viagra ads, etc, when we search for... pretty much anything.

In privacy (incognito and not logged in) mode, all of the above still applies. What would NOT apply is something like: You are a vegetarian and suddenly all of your restaurant searches rank vegetarian restaurants higher in results while in privacy mode. Unless, of course, for some reason people in your general location happen to mostly eat vegetarian.

In any case, if people don't like it, stop using Google and go use some other search engine; there is absolutely nothing holding you back. More times than not, I think people will switch back to Google because they find the results more useful, even in privacy mode.

> I think people will switch back to Google because they find the results more useful, even in privacy mode.

I now use duckduckgo as default search engine and my experience is mixed.

The problem with google is that sometimes you search for something new and then you see the bubble very clearly, which applies not only to search but also to youtube (maybe even more).

The problem with duckduckgo is that when you are searching for something specific, or something you saw months ago and don't remember well, google's index and tracking can be useful.

> The problem with duckduckgo is that when you are searching for something specific, or something you saw months ago and don't remember well, google's index and tracking can be useful.

At this point I don't treat search engines as some sort of dichotomy (Google or DDG or Edge, etc). Rather, I try to use them as a nice blend: Google for when I'm throwing darts at the dartboard and have no idea what I'm looking for, DDG for when I know exactly what I'm looking for (to the point where I can type in the url), and so on.

There's absolutely nothing wrong with using multiple search platforms. Obviously Google is great for when you don't really quite know what you're looking for, but if I want to read Deadspin, typing "deadspin.com" into Google will be the exact same experience on DDG.

Seeing as how most people visit the same websites over and over again, it doesn't make sense to just have a single search engine (e.g., a Google).

I use ddg as my primary, and one thing it's brilliant at is Mozilla Dev Network. I append mdn to my question (not !mdn: the bang uses mozilla's search and it's not as good) and I usually get my answer at the top. DDG is so reliable now that when I occasionally do have trouble (in, say, a non-mdn related search) I fool around a while before I remember that I can try google.

Sounds like it would be nice to have a resurrection of Dogpile with a combination of Google and DDG.

searx offers this, has multiple independent servers, and you can even run your own if you don't trust any of them not to aggregate/sell your search data. One is searx.me (which is almost like telling someone "Search me!" when you mean "I dunno.").

Startpage is basically Google results, but the filter bubble is "all Startpage users". It also brings back some of the search operators that Google disabled.

Qwant is a European search engine that brags about privacy, but the results are hit-and-miss for me so far...haven't used it much.

You can use the !g bang to search Google from ddg.

I keep on trying to love duckduckgo but find that often even typing the exact title of an article I'm looking for, it's not on the first page of results...

I have Google routinely deciding that I didn't mean to use ALL three words I searched for and "helpfully" dropping them. Sure, I can tell it "no, I really want those", but the experience is definitely becoming more and more sub-par for me. As bad is when a search doesn't give me what I want, so I narrow it, only to find that Google uses my previous search to decide what I want to see so I still end up finding similar results.

I remember when Google blew us away with PageRank (goodbye Alta Vista!), but in the last few years Google has gotten so good at providing entry-level information that it's useless for finding specifics, so I expect the next Big Thing in search to come along, though I have no idea how far out it is.

Fun example of this: last week I was trying to figure out all the floating point operations that can produce NaN. Go ahead and try searching Google for "ways to make nan"; it's going to show you dozens of pages of naan recipes, and there isn't even a link to click to make it actually search for what you've typed (instead there's a link for Did you mean "ways to make naan"?, which shows a different set of naan recipes).
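For anyone who landed here with the same question, here is a minimal Python sketch of a few IEEE 754 operations that yield NaN. It is not an exhaustive list, and the names in it are just for illustration. Note that Python raises an exception for some cases (for example `0.0 / 0.0` raises `ZeroDivisionError`) where other languages would quietly return NaN.

```python
import math

inf = math.inf

# Each of these has no well-defined result, so float arithmetic returns NaN.
nan_producers = {
    "inf - inf": inf - inf,
    "inf * 0": inf * 0.0,
    "inf / inf": inf / inf,
}

for expression, value in nan_producers.items():
    print(f"{expression}: is NaN = {math.isnan(value)}")

# NaN propagates through later arithmetic and never compares equal to itself.
nan = inf - inf
print(math.isnan(nan + 1.0), nan == nan)
```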

Although the tailored results have been useful, I think I still like the days back when you needed search operators. I was once looking up stuff about electrons (the particle). A plain query of electron only returned results for the framework on my first page. Understandable.

The other annoyance is the lack of Wikipedia results. For a general topic, I like to have a few pages to choose from about the topic in addition to Wikipedia. Rarely are Wikipedia results in my organic listing unless I specifically add wiki or Wikipedia.

By the way, this[0] is how to search your query.

[0]: https://www.google.com/search?q=ways+to+make+%22nan%22+-%22n...

Interesting. Wikipedia was my top result when I DDG'd "floating point operations that produce NaN"

Did you try your own sentence, "floating point operations that can produce NaN"? "Making nan" is a very tricky query, because it has no context that is related to floating point. I think it is not fair to expect anything else. If you give a slight context like "operations that make nan" you will get results.

Yeah, "ways to make nan", even being familiar with the domain, is an extremely unintuitive way to phrase that query.

Holy cow that's bad. It says: Showing results for ways to make naan. Search instead for 'ways to make nan'.

Fair enough, so click on the Search Instead and it says "Did you mean: ways to make naan" and then shows a bunch of links about breadmaking anyways.

You can use: ways to make +"nan"

Disclaimer: didn't test it, but I often use this trick to force G to use some word, and to use it exactly as written.

Google dropped support for + years ago but you could try using -naan and "nan".
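A small sketch of what that query looks like as a URL, using Python's standard `urllib.parse`. The helper name `build_search_url` is made up for this example, and whether Google actually honors the quotes and the minus sign is up to Google; this only shows the encoding.

```python
from urllib.parse import urlencode


def build_search_url(query: str) -> str:
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Quote the exact word and exclude the unwanted one with a leading minus.
url = build_search_url('ways to make "nan" -naan')
print(url)
```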


OMG, well that could've been made more known. Wonder what the cumulative cost to my life has been.

Ha, what do you know... :D I actually always use startpage.com, but since they are using Google, I assumed they passed this query to it. What is interesting is that this trick still works on startpage.com while it doesn't on google... I must say I'm even less impressed by Google search than I was before. It's getting worse and worse.

wow I didn't know that

a search for your question "all the floating point operations that can produce NaN" gave useful results for me.

It's too bad it ignores the case of NaN (and other queries)

idk, but I think they employ multiple tokenizing strategies, including case insensitive, case sensitive, by words, by n-grams of letters...
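To make those strategies concrete, here is a toy Python sketch. It is a guess at what such tokenizers could look like, not a description of what any real search engine does, and the function names are invented for the example.

```python
import re


def word_tokens(text: str, case_sensitive: bool = False) -> list[str]:
    """Split on non-alphanumeric characters, optionally folding case."""
    if not case_sensitive:
        text = text.lower()
    return re.findall(r"[A-Za-z0-9]+", text)


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


query = "ways to make NaN"
print(word_tokens(query))                       # case-insensitive words
print(word_tokens(query, case_sensitive=True))  # keeps "NaN" distinct from "nan"
print(char_ngrams("NaN", 2))                    # letter bigrams
```

With only the case-insensitive tokenizer, "NaN" and "nan" become the same token, which is one plausible reason a query about floating point ends up next to bread recipes.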

Can you post an example? I’ve been using DDG for years, and have never encountered this.

Also, the DDG devs are certainly lurking on this thread, and can fix the class of queries in question.

No, they can't since DDG is a wrapper for Bing.

I have been using DDG primarily ever since Google tried to pull that login shenanigan on Chrome users. Switched to FF and DDG and with the occasional !g, I'm content. Try !b and tell me it's the same as DDG's results.


> In fact, DuckDuckGo gets its results from over four hundred sources. These include hundreds of vertical sources delivering niche Instant Answers, DuckDuckBot (our crawler) and crowd-sourced sites (like Wikipedia, stored in our answer indexes). We also of course have more traditional links in the search results, which we also source from a variety of partners, including Oath (formerly Yahoo) and Bing.

What this means is that they use 400 sources for things like Instant Answers and other widgets but Yahoo and Bing for all their organic search results.

So it's 95% bing/yahoo. I wish they were more transparent. But it does not sound nice when they say do not use evil google, instead use our slightly modified bing/yahoo wrapper.

Is it against the ToS of Google to perform search on someone's behalf and relay them the results?

I think they sometimes mix yandex, yahoo and other stuff as well. There is a reason they never disclose what percentage of queries are served from which source, kinda spoils their magic i guess. Kudos to bing team though, they seem to have improved quite a bit apparently.

This happens to me sometimes, and is why "!g" is DDG's killer app.

I use DuckDuckGo as my main search engine. One place it drops the ball is in searching for anything health related. Top of the page: homeopathy, conspiracy theories and supplement salesmen.

Youtube in particular is terrible for filter bubbles.

In the name of optimizing for 'engagement', my youtube recs are full of politically polarized clickbait. They're not merely reinforcing my existing beliefs, they're actively trying to push me into a bubble.

At least facebook has the excuse of actual people pushing this stuff.

I’ve noticed this too. I’ve never clicked any of them yet they keep getting recommended.

My other favorite example is Netflix and WWII docs/movies. Watch just one and forever onward they will be half your recommendations.

I tried duckduckgo, but wasn't pleased with the results. I switched to startpage which has been very good.

This is all well and good for information retrieval, but people are making decisions based on information from search results.

In your vegetarian example, what if 51% of people were vegetarian in an area, and the general population was making decisions off these "localized" search results. We would likely expect that this would influence the minority to the tastes of the majority.

This might be fine for something like vegetarianism, but what about other topics? Should your search results be more racist because you live around a lot of racists? This is best case.

I have tangentially worked with groups that specifically utilize this to provide public opinion sway and consumer capture for their clients.

A contrived example to clarify further: Let's say that there is a link, foo.gov/taxes_in_retirement. Locations with a high concentration of retirees might click on the link more frequently compared to other locations. In privacy mode, from this location, a search for "taxes" might rank that link higher in results based on this activity (even though the search didn't contain the word "retirement"). This shows how a link might be ranked differently depending on search location even if the link itself is not inherently location-specific (as opposed to, say, a local-news link).
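The contrived example can be sketched in a few lines of Python. Everything here is hypothetical: the click log, the location names, and every URL other than foo.gov/taxes_in_retirement are made up, and a real ranker would blend clicks with many other signals rather than sort on them alone.

```python
from collections import Counter

# Hypothetical click log: (searcher location, query, clicked URL).
click_log = [
    ("retiree_town", "taxes", "foo.gov/taxes_in_retirement"),
    ("retiree_town", "taxes", "foo.gov/taxes_in_retirement"),
    ("retiree_town", "taxes", "foo.gov/income_tax"),
    ("college_town", "taxes", "foo.gov/income_tax"),
    ("college_town", "taxes", "foo.gov/income_tax"),
    ("college_town", "taxes", "foo.gov/student_credits"),
]


def rank_for(location: str, query: str) -> list[str]:
    """Order URLs by how often searchers in `location` clicked them for `query`."""
    clicks = Counter(
        url for loc, q, url in click_log if loc == location and q == query
    )
    # Sort by descending click count, breaking ties alphabetically.
    return sorted(clicks, key=lambda url: (-clicks[url], url))


print(rank_for("retiree_town", "taxes"))
print(rank_for("college_town", "taxes"))
```

No individual profile is involved: the same anonymous query gets a different ordering purely because of aggregate behavior at each location.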

Also, search engines can play with inconsistent ranking of results to see how click-throughs might be affected. For example, if moving a link from first to third in the result list has no effect (people continue clicking on the same link even though it's now third instead of first), then it's a pretty strong signal that the link should continue to be ranked first in future results. This experimentation of search results is even more important the more uncommon a search is because there is less confidence in the current ranking until there is more activity to base the ranking on.
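A rough sketch of that kind of experiment, with invented impression data and an arbitrary tolerance. A real system would need far more traffic and a proper significance test before drawing any conclusion.

```python
# Hypothetical impressions of one link: (position shown, was it clicked?).
impressions = [
    (1, True), (1, True), (1, False), (1, True),
    (3, True), (3, True), (3, False), (3, True),
]


def click_through_rate(position: int) -> float:
    """Fraction of impressions at `position` that led to a click."""
    shown = [clicked for pos, clicked in impressions if pos == position]
    return sum(shown) / len(shown)


ctr_first = click_through_rate(1)
ctr_third = click_through_rate(3)

# If demoting the link barely changes its click rate, searchers are seeking
# it out, which supports ranking it first. The 0.05 tolerance is arbitrary.
keep_first = abs(ctr_first - ctr_third) <= 0.05
print(ctr_first, ctr_third, keep_first)
```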

Just as stores shift around product placement (front of the store, back of the store, etc), a search engine is free to shift around search results. Keep in mind that product producers might pay for better in-store product placement too, just as customers pay search engines for ad placement in search results.

You can't just handwave "useful" as an explanation for why location-based SERP discrimination is desirable.

It is "useful" for me to be on the phone with someone in Cleveland and describe how to find something on the Web, expecting that they can follow a similar set of steps at a similar time and get a similar result.

A (sort-of-)deterministic Web can be good and useful. It is a very strong statement of preference and exercise of power to declare that "useful" results must be meaningfully different based on the characteristics of the individual searching.

For whom is that exercise of power most beneficial? I would argue that a rapidly shifting, slippery, personally-dependent presentation of the world's information is extremely useful as a tool of control, but gives only occasional and relatively marginal benefit to individual searchers.

The 2016 US election is a big case in point. Personalizing information delivery, when coupled with asymmetric processing power and data availability, lets you have situations where an atomized polity winds up seeing what suits each individual, but with a radically degraded ability to form collective truths or consensus.

The definition of "useful" is an exercise of power.

> Google can and should customize search results based on location

I feel more and more that these ideas of optimizing for 95% of the use cases give good results on paper but shitty lives for the 5% left.

I understand the good intentions behind that calculation, because making life easier for a huge majority of people should be a good thing.

But for instance boosting local results is one of the ways you’ll make people often searching for foreign information miserable. Searching for remote places will most of the time be met by random local businesses first. Web based international content will be outranked by local content, and your local newspaper bitching about heat waves when it’s just summer will outrank by far rock bands and manga titles.

Sometimes that’s the wanted behavior, but for instance currently Google already works with strong preference for localised search, and that’s one of the things that pushed me to DDG.

In a way if Google wasn’t so massively successful I’d root for them to better serve mainstream searches. But in the position they are now I think it’s harder to say they should just care about the vast majority of people. Even 1% of their userbase is an incredibly huge number.

I completely agree, and often that's how I want Google to work.

But what I would also like is a way to search without using my context, as sometimes I want results that aren't related to my location etc.

Yep. Like when I'm using a VPN, but I still want to know about stuff where I am physically located.

For your vegetarian example:

The article runs through the analysis you propose, and (within the limitations of the study) shows that Google does apply very similar filter bubbling logged out in incognito mode vs logged in.

In fact in “anonymous” mode, the results are much more similar to the same person’s “logged in” mode than to other randomly chosen people’s logged in or logged out results.

What you're saying makes sense for location dependent results (e.g. searching a map of nearby places). For something like "origin of the universe" results would be very different across the world. What happens when I'm in a location I'm not usually in? Do I get the local results or the results for the place I usually reside?

But their searches were (at least partially) political in nature, so a result from a national news site covering local news would be relevant. "gun control" is location dependent if there is a protest in your city.

I guess it comes down to what a search engine should assume your reason for searching is based on what you enter. If the query doesn't strongly imply I'm looking for "news about X"; "local opinion on X"; or "closest place related to X", I'd like it to assume I'm doing "research on X" so I get results that do not take my arbitrary location into account.

How would you like Google to decide what the correct answer to [origin of the universe], [gun control] and [vegetarianism] is?

>Google can and should customize search results based on location

Should there not be an opt-out option though?

There is: Use something else.

I don't mean this in a snippy way, but truly. If it's that bothersome, why not try something else? It seems that most people instead think that they have the privilege to change the product to their desires.

> why not try something else?

Because the "something else" lacks other features. Are you suggesting that users shouldn't have "the privilege" to suggest new features or discuss already existing features?

Better than having to go to Privacy Mode (which can feel a little creepy with the spy icon and even just the name), it would be easy for Google to make a clear On/Off toggle for personalized vs raw results.

For the sake of education and not being evil (or "doing the right thing"), it would be nice to be able to view results from other typical profiles' points of view.

The arguments for personalisation make sense. But the other story is sometimes we need unbiased results. Like when reading about politics. You need the dose of "what are the facts" rather than "what are the facts for me".

Replacing an algorithm that bombards people with propaganda with unfiltered propaganda doesn't seem like an improvement. Someone who, for example, mistakes a map of policing for a map of crime won't do well in either scenario.

The point of the article was that, even when there is no reason to do so because location and identity are identical, Google STILL gives differently ranked results. Please read more carefully.

And I've been using DDG for a bit now and have found it perfectly useable.

Sorry, but I think you are missing the point (please see my follow-up post). The point is, search results are not static even for the same inputs (search string, location, etc), even in privacy mode.

That's great that you use DDG and find it useful! If Google was a true monopoly, as the current media blitz would have you believe, then you would not have been able to so easily switch to DDG (or Bing, or...).

Criteria for monopolies aren't static and unchanging. The switching cost for energy or telephony services is not what it was in 1950.

I can't say with confidence how far along Google is toward the threshold of "Monopoly" and have yet to hear an analysis that I would consider definitive in any way.

The point the rest of us are making is that even when you opt out, which google supposedly supports, it still bubbles the results based on your profile.

This is shady, at best. It probably contributes to the US’ current political instability (different propaganda / news in red states), and is also probably an unauthorized use of personal information in places like Europe that have laws about such things.

People wanted and used google precisely because it was "unbiased" and "unlocalized". It's why it became so popular.

The only reason for the biased localized results is due to corporate pressure from media and news industries.

Also, you are conflating localized and unbiased results with spam and scam. Nobody is saying google shouldn't remove scams and spam. People are just saying they want unbiased results.

>People wanted and used google precisely because it was "unbiased" and "unlocalized". It's why it became so popular.

That's contrary to everything I know about Google's history, including the origins of Backrub and PageRank. Please provide citations in your comment.

nofunsir 44 days ago [flagged]

Found the Google employee.

Could you please stop doing this? It breaks the guidelines.

> Please don't impute astroturfing or shillage. That degrades discussion and is usually mistaken. If you're worried about it, email us and we'll look at the data.


You should see how I defend Facebook or Apple.

What's good for facebook and apple is good for google?

To make this research more interesting I'd like to see:

1) repeated queries from the same user. Do the results stay constant over time or do they change?

2) comparisons to the same experiment run against e.g. Bing or DuckDuckGo.

It seems to me that some variation in results is to be expected because of users hitting different backends which might be at different stages of index rollouts. Similarly, response times of different backends matter. If for example the video results don't come back in time you'll end up not having them in the result set.

Lastly, the insinuation of the article is that "unbiased" search results are clearly preferable. I'm not convinced. I for one like that STD for me is associated with the C++ standard namespace (which I search for all the time) rather than sexually transmitted diseases (which I luckily don't have to care about as much).

> Lastly, the insinuation of the article is that "unbiased" search results are clearly preferable.

The insinuation is that you should know if they are biased, or that you should be able to get unbiased results if you so wish.

It also raises suspicions on how much google tracks each user.

From this point of view what would be interesting would be a local study, to see if 100 people all in the same neighbourhood with different browsing habits have different results. This would eliminate the "non-tracking" part of the personalization.

Unbiased compared to what?

Let's say you have three search orderings: ABC, BCA, and CAB. Which one is the unbiased one?
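One way to see why the question is hard: measure how far apart the three orderings are. The sketch below uses the Kendall tau distance (the number of item pairs that appear in opposite order), which is my choice of metric for illustration and not something the article uses.

```python
from itertools import combinations


def kendall_tau_distance(a: str, b: str) -> int:
    """Count pairs of items that appear in opposite order in `a` and `b`."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


orderings = ["ABC", "BCA", "CAB"]
for first, second in combinations(orderings, 2):
    print(first, second, kendall_tau_distance(first, second))
```

Every pair comes out at the same distance, so by this measure none of the three sits in the middle as a natural "unbiased" reference.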

The one that isn't selected based on your identity.

They provided no evidence that it is selected based on anyone's identity.

>2) comparisons to the same experiment run against e.g. Bing or DuckDuckGo.

Isn't DDG mostly Bing results these days anyways? (unless you're searching in Russian)


> Lastly, the insinuation of the article is that "unbiased" search results are clearly preferable. I'm not convinced. I for one like that STD for me is associated with the C++ standard namespace (which I search for all the time) rather than sexually transmitted diseases (which I luckily don't have to care about as much).

On the other hand, authors could find better names for their libraries ...

Further, there are different solutions, where the user has full control over the context of their search. For instance by maintaining a fully user-controlled list of keywords that is remembered by a cookie (which can be deleted as well).

Google's feed does an amazing job of avoiding filter bubble. One of the best in the industry actually. I use the product and wrote a blogpost about it: https://getstream.io/blog/google-feed-personalization-and-re...

It makes total sense for them to personalize search results. If I am searching for Django it's the framework not the musician. When I search for a restaurant name it's the one in Boulder, Co, not a restaurant by the same name on a different continent.

People always adjust their messaging according to who they are talking to. It's kinda weird how it's creeping people out when computers do this.

I worry it ropes me into my domain and way of thinking, without a way of getting out of it.

I also would likely mean the web framework, but if I suddenly become interested in Django music one day, I don't want Google to make assumptions.

It's not creeping me out, it's disgusting me. When I want to look for django reinhardt, I can look for "django music" or whatever, if I want the framework "django web framework" should do the trick. Oh, and when I add a + before a word I want that word to show up, if there are no results with that word, show me no results.

I'm fine with others having the option of personalized search, I'm not fine with me not having it.

Apparently it's not hard to turn off search personalization:


This will turn off all localization except language, location, and device type (mobile vs desktop).

Google doesn't have a way to turn off localization - obviously, it'll display different results for "football" in the US than in the rest of the world, and neither result is the "official" one.

If you need to change language, that's easy, it's just in search settings.

If you need to change location or device type, that's harder. You can either use Chrome's dev tools, or a tool like http://www.isearchfrom.com/

Just in case you aren’t aware, google deprecated the + operator years ago. If you want to ensure a single word is in your results you can put quotation marks around it. It mostly works.

Oh.. I tried it out and it worked perfectly, thank you very much! I'll gladly take the egg on my face in exchange for learning this :D

Wait, why are you downvoted? I wasn't aware, and you helped me greatly. I wish HN would offer a way to just order comments by date and make flagged/downvoted things collapsed, the end. People playing political tug of war with comments in conversations they don't even partake in is just annoying.

Karma-free HN could be a browser extension. We could call it nirvana or something.

2018 was the year I adopted ddg, not because of privacy, but because google results suck.

Almost every time I search, I don’t get a single result I want on the first page. The first 3 results are sponsored ads, then there is the Danish Wikipedia article (useless), then 3-6 advertisements pretending to be content, and then if I’m lucky something that was relevant 5 years ago.

DDG isn’t much better, but it’s better.

I’m not sure if search engines are really to blame though. With everyone being on Facebook, Medium, Quora, reddit, 4chan and so on, it’s like the web just stopped having content worth visiting.

If it wasn’t because HN gave me interesting content, I’m honestly not sure why I’d ever browse the internet anymore. But maybe I’m just getting grumpy.

I use DDG out of principle but to be honest Google is still much better for the stuff I am searching.

Google is better at localized stuff, but very often I prefer the English version. Like in the case of Wikipedia, if my goal was to look something up on wiki I’d always pick the English version, but on google it’s on the second or third result page.

Quora is another good example, it’s a place I often visit after search results, but on google.dk, it’s almost never a result, possibly because it’s not in danish.

DDG is much better, but once in a while when I’m searching for something very specific that I know google will find first, I’ll do the !g.

If I’m looking up anything technical or committing an act of google programming, I’ll always go straight to google.

The other day I was looking for some pipeextenders for our shower though, and neither bing, google or DDG were able to help. I ended up finding them by searching on amazon. Google was 100% commercials for plumbers and completely useless otherwise. DDG and bing had no clue what I was looking for. A few years ago, google would have been able to help, I know, because google helped me find our current ones.

I just tried googling pipeextender, tried it on both google.dk and google.com, and the results look reasonable to me - the first row is their sponsored, but that's actually a list of pipe extenders so still useful. Then the organic results all look reasonable.

What do you see when you google pipeextender ?

DDG is more convenient for getting to something you already know exists: either use a bang to get there directly, or make your query specific enough that it comes up in the search result page.

On the other hand, when I expect my search engine to make a best effort guess about my intentions, I just use the !g bang, and almost always Google finds stuff that DDG misses. This is essential for research, but downright annoying otherwise (especially considering all the linkfarms like WikiHow that have spammed their way to the top of PageRank).

Yeah have you ever tried plugging an exception with the function name into Bing or DDG? It doesn't work on those, it does work on Google.

I'm quite a privacy-conscious individual but I'm not going to significantly hamper my engineering abilities to prevent google from knowing what technical problems I'm having.

You get the Wikipedia page? How fortunate. I rarely see them on the first page now even when I know there is a relevant article with matching title. Need more space for ads and DoubleClick affiliates.

I feel the same because of the YouTube changes, I used to browse videos and end up in unexpected places. That is no longer possible, i get only related videos that are either sponsored, older videos i haven't finished and nothing else. I personally am not grumpy but am actually happy that I spend less time on YouTube. Slowly but surely I spend less and less time online and I see it as a good thing.

A small quibble: filter bubbles don’t require using personal data. But they are about giving readers what they want.

In a very different context, I ran an analysis of terrorism coverage in the NY Times to measure what a geographic filter bubble looks like:

How Media Fuels Our Fear of Western Terrorism


I also ran the same analysis for all the articles over a decade by geography (and compared to population, GDP, etc):

Visualizing 10 years of International Coverage in the NY Times


While filter bubbles are more pervasive in digital media (where we can segment each user, including with personal information), they’ve also always existed.

The filter bubble is the best thing ever. I use search engines to find things and Google finds them for me. I want it to be super tailored to me and show reputable results.

I remember 2000s era search and looking at Page 2. Now I don't scroll below result five 99% of the time. Thank you, Google.

I have to say, though. US Google is better than any other Google I've used.


The problem is the term "filter bubble" conflates personalization, relevance, and recommendations.

I can do without the recommendation engines.

Source: Worked on recommenders for a mid-sized e-commerce site.

I'm hosting my own SearX [0] instance to try and eliminate search bubble and control my search history.

SearX is a metasearch engine that proxies out search requests and randomizes all browser fingerprints to make it difficult for any individual to be tracked via algorithm. I don't know how effective it is of course, but I find I prefer the search results I get out of it vs google, even if the image search interface isn't as flashy.
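For a feel of what a metasearch engine has to do after proxying the requests out, here is a toy merge step in Python. It uses reciprocal rank fusion, which is a generic technique I picked for the example and not necessarily how SearX combines results; the engine lists and URLs are invented.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists; URLs ranked highly by several engines rise to the top."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results from two upstream engines for the same query.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example", "a.example"]
print(reciprocal_rank_fusion([engine_a, engine_b]))
```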

I put my instance behind https and simple auth to allow me a bit of security while using it outside of my private network.

If you want the privacy shield vs google/bing/etc and don't mind a middleman having your search history, there are public SearX instances as well [1].

[0] https://github.com/asciimoo/searx

[1] https://www.searx.me/

My search productivity greatly improved when I switched to self-hosted searx. I tried to advocate this to my network of friends but with little success. I run it in a docker container and it's just so easy to manage and the results are so much better.

Being open source, you're free to fiddle with it any way you want and I consider it as a sort of condom for your privacy.

"With no filter bubble, one would expect to see very little variation of search result pages — nearly everyone would see the same single set of results."

This is the assumption underlying their research, and it is fundamentally not true.

The people that wrote that have obviously never run an actual search engine at scale.

The study’s results seem to be that users often get unique results. That’s not the same as “personalized”, and it certainly isn’t evidence of “bias” as the spreadprivacy.com link suggests. A good-faith interpretation would point to Google running learning algorithms on their results. That would also seem to be a far better explanation for Google changing parts of the page layout, such as the position of news and video results. The use of the term “bias” for describing differences in search results also trips my conspiracy theory detectors.

This doesn't show evidence of a filter bubble. It shows evidence of different results. The filter bubble is the idea that we are in a bubble, cut off from differing viewpoints.

(additionally, I am highly skeptical of the filter bubble's existence/effects and the book was terrible - full of "mights" and "coulds" and few solid facts.)

I have a theory that filter bubbles are causing intolerance of other people's views. Most of the content we consume comes through filter bubbles; in other words, most people consume content that is tailored to them. Thus we are less accepting of things that are unfamiliar, because we are less exposed to different content.

Filter bubble examples: search services (Google, Bing); movies (Netflix); music (Spotify, Apple Music recommendations); news (Facebook); social media (Facebook feeds).

That's exactly what they're causing, because you have no easy way to search outside your bubble and you end up thinking that's the status quo. In fact, it is disinforming you by omission.

When they say "bubble", I think of groups of users with fewer differences within the group than outside the group, sort of like the Wall Street Journal study showed. If they're finding variation among users, but not predictably more or less variation between any two of them, then that isn't a "bubble" to me, it's just customization.

Customization is troubling, but less so than bubbling. (Hey now...)
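A minimal sketch of that distinction, assuming each user's results are just a list of URLs and that users have already been assigned to groups somehow (the group labels, function names, and the choice of Jaccard overlap are all illustrative assumptions, not anything the study used). If the within-group overlap is clearly higher than the between-group overlap, that looks like a "bubble"; if the two are about equal, it looks more like plain per-user variation.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two result lists, ignoring order (1.0 = identical sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_similarity(pairs):
    """Average Jaccard overlap over a non-empty list of (results, results) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair of users to compare")
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


def bubble_gap(groups):
    """groups maps a (hypothetical) group label to a list of per-user result lists.

    Returns mean within-group similarity minus mean between-group similarity.
    A value near 0 suggests variation without group structure.
    """
    within = [
        pair for users in groups.values() for pair in combinations(users, 2)
    ]
    between = [
        (x, y)
        for g1, g2 in combinations(list(groups), 2)
        for x in groups[g1]
        for y in groups[g2]
    ]
    return mean_similarity(within) - mean_similarity(between)
```

With made-up data where two groups each see their own fixed set of links, `bubble_gap` comes out at 1.0; when everyone sees the same links it is 0.0.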

Does Google constantly run A/B tests on links to see which has a higher CTR in a given position?

> Most people expect both being logged out and going "incognito" to provide some anonymity. Unfortunately, this is a common misconception as websites use IP addresses and browser fingerprinting to identify people that are logged out or in private browsing mode.

Firefox offers integrated protection against browser fingerprinting, but you have to turn it on because it's off by default: https://support.mozilla.org/en-US/kb/firefox-protection-agai...

Fingerprinting protection is also available in Safari on macOS Mojave and iOS 12: https://www.cnet.com/news/new-safari-privacy-features-on-mac...

A few scattered considerations. First, the "push to extreme" effect: when you see something that only marginally interests you but keep seeing it because of "customized" results, you are effectively invited to dig deeper. For some kinds of results this pushes people to the extreme: search for something from a left or right party, and in a short time you get more and more "leftist" or "rightist" content.

That may not influence ordinary, well-educated adults much, but it may influence young and less educated people. Think, for example, of modern urban legends like "white sugar is poison" or "chemical trails" and their "tam-tam effect".

Another point is the "censor effect": we know well that search-based information access is less detailed than taxonomy-based access. We experience that often when we organize our mail, documents, and files, alternating between taxonomy-based and search-based UIs. When our entire world relies on search-based UIs instead of taxonomies, whoever controls search may control knowledge. So it will become easy to "hide" something, "push" something else, etc.

Normally this is not a problem; it starts to become a problem when very few search systems become so ubiquitous and dominant.

"Convergence": this is tied to the first point; think about feeds vs aggregators. With feeds you search for specific stuff and stay up to date, while you tend to ignore things that don't interest you. With aggregators this "soft polarization" effect gets somewhat lost, replaced by another (potentially driven) "hard polarization" effect. As a result, general information becomes less diverse (every publisher tries to be at the top of every aggregator's results instead of following its own style) and people become more "extreme" in their information interests.

That has far more implications than mere privacy. And if you add to the mix the current state of communication systems like WhatsApp, Gmail, etc...

Purposeful mixing or mild randomization of search results also seems like a decent way to help obscure the ranking algorithms to help thwart reverse engineering.
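As a rough illustration of what "mild randomization" could mean, here is a sketch that adds a small random jitter to each result's score before sorting. The `(url, score)` input shape, the `noise` parameter, and the function name are assumptions made up for this example; nothing here reflects how Google actually ranks or perturbs results.

```python
import random


def jitter_ranking(scored_results, noise=0.05, seed=None):
    """Re-rank (url, score) pairs after adding uniform noise in [-noise, +noise].

    Results whose scores differ by more than 2 * noise keep their relative
    order; closely scored results may swap places from one query to the next.
    """
    rng = random.Random(seed)
    noisy = [
        (score + rng.uniform(-noise, noise), url)
        for url, score in scored_results
    ]
    # Sort by the perturbed score only, highest first.
    noisy.sort(key=lambda item: item[0], reverse=True)
    return [url for _, url in noisy]
```

The design point is that widely separated results stay put, so usefulness is mostly preserved, while near-ties get shuffled enough that two observers comparing result pages cannot easily infer the exact underlying scores.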

I'd like Google to show the full input to their search algorithm at the bottom of the search results, so I know exactly which information it used.

Well, for starters, to add contextual meaning to the words you searched for, they probably have a pretty good language model in the background.

That model was likely built with sources such as the English Wikipedia and their archive of a few million books. So the space at the bottom of the page may be getting a little tight by now.

Of course I meant limited to my data. I don't expect Google to print its entire search index at the bottom of my search page, just because it's an input to their search algorithm ...

That would allow some pretty rampant gaming of SEO, wouldn't it?

If Google shows "these results were filtered using the fact that you're a Caucasian, in the age group 30-35, with a predicted income of $50,000", then how is that going to help SEO much?

Search keywords are a commodity. Allowing anyone to peek under the hood is going to lead to bad actors mass scraping and gaming, even if the search result signals you're sharing are aggregated.

Google facilitates SEO gaming by not letting us openly crowd-source the removal of websites/hosts/topics that participate in manipulative SEO or any other criteria from our search results.

When was the last time you tried to permanently remove a domain/website from search results? And not temporarily via the "negative" operator, as that gets tedious.

So instead of having an open, curated or crowd-sourced list of bad domains that can cater to any specific crowd/subset, we are forced to accept or promote serious, government and large-scale censorship in order to hide bad content.

But more to the point. It's more about centralization than it is about "filter bubbles".

Being able to know what someone is likely looking for is something that really helps the search engine experience. "I find just what I'm looking for and it is always on the first page!" is the sound of a delighted user of a search engine.

The cost however is discovery, which is to say things you might be interested in but didn't know exist. To enhance discovery you often need a wide band curator that can surface "likely" interesting things without destroying the experience of always finding what you want.

In the world of real goods these sorts of discovery curators are enthusiast publications which might talk about the new things coming down the road, or a restaurant critic that is trying the new restaurants.

Real human search and discovery is a pretty personal thing. And when it all goes on inside your head/environment, it's pretty acceptable too: people putting their favorite cookbooks in a more prominent place, or wearing specific fashions that they like while only really shopping at clothing stores that support that look.

When that information is at a third party, and dissectable by tools, then it gets creepy.

Someone who doesn't "know you"[1] but typically wants to sell you something can find you and market to you, to help you "discover" something new on their schedule instead of on yours. When that knowledge about what you like and don't like, pay attention to and ignore, is weaponized into a tool against you (ostensibly to help you see "great deals" that you might have otherwise missed), whether it is a new job opportunity, fashion choices, the vehicle you drive, or even where you eat lunch, that is where it gets annoying. And when the version of you that you present to the world is quite a bit different from the version that only you or your closest confidants see, and someone outside that circle gets a peek because of your search history and what you have shown interest in? That is an existential threat of 'outing' the real you.

That information is power: the power to influence you, the power to sell to you, the power to expose you, the power to control how you see the world and ultimately control your actions in that world.

Imagine a machine that, as people used it, condensed bricks of pure platinum out of the air as a side effect of its operation. Now you tell the owner of the machine, "You can't sell that platinum; you need to just grind it up and throw it away." Well, that isn't going to happen, even if there is a big 'for show' grinding operation taking place up in the lobby of the machine's owner. The owner might say, "I charge you nothing to use my useful machine; I am going to keep some of the platinum it produces to cover expenses."

[1] I'm using the phrase in the colloquial sense, where a "known" person is someone who is both familiar and has been granted a certain level of access to your inner thought processes.

I don't think that search personalization is all bad though. It can be used for either good or evil.

If search results were perfectly consistent, some smaller websites might not get any search traffic at all and the big corporations' websites would get almost all of it. It would greatly exacerbate winner-takes-all effects and inequality.

Personalization allows for some small websites to start with a niche and slowly grow to become more mainstream.

So what personal information is Google using in private mode? Just location? Browser fingerprint? IP?

I'm surprised to read that the number of clicks drops by 50% if the result moves down a single rank. Personally, I rarely click on the top link. I usually click somewhere in the middle of the search results on the first page.

I understand why the result page number matters but the exact rank having such a huge impact is surprising.
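To see how steep that is, here is a back-of-the-envelope sketch that takes the "50% drop per rank" figure literally and applies it at every position on a ten-result page. That uniform halving is an assumption made for the arithmetic; real click-through curves are messier, and the function name and parameters are made up for this example.

```python
def click_shares(n_results=10, drop=0.5):
    """Share of clicks per rank if each rank gets (1 - drop) times the clicks
    of the rank above it. Returns a list that sums to 1, rank 1 first."""
    weights = [(1 - drop) ** rank for rank in range(n_results)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    for rank, share in enumerate(click_shares(), start=1):
        print(f"rank {rank:2d}: {share:.1%}")
```

Under that assumption the first result alone takes 512/1023 of the clicks (just over half), and the top three together take 7/8 of them, which is hard to square with a habit of clicking somewhere in the middle of the page.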

I imagine a company with the data to infer a sequence of likely link clicks from one identified affinity group to another, can in theory manipulate the displayed suggested link series for a large enough population to shape behavior/thought on a significant societal and global level.

> Back in 2012 we ran a study showing Google's filter bubble may have significantly influenced the 2012 U.S. Presidential election by inserting tens of millions of more links for Obama than for Romney in the run-up to that election

Thank god someone is discussing this. I think it's a real shame that the "media" is focusing only on the 2016 election when discussing how Trump manipulated voters. Sure, they ran advertisements (and Russia did), but the reality is that this is no different from any of the recent elections.

Obama was right at the forefront of this tactic:


Yet, when Trump does it (perhaps better executed, or the platforms are better) it's "manipulating an election!"

Please, continue this research and keep it unbiased.

From your own reference:

All of the Obama supporters who traded their personal information for a ticket to a rally or an e-mail alert about the vice presidential choice, or opted in on Facebook or MyBarackObama can now be mass e-mailed at a cost of close to zero.

That's a ...newsletter? Right?

How is that comparable, except in "It's on the internet" terms, to the Russian government secretly funding ad campaigns using illicitly gained psychometric data?

The obvious difference being state sponsored actors from a foreign entity weaponized this “feature.”

Not sure how you’re equating the two, but it’s disingenuous.

I actually would argue it doesn't matter if it's an internal or foreign actor that's doing it. The results / outcomes are effectively the same, the problem is the same.

It's disingenuous to claim a feature like targeted advertising is also "weaponized". There's always been targeted propaganda; that doesn't make it a weapon. Everyone is capable of self-deception and of making their own decisions. If you argue against that, then we probably should start debating whether or not democracy is a good idea.

The problem here is that we've gotten to a point where anyone can target any person or subset of people. They don't even need to be a state actor. If we view that as bad, then we should probably research it (not just the 2016 election, but all elections).

The point is about the wording. That this is possible at all is bad regardless of whether it's pro-democrat, pro-republican, pro-communist or whatever.

But you complain people are calling it "manipulating an election". They're calling it that because of who is doing it (foreign agents) and why they are doing it (to gain control over a powerful enemy state). That is what makes it election interference and it seems very purposefully blind not to acknowledge it.

If the entity is uncovering a fact then what does it matter whether it's a foreign or domestic entity who uncovered it? The fact itself is what's interesting.

I get that an entity could be more interested in uncovering facts about one party than another but in this particular case, the fact was that the Democratic Party was acting "unethically" in some people's eyes. If they hadn't been then it wouldn't have "influenced" people's opinions.

Another way to look at it: that entity didn't annotate the emails, the content of the emails was enough to anger people.

I'm not trying to fork this into a flame war; I'm genuinely interested in what other users of this site think, particularly those who think differently than I do.

> If the entity is uncovering a fact then what does it matter whether it's a foreign or domestic entity who uncovered it?

How's that even a legitimate question? I can list a few reasons why it matters:




I don't care whether it's a foreign entity or a domestic one...it's dubious and not good for the future of American politics.

Trump's not even mentioned in the article.

In fact I would even buy a filter bubble if it had a "Not Trump" switch.

The point the GP is making is that it's the 2016 election (involving Trump) that is usually brought up (though not in this article) when it comes to filter bubbles as opposed to the 2012 election.

"Yet, when Trump does it ..."

How about when a foreign state, an enemy state at that, does it?

Unfortunately, many countries have long histories of doing so. America has arguably the worst track record on this score. https://en.m.wikipedia.org/wiki/Foreign_electoral_interventi...

This is a really cool application of research to describe something difficult to grok for the average user (myself included) but which really highlights the benefit of using an anonymized search engine with static results. It's like saying, "There's room for us both." But with real metrics and data.
