
Measuring the “Filter Bubble”: How Google is influencing what you click - The_Reto
https://spreadprivacy.com/google-filter-bubble-study/
======
alopecoid
Google can and should customize search results based on location, and it's not
just about "local articles" as the article suggests. If enough people from the
same general location click on certain results more frequently, then those
results should rank higher for others who search from that same general
location.

We use Google because the results are useful, not because they are "unbiased".
Ranking implies some sort of "bias" and is what makes search results generally
useful. We don't want a search engine that does nothing clever and just spits
back unranked results. Otherwise, we would be inundated with results
containing credit card scams, porn, Bitcoin scams, Viagra ads, etc, when we
search for... pretty much anything.

In privacy (incognito and not logged in) mode, all of the above still applies.
What would NOT apply is something like: You are a vegetarian and suddenly all
of your restaurant searches rank vegetarian restaurants higher in results
while in privacy mode. Unless, of course, for some reason people in your
general location happen to mostly eat vegetarian.

In any case, if people don't like it, stop using Google and go use some other
search engine; there is absolutely nothing holding you back. More times than
not, I think people will switch back to Google because they find the results
more useful, even in privacy mode.

~~~
kiriakasis
> I think people will switch back to Google because they find the results more
> useful, even in privacy mode.

I now use duckduckgo as default search engine and my experience is mixed.

The problem with google is that sometimes you search for something new and then
you see the bubble very clearly, which applies not only to search but also to
youtube (maybe even more).

The problem with duckduckgo is that when you are searching for something
specific, or something you saw months ago and don't remember well, google's
index and tracking can be useful.

~~~
GiuseppaAcciaio
I keep on trying to love duckduckgo but find that often even typing the exact
title of an article I'm looking for, it's not on the first page of results...

~~~
ergothus
I have Google routinely deciding that I didn't mean to use ALL three words I
searched for and "helpfully" dropping them. Sure, I can tell it "no, I really
want those", but the experience is definitely becoming more and more sub-par
for me. As bad is when a search doesn't give me what I want, so I narrow it,
only to find that Google uses my previous search to decide what I want to see
so I still end up finding similar results.

I remember when Google blew us away with Page Rank (goodbye Alta Vista!), but
in the last few years Google has gotten so good at providing entry-level
information that it's useless for finding specifics, so I expect the next Big
Thing in search to come along, though I have no idea how far out it is.

~~~
kibwen
Fun example of this: last week I was trying to figure out all the floating
point operations that can produce NaN. Go ahead and try searching Google for
"ways to make nan"; it's going to show you dozens of pages of naan recipes,
and there isn't even a link to click to make it actually search for what
you've typed (instead there's a link for _Did you mean "ways to make naan"?_,
which shows a different set of naan recipes).
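
For reference, a minimal Python sketch of a few operations that do produce NaN
under IEEE 754 arithmetic. It is not an exhaustive list, and the dictionary
name nan_producing is only illustrative. Note that Python raises an exception
instead of returning NaN for some of the classic cases (0.0 / 0.0 raises
ZeroDivisionError, math.sqrt(-1) raises ValueError), so those are left out:

```python
import math

inf = math.inf
nan = math.nan

# A few "invalid operation" cases that evaluate to NaN in Python floats.
nan_producing = {
    "inf - inf": inf - inf,    # difference of two infinities
    "inf * 0": inf * 0.0,      # infinity times zero
    "inf / inf": inf / inf,    # ratio of two infinities
    "nan + 1": nan + 1.0,      # any arithmetic with a NaN operand
}

for name, value in nan_producing.items():
    # NaN never compares equal to itself, so use math.isnan to test for it.
    print(f"{name:10} -> {value} (isnan: {math.isnan(value)})")
```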

~~~
noxToken
Although the tailored results have been useful, I _think_ I still like the
days back when you needed search operators. I was once looking up stuff about
electrons (the particle). A plain query of _electron_ only returned results
for the framework on my first page. Understandable.

The other annoyance is the lack of Wikipedia results. For a general topic, I
like to have a few pages to choose from about the topic in addition to
Wikipedia. Rarely are Wikipedia results in my organic listing unless I
specifically add wiki or Wikipedia.

By the way, this[0] is how to search your query.

[0]:
[https://www.google.com/search?q=ways+to+make+%22nan%22+-%22n...](https://www.google.com/search?q=ways+to+make+%22nan%22+-%22naan%22)
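
The query behind that link can also be assembled programmatically. A minimal
sketch, assuming Python's standard urllib.parse module; the helper name
build_search_url is hypothetical, and whether a quoted term is always honored
is up to the search engine:

```python
from urllib.parse import urlencode


def build_search_url(plain_words, required, excluded):
    """Build a search URL with quoted (required) and minus-quoted (excluded) terms."""
    parts = list(plain_words)
    parts += [f'"{term}"' for term in required]    # "nan": ask for this exact word
    parts += [f'-"{term}"' for term in excluded]   # -"naan": drop results with this word
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


url = build_search_url(["ways", "to", "make"], required=["nan"], excluded=["naan"])
print(url)
```

urlencode turns spaces into + and double quotes into %22, which is why the
link above looks the way it does.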

~~~
dhimes
Interesting. Wikipedia was my top result when I DDG'd "floating point
operations that produce NaN"

------
BuschnicK
To make this research more interesting I'd like to see:

1) repeated queries from the same user. Do the results stay constant over time
or do they change?

2) comparisons to the same experiment run against e.g. Bing or DuckDuckGo.

It seems to me that some variation in results is to be expected because of
users hitting different backends which might be at different stages of index
rollouts. Similarly, response times of different backends matter. If for
example the video results don't come back in time you'll end up not having
them in the result set.

Lastly, the insinuation of the article is that "unbiased" search results are
clearly preferable. I'm not convinced. I for one like that STD for me is
associated with the C++ standard namespace (which I search for all the time)
rather than sexually transmitted diseases (which I luckily don't have to care
about as much).

~~~
kiriakasis
> Lastly, the insinuation of the article is that "unbiased" search results are
> clearly preferable.

The insinuation is that you should know if they are biased, or that you should
be able to get unbiased results if you so wish.

It also raises suspicions on how much google tracks each user.

From this point of view, what would be interesting is a local study, to see if
100 people all in the same neighbourhood with different browsing habits have
different results. This would eliminate the "non-tracking" part of the
personalization.

~~~
ggggtez
Unbiased compared to what?

Let's say you have three search orderings: ABC, BCA, and CAB. Which one is the
unbiased one?

~~~
c3534l
The one that isn't selected based on your identity.

~~~
moultano
They provided no evidence that it is selected based on anyone's identity.

------
tschellenbach
Google's feed does an amazing job of avoiding the filter bubble. One of the best
in the industry actually. I use the product and wrote a blogpost about it:
[https://getstream.io/blog/google-feed-personalization-and-re...](https://getstream.io/blog/google-feed-personalization-and-recommender-systems/)

It makes total sense for them to personalize search results. If I am searching
for Django it's the framework not the musician. When I search for a restaurant
name it's the one in Boulder, Co, not a restaurant by the same name on a
different continent.

People always adjust their messaging according to who they are talking to.
It's kinda weird how it's creeping people out when computers do this.

~~~
PavlovsCat
It's not creeping me out, it's disgusting me. When I want to look for django
reinhardt, I can look for "django music" or whatever, if I want the framework
"django web framework" should do the trick. Oh, and when I add a + before a
word, I want that word to show up; if there are no results with that word,
show me no results.

I'm fine with others having the option of personalized search, I'm not fine
with me not having it.

~~~
krrrh
Just in case you aren’t aware, google deprecated the + operator years ago. If
you want to ensure a single word is in your results you can put quotation
marks around it. It _mostly_ works.

~~~
PavlovsCat
Oh.. I tried it out and it worked perfectly, thank you very much! I'll gladly
take the egg on my face in exchange for learning this :D

------
ucaetano
Official reply:

[https://twitter.com/searchliaison/status/1070027261376491520](https://twitter.com/searchliaison/status/1070027261376491520)

------
eksemplar
2018 was the year I adopted ddg, not because of privacy, but because google
results suck.

Almost every time I search, I don’t get a single result I want on the first
page. The first 3 results are sponsored ads, then there is the Danish
Wikipedia article (useless), then 3-6 advertisements pretending to be content,
and then if I’m lucky something that was relevant 5 years ago.

DDG isn’t much better, but it’s better.

I’m not sure if search engines are really to blame though. With everyone being
on Facebook, Medium, Quora, reddit, 4chan and so on, it’s like the web just
stopped having content worth visiting.

If it weren’t for HN giving me interesting content, I’m honestly not sure why
I’d ever browse the internet anymore. But maybe I’m just getting grumpy.

~~~
maxxxxx
I use DDG out of principle but to be honest Google is still much better for
the stuff I am searching.

~~~
eksemplar
Google is better at localized stuff, but very often I prefer the English
version. Like in the case of Wikipedia, if my goal was to look something up on
wiki I’d always pick the English version, but on google it’s on the second or
third result page.

Quora is another good example, it’s a place I often visit after search
results, but on google.dk, it’s almost never a result, possibly because it’s
not in danish.

DDG is much better, but once in a while when I’m searching for something very
specific that I know google will find first, I’ll do the !g.

If I’m looking up anything technical or committing an act of google
programming, I’ll always go straight to google.

The other day I was looking for some pipeextenders for our shower though, and
none of bing, google or DDG were able to help. I ended up finding them by
searching on amazon. Google was 100% commercials for plumbers and completely
useless otherwise. DDG and bing had no clue what I was looking for. A few
years ago, google would have been able to help, I know, because google helped
me find our current ones.

~~~
drivebycomment
I just tried googling pipeextender, tried it on both google.dk and google.com,
and the results look reasonable to me - the first row is their sponsored, but
that's actually a list of pipe extenders so still useful. Then the organic
results all look reasonable.

What do you see when you google pipeextender ?

------
nemild
A small quibble: filter bubbles don’t require using personal data. But they are
about giving readers what they want.

In a very different context, I ran an analysis of terrorism coverage in the NY
Times to measure what a geographic filter bubble looks like:

How Media Fuels Our Fear of Western Terrorism

[https://www.nemil.com/s/part2-terrorism.html](https://www.nemil.com/s/part2-terrorism.html)

I also ran the same analysis for all the articles over a decade by geography
(and compared to population, GDP, etc):

Visualizing 10 years of International Coverage in the NY Times

[https://www.nemil.com/s/nytimes-international-coverage.html](https://www.nemil.com/s/nytimes-international-coverage.html)

While filter bubbles are more pervasive in digital media (where we can segment
each user, including with personal information), they’ve also always existed.

------
scarejunba
The filter bubble is the best thing ever. I use search engines to find things
and Google finds them for me. I want it to be super tailored to me and show
reputable results.

I remember 2000s era search and looking at Page 2. Now I don't scroll below
result five 99% of the time. Thank you, Google.

I have to say, though. US Google is better than any other Google I've used.

~~~
specialist
Agreed.

The problem is the term "filter bubble" conflates personalization, relevance,
and recommendations.

I can do without the recommendation engines.

Source: Worked on recommenders for a mid-sized e-commerce site.

------
DrPhish
I'm hosting my own SearX [0] instance to try and eliminate search bubble and
control my search history.

SearX is a metasearch engine that proxies out search requests and randomizes
all browser fingerprints to make it difficult for any individual to be tracked
via algorithm. I don't know how effective it is of course, but I find I prefer
the search results I get out of it vs google, even if the image search
interface isn't as flashy.

I put my instance behind https and simple auth to allow me a bit of security
while using it outside of my private network.

If you want the privacy shield vs google/bing/etc and don't mind a middleman
having your search history, there are public SearX instances as well [1].

[0] [https://github.com/asciimoo/searx](https://github.com/asciimoo/searx)

[1] [https://www.searx.me/](https://www.searx.me/)

~~~
decebalus1
My search productivity greatly improved when I switched to self-hosted searx.
I tried to advocate this to my network of friends but with little success. I
run it in a docker container and it's just so easy to manage and the results
are so much better.

Being open source, you're free to fiddle with it any way you want and I
consider it as a sort of condom for your privacy.

------
moultano
"With no filter bubble, one would expect to see very little variation of
search result pages — nearly everyone would see the same single set of
results."

This is the assumption underlying their research, and it is fundamentally not
true.

~~~
puzzle
The people that wrote that have obviously never run an actual search engine at
scale.

------
matt4077
The study’s results seem to be that users often get unique results. That’s not
the same as “personalized”, and it certainly isn’t evidence of “bias” as the
spreadprivacy.com link suggests. A good faith interpretation would point to
google running learning algorithms on their results. That would also seem to
be a far better explanation for Google changing parts of the page layout, such
as the position of news and video results. The use of the term “bias” for
describing differences in search results also trips my conspiracy theory
detectors.

------
jccalhoun
This doesn't show evidence of a filter bubble. It shows evidence of different
results. The filter bubble is the idea that we are in a bubble, cut off from
differing viewpoints.

(additionally, I am highly skeptical of the filter bubble's existence/effects
and the book was terrible - full of "mights" and "coulds" and few solid
facts.)

------
acd
I have a theory that filter bubbles are causing intolerance of other people's
views, i.e. most of the content we consume comes through filter bubbles. In
other words, most people consume content that is tailored to them. Thus we have
less acceptance of things that are not similar, as we are less exposed to
different content.

Filter bubble examples:

Search services: Google, Bing

Movies: Netflix

Music: Spotify, Apple Music recommendations

News: Facebook

Social media: Facebook feeds

~~~
onemoresoop
That's exactly what they're causing, because you have no easy way to search
outside your bubble and end up thinking that's the status quo, which in fact
disinforms you by omission.

------
jimmytucson
When they say "bubble", I think of groups of users with fewer differences
within the group than outside the group, sort of like the Wall Street Journal
study showed. If they're finding variation among users, but not predictably
more or less variation between any two of them, then that isn't a "bubble" to
me, it's just customization.

Customization is troubling, but less so than bubbling. (Hey now...)
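
One rough way to make that distinction measurable is to compare how much
result lists differ within a group against how much they differ between
groups. A minimal sketch, assuming Jaccard distance over the set of returned
links (which ignores ranking order) and entirely made-up users, groups and
results; the function names are hypothetical, and both kinds of pairs must be
present or the mean is undefined:

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |intersection| / |union| of two result lists, ignoring order."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)


def within_vs_between(results_by_user, group_of):
    """Mean pairwise distance for same-group pairs and for cross-group pairs."""
    within, between = [], []
    for u, v in combinations(results_by_user, 2):
        d = jaccard_distance(results_by_user[u], results_by_user[v])
        (within if group_of[u] == group_of[v] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


# Made-up data: two users per group, three links each.
results_by_user = {
    "a": ["x1", "x2", "x3"],
    "b": ["x1", "x2", "x4"],
    "c": ["y1", "y2", "x1"],
    "d": ["y1", "y2", "x2"],
}
group_of = {"a": 1, "b": 1, "c": 2, "d": 2}

within, between = within_vs_between(results_by_user, group_of)
print(f"within-group: {within:.2f}, between-group: {between:.2f}")
```

If the within-group figure is clearly lower than the between-group one, that
looks like a bubble in the sense above; if the two are about the same, it
looks like plain customization.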

------
cowkingdeluxe
Does Google constantly run AB tests on links to see which has higher CTR in
given positions?

------
mastazi
> Most people expect both being logged out and going "incognito" to provide
> some anonymity. Unfortunately, this is a common misconception as websites
> use IP addresses and browser fingerprinting to identify people that are
> logged out or in private browsing mode.

Firefox offers integrated protection against browser fingerprinting, but you
have to turn it on because it's off by default:
[https://support.mozilla.org/en-US/kb/firefox-protection-agai...](https://support.mozilla.org/en-US/kb/firefox-protection-against-fingerprinting)

Fingerprinting protection is also available on Safari on Mac OS X Mojave and
iOS 12:
[https://www.cnet.com/news/new-safari-privacy-features-on-mac...](https://www.cnet.com/news/new-safari-privacy-features-on-macos-mojave-and-ios-12-crack-down-on-nosy-websites/)

------
xte
A few scattered considerations. First, the "push to the extreme" effect: when
you see something that only marginally interests you but keep seeing it
because of "customized" results, you are in effect invited to dig deeper, and
for some kinds of results that pushes people toward the extreme. For example,
you search for something from a left or right party and before long you get
more and more "leftie" or "rightish" content.

That may not have much influence on normal, educated adults, but it may
influence young and uneducated people. Think, for example, of modern urban
legends like "white sugar is poison" or "chemical trails" and their "tam-tam
effect".

Another point is the "censor effect": we know well that search-based
information access is less detailed than taxonomy-based access; we experience
that often when we organize our mail, documents and files, alternating between
taxonomy-based and search-based UIs. When our entire world relies on
search-based UIs instead of taxonomies, whoever controls search may control
knowledge. It then becomes easy to "hide" something, "push" something else,
etc.

Normally this is not a problem; it starts to become one when a very few search
systems become so ubiquitous and dominant.

Then there is "convergence", tied to the first point: just think about feeds
vs. aggregators. With feeds you look for specific stuff and stay up to date,
while you tend to ignore things that don't interest you. With aggregators this
"soft polarization" effect gets somewhat lost, replaced by another
(potentially steered) "hard polarization" effect. As a result, general
information becomes less diverse (every publisher tries to be at the top of
every aggregator's results instead of following its own style) and people
become more "extreme" in their information interests.

That has far more implications than mere privacy. And if you add to the mix
the current state of communication systems like Whatsapp, GMail etc...

------
erva
Purposeful mixing or mild randomization of search results also seems like a
decent way to help obscure the ranking algorithms to help thwart reverse
engineering.

------
amelius
I'd like Google to show the full input to their search algorithm at the bottom
of the search results, so I know exactly which information it used.

~~~
bduerst
That would allow some pretty rampant gaming of SEO, wouldn't it?

~~~
amelius
If Google shows "these results were filtered using the fact that you're a
Caucasian, in the age-group 30-35, with a predicted income of $50.000", then
how is that going to help SEO much?

~~~
bduerst
Search keywords are a commodity. Allowing anyone to peek under the hood is
going to lead to bad actors mass scraping and gaming, even if the search
result signals you're sharing are aggregated.

------
ChuckMcM
Being able to know what someone is likely looking for is something that
really helps the search engine experience. "I find just what I'm looking for
and it is always on the first page!" is the sound of a delighted user of a
search engine.

The cost however is discovery, which is to say things you might be interested
in but didn't know exist. To enhance discovery you often need a wide band
curator that can surface "likely" interesting things without destroying the
experience of always finding what you want.

In the world of real goods these sorts of discovery curators are enthusiast
publications which might talk about the new things coming down the road, or a
restaurant critic that is trying the new restaurants.

Real human search and discovery is a pretty personal thing. And when it all
goes on inside your head/environment, it's pretty acceptable too: people
putting their favorite cookbooks in a more prominent place, or wearing
specific fashions that they like while only really shopping at clothing stores
that support that fashion look.

When that information is at a third party, and dissectable by tools, then it
gets creepy.

Someone who doesn't "know you"[1] but typically wants to sell you something
can find you and market to you, to help you "discover" something new on their
schedule instead of on your schedule. When that knowledge about what you like
and don't like, pay attention to and ignore, is weaponized into a tool against
you (ostensibly to help you see "great deals" that you might have otherwise
missed), whether it is a new job opportunity, fashion choices, the vehicle you
drive, or even where you eat lunch, that is where it gets annoying. And when
the version of you that you present to the world is quite a bit different than
the version of you that only you or your closest confidant sees, and someone
outside that circle gets a peek because of your search history and what you
have shown interest in? That is an existential threat of 'outing' the real
you.

That information is power: the power to influence you, the power to sell to
you, the power to expose you, the power to control how you see the world and
ultimately control your actions in that world.

Imagine a machine that, as people used it, condensed bricks of pure platinum
out of the air, as a side effect of the machine's operation. Now you tell the
owner of the machine: you can't sell that platinum, you need to just grind it
up and throw it away. Well, that isn't going to happen, even if there is a big
'for show' grinding operation taking place up in the lobby of the machine's
owner. The owner might say, "I charge you nothing to use my useful machine; I
am going to keep some of the platinum it produces to cover expenses."

[1] I'm using the phrase in the colloquial sense, where a "known" person is someone
who is both familiar and has been granted a certain level of access to your
inner thought processes.

------
jondubois
I don't think that search personalization is all bad though. It can be used
for either good or evil.

If search results were perfectly consistent, some smaller websites might not
get any search traffic at all and most big corporation websites would get all
the traffic. It would greatly exacerbate winner-takes-it-all effects and
inequality.

Personalization allows for some small websites to start with a niche and
slowly grow to become more mainstream.

------
throwawaylolx
So what personal information is Google using in private mode? Just location?
Browser fingerprint? IP?

------
jondubois
I'm surprised to read that the number of clicks drops by 50% if the result
moves down a single rank. Personally, I rarely click on the top link. I
usually click somewhere in the middle of the search results on the first page.

I understand why the result page number matters but the exact rank having such
a huge impact is surprising.

------
siavosh
I imagine a company with the data to infer a sequence of likely link clicks
from one identified affinity group to another, can in theory manipulate the
displayed suggested link series for a large enough population to shape
behavior/thought on a significant societal and global level.

------
citilife
> Back in 2012 we ran a study showing Google's filter bubble may have
> significantly influenced the 2012 U.S. Presidential election by inserting
> tens of millions of more links for Obama than for Romney in the run-up to
> that election

Thank god someone is discussing this. I think it's a real shame that the
"media" is focusing only on the 2016 election and discussing how Trump
manipulated voters. Sure, they ran advertisements (and Russia did), but the
reality is this is no different than any of the recent elections.

Obama was right at the forefront of this tactic:

[https://www.nytimes.com/2008/11/10/business/media/10carr.htm...](https://www.nytimes.com/2008/11/10/business/media/10carr.html)

Yet, when Trump does it (perhaps better executed, or the platforms are better)
it's "manipulating an election!"

Please, continue this research and keep it unbiased.

~~~
FilterSweep
The obvious difference being state sponsored actors from a foreign entity
weaponized this “feature.”

Not sure how you’re equating the two, but it’s disingenuous.

~~~
citilife
I actually would argue it doesn't matter if it's an internal or foreign actor
that's doing it. The results / outcomes are effectively the same, the problem
is the same.

It's disingenuous to claim a feature like targeted advertising is also
"weaponized". There's always been targeted propaganda, that doesn't make it a
weapon. Everyone is capable of self deception and their own decisions. If you
argue against that, then we probably should start debating whether or not
democracy is a good idea.

The problem here is that we've gotten to a point where _anyone_ can target _any
person or subset of people_. They don't even need to be a state actor. If we
view that as bad, then we should probably research it (not just the 2016
election, but all elections).

~~~
scrollaway
The point is about the wording. That this is _possible_ at all is bad
regardless of whether it's pro-democrat, pro-republican, pro-communist or
whatever.

But you complain people are calling it "manipulating an election". They're
calling it that because of _who_ is doing it (foreign agents) and _why_ they
are doing it (to gain control over a powerful enemy state). That is what makes
it election interference and it seems very purposefully blind not to
acknowledge it.

------
LiterallyDoge
This is a really cool application of research to describe something difficult
to grok for the average user (myself included) but which really highlights the
benefit of using an anonymized search engine with static results. It's like
saying, "There's room for us both." But with real metrics and data.

