If I had to tackle the notion of over-personalization in ~5 minutes, I'd say:
- If someone prefers to search Google without personalization, add "&pws=0" (the "pws" stands for "personalized web search") to the end of the Google search url to turn it off, or use the incognito version of Chrome. Personalization tends to be a nice relevance improvement overall, but it doesn't trigger that much--when it launched, the impact was on the order of one search result above the fold for one in five search results.
- Personalization has much less impact than localization, which takes things like your IP address into account when determining the best search results. You can change localization by going to country-specific versions of Google (e.g. search for [bank] on google.co.uk vs. google.co.nz), or on google.com you can click "change location" on the left sidebar to enter a different city or zip code in the U.S.
- We do have algorithms in place designed specifically to promote variety in the results page. For example, you can imagine limiting the number of results returned from one single site to allow other results to show up instead. That helps with the diversity of the search results. When trying to find the best search results, we look at relevance, diversity, personalization, localization, as well as serendipity and try to find the best balance we can.
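To make the "limit the number of results from one single site" idea concrete, here is a minimal sketch. It is an illustration only, not Google's actual algorithm; the function name `cap_per_site` and the `max_per_site` parameter are made up for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def cap_per_site(ranked_urls, max_per_site=2):
    """Keep ranking order, but let each host contribute at most max_per_site results.

    Hypothetical helper for illustration; real diversity logic is surely more involved.
    """
    seen = defaultdict(int)  # host -> number of results already kept
    kept = []
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        if seen[host] < max_per_site:
            seen[host] += 1
            kept.append(url)
    return kept


results = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://example.org/x",
]
print(cap_per_site(results))
```

The third example.com result is dropped, which lets the example.org result move up.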
I saw Eli Pariser's talk at TED and was skeptical, although I did enjoy his example of Facebook starting to return only his liberal friends because he only ever clicked on the links his liberal friends shared. I had a number of concerns browsing through Pariser's book, but I would encourage anyone interested in these issues to pick up a copy; it's a thoughtful read.
I agree, I don't believe that over-personalization is an issue. (As I already said in another comment.)
But here's why some people probably don't like personalization: It's invisible. There is nothing on the results page that tells you whether your results are personalized or not. Sure, you can look at whether the browser is in incognito mode or you can look for some parameter in the URL, but these things require that you already know about personalization.
In contrast to personalization there are various indicators that a page is localized. The most prominent is obviously the language of the text. As soon as all the search results are in my local language it is very obvious to me that I got localized search results. I can also detect localization by looking at the Google logo on the homepage (localized versions have the country name in grey text below the Google logo), by looking at the language of the Google interface, by remembering the domain name I used to access Google and by looking at the sidebar on the left that even displays a guess of my location on the city-level.
There are no such indicators at all for filtered/personalized results. Every user around me starts with the same version of Google results. That's how everyone got acquainted with Google in the beginning. Same results for everyone. There is no reason for a user to question that until you see the differences by comparing search results, which most users won't. Someone who doesn't happen to work at Google or didn't hear about the "filter bubble" can't know that search results will start to diverge from vanilla results over time.
So personalized Google results violate the principle of least astonishment.
While I can't speak for other people, I think that the concerns that some people have about this are rooted in the fact that you can get trapped in some sort of feedback loop without ever knowing. If you could see that your results are personalized you could compare them to unpersonalized results and decide for yourself which you like better.
Here's a simple demo. Do a search in Chrome incognito mode and go to the bottom of the page. You won't see a link that says "View customizations." Now do a search in regular Chrome and check for that link. When I did a search for [matt cutts] in regular Chrome, I saw the "View customizations" link. Clicking the link gives this message:
"Search customization details: matt cutts
When possible, Google will customize your search results based on location and/or recent search activity. Additionally, when you're signed in to your Google Account, you may see even more relevant, useful results based on your web history.
The following information was used to improve your search results for matt cutts:
Web History: One or more items in your Web History were used to improve search results.
Manage Web History
Remove Web History from my Google Account
If you're curious, you can see what a search for matt cutts looks like without these improvements.
The 'More details' link on your search results page can be used to display this page for approximately 30 minutes, after which it will no longer show this page."
In other words, not only can you tell whether a search results page was personalized, you can click a link right on the search results to see exactly what criteria were used to personalize the results. And that page has a clear link to run the search again without personalization.
As I mentioned before, personalization is typically a minor effect in Google's search results and it's almost always an improvement. But for people who are worried about potential "over-personalization," we do provide easy ways to see when a search was personalized, why it was personalized, and do the search again without personalization.
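For anyone who would rather script the `pws=0` route mentioned earlier in the thread, a minimal sketch follows. The function name is made up for this example; only the `pws=0` parameter itself comes from the discussion above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def without_personalization(search_url):
    """Return search_url with pws=0 set, replacing any existing pws value."""
    parts = urlsplit(search_url)
    # Drop any pws parameter already present, keep everything else in order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "pws"]
    query.append(("pws", "0"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(without_personalization("https://www.google.com/search?q=bank"))
```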
Thank you for the clarification. This comes as a surprise to me. I did not know that.
In my defense, I couldn't know about the "View customizations" link because I do have web history turned off, so apparently I never saw any personalized search results. After reading the DuckDuckGo page I expected that everyone's search results get personalized, especially if I am logged in with a Gmail account.
It's obviously not your fault that I didn't know about that, but, on the other hand, you can never expect a user to know the contents of any help page. Clicking on "help" links is not what most users do. (imagine smiley face here, I don't dare to do that on Hacker News)
Additionally, I think that the "View customizations" link is a bit misleading, because usually customizations (in terms of software) are not automatic. At least I would expect that customizations are something that I do.
Also, the link seems to be placed at the bottom of the page, which means that 99% of the users are probably blind to it. (I can't verify where it is actually placed, because I don't see it.)
After all, I am thankful for the great search results that Google offers. Thank you for your hard work.
Happy to discuss this, jannes. Your points are well-taken: when we first launched the ability to see why/how results were customized, we added a link at the top-right of the search results (the Search Engine Land article has a snapshot from those days).
But there's another guiding principle that things on the search results page need to "earn" their pixels. Since personalization is a second-order effect and very very few people ever cared enough to click the link and get more info, eventually that link made its way to the bottom of the search results.
I'm sorry, but a "View Customizations" link is nowhere near as clear as a simple statement like "These search results have been personalized." right on top of the search results page, where you can't miss it.
But the real solution is to use a search engine that does not track you. Even better is to use it in such a way that it can't track you (i.e. through a Tor proxy, while taking other reasonable precautions).
It's interesting that the response to people possibly not noticing the link was to make the link less noticeable.
Instead, you could have tried to make it more prominent by (for instance) moving it to the upper left rather than the upper right of the search results, right above/below the ads.
Another issue that might be interesting to explore is to what extent users really understand what search customization is, and whether they'd care more or less about it being done automatically once they understood it better.
I have a feeling the vast majority of them probably wouldn't care, and take the attitude of "do whatever it takes to make the results you return more relevant, and I don't really care how."
If the flickr image is the actual size, then no wonder it was not noticed. No matter how much I customize all my interfaces -- and with the increasing pixel count of displays -- interfaces are constantly populated with immutable 8 point fonts. Any font less than 14 points is fine for 1985 and VGA displays; but not anymore.
After knowing that there was a "View Customizations" link, it took me > 1 minute to find it. It is in the most unintuitive place where 99% of the time I don't even scroll to. Sorry this is in no way advertised.
Google also filters special terms like bittorrent in instant search. This is part of that bubble and people don't even realize it. That's the point: most people won't notice, not that the views are not there.
It's like experts-exchange.com putting content below the long footer of the page. Yeah, they can claim it's there, but many won't notice.
Here is a slightly different question, how do you get "no country redirect" to stick reliably?
I can't begin to describe how annoying it is that I am presented with a different language when travelling to a different country. All I ever want is Google in English, but it keeps going back to a localized search regardless of the many times that I choose "google in english".
Good question. I just saw an expert in the hallway and asked him. The basic answer is that your preference is stored in a cookie, so the preference would be forgotten if you're clearing cookies. If you still have the same cookie and yet the "no country redirect" isn't sticking, that's a bug we could dig into.
By the way, I asked why the "no country redirect" isn't stored with your Google account rather than with a cookie. The main reason he gave was that whether to do a country redirect is one of the first things we decide, and it's faster to use a cookie for that than to go looking up the user's account setting. Or at least, cookies have been faster up until this point. Hope that helps explain things.
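A toy version of that tradeoff, to show why the preference vanishes with the cookie. This is a sketch under loud assumptions: the cookie name `ncr`, the function name, and the `home_country` default are all hypothetical, and none of it reflects how Google's servers are actually written.

```python
from http.cookies import SimpleCookie


def should_country_redirect(cookie_header, ip_country, home_country="US"):
    """Decide on a country redirect from the request alone, with no account lookup.

    The cookie name "ncr" and all values here are hypothetical.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if "ncr" in cookies and cookies["ncr"].value == "1":
        return False  # user asked for no country redirect
    return ip_country != home_country


print(should_country_redirect("ncr=1; other=x", "NL"))  # False
print(should_country_redirect("", "NL"))                # True
```

The second call is the "cleared cookies" case: with no cookie on the request, nothing remembers the preference, and the IP-based redirect comes back.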
I regularly see this without clearing cookies, so yes, I would consider this a bug.
There seems to be no discernible pattern as to why it works in one session, then I suspend the laptop, go somewhere else, and it stops working. Or it'll work twice in a row and when I return to location A (both in the same foreign country), it stops working.
This annoys me more than anything - not just google, but for a large and growing number of sites. They completely ignore your browser settings and select language based on IP address. I installed the google international search plugin from mycroft which has solved my problem with google - but still suffer the myriad of other sites that ignore my browser's config.
I think in the past we saw a lot of people with their Accept-Language header set wrong, which is why we haven't used it. But we've been having a good discussion internally about the "my language won't stick" issues raised on this thread.
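For readers who haven't looked at the header being discussed, here is a minimal sketch of how a server might read `Accept-Language`. It is a simplified parser written for this thread, not Google's code, and it ignores details such as `q=0` meaning "not acceptable".

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default weight when no q= is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat malformed weights as least preferred
        # Sort by descending quality, then by original position in the header.
        prefs.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(prefs)]


print(parse_accept_language("en-US,en;q=0.9,nl;q=0.5"))
```

A browser sending that header is asking for English first and Dutch only as a fallback, whatever country its IP address is in.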
Some of the time it doesn't, e.g. now when Google has custom logos I'll get a search term in Dutch (I'm in The Netherlands) when I click on it, even though I'm using google.com in English when doing so.
I don't want to jump so many hoops just to do one damn search. Personally, I hate the country specific personalization, and perhaps it's useful to most people in my country rather than showing more US results, but I'm really not interested in those types of results myself. I wish I could just check a box in my Preferences and then be able to see the universal search results.
I know I can use /ncr at the end, but I'd rather not have to do that all the time, and I think you can't even make that the default search for the Omnibox in Chrome, which means I have to give up Omnibox, which I love using, in order to get away from personalization. And even then, I think it just means I won't see my country specific results, but it probably still personalizes my search results through other types of signals.
So fine, don't make universal search the default way to search, but just give me a checkbox so I can turn it on when I want to. I want to see the best results, period - not the best results for me (or whatever Google thinks are the best results for me).
"I want to see the best results, period - not the best results for me"
But aren't you a part of the relevance equation? The ideal results for a search like [bitcoin crash] should be different for a Japanese-speaking searcher in Tokyo vs. a German-speaking searcher in Munich vs. a bitcoin expert vs. a programmer trying to diagnose why compiling bitcoin is crashing vs. my Mom who has never heard of bitcoin before, right?
Good question; one person is annoyed by this endlessly a few offices down from me. It's hard to make sure that things are handled consistently sometimes, but if you use google.com/ncr or the "Google.com in English" link at the bottom of the home page, that should help. The "Language tools" link to the right of the search box should also let you set a cookie with your language preference.
And yet for some reason, the search link from Google Toolbar seemed to ignore that cookie and certainly ignored my account preferences, sending me to the localized search page regardless.
Short version of long story: I used to bounce my net traffic through an ssh tunnel to a hosted VM. The VM was moved to a new machine in South-East Asia. Having Google Toolbar constantly send my searches to the localized engine, despite the cookie selection, my being logged in and my preferences being clearly set, was more of a day-to-day annoyance than having my traffic piped across the Pacific twice.
I appreciate that you went and asked someone for the clarification that this issue is about the preference being set in a cookie and not in user settings, but this doesn't solve the problem for many of us.
For YEARS, on a weekly and sometimes daily basis, always logged-in, always with preferences set to English, it is infuriating to routinely end up receiving results that conflict with your explicit language settings.
What, concretely, do we have to do to get someone at Google to push a change from using cookies to a real user-setting to fix this absurdity?
You said they did it for speed. Giving me completely incorrect results in a language I can't even read 1 millisecond faster than giving me results that I actually care about is a win? This is over-optimization.
Personalization tends to be a nice relevance improvement overall
I agree completely. Reading that page, my first thought was "if (not saying I don't) Egypt is a place I probably don't plan on going to, why should I waste time looking at those links in my search results?"
Overall, I think personalization (wow, this isn't a word?) reaches its own form of market efficiency. If the personalization algorithms are bad, then people would shy away from the search engines that provide them. However, if they make sense, and return the most relevant results most of the time, then that's saving us a lot of time.
If someone prefers to search Google without personalization, add "&pws=0" (the "pws" stands for "personalized web search")
"Append a cryptic query param" is a terrible user interface, and it's a little silly to suggest this as a solution. Make it an option real people can discover and use.
when it launched
Just curious, why add this qualification? Is it significantly different now?
the impact was on the order of one search result above the fold for one in five search results.
That statistic needs a little clarifying. Query term frequency follows a power law distribution, doesn't it? So if you're counting each individual term, the fact that 1 in 5 have altered results could very easily still mean a majority of actual searches are altered. And depending on how you calculated the 'one search result above the fold' number, it could very easily still mean that when a page is altered, it's altered significantly.
More interesting would be knowing these stats for just the fat head of the query term distribution.
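A quick worked example of the power-law point, as a sketch under stated assumptions: query popularity follows a Zipf law with exponent 1, there are 1,000 distinct terms, and one in five of them gets altered results. None of these numbers come from Google; they only show how far "share of terms" and "share of searches" can diverge.

```python
def volume_share(altered_ranks, n_terms):
    """Share of total search volume going to the given term ranks under Zipf's law."""
    weights = [1.0 / rank for rank in range(1, n_terms + 1)]  # frequency ~ 1/rank
    altered = sum(weights[rank - 1] for rank in altered_ranks)
    return altered / sum(weights)


n_terms = 1000
one_in_five = n_terms // 5

# Case 1: the altered terms are the most popular ones (the fat head).
head_share = volume_share(range(1, one_in_five + 1), n_terms)
# Case 2: the altered terms are the rarest ones (the long tail).
tail_share = volume_share(range(n_terms - one_in_five + 1, n_terms + 1), n_terms)

print(f"altered terms are the head: {head_share:.0%} of searches")
print(f"altered terms are the tail: {tail_share:.0%} of searches")
```

The same "one in five terms" covers most of the search volume in the first case and only a sliver in the second, so the statistic says little until you know which queries it counts.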
Do you have a pointer to a publicly visible reference to the definition of some of the other query string parameters Google uses? Just thought I'd ask, given the context. I've noted various references people have cobbled together, but perhaps there's something a bit more... "canonical". Thanks for the above.
A much better alternative is https://ssl.scroogle.org , which doesn't have personalization since Google can't tell scroogle users apart. It also has benefits like having a bit of privacy while you search.
I think Scroogle hits Google from a relatively small set of IP addresses. Be aware that Google is probably trying to localize for those IP addresses, so your results could be less relevant (in the same way that if you searched through a proxy in Germany, you'd be more likely to get results with a German emphasis).
Personally, if I have a search that feels a bit sensitive, I just hop into incognito mode in Chrome. Control-shift-N is an easy shortcut to open an incognito window in Chrome. And don't forget that you can use https://encrypted.google.com/ to do a search via SSL as well, which provides an encrypted tunnel between your browser and Google.
Your recommendation is hilarious because it still does not give the user any privacy. Neither private browsing mode nor SSL will give the user any privacy from Google, which is the real concern. You still log searches and IP addresses for an extremely long time.
One solution that actually does solve this is to use Scroogle or to access google through Tor.
EDIT: I feel that I should clarify that of course SSL is a very important feature to have for a whole slew of reasons and that I'm glad Google supports it, just that it wasn't relevant to my point.
"Your recommendation is hilarious because it still does not give the user any privacy."
I'm a big believer in prioritizing actual issues over perceived issues. Using SSL search prevents bosses, ISPs, and governments from sniffing your queries, and I consider those to be the largest threats to your privacy. Some ISPs sell their customers' query data and surfing patterns, for example. In contrast, when the Department of Justice tried to subpoena two months worth of user queries, Google resisted that challenge and won in court. Having worked at Google for 11+ years, I know that my colleagues care a great deal about our users' trust and privacy and work to protect it with features like SSL search, two-factor authentication, warnings when sites might be hacked or hosting malware, etc.
If you're that worried about Google, don't use it, but if you still want Google results but with as much anonymization as possible, I would choose Tor+incognito-Chromium instead of Scroogle for your searches.
Ironically, "the real concern" is personalized. The married guy searching for "hot local hookups" doesn't care what google knows, he just doesn't want it to show up in his browser's history. The junior high student reading The Big Book of Mischief in the computer lab doesn't want his school's network monitors to find out. The dissident in Tunisia doesn't want his government monitoring his Internet usage.
There's only a small set of privacy-conscious Internet users who should be concerned about Google, whether or not they're as impregnable a bastion of privacy as their employees might claim.
Yeah, this is an important issue. Google apparently ranks results (for you personally) based on some 57 inputs, even when not logged into Google services. In short, results suffer from a self-reinforcing feedback loop, forever constraining what you see.
I wonder how easy it is to get "clean" or "default" results from Google?
And I know it's been discussed many times, but just how easy would it be to maintain real anonymity across the web?
If you are using Firefox, right-click in the input field, select "add keyword search", and add a keyword. Now you can easily search via scroogle by entering the keyword and then the search terms in the address bar.
Incognito/private browsing + going through DuckDuckGo (even for Google queries, with !g prefix) works for me. (Edit to add since I can't reply: going through DDG avoids Google redirecting to country-specific variation).
No, it's a non-issue, and by the way Google probably uses many more than 57 'inputs' (signals) to determine relevance. Personalization is just a different term for relevant results and this is just FUD, the source of which is people who don't understand the technical aspects.
Personalization can, however, fail. I have a German IP address which causes Google to show me predominantly German search results. That’s not what I want at all most of the time.
There are ways to sort of get around that but they are cumbersome and they don’t always work right.
Google is pretty good at figuring out what to show you depending on the language of the search terms you are using. When there are German words in my search query Google will show me predominantly German results. That’s to be expected, that’s what I want. The problem is that Google seems to use my location (IP address, maybe also the language of the interface and whether I’m using google.de or google.com) and override that behavior so that even if I’m using English words in my query it will nevertheless show me predominantly German results.
Read my last paragraph carefully; I'm just pointing out the name doesn't matter, it's a real phenomenon that doesn't go away by changing the name.
My personal stance is actually a great deal more nuanced, which is that you can't not be in a bubble. It is mathematically impossible. Any way of slicing the torrent of information coming at you constitutes a bias. The entire idea of "piercing the bubble" is an instance of English misleading you, it's a concept without a referent. The question is not how to "escape" the bubble, the question is how do we choose our bubble?
So filtering/personalization is always present; we entirely agree on that point. Is it excessive, though? If it's not excessive, then, almost by definition, it's not a problem - or at least is not a major one. There is actually a huge difference between "personalization" and "excessive personalization", which was what I was trying to get at.
(Also, paragraph != sentence)
 Excessive meaning "more than is necessary, normal, or desirable".
A paragraph may legally consist of one sentence. It may legally consist of one word in some cases.
You seem to be trying to draw me into defending a point I'm not making. I'm making a much more subtle one, which is that you can't escape being in a bubble (not the bubble, which I initially typed, because there isn't the bubble, there's all kinds of them), so in a way arguing about whether it's "excessive" isn't even the right dimension to argue on; the filter bubbles simply are. (Not "simply are excessive", simply are; they simply exist regardless of whether they are excessive or desirable or anything else.) The question is, what should be done about that fact, rather than how do we prevent that fact from being true, and to be honest I'm rather ambivalent about the answer to that question, because the answer is dominated more by your preconceptions and pre-existing goals than anything interesting.
I'm gonna humbly disagree. There are many potential ranking factors in a search result: how well the query matches the document text, how important that text is in the document, how important the document is (PR) and so on. What I don't necessarily want is further ordering by what a machine thinks is my political inclination or world view.
I would agree with you if there was a button I could click to turn it off. Their algorithms aren't infallible by any means. Plus, ultimately they are a business and trying to sell, so they are going to eventually skew results towards things they think I may be willing to buy, rather than things I want to know. I think G is more susceptible than MS, but only because MS does make a living selling other stuff.
I'm not saying it's evil or wrong, just that if my first results don't appeal to me, a simple measure that might work for me would be to turn the filter off. Anyway, they could learn more about me if they let me do that.
Great! Put a button on the search page. Will "pws=0" allow the engine to "learn" from my corrections? I'm pretty sure incognito won't. But allowing me to "correct" the personalization might be useful to you.
This entire argument is simply without merit. Google is, and always has been, a filter.
If I type in "Barack Obama", it will not show me links about The Green Bay Packers NFL team. This is not because Google is conspiring to keep me from reading about the Green Bay Packers. It is because it is most likely that I am not looking for Packers links and will not click. The general idea is that Google starts with every single piece of content on the internet and filters to get the content it thinks I am looking for.
Now, this article complains that Google is flawed because it will more likely show some MSNBC over Fox News, or vice versa. The implication here is that you never click on Fox News when it is presented. Because if you did click on Fox News and its ilk from time to time, Google wouldn't start filtering it in the first place. The problem isn't that the search engine creates the "bubble", it is that the user does!
So if the scenario presented in this article offends you, then start changing your behavior and browsing more diverse sites. Otherwise, don't blame Google for noting that you hang out in a very narrow corner of the internet, and presenting you links from that corner. It's just doing its job correctly in that case.
I heard about this book on NPR a couple of weeks ago. I really don't like this biased view of ML.
In summary, what he said on the show was something like: at least the news shows (and newspapers, radio) give the same information to everyone, so instead of showing Kardashians news they do show you Bin Laden news, even though they know the Kardashians are more profitable... This is not the case with Google Search or Netflix, Yahoo, Bing, etc. (all attacked by this author).
I think that ML helps more than it hurts, and viewing it in a non-technical way is wrong. The author gives the impression that Google (the company and its execs) manually (via algorithms, but very manageable in his opinion) changes the search results to not show things that they don't like, which then raises the question "do we really trust one company?".
It's arguable that the search results in Google or Netflix are optimized for profits, but how do you make profits in the customer industry? IMHO you do that by making customers happier, showing useful results for them, not for everyone.
I'm waiting for the time when I google: "what channel and time is Conan on" and I get "channel 43 11pm" as the result. Of course that is very personalized, and the result will be just for me... but again I'm the one searching and I'm the one needing the results.
I think there's reason to complain that the reality is far from the ideal. Take Netflix, for example. I rated hundreds of movies, filled out the taste preferences to narrow down the genres I was most interested in, but the "Suggestions for You" were underwhelming, to say the least. Even worse, they simply never changed. So I went back into my taste preferences and checked that I "Often" watched every single mood and genre listed. Now I get a vastly improved range of suggestions, exposing me to some great movies, simply because Netflix is no longer hiding them from my view. I may have a unique individual taste for movies, but Netflix sure hasn't fathomed my criteria with their algorithm.
Exactly! It's just nonsense peddled by those who don't understand the technical aspects of search, people who don't realize that a search engine's prime function is to filter the millions of results for each query down to the most relevant results for the user, and that the same 10 results are not relevant to each and every user.
And users, regardless of any personalization, can dig through any initial results.
It's just annoying and ignorant bullshit being disguised as a real issue.
Thx -- in my experience it really takes a week to get into it. If you can stick it out I'd love to get your additional feedback after that point.
Point of fact: it isn't MapQuest, it is OpenStreetMap served via OpenMapquest, which uses their resources to forward that project. You can read more about it at http://openstreetmap.com/ (left column) and http://open.mapquestapi.com/ - in any case, maps are relatively new and in process.
Confirmation bias is bad enough already, it's really a shame that powerful companies like Google reinforce it just for the sake of giving you more pleasing search results. (Pleasing and Good overlap, but they're not equal.)
Edit: I didn't intend to bash Google specifically. But they are faced with a choice in which their own interests conflict with those of their users. And as Capitalist Bastards are more and more accepted in our society, we don't blame them for the selfish choice. Maybe not a shame then, but at the very least a pity.
This sort of personalization is no different than using a user's click history to determine whether he means cycling or motorcycling when he searches for "biking". It's no different than using history to determine whether someone is looking for a television schedule or coding help when he searches for "programming".
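The "biking" case can be sketched in a few lines. This is a deliberately naive illustration, assuming each past click has already been labeled with a topic; the function name and the topic labels are made up, and real systems use far richer signals.

```python
from collections import Counter


def pick_sense(query_senses, clicked_topics):
    """Choose the sense whose topic appears most often in the click history.

    Falls back to the first listed sense when history gives no signal.
    """
    counts = Counter(clicked_topics)
    # max() returns the first item on ties, so the default sense wins a tie.
    return max(query_senses, key=lambda sense: counts[sense])


history = ["motorcycling", "cooking", "motorcycling"]
print(pick_sense(["cycling", "motorcycling"], history))
```

With an empty history the function returns the first listed sense, which is the unpersonalized default.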
One person's "confirmation bias" is another person's "relevance". Increasing relevance to the user will naturally result in confirming their biases, because there's (probably) no programmatic way to discern contentious subjects in which confirmation bias is applicable from non-contentious subjects where it's not.
The only "shame" I see here is people who ascribe some sort of devious intention to what's clearly the natural result of trying to solve the most important problem in search.
My phrasing was a bit harsh. I understand that Google (and others) are mainly out to make money. To do that they have to please users most. Anyway, they probably can't tell the difference between "relevant" and "pleasant", because users' behaviour only grants them access to "pleasant".
As long as this is done in a neutral way (by delivering the same result to everyone), any confirmation bias will be averaged across entire populations, so this should be okay.
Personalized results however make the results noticeably more pleasant, and significantly more biased (this is probably unavoidable). Of course Google, Bing, and Co would shun that bias thing. Who can blame them?
I don't want to blame Google specifically. I want to point out this old, common moral dilemma: make money, or don't hurt anyone? Google took the money. Many do. I'm not sure to what extent we should blame them, but clearly, the System™ has room for improvement.
I cannot see a problem here. Who exactly is being hurt by the "filter bubble"?
The end user is fine - they are more likely to see results they are actually interested in. If a user doesn't trust a source and won't click on their links, they'll soon not have to bother scrolling past them.
The sites themselves actually benefit as well. Sure, they may be bumped from the first page of results for users that are unlikely to visit their site, anyway, but the tradeoff is that they get a higher position for the users who may actually visit their site. It's an ideal trade for those being filtered.
I suppose that leaves the idea that the end result is a "biased" internet. I don't buy it. Google is not removing sites that disagree with them, they are re-ordering them for different users. If your profile wasn't factored in, then what options do they have?
They could order on popularity, but biasing towards popular opinion isn't any better than biasing towards my opinion.
They could randomize the order, this would be without bias, but absolutely useless to anyone.
They could judge the objective truth of sites, but that's far more biased than any of the other options.
The end user is not fine. He is more likely to see results that he actually agrees with. See, the original confirmation bias will cause you to seek opinions you agree with more often than others. The search engine will then conclude that you are more interested in the kind of sources those opinions come from. That would be true, by the way, but then comes a point when a quick glance at your search engine results will show you more of what you agree with, and less of what you disagree with.
Now go use that as an estimation of popularity and veracity. I bet many people do, without knowing the result is strongly biased by their own prior behaviour.
Search engines, as the sole entry point to the web, do bias it. PageRank, for instance, could trigger a feedback loop: if a site is more prominent in searches, it will get more links. That will get it more search prominence, and feedback and foom.
Now is the popular bias better than the personal bias? I think it is. One would at least get to be exposed to others' opinions, instead of just his own.
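To picture the loop described above, here is a toy sketch, not a model of any real search engine. It assumes that each round's new links are handed out according to current link counts raised to a power `bias`; the function name, the site names, and every number are made up for illustration. With `bias` above 1 (prominent sites get more than their proportional share of attention) the leader's share of links keeps growing; with `bias` equal to 1 the shares stay where they started.

```python
def simulate_link_feedback(initial_links, rounds=20, new_links_per_round=100.0, bias=1.5):
    """Toy model: prominence earns links, links earn prominence.

    initial_links maps a site name to its starting link count.
    bias > 1 is an assumption meaning that more prominent sites attract
    a more-than-proportional share of each round's new links.
    """
    links = dict(initial_links)
    for _ in range(rounds):
        # Attention each site receives this round, based on current links.
        weights = {site: count ** bias for site, count in links.items()}
        total_weight = sum(weights.values())
        for site in links:
            links[site] += new_links_per_round * weights[site] / total_weight
    return links


# Hypothetical starting point: two sites that are almost equally linked.
start = {"site_a": 55.0, "site_b": 45.0}
end = simulate_link_feedback(start)

share_before = start["site_a"] / sum(start.values())
share_after = end["site_a"] / sum(end.values())
print(f"site_a share of links: {share_before:.2f} -> {share_after:.2f}")
```

Whether real link growth behaves like `bias > 1` is exactly the open question; the sketch only shows that the loop needs that kind of disproportion to run away.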
If you just care about the economy of the web, in the sense of selling, advertising, promoting, buying… then of course the personal bias is currently best. That's the most efficient way to milk the tear$ out of eyeballs. The easiest way to reward the brains behind those eyeballs. When it's all about money, there is absolutely no problem with the method. But I have other values besides money. A very important one is respecting curiosity and search for truth. The personal bias doesn't.
The purpose of search engines is filtering content to give you what you want, which might be information, or it might be discussion, or polemics, or gossip. If you want to learn about some topic that has some element of subjectivity, familiarize yourself with the different points of view and read the writing of whoever you are interested in. There are lots of tools on the Internet to facilitate this -- Wikipedia, for example, is built around an ideal of giving people an objective survey of different things. If you type "climate change" or "Barack Obama" into Google and form an opinion based on the top results then fuck you.
The purpose of search engines is filtering content to give you what you want
The purpose of search engines is filtering content to give us what we asked for. Unless they developed mind-reading technology, Google doesn't really know what I want and attempts at guessing it will lead to substandard results.
-- Wikipedia, for example, is built around an ideal of giving people an objective survey of different things.
A website where any amount of divergent opinions get edited down to a single article on the subject (target of edit wars and well-known editorial biases) is hardly "an objective survey of different things."
The purpose of search engines is filtering content to give us what we asked for. Unless they developed mind-reading technology, Google doesn't really know what I want and attempts at guessing it will lead to substandard results.
I think you really hit the crux of the issue here. My opinion is, these search engines are built around the UI paradigm of "type something into the box and go find that thing." The user doesn't have one box for finding discussion forums, one box for finding blogs that have people they agree with, one box for finding material that appeared in print publications, etc, and yet search engines are for finding all these things, if you want them.
As long as there's people typing "climate change" into Google, Google has to guess at what they are asking for because there ain't enough bits in the query to tell it. There's no a priori reason to expect that they are asking for the most informative and accurate links covering a wide variety of perspectives on climate change; many people probably aren't.
Regarding Wikipedia, well, that's why I said it was the ideal. You're never going to crowdsource perfect objectivity and truth from a million biased writers with ulterior motives, but they try, and they do an OK job on many topics.
How do I know what filters form the top results if they aren't transparent? What would lead you to believe that familiarizing yourself with a different point of view would take you down a meaningfully different path through the search graph? None that I can see. Every search has a non-objective filter. The original page rank is one such. What would be useful is to make the tree obvious and manipulable as a separate object itself.
I agree that it would be cool to expose the different criteria that Google (for instance) is using to help reorder your results, to the degree that those criteria can be discretely identified, but I'm not surprised that they don't; that's a pretty significant portion of their secret sauce they would be publicizing.
I also don't think that mere transparency would really help solve any search engine "filter bubble" problem, if such a thing is real. Nothing would make Joe Google User take five minutes off whatever he came to search for to fiddle with some tree full of sliders on his result page.
Agreed. I wouldn't either 80% of the time. But often I'm making a very directed search that I would really like the "best" results for where best is usually defined as different than the results I'm getting.
When the filters themselves aren't transparent, it is a major issue.
I would gladly prefer a "pre-filtered" list, as long as I could tweak it. For example, if I'm searching for viewpoints that don't correspond with my political views, it'd be nice to be able to find those by disabling the "political bent" filter based on personalization.
To not do so is to constantly wear rose-tinted glasses... pleasant, but ultimately, dangerous.
In all seriousness, this is actually my main use case for the private browsing mode in chrome: to search google without the filter bubble(1)
It's quite shocking to see just how much those results differ from the ones I'm usually served, actually.
I know it's actually supposed to be 'awesome' to have every search tailored to _you_, but it just makes me feel uncomfortable that I'm not seeing the internet "the way it's supposed to be seen" - if that makes any sense.
(1): Or at least a smaller bubble, considering it still knows my location - even though i use google.com, my os, my browser, etc...
We often forget that, although it's obvious search engines filter results, the information we see on social sites is also filtered.
Consider users of Reddit. Now most of them would consider themselves very open minded and enlightened, yet there is active discouragement for radical ideas without due consideration as to their merits. It's just easier to downvote and look at Mario cake.
Overall, I think in a way we NEED filters to remove the faff, but be careful to keep a social circle which encourages radical ideas to be brought out into the light of logic and due consideration.
I think this is a real issue, and I am glad that DDG is addressing it. This is a more-compelling take on the tracking issue IMO.
What I'd really like to see, is a search engine to allow me to do both. I'd like to have a profile (that didn't use my name), and when I wanted to, I could click a result as 'useful'. This would go into my personal algorithm. I could then toggle between filtered search, or unfiltered search, whenever I like.
It might even be useful to build search filter categories. But I'd keep that a bit buried for the power users.
Incognito mode isn't just disabled history recording. It does not use any of your cookies from main session, and it deletes the cookies when it is closed. It also isolates your extensions from running incognito unless you elect for them to be.
I just had to explain this to my co-founder last night after he freaked out because our old site with a different name was showing up first in his search results with our new name as the query. With web history off the old site wasn't even on the first page. Now I just have to explain that no one searches for early-stage startups let alone cares if they change their names :)
That's not what he asked. He asked (facetiously) about the higher-order search bubble: ie, if I only ever search for articles about Erlang innards, I'm never going to read about The Kardashians, which is relevant to a sizeable portion of contemporary American culture. To truly pop my bubble, DDG ought to throw in completely random results every now and then.
It almost seems like DDG is trying to get me to think personalization == censorship. Sorry, but that ain't gonna fly.
You've just undermined your own search engine's potential to use such a feature in the future -- people will be spitting quotes back at DDG about how opposed to this feature they were. Know your audience.
It is well known that people seek out information that reinforces their own beliefs. And while it may not be ideal for an enlightened population, why should Google/Bing fight this natural inclination? Their job is to ultimately give their users what they are searching for, not challenge their users' opinions. Especially since there are tons of benefits to personalized search, and it may be hard to not throw the baby out with the bathwater.
This mentality ("give people what they want") results in the empty, sensationalist journalism that most people dislike. When I read the news, I don't want to be entertained, I want to be informed. The same goes for Internet searches.
The first example is totally borked. They first search for "climate change" (notice the ") and then search for climate change (without quotes). Of course the search engine shows different results for different queries.
Any filter at all will result in a "Filter Bubble" as defined here (except, I suppose, returning a randomly sorted list of all sites on the internet). Whether the filter is personalized or not doesn't change that fact. What is the actual benefit of everyone getting the same search results when looking for a particular term? Search engines are not news sites or research papers: biasing towards relevant results is not a bad thing.
DDG is quite good in regular search. I like their commitment to privacy and so switched my default search engine to DDG. But I noticed more and more that I use Google Maps almost as much as search. I search for places and directions on maps and end up going to Google a lot. That is the brilliance of Google. They built search verticals and so it's tough to not use them. If only DDG could do something about that.
I'm glad I have been using DDG for over 6 months now. For the new user, here is some advice.
DDG is my default search engine. Basically, all of my searches are completed in DDG; only very few times do I have to go to Google to find what I am looking for. My suggestion is that you spend 5 minutes learning the bang! syntax, as it speeds up your search by a lot, then set DDG as your default search engine for a couple of weeks. You will see that the bang! syntax and the 0-click-info really make the difference on search speed, and their results are pretty damn good.
Also a long time DDG user. I have it set as my default search provider in chrome, and it has sped up my browsing considerably. Shortcuts like "!w some article" or "!a some product" (takes you directly to Wikipedia or Amazon, respectively) are very handy. The only times I've had to revert to Google are for very specific forum trawling.
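For anyone who hasn't tried the shortcuts above, they can be pictured as a simple prefix lookup. This is only a sketch of the idea, not how DDG actually implements it: the table holds just the two shortcuts mentioned in this thread, and the names `BANG_TARGETS` and `split_bang` are made up for illustration.

```python
# Only the two shortcuts mentioned in this thread; the real list is much longer.
BANG_TARGETS = {
    "!w": "Wikipedia",
    "!a": "Amazon",
}


def split_bang(query, default="DuckDuckGo"):
    """Return (target site, search terms) for a query.

    If the query starts with a known shortcut followed by search terms,
    the search is handed to that site; otherwise it stays with the default.
    """
    query = query.strip()
    first, _, rest = query.partition(" ")
    rest = rest.strip()
    if first in BANG_TARGETS and rest:
        return BANG_TARGETS[first], rest
    return default, query


print(split_bang("!w some article"))
print(split_bang("!a some product"))
print(split_bang("some article"))
```

An unknown prefix simply falls through to the default engine with the query left untouched, which is the behaviour assumed here rather than something checked against DDG.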
Long time DDG user here. I highly encourage people to try and make the switch. I often find DDG gives me the needed information without actually having to visit the site. I still find myself going to Google sometimes if DDG doesn't give me what I want, but it's smart enough to get me the most relevant information faster most of the time. Often enough that I don't feel the need to switch back.
In making the switch, I highly advise you to spend a little bit of time learning the keyboard shortcuts, especially the !Bang feature (https://duckduckgo.com/bang.html). You'll love the HN search options. =) There are so many, you won't be able to learn them all.
You can also still use Google if you want, or even Bing (or other engines). Basically, the !Bang syntax would mean even if you aren't using DDG, it enables you to quickly use whatever engine you really want.
I'm just a fan of DDG. Hope you find it as useful as I do. =)
I've set it as default on Chrome. Maybe google is better - I'll get more python libraries and less news stories about sheep (or Darwin laureates) getting swallowed by pythons, but I can live with that. Besides, I'm pretty good at typing "google" if I need it.
It's not happening to any extent; it's just a new FUD concept that is meant to sell books and confuse people. If you seek more info you can dig deeper into any set of initial results. Personalization == relevance.
I've been using DDG exclusively for a little while. Half the time the results are identical to Google. The other half is usually a collection of links from sites I would have searched on to research the topic.
I have Google Web History turned off and I'm seeing a pretty good mix of things in my results. E.g., I'm heavily left-wing, somewhat pro-gun control, with no personal interest in shooting, but my results for 'gun' are pretty much all gun shops and fansites. For Barack Obama I get mostly official sites, although some of the top news is critical (I'm Canadian, if it matters). Egypt gives me a mix of travel, protests, general info and ancient history. I don't recall ever turning it off, so is it off by default? And if so, what's the problem here?
"But aren't you a part of the relevance equation? The ideal results for a search like [bitcoin crash] should be different for a Japanese-speaking searcher in Tokyo vs. a German-speaking searcher in Munich vs. a bitcoin expert vs. a programmer trying to diagnose why compiling bitcoin is crashing vs. my Mom who has never heard of bitcoin before, right?"
Maybe, but that is not the point.
Besides, you are mixing localization (Japanese vs German) with personalization (expert vs naive), and what is worse is that you are assuming that Google knows each user so well as to be right (and that is either impossible or extremely creepy), at every single instant of his life (a person can change interests).
Furthermore, to grab your example, how can Google know that an expert in bitcoin and expert in bitcoin compilation and crash solver, is not just interested in hearing about the "market" crash of bitcoin?
Google CANNOT read the users' mind. And even if it did, there would not be a need for "personalized search", as the "mind reading" would give enough search criteria to nail the results more easily (albeit, most people do not know exactly what they want, so it will still be an iterative process, which is good, as randomness is the seed for evolution).
So, back to the point, it is that in the quest for "adequate results" for each person, Google is turning web search into a non-deterministic event (*).
Imagine the web being a library, and the search being a search of the library's book database: why would the search for a given book return different results to different persons? It should always return the same results; if the person doing the search is not satisfied with the results, then she/he will add more criteria. In other words, let the person do the filtering!
Google needs to accept that in its quest for "better results" (where 'better' is a concept decided solely by Google and whose ranking parameters and algorithm are unknown) there is a potential (probably demonstrable already) for a "filter bubble" with a positive feedback loop on user behaviour, which, as with any positive feedback loop, can go out of control, exacerbating certain ideologies and fueling extremism.
And there is a fundamental difference between a "self guided" (as in self controlled) filtering, where users would knowingly filter out results in order to find those that they like, and a "google guided" (as in externally controlled) filtering.
(*) strictly speaking, search will not be deterministic as the web is a dynamic system and it grows, so search results can vary with time, but they should not vary from person to person at a given time.
Of course Google can't read minds, that does not mean they should ignore information that they have when deciding which results to show or what order to show them in. Sufficient data to provide a better filter for a given user is not the same as mind reading.
You are absolutely right that more criteria should be used if a user is interested in better results, but I fail to see why a deterministic base case is superior. If I never click on news links and always click on travel links, it makes perfect sense for Google to assume that my search for "Egypt" is looking for information related to travel to Egypt. If I am not following my usual search patterns, I can look on the second page, or I can disable personalized search, or I can search for "Egypt News" all of which would give me better results.
Why is it preferable to always make everyone clarify their searches when there is sufficient information to narrow the search down somewhat without requiring additional intervention by the user? This is usability 101 right here.
Works fine, but how do I return to country redirect? Am in Germany, used it for some global searches, now want my localized (German) searches back. Must be missing something obvious here, please help, thanks!
From the search engine revenue perspective, how much can be the loss in case they provide a preference option to show "generic results" (like "safe search option off") - with an easy to switch ui between the two?
There's a personalization / privacy trade-off that needs to be considered. It is annoying that personalization cannot be achieved locally on the user's machine or browser. Filtering / re-ranking results at home offers far more privacy. My little personal project is to have the ML for personalization done at home, see the Seeks Project, http://www.seeks-project.info/
Personal control over the personal bubble matters...
in our research study, we found the impact/presence of personalization very strong: after about 3'000 search queries, in some cases more than every search query received personalised search results. out of the 10 blue links in some cases we found 6.4 personalised: see for yourself
(search for Hypothesis 1 to get to the data)
I don't like the direction Google is taking with Search. My view of "improvement" is very different from Google's view. I have the feeling that the more features they add to Search the worse the service gets. I miss the old simple Google Search. I think there is a big opportunity here for another search engine focused on: simplicity, speed, relevance (algorithm) and unfiltered. Back to basics.
In the end (1) technical people will find a way around it when they need to, (2) the conspiracy-theory loving will use the "Filter bubble" as an argument for their own ends, and (3) the rest, most people, will just not care... I think the more interesting question is whether this third group will veer off to become more biased to their own world-view as the article suggests.
I understand the quest for a larger userbase but please don't use this sort of FUD that is being peddled by those who don't understand the technical aspects of search and are trying to sell their books.
A search engine's prime function is to filter the millions of results for each query down to the most relevant results for each individual user, and never the same 10 results are relevant to each and every user.
There is little difference between personalization and the relevance of search results.
How would you go about ranking then? alphabetically?! it's a matter of tuning the relevance 'dials' and it's all in early stages so a solution to this imaginary problem is more research and not to hide behind bullshit terminology.
And as a bonus a user (regardless of any personalization) can dig through any initial set of results if she seeks more information.
So please don’t buy into this misleading and ignorant bullshit that is being disguised as a real issue.
Search queries have different purposes. Personalization based on my preferences and past search works wonders for known-item seeking and for re-finding.
But for exploratory search and exhaustive search personalization works as an echo chamber (or filter bubble if you will).
Then there is the problem of change and inertia. People change, their preferences vary in time. Personalization has an inertia that causes the recommendation engine to always be behind the actual preferences. It's more visible the better the recommendation engine.
I'm not saying personalization is problematic. There are problems with using it all the time and stealthily, without a clear possibility to turn it off.
I agree, it can be remedied by better algorithms - such that can recognize when I'm doing a search for a known item and when I'm performing exploratory search. In the first case, personalize away, in the second case, please show me everything.
The challenge is that people switch very quickly between these search modes and don't even think about it.
"There is little difference between personalization and the relevance of search results."
Even if you're a Sailor Moon fan, you might actually be looking for information about the rock rotating around this planet when searching for "moon". In the same vein I might be a liberal but be looking for a multitude of opposing views and ideas when googling "core values of NRA members".
Oh btw, we engineers, our algorithms, aren't smart enough to foretell people's inclinations based on their past habits, we also can't read minds. The increasing use of personalization as a factor for relevancy will, imho, lead to user dissatisfaction, precisely because they make shoddy assumptions.
Believe it or not, but some people also want to be surprised and learn new things. YES. We exist!
The article makes the assumption that personalized search results are bad for you, then goes on exemplifying it, but does not say or demonstrate in any way why personalized search results are bad, especially since it doesn't give any context about the person making those queries.
Search ranking is a matter of context. "Egypt" does not mean anything, other than the name of a country, and different people mean different things. When you're searching for "Egypt" and want to get travel tips, if you don't get them you can always expand your query to "Egypt travelling tips". But even that is a pretty shallow request and you can further expand it like "Egypt travel guide to the pyramids".
The search engine's job is to find what I'm looking for. And IMHO the current state of the art is a little behind my expectations - I would have expected these personalized results to be far more effective than they are by now.
It's true that sometimes I'm looking for specific things, but other times I'm searching to get an idea of "what's out there", to fill gaps in my knowledge and make sure I'm apprised of what's going on in the world. It'd be nice to at least have some control over what kinds of searches I'm doing. For example, if I search for my own name, what I want to know is something like: what are the most common search results other people get when they search for my name. The same is usually true if I'm doing "related work" type searching for research; I explicitly want to find stuff outside my immediate area of specialization, in order to make sure I'm not missing anything that other people would've expected me to cover.
I understand that, but if the search results are not personalized for you, then they must be personalized for the common denominator, as in the hive, the majority's opinion, the status quo.
For general search queries, the long tail goes out the window anyway, and I haven't done or seen any quality metrics for DDG, but I doubt their search results are better for exploration or getting opinions different from your own.
I expect my search to be faithful to the terms I enter. If the results aren't satisfactory, I'll narrow the search with additional terms. Withholding information based on some guess at my political leanings is dishonest. I'd be appalled if a librarian did it; why should I expect less from a search engine?
If you went to a librarian and said "show me Egypt", she would have nowhere to start. A conversation would then ensue about what you are interested in.
Keep going to the library, keep having those conversations, and eventually, the librarian will take you straight to the books you want when you ask for Norway. If you want something different, you'll have to explicitly ask.
What Google is doing is no different. It just happens to have a lot more conversations with you.
What Google is doing is no different only if you can, in your ensuing "conversations," correct a misinterpretation that may have happened somewhere along the way. If your ability to do that is hampered because the only way to do it is by clicking on something, and you don't see the thing to click on which will do this, then the filtering is not helping (and actually works against Google's and my mutual interests).
Not sure where you are going with your dichotomy- if you are unable to correct in either case (because you don't get the option of clicking on a correct answer) then it fails. I suppose with enough perseverance you could trick the search engine into showing you what you want- if you know it exists in the first place- but then, (1) just enter the very specific term to start with, and (2) this defeats the attempts to become more relevant.
I agree with you that these seem to be small changes. So far.
That's a good point, but I think it kind of misses the forest for the trees. It's true that personalization might sometimes lead Google to mistakenly show you less relevant results, but it seems pretty certain that the opposite would also be true — a generic page will sometimes show less relevant results than a personalized one. One search result is going to be shown ahead of another either way. If showing a less relevant result ahead of a more relevant one is considered "hampering," I don't see any reason this "hampering" is unique to personalized search.
I think it would come down to a matter of degree. A slight change in the order of results is pretty meaningless either way. But major alterations, where some pages are relatively inaccessible, could obviously be an impediment to getting the results we want.
Let me try this example: Suppose your searches are primarily academic. Let's say that whenever you search the term "momentum" you are looking for something scientific- ballistics, elementary particles, whatever. But one day you are writing a blog in which you want to search for background, but you need to use a non-scientific meaning of the same word. Perhaps a psychic used the term and you are debunking. The particular case is irrelevant. The point is that if personalization is too aggressive you may not find the info most relevant to your interests.
This isn't privacy FUD or anything, just a pragmatic warning. The FUD comes when you start thinking about how their algorithms might actually decide which results you are "interested" in. Thinking of Google, what business are they in? What about your behavior on the web most interests them? Would they decide that the most interesting things to you are the ones on which you clicked the most ads?
But whatever change personalized search makes relative to generic search, the opposite change will occur going from personalized to generic. To penalize personalized search when the change is of equal magnitude going either direction doesn't seem fair. As long as the personalizations are just a transformation and not a subtraction, the two options are just mirror images of each other. They have the exact same kind of failure condition. The question is just which one's failure conditions are more likely to occur.
As a counter-example: Suppose your searches are primarily academic. Let's say that whenever you search the term "momentum" you are looking for something scientific- ballistics, elementary particles, whatever. But most people aren't looking for scientific info, so you constantly have to dig and dig to find anything relevant on Google. The point is, if the search technology is too impersonal, you may not find the info most relevant to your interests.
As long as the personalizations are just a transformation and not a subtraction
This is the crucial point. As long as it's accessible, I can get what I want by search refinement either way (but I would imagine that it would be easier for me to refine in the space in which I am familiar than to refine in the unfamiliar space). Giving me the (easy) option to turn it off is a good idea for search.
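The transformation-versus-subtraction distinction can be made concrete with a small sketch. It is only an illustration under assumptions: the result names (on the reserved `.example` domain), the per-user preference scores, and both function names are invented here, and nothing is claimed about how Google actually applies personalization. Reordering returns the same set of results in a different order, so everything stays reachable by scrolling or refining; subtraction makes some results unreachable.

```python
def personalize_by_reordering(results, preference):
    """Transformation: same results, different order. Nothing is dropped.

    Results with higher per-user scores come first; sorted() is stable,
    so results with equal scores keep their generic order.
    """
    return sorted(results, key=lambda url: -preference.get(url, 0.0))


def personalize_by_subtraction(results, preference):
    """Subtraction: results with a negative per-user score are removed."""
    return [url for url in results if preference.get(url, 0.0) >= 0.0]


# Hypothetical generic results for "momentum" and a made-up profile of
# someone whose searches are primarily academic.
generic = [
    "psychic.example/momentum",
    "news.example/momentum",
    "physics.example/momentum",
]
academic_profile = {
    "physics.example/momentum": 1.0,
    "psychic.example/momentum": -1.0,
}

reordered = personalize_by_reordering(generic, academic_profile)
filtered = personalize_by_subtraction(generic, academic_profile)
print(reordered)  # all three results, physics first
print(filtered)   # the psychic page is gone entirely
```

In the reordered list the debunking blogger from the "momentum" example can still find the psychic page, just further down; in the filtered list no amount of scrolling will surface it.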
The 'political leaning results' are just a tool the author used to try to explain his (ill-informed) point; there is no evidence of any such degree of personalization, especially not to any 'bubble'-inducing extent. It's all very sensationalistic.
And for the record the political example was about facebook wall.
I think you missed the point. Yes DDG ranks, but it may rank differently. Of course, there is an implicit slam on them here: instead of showing you what you want the vast majority of the time, they show you other stuff. This is good on occasion (so you don't live in an echo chamber), but if this was all the time it would mean lower quality search results for simple things.
This is exactly why I stopped using DDG as my main search engine. My workflow was often like this:
- Search for term on DDG
- Look at results, find nothing
- Try to find better term
- Give up and use original term in Google
- Find result
I tried to use DDG exclusively two times (once 8 months ago, once ~4 months ago), but the result was the same. I don't know how much personalization affects this, but Google just gives me the best results.
Why do you think it is an issue? I think everyone already filters intuitively. It's called having a "bullshit filter". I believe it makes not much difference whether it's automated or everyone does it manually.
I think the chance that I would click on a bullshit source like Fox News is very low because I don't trust them. Even if Google brought it up as the first result.
My point is that people already only click on search results that they agree with. Nobody wants to see stuff that they disagree with. That is human nature.
So living in a filter bubble already happens without automatic filters. It happens in our brains and we call it personality.
Maybe we don't want a filter that sucks, plain and simple? Whether the filter sucks or not is completely subjective, but it sucks for me, so there.
Changing the language is not that intuitive in Google; it's a drag. I have to do it all the time when I clean my cookies, and sometimes I still get crappy "pt.wikipedia.org" results on top instead of "en.wikipedia.org" just because my browser isn't in English and I don't want to change it just because of Google.
I might want to type www.google.it and search for Berlusconi news in Italia, but it won't let me, because it just shows me Brazilian news. So yeah, I can't even get to another bubble.
Also, apparently Google filters by IP, since searching via Tor or a VPS renders very different results. I dislike this for personal reasons because it breaks the internet for me.
Also, I might not like the fact that Google collects stuff about my search habits, which is also a valid concern.
So, there, it's an issue for me. Might be an edge case, might be an exception, but I think that I deserve to know why this happens and the alternatives.
"There is little difference between personalization and the relevance of search results. How would you go about ranking then? alphabetically?!"
Oh, you know, maybe using Google's much vaunted "PageRank" algorithm, which is supposed to take into account things such as links on other pages back to the page being considered for inclusion in the search results. The more such links there are, the better the rank is supposed to be.
The above description is obviously an oversimplification, and I don't have access to Google's PageRank algorithm anyway, so I couldn't tell you what it actually is even if I wanted to, but it's something along those lines, and it need not take into account your previous search history in any way or "personalize" the search for you.
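To make the "more links back to a page, better rank" idea concrete, here is a minimal sketch of the textbook, simplified form of PageRank (power iteration with a damping factor). It is not Google's production algorithm; the function name `pagerank`, the damping value 0.85, the iteration count, and the four-page `toy_web` graph are all assumptions made up for illustration. Note that nothing in it looks at who is searching.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified textbook PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict mapping page -> score; the scores sum to 1.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with an equal score for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph "c" comes out on top because it has the most incoming links, and "a" comes second because its single incoming link is from the highly ranked "c". The ordering depends only on the link structure.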
That agrees with what I said. Unless you don't think that much growth in searches over such a short period is significant.
It's even more significant when you look at where it was a year ago. 1.2 million in April 2010, 5.9 million in April 2011, 6.3 million last month, and already 3.8 million a little over halfway through this month. This is significant when you consider the lack of marketing and that DDG is run by one person.
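For anyone who wants the arithmetic spelled out, here is a quick sketch using only the figures quoted in this comment. The unit (millions of searches per month) and the variable names are assumptions, and since "a little over halfway" is not an exact date, the last number is only a rough upper bound on what the current pace implies.

```python
# Figures quoted in the comment, assumed to be millions of searches per month.
searches_millions = {"April 2010": 1.2, "April 2011": 5.9}

# Year-over-year multiple and percentage growth.
yoy_multiple = searches_millions["April 2011"] / searches_millions["April 2010"]
yoy_growth_pct = (yoy_multiple - 1.0) * 100.0

# 3.8 million was reached a little over halfway through the month, so if the
# pace holds, the full month would land somewhat below double that figure.
partial_month_millions = 3.8
upper_bound_if_pace_holds = partial_month_millions * 2

print(f"year-over-year multiple: {yoy_multiple:.2f}x")
print(f"year-over-year growth: {yoy_growth_pct:.0f}%")
print(f"month at current pace: under {upper_bound_if_pace_holds:.1f} million")
```

That is roughly a fivefold increase in a year, though it says nothing about market share, which is what the replies argue about.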
It's not significant. In the world of search engines, you need to be the #1 or #2 player to earn major revenue. People aren't gonna wanna waste any time/energy advertising on your platform if you're getting just a handful of searches per month for say... mp3 players or cellphones. DDG is just betting on an acquisition, but it's just a wrapper around the Yahoo BOSS API.
This assumes that 'filter bubble' is something more than a nonsense term.
There is little difference between personalization and the relevance of search results.
How would you go about ranking then? Alphabetically?! It's a matter of tuning the relevance 'dials', and it's all in the early stages, so the solution to this imaginary problem is more research, not creating bullshit terminology in order to sell some books.
Most people don't realize that Google and other companies are doing this. That's my main problem. It's not about selling books in my mind as much as it is about communicating why something is in a search list for person A vs. person B. I don't want my Internet censored.
People don't realize many things about search engines, from indices to ranking algorithms. They do realize, however, that a search engine is supposed to return the most relevant results for them, and that is where personalization/relevance fits in.
I suspect DDG is having trouble making money hence the need to make a commotion about filter bubbles and generating some linkbait. No surprise... < 1% of a search engine market is not gonna even make you much at the end of the day.