Magic keywords on Google and the consequences of tailoring results (gabrielweinberg.com)
151 points by milliams on Nov 5, 2012 | 51 comments

I'm happy to give more context on this. Some people don't put all their information needs into a single query. For example, instead of searching for [iphone wikipedia] to find the iPhone page on Wikipedia, they'll do one search for [iphone] and then their next search will be [wikipedia].

Google tries to help with those sorts of search sessions. For 0.3% of queries, if we see a search for a query A and then another search for query B, and there appear to be good results related to both A and B, then we may surface those results.

For example, I just did the search [iphone] and then the search [wikipedia]. In addition to the regular results for Wikipedia, Google also surfaces the page http://en.wikipedia.org/wiki/IPhone . A good way to see that Google is doing this is to look for a phrase like "You recently searched for iphone" under the newly-surfaced results. Go ahead and try it with the search [twilight] and then the search [wikipedia] for example.
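To make the A-then-B behavior described above concrete, here is a toy sketch. It is only an illustration under stated assumptions: the index, the popularity scores, the `search` and `search_with_session` helpers, and the matching rule are all hypothetical, and nothing here reflects how Google's system is actually built.

```python
# Toy illustration only. The index contents, popularity scores, and the
# matching rule are hypothetical and are not Google's implementation.
TOY_INDEX = {
    "http://www.wikipedia.org/": ({"wikipedia"}, 100),
    "http://en.wikipedia.org/wiki/Main_Page": ({"wikipedia"}, 90),
    "http://en.wikipedia.org/wiki/IPhone": ({"wikipedia", "iphone"}, 10),
    "http://www.apple.com/iphone/": ({"iphone"}, 80),
}


def search(query, index, limit=2):
    """Return the `limit` most popular pages that match every query term."""
    terms = set(query.lower().split())
    hits = [(pop, url) for url, (tags, pop) in index.items() if terms <= tags]
    return [url for pop, url in sorted(hits, reverse=True)[:limit]]


def search_with_session(prev_query, query, index, limit=2):
    """Regular results for `query`, followed by pages matching both queries.

    Each result is a (url, note) pair. The note is None for regular
    results and a "You recently searched for ..." label for surfaced ones.
    """
    results = [(url, None) for url in search(query, index, limit)]
    shown = {url for url, _ in results}
    for url in search(f"{prev_query} {query}", index, limit):
        if url not in shown:
            results.append((url, f"You recently searched for {prev_query}"))
    return results


for url, note in search_with_session("iphone", "wikipedia", TOY_INDEX):
    print(url, f"({note})" if note else "")
```

In this sketch the iPhone article is too unpopular to appear in the regular results for [wikipedia], so it only shows up, labeled, because [iphone] was the previous query.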

Between Gabriel's article and the WSJ article, words that are reported to provide this behavior include iphone, nexus, obama (but not romney, because there wasn't enough information for this word at the time the data was generated), tablet, twilight, computer, health, speech, iraq, sports, social security, and stock.

Just to reiterate, this algorithm affects 0.3% of searches on Google. Most Hacker News readers are savvy enough to search for [iphone wikipedia] instead of breaking that search into multiple queries. However, if you don't want Google to surface additional results that might help with your current query, Google has a support page telling how to turn off search history personalization: https://support.google.com/accounts/bin/answer.py?hl=en&...

As an aside, I wrote a blog post ~4 years ago to preemptively debunk the idea that Google skews our search results for political reasons: http://www.mattcutts.com/blog/google-search-and-politics/ We simply don't do that.

When you say the algorithm affects 0.3% of searches on Google, do you mean that you select each query with a probability of 0.003 and pass it through this algo, or that this algo is used 100% of the time when query A is followed by query B and these combinations account for 0.3% of all searches? If it is the latter, then the fact that Obama is a magic keyword may mean that you are biasing a very high percentage of political searches. I am sure an election involving an incumbent is an edge case which is very difficult to account for, but now that we know of this I hope Google will try to correct the results to remove these specific accidental biases in the future.
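The difference between the two readings can be put into a few lines of arithmetic. This is a sketch with made-up numbers: the query volumes, the share of triggers that are political, and the function names are all assumptions chosen for illustration, not figures from Google or from the article.

```python
# All numbers below are made up for illustration; none come from Google.

def share_if_sampled(rate):
    """Reading 1: each query is independently selected with probability `rate`.

    Every topic then sees the same affected share, namely `rate` itself.
    """
    return rate


def share_if_triggered(total_queries, rate, topic_queries, topic_share_of_triggers):
    """Reading 2: the algorithm always fires on qualifying A-then-B pairs,
    and those pairs add up to `rate` of all queries.

    If a fraction `topic_share_of_triggers` of the fired cases falls inside
    one topic, the affected share within that topic can be far above `rate`.
    """
    fired = total_queries * rate
    return fired * topic_share_of_triggers / topic_queries


# Hypothetical scenario: 1,000,000 queries, 10,000 of them political,
# and half of all fired cases involving a political keyword.
uniform = share_if_sampled(0.003)
concentrated = share_if_triggered(1_000_000, 0.003, 10_000, 0.5)
print(f"reading 1: {uniform:.1%} of political queries affected")
print(f"reading 2: {concentrated:.1%} of political queries affected")
```

Under these invented inputs the second reading affects roughly 15% of political queries even though the overall rate is still 0.3%, which is the concern raised in the comment.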

The solution for most of these sorts of things is just to refresh the data more quickly. Lots of queries, particularly head queries, are pretty stable in their characteristics over time. For a system that isn't absolutely critical to getting the query correct it's perniciously seductive to think that you can just push out the data once, then refresh it every quarter or so.

We used to have a problem with spelling where some news event would make a person with an uncommon name famous, but google would mistakenly correct it to a more common but incorrect name just because the spelling system hadn't ever seen this person's name before. We've fixed that issue and many other freshness related things: http://googleblog.blogspot.com/2011/11/giving-you-fresher-mo... but this is an ongoing area of focus throughout a lot of our systems.

It's an interesting problem because for many things recomputing the data faster will only fix a handful of queries, so from a raw impact standpoint it hardly seems worth it. However, those queries end up being ones that are in the news and related to things that people care a lot about.

I get what you're saying, but I would absolutely love a completely plain version of Google, where I didn't have to opt-out of anything ever again. Yes, sometimes it might be nice to get results that are in my neighbourhood or relevant to my interest, but sometimes I wish I could get what Google thinks is the globally most interesting response. Iterating and expanding on the query actually lets me learn about the subject.

So pretty please: http://vanilla.google.com . Just like the old days :)

What you're saying, though, is that Google is making an inference. DuckDuckGo does not want to infer anything and I tend to agree. Also, why aren't all of the inserted links highlighted with the "You recently searched for" line? Why wouldn't Romney be a magic keyword? I appreciate your explanation but it doesn't really explain anything.

I don't work for Google or have any inside information, but I guess "magic keywords" become such after a certain (very large) threshold. Obama, being the president of the United States for four years, probably crossed that threshold long ago, but it only became a political issue once the election process began.

You really could have made it more obvious and opt-out. As in, have a list of the keywords that Google also considered somewhere at the top, along with small "x"s next to them.

> google

> obvious and opt-out

I'd tell you to pick one, but that implies choice. You can only pick google.

No, Google needs to work for the masses and confuse them as little as possible, so adding even more sub-options doesn't seem like a good idea, especially when it only affects one or two results.

0.3% of a big number is still probably pretty large.

I'm tired of Gabriel Weinberg telling me how bad Google is and therefore why I should use DDG instead. I'm happy with Google, I'm glad that when I search for "ruby" I get the programming language and that when my brother who works in jewelry searches for "ruby", he gets the gemstone.

But seriously, tell me why DDG is awesome and unique, stop bashing the competition. It was fun at the beginning but now it's getting old. Like basecamp bashing microsoft project at the beginning, it was ok, when was the last time you read 37signals still writing about how bad MS Project is? Exactly.

Stop spreading FUD about Google and tell us what's great about DDG. Google Search is an awesome product that changed the world, it gives great results and at much greater speed than DDG. Beat them at that and I'll give DDG a go. Not for any scary reasons.

Hi, Gabriel Weinberg here. Sorry to offend you, as that was of course not my intention.

I also of course don't think this is FUD at all. As far as I know, this is the first quantitative study of any kind about the filter bubble. I think it is hard to dismiss a concept out of hand when it hasn't even been effectively studied yet.

Also, the filter bubble is not an effective marketing message because it requires too much education given it is a complicated subject no one knows about.

I also think tailoring can be just fine when it is opt-in and you are in complete control of your data.

As for DuckDuckGo, we've been telling people plenty about new stuff. Most recently we've been focused on our open-source plugin platform -- DuckDuckHack, http://duckduckhack.com/ -- where you can hack the search engine. And people have been doing it and making cool stuff, e.g. https://duckduckgo.com/?q=khan+math and lots of others: https://duckduckgo.com/goodies/.

Thanks for your answer. I'm not offended, I'm just getting some kind of fatigue reading your anti-Google posts. I think insinuating Google cares more about Obama than Romney is bordering on FUD, like all of your anti-"bubble" articles are always bordering on FUD or at least intend to scare people a bit. You can't deny you're kind of aware of that, can you?

It's been more than a year now that you've been writing anti-Google and anti-personalized-results articles. It's getting old and people love personalized results. I don't think it should be made opt-in as I think Google (and any product) should make the best option opted-in by default, and personalized results are the best option as they offer the best experience for regular users. Why would they offer a worse experience by default? It doesn't make sense. Also, most people don't know or care enough about changing options so they'd just keep the worse experience by default.

Let's take my brother's example again, if he searched for "ruby" and got results about rails and ruby.rb first before the gemstone because they are more popular than ruby the gemstone, my brother would just be confused and may even waste time clicking on the rails link first. So what should google do, propose a link that would say "opt-in to have search results that actually make sense to you". Do you see how absurd and bad UX that would be?

People who care about these kinds of things have already opted out. In fact, the most paranoid hackers I know don't even use DDG; they use Google + Tor and turn JS/cookies off. It's cool that DDG is adding more features, but that's not anything I care enough about to use, or I could just write a quick Google Chrome script that'd do the same.

why do we have to have an assumption that personalised means doing so in a negative way like enforcing a political affiliation? perhaps the engine can be clever enough to realise that some things like localised weather are a reasonable personalisation, whereas politics are more dangerous and should be less personalised.

it seems to me that both google and ddg have problems here, ddg in not personalising at all, and google in personalising things that really shouldn't be.

> ddg in not personalising at all

I think DDG has some personalization based on your location, for example if I search "weather" it shows me the weather near me.

I question your methodology, to be honest. Starting with such loaded search terms is the opposite of the scientific method, since you haven't established any kind of neutral baseline. Picking candidates' names is also problematic, since (as pointed out by numerous people) Obama has been President for a long time whereas Romney has only been the Republican nominee for 7-8 months. It would have been more rational to start with, say, Democrat and Republican, which are comparatively 'timeless' terms. Also, I don't see any indication that you attempted to offset bias in your sample set.

I don't mind it as a bit of election-related marketing, but it doesn't strike me as very rigorous.

Hi Gabriel. Glad you responded here!

I've been generally curious about DDG for a while now and I'm still trying to figure out why I should make the switch. If I understand correctly, DDG is trying to solve a problem in internet search so that people can find things more easily on the internet without sacrificing knowledge about themselves or their behaviors, and your organization believes that the problem is largely due to the opt-out nature (or rather, how you are automatically opted in to everything) of Google's platform.

> Also, the filter bubble is not an effective marketing message because it requires too much education given it is a complicated subject no one knows about.

I guess, I, like many others who have expressed interest, don't see the problems with this. More commonly, I can't say my mother or my boss or anyone else I know is encountering this as a problem or a challenge. I'm happy to be wrong.

I find this whole topic absolutely fascinating, especially after having recently read Nudge by Richard Thaler, which basically is a whole book which talks about the value of libertarian paternalism. In other words, how opt-out choice architecture can be much more suitable than having completely free choice. Would love to know your thoughts. Great blog, this is fascinating stuff!

Your comment is why speaking truth to power is so hard. Don't say you're "fatigued" from hearing him. If he's right that this is a pernicious and dangerous trend from Google, then he should never stop shouting it.

Assume that Gabriel is 100% sincere and explain why his concerns are unwarranted. Personally, I find the "it's only .3%" of searches argument fatuous. That's the current rate. You can be sure Google is trying to "improve" that number. It's not clear to me what the consequences for this sort of search segmentation down the line are, but it definitely worries me. Confirmation bias is already a gigantic drag on human progress. I don't need the chief intellectual discovery engine of the world reinforcing it!

Allow me to give you a perspective from someone who strictly uses DDG (well, I sometimes use blekko).

Their search is lean. It doesn't feel exaggerated by a lot of advertising optimization. It has the !bang feature that allows users to quickly refine searches, though Blekko uses the /search-term, which is also very useful. Back to DDG: their user experience (in my opinion) is better. Things are more clear and concise. Results for programming-related stuff are very, very good. In fact, if Nuuton manages to be half as good as DDG, I will have considered it a success. Not sucking up to anyone here, but I just love the damn thing. Been a user for more than a year.

> I'm tired of Gabriel Weinberg telling me how bad Google is and therefore why I should use DDG instead.

Perhaps avoid his blog next time, since he's clearly passionate about it and likely to do similar articles.

> I'm glad that when I search for "ruby" I get the programming language and that when my brother who works in jewelry searches for "ruby", he gets the gemstone.

I'd like to see screenshots of ^ that, if you are able to get your hands on them, please.

I'm trying to decide how concerned I should be about Obama being a magic keyword and Romney not. On the one hand, Obama has been president for nearly four years; the number of searches for him will have been high and consistent. On the other hand, Google is so central to information flow today, having it decide which candidate should get special treatment is disturbing.

I'm mollified somewhat that the Obama magic results that came up when searching for Romney were from Fox News[1], but I still feel a faint unease at the whole concept.

I've been using DDG for several months now and like it. It hasn't matched Google on technical searches, but on general information searches, I prefer the results to be less biased by me.

1. I would personally prefer to never see a link to Fox News. However, if every search for Obama was extolling his virtues, my unease would be outright disgust and I would be contacting my Congressman. My unease comes from not knowing why they might show me something, other than to echo back what I already think, which is not a very useful set of information. I already know that.

Google sees how people follow up on queries over time. So they know the relative frequencies of the four distinct progressions:

  [Obama], [Iran], [Obama Iran]
  [Obama Iran]
  [Romney], [Iran], [Romney Iran]
  [Romney Iran]
The triggering could be as simple as the fact that {[Obama], [Iran], [Obama Iran]} happens more often than {[Romney], [Iran], [Romney Iran]}.

And that difference could be because people searching one path are slightly more likely to keep refining simple queries ("try, try again"), versus other people more likely to combine-up-front ("measure twice, cut once").
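A rough sketch of how such progression frequencies could be counted from session logs follows. The session data, the `classify` and `progression_counts` helpers, and the labels "refined" and "direct" are all hypothetical; this only illustrates the counting idea in the comment above, not anything Google is known to run.

```python
from collections import Counter

# Hypothetical session logs: each session is the ordered list of queries
# one user typed. The data is invented for illustration.
SESSIONS = [
    ["obama", "iran", "obama iran"],
    ["obama", "iran", "obama iran"],
    ["obama iran"],
    ["romney", "iran", "romney iran"],
    ["romney iran"],
    ["romney iran"],
]


def classify(session):
    """Label a session as ("refined", q) for [A], [B], [A B],
    ("direct", q) for a lone multi-word query, or None otherwise."""
    if len(session) == 3:
        a, b, combined = session
        if combined == f"{a} {b}":
            return ("refined", combined)
    if len(session) == 1 and " " in session[0]:
        return ("direct", session[0])
    return None


def progression_counts(sessions):
    """Count how often each (kind, final query) progression occurs."""
    labels = (classify(s) for s in sessions)
    return Counter(label for label in labels if label is not None)


counts = progression_counts(SESSIONS)
for (kind, query), n in sorted(counts.items()):
    print(f"{kind:8} [{query}]: {n}")
```

With counts like these, a trigger rule could be as simple as comparing the "refined" count for a keyword against some threshold, which is one way a keyword could end up "magic" without any editorial decision.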

Most of the stuff discussed in this post didn't bug me, but then I got to this part:

> As the Wall Street Journal confirmed in its own study, Google has been significantly altering its search results to highlight Obama-related results, but not Romney-related results (more on that later).

> These Obama-related results are being inserted because obama is a magic keyword on Google. A magic keyword is a search that can transform the Google results of later searches.

Incredibly shady. Changing the results of later searches based on things I searched for in the past is the complete opposite of what I want a search engine to do.

> Changing the results of later searches based on things I searched for in the past is the complete opposite of what I want a search engine to do.

Really? If I search for Python stuff, I want it to learn that I mean the language and not the animal. I want it to learn that when I type "Socrates", I mean the Greek philosopher and not my ex-Prime Minister.

Learning from the user is all about changing the results based on past searches, and I think it can be very useful. Just not always.

I completely disagree, and I suspect a lot of Google users do too. Anyway you can turn off those features from here: https://support.google.com/accounts/bin/answer.py?hl=en&...

I wonder if the effect times out? If so, it could be reasonably construed as an attempt to present more relevant results across a session.

If, on the other hand, Google accumulates history until you sink into a morass of similar results, it's clearly harmful.

In the original WSJ article (yc: http://news.ycombinator.com/item?id=4741394), it's confirmed that this effect has a limited duration.

  > A Google spokesman said: "We aim to get users the best
  > answers as fast as possible" using techniques such as
  > examining "related searches." He said the goal for the
  > feature is to provide better results in a situation
  > where, for instance, a person who searches for "Harry
  > Potter," and then for "Amazon," actually wants "Harry
  > Potter" results from Amazon.com Inc. He said that the
  > technique saves Google users time and provides better
  > answers, but affects only about 0.3% of the searches the
  > company conducts.

That's not confirmation, that's a public relations response. Confirmation of limited duration would be actually running the experiment until the effect actually disappears.

    for instance, a person who searches for "Harry
    Potter," and then for "Amazon," actually wants "Harry
    Potter" results from Amazon.com Inc.
I wonder, is this really the case? It seems strange. Is there any research available about the frequency of such strange searches?

Here's a nice infographic for this: http://dontbubble.us/

Google has been serving personalized results for a while now, but this is probably the first time it has attracted such widespread attention from the media.

I ran the search and I am not concerned at all... Google shows poll results with bright red and blue colors drawing the eye... News stories show up first, and pictures of Romney, and his website outrank these related results.

The "you recently searched for obama" results appeared below the fold and the story was quite frankly very unlikely to be noticed unless it was something I was looking for specifically, which is probably google's algorithm's intent.

I tried running the search in battleground states and the same results showed up and I tried a bunch of searches like "Romneys plan" "why vote for romney" "who should I vote for" "vote for romney" and none of them showed obama results...

P.S. I am a libertarian who voted for Romney and I approve this message.

"65% of people said personalized search was a 'bad thing'"

So does this guy ( http://www.ted.com/talks/eli_pariser_beware_online_filter_bu... ) on TED and will easily convince you if you aren't already.

If people really wanted to find the other side of the story, it's available to them. I absolutely love personalized search. When I search for something I want local results; I want results that are most relevant to me. It's been proven that people would only go to sites that support their ideas and beliefs already. So showing unbiased results makes no difference.

So if things are already messed up, we should just throw our hands up, pander, and go along with it?

What's messed up? I stated I like personalized search because 9/10 times when I'm googling something, it's related and I'd want it to be personalized, either by location or by a topic I've searched before. Not personalizing results will probably make search worse and people less likely to search in the first place.

Assigning a moral value to Google's algorithms here, or to personalized search in general, doesn't make sense. I like personalized search - when I change main operating systems, as I do every few months, my information for debugging is always for the right os. If I look up hashmaps, it usually gives me results for the programming language I'm using that day. In the middle of a study session for linear algebra, I can just search for inverse and get a useful wikipedia page. If I want results that are not personalized like this for some reason - more balanced coverage, I actually want to know about pythons, whatever - then hopefully I have the discipline and ability to go to incognito or another search engine. Maybe I don't know about them, in which case if I'm politically engaged I will find out from a fearmonger. But the reality is that there are two products, each ideal for different people and use cases, and neither of them is evil. It has always been difficult to get 'unbiased' information, whatever that is. With Google, I have a chance to see beyond my local papers and the people I know in one way, and with DDG it's another.

"We chose to use these keywords (abortion, gun control and obama) because they are both a) searches where many people want unbiased results..."

I suspect that someone searching for these keywords is looking for reinforcement of their predetermined beliefs. Which is exactly what personalized search would do.

[edit] I've been downvoted so I feel I should clarify. I'm saying that these keywords are "magic" for a reason. This test should be performed using keywords people actually want unbiased results for. A good example would be a good javascript library for making graphs. (something I searched for yesterday and wanted to see an unbiased comparison). These keywords return biased results because they are highly polarized topics and you probably situate yourself on one of those poles - not in the middle. If google knows this about you, good for them and you because you found what you were looking for.

Filter bubbles are troubling for exactly the reason you don't seem to have a problem with.

It's not particularly good for the world to have Obama searches be filled with either Fox news results or MSNBC (pro-Obama/Dem) results.

I wish Gabriel had inserted a map of where the screenshots were taken. My hunch, which could be wrong, is that magic keywords are geographic too. In a state like NH which leans strongly, maybe the algorithm automatically identified Obama as a magic word. (This hunch is based simply on ddg being not-quite-mainstream, so maybe its users are more Democratic than the country?)

I think it'll be interesting to see the Google response to this. I think the most important question is whether identifying magic keywords happens automatically, or is programmer input involved?

DuckDuckGo is at it again, this time smearing politics into tailored search.

What if I turn off Google Web History? Doing this should resolve the tailored results, or not?

Google uses the word 'personalization' as a euphemism for 'censorship'.

No Free Lunch!

Just so that everyone is on the same page on what is being discussed here: this is a "study" by a competitor who can't implement personalised search and thus is spreading FUD to spin that as a differentiating advantage. It might be a good marketing strategy, as he managed to get coverage by a couple of outlets during the election frenzy in the US, but this "filter bubble" buzz-phrase is nonsense. Don't fear personalized results; they are just another way of sussing out relevance, and as mentioned in that post, it concerns a tiny number of queries and has minimal effect on the results page.

> by a competitor who can't implement personalised search

I believe the main idea behind using DDG is that it does not personalise, collect data, etc. It's a feature rather than a bug. Your attack on DDG seems to ignore some facts...

(see https://duckduckgo.com/about.html)

"We also saw in Google chrome that magic keyword transformations sometimes jumped incognito sessions, meaning that if you started a new incognito mode, got a transformation, then shut it down and started a new incognito mode you could sometimes see the same transformation again (without searching for the same magic keyword again). This weird behavior was not reproducible in Firefox's private browsing mode."

If that is FUD, that is some pretty damn creepy FUD.

Yeah, that jumped out at me as well, and actually seems somewhat irresponsible without more data (like a log of actual http traffic).

AFAIK (when I last checked), Chrome keeps no data between incognito sessions. If data is being kept, it should be extremely easy to spot and a bug should be filed. "This weird thing happened" isn't really a sufficient response to that theory.

If there isn't data being kept and transmitted, it means either the testers left open an incognito tab or something without noticing (easier to do than with private browsing, since private browsing closes all non-private browsing tabs until private browsing is over), or that google is tweaking results based on search history saved server side by ip address (and maybe other id-able browser characteristics), but only if the browser identifies itself as Chrome, not e.g. Firefox. There doesn't seem to be any real advantage there (and you'll likely just end up polluting anything but the coarsest of clustering), but that's just speculation.

I'm not able to reproduce the behavior in Chrome (though I'm having trouble reproducing even simple magic keyword behavior in incognito mode in the first place), so more data would be appreciated.

I don't fear personalized results, I just don't like them. I switched to DDG because google results have been getting progressively worse and worse for quite some time. Rather than jump through hoops to try to convince google to just search for what I asked instead of what it wants to infer I meant to ask, I just use a search engine that does what I ask it to.

Maybe Google is not the problem if a voter's idea of political engagement is typing in "one word" and voting for whatever pops up?
