Hacker News
Google Recently Made A Silent Shift To A New Search Algorithm, “Hummingbird” (techcrunch.com)
185 points by mumbi 288 days ago | comments


bane 288 days ago | link

I've noticed that I often just started asking google questions instead of trying to assemble a sequence of words that I think will divine the results I'm looking for.

It's always worked reasonably well and saves me from trying to come up with a search.

At the same time I've noticed that coming up with a sequence of search terms has been working worse and worse in google over the last couple years. I frequently get results for whatever google thinks I was searching for, especially if my original search terms resulted in very few results or no results. I'll just get a result page anyway, except it's almost never helpful.

Perhaps this is an attempt to make Google more Star-Trek/Watson-like, and it's great for those use-cases. But for the other cases, like looking up specific serial numbers or whatever, it's a mess.

-----

furyofantares 288 days ago | link

> I've noticed that I often just started asking google questions instead of trying to assemble a sequence of words that I think will divine the results I'm looking for.

This is an interesting area where I found myself playing catch-up to the less technical people in my life. For a long time I saw them typing questions into google and getting bad results and my recommendation to use keywords never really stuck, probably because they didn't have the same mental model of how searching worked that I did.

Then along came a few sites that targeted question-askers and sometimes if you asked a question you'd find someone else asking the same question along with some answers of highly variable quality. I didn't discover this myself, because I never typed questions into google -- I had to observe the less technical people getting better results than me occasionally to pick up on this.

Still, they got worse results most of the time, and while it was a new tool in my arsenal, it wasn't usually what I was looking for and I used it sparingly.

Then when the first iPhone with Siri came out, I bought it for a family member and demo'd it. I did all the stuff the commercial was doing to make it seem like magic.

So my family member takes it and starts talking to it like it's a human. And I instantly regret what I've done, because I knew some keywords it would pick up on to look like magic, but I've made it look like you don't need to know the keywords, you just need to talk to it. So there they are, talking to it like a human, and I'm expecting it to fail. But instead, when they say "call my sister" it replies with a prompt asking who the sister is. I'm sure Siri is still a large number of special cases, but it's large enough that thinking of it that way failed me.

Similarly, I noticed one day that people asking google questions were getting better results than me sometimes even when they weren't looking for other people asking/answering the same question. So I've started asking google questions more often, but again, it required me to observe a less technical person doing so.

-----

krelian 288 days ago | link

I remember back in the old days search engines were explicitly saying that words like "and", "how", etc. were ignored when returning the results for your search. These days these words are becoming the key to getting the results you are actually looking for.

-----

1qaz2wsx3edc 288 days ago | link

I'm going to answer this from a technical point of view. I find search is best done by using precedence.

For example, say my question is: "how to use rails with devise and omniauth". I break it down so that the term I think nets the most results is the first keyword, and so on.

1. rails 2. devise 3. omniauth

rails+devise+omniauth: https://www.google.com/search?q=rails+devise+omniauth

rails+omniauth+devise: https://www.google.com/search?q=rails+omniauth+devise

omniauth+devise+rails: https://www.google.com/search?q=omniauth+devise+rails

how+to+use+rails+with+devise+and+omniauth: https://www.google.com/search?q=how+to+use+rails+with+devise...

The top results are normally similar. With exception to the "how+to+use+rails+with+devise+and+omniauth" search, which is rather different.

From my standpoint the "rails+devise+omniauth" yields the best results as the source of truth is closer to the github documentation "OmniAuth: Overview · plataformatec/devise Wiki · GitHub", over third party information "#235 Devise and OmniAuth (revised) - RailsCasts".

Using "how+to+use+rails+with+devise+and+omniauth" gets rather off topic after the 3 or so results.

Another tactic I use is searching like a command line. For instance: "wiki list breaking bad", or "imdb iron man". This doesn't work with smaller properties, so "site:example.com" is another awesome tool.

Just my two cents.
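
The ordering comparison above can be reproduced with a small script. This is only a sketch: it assumes the plain https://www.google.com/search?q= URL form used in the links above, and the helper names (search_url, ordering_urls) are made up for illustration. It just prints the URLs; comparing the result pages is still a manual step.

```python
from itertools import permutations
from urllib.parse import quote_plus

# Base URL taken from the links in the comment above.
GOOGLE_SEARCH = "https://www.google.com/search?q="


def search_url(query: str) -> str:
    """Return a search URL for a free-text query (spaces become '+')."""
    return GOOGLE_SEARCH + quote_plus(query)


def ordering_urls(terms):
    """Return one search URL per ordering of the given keywords."""
    return [search_url(" ".join(order)) for order in permutations(terms)]


if __name__ == "__main__":
    # Every keyword ordering, followed by the natural-language form.
    for url in ordering_urls(["rails", "devise", "omniauth"]):
        print(url)
    print(search_url("how to use rails with devise and omniauth"))
```

Three keywords give six orderings, so this stays practical only for short queries.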

-----

hrkristian 287 days ago | link

Quotes do not require the plus sign as a placeholder for space, unless I'm missing something. I often use quotes to search for specific Linux error messages with great success; any hit that isn't a direct match gets shafted.

-----

bane 288 days ago | link

> it required me to observe a less technical person doing so.

That's a really fantastic observation. I suspect that lots of these changes are meant for those kinds of people (who happen to make up the majority of the world) and not for tech folks at all.

But yeah, I notice that if I ask a question, quite often the results I get are on question asking sites like Stack Overflow -- it turns out I usually get very good results that way too ;)

-----

ivanbrussik 288 days ago | link

so what you are saying is that Ask Jeeves was on to something?

-----

halfninety 287 days ago | link

Search what people are likely to say when discussing this topic, it may be in the form of a question and maybe not.

-----

cromwellian 288 days ago | link

If you want to search for a specific string, force it by putting quotes around it, e.g. "376718578383"

-----

lbenes 288 days ago | link

I really miss the + operator in Google searches. Back when Google disabled it to make their Google+ easier to search for, someone posted some good examples of where " " fails to reproduce the + search behavior.

-----

rcavezza 288 days ago | link

Is that different from using a capital "AND"?

-----

cynwoody 288 days ago | link

Looking at Google's Search Operators page,† I see an OR, but no AND. I thought AND'ing was the default.

https://support.google.com/websearch/answer/136861?hl=en

-----

sp332 288 days ago | link

Completely different. It's a unary literal operator, so if you put +shelf you would only get pages that had the word "shelf" and not shelved, shelving, bookcase, etc. like you get normally.

-----

cynwoody 288 days ago | link

I thought they got rid of the + operator a couple of years ago, recommending instead that you surround the formerly plussed word with quotes or use the Verbatim option under Search Tools.

However, I just tried a search for "anagram +python" and got 67 results vs 139,000 without the plus-sign. Does that mean they quietly restored the operator?

-----

criley2 288 days ago | link

I haven't tried the "+" operator in a while but I know the "-" operator has continued to work.

I always use it to filter out commercial or store results when a query triggers some over-zealous SEO and throws up pages of junk. A quick "-buy" or "-store" usually cleans the results up.

-----

pbhjpbhj 288 days ago | link

I find now that the minus operator doesn't work half the time either.

Google results are getting worse and worse IMO and there don't appear to be any tools left in the available query language to rectify it.

-----

madeofpalk 287 days ago | link

> Google results are getting worse and worse IMO

They're optimising for normal people who don't formulate query strings in Google, but they ask questions or just search.

-----

saraid216 288 days ago | link

> I find now that the minus operator doesn't work half the time either.

I honestly suspect Google is A/B testing search results somehow.

-----

criley2 287 days ago | link

I honestly suspect no power users or technical users are included in A/B testing, or if they are, they're a realistic minority who does not have changes made for them....

-----

dylz 288 days ago | link

+shelf, search tools, type, Verbatim

-----

reeddavid 288 days ago | link

Google often ignores quotes. I've had success using "intext:someword" (without quotes) to force Google to only return results that include someword.

-----

photorized 288 days ago | link

Loose matching = more revenue.

-----

cromwellian 288 days ago | link

Why was this voted down? It is the correct answer for searching for serial numbers, for example an Adobe Photoshop serial number: "1325-1576-6224-7891-3222-6645"

-----

Florin_Andrei 288 days ago | link

They've been messing with operators so much, I've lost track what works and what doesn't.

Apparently the double-quote still works. Well, that's good.

-----

Amadou 288 days ago | link

Double-quote is "sort of literal" (google will perform automatic stemming on single words in double quotes, but not on phrases). If you want really, really literal, use "intext:" or "allintext:"

-----

contingencies 288 days ago | link

Good tip.

-----

6ren 288 days ago | link

All is not lost! If you try their sample question, with quotes, it only turns up 4 results - with the article on top. https://www.google.com.au/search?q=“Which+is+better+for+me+—...

But it seems inevitable that as google targets what most people mean, they will target less what niche people mean (us). They could do both, with explicit operators etc, but would double their indices, and not worth it (for them). :( At least it creates opportunity for the google-usurper.

-----

porso9 288 days ago | link

Yeah, people really need to know how to google effectively. You can also add "-" before a word to cut out results with that word. (example: "cats -dogs" will return searches with cats but take out ones that also include dogs.)

Then there's "file:" and a file extension name to find searches of that filetype. (example: "AP biology file:ppt" returns all ap biology related powerpoints.)

Next there's "site:" and a website. (Example: "site:reddit.com fedoras" searches for fedoras on reddit, a very common thing. Useful if a site's search engine sucks.)
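
A rough sketch of how those operators compose into one query string. The function name and parameters are hypothetical, and it emits "filetype:" rather than "file:" on the assumption that "filetype:" is the documented spelling (the replies below discuss the difference). Whether Google honors each operator for a given query is not something this code can guarantee.

```python
def build_query(terms, exclude=(), site=None, filetype=None):
    """Compose a query string from plain terms plus a few search operators.

    exclude  -> "-word" for each word that should not appear
    site     -> "site:example.com" to restrict results to one site
    filetype -> "filetype:ppt" to restrict results to one file extension
    """
    parts = list(terms)
    parts += [f"-{word}" for word in exclude]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(["cats"], exclude=["dogs"]))
    print(build_query(["AP", "biology"], filetype="ppt"))
    print(build_query(["fedoras"], site="reddit.com"))
```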

-----

cynwoody 288 days ago | link

In my experience, as a rule, site search engines generally suck. And the exceptions to the rule often turn out to be farming the search out to Google one way or another.[1][2]

I often simply ignore site search boxes in favor of using the "site:" operator.

[1]https://www.google.com/cse/

[2]http://www.google.com/enterprise/search/products/gsa.html

-----

the_watcher 288 days ago | link

site: is one of the more useful search operators I've discovered. It's almost always better than a site search button for me.

-----

ivanbrussik 288 days ago | link

is file: different than filetype:

-----

eieio 288 days ago | link

file: doesn't appear to be a documented command, while filetype is [1]

Searching for a few random queries with either "file:pdf" or "filetype:pdf" shows that both return pdfs, but the results that I get are different. Not sure what to make of that or which is better.

[1] http://www.googleguide.com/using_advanced_operators.html

-----

James_Duval 287 days ago | link

For me, "file:pdf" returns .pdfs, .pdf documentation, .pdf information...all kinds of things which aren't .pdf.

"filetype:pdf" seems to be the only command which reliably returns just .pdfs.

I'm in the UK, your mileage may vary.

-----

ecuzzillo 288 days ago | link

It often gets completely ignored.

-----

shamshiel 288 days ago | link

Yup, there have been plenty of times I've put quotes around my search terms and Google decided I didn't really mean to put quotes around them.

-----

nimble 288 days ago | link

What do you think Google should do with the following search:

   Who said, "To hair is human?"

-----

mkr-hn 288 days ago | link

I would want Google to leave it alone and suggest what it thinks is correct as an option, like it used to.

-----

nimble 288 days ago | link

I'd rather see:

  exact: Who said, "To hair is human?"

Quotes are too easy to be used accidentally by laymen.

-----

pbhjpbhj 288 days ago | link

Shift+2 just slips in their typing without any intention?

-----

eitland 287 days ago | link

I summarized the madness a while ago. I think today the situation is somewhat better but here it is: http://techinorg.blogspot.com/2013/03/what-is-going-on-with-...

@nimble: what about when you intentionally search for a misspelling?

-----

chimeracoder 288 days ago | link

Wow. I was definitely not expecting this: http://i.imgur.com/9Zpqota.png

Thanks for pointing that out.

-----

pbhjpbhj 288 days ago | link

That's a massive fail for Google IMO. Why isn't it matching phrases for the search recommendation? Surely "to err is human" popularised by Pope [a million plus results] should be suggested before "to heir is human" (a minor computer game) [60k results].

-----

nl 288 days ago | link

Interesting!

"heir" is a much closer match for "hair" in terms of Levenshtein distance[1] (which indicates a typing error) and soundex[2] (misheard). But clearly "to err is human" should be offered as a possibility too.

[1] http://en.wikipedia.org/wiki/Levenshtein_distance

[2] http://en.wikipedia.org/wiki/Soundex
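
To make the edit-distance point concrete, here is a minimal sketch of the standard dynamic-programming Levenshtein distance, applied to the words in question. It says nothing about how Google actually ranks spelling suggestions; it only shows why "heir" looks closer to "hair" than "err" does under this one metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("hair", "heir"))  # one substitution: a -> e
    print(levenshtein("hair", "err"))   # needs three edits
```

So on raw character distance "heir" wins by 1 against 3, which is consistent with the suggestion shown in the screenshot even though "to err is human" is the far more common phrase.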

-----

JTon 288 days ago | link

I only recall it being ignored when the string in quotes returns no hits

-----

adanto6840 288 days ago | link

I get a lot of "Showing results for X instead" when searching for model numbers or similar.

Usually you can click "search for {what I actually typed}" but it's still annoying. I liked it when +'s and double-quotes consistently did the same thing every time.

Maybe there is a setting that does this still? I've tried playing with a few settings but have had mixed results / consistency...

-----

moultano 288 days ago | link

If that happens it's a bug, and I'd love to have some examples to pass along. I'll make sure they get to the right people. Looking through your search history: https://history.google.com/history/ might help you find them.

-----

ecuzzillo 287 days ago | link

Unrelatedly terrible results: https://www.google.com/search?q=mac+ros+pcl+download+hangs+f...

-----

moultano 287 days ago | link

Are these results better? https://www.google.com/search?q=ros+pcl+download+hangs+mac

-----

ecuzzillo 287 days ago | link

Yes, of course, but I and everybody I talk to find it incredibly obnoxious that it ignores the space.

-----

moultano 287 days ago | link

I just wanted to confirm so I could document the issue properly. I wasn't sure what the original query was looking for. Thanks for reporting it.

-----

ben0x539 288 days ago | link

I think the intention isn't for searching for literally that sequence of words, though, just something approximate-ish question-ish.

-----

SeanDav 288 days ago | link

Quotes can help but they are often ignored.

-----

bfish510 288 days ago | link

also use - and + to force or remove certain results. (You can use this to limit areas of a website for example)

-----

curveship 288 days ago | link

- still works, but they removed + a couple (few?) years ago. Instead, they recommend you put the single term in quotes, so a search for 'linux +powerpc' would now be 'linux "powerpc"'.

-----

cynwoody 288 days ago | link

That, and they also have the Verbatim option under Search Tools, but using it seems to conflict with other Tools, such as date filtering, and also applies to the whole query, not just one or two terms.

-----

angersock 288 days ago | link

You're making me miss alta-vista. :(

-----

photorized 288 days ago | link

That doesn't always result in strict string matching with Google.

-----

SeanDav 288 days ago | link

Google is increasingly becoming an "Ask Jeeves" clone. I have very little control of my searches and Google just returns what it thinks I meant, rather than what I actually typed.

I would love an advanced search mode with some simple boolean logic, even if it took several seconds to return the result, for those times when you really need to sift through a lot of garbage to find the results you really want.

What is the point of Google returning results in 0.000000001 seconds if they aren't the results I wanted and I have very little means of refining the search. The moment a new search engine comes with this functionality and a decent indexed base, it's bye-bye Google for me.

-----

moultano 288 days ago | link

If you have any example queries I'd love to debug them. You might find them here: https://history.google.com/history/ if you have search history turned on.

-----

SeanDav 288 days ago | link

That's great! I do not have search history turned on but will definitely find you some nice examples of what I mean going forward.

-----

stusmith1977 287 days ago | link

Does "verbatim" mode approach what you need? Google won't attempt to guess what words it thought you meant in that mode.

-----

codemac 288 days ago | link

This is why I basically use DuckDuckGo for any search where I know exactly what terms I want to be super weighted, and Google for things like: "What is the weather in Amsterdam?"

It's worked out really well. That plus the !g operator in ddg means I get the best of both worlds, I'd suggest you try it for a day or so and see how it goes.

-----

lnanek2 288 days ago | link

Same here. I was struggling over some results a couple times today where I entered keywords and it seemed more like Google was trying to answer a question. Wish I could turn the new thing off. I don't use Google that way.

-----

Amadou 288 days ago | link

> But for the other cases, like looking up specific serial numbers or whatever, it's a mess.

It is a boatload of extra typing, but the "allintext:" operator is your friend for those kinds of searches. I wish there was a URL modifier that could force google to assume allintext: on the search - that way I could have two different google search engines in firefox, one for literal and one for more "semiotic" searches. Maybe there is and I just don't know it?

http://www.googleguide.com/advanced_operators_reference.html...

-----

NSAID 288 days ago | link

You could set that up in Firefox. Create a keyword search for Google[1], and replace &q=%s in the link with &q=allintext:%s

[1] http://support.mozilla.org/en-US/kb/how-search-from-address-...

EDIT: Looks like Chrome works in a similar manner: https://support.google.com/chrome/answer/95653
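
For illustration, this is roughly what the browser does with such a keyword search: it substitutes whatever you typed for %s in the stored URL. The template below is an assumed minimal form of the Google URL (a real bookmark may carry extra parameters), and the function name is hypothetical.

```python
from urllib.parse import quote_plus

# Assumed minimal template; "%s" is the placeholder the browser fills in.
LITERAL_TEMPLATE = "https://www.google.com/search?q=allintext:%s"


def expand_keyword_search(template: str, typed: str) -> str:
    """Fill the %s placeholder with the URL-encoded text the user typed."""
    return template.replace("%s", quote_plus(typed))


if __name__ == "__main__":
    print(expand_keyword_search(LITERAL_TEMPLATE, "376718578383"))
```

With a second keyword pointing at the unmodified URL, you get the two "engines" asked about above: one literal, one not.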

-----

agibsonccc 288 days ago | link

You only need to look at google's knowledge graph alongside google now to see that that's where they're going. Many Question Answering Systems are able to handle many kinds of questions with relative accuracy now. With all of the different kinds of data sources out there, I'm not really surprised.

That being said machine learning is far from perfect. Allowing for user correction is still an immense must.

Let's hope basic search operators like quotes don't go away anytime soon.

-----

bumbledraven 288 days ago | link

Turning on Verbatim mode can help. To do this, after you search, go to Search Tools -> All Results -> Verbatim.

-----

Zoomla 288 days ago | link

I have noticed the same and I am having more and more difficulty finding exactly what I want in the first result page. The disappearance of the minus and plus operators is probably the major reason for me not getting the results that I want.

I wish the 1998 Google page that made the front page on HN was real (with updated index of course).

-----

obilgic 288 days ago | link

I operate 2 websites that are exact clones of each other, the only difference is the domain. This algorithm change literally shifted 70% of the traffic from one of them to the other one, in a matter of hours.

Edit:

    * Links to the websites, I would say almost identical
    * Same number of the pages are indexed by google, around ~5 million
    * Domains are almost same, no keyword difference
    * They both have same pagerank
    * Domains are registered together
    * Sites are hosted on different ips
    * Total traffic sites get is around ~40k/day unique
    * With this change, total unique visitors increased by 10%

-----

rossjudson 288 days ago | link

Sounds like the algorithm noticed the duplication, and picked a "winner". The new winner isn't the same as the old one.

-----

boomzilla 288 days ago | link

While we are on this topic, does anyone know the current state of art for text duplication detection algos? I understand that Google used LSH but they must have made a lot of progress since.

LSH: http://en.wikipedia.org/wiki/Locality-sensitive_hashing
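
For anyone unfamiliar with the idea, below is a minimal MinHash sketch, one common member of the LSH family for near-duplicate text. It is not a claim about what Google runs; the shingle size, number of hash functions, and all function names are arbitrary choices for illustration. A full LSH index would additionally split the signatures into bands so that likely duplicates land in the same bucket, which is left out here.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Split text into overlapping k-word shingles (lowercased)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of item, varied by seed."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set, num_hashes: int = 64) -> list:
    """For each seeded hash function, keep the smallest hash over all items."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching positions, which estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    page_a = "the quick brown fox jumps over the lazy dog near the river bank"
    page_b = "the quick brown fox jumps over the lazy dog near the river shore"
    sig_a = minhash_signature(shingles(page_a))
    sig_b = minhash_signature(shingles(page_b))
    print(estimated_similarity(sig_a, sig_b))
```

The signatures are fixed-length, so two pages can be compared without keeping their full text around.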

-----

rossjudson 288 days ago | link

I'm not sure about the algorithm in use, but what I hope is happening is that Google is now looking for the earliest publication of content when deduplicating. Most copycat sites have to copy their text from something existing, and if Google has already indexed that, they know that later versions of it are copies (and can presumably be knocked down in rank).

-----

ehsanu1 288 days ago | link

> in a matter of hours

TFA mentions this update happened weeks ago. How are you able to pin this down to the hour and correlate your traffic changes to the algo change if we don't know when the algo change happened exactly?

-----

obilgic 288 days ago | link

Shift happened weeks ago, since then I am waiting for any kind of update from google about the algorithm change. Also I am not arguing what time they have deployed the new algorithm, I am just saying that the traffic shift happened in a matter of hours.

-----

Arelius 288 days ago | link

I think he was just hoping that you could pinpoint the change to a specific date.

-----

chaz 288 days ago | link

I think I saw Hummingbird happen to one of my sites, but my traffic shift happened over the course of 5 weeks, with a steady ramp up in traffic. Very different from the near step-function you saw.

-----

Kudos 288 days ago | link

I doubt the incoming links are also identical.

-----

DGCA 288 days ago | link

Do you have any guesses as to why?

-----

agumonkey 288 days ago | link

google used to care about matching domain and content, maybe it identified the 70% one as a more realistic name.

-----

hackinthebochs 288 days ago | link

That's amazing. Do you have an idea what the difference is? Something like a hot keyword in the domain, or even a closely related word?

-----

wehadfun 288 days ago | link

It seems to work:

Where is disneyland - Shows a map and has "Get Directions" https://www.google.com/#q=where+is+disneyland

How big is disneyland - 160 acres https://www.google.com/#q=how+big+is+disneyland

-----

alayne 288 days ago | link

https://www.google.com/search?q=how+did+walt+disney+die

-----

turing 288 days ago | link

That's cool, haven't seen this one before. Thanks for sharing :) Played around with it a bit, and it seems to have pretty good coverage; got everyone from Frederick Douglass to Dijkstra.

-----

riffraff 288 days ago | link

yesterday for the first time I noticed the inline translation e.g. "cat in portuguese" -> gato.

Not sure if it's new.

-----

rmckayfleming 288 days ago | link

Google has an interesting localization issue when it comes to Canada. If you ask it for someone's height, it gives it in metric, but everyone here uses imperial for body related measurements. The same goes for area, I hear acre a lot more than hectare, but "How big is disneyland" gives me 65ha.
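
As a quick sanity check that the 160 acres quoted upthread and the 65ha shown here are the same answer in different units, here is the conversion, assuming the international acre of exactly 4046.8564224 square meters:

```python
SQUARE_METERS_PER_ACRE = 4046.8564224   # international acre
SQUARE_METERS_PER_HECTARE = 10_000.0


def acres_to_hectares(acres: float) -> float:
    """Convert an area in acres to hectares."""
    return acres * SQUARE_METERS_PER_ACRE / SQUARE_METERS_PER_HECTARE


if __name__ == "__main__":
    print(round(acres_to_hectares(160)))  # 160 acres is just under 65 ha
```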

-----

crucini 288 days ago | link

There must be a name for this syndrome - tendency of US-based developers to over-exoticize foreign places, taking at face value all official units of measure, languages, etc.

-----

benvd 287 days ago | link

When it comes to grammar, the term hypercorrection is typically used. Seems like it could be applied here as well.

http://en.wikipedia.org/wiki/Hypercorrection

-----

camus 288 days ago | link

yep, it works, i tried: "how tall is the eiffet tower?" gave me the result (~=324 meters).

"What is PI?" shows me a calculator with the result.

-----

nostromo 288 days ago | link

If people use your product all day, everyday, and you release the biggest overhaul of your product in 4 years, and nobody notices, is that a good thing, or a bad thing?

It makes me wonder if Google has become so "good enough" that more and more engineering effort will be spent for smaller and smaller returns.

-----

bane 288 days ago | link

Playing devil's advocate, they say in movies that the best special effects are the ones you don't notice at all.

-----

baddox 288 days ago | link

I think that's more applicable to less obvious things like the audio production in a film.

-----

chc 288 days ago | link

It applies to a lot of special effects too. Many — probably most — special effects are striving to look organic. For example, did you notice that half of Titanic was shot in front of a green screen? Were the special effects in the Adventures of Superman TV series better because their chromakey work was more noticeable?

It isn't that the special effect doesn't have an impact, but that you don't specifically notice the effect because it blends seamlessly into the rest of the film.

-----

JonnieCache 288 days ago | link

Oh, it goes a lot deeper than that my friend. It's now normal for standard-issue comedies and dramas to be chroma keyed, comped, etc:

https://www.youtube.com/watch?v=clnozSXyF4k

https://www.youtube.com/watch?v=WhN1STep_zk

-----

baddox 287 days ago | link

> For example, did you notice that half of Titanic was shot in front of a green screen?

Okay, I guess it depends what you mean by "notice." I certainly was aware that much of Titanic was shot in front of a green screen. No matter how photorealistic special effects become, it's often fairly obvious what's real and what's not.

-----

notatoad 288 days ago | link

i think it's unequivocally a good thing. Google has a product that works, they aren't trying to change the functionality. People's habits online and the type and volume of data that google indexes has changed, and google needs to update their algorithms so that the product functions the same despite the new inputs the algorithm is receiving.

-----

agumonkey 288 days ago | link

Sometimes the good things are the ones you don't notice. Maybe they noticed an increase or different usage patterns on their side.

-----

niuzeta 288 days ago | link

This. One prime example I can think of is Gmail. I've used it pleasantly, while all the under-the-surface updates happened. I was pleasantly surprised when one day the Gmail client reminded me that I'd forgotten to attach the file I'd said I'd attached. It's one of the subtle updates which simply enhances the experience, like performance changes.

Unfortunately the opposite example can be found in the same product, Gmail; when they visibly and ostensibly changed the UI, it was immediately noticed, and widely hated.

-----

agumonkey 288 days ago | link

It's as if life forms have evolved communication means for the only purpose of complaining about regressions :)

-----

tsycho 288 days ago | link

Assuming "no one notices" as a good proxy for search quality remaining the same, I would argue that Google's getting better since the playing field is getting harder...

* There are a ton more people actively researching ways to reverse-engineer and game the system.

* There are a ton more sites and more content getting generated, including bot-generated ones which are of dubious quality.

-----

moultano 288 days ago | link

Looking at it from the inside working on search, I see the returns as actually getting bigger and bigger. As Google gets better, people get more confident in issuing more complicated queries, which ups the bar again for the types of things search has to be able to do.

-----

chaz 288 days ago | link

I think people are just bad at quantifying their own habits.

-----

sjwright 287 days ago | link

It's likely that the new "signal" was progressively added to the existing mix, to avoid sudden jarring changes. Boiling a frog, etc.

-----

bsullivan01 288 days ago | link

> If people use your product all day, everyday, and you release the biggest overhaul of your product in 4 years, and nobody notices, is that a good thing, or a bad thing?

How do you know it was the "biggest overhaul"? Google is known to use every moment to hype itself.

Also Google has been doing these changes since 2011 almost monthly. Now sites that sell things or make money have almost given up on "free" traffic from Google, they know it's pay to play (outside Product Search it's not openly pay-to-play but you get the message after a few 70% traffic reducing updates).

-----

drewjoh 288 days ago | link

They were just quoting the article: "While they did say that this was the biggest overhaul to their engine since the 2009 “Caffeine” overhaul (which focused on speed and integrating social network results into search) and that it affects “around 90% of searches”, there wasn’t much offered in terms of technical details."

-----

ivanbrussik 288 days ago | link

Penguin .001% of search results - EVERYONE PANIC

Hummingbird 90% of search results - No one cares

-----
