
Google may rank sites for queries that don't appear on the page at all - whalabi
https://unlikekinds.com/article/google-ranking-keyword-not-in-content
======
scoutt
Google has become more and more useless lately, at least if you are searching
for specific tech info and not "10 weird hacks to". For example, I've lately
been doing some Android AOSP and BSP customization for a custom board. To
get useful information and avoid "top 10 Android tricks" and Android
hacking/cracking forums I often end up with a query composed of 10% search
terms and 90% filters.

Or if you do a search for _name surname_ it may return results for _"name"_
with _"Missing: surname"_. How on earth is a result about Peter Thiel the
same thing as what I asked for when I search for (for example) Peter Petrelli?
That's a full 50% of my search query that Google is discarding...

~~~
IpV8
I installed a browser extension that adds a blocklist to google's search
results. So every time I visit a '10 weird hacks to...' type of page I block
the entire domain from ever showing up in my results again. It takes a while to
filter them all out, but it drastically improved my search experience.
Anyways I use duck duck go nowadays so now my plugin is irrelevant...

~~~
stefanfisk
Which extension are you using? Personal Blocklist broke for me a while back,
and now my days are filled with scrolling past w3schools.com to get to
developer.mozilla.org

~~~
dhimes
A duck duck go search of something like

<what I'm looking up> javascript mdn

gives me developer.mozilla.org at the top every time. It's somehow better than
site:developer.mozilla.org for some weird reason.

~~~
frosted-flakes
Just put !mdn in your DDG search and it will take you right to MDN. Same with
!g for Google, !w for Wikipedia, etc.

~~~
Nition
In Firefox, right-click the search box at developer.mozilla.org and select
"Add a Keyword for this search". Put a keyword, something like m.

Then you don't even need a search engine. Just type

    m your search term here

in the URL bar to search the site directly.

~~~
dhimes
Wow! That's wicked cool- thanks for the tip!

EDIT: I'll add for others that this keyword is stored as a bookmark, so you
can find it in your bookmarks and edit/delete/etc.

------
softwaredoug
I work on (non Google) search engines. This kind of thing has been happening
for decades. Whether it’s synonyms, entity extraction, or embeddings to
capture conceptual relationships, I’m surprised this is surprising... but then
again I’m a search nerd :)

~~~
jakubp
On Google, at least a few years ago and back, when you put a query "X" you
were guaranteed to see X in the HTML contents of the page. Or at least I can't
remember getting a result that didn't satisfy that condition. Now it's common
to not find X there.

~~~
JoeSmithson
Can you provide an example of this? People always say this, but never provide
the query.

~~~
lukeschlather
This is an example from yesterday:

"azure" devops agent "checkout" "freeze"

I'm obviously searching for something very specific here, and Google helpfully
decided to show me results omitting "freeze" and "checkout" so I had to quote
them.

I also notice that it looks like Google thinks that check-up is the same thing
as checkout, which in context are not synonyms, though this would require
Google to infer "git checkout" but I didn't include git because that would
introduce another universe of unhelpful results.

[https://dzone.com/articles/getting-started-with-jenkins-the-...](https://dzone.com/articles/getting-started-with-jenkins-the-ultimate-guide)
was included which doesn't include "azure" at all, though this is
obviously Google being cheeky.

Of course I could also call this somewhat sinister, as Google is basically
saying I should ditch a direct competitor to their services and instead use a
self-hosted service. Or maybe I'm projecting since _I_ would like to ditch
Azure DevOps and use Jenkins, but in any case there's an example.

Feel free to play with it:
[https://www.google.com/search?rlz=1C1CHBF_enUS820US820&biw=1...](https://www.google.com/search?rlz=1C1CHBF_enUS820US820&biw=1396&bih=686&ei=JbGTXIfPMsv9-gTdlbroCQ&q=%22azure%22+devops+agent+%22checkout%22+%22freeze%22&oq=%22azure%22+devops+agent+%22checkout%22+%22freeze%22&gs_l=psy-ab.3..0i71l8.0.0..79010...0.0..0.0.0.......0......gws-wiz.JV1CR-xncd4)

Of course, in this case it was a networking problem so I do have to admit that
usually when Google starts ignoring what I typed it's because the answer to
the question is something else entirely. Though sometimes I know exactly what
I'm asking for and Google doesn't get it.

~~~
JoeSmithson
Wow THANK YOU! I have been asking for an example of this for literally years.

I think possibly what is happening here is that the index believes "azure" is
on the page.

Consider the second result here:
[https://www.google.com/search?q=%22azure%22+%22Allows+adding...](https://www.google.com/search?q=%22azure%22+%22Allows+adding+Secure+Shell%22+site%3Adzone.com&rlz=1C1CHBF_enGB814GB814&oq=%22azure%22+%22Allows+adding+Secure+Shell%22+site%3Adzone.com&aqs=chrome..69i57j69i60.497j0j4&sourceid=chrome&ie=UTF-8)

(the one that starts java.dzone.com but leads to the same article).

It appears to have captured at index time a phrase "Microservices and
Serverless on Azure" that is no longer in the article (possibly a link to
another article).

Perhaps a Googler can explain exactly what is happening.

------
kjhughes
This has been happening for many years now. In fact, Google introduced
"verbatim" search back in 2011 for those who wished for search results that
more strictly contained the exact search terms:

[https://searchenginewatch.com/sew/news/2126346/google-introd...](https://searchenginewatch.com/sew/news/2126346/google-introduces-verbatim-searching)

Verbatim search is still available today. Searches from the URL bar can be
made to use verbatim search by default via this search string:

[https://www.google.com/search?tbs=li:1&q=%s](https://www.google.com/search?tbs=li:1&q=%s)
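
As a rough illustration of how a browser keyword search might expand that template, here is a minimal Python sketch. The helper name `expand_search_template` is made up for this example; only the URL template comes from the link above, and real browsers may encode the query slightly differently.

```python
from urllib.parse import quote_plus

# URL template from the comment above; %s marks where the query goes.
VERBATIM_TEMPLATE = "https://www.google.com/search?tbs=li:1&q=%s"


def expand_search_template(template: str, query: str) -> str:
    """Substitute a URL-encoded query for the %s placeholder (hypothetical helper)."""
    return template.replace("%s", quote_plus(query))


print(expand_search_template(VERBATIM_TEMPLATE, '"azure" devops agent'))
```

The quotes are percent-encoded and spaces become `+`, so the query survives being pasted into the URL bar.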

~~~
JohnFen
Verbatim mode is very insufficient. We need actual logical operators to go
with it.

------
lemoncucumber
This headline is silly. Google has ranked sites for text that doesn't appear
on the page basically for as long as Google has existed:
[https://en.wikipedia.org/wiki/Google_bomb](https://en.wikipedia.org/wiki/Google_bomb)

------
JohnFen
I wonder if this is related to the fact that the quality of Google's search
results has seriously declined for me over the years. I keep getting the
feeling that Google is trying to second-guess what I'm looking for, and not
only getting it wrong, but getting it more and more wrong as time goes by.

------
taneq
The annoying thing is when you get pages of results for (popular keyword)
which is maybe tangentially related to (alternate interpretation of keyword
you used).

~~~
stordoff
It's frustrating because it just translates into making Google harder to use,
and multiple searches for the same thing are now common. I often see, on a
four word search, the top five or so results all having "Missing: word3
word4". I often search for "<thing> <thing I would like to know about it>",
and Google goes "Here's some general information about <thing> instead". It's
not _useful_.

~~~
XCabbage
But does this happen because Google is ranking those results above ones that
matched both words, or because _nothing_ on the web matches both words? I'm
often unsure what's going on when I experience this.

~~~
taneq
There used to be a time where if you typed three words and got zero results,
it was "no page on the ENTIRE INTERNET contains those three words." Now it's
"these are what the algorithm thinks you want".

------
dumbfounder
It has always been the case that the search terms do not necessarily appear on
the resultant pages, this is actually the cornerstone of their algorithm. They
use the text in the links and nearby text to apply to the site it is linking
to. This was from day 1. It was why you would get good results for categorical
searches like "search engine", "shopping site", "best auction site", etc. It
didn't say those terms on the sites that showed up high in the list, but
that's how everyone described them.
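
A toy sketch of that idea in Python, assuming a plain inverted index with no ranking at all. The URLs, page text, and the `build_index` and `search` helpers are invented for illustration and say nothing about how Google actually implements it.

```python
from collections import defaultdict


def build_index(pages, links):
    """pages: {url: body_text}; links: [(source_url, anchor_text, target_url)]."""
    index = defaultdict(set)
    # Index each page under the words that appear in its own text.
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    # Credit the words of each link's anchor text to the page being linked to.
    for _source, anchor, target in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return index


def search(index, query):
    """Return the URLs indexed under every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical data: the auction page never uses the word "auction" itself.
pages = {
    "https://auction.example": "buy and sell almost anything",
    "https://blog.example": "my favourite places to shop online",
}
links = [("https://blog.example", "best auction site", "https://auction.example")]

index = build_index(pages, links)
print(search(index, "auction site"))
```

The auction page is returned for "auction site" purely because of how another page described it in a link.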

------
babyslothzoo
The more Google uses this type of presumptive search behavior rather than what
I'm actually searching for, the more I find myself using Duckduckgo and Bing
for finding specific things with specific search queries.

If I am searching for "xyz abc" I am searching for those specific terms for a
reason, so please don't present me with results for the entirely unrelated
"123", and no I am not talking about showing synonyms.

------
ComputerGuru
Recently discussed here:
[https://news.ycombinator.com/item?id=19380212](https://news.ycombinator.com/item?id=19380212)

Posted about it at length here: [https://neosmart.net/blog/2016/on-the-growing-intentional-us...](https://neosmart.net/blog/2016/on-the-growing-intentional-uselessness-of-google-search-results/)

~~~
aboutruby
This is a link to a thread (created by the parent commenter) with 15 comments
on a post about DuckDuckGo.

Link is from 2016 obviously.

------
kaikai
I always wrote elaborate search queries with most important terms first, no
filler words, etc.

Lately I'm using more natural language searches, because they return better
results. Probably they're optimizing the engine to return the best results for
the most people, and most people want to know "how do I turn on the furnace"
not "[model number] furnace manual"

------
opportune
I don't mind google replacing keywords with similar word2vec or doc2vec
embeddings (or some other statistical fuzzing) that they found typically
increased relevancy/click-through on users. Not sure why anybody really does
other than the fact it makes the reasoning behind why certain results show up
more opaque.

------
eli
Hasn't Google ranked pages for text that appears in links _to_ them but
not _on_ them for a long time?

~~~
goodcharles
Yes this is true. Google will rank web pages, based on the inbound links
pointing to the page, even if Google hasn't crawled the page itself.

This happens regularly when a web page is blocked from crawling via a
robots.txt file. Google still indexes and ranks the page, but Google has no
idea what is on the page.

Want to keep a web page out of rankings? Allow Google to crawl it (by not
restricting via robots.txt), but use a NoIndex meta tag or X-Robots-Tag HTTP
header to indicate that the page should not appear in search results.
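
To check whether a page actually sends one of those signals, something like the following Python sketch could work. It is a simplified check under stated assumptions: the `signals_noindex` helper is hypothetical, it takes already-fetched headers and HTML, and it ignores user-agent-scoped forms of the header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def signals_noindex(headers, html):
    """Return True if the header or a robots meta tag contains 'noindex'."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


print(signals_noindex({"X-Robots-Tag": "noindex"}, "<p>hello</p>"))
```

Either signal only helps if the crawler is allowed to fetch the page in the first place, which is the point made above about not blocking it in robots.txt.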

~~~
dazc
Not only will google index all those pages you are blocking them from but will
also penalise you when they have indexed x number of them.

Fix the problem with X-robots-tag, request removal from index (*while such
option is available) and then wait for months for the penalty to be lifted.

------
mtsx
Sir, 4843218317 appeared on your article so many time.. 48.. 2 times..32 3
times..18 once ..83 once...31 once...17 4 times...(maybe,its only for number
search :) )...i'm a hacker from Bangladesh.{ha ha :) }

------
Neil44
I’ve seen this when googling the numberplate of an old Ferrari. Google knew
what model it was and returned results as if I’d searched Ferrari 512 TR or
whatever. I was pretty impressed / freaked out at the time.

------
groestl
Here is how this works: [https://www.link-assistant.com/news/keyword-refinements.html](https://www.link-assistant.com/news/keyword-refinements.html)

Basically, Google tries to anticipate query refinements, improving the search
as a skilled user would do. Most users are not skilled though, so this leads
to an overall better search experience for them, but makes the experience for
skilled users worse, due to false positives.

------
hartator
> Or perhaps Google is linking pages based on how searchers move around the
> web. For example, a searcher might first search for the phone numbers and
> not find what they’re looking for, so they search for something like the
> article title, and end up on the article. Google might see a high enough
> proportion of users behaving this way, and decide to save them the trouble
> of performing the second search.

Or - way simpler - maybe a link to the page has the text "4843218317" in it.

------
mqus
maybe I'm just weird but I always look at the green text below the blue links
and expect to see my search terms (or their derived equivalents) with a bit of
context. I can't understand why google would even list links where it shows me
some context but without _any_ of my search terms in it. I probably won't click
on them anyway (unless the title or URL contains search terms)

------
utopcell
More often than not, if a term is missing from a page, chances are it is on
the anchor text of a link that points to the page.

------
crazygringo
It doesn't seem _completely_ random... the first two results which include the
number are about Uber SMS messages, and the blog post is about Uber SMS
messages.

So Google has clearly determined that this number is somehow associated with
sketchy Uber SMS messages, out of a very sparse signal...

That seems pretty cool actually. Like, _really_ cool.

------
MockObject
Term frequency–inverse document frequency is a technique that can retrieve
documents related to a term, but which literally lack it.

[https://en.wikipedia.org/wiki/Tf%E2%80%93idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)

------
ucaetano
Isn't this the core concept of PageRank?

If there are 10 million websites out there linking to a single article when
referencing X, even if such article doesn't include X verbatim, that article
will rank high in queries for X.

------
tomerbd
I like google

------
vesinisa
Googling 101: surround the search phrase with literal double quotes ("") to
request an exact match.

Includes the unwanted page:
[https://www.google.com/search?q=2109085405](https://www.google.com/search?q=2109085405)

Does not include the unwanted page:
[https://www.google.com/search?q="2109085405"](https://www.google.com/search?q="2109085405")
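
If you build such links programmatically, the quotes need to be URL-encoded. A minimal Python sketch, where `exact_phrase_url` is a made-up helper name; whether the engine then treats the phrase as a strict match is a separate question.

```python
from urllib.parse import quote_plus


def exact_phrase_url(phrase: str) -> str:
    """Build a Google search URL for a double-quoted phrase (hypothetical helper)."""
    quoted = f'"{phrase}"'
    return "https://www.google.com/search?q=" + quote_plus(quoted)


print(exact_phrase_url("2109085405"))
```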

~~~
paco3346
Except this doesn't exact match anymore. It's just a strongly suggested
phrase.

~~~
vesinisa
[https://www.google.com/search?q=%22Except+this+doesn%27t+exa...](https://www.google.com/search?q=%22Except+this+doesn%27t+exact+match+anymore.+It%27s+just+a+strongly+suggested+phrase.%22)

I rest my case :)

~~~
dcbadacd
:) You're wrong because I've seen strings between quotes being ignored as well
:)

