
Google search takes 7 seconds on certain queries - Liron
https://twitter.com/liron/status/1157327854033674241
======
thrwawy3493
I have that beat, as I once made Google perform a 30-second query. I know
exactly how I did it, too. I wasn't sure whether or not it would find it...

So how does Google do a query? Well fundamentally it has lists of sites that
have a keyword, so it gets the list for each of your keywords, then finds the
intersection of those lists, then sorts it by the PageRank of the resulting
URLs. That's how it used to work anyway.
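
A minimal sketch of that lookup, assuming a toy in-memory index. The `index`
and `pagerank` dicts and the `search` function are made up for illustration
and say nothing about Google's actual data structures:

```python
# Toy inverted index: keyword -> set of URLs containing it (hypothetical data).
index = {
    "from": {"a.example", "b.example", "c.example"},
    "what": {"a.example", "c.example"},
    "it": {"a.example", "b.example", "c.example"},
}
# Hypothetical per-URL rank scores.
pagerank = {"a.example": 0.2, "b.example": 0.9, "c.example": 0.5}


def search(keywords):
    """Intersect the per-keyword URL lists, then sort the survivors by rank."""
    postings = [index.get(word, set()) for word in keywords]
    if not postings:
        return []
    # Put the shortest list first so the running intersection stays small.
    postings.sort(key=len)
    hits = set.intersection(*postings)
    return sorted(hits, key=lambda url: pagerank.get(url, 0.0), reverse=True)
```

With stop words, every list is huge and so is the intersection, which is the
expensive case the rest of this comment is about.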

So back in the day I decided to try to do a test. I got the Gutenberg list of
all of Shakespeare's words, and I wrote a program to go through it to find the
longest string of all "stop" words (which you could still force Google to
include - normally it ignores them, since there are just too many sites that
include words like "the" and so on). Stop words are super common words like
"the" which would be on millions or billions of pages. A huge list of URLs.

So I coded up my little algorithm that goes word by word through the text,
keeping track of what the longest string it found so far is, that consisted
entirely of stop words.
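
A sketch of what such a scan might look like, not the original program. The
`STOP_WORDS` set here is a small made-up list, and the tokenizer ignores
punctuation, so a run can span a sentence boundary:

```python
import re

# A small, made-up stop-word list; real lists are much longer.
STOP_WORDS = {"a", "from", "is", "it", "the", "to", "what"}


def longest_stop_word_run(text):
    """Return the longest run of consecutive stop words in text, as a list."""
    best, current = [], []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in STOP_WORDS:
            current.append(word)
            # Remember this run if it is the longest seen so far.
            if len(current) > len(best):
                best = list(current)
        else:
            # A non-stop word ends the current run.
            current = []
    return best
```

Called on the full text, `" ".join(longest_stop_word_run(text))` gives the
phrase to search for.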

The longest string it found was: "from what it is to a".

Just now I again did that query, and on the whole Internet it just found 4
matches (soon it will find this comment too and any archives of this comment),
all being the exact phrase from Shakespeare:

[https://imgur.com/a/lvU1QSs](https://imgur.com/a/lvU1QSs)

Impressively, that query now takes 0.84 seconds (I haven't done that query in
several years, it's possibly cached but I doubt it.)

However, when I first performed it, it took 30+ seconds. I didn't take a
screenshot but I was super impressed. I brought Google to a crawl for 30
seconds, in the exact way I was intending. Moohahahahaha.

"Holy cow. I just made Google's databases join six lists each with millions of
pages on them, find the intersection, and then go through all of them for
which ones had my phrase in literal order. And then it found it."

Pretty mind-blowing that today it can do that in < 1 second.

~~~
thrwawy3493
The key for people who don't get why it was so much work for Google is this:
where I just wrote "then go through all of them for which ones had my phrase
in literal order" I meant "then go through all of the _cached contents_ of
every single one of the web pages on the results list", since the joined list
itself would have every web site that has those 6 words, which is pretty much
every English-language page on the web of more than a few thousand words. So
it needed to scan through the cached contents of all of them to find which
ones had the phrase in order.
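
A rough sketch of that second step, assuming the candidate pages are held as
a dict of URL to cached text (the `candidates` shape and both function names
are hypothetical). It is the naive scan described above; an index that also
stores word positions can answer phrase queries without rereading every page,
and nothing here says which approach Google used at the time:

```python
def contains_phrase(page_words, phrase_words):
    """True if phrase_words appears as a contiguous run in page_words."""
    n = len(phrase_words)
    return any(
        page_words[i:i + n] == phrase_words
        for i in range(len(page_words) - n + 1)
    )


def phrase_search(candidates, phrase):
    """Scan the cached text of every candidate page for the exact phrase."""
    phrase_words = phrase.lower().split()
    return [
        url
        for url, cached_text in candidates.items()
        if contains_phrase(cached_text.lower().split(), phrase_words)
    ]
```

The cost grows with the total size of all candidate pages, which is why a
query made only of stop words is close to the worst case.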

For example I just looked at the top post on Reddit right now, it's about 20
page-downs of comments, and has the words this many times:

    
    
    from   17
    what   22
    it     223
    is     169
    to     195
    a      more than 1000
    

Pretty much every English-language web page on the whole Internet will have
those words, unless it is very short.
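
A tally like the one above can be reproduced for any page. This is a small
sketch that assumes the page text is already in a string; `PHRASE_WORDS` and
`count_phrase_words` are names invented here:

```python
import re
from collections import Counter

PHRASE_WORDS = ["from", "what", "it", "is", "to", "a"]


def count_phrase_words(page_text):
    """Count how often each of the six phrase words occurs in page_text."""
    counts = Counter(re.findall(r"[a-z']+", page_text.lower()))
    # Counter returns 0 for words that never occur.
    return {word: counts[word] for word in PHRASE_WORDS}
```

A page only makes it onto the joined list if all six counts are above zero.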

~~~
wereHamster
> pretty much every web page on the whole Internet will have those words,
> unless it is very short.

… because everyone speaks English?!?

England had a large empire back in the days but it wasn't that big, nor did it
span the whole world.

~~~
treypitt
"The sun never sets on the British Empire." It was up there historically with
Rome and Alexander's conquests (although the latter may not have constituted a
bona fide empire).

------
bubble_talk
For everyone who is suggesting Google search quality has taken a hit, I would
suggest a quick test to see for yourself that Google is still WAY better than
the competitors when it comes to long tail queries.

Example: someone compiled a list of things they learned from indiehackers.com
interviews.

[http://www.toomas.net/2017/07/18/reverse-engineering-a-
succe...](http://www.toomas.net/2017/07/18/reverse-engineering-a-successful-
lifestyle-business-heres-everything-ive-learned-from-reading-indiehackers-
com/)

There are a lot of quotes on that page, and unfortunately none of them are
linked back to the original interview. Some quotes are very interesting, so I
wanted to find the original interview.

I took some of those quotes, put them in double quotation marks, and searched
on Google, DuckDuckGo and Bing. By the way, you can only replicate these
results by adding the double quotation marks.

Results:

Google always shows toomas.net in the top results, and _almost always_ finds
the relevant interview article on the indiehackers website.

DDG (usually) finds the article written on toomas.net, but not the
indiehackers interview.

Bing often fails to list toomas as the top result, and doesn't find the
indiehackers interview at all.

~~~
Hedja
It's worth mentioning that indiehackers.com is a JavaScript-only website and
doesn't work with JS turned off. Google's spider can execute JS, while others
do not. This is why indiehackers.com doesn't show up, not because of the
search engines themselves.

I have a similar issue with a side project of mine. JS-only websites solve
this with server-side rendering or static generation.

~~~
hedora
> Google's spider can execute JS, while others do not.

I’m honestly having a hard time seeing how this isn’t a bug in Google search.

~~~
jefftk
A search engine is trying to predict which pages will be helpful to you in
response to your query. This means it should take "how does the page look to
users" as input, which today means executing JavaScript.

(Disclosure: I work for Google, not on search)

~~~
dredmorbius
Contrapoint: if you keep enabling hostile behaviour, you'll get more of it.

Non-JS web indexing as a default would vastly diminish the utility of JS-only
pages.

------
jmpeax
Great, now I'm on a list:
[https://i.imgur.com/I8FM4aN.jpg](https://i.imgur.com/I8FM4aN.jpg)

~~~
alienallys
Query was "area 51 goto DevTools and edit"?

~~~
jmpeax
Exactly. My screenshot was a tongue-in-cheek comment on the black bars in the
original post. Unless we can reproduce/test it ourselves, then it's as good as
a devtools edit.

------
d-sc
Tl;dr: Googling “Powered by” seems to cause Google to take an excessive amount
of time to return results.

~~~
xurukefi
2.64 seconds for me (I guess it's cached now?).

Honestly, the mentality in this twitter thread is a joke. Google handles
insane amounts of data beyond what 99.99% of engineers have ever dealt with.
The fact that their search works at all is a miracle. But I guess people like
to ignore that and would much rather complain on Twitter that it takes a few
seconds to query a database of trillions of records.

~~~
ehsankia
It very clearly has to be a bug. I also got ~2.5s. It makes no sense for it to
take that long for everyone, even after running it multiple times in a row.
The system most definitely has many types of caching, so if it takes that
long, there's probably something going very wrong.

~~~
ibejoeb
There are comments on the thread that suggest that it is an intentional delay
to prevent Google itself from being weaponized. Essentially, "powered by" is
special because of automatic inclusions like "powered by wordpress" or
"powered by nginx."

~~~
ehsankia
How can it be weaponized? Can you not search any other keyword used by those
platforms? "by wordpress" is pretty fast.

~~~
ibejoeb
I'm just pointing out the hypothesis and the train of thought. I don't know
the real answer here. It's not my hypothesis.

If, for example, you wanted to harvest domain names because you had a hot 0day
on wordpress, this would be a convenient way to do it.

Yes, there are probably other signatures, too.

------
imdsm
I posted about this a while ago, where queries took even longer with certain
calculations. -2 * 2 takes ~10 secs.

[https://news.ycombinator.com/item?id=17470163](https://news.ycombinator.com/item?id=17470163)

------
mehrdadn
Yeah, definitely seen Google take a long time. Also seen weirder things. Just
a couple days ago I managed to type a query that neither told me there are no
results, nor did it actually return any results. Hadn't seen that before.

~~~
Sendotsh
> Just a couple days ago I managed to type a query that neither told me there
> are no results, nor did it actually return any results. Hadn't seen that
> before.

I've had that a few times lately. I thought it was my internet being shoddy
but the header and footer load, just no results in the middle nor any error
about not finding anything... just blank space.

~~~
andromeduck
This is probably a bug, but I wonder what it would return if all the results
were GDPR'ed or otherwise legally censored.

------
ysleepy
I have sympathy for the guys watching the metrics, who will see a huge spike
in outlier response times and go investigating after this made the rounds.

------
stareatgoats
tl;dr: querying the term "powered by" causes the query to take an inordinate
amount of time (still less than 10 seconds, so a far cry from "all day").

The reason doesn't seem clear, but one comment [0] claims that "powered by" is
a common query used by black hats and spammers to find targets. Not sure why
any "antispam" behind the scenes would cause the query to delay though.
Hardcoded delays?

[0]
[https://twitter.com/syndk8/status/1157930276208750593](https://twitter.com/syndk8/status/1157930276208750593)

~~~
stareatgoats
Finally found a comment that actually makes sense [0]:

"'Powered by X' is on millions of web pages because it gets autogen'd by
popular web-facing CMS (Joomla, Wordpress etc). So when a search query
includes 'Powered by' google must determine which among the billion pages with
this phrase is most relevant"

[0]
[https://twitter.com/bradcog/status/1158008651619004417](https://twitter.com/bradcog/status/1158008651619004417)

------
amelius
How many CPUs does Google have at its disposal these days? And how many
queries are performed per second?

~~~
deathwarmedover
DuckDuckGo make some of these sorts of numbers public:
[https://duckduckgo.com/traffic](https://duckduckgo.com/traffic)

~~~
rightbyte
The actual searching is performed at MS and Yahoo, though.

------
noncoml
“hackernews powered by lisp” seems to be the problem

------
dekhn
I've heard that numeric range queries make the backends work harder. Consider
adding those to your queries to make them slow.

------
hedora
I tried to switch from DuckDuckGo to Google, but the seven second latencies
for certain two word queries got me to switch back after a few days.

------
gambiting
Slightly off topic - has anyone else noticed that in the last couple years
Google results have just become....crap? Like, as a programmer I'm used to
googling all sorts of things, but lately it's almost useless. Like, my recent
fail was searching for "<class name> C# programming" and......the entire first
page had both the class name and C# crossed out and was showing me generic
results for programming. Top result was some website offering courses in
programming. What the shit Google. I personally think it's the rise of devices
like Google Home that's to blame - Google tries to reduce every search query
into something that can yield a short snippet that can be read back to you -
so it's very aggressive towards highly technical queries that it used to be so
good at.

~~~
crummy
That's my biggest frustration, when I search for "X Y" and they return me
results without "Y" at all, and even call that out in the results.

I end up having to put everything in double quotes just to get results that
I'd expect.

~~~
mort96
"I" "miss" "the" "times" "when" "I" "didn't" "have" "to" "type" "like" "this".

------
trilila
Also off topic. Google seems to favour "essays" over news, and most times you
need to read the history of mankind before you get to relevant content in a
news piece. Also it has no means to detect clickbait which is seriously
problematic.

~~~
onion2k
PageRank uses a bunch of things that would favor older pages - more inbound
links, more clicks in the search results, and so on. They have "news search"
if you just want news: [https://news.google.com/](https://news.google.com/)

~~~
Yetanfou
The "Google News" search can not be seen as an objective search for "news" as
it is biased both by the selection and weighting of sources by Google (staff
and algorithms) as well as by the search history profile. If you still want to
use Google News make sure to access it using a clean browser profile (not
logged in, no Google-related cookies) to avoid the search profile
contamination which would otherwise colour the results. The source selection and
weighting bias can not be avoided as that is part and parcel of the way the
thing works.

~~~
londons_explore
I don't think anyone expects Google News to be unbiased?

It collates hundreds of sources, many with obvious or subtle bias. If an
aggregator puts bias in, they can't provide unbiased out...

~~~
Yetanfou
Why not? Google Search started out unbiased, PageRank (the algorithm) did not
care about any opinions or attitudes or leanings and the results ended up
being representative of what was out there on the 'net. The same approach
could have been used for Google News by pulling in any news source found by
Google's crawlers which happened to report on any specific issue, weighing the
rank of those sources the same way that regular search results used to be
weighed (i.e. those sources which are linked to more often (after filtering
out link farms) end up higher than fly-by-night operations which nobody links
to). This is not what Google News does though, instead it seems to either use a
whitelist or apply a blacklist to produce 'reputable news sources'. It is
there the bias starts, in the selection of which sources to link to.
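
A toy illustration of the link-based weighing described above. It simply
counts distinct inbound linking sites after dropping known link farms, which
is a deliberate simplification and not PageRank itself; `rank_sources`, the
`links` pairs and the `link_farms` set are all hypothetical:

```python
from collections import defaultdict


def rank_sources(links, link_farms=frozenset()):
    """Rank sources by how many distinct non-farm sites link to them.

    links: iterable of (linking_site, linked_source) pairs.
    """
    inbound = defaultdict(set)
    for linker, source in links:
        # Ignore link farms and self-links.
        if linker not in link_farms and linker != source:
            inbound[source].add(linker)
    # Most-linked first; ties broken alphabetically for a stable order.
    return sorted(inbound, key=lambda s: (-len(inbound[s]), s))
```

Sources that are only linked from link farms drop out of the ranking entirely.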

------
JonathanCreek
If you type “google” it will try to find itself, go into infinite loop and
crash. Source: Internet

