
How is search so bad? A case study - Tenoke
https://svilentodorov.xyz/blog/bad-search/
======
rahulchhabra07
I have been thinking about the same problem for a few weeks. The real
problem with search engines is the fact that so many websites have hacked SEO
that there is no meritocracy left. Results are not sorted based on relevance
or quality but by SEO experts' efforts at making the search results favor
themselves. I can't possibly find anything deep enough about any topic by
searching on Google anymore. It's just surface-level knowledge that I get from
competing websites who just want to make money off pageviews.

It kills my curiosity and intent with fake knowledge and bad experience. I
need something better.

However, it will be interesting to figure out the heuristics to deliver better
quality search results today. When Google started, it had a breakthrough
algorithm - to rank page results based on number of pages linking to it. Which
is completely meritocratic as long as people don't game for higher rankings.
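
As a rough illustration of the link-counting idea described above, here is a minimal power-iteration sketch of a PageRank-style score. The four-page graph, the damping factor of 0.85, and the function name are assumptions for illustration only; this is not a claim about what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical web: three pages link to "a", and "a" links only to "b".
toy_web = {"a": ["b"], "b": ["a"], "c": ["a"], "d": ["a"]}
scores = pagerank(toy_web)
```

In this toy graph "a" ends up with the highest score because the most pages link to it, which is also why the scheme is easy to game once people start creating links purely to raise a score.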

A new breakthrough heuristic today will look totally different, just
as meritocratic and possibly resistant to gaming.

~~~
zackees
The real reason why search is so bad is that Google is downranking the
internet.

I should know - I blew the whistle on the whole censorship regime and walked
950 pages to the DOJ and media outlets.

--> zachvorhies.com <--

What did I disclose? That Google was using a project called "Machine Learning
Fairness" to rerank the entire internet.

Part of this beast has to do with a secret Page Rank score that Google's army
of workers assign to many of the web pages on the internet.

If wikipedia contains cherry picked slander against a person, topic or website
then the raters are instructed to provide a low page rank score. This isn't
some conspiracy but something openly admitted by Google itself:

[https://static.googleusercontent.com/media/guidelines.raterh...](https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf)

See section 3.2 for the "Expertise, Authoritativeness and Trustworthiness"
score.

Despite the fact that I've had around 50 interviews and countless articles
written about my disclosure, my website zachvorhies.com doesn't show up on
Google's search index, even when using the exact url as a query! Yet bing and
duckduckgo return my URL just fine.

Don't listen to the people who say that it's some emergent behavior from bad
SEO. This is deliberate sabotage of Google's own search engine in order to
achieve the political agenda of the controllers. The stock holders of Google
should band together in a class action lawsuit and sue the C-Level executives
for negligence.

If you want your internet search to be better then stop using Google search.
Other search engines don't have this problem: I'm looking at qwant, swisscows,
duckduckgo, bing and others.

~Z~

~~~
preommr
Google's search rankings are based on opinions held by other credible sources.
This isn't really blowing the whistle when, as you admitted, Google admits
this openly.

And maybe your site doesn't get ranked well because it's directly tied to
project veritas. I don't like being too political, especially on hn and on an
account tied to my real identity, but project veritas and its associates
exhibit appalling behavior in duplicity and misdirection. I would hope that
trash like this does get pushed to the bottom.

~~~
leereeves
In a political context, "credible" is often a synonym for "agrees with me".
Anyone ranking "page quality" should be conscious of and try to avoid that,
and yet the word "bias" doesn't even appear in the linked guidelines for
Search Quality Raters.

Of course Google's own bias (and involvement in particular political
campaigns) is well known, and opposed to Project Veritas, so it's quite
possible that you are right and Google is downranking PV.

Would that be good? Well, that's an opinion that depends mostly on the bias of
the commentator.

~~~
JamesBarney
[https://en.wikipedia.org/wiki/Project_Veritas](https://en.wikipedia.org/wiki/Project_Veritas)

I doubt this affected search rankings but Project Veritas does have a ton of
credibility issues.

~~~
qbaqbaqba
And so does wikipedia.

------
avionicsguy
This should probably be a separate submission but why is search so bad
everywhere?

- Confluence: Native search is horrible IME

- Microsoft Help (Applications): .chm files. Need I say more?

- Microsoft Task Bar: Native search okay and then horrible beyond a few key
words and then ... BING :-(

- Microsoft File Search: Even with full disk indexing (I turned it on) it
still takes 15-20 minutes to find all jpegs with an SSD. What's going on
there?

- Adobe PDFs: Readers all versions. What? You mean you want to search for TWO
words. Sacrilege. Don't do it.

Seriously though, with all the interview code tests (bubble sort, quick sort,
bloom filters, etc.), why can't companies or even websites get this right?

And I agree with other commenters as far as Google, Bing, DDG, or other search
sites it's been going down hill but the speed of uselessness is picking up.

The other nagging problem (at least for me) is that explicit searches which
used to yield more relevant results now are front loaded with garbage. If I'm
looking for a datasheet on an STM (STMicroelectronics) chip and I start the
search with STM, as of today STM is no longer treated as relevant (the relevant
results do show up, but only after a few pages). Wow, it seems like the SEOs
are winning, but companies that use this technique won't get my business.

~~~
jborichevskiy
Or MacOS Spotlight. Good lord. Most common occurrence: searching for Telegram,
an app I have open 24/7 and interact with dozens of times a day.

CMD+Space

"T": LaTeXIT.app (an app I have used fewer than a dozen times in two years)

"E": LaTeXIT.app

"L": Telegram.app

"E": Electrum.app (how on earth??)

"G": telemetry.app (an app which cannot even be run)

"RAM" : Telegram

Similar experience searching for most apps, files, and words. It's horrendous.

MacOS Mojave 10.14.6 on a MacBook Pro (Retina, 15-inch, Mid 2015)

------
arielweisberg
Google has definitely stopped being able to find the things I need.

Pasting stack traces and error messages. Needle in a haystack phrases from an
article or book. None of it works anymore.

Does this mean they are ripe for disruption or has search gotten harder?

~~~
tyingq
My guess is that suppressing spammy pages got too hard. So they applied some
kind of big hammer that has a high false positive rate. You're getting the
best of what's left.

Maybe also some quality decline in their gradual shift to less hand weighted
attributes and more ML.

~~~
basscomm
My guess is that Google et al are all hell-bent on not telling you that your
search returned zero results. They seem to go to great lengths to make sure
that your results page has _something_ on it by any means necessary,
including: searching for synonyms for words I searched for instead of the
specific words I chose, excluding words to increase the number of results
(even though the words they exclude are usually the most important to the
query), trying to figure out what it thinks I asked for instead of what I
actually asked for.
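
To make the complaint concrete, here is a minimal sketch of the "strict" behaviour being asked for: a post-filter that keeps a result only if every query word appears verbatim, and is willing to return nothing. The function names and the sample documents are hypothetical, and real engines match against an index rather than raw strings.

```python
import re


def contains_all_terms(query, text):
    """True only if every query word appears verbatim (case-insensitive) in text."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term in words for term in re.findall(r"\w+", query.lower()))


def strict_filter(query, results):
    """Keep only results containing every query term; an empty list is a valid answer."""
    return [r for r in results if contains_all_terms(query, r)]


# Hypothetical result snippets.
docs = [
    "STM32 datasheet and reference manual",
    "Scanning tunneling microscope overview",
    "STM32 development board review",
]
matches = strict_filter("stm32 datasheet", docs)    # one verbatim match
no_matches = strict_filter("stm32 errata", docs)    # nothing matches, so nothing is shown
```

No synonyms are substituted and no terms are dropped, so the second query honestly reports zero results instead of padding the page.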

I further suppose a lot of that is that The Masses(tm) don't use Google like I
do. I put in key words for something I'm looking for. I suspect that The
Masses(tm) type in vague questions full of typos that search engines have to
try to parse into a meaningful search query. If you try to change your search
engine to cater to The Masses(tm), then you're necessarily going to annoy the
people that knew what they were doing, since the things that they knew how to
do don't work like they used to (see also: Google removing the + and -
operators).

~~~
plausible
> They seem to go to great lengths to make sure that your results page has
> something on it by any means necessary

You just described how YouTube's search has been working lately. When you type
in a somewhat obscure keyword - or any keyword, really - the search results
include not only the videos that match, but videos related to your search. And
searches related to your keywords. Sometimes it even shows you a part of the
"for you" section that belongs to the home page! The search results are so
cluttered now.

~~~
brownbat
Searching gibberish to try to get as few results as possible.

I got down to one with "qwerqnalkwea"

"AEWRLKJAFsdalkjas" returns nothing, but youtube helpfully replaces that
search with the likewise nonsensical "AEWR LKJAsdf lkj as" which is just full
of content.

------
cygned
In my opinion, Google is getting worse constantly, which boils down to
basically the following aspects for me:

1. I don’t like the UI anymore. I preferred the condensed view, with more
information and less whitespace.

2. Popping up some kind of menu when you return from a search results page
shifts down the rest of the items, resulting in me clicking search links I am
not interested in.

3. It tries to be smarter than me, but it fails at understanding what I am
searching for. And by “understanding” I basically mean honoring what I typed
and not replacing it with other words.

I try to use DDG more often but Google gives me the best results most of the
time if I put in more time.

~~~
mszcz
Yeah, number 3 really pisses me off recently. If I type in 3 words I would
like to search by those 3 words. What ends up happening is Google just decides
that it's too much of a hassle or that I've made a mistake and just searches
using 2. So now I have to input all the words in quotes so that it works like
it's supposed to in the first place.

This functionality literally never helped me during search. Not once.

~~~
reportgunner
"try" "putting" "the" "words" "in" "respective" "quotes" "like" "this"

~~~
mszcz
That's what I'm doing, sorry it wasn't clear ;)

------
lettergram
I've been thinking about this for years[1]. The truth is, what Google solved
was parsing the search query, not identifying the best results. In fact,
Google is not incentivized to give you the best results, they are designed to
maximize their revenue, derived from getting you to view / click ads.

Google is not a search company, they are an advertising company. The more
searches you make, the more revenue they make. Their goal is to quickly and
often get you to search things. As long as you keep using their platform, the
more you search the better.

[1] [https://austingwalters.com/is-search-solved/](https://austingwalters.com/is-search-solved/)

~~~
l0b0
Is it time for paid search engines? Make users vote with their wallets and pay
for the eternal arms race. Problem is, whoever is behind something like that
would have to start with an already sufficiently superior experience (or
_massive_ geek cred) to make people pay from the early days. Maybe going back
to manually curated results of a subset of topics would work? Or some Stack
Overflow-esque model of user powered metadata generation?

~~~
notriddle
Simpler idea:

Paid search engine that ranks sites based on how often the users click results
from that site (and didn't bounce, of course). The fact that it's paid
prevents sybil attacks (or, at least, turns a sybil attack into a roundabout
way of buying ads).
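
A minimal sketch of that idea, under stated assumptions: the click log format, the 30-second dwell threshold used as a stand-in for "didn't bounce", and the choice to count each paying account at most once per site are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def rank_sites(click_log, min_dwell_seconds=30):
    """Rank sites by how many distinct paying users clicked through and stayed.

    `click_log` is an iterable of (user_id, url, dwell_seconds) tuples.
    """
    engaged_users = defaultdict(set)
    for user_id, url, dwell_seconds in click_log:
        if dwell_seconds >= min_dwell_seconds:  # ignore bounces
            engaged_users[urlparse(url).netloc].add(user_id)
    # Most engaged users first; ties broken alphabetically for stable output.
    return sorted(engaged_users, key=lambda site: (-len(engaged_users[site]), site))


# Hypothetical click log.
log = [
    ("u1", "https://docs.example.org/a", 120),
    ("u2", "https://docs.example.org/b", 45),
    ("u1", "https://spam.example.com/x", 2),
    ("u3", "https://spam.example.com/x", 3),
    ("u3", "https://blog.example.net/p", 60),
]
ranking = rank_sites(log)
```

Counting distinct accounts rather than raw clicks is what ties the cost of manipulation to the subscription price: each extra vote requires another paid account.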

Of course, at this point, _you are now the product_ even though you paid. But
it's a tactic that worked for WoW for ages.

~~~
wgx
Google already includes clicks and bounces in its ranking factors.

------
ma2rten
This article is essentially just complaining that DDG and Google don't have
special parsing for reddit pages ("How come it doesn't know that thread didn't
get many upvotes?", "How come it thinks some change to the site's layout was
an update to the page?")

Maybe if you want to search reddit, the best search engine is the search bar
on reddit.com.

~~~
aldoushuxley001
everyone knows the best way to search reddit is via google. The search engine
bar in reddit is for optics only.

~~~
yusef555
But does anyone know why search on Reddit is broken? Perhaps intentionally? I
don't want to get tin foil hatty but perhaps more not readily apparent false
positives = more user clicks = more revenue via ad serving?

~~~
tsian2
I often wonder why some fairly large companies that rely heavily on their own
website don't seem to put more than a sole web developer worth of resources
into them. Reddit fits into that category for me (Reddit has 400 employees).

Initially I had the impression that search was hard to implement. However,
spending a work week figuring it out with ElasticSearch, Solr and Sphinx
changed my mind. Getting the solution to work with the scale of the website
would take more work, but all the know-how is there, and they could put a
whole team to the task for a month.

~~~
yusef555
I wouldn't say it's a trivial ask, but yeah, if you have 400 employees at
least assign some resources to get it right. Unless it's intentionally broken.
Facebook's prioritization but also randomization of the feed is a feature not
a bug.

------
nvarsj
The first time I realized that Google search was bad was when del.icio.us got
big. I was an avid user - and I stopped using Google except for basic things.
You could search tags on del.icio.us - and the results were incredibly good,
far better than Google, especially for niche areas.

I think, unfortunately, this kind of curated, social approach to search will
never be compatible with monetization by ads. I'm not quite sure how to make a
search engine profitable without significantly distorting its results. Maybe,
depressingly, Google is the best thing possible given the constraint of making
a profit?

------
ses1984
The worst thing for me is that I have become accustomed to search working a
certain way. If I put a word in the query, it had better fracking be in the
results. That's why I put it in the fracking query.

I guess whatever sauce Google applies to the query maybe works better
according to some metrics for some users, but it is a source of endless
frustration for me.

------
dmfdmf
The problem as I see it is that popularity ranking worked fine in the pre
Eternal September era for the web (~10 years ago?). I think it is safe to say
that most HN users skew toward searching for more technical, intellectual or
scientific topics and get frustrated by their searches getting swamped by
popular topics. What I'd like to see is a check box or slider bar to exclude
or adjust the weighting for popularity in a search. I don't need to see links
for the latest Taylor Swift breakup or what the Kardashians are up to that
appear in a technical search due to a randomly shared keyword. Often, the
topics I am searching for will never be popular and current search operates on
the assumption that it will.

A second problem is that now that Google likes to rudely assume to know what
you want, i.e. ignoring quotes and negation in search or even modifying
keywords, it's even harder to find what you want, especially if it's not on the
first page or two of results. Because of this interference even changing your
search parameters doesn't change the results much and you see essentially
the same links. What I'd like to see is a search engine that will do a delta
between say Google and Bing and drop the links common to the two services.
This might lead to uncovering the more esoteric or hidden links buried by the
assumptions of the algorithm.
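
The "delta" idea can be sketched in a few lines. This assumes you already have two ranked lists of result URLs from somewhere; the function name and the example URLs are made up, and no real search API is called.

```python
def drop_common_links(results_a, results_b):
    """Return the links unique to each engine, preserving each engine's order."""
    common = set(results_a) & set(results_b)
    only_a = [url for url in results_a if url not in common]
    only_b = [url for url in results_b if url not in common]
    return only_a, only_b


# Hypothetical result lists from two engines for the same query.
engine_a = ["https://a.example/1", "https://shared.example/x", "https://a.example/2"]
engine_b = ["https://shared.example/x", "https://b.example/1"]
only_a, only_b = drop_common_links(engine_a, engine_b)
```

In practice the URLs would need normalizing first (scheme, trailing slashes, tracking parameters), otherwise the same page can look like two different links.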

Finally, a last problem that I see right now is the filter bubble effect. I
had to search for how to spell "kardashians" in the above paragraph. Now my
searches and ads for the next 2-3 weeks will be poisoned by articles or ads
about the Kardashians. Taking one for the team to make my point, I suppose.

~~~
klingonopera
> _pre Eternal September era for the web (~10 years ago?)_

LOL, was that by chance the time that single from Green Day was released? ;)

["Eternal September or the September that never ended is Usenet slang for a
period beginning in September 1993"]:
[https://en.wikipedia.org/wiki/Eternal_September](https://en.wikipedia.org/wiki/Eternal_September)

EDIT: TIL Green Day's single has nothing to do with _that_ September. Huh. I
gave them more _nerdcred_ than they deserved...

------
overcast
Don't get me started on Outlook search. For a company that runs a global
search engine, the most prominent mail client in the world is absolutely shit
for searching.

~~~
amiga_500
Outlook is poor in every aspect.

------
james_impliu
I increasingly just use google to search sites I already know have better
content than the web at large, since it so often feels like it sucks for any
depth of information.

Want to get in touch with someone? name job site:linkedin.com. Want to find
how to solve a tech issue? "issue" site:stackoverflow.com, and so on.
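
For anyone scripting this habit, a minimal sketch of building such `site:` queries as Google search URLs. The helper name and its `exact` flag are assumptions for illustration.

```python
from urllib.parse import urlencode


def site_search_url(terms, site, exact=False):
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f'"{terms}"' if exact else terms
    return "https://www.google.com/search?" + urlencode({"q": f"{query} site:{site}"})


people_url = site_search_url("name job", "linkedin.com")
issue_url = site_search_url("issue", "stackoverflow.com", exact=True)
```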

Google's search of sites like this is pretty good (although the recency
working well would be really good... but perhaps impossible to solve well),
and often better than in site search. But that + very basic fact finding "how
much is a lb in kg", "where is restaurant X" are pretty much all I feel it's
good for. Then again, I guess it's not supposed to be an encyclopaedia (or it
can be, site:wikipedia.org!)

------
z3t4
Public information should not be filtered by one single private entity. We
need a distributed system with an open standard. Search should work more like
DNS... In order to get your web site indexed, you only have to publish your
search URL. There should be many index cache servers, so that your search URL
only gets a hit when a search string expires...
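
A minimal sketch of what one of those index cache servers might do, in the spirit of DNS time-to-live caching. Everything here is hypothetical: the class name, the one-hour default TTL, and the `fetch` callback that stands in for calling a site's published search URL.

```python
import time


class IndexCache:
    """Caches per-site search results and only re-queries the site after expiry."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # callable(site, query) -> list of result URLs
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # (site, query) -> (expires_at, results)

    def search(self, site, query):
        key = (site, query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: the site is not contacted
        results = self._fetch(site, query)
        self._entries[key] = (now + self._ttl, results)
        return results


# Demo with a fake site endpoint and a fake clock so the behaviour is repeatable.
calls = []


def fake_site_search(site, query):
    calls.append((site, query))
    return [f"https://{site}/?q={query}"]


fake_now = [0.0]
cache = IndexCache(fake_site_search, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.search("example.org", "piano")    # miss: the site gets a hit
second = cache.search("example.org", "piano")   # served from the cache
fake_now[0] = 61.0
third = cache.search("example.org", "piano")    # expired: the site gets a hit again
```

This only covers caching; ranking across sites and keeping sites from lying about their own results are the hard parts the sketch leaves out.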

------
ovx99
Using shopping engines is even worse. Google shopping and Amazon, I've been
having an incredibly difficult time finding products within a price range and
sorting it by price. Searching for items in quotes on Google Shopping often
returns all sorts of irrelevant results. In Amazon, the 'price low to high'
filter doesn't even seem to work most of the time and it includes sponsored
results way out of my price range in the middle of the results. Amazon also
seems to have removed any type of price range filter on the left sidebar.

------
bobosha
I think web information access needs a new paradigm, that needs to make
"search" itself irrelevant. Much like search replaced the taxonomy-based
browsing (Yahoo/"Portals" of the 90s).

I don't know what that is, but there needs to be paradigmatic change.

------
anuraj
Try using natural language phrases in search like "reddit on best phone to
buy" - most search engines are NLP enabled and can give more relevant
results.

~~~
Tenoke
A big part of the problem is that it returns results from years ago, even when
I specify that I want only recent results. I tried a few different variations
and the results were still bad.

~~~
wickedOne
searching for

    best cell phone to buy site:reddit.com

and setting it to results from last month works fine...

~~~
Tenoke
Did you click on the results? I just checked that exact query and it's mostly
the same links - the first one is the one from 6 years ago, as with my
original search.

------
nothis
I'm a bit cynical with this but I believe a lot of the "super smart" AI tech
they no doubt run over all their search these days isn't actually that super
smart. If we actually handled metadata properly "last month" would be a
trivial, "dumb" thing to search for but apparently that's broken. Why?

~~~
kqr
You might be surprised at how many ways "last month" is encoded just in a
single data set. One of the biggest problems with search -- just about any
kind of search -- is the low quality of markup/metadata. If only data was
structured properly, we would barely need search in the first place!
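
To give a feel for the encoding problem, here is a minimal sketch that normalizes a handful of common timestamp spellings to UTC. The list of formats is an assumption and far from complete, the month-name formats depend on an English locale, and any all-digit string is treated as Unix epoch seconds.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed human-readable formats; real data sets contain many more.
HUMAN_FORMATS = ("%d %B %Y", "%B %d, %Y", "%m/%d/%Y")


def parse_timestamp(raw):
    """Return an aware UTC datetime for `raw`, or None if no known format fits."""
    raw = raw.strip()
    if raw.isdigit():  # Unix epoch seconds
        return datetime.fromtimestamp(int(raw), tz=timezone.utc)
    parsed = None
    try:  # ISO 8601, e.g. 2020-01-11T08:30:00Z
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        pass
    if parsed is None:
        for fmt in HUMAN_FORMATS:
            try:
                parsed = datetime.strptime(raw, fmt)
                break
            except ValueError:
                continue
    if parsed is None:
        try:  # RFC 2822 style, as used in HTTP and e-mail headers
            parsed = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without a zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


same_moment = [
    parse_timestamp("2020-01-11T08:30:00Z"),
    parse_timestamp("Sat, 11 Jan 2020 08:30:00 GMT"),
    parse_timestamp("1578731400"),
]
date_only = parse_timestamp("January 11, 2020")
unknown = parse_timestamp("last month")
```

Even this small sample needs four code paths and two guesses (time zone and digit strings), and relative phrases like "last month" are not handled at all.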

~~~
nothis
Right! It's just... the internet isn't exactly in beta anymore. You'd think
there's a timestamp type value that just tells you the date of an article or
forum post that's linked. But no, we have to run a sophisticated algorithm to
search for it on the page – and fail.

Honestly, my second theory is that google knows _exactly_ when a given reddit
article was posted but doesn't trust the user to judge the relevancy of that
information. Which might even be reasonable in many, many cases. But it's also
annoying. I'm definitely seeing a trend towards "editorializing" search
results on Google, where often the first half of the page isn't even websites
anymore but some random info box and whatnot, and your search terms are
interpreted _very_ liberally, even when in quotes. It's one of those things
that is probably better for 99% of users/uses but super annoying when you
actually want precise results.

------
ronilan
_> This is a query for checking out what reddit thinks in regards to buying a
phone_

[reddit phone to buy]

What is astonishing, is that the most obvious part of the non-progress in
search over the last two decades, is just accepted as the starting point to
the way things are.

------
mellosouls
I don't agree with the premise of the article, although I accept that the
example given is clearly terrible.

I haven't noticed it before, is it a recent bug? It certainly seems a
significant one, but not representative of my experience using Google - which
I also acknowledge is skewed according to the data they have on you and SEO
gaming.

But generally - those constraints acknowledged - I _still_ find Google's
search to be one of the modern wonders of the world, still the go to, and -
yes - not perfect.

------
miket
The article hardly supports its conclusion with these cherry-picked examples;
however, the core reason these results don't meet the author's expectations is
that Google's AI does not understand the content of webpages well enough to
identify the publication date accurately (at least anywhere near as accurately
as a human can). Google's publication date is based on whether it found
changes to the HTML on its own crawl date (which is very noisy due to today's
dynamically generated websites) or based on schema.org/microdata, which, as
other commentators point out, is game-able for purposes of SEO, or simply
missing on most sites.
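
A minimal sketch of the markup-based approach, assuming the page exposes dates through an Open Graph style `article:published_time` meta tag, a schema.org `datePublished` value (microdata or JSON-LD), or a `<time datetime>` element. The sample HTML and its dates are invented for illustration, and this is not a description of Google's crawler.

```python
import json
from html.parser import HTMLParser


class PublishedDateFinder(HTMLParser):
    """Collects every machine-readable date candidate found in a page."""

    def __init__(self):
        super().__init__()
        self.candidates = []
        self._in_json_ld = False
        self._json_chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("content"):
            if (attrs.get("property") == "article:published_time"
                    or attrs.get("itemprop") == "datePublished"):
                self.candidates.append(attrs["content"])
        elif tag == "time" and attrs.get("datetime"):
            self.candidates.append(attrs["datetime"])
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._json_chunks = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._json_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                doc = json.loads("".join(self._json_chunks))
            except ValueError:
                return  # malformed JSON-LD: ignore it
            if isinstance(doc, dict) and "datePublished" in doc:
                self.candidates.append(doc["datePublished"])


# Invented page: the metadata says 2013, a sidebar widget says 2020.
sample_page = """
<html><head>
<meta property="article:published_time" content="2013-09-22T14:05:00+00:00">
<script type="application/ld+json">{"@type": "DiscussionForumPosting", "datePublished": "2013-09-22"}</script>
</head><body><time datetime="2020-01-11">sidebar widget</time></body></html>
"""
finder = PublishedDateFinder()
finder.feed(sample_page)
```

The parser happily returns all three candidates; deciding which one is the real publication date, or whether the site is telling the truth, is the part markup alone cannot settle.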

As a contrast, take a look at how Diffbot, an AI system that understands the
content of the page by using computer vision and NLP techniques on it,
interprets the page in question:

[https://www.diffbot.com/testdrive/?url=https://www.reddit.co...](https://www.diffbot.com/testdrive/?url=https://www.reddit.com/r/berlin/comments/1mwxx3/best_place_to_by_a_cheap_new_smartphone/)

It can reliably extract the publication date on each post, without resorting
to using site-specific rules. (You can try it on other discussion threads and
article pages, that have a visible publication date).

------
l0b0
SEO is the new spam. We solved spam pretty well, but it was a very different
solution space to what is available for the web:

- Spammers had basically two ways to verify their efficacy – they could
either sign up to every provider under the stars and test each email with each
of them individually, or they could use the absence of a signal as "proof" of
being caught by a filter. But neither of these are very efficient. An SEO
expert can simply wait for the search engine to detect their changes and
verify the result with two or three search engines quickly and automatically.

- For practical purposes whether an email is spam is answered in a binary
form: either it ends up in your spam box or it does not. Removing spam-looking
things from search results entirely would be devastating for any site victim
of a false positive. And how do you implement the equivalent of a spam box in
a search engine in a useable way?

- Spam filtering was implemented in different ways on every mail provider, so
the bar to entry was "randomized" and spammers would have to be quite careful
to pass the filters on a large subset of providers. ISPs and users _currently_
have nowhere near the resources to implement their own ranking rules, but
maybe this could be a solution in the mid to long term with massively cheaper
hardware.

------
rmetzler
I think a main issue is the vagueness of the query.

I was searching for "what phone should i buy in 2020 site:reddit.com" and
while a few results were from a year ago, most were from January 2020.

~~~
pbhjpbhj
Google have moved on from allowing exact searches to be made; sometimes you
have to think "if I didn't know what I actually wanted to see, and was trying
to make a question-form query about it, what would I write" and that gets
better results, IME, than something where you know the exact words you want
(which even with "" might not be in the link).

~~~
rmetzler
No, I didn't use quotes (exact match) in my query. This was only to express
delimiters. Sorry for the confusion.

------
fortran77
I just did a google search for "piano". Just the word "piano"

Only one link on the first page, the wikipedia entry for "piano", had anything
to do with pianos (i.e., the instrument invented in Italy 300+ years ago that
has hammers, strings, and an iron frame).

~~~
CM30
What do you get when you search for that? Did a test right now, and apart from
the Wikipedia page, I get videos about piano music/pianos, shop pages for
buying pianos and local businesses that sell either pianos or piano lessons.

So I'm curious whether the issue is that there are too many shopping/business
related pages (which is fair, but at least those seem to be piano related), or
whether you're getting something completely different.

~~~
fortran77
The first three links after the ad were for the same "virtual piano" (not a
piano) on different websites.

See [https://imgur.com/a/cMC9wQH](https://imgur.com/a/cMC9wQH)

Then the wikipedia page, then a couple of "online" non-pianos, then a company
that happens to be called piano.io

[https://imgur.com/a/uRcyx84](https://imgur.com/a/uRcyx84)

Shopping pages are fine, if we'd get links to, say Steinway, Yamaha, and
Bosendorfer, or links to Lang Lang's home page, or something that has more to
do with _pianos_.

------
lqet
Well, how do you actually determine the age of a web page? Is it the post
date? How do you even find that out? Is it the last _comment_ post date? Is it
the last edit of the main post, or the last edit of a comment? How do you find
_this_ out automatically? Is it the last change the HTTP server responds with?
Is it the last time the entire page has been modified? If the page is built up
of multiple components like iframes, do _their_ post dates matter? Do ads
matter? If the page is dynamic, everything gets a few orders of magnitude more
complicated.

Point is, it is not a trivial task at all to automatically find out the time
that corresponds to the intuitive understanding of the "age" of a web page.

~~~
rwmj
It's the first time Google saw the page. Reddit posts have unique URLs and
Google scans popular sites very regularly (in fact is rumoured to have site-
specific optimizations).

~~~
undefined_user6
That can't be right if, according to OP, the first result was a reddit post
from six years ago, yet the date according to Google was Jan 11, 2020. So the
first time Google saw that page would likely have been the day it was
published.

------
jszymborski
Not sure if DDG patched this, but querying DDG w/o the month based tick
results in a result that'll point you to the correct subreddit for finding
which phone to buy in <current month> [0].

Although it's not using the "This Month" dropdown, doing the vanilla search
still gets you the "most correct" answer imho.

[0] [https://jszym.com/dl/imgs/20200119-ddg-example.png](https://jszym.com/dl/imgs/20200119-ddg-example.png)

------
soheil
> At any rate, I got annoyed at this point (mentioning for those who couldn’t
> tell), so I switched to DuckDuckGo.

For those who might be misled like I used to be DuckDuckGo is just a proxy for
Bing.

~~~
reaperducer
_For those who might be misled like I used to be DuckDuckGo is just a proxy
for Bing._

Every time the topic of search comes up on HN, someone always jumps in and
says this.

Then there are a bunch of other people who jump in and say that Duck is much
more than that.

So, which is correct?

~~~
Kiro
[https://help.duckduckgo.com/duckduckgo-help-pages/results/so...](https://help.duckduckgo.com/duckduckgo-help-pages/results/sources/)

Interpret it as you wish. To me it sounds like they are using 400 sources and
their own crawler for the Instant Answers stuff but get all their "traditional
links in the search result" from Verizon(?) and Bing.

------
allovernow
I've been screaming about this for _years_ and only recently have people begun
agreeing with me - and I know exactly what the major problems are, and they
are synergistic:

1. SEO has totally warped result rankings. Now instead of getting results
which naturally match my keywords because of content, I'm presented with
almost exclusively commercial websites which are trying to sell me something.
Gone are the days where you could search for technical terms and not be
bombarded by marketing websites.

2. Google's AI is far too aggressive for technical searching. It is clear
that Google is using NLP to parse queries and substitute synonyms based on
some sort of BERT-like encoding. The problem is that a given word may have
synonyms that are actually orthogonal in meaning space. For example, if I
search for trunk, Google may return results for "boot" as in car trunk,
instead of anything related to SVN. Contrived example, sure - but here's where
the real problem is: Google's AI is regressing to the layman's mean. It is
effectively overfitting to Grandma's average search query. Think of it as the
endless summer of search...and since there's no way to usefully customize your
search now (can't give people too many options or they might get confused!),
you're stuck combing through unrelated results and it is increasingly
difficult to disambiguate your search query. Remember when advanced search
existed and typing in a question to search was terrible practice? That
shouldn't have changed - but as more and more non-technical people started
searching, Google (rightly, from a marketing perspective) seized the
opportunity aggressively.

3. Primarily because of a combination of points 1 and 2 above, and the
endless summer of non-technical users, informative websites have all but
disappeared in search results, replaced by shitty SEO optimized blog spam and
commercial websites which offer high level summaries primarily to generate
traffic and sell you shit. Curious about how to repair your own roof in
detail? Well don't bother searching "roof repair" (and quotes seem to be
broken too btw) because the first two pages will be full of roof repair
company websites.

So what is the result? The portal to the greatest asset in the history of
civilization, the internet, has gradually turned into a neutered,
commercialized corporate service where users are a product. It's tragic to see
all of that empowerment thrown away in the name of profit. As they say, if the
user doesn't see it, it isn't there, and for this reason Google is effectively
killing the internet.

I haven't even gotten into the demonstrated potential for search curation and
autocomplete abuse, where Google becomes an effective, centralized arbiter of
truth as the de facto portal to the internet, and how dangerous such a
concentrated power over society can be.

Google really was admirable when it wasn't evil - now I'm about convinced that
it needs to die.

~~~
zozbot234
Well, '"roof repair"' was never a good search for obvious reasons. But you'd
expect a search like '"how to" repair roof' to mostly filter out sites
providing roof-repair services - and if a search engine doesn't do that
properly (because it ignores the "how to" part as irrelevant even though it's
in quotes!) that's just broken.

~~~
drivebycomment
I tried "how to repair roof" and the first organic result is the featured
snippet of how to fix shingles, with a bunch of YouTube videos on various roof
repair methods next, along with DIY repair sites. So I don't see anything
broken there?

------
dbetteridge
Does anyone else no longer see the date at all in their google search results?

Drives me mad, especially when I only want results from a certain year/month.

------
kops
A shoutout for duckduckgo here. I tried both qwant and duckduckgo about a
couple of years ago but the quality of search results forced me back to google
in a couple of days. But I gave it another shot a few weeks ago and I have
been pretty happy with the search results from duckduckgo and the fact that
the ads do not follow me around the web based on my searches is rather
pleasant. I must admit that google hasn't done anything bad to me but their
sheer size and scale is scary enough for me to look for alternatives.

Now back to the search quality, I wanted to find out when exactly world
economic forum 2020 is happening and google won hands down (search term =
world economic forum 2020 dates). But for now and for most of my day to day
search terms duckduckgo is doing okay. I know this won't last for long as they
also are looking at advertising dollars but I hope then someone else will
stand up to challenge duckduckgo+google combined.

------
scottlocklin
I agree google is shit, which is why I never use it. They're too busy being
woke to run a search engine which compares to what they were doing 10-15 years
ago. It's pretty obviously entirely that; smart people don't want to work in
dying Brezhnevian bureaucratic hellscapes.

Try Yandex for an example of a much smaller company doing a fine job at
search:

[https://yandex.com/search/?text=reddit%20phone%20to%20buy&lr...](https://yandex.com/search/?text=reddit%20phone%20to%20buy&lr=102589)

Produces exactly what OP was looking for. QED.

Qwant isn't bad either; don't remember if they piggyback off of other search
results:

[https://www.qwant.com/?q=reddit%20phone%20to%20buy&t=web](https://www.qwant.com/?q=reddit%20phone%20to%20buy&t=web)

~~~
klingonopera
IIRC, Qwant piggybacks (-ed?) off Bing.

------
fiatjaf
The point no one is making in these comments is: How is search so good? I've
tried to implement search on small websites and always failed miserably.
Results were always terrible no matter what library/database/indexer I used. I
cannot even imagine how one would proceed to implement search over the entire
internet.

I also have the experience that Google is much worse than some years ago, I'm
also frustrated by everything, but still, it does the job much better than
anyone else probably -- otherwise someone would have better results overall.

(Also, yes, I use DuckDuckGo and I don't think it's so much worse than Google,
which is good, because I used to think no one would ever be able to be as
good as Google but today there are many competitors that come close.)

------
brownbat
I feel like Google has often turned strict commands into fuzzy searching,
maybe for a decade?

I never heard a clear explanation as to why, I just imagined that it was some
sort of A/B tested paternalism. Maybe most users really want fuzzy searches
when using the commands I use for a strict search.

~~~
drivebycomment
I think it's simply a human bias in action - people don't realize when their
queries benefit from the "fuzzy matching", and they only notice/remember when
they don't get what they want from search and then (often mistakenly) blame
fuzzy matching for it as that's what's visible to them.

------
sj4nz
When I'm ready to flip the table over about search results, I remember that
[https://millionshort.com/](https://millionshort.com/) exists and I give that
a whirl. Then when I'm sick of seeing links to sites that are highly-SEOed but
low-signal, Tampermonkey is there to give me a nice little [block] button to
remove them.

Fantasy-future: Mozilla could "widen out" their library and hire 20,000
librarians to curate the "New Web" in a non-wiki-format. If you paid each
librarian about $150K/annum, that's about $0.60/annum from 5B subscribers,
just for their salaries for the advertising-free library.
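
The arithmetic above checks out. A quick sketch, using only the figures from this comment (20,000 librarians, $150K each, 5B subscribers):

```python
# Figures taken from the comment above; nothing else is assumed.
librarians = 20_000
salary_usd = 150_000
subscribers = 5_000_000_000

total_salaries = librarians * salary_usd        # yearly salary bill
per_subscriber = total_salaries / subscribers   # yearly cost per subscriber

print(f"total salaries: ${total_salaries:,} per year")
print(f"per subscriber: ${per_subscriber:.2f} per year")
```

That is a $3B yearly salary bill, which works out to $0.60 per subscriber per year, salaries only.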

------
aaron695
Everyone can make a better search engine in the comments, but strangely,
barely anyone is actually commenting on the actual case study. Search based
around time of publishing.

I guess another committee to paint the whole garden shed is easy, talking
about what paint to use is hard.

I suspect Reddit needs to add meta data of the publishing date.

It is complicated in a forum: is it the publish date or the last comment date?
But Google is still getting the basics wrong, i.e. every comment date is a
year ago, yet the page still counts as less than a month old in search.

It still doesn't help many time based issues (News always displays new
headlines on old articles. So you'll see Iran funeral crowd crush on 'old'
news articles in search) but it's a start.

------
JDiculous
Google search is abysmal at generic questions like the one the author
mentioned about the best phones, or others like "Redux vs. Mobx" or "styled-
components vs emotion". I end up just searching Reddit (or sometimes
stackoverflow) not because Reddit is particularly good for technical
discussions, but because Google's default search literally just returns
blogspam from dev agencies.

Why hasn't a superior competitor emerged yet?

I get that the web is super broad and indexing the entire thing is an enormous
task. But perhaps there's room for more niche search engines (eg. focused on
tech) to stab away at this Google search monopoly.

------
syphilis2
It's worse than the author described. Google won't even sort by date
correctly. I ran into this problem years ago and it still exists. Example:
[https://www.google.com/search?q=where+to+buy+a+phone&hl=en&t...](https://www.google.com/search?q=where+to+buy+a+phone&hl=en&tbs=cdr:1,cd_min:1/1/2019,cd_max:1/1/2020,sbd:1&tbm=nws&source=lnt&sa=X&ved=0ahUKEwixnoaswJHnAhWNl-AKHe0pAVAQpwUIHw&biw=442&bih=727&dpr=3)

------
cf512
As a point of clarification, "Past Month" is not the same thing as "Last
Month" or "Previous Month". "Past Month" in this context actually means the
past 30 days from today. It's a really subtle and confusing nuance in English.
In any case, Google Search having results from 5 days ago is accurate when
filtering for Past Month. Google (being a global resource) really should
reword "Past Month" to "Past 30 Days" to eliminate any confusion.
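
To make the two readings concrete, here is a minimal sketch of both windows. The function names are made up for illustration, and this says nothing about how Google actually computes its filter:

```python
from datetime import date, timedelta

def past_30_days(today):
    """Rolling window: starts 30 days before `today` and ends on `today`."""
    return today - timedelta(days=30), today

def previous_calendar_month(today):
    """The full calendar month before the one containing `today`."""
    first_of_this_month = today.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous

today = date(2020, 1, 19)
print(past_30_days(today))             # rolling 30-day window
print(previous_calendar_month(today))  # all of December 2019
```

For 2020-01-19 the rolling window runs from 2019-12-20 to 2020-01-19, so a result from 5 days ago falls inside it, while the calendar reading would cover only December 2019.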

~~~
radicalriddler
But still, none of the reddit posts he referenced are from the past month.
They say things like Jan 11, 2020; which would be correct, but the actual post
on Reddit is 6 years ago.

Whether this is Google or Reddit causing this is another issue.

------
whoisjuan
Isn’t last month supposed to be, last month chronologically speaking instead
of calendar month? If I search “last month” today I would expect to get
results between Dec 19, 2019 and Jan 19, 2020.

------
Razengan
Why can't every web server just remember (index) all the content that it
serves?

Much like resolving DNS queries, I could then ask every server near me for
specific terms. If they serve that content, they'll return a list of links
containing those search terms.

We could have different apps with different algorithms to sort the bare
results according to the criteria that's most relevant to each individual
user.
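
A toy sketch of that idea, assuming each server keeps a plain inverted index of the pages it serves and the client merges the answers with its own ranking function. `ServerIndex` and `federated_search` are hypothetical names, and there is no networking, crawling, or spam handling here:

```python
from collections import defaultdict

class ServerIndex:
    """Hypothetical per-server index: term -> set of URLs served by this host."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_page(self, url, text):
        # Naive tokenisation: lowercase and split on whitespace.
        for term in text.lower().split():
            self._postings[term].add(url)

    def search(self, query):
        """Return URLs that contain every term in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results

def federated_search(servers, query, rank=sorted):
    """Ask every server, merge the bare results, apply a client-side ranking."""
    hits = set()
    for server in servers:
        hits |= server.search(query)
    return rank(hits)

a = ServerIndex()
a.add_page("https://a.example/roof", "how to repair a roof yourself")
b = ServerIndex()
b.add_page("https://b.example/shingles", "repair roof shingles in detail")
b.add_page("https://b.example/phones", "best phone to buy")

print(federated_search([a, b], "repair roof"))
```

The `rank` argument is where "different apps with different algorithms" would plug in. The sketch also trusts each server to report honestly about its own content, which is exactly what the SEO complaints elsewhere in this thread put in doubt.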

------
anonu
We need a taxonomy. In the old days finding links was done by going to a
directory: dmoz or Yahoo.

I'm not saying we regress back to the past... I'm sure there is some hidden
underlying directory structure in web search today. What I'm suggesting is
making it more accessible to users as a way of navigating the web for good
content.

------
lordnacho
I find I agree with the article, but mainly on queries that are purchase
related, or those that overlap with some kind of business. If I'm looking for
some coding question, usually the right SO question comes up on the first
page.

Makes sense that it is so. Clearly there's incentives to show up first on a
"where to buy phone" search. The actual answer without vested interests has
nobody to pay for it. I bet there's some street in Berlin where there's a
number of phone shops, but without coordination (eg a shopping centre) there's
no way any of them will tell you that the selection is best if you just show
up somewhere on that street.

Also on a deeper note, I fear the internet is being flooded with terrible
authority sites. Superficial articles that sound right and have lots of
affiliate links. But the incentive is not to be informative and correct,
rather to look right to people who don't know better - that's why they're
there! - in order to funnel them to certain shops. I can just imagine there's
an anti-vaxx site somewhere that purports to have real evidence and sells
vaginal anti-cancer eggs.

------
Grimm1
This is something I'm currently focused on solving actually. Currently on Show
a ways down is the alpha but the basic gist is a search engine focused on
discovery while maintaining topical relevance. We thoroughly agree that search
results are kind of garbage on major engines at the moment.

------
jsilence
Maybe it is time for a personal meta search aggregator with indexing proxy.
One that delegates the query to reddit/SO/github-lab search and filters the
results down to those that actually match your query. The proxy indexes all
the pages you surf over and includes those in the search.
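
A rough sketch of the aggregator part, assuming each backend is just a callable that takes a query and returns result dicts. The backend below is a stub with made-up data, not a real reddit/SO/GitHub API client, and the "indexing proxy" is reduced to a list of pages already visited:

```python
def strict_filter(query, results):
    """Keep only results whose text contains every query term (substring match)."""
    terms = query.lower().split()
    return [r for r in results if all(t in r["text"].lower() for t in terms)]

def meta_search(query, backends, visited_pages):
    """Delegate the query to each backend, add locally indexed pages, then filter."""
    candidates = list(visited_pages)
    for backend in backends:
        candidates.extend(backend(query))
    return strict_filter(query, candidates)

# Hypothetical stub backend standing in for a site-specific search call.
def fake_forum_backend(query):
    return [
        {"url": "https://forum.example/1", "text": "Best phone to buy in 2020"},
        {"url": "https://forum.example/2", "text": "Phone shops in Berlin"},
    ]

visited = [
    {"url": "https://blog.example/review", "text": "Which phone to buy: a long review"},
]

hits = meta_search("phone buy", [fake_forum_backend], visited)
print([h["url"] for h in hits])
```

The second forum result is dropped because it never mentions "buy", which is the strict behaviour asked for above.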

------
carusooneliner
From the author's search result and a quick test I ran, it appears that the
specific problem here is Google doesn't seem to "understand" Reddit. If it
did, search results would be based on relevancy of comments on the Reddit
page, posted in the specified timeframe.

------
ropiwqefjnpoa
All search is being devoured by SEO

~~~
arkitaip
This. For every engineer that somehow works with search quality, there are
thousands of experts who are working to subvert SERPs in some fashion. Pretty
sure that if we gained true knowledge about the challenges that search faces
due to abuse, it would be like facing one of Lovecraft's cosmic horrors.

------
drivebycomment
Yawn. An anecdote doesn't make it data. Search is now a sufficiently
complicated and difficult problem that can't be represented by a few queries.
You need qualitative analysis as well as quantitative to draw a meaningful
comparison.

------
mariushn
Any suggestions on how one could start an open source search engine business?
Being able to index the whole internet as an MVP and quickly serve queries on
top of this data has a major initial and operating cost.

------
anotheryou
For power users it is. Probably we are just too niche to be worth supporting

------
jobseeker990
I always wondered why a search engine can't use SEO tactics as a kind of anti-
signal. What comes to the top of a google search if you filter out anyone
gaming SEO?

~~~
CM30
Likely because a lot of SEO tactics are not necessarily things that hurt the
quality of the site for the user. Using the right heading tags, image alt
texts, meta data/schema markup, a good title and meta description etc are all
SEO tactics, and they're also all things that help the user experience.

Similarly, a lot of inbound/offsite SEO tactics are theoretically things that
help the user as well. Providing content people want to link to, getting
authorities to link to said relevant content, etc are all things a user would
appreciate.

Using SEO tactics as an anti signal would boost poorly designed sites,
inaccessible sites, etc. What really needs to be done is something that
filters out sites creating thin content just for the purposes of getting
traffic, and that's harder to filter out.

------
discobot
Google filters the results on date of web page update, not date of the
Reddit post's creation, which is the correct behaviour.

And you are just making unjustified bold statements

------
zejn
Well, searching for "phone to buy" isn't going to do much good, is it.

"Reddit best 2020 phone" works for me much better, though ...

Is it user friendly? No.

------
timwaagh
Should not be hard to solve if they would have an algorithm to search Reddit
built in. But perhaps they do not, in which case it gets much harder.

------
peterwwillis
The user probably would be less pissed off if they had no way to filter by
time. An interesting lesson in UX.

------
3xblah
What was the query the author used? Did he try something like this

"[https://www.google.com/search?sitesearch=reddit.com&q=best+p...](https://www.google.com/search?sitesearch=reddit.com&q=best+place+to+buy+a+cheap+new+smartphone&num=100")

~~~
fluffy87
Watch out with doing more complicated searches like extending those to a date
range - they get people “banned” from using google search.

~~~
clarry
Even browsing beyond the first few pages of results can end you up having to
play with captchas.

Also it's funny how they lie about having about a quadrillion results, and
then when you're on page 12, suddenly that's all the 150 results they have,
sorry.

~~~
3xblah
That quadrillion results bit is like a relic from a bygone era, when people
actually cared.

I have not hit captchas very often. Even IP bans only last hours. It is
annoying, but low risk. I do searching from shell prompt, never browser.

What always amazed me about Google is that they are not willing to let users
skip pages 1-11 and immediately jump to, say, page 12.

Sometimes queries are non-commercial and there are no ads. Still, jumping
straight to page 12 can trigger a captcha.

I wrote a script that reverses or randomises the order of Google results, as
an experiment.
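
Not the script mentioned above, just a minimal sketch of what such a reordering step could look like once the results are already in a list. Fetching and parsing are left out, and `reorder` is a made-up name:

```python
import random

def reorder(results, mode="reverse", seed=None):
    """Return a new list with the results reversed or shuffled."""
    if mode == "reverse":
        return list(reversed(results))
    if mode == "random":
        shuffled = list(results)
        # A private Random instance keeps the global RNG state untouched.
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown mode: {mode!r}")

results = ["result 1", "result 2", "result 3", "result 4"]
print(reorder(results))                  # last result first
print(reorder(results, mode="random"))   # some permutation of the same items
```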

------
asdff
What kind of cluster do I need to just grep the web?

------
nova22033
_As for the Case Study part, and me saying this isn’t simply a rant - I lied,
hence the quotation marks in the title_

So it's NOT a case study...

~~~
Tenoke
For what it's worth, I'm not sure why they changed the title here. The flippancy
in my original title makes the level of diligence a bit clearer from the get
go.

------
failuser
A person using google search is not google’s customer, there is no money in
building a good search engine itself.

------
gglon
[https://yippy.com/](https://yippy.com/)

------
contingencies
_How is search so bad?_

My broad take is that previously search worked (Altavista era through early-
mid Google) because it referenced organic links put in place by real humans
and keywords, plus basic metadata like physical location of servers, freshness
of content, frequency of update, metadata behind domains, etc.

Since the mid 1990s that has increasingly been gamed heavily, PageRank style
approaches have come and sort of gone, and a vast majority of content accessed
by consumers has moved to one of a small number of platforms or walled
gardens, often mobile applications. I don't know for sure, but I'd assume with
confidence that the majority of result inclusion decisions made by Google are
now based on rejection blacklists, 'known good' safe hits and effectively
minimizing anomalous results above the fold. Simultaneously, the internet has
become an international place and the bar has been raised for new entrants
such that an incapacity to return meaningful results in multiple languages
bars a search engine from any significant market position. A huge percentage
of results are either Wikipedia/reference pages, local news or Q&A sites.
Further, huge amounts of what is out there is behind Cloudflare or similar
firewalls which will probably frustrate new and emerging spiders.

The existing monopolies, having some established capacity and reputation in
this regard, may have become somewhat entrenched and lazy, and do not care
enough about improvement. They are literally able to sail happily on market
inertia, while generating ridiculous advertising revenues. In China we have
Baidu, and in most of the rest of the world, Google.

Who will bring about a new search engine? Greg Lindahl
[https://news.ycombinator.com/user?id=greglindahl](https://news.ycombinator.com/user?id=greglindahl)
who formerly made Blekko is apparently working on another one.

I once wrote a small one (~2001) which was based upon the concept of
multilingual semantic indices (a sort of non-rigorously obtained language-
neutral epistemology was the core database). I still think this would be a
meaningful approach to follow, since so much is lost in translation,
particularly around current events. One problem with evolving public utilities
in this area is that such approaches border on open source intelligence (OSI)
and most people with linguistic or computational chops in that area leave
academia and get eaten up by the military industrial complex or Google.

Now we have
[https://commoncrawl.org/the-data/get-started/](https://commoncrawl.org/the-data/get-started/)
which makes reasonable quality sample crawl data super-available. Now "all" we
need is
people to hack on algorithms and a means to commercialize them as alternatives
to the status quo.

------
thoughtstheseus
The author is not using a "search" engine, Google is a recommendation engine.

------
sean_pedersen
Cuz it is goddamn hard!?

------
LastZactionHero
You're located in Berlin. It found a page on Reddit about buying phones in
Berlin.

Not saying you're wrong about the dates but... I dunno... seems like an odd
query. "Phone to buy?" And why not just search Reddit?

This site feels like we're just complaining about nothing these days.
(Downvote away!)

~~~
netsharc
I'm downvoting you because your response is totally useless.

He wasn't looking for where to buy a phone in Berlin, and besides, it's an old
reddit thread.

------
jillesvangurp
It's a hard problem because what is relevant is inherently subjective and
context specific and only a minority of users uses the advanced search
functionality so it is also not a big priority to solve it. Both Google and
Duck Duck Go optimize for the simple use case where there's a bit of user
context and some short query that the user typed. That's what needs to work
well. For that Google is still pretty good. I try duck duck go once in a while
but it's just not good enough for me right now. And of course when Google
fails me, that's probably also a hard case for Duck Duck Go.

The other problem is that websites provide very inconsistent meta-data, and
worse, are actively trying to game the system by abusing that metadata. So,
things like timestamps are not standardized at all (well, a little bit via
things like microformats). So recency of data is important as one of many
relevance signals but not necessarily super accurate. And given that it's a
relevance signal, you have people doing SEO trying to game that as well.

Anyway, Hacker News could also do with some search improvements to its
ranking. It always pulls up some ancient article as the most relevant thing as
opposed to the article from last week that I remembered and wanted to find
again. I consult people on building search engines with Elasticsearch, so I
have some idea what I'm talking about. It seems the ranking is basically "sort
by points". Probably not that hard to fix that with some additional ranking
signals. I just searched for "search" expecting to find this article near the
top 5 (because it is recent and has search in the title). Nope; not a thing.
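
To illustrate what one additional ranking signal could look like, here is a sketch that multiplies a points-based score by an exponential recency decay. The formula, the 30-day half-life, and the example stories are all assumptions for illustration; this is not how Hacker News or its search backend actually ranks:

```python
import math
from datetime import datetime, timezone

def score(points, published, now, text_match=1.0, half_life_days=30.0):
    """Hypothetical score: text match x damped points x recency decay."""
    age_days = max((now - published).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    return text_match * math.log1p(points) * recency

now = datetime(2020, 1, 19, tzinfo=timezone.utc)

# Made-up stories: an old hit with many points and a recent one with fewer.
old_story = {"points": 2000, "published": datetime(2015, 1, 19, tzinfo=timezone.utc)}
new_story = {"points": 300, "published": datetime(2020, 1, 12, tzinfo=timezone.utc)}

old_score = score(old_story["points"], old_story["published"], now)
new_score = score(new_story["points"], new_story["published"], now)
print(old_score, new_score)
```

Sorting by points alone puts the old story first; with the decay applied, the week-old story wins. In a real engine the decay would be one weighted signal among several, since sometimes the ancient article really is the one you want.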

~~~
zozbot234
Hacker News doesn't even _have_ search at the moment. It just redirects you to
some crappy external site.

~~~
detaro
It giving the YC startup running the search backend some visibility doesn't
mean it somehow "doesn't even have search".

~~~
jillesvangurp
Exactly, there's a search box on the web site. How it's implemented is an
implementation detail. Given that it happens to be something by a company
(Algolia) selling this as a SAAS solution, I don't think this is a great
advertisement for them either.

~~~
Kiro
Algolia is a YC company.

