Hacker News
What Google search isn’t showing you (newyorker.com)
205 points by the_decider on March 11, 2022 | 181 comments



I'm glad they brought up Marginalia; it's one of my favorite engines.

Teclis[0] and Wiby[1] are similar contenders in the non-commercial search space.

[0]: http://teclis.com/

[1]: https://wiby.me/

What the article mentioned about the influence of PageRank definitely rings true. An interesting variation is used by the Secret Search Engine Lab's CashRank algorithm[2].

[2]: http://www.secretsearchenginelabs.com/tech/cashrank.php
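For anyone who hasn't seen how PageRank works, here is a minimal power-iteration sketch. It assumes a tiny invented link graph and the commonly cited damping factor of 0.85. It is not CashRank (see the linked page for that) and not Google's production ranking, just the textbook idea that a page's score is fed by the scores of the pages linking to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping page -> list of outbound links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outbound links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: two small blogs link to each other and to a portal.
graph = {
    "blog-a": ["blog-b", "portal"],
    "blog-b": ["blog-a", "portal"],
    "portal": ["blog-a"],
}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The scores always sum to 1, so they can be read as the probability that a random surfer lands on each page.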

I listed a bunch of engines with their own indexes, FWIW:

https://seirdy.one/2021/03/10/search-engines-with-own-indexe...


So I tried these search engines. They're fun to try but kinda remind me of search before Google.

I'm frustrated with Google search results lately, they don't reach far enough back and are diluted with all sorts of crud.

But these alternative search engines kinda just remind me of why people started using Google. When I use them, I don't get spam, but I don't really get things I'm interested in either. It's like they don't understand what I'm really interested in and I either get no hits, or get hits on things that are just completely unrelated. It's like old-school chat software or something, complete with all the computer misunderstandings and whatnot.

Don't get me wrong, I'd love to see a lot of competition in search. I use DDG a lot. But I'm surprised at how positive people's comments are about these alt search engines, because to me they have problems on the opposite end of the spectrum. To me, what Google returns is a pretty good understanding of what I want, but corrupted by manipulative spam; the other ones generally return no spam, but also not at all what I want.


I'm trying https://kagi.com/ at the moment. It was mentioned here a few months back when it was still in beta, and I finally remembered to switch one of my browsers to try it.

So far, pretty happy with it. Before that I'd been using DDG but found myself using !g so frequently that it was almost pointless.

DDG feels absolutely rubbish at local searches, but I may have just lost patience with it.

To be honest, I don't think DDG has a future purely because of the name. No way I'm telling non-techy friends that as I'll just get "what? Ducky go? Duck what? Are you being serious or is this a joke?"


Being able to make rank adjustments is an absolute GAME CHANGER. I didn't realise just how terrible my search results were until I could have granular control over the poor quality sites which kept appearing in my Google results. It's so clear to me now that Google is prioritising advertising heavily over user experience. The recent discussion about how such a large number of Google searches include "reddit" should serve as a warning for Google to swing that pendulum back in the direction of the user experience, or they will lose people like me. I'll be paying for Kagi when it's out of beta.
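Kagi's actual implementation isn't public, but as a hypothetical sketch, per-user rank adjustments can be modeled as score multipliers keyed by domain and applied on top of whatever the underlying engine returns. The domains, multipliers, and function names below are invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical per-user rules: multiplier applied to a result's score.
# 0.0 blocks the domain, < 1 lowers it, > 1 raises it.
RULES = {
    "contentfarm.example": 0.0,
    "listicles.example": 0.5,
    "stackoverflow.com": 2.0,
}

def domain_of(url):
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def rule_for(host, rules):
    # Check the host and each parent domain, so "en.wiki.example" matches "wiki.example".
    parts = host.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in rules:
            return rules[candidate]
    return 1.0  # no rule: leave the score unchanged

def adjust(results, rules=RULES):
    """results: list of (url, score) pairs from the underlying engine."""
    adjusted = []
    for url, score in results:
        factor = rule_for(domain_of(url), rules)
        if factor > 0:  # blocked domains are dropped entirely
            adjusted.append((url, score * factor))
    return sorted(adjusted, key=lambda r: r[1], reverse=True)

results = [
    ("https://www.contentfarm.example/best-toaster", 0.9),
    ("https://listicles.example/top-10", 0.8),
    ("https://stackoverflow.com/q/1", 0.4),
]
print(adjust(results))
```

Multiplying rather than re-sorting by hand keeps the engine's own relevance signal while letting the user's preferences dominate for the domains they care about.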


>To be honest, I don't think DDG has a future purely because of the name. No way I'm telling non-techy friends that as I'll just get "what? Ducky go? Duck what? Are you being serious or is this a joke?"

Only as a side note/JFYI: naming something, particularly if it's aimed to be international/multilingual, is tricky. DuckDuckGo may well sound funny to your friends, but, as an example, "kagi" in Italy would be pronounced the same as "Cagi", a renowned historical maker of men's underwear, to the point that "a cagi" is sometimes used as a synonym for "a tank top", and surely it would make some people think it's a joke.


Or worse, it could be pronounced as "Caghi" ("you shit").


To solve the mystery, Kagi is pronounced kah-gee (and is Japanese for "key")


I’m delighted with Kagi. It finds the regulars I want (StackOverflow, GitHub, Wikipedia) just as well as Google, but is less infested with SEO spam and has a better selection of indie sites. Images and maps aren’t really up to par yet, but I mostly use OSM for maps anyway. I’ll be very happy to pay $10-$15 a month when Kagi starts charging.


Anything that says sign up for beta to get me to provide an email and contact info, I and a lot of other people are not going to use.


It's a beta now but it'll become a paid search. That you aren't willing to pay for search means it isn't very important to you.


> That you aren't willing to pay for search means it isn't very important to you.

Or that they don't have money to throw away on things that aren't necessary to live. Your statement is only true if the person you're responding to is quite wealthy, which is a leap.


The target audience is professionals, where search helps them make money, which is necessary to live.


If the choice is between eating or good search, which do you think someone will pick? Is that because they don't think it's important?


I don’t know where you live or what your circumstances are, but $10-15/month is less than Netflix or Spotify. It’s less than YouTube Premium. All those things are essentially you paying for content in a way that results in you being less advertised to. It’s not a lot of money.


Wait, why is being 'quite wealthy' necessary? Their FAQ says $10 on the low end, $20-30 for an unlimited offering. People of varying economic positions spend that monthly on a myriad of non-essential services. I feel like you're also playing a semantic game with 'importance', since in another comment you immediately jump to a starving person trying to choose between a premium search engine and food. We're more or less in a typical HN thread about product barrier to entry here, and it feels like you're on a class crusade to get equal search results for all. Not the same conversation.


Totally fair, but why not just provide a throwaway email if you're worried about spam or privacy? Is it the principle? I've never had a principle problem with having a login to a service. I would do so with DDG, for example, if I could customise my results better.


I am thoroughly sick of registering and logging in for services and will avoid doing so whenever possible.


When it comes to these older sites, it's hard for an automated engine to discover what they're about because (1) they lack the kind of structured description that's far more common on modern sites - including, to some limited extent, spammy ones (although extensive structured descriptions would nonetheless tend to favor legitimate content) - and that powers "smart" suggestions in search results. Also, (2) the web directories that would've provided an accurate description back when those sites were current are now dead. What we'd need to improve the Web search ecosystem is for "non-commercial", hobbyist sites to work on addressing both of these problems.


HTML has had metadata tags forever; it's just that search engines quickly stopped using them because they were so inaccurate and prone to abuse. Even now, heavy use of these tags is arguably a marker that a website is really interested in its Google ranking, and probably fairly spammy.

Any sort of description or tagging or keywords or genre description needs third-party vetting to be of any use whatsoever. It's simply too profitable to misrepresent your websites for it to be any other way.


Those metadata tags were just a simple textual description and a bunch of keywords with no reference to any controlled vocabulary. This is what made them so easy to abuse. Modern schema-based structured data is vastly different, and with a bit of human supervision (that's the "third party vetting") it's feasible to tell when the site is lying. (Of course, low-quality bulk content can also be given an accurate description. But this is good for users, who can then more easily filter out that sort of content.)

One could even let this vetting happen in decentralized fashion, by extending Web Annotation standards to allow for claims of the sort "this page/site includes accurate/inaccurate structured content."
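To make the contrast with old keyword tags concrete: schema.org data is usually embedded as JSON-LD in a `<script type="application/ld+json">` element, with typed fields from a shared vocabulary instead of free-text keywords. A minimal extractor using only Python's standard library might look like this; the sample page and product are invented, and a real crawler would also need to handle microdata and RDFa.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed structured data is simply ignored

# Hypothetical page: free-text keywords vs. typed schema.org Product data.
page = """<html><head>
<meta name="keywords" content="toaster, best toaster, cheap toaster">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Toaster",
 "offers": {"@type": "Offer", "price": "39.99", "priceCurrency": "USD"}}
</script></head><body>...</body></html>"""

parser = JsonLdExtractor()
parser.feed(page)
parser.close()
for item in parser.items:
    print(item["@type"], item["name"], item["offers"]["price"])
```

Because a claim like `"price": "39.99"` is typed and checkable against the visible page, a vetting party has something concrete to confirm or reject, which free-form keywords never offered.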


The thing is "a bit of human supervision" is difficult on a scale of ten thousand Wikipedias. It pretty much needs to be done completely automatically.


I just hope we can get a trust-based network combined with search. So if I trust some friends and search "best toaster", and one of my trusted friends has given a very high review score to a toaster, then I get that one as the top search result.

Extend this to online communities, and you can ask "what laptop would HN recommend for Linux?", etc.

Of course, there's a privacy issue to solve, but the functionality could be very useful compared to the crappy Google search results for commercial products.
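One simple way to realize this, sketched under invented assumptions: you assign trust weights to people you know, and each product's score is the trust-weighted average of their ratings. The names, weights, and ratings are hypothetical, and the sketch ignores the privacy question entirely.

```python
from collections import defaultdict

# Hypothetical trust weights (0..1) that *you* assign to people.
trust = {"alice": 1.0, "bob": 0.6, "mallory": 0.1}

# Hypothetical reviews: (reviewer, product, rating out of 5).
reviews = [
    ("alice", "toaster-x", 5),
    ("bob", "toaster-x", 4),
    ("mallory", "toaster-y", 5),
    ("bob", "toaster-y", 2),
]

def trusted_ranking(reviews, trust):
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for reviewer, product, rating in reviews:
        w = trust.get(reviewer, 0.0)  # strangers contribute nothing
        weighted_sum[product] += w * rating
        weight_total[product] += w
    scores = {
        p: weighted_sum[p] / weight_total[p]
        for p in weighted_sum
        if weight_total[p] > 0
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(trusted_ranking(reviews, trust))
```

Here mallory's glowing review barely moves toaster-y, because you barely trust mallory. A real system would also want to account for how many trusted reviews a product has, not just their average.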


I have a similar idea but with less user involvement: a browser extension that extracts keywords from the websites you actually visited and forms a P2P database with your followers. If you see spam or undesired ads in a search result, you can rate the content down, and the system automatically reduces the weight of the peer who provided that history.

To avoid sharing sensitive pages or habits, maybe let the user review the list in batches and confirm before sharing it.
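The down-rating step could be as simple as multiplicative decay of each contributing peer's weight. Everything here (peer names, URLs, the 0.5 penalty, the data layout) is a hypothetical illustration, not an existing extension's API.

```python
# Hypothetical shared histories: peer -> set of URLs that peer actually visited.
peer_history = {
    "peer-1": {"https://good.example/guide", "https://spam.example/deal"},
    "peer-2": {"https://good.example/guide", "https://blog.example/post"},
}
peer_weight = {peer: 1.0 for peer in peer_history}

def score(url):
    """A URL's score is the summed weight of the peers who visited it."""
    return sum(w for peer, w in peer_weight.items() if url in peer_history[peer])

def rate_down(url, penalty=0.5):
    """Shrink the weight of every peer whose history contributed this URL."""
    for peer, visited in peer_history.items():
        if url in visited:
            peer_weight[peer] *= penalty

rate_down("https://spam.example/deal")
print(peer_weight)
print(score("https://good.example/guide"))
```

Decay is multiplicative, so a peer who repeatedly contributes spam fades out quickly, while one bad rating only halves their influence.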


>But these alternative search engines kinda just remind me of why people started using Google. When I use them, I don't get spam

In the past few months at least half of my Google searches have had spam in the top 3 results. Literally malware domains that community-made filter lists are aware of but somehow Google chooses to share anyway.


Yes! I was trying to find an article on Polish objections to Nord Stream 2 a week or two ago and couldn't find anything. I tried a million search combinations and tried date restricting, but it all favored recency. I ended up finding it as a footnote on Wikipedia.


With marginalia I think that is by design. They intentionally do not index any websites that are deemed "too modern", which includes most things one might be searching for. It does seem to be intended more for exploring unusual places.


It looks good, but perhaps it should have some more options (that can be specified as part of the search query text), with those features documented. Some possibilities would include filters (by file format, domain name, scheme, etc.), sort order, exclusions, etc.

There are two menus for options ("Popular Sites", "Blogocentric Eigenvector", "Both Algorithms", "Experimental", "Allow JS", "Deny JS", "Require JS"), but they aren't explained very well. (For example, I might want the search engine not to execute any scripts when determining a page's text, while still including pages that have scripts as long as they work correctly with scripts disabled.)

Also, they have some documentation in Gemini format. I have a Gemini viewer on my computer, but my browser won't hand the files to it because of the "Content-disposition" response header.


I don't know why I wasn't aware of these search engines. I'm having a blast discovering personal websites; this is amazing.


Honestly kinda crazy the publicity my tiny project is getting for what it is.


You deserve every bit of it, and then some. The scope of Marginalia may be small, but it is one of the best examples of making the non-commercial web accessible to a much larger audience. I hope it inspires others to tackle similar projects.

I'll be able to support the project starting a few weeks from now, and would love some non-Patreon options.


What is a good alternative to Patreon? I haven't really looked into it too deeply.

They do take a fairly steep cut, and that's not even considering that PayPal also wants their pound of flesh on top of that :-/


Ko-fi and Liberapay are platforms that don't take fees. Buy Me a Coffee is a popular choice for one-off donations.


Why not set up your own Stripe?


list is great -- & knew you'd done your homework when saw Marginalia - also a fave

run search startup, Breeze, so have a similar list and I was like damn :)


My feedback:

I think what you're doing with Breeze seems interesting, but the value-add of making it a commercial offering isn't clear; what does it offer that anyone else can't easily replicate with Google Custom Search? I'm not saying that there isn't a value add, I'm only saying that it isn't obvious as a user.

Something has to be a scarce resource; my guess is that the resource here is "labor" in building the CSE parameters and finding the sites to add to collections. Perhaps the effort that went into this should be emphasized.

Are there plans to move things server-side? Making client-side requests to Google has privacy implications.


tackling in parts / next replies


-- Google / privacy / alternative proxies --

tl;dr -- working on improving client-side privacy protection & there are a couple of other options that may permit using Google with a bit of user config; premium version is all server-side since it's proxied via Bing, Gigablast &/or our index; there's an alt premium option that would basically proxy Google via a cloud browser, say Browserling or KASM or similar

1. The client-side will be set to no personalized ads -- just found out about a fairly hidden setting that permits that -- should be changed / live later today

2. Google's API precludes moving this server-side unless the user configures their own account, which we can then drop in, since it's limited to 10K queries / day, which isn't enough to do anything meaningful, custom or full web; we're planning to write that up as a blog post as an intermediate alt

3. We've started to instrument whether client-side calls compromise any privacy -- those results / protections are necessarily limited; however, if anything in their client-side code is privacy-revealing, we can perhaps uncover it & possibly mitigate some of it

4. premium is server-side with proxy to {ahrefs*, Bing, Gigablast, etc.} OR our Breeze index -- we scrape inventory-sensitive / time-sensitive sites, e.g., used car dealer pages

5. given enough ad or other revenue, we could alternatively give everyone a Bing proxy like DDG or other services -- bit too bootstrapped to do that out of the gate, plus Bing has more constraints on custom search, so that's a mixed bag of outcomes

6. another approach, also necessarily premium, would be to proxy through a service like browserling for Google searches, since the API doesn't permit it at scale

7. premium also includes alerts, especially for things like say car dealer pages that are more time sensitive and harder to config than what free google alerts do


can DM some examples of CSE configs ranging from easy to hard

if I'd add anything else, it's that we're positioning Breeze more as a search client + search engine, e.g., we use Google, Bing, etc. for web-wide results (search client), whereas we scrape car dealer pages for real-time inventory alerts (search engine)

that same search client philosophy also why we're adding a low-code query builder so that anyone can build a really extended query, aka, custom search engine, and either keep private for their use or share with community, since we can't possibly build all the CSEs ourselves

in that sense, our long-term trajectory is about building what amounts to a deep reddit, where people can search sites / topics of interest in very direct ways that are way more substantive than wrestling Bing / Google

that's also why we're shifting away from /topics at the top of our site to integrating the branches / filters / custom queries directly into the search experience, e.g., blogs is first one we've done that with that's not easily available elsewhere, and podcasts is likely next.


-- custom / topic search, or what we call branches --

tl;dr -- free version makes it easy for anyone to build, use & share custom searches; premium version is zero ads + better web alerts + easier access to alt web indexes

1. average user unlikely to go through config of custom search

2. even if they did, it's so poorly documented & finicky, they'd be unlikely to achieve comparable outcome

3. Bing CSE is even worse -- requires Azure account, limited to 400 up/down boosts, etc.

4. there are other technical reasons a user might not do more than a couple, along with the sheer scale issues I also mentioned

5. we're refactoring how the topic searches are experienced and making the topics, which we call branches, a more natural part of search experience

6. "blogs" is the first one to be done that way -- you can filter to just blogs after searching for X - any other branches / topics will be added in that way

7. e.g., podcasts & RSS feeds are coming out soon, along with more traditional filters such as shopping

8. that approach also makes it easy for us to expose what we're calling a low-code builder to let anyone build a custom search and either share / make public or keep private

9. that includes all possible filters - advanced keyword combos, site inclusion / exclusion, URL patterns, schema structure, etc.

10. premium for users includes zero ads, alerts, and some other features that are mix of TBD or too early to build

11. premium for teams includes similar things, along with the ability to config dashboards of searches, e.g., an HR dashboard of relevant custom searches, etc.

12. we're navigating that labor balance -- e.g., the blogs filter is fairly basic atm, whereas filtering college scholarships was a bit more nuanced to make it work


anyway, hope that helps / iterating quickly


Really good stuff. You might want to explain some of this on the website.


it'd be nice if these could be worked into SearX


I notice this especially in Google Maps. Sometimes I want to search for things like "cafe" or "hotel" and get a list of all the actual matches nearby. Even for buildings that show up on the map itself when you zoom in. But no, google maps search has to get all clever, hide results, and only show "relevant" results. It's super annoying when I search "hotels" and all I get is the equivalent of an agoda.com or booking.com search, when I wanted to actually see all hotel buildings, including offline ones.


That's because the "relevant" results pay Google to be "relevant." It's definitely getting worse over time.


If relevant only was relevant. It happens so often to me that I search for something like "restaurant" and get top results 150 kilometers away.


I don't think that's on Google.

The hotel industry is super heavily targeted with SEO spam. They're almost on par with online casinos in how much effort they put into manipulating search engines.


To be clear, I'm talking about the places that are visibly on the map. You can scroll around and find cafes and hotels that simply don't get returned the moment you send a search query.

Places that are visibly there but not returned in search result.. It's the most bizarre thing.


But it is Google that sets the rules for how and what is sorted. SEO is downstream.


No I mean like virtually every document pertaining to hotels is search engine spam. It's very hard to actually find documents that aren't in this category. The hotel booking industry is extremely shady and evidently has pretty deep pockets to be able to afford literally registering hundreds if not thousands of domain names to attempt to manipulate search engines.


> Danny Sullivan, Google’s “public liaison for search,” told me that people using Google to find Reddit threads is actually evidence that search is working the way it should.

Cherry-picked quotes aside, this part of the article really did make me roll my eyes. Users self-censoring searches to a _single website_ because the rest of Google’s results are so unusable is somehow a feature, not a bug.

The same way that Google improperly handling quoted searches and returning pages _without that exact string_ is somehow a feature, not a bug.


For easier access for discussion, the full paragraph is: "Danny Sullivan, Google’s “public liaison for search,” told me that people using Google to find Reddit threads is actually evidence that search is working the way it should. Users on the whole have become passive, relying on Google to anticipate their desires. If they wanted, they could refine their queries, limiting results by, say, price point (“toaster $40 . . . $100”) or by listing certain terms to exclude (“ ‘toaster’ NOT ‘oven’ ”). As machine-learning algorithms have grown more pervasive, we’ve lost some of the fluency with search that older Internet adopters may have learned in a high-school Boolean tutorial. “There’s a shift now where, if you don’t find what you’re looking for, you blame the search engine,” Sullivan said. At the same time, he admitted that many users have a desire for “more noncommercial information, more community-based information.”"

Maybe it's just me, but I thought the middle section of the paragraph wasn't relevant to the first sentence. I speculate, with charitable interpretations, that the order of events was:

1) The reporter asked Sullivan whether the frequent use of "Reddit" in search terms indicated a problem.

2) Sullivan asserted that Google is useful for searching for results restricted to a domain, then shifted to say that Google still gives relevant results if you use Boolean search terms.

3) Potentially after a follow-up question, Sullivan conceded that much of the results that Google returns is commercial instead of community-made.

From a writing perspective, I found it unclear whether the middle section that starts with "Users on the whole have become passive, relying on Google to anticipate their desires" was analysis by the author or part of Sullivan's response. I interpreted the sentences as part of Sullivan's response from the context, though it would have been clearer if the writing more explicitly indicated whether this was part of Sullivan's reported speech.


> Users on the whole have become passive, relying on Google to anticipate their desires

No, Google made users passive to profit from controlling limited choices presented to them.

> I interpreted the sentences as part of Sullivan's response from the context

Looks like it to me, too. It's Google's propaganda for "we made you need us".


pretty much -- most users won't bother to learn search operators for the same reason they didn't learn LaTeX and will resist learning markdown. sure, it works, but it's a lot of work to recall something hard to master that you only use every so often.

it's the layperson's version of recalling regex when you only need it every few weeks -- lots of cognitive load for minimal return; easier to ask friends or reddit.

yes, search operators seem intuitive to perhaps the average HN reader or SO coder, etc. That's a small set of people compared to 7B+ people attempting to recall oh, right ~term adds synonyms -- easy for most people here, mostly useless or not worth remembering for most of the world.


Most users weren't learning search operators before Google came along either.

Google's goal is to make them ultimately irrelevant and just give you what you're looking for, meeting the user halfway on the search query. That may not be possible, but it's the right mindset to have for solving the problem... Making the human talk like a computer is a UI failure.


we're attempting to tackle some of that at Breeze and it's hard / challenging / etc -- it's not sufficient to make it technically easy to filter results by topic; the user also has to realize it's even possible to filter results beyond whatever default list is returned. Giving users a UI that facilitates that is just as hard as the technical side of making it easy for users to tune results, and we spend as much time there as on coding.

we ran smack into that with a special topic we released this week, Ladypedia, where we ended up adding a dial to filter out male results on a sliding scale for Wikipedia results. Doing it by default was too strong, so we ended up selecting for female gender and then letting the user dial in how many male-related pages to filter out.

1. Breeze -> https://breezethat.com/

2. Ladypedia -> https://breezethat.com/p/ladypedia


> The same way that Google improperly handling quoted searches and returning pages _without that exact string_ is somehow a feature, not a bug.

Couldn't find it now but that was refuted by Danny Sullivan (I think) in an HN thread about the Google search results quality. They gave a pretty convincing explanation of why that may happen (spoiler: it has something to do with the tokenization of words on the website) and I, for one, believed them.

*Edit:* Here it is: https://news.ycombinator.com/item?id=30356382


Yeah, I remember that – he basically said that the word might appear on the page but not in the results description. And that's BS, because it used to be that the description of every result contained at least one bolded token included in the search query.

If this is no longer possible, it's only because they're cheaping out on hardware and not actually indexing as much of the page as they could (no way to retrieve the matching token because it's not saved, but they can still match on it). This probably also leads to diluted results.


That's just more proof that Google doesn't care about its users.

If they quote a search term, they expect to find the exact term visible on the website: not in alt text, not in invisible text, not in any metadata. Quotes mean this exact phrase is visible on the website.


I see that term when I look at the site.

Of course, that's because my user agent renders alt text. I'm glad Google search results match to things I can see on the page.


Doesn't the comment right below the one you linked successfully refute Danny Sullivan's claim?


The one by drawfloat?

Looking at the source of the cached page,

https://webcache.googleusercontent.com/search?q=cache:sRJX_e...

It contains

       <a href="/quotes/tag/don-t-give-up-quotes">don-t-give-up-quotes</a>,
       <a href="/quotes/tag/don-t-give-up-the-fight">don-t-give-up-the-fight</a>,

After you eliminate HTML, that becomes "don-t-give-up-quotes, don-t-give-up-the-fight" and since punctuation is stripped, that matches.
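A rough illustration of that explanation, assuming a naive pipeline: keep only text nodes, lowercase, and treat any run of non-alphanumeric characters as a separator. Google's real tokenizer is not public; this only shows why a quoted phrase like "don't give up" can match a page that never displays those words with an apostrophe.

```python
import re
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Keeps only text nodes, i.e. what remains after eliminating HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def tokenize(text):
    # Hyphens, apostrophes, commas, etc. all become token boundaries.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def phrase_match(html, phrase):
    extractor = TextOnly()
    extractor.feed(html)
    extractor.close()
    doc = tokenize(" ".join(extractor.chunks))
    query = tokenize(phrase)
    n = len(query)
    # The phrase matches if its tokens appear consecutively anywhere in the document.
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))

snippet = '''<a href="/quotes/tag/don-t-give-up-quotes">don-t-give-up-quotes</a>,
<a href="/quotes/tag/don-t-give-up-the-fight">don-t-give-up-the-fight</a>,'''

print(phrase_match(snippet, "don't give up"))
```

Both the query and the tag text reduce to the tokens `don t give up`, so the phrase matches even though the rendered link text reads "don-t-give-up-quotes".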

Full disclosure I work at Google, but not on Search.


Google messed up the internet because they don't allow installing other websites' searches easily.

We had an opensearchdescription.xml for literally most of the websites, and somehow Browsers (including Firefox and Safari) managed to mess that up.
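For reference, an OpenSearch description document is a small XML file that a site links from its pages (via `<link rel="search" type="application/opensearchdescription+xml" href="...">`) so browsers can offer to add its search. The sketch below generates a minimal one with the standard library; the site name and URL are placeholders, and `{searchTerms}` is the template parameter defined by the OpenSearch 1.1 spec.

```python
import xml.etree.ElementTree as ET

NS = "http://a9.com/-/spec/opensearch/1.1/"

def opensearch_description(short_name, description, template):
    # Emit OpenSearch elements with a default xmlns instead of an "ns0:" prefix.
    # Note: register_namespace changes module-wide ElementTree state.
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{NS}}}Description").text = description
    ET.SubElement(root, f"{{{NS}}}InputEncoding").text = "UTF-8"
    ET.SubElement(root, f"{{{NS}}}Url", {"type": "text/html", "template": template})
    return ET.tostring(root, encoding="unicode")

xml_doc = opensearch_description(
    "Example", "Search example.org", "https://example.org/search?q={searchTerms}"
)
print(xml_doc)
```

The browser substitutes the user's query for `{searchTerms}`, which is all a persisted site search really needs to store.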

I wish there was an easy browser extension that just persists searches correctly, and doesn't forget about them the next time I clear my browser cache.

Guess I'll have to implement it in my own one again :-/

Google is so full of content farms these days, it's ridiculous. And all of them are ranked higher than the sources because they use google ads on their pages. Just search for a quote that you read here on HN or on reddit or on SO...and you'll find hundreds of them ranked higher than the source.

Maybe someone should build a search engine that downranks all websites that have google analytics and google ads? Would certainly be an interesting experiment.
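A crude version of that experiment: scan each result's HTML for well-known Google tag and ad hosts and multiply its score by a penalty for every marker found. The marker list, the 0.5 penalty, and the sample pages are assumptions for illustration; real detection would also have to deal with tag managers, server-side tagging, and obfuscated loaders.

```python
# Strings commonly seen in Google Analytics / Ads embed snippets (not exhaustive).
TRACKER_MARKERS = (
    "google-analytics.com",
    "googletagmanager.com",
    "googlesyndication.com",
    "doubleclick.net",
    "adsbygoogle",
)

def tracker_hits(html):
    lowered = html.lower()
    return [m for m in TRACKER_MARKERS if m in lowered]

def downrank(results, penalty=0.5):
    """results: list of (url, score, html); each detected marker multiplies the score by penalty."""
    rescored = []
    for url, score, html in results:
        rescored.append((url, score * penalty ** len(tracker_hits(html))))
    return sorted(rescored, key=lambda r: r[1], reverse=True)

# Hypothetical results: a content farm copying a quote, and the original forum thread.
results = [
    ("https://farm.example/quote", 0.9,
     '<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>'),
    ("https://forum.example/thread", 0.6, "<p>original discussion</p>"),
]
print(downrank(results))
```

The content farm matches two markers and drops from 0.9 to 0.225, below the untracked source.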


pretty much what we're up to at Breeze. we started out with topic searches, added web search, and are now blending the two. some examples:

1. web, blog tab in search results is ours, and our first iteration of making it easier to dial in topics directly from web results -> https://breezethat.com/

2. early pre-web-wide topics before we added web search, https://breezethat.com/topics

3. ladypedia, an experiment on tuning out gender bias in Wikipedia pages for Women's History Month, https://breezethat.com/p/ladypedia

4. we're currently sussing out the extent to which sites are human-ranked vs. using other machine-extracted information from a page / site -- it's a lot of rapid iteration at the moment

5. we have similar thoughts to Ahrefs on publisher profit sharing, although we haven't established whether we'll hit their 90/10 mark. we also have an option for users to go premium and skip all ads


Are you planning to add accounts with ability to rank domains, like Kagi.com?

There are a lot of useless, low-quality sites that I never want to see (so other people might find this useful too).

I want to avoid using uBlacklist for that.


tl;dr -- yes

1. why?

- only way scales -- we can't possibly curate everything
- some sort of shared gain -- token, rev share, that's WIP
- core concept is what we call "deep reddit" - search all links in a subreddit

2. users will get same controls

- domains <- specific, subs, TLDs, etc.
- keywords <- exact or fuzzy, akin to ~
- structure <- microformats, schema.org, etc.
- format / content <- images, blogs, newsletters, video, etc.
- user controls <- think form builder so don't have to remember query syntax
- updates <- how often to snag updates
- alerts, etc.

3. deployments

- private, unlisted, or public
- branded or white labeled / embedded
- basically your personalized equalizer for the web
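The "form builder so you don't have to remember query syntax" idea boils down to turning structured fields into operator syntax. A hypothetical sketch: the function and field names are invented, and it emits the commonly supported `"exact phrase"`, `-exclude`, and `site:` operators plus Google's `..` number-range syntax (the article's "toaster $40 . . . $100" example).

```python
def build_query(terms="", exact=None, exclude=(), site=None, price_range=None):
    """Turn form fields into a search string using common search operators."""
    parts = []
    if terms:
        parts.append(terms)
    if exact:
        parts.append(f'"{exact}"')                  # exact-phrase match
    parts.extend(f"-{word}" for word in exclude)    # exclude a term
    if site:
        parts.append(f"site:{site}")                # restrict to one domain
    if price_range:
        low, high = price_range
        parts.append(f"${low}..${high}")            # number range
    return " ".join(parts)

print(build_query("toaster", exclude=["oven"], price_range=(40, 100)))
print(build_query(exact="linux laptop", site="news.ycombinator.com"))
```

The user only ever sees checkboxes and text fields; the operator syntax stays an implementation detail, which addresses the "nobody remembers ~term" problem raised above.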


> Maybe someone should build a search engine that downranks all websites that have google analytics and google ads? Would certainly be an interesting experiment.

Try Kagi.com. Also, you can manually rank or even block domains you choose.


I've never had a problem using a specific website's search engine on Chrome? Your experiences may not be universal.


I only use Google to search Reddit because the Reddit search is unusable, not because the Google results are.


Maybe Google should buy Reddit, build a better search across it, and set AlphaGo to use karma scores on Reddit to better identify relevance across search results on Google proper.


> karma

No! Goodhart's Law. Karma in itself is pointless.

The reason that adding reddit to the end of a search query has better results is because you are querying a community for your results.

There are already bots on reddit that take submissions that hit the frontpage 366 days ago and re-submit, and then other bots that take the top comment and resubmit the comments to that submission. They get loads of karma doing this. Sometimes it's useful, sometimes not. But it does nothing to improve actual credibility.

The larger the subreddit, the SMALLER the community. r/AskReddit is not reliable. r/MechanicalKeyboards is. You want to find a small, tight-knit community of peers that know each other. There are small communities of SEO experts just submitting Amazon affiliate links, so just 'small' isn't sufficient either.

You want to vet that the posters are acting in good faith. Find the top posters, find things you know (and it's better if you disagree with mainstream opinion here), and check their other posts and see if you agree with them there. If you do, it's more likely that you will agree on this subject you are referencing and don't know much about.

This process sucks - especially when you start - but over time you will accumulate a network of trusted people and communities that you can rely on. Anything else will be corrupted by greed.


Both are true. Reddit search is useless but google search can also be useless until you put reddit in.


> because the rest of Google’s results are so unusable

I would suggest the correct framing here is "the rest of the internet is unusable".

How else is Google supposed to guess you want a personal opinion from Reddit other than... typing it in the search query? Google is a search engine, not God.

If you want an answer from a particular source you're going to have to specify that, Google can't know if you want Wikipedia, or Quora, or a news article or a TikTok video or one of the other thousand pages that plenty of users might be interested in.

The internet has exploded in size, so has the diversity of legitimate sources, the search engines are hardly at fault.


I would suggest an even more correct framing as in "the rest of the internet as discoverable by Google is unusable".

On a more serious note, I think we may need another dimension to web search regarding the type of origin, namely "conversational" versus "commercial" (applied to both text and image search). Something like this would provide a more general solution to the problem, instead of pushing one of the more commonly known origins to become yet another monopoly.


Everyone--note this moment. We're witnessing the fall of Google and Windows. Like IBM, it will take time to realize what happened. To some, it's obvious. But to others, it's all that's known.


I'm not sure IBM is doing too bad. Sure, they aren't the growth machine that is AAPL or MSFT, but relatively low PE for a tech company that's been valued at or above $100B for over 10 years is a pretty good success story.

https://www.google.com/finance/quote/IBM:NYSE?window=MAX


By "fall" I don't think they meant financially. Just in relevance.


Yes - IBM used to be computers, as in average people genuinely considered them to be the same thing, like "Hoover" was more or less synonymous with vacuum cleaner. Google was functionally synonymous with "the web". No longer.


Meanwhile Alphabet EPS is up 35% from last year.


On a tangential note, I recently started wondering about how legal brothels work, where prostitution is legal, what legal brothels charge, etc., and Google was supremely unhelpful. I don't know if these things are just so shady they don't have websites, or I'm looking in e.g. Germany and it's not in English, but I imagine American sex tourism is probably a decently sized market. Dunno. Strong suspicion the morality police were preventing me from sating my abstract curiosity about sex work.


This sounds like a great example of where Reddit would be 100x more useful than Google's version of the web. There's no doubt in my mind that Google tries really hard to Disneyfy the web, unless you push it really hard not to.


Seems pretty useful to me[0,1], but this is google.com set to english.

0: https://i.judge.sh/WmhyQ/F_pp1uDoEG.png

1: https://i.judge.sh/H1t5n/G_6H65oODP.png


Eh, wasn't clear - was looking for specific brothels.


You might just not be good at googling for that topic.


It's difficult when you're searching for something where there's money to be made, because all those sites are fighting for the top and using all the tricks in the book to do it. See also: blockchain.


> When I recently Googled “best toaster”...

Well, if you google "best toaster" then yes, Google will (and should) show you ads and sponsored posts and articles that have been SEO-ed to death, because it's a meaningless search.

Google isn't in the business of assessing toasters' quality. Google is in the business of finding information relevant to your search.

But relevancy goes both ways: you need to search for something specific to find it.

"When will I die" will not point to anything interesting. "Pancreatic cancer symptoms" might.


It's not in any way a meaningless search, and one of the main goals of a decent search engine should be to figure out common shorthand searches like this and show meaningful results for them.

When I search "best toaster" it's shorthand for "show me roundup reviews of toasters + forums where people discuss toasters + very highly reviewed toasters on shopping sites". It's not shorthand for "show me loads of ads and SEO bullshit".


Ok, you may be right. But my point is that "best toaster" doesn't mean anything per se; do you want a fast toaster? Durable? Low energy consumption? Elegant? Large capacity?

There are many characteristics to any device and the "best" one depends on personal preferences. What's the "best car" for example? Does that even mean anything?

I don't think a generalist search engine should be used for shopping. You don't need Google to read reviews on Amazon and form your own opinion, again, depending on your own preferences.

People use Google as an oracle and then complain when that fails. But Google is a search engine. It's not the Pope.


I would imagine, if I had info on _every aspect of a human's life_, such as their email, their documents and photos, their messages, their browsing history, etc, that I could tell what _they_ meant by "best" _at that moment_. If not, what _are_ we trading all of that information for? Ads?


We pay for our searches and social media in information rather than currency. Using the information to improve the service would cost more than that information is worth.


> do you want a fast toaster? Durable? Low energy consumption? Elegant? Large capacity?

I don't know. That's why I'm asking "what makes a toaster the best?". I'm trying to learn. If I get bombarded with ads and spam the search engine has failed me.

> But my point is that "best toaster" doesn't mean anything per se;

When I type "best toaster" into a search engine it does have a specific contextual meaning. It means "please help me educate myself on what makes a toaster 'best' so I can make an informed purchase".


Of course. Most people have figured out by now that "best toaster" will only get you horrible SEOd sites written by robots. This is why "best toaster site:reddit.com/r/toasters" gives you the actual best answer.


Not defending Google, but there are hundreds of online retailers that sell toasters [and other such utility items] and it really is an opportunity they are missing by not having a page that addresses such queries whilst allowing 'others' to do it instead.


Cuisinart CPT-180 Classic Toaster


This is a meta post. A discussion rose up on HN, to the point that people at the New Yorker took notice, cited HN, and that article is posted here too.

A couple of observations since I last wrote my thoughts on this (last time I said this: https://news.ycombinator.com/item?id=30348492):

- People who work in tech do not constitute the average internet user who uses Google to search nowadays. Naturally, Google evolved to serve the lowest common denominator by default, and that would probably not be satisfying for advanced users (which opens up space for a new competitor, I think?). Half of all searches are people asking basic questions, hence they do not even result in a click. I have searched multiple times for a random calculation because Google is faster than opening a calculator on my Mac.

- Adding site:reddit.com is a feature, and it's a utility of the engine that Google is allowing us to concentrate search to a subset. The same goes for many other operators; it's just another phrase. Ideally, there could be a wrapper which does these things for us, but I doubt it would work well. That does not mean Google search is dying, but rather that we are now realizing the dependency we had on Google and our laziness when it comes to finding things on the internet.
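For what it's worth, the "wrapper" idea is simple to sketch. The helpers below (`build_query`, `search_url`, both hypothetical names) just compose a query string out of plain terms plus the quote, `site:` and `-site:` operators, then URL-encode it into a `q=` parameter. The base URL is only an example; this doesn't talk to any search API.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, exclude_sites=(), phrases=()):
    """Compose a query string from plain terms plus common search operators."""
    parts = [terms]
    parts += [f'"{p}"' for p in phrases]            # exact-phrase matches
    if site:
        parts.append(f"site:{site}")                # restrict to one site
    parts += [f"-site:{s}" for s in exclude_sites]  # drop unwanted sites
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Turn a query string into a search URL with a URL-encoded q= parameter."""
    return f"{base}?{urlencode({'q': query})}"


q = build_query("best toaster", site="reddit.com")
print(q)              # best toaster site:reddit.com
print(search_url(q))  # https://www.google.com/search?q=best+toaster+site%3Areddit.com
```

The hard part isn't the string building, it's knowing which site to pick for a given question, which is where my doubt comes from.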


Looking for an exact phrase (with "quotes") and limiting search to a specific site (maybe assisted by UI) is not difficult for the "avg internet users". It was possible to find just the most obscure and surprising information with a good search, and Google had the SEO spammers and content clowns beat for many years.

Now people use "site:reddit.com" to limit the results to a space that is moderated by humans and not completely commercialized.

The loss of Old Google might be worse for humanity than the burning of the Library of Alexandria.


what do you suggest? Bring back advanced search? Or just disable the training wheels.


I tried their "Best toaster" query and one thing I want to say is just how hilarious it is that every "best x"-of-anything webpage HAS to have $CURRENT_YEAR in the title. Yelp does something similar with its Google search results.

As if nothing good or the best in its breed ever came out from $PREVIOUS_YEAR or any previous years before that!

Tell me you're a garbage SEO website without telling me, just slap on "in $CURRENT_YEAR" or use some capital letters, I love how easy it is to tell the wheat from the chaff!

edit: someone with a more negative outlook/rant than my glee https://reddit.com/r/changemyview/comments/qowtws/cmv_seo_an...


> As if nothing good or the best in its breed ever came out from $PREVIOUS_YEAR or any previous years before that!

I know most of these are blogspam sites and I hate them too, but the way I understand the rationale behind the year, and the way I would do it too is this:

Best $PRODUCT $CURRENT_YEAR can include products that were manufactured in $PREVIOUS_YEAR or $LONG_TIME_AGO. The $CURRENT_YEAR is saying that the list is up to date, not that all the items in the list were made in $CURRENT_YEAR.


I understand the point being made, but this is solved trivially by including a published date, eg: Best $product - published $exact_date

(or alternatively, instead of "published" use "updated").

This has the added benefit that those searching for $product towards the end of a year can decide whether such lists published/updated near the beginning of the year are relevant or not.


Totally. When I’m searching for the best option for some random thing, I’ll often add 2021 or 2022 to make sure the results are recent. And, inevitably, when I need to add “Reddit” to the search query, adding the year is still helpful to avoid showing 10-year-old posts. I’m looking for recent recommendations in these types of searches, normally.


And searching for "best toaster 2020" gives plenty of sites with a year in the title, just not 2020, though...


Best toaster: https://shop.panasonic.com/kitchen-and-home/kitchen-applianc...

It is, in fact, a 20+ year old design that's had some minor updates but is still essentially the same as the one it replaced.


I just have one of the bread toasters with two slots. It was like $12 and I can’t imagine a better toaster. What you’ve linked to is a toaster oven, which isn’t what Americans tend to think of when we hear “toaster.”


I'm American and don't think I know anyone who uses that kind of toaster. Lots of variety of experience in this country.


As an American with a toaster oven, I still call it a toaster colloquially. (As do others I know with toaster ovens.)


https://www.youtube.com/watch?v=1OfxlSG6q5Y Toasters were IMO a basically solved problem.


I will say sometimes I want the date in the title. For instance, when looking up JavaScript topics, where things move so fast, or even C++, where the year of the standard can mean the difference between a language I want to use and not.


Yup, I agree with that! Helpful for the newest JavaScript libraries/ECMAScript features/best practices in $CURRENT_YEAR.

But, wow! Who knew that the landscape of toastering moves this quickly and changes so much from year to year?!


Or doing the reverse, where you need info for a very specific version, and it’s absolutely obliterated by results for newer versions.


> As if nothing good or the best in its breed ever came out from $PREVIOUS_YEAR or any previous years before that!

That older one could still be the best, but this way you (or that’s how it’d be in a perfect world) know that they at least checked if anything better came out.


Even better are the articles that were written last year or whichever period SEO abuse started skyrocketing, except slapped on with [Updated $CURRENT_YEAR]!

So fresh, so new! Except when you click, you realize the article is dated and was barely updated at all from when it was written!


Some sites have changelogs for those articles. I always want to give them a hug for that ;)


8 Best Toasters of 2022 - Good Housekeeping Published: Dec 9, 2021

Best toaster 2022: 12 toasters for every kitchen Published: Jan 12, 2022

For a laugh I tried "best toaster 2023" and there are already two worthy pages.


I mean, it’s just spam adapting to users realizing that, for a time, you could get better results by putting the year in the query.


Honestly I just don't want Google giving me a list from 2015 with entirely discontinued products...


> As if nothing good or the best in its breed ever came out from $PREVIOUS_YEAR or any previous years before that!

Try to apply this to software (e.g., "I want the December 2018 version of Chrome, it was the best version of Chrome so far.") and observe the reaction you get.


Not for minor builds, but it's trivial to find posts saying older versions of Windows/OS X/Firefox/Chrome are better. But you might have to go to a more substantive change, like Manifest V3 with Chrome.


Speaking of the "best toaster", this and related HN discussion comes to mind: https://news.ycombinator.com/item?id=29342936


So if I could filter out all search results with a year in the title, perhaps I'd get something more "authentically organic"?
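As far as I know there's no built-in operator for that, but as a rough sketch, assuming you already have the result titles as strings (getting them is out of scope here), a regex for a standalone 19xx/20xx year does the filtering. `drop_dated_titles` is a hypothetical helper; the sample titles are ones quoted elsewhere in this thread.

```python
import re

# Matches a standalone four-digit year from 1900-2099 ("2022", but not "180").
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def drop_dated_titles(titles):
    """Keep only result titles that do not contain a year."""
    return [t for t in titles if not YEAR.search(t)]


titles = [
    "8 Best Toasters of 2022 - Good Housekeeping",
    "Best toaster 2022: 12 toasters for every kitchen",
    "Cuisinart CPT-180 Classic Toaster",
]
print(drop_dated_titles(titles))  # ['Cuisinart CPT-180 Classic Toaster']
```

Of course, once enough people filtered like this, the spam would just drop the year again.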


The attitude below: "Well, duh, what did you expect, I'm sick of these HN posts" is a depressing, cynical take.

Google is the search engine, and if a business, or a thought, or a piece of information, doesn't show up there, then it's invisible to 95% of the (Western) human population that doesn't use duckduckgo or Bing or whatever failed alternatives there are.*

I do find it very amusing that I'm not the only one that uses reddit as a keyword in search results, because the results are almost spot on.

* - Yes, I use duckduckgo as my default, but use g! a lot because the results aren't superb for tech items, and it too falls prey to SEO type manipulation with dubious stuff at the top, in my experience.


"The search engine’s stated mission is to “show you sites you perhaps weren’t aware of.” Its results, based on its own custom algorithm and data gathering, prioritize text-based Web sites that lack ads, mobile support, encryption, and other features that qualify as good S.E.O. “Google punishes sites that aren’t up to speed with modern Web technologies,”"

Does Marginalia really prioritize sites that lack encryption?? The other things seem like good ideas, but don't we all want to encourage encryption?


I've never understood "encrypt everything". Your ISP still knows the domain you're going to, encrypted or not. It just can't see the specific pages/URLs if it's SSL.

But some sites just aren't... sensitive. They don't have logins, they don't have user data stored. I don't get the point.

Does it matter if zombo.com really has an SSL cert?



Kagi's interesting. On the surface, it seems like a metasearch engine. But upon closer inspection, two of its sources--its news and "non-commercial" Teclis indexes--are actually developed by the Kagi team, and can be used independently on their own dedicated sites.

I do like the ability to boost and downrank certain domains myself. Its TinyGem platform also looks like an interesting "social bookmarking" approach that's more minimal and focused than other mainstream solutions. I'm currently working on a way to POSSE[0] my public bookmarks to TinyGem.

[0]: https://indieweb.org/POSSE


I've been using Kagi for a few weeks now. Unlike when I trialled ddg, and found about 90% of my searches ended up being !g, I'm finding Kagi results to be genuinely excellent.

The "lens" functionality, which lets you narrow searches to a subset of sites, is really excellent and genuinely useful. Plus you can remove sites from results, too.

I'm really optimistic about where it's going. And yes, they have a "pay for it" model which reassures me they're thinking about the future.


> Unlike when I trialled ddg, and found about 90% of my searches ended up being !g

How long ago did you try DDG and do you have some examples of searches that didn't work on DDG but did on Google?


I filled out a survey for them, and the suggested pricing scheme seems out of whack though: $120/year is going to be a complete non-starter for mass adoption.


I agree. I really like it but $120/year is really pushing it. This is similar to what I pay for services like Spotify and Netflix, which I use far more frequently. I could stomach a $50 annual plan, for example, and I suspect they'll find this price point will result in higher revenue/profit over many more accounts when they begin price discovery.


You really use Netflix and Spotify more frequently than a search engine? I'm the other way round, and by like an order of magnitude at least. (Well, I don't use Spotify at all, but still.)


There might be a debate about the use of the word "frequency" but I average ~20 hours a week between Spotify and Netflix. My search engine time is far below one hour. Kagi would need to successfully argue that that hour is more "valuable" than the 20. For me, Kagi would need to be 2,000% better than Google to justify the price.


Not sure there's a reasonable way of measuring my "search engine time" (time actually viewing results? time reading the sites found?) but the vast majority of the time I do spend using them is work time, and the quality of results affects my ability to do my job. It's nigh on impossible to measure, of course, but I suspect even a modest improvement would easily pay for itself at that price point, in my case.


That's a good point: should we include the amount of time viewing the content found by the search engine? I'm not sure but this would certainly bump the time estimate up. It's even more complex, because if I find the right information fast, I spend less time viewing the results, but I actually have a better experience, meaning that there is an argument for an inverse relationship with the time spent on a search engine and the value derived.


Interestingly, I just don't think about it in that way.

For starters, search is just critical to me (and I think pretty much anyone who does anything around development / design / content / ...actually any modern day desk-based work...) - so in time terms, no, I probably listen to Spotify more but in touchpoint terms I can't imagine how many searches I do in a week. I would imagine it's thousands.

Two things then follow: firstly, the quality of those results is totally critical. If (as is starting to happen with Google) I'm really struggling to get past the endless ad-based SEO'd b/s to get to the site or nugget of information I need, and each of those searches is taking me longer and longer, then this really does become a genuine value / cost proposition. With a tool like Kagi I can say "just don't show me site X" or "boost Stack Overflow", and that, plus having no ads taking up screen real estate, makes it a viable value proposition.

Secondly: there's something to be said about supporting a viable non-ad-based model for services on the web. I would argue that basing everything around clicks in an advertising way has degraded basically everything about the modern internet. It's lowered content quality, drowned out real voices in favour of bots and algorithms, and caused major large-scale issues with "truth". Where organisations / companies / individuals can afford to pay X and get a bigger platform for their made-up b/s about vaccines / war / the opposition / guns / whatever, this drowns out the viable, real, truthful alternatives from people who actually know what they're talking about.

Plus, of course preaching to the converted, but the tracking model being used by Google is pretty terrifying, too.

Plus... I mean, $120 a year is about $0.33 a day... I think I can manage that OK...


I agree with your points. I just think my demand curve for search looks different to yours. Might be my income band, or my use cases, or many other factors.


I for one will pay $120/year for it without blinking. They're not aiming for mass adoption.


I don't think they're aiming for mass adoption. In fact, I think I remember them saying in their FAQs that they are "absolutely not a Google killer".


Don’t tell the SV VCs, but not every site needs to target mass adoption.


If I put in "best toasters", there's a section called listicles that it lumps all such articles into. What a great feature.


I want to add to the discussion that whenever I search for any kind of discussion topic on the internet I search on Google and add "-reddit" to it! I hate the reddit-style low-effort void anti-discussions. It's unreadable. I find every forum in the whole Internet better than reddit. The voting system turns everything into a one line joke and there are no founded opinions.


Interesting, I do the opposite and add "reddit" to most of my searches.



>> those who want to know what a “genuine real-life human being” thinks of a certain product

Are SoL, as they say on the internet, because more or less any avenue where a genuine person can leave a review can also be used by a marketer to help steer you to the 'right' choice, their product.

I'm still with Bill Hicks on this one.


Typing “Reddit” after your query will get the job done quite easily :)


Until (which is already happening) marketers start using Reddit to game their recommendations. Next it’ll be Hacker News. So basically it’s the Red Queen thing on the internet.


There's nothing wrong with the red queen thing, animals keep breeding even in the throes of it. Spam is impossible to block but also largely blocked, same story.


They have been for ages but marketers are less able to control the narrative and SEO optimise for reddit.


> Google messed up the internet because they don't allow installing other websites' searches easily. We had an opensearchdescription.xml for literally most of the websites, and somehow Browsers (including Firefox and Safari) managed to mess that up.

I’ve long thought that search should be provided by individual websites and cached/indexed by ISPs, like DNS etc.

You search for “foo”, your nearest Search Protocol Service (or called whatever) hosted on the ISP looks at all the content served by the ISP in the past which contained “foo”, then it asks those web servers for a detailed index for all occurrences of “foo” on their site.

If none found, query the next nearest search servers.
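A toy sketch of that lookup cascade, purely to make the idea concrete: every name here (`SearchNode`, `record`, `search`) is hypothetical, each node's index is just term → URLs, and "nearest" is simply the order of the `neighbors` list. It leaves out the second step (asking the origin servers for a detailed index) and does no ranking at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchNode:
    """One ISP-hosted search node (hypothetical) with its own term index."""
    name: str
    index: dict[str, set[str]] = field(default_factory=dict)   # term -> URLs
    neighbors: list[SearchNode] = field(default_factory=list)  # nearest first

    def record(self, url: str, text: str) -> None:
        """Index content this node has seen served, one term at a time."""
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(url)

    def lookup(self, term: str) -> set[str]:
        return self.index.get(term.lower(), set())


def search(start: SearchNode, term: str, max_hops: int = 3) -> tuple[str, set[str]] | None:
    """Ask the local node first, then widen to the next-nearest nodes."""
    frontier, seen = [start], {start.name}
    for _ in range(max_hops + 1):
        next_frontier = []
        for node in frontier:
            hits = node.lookup(term)
            if hits:
                return node.name, hits
            for nb in node.neighbors:
                if nb.name not in seen:
                    seen.add(nb.name)
                    next_frontier.append(nb)
        if not next_frontier:
            break
        frontier = next_frontier
    return None


local, nearby = SearchNode("isp-local"), SearchNode("isp-nearby")
local.neighbors.append(nearby)
nearby.record("https://example.org/foo", "all about foo")
print(search(local, "foo"))  # ('isp-nearby', {'https://example.org/foo'})
```

The breadth-first walk with a `seen` set keeps it from looping when nodes list each other as neighbors.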

Search could and should definitely be decentralized.


> Search could and should definitely be decentralized.

While looking things up in an index partitions fairly well, search queries are typically extremely underspecified, so what you end up with is potentially millions of results.

This is why you need a search ranking; this is not negotiable. If you don't have this, your results will be hot garbage. The crux is that ranking partitions along an axis orthogonal to document retrieval.

If you don't solve this problem, you basically end up with YaCy. It's peer-to-peer, but it's slow as molasses and the results just aren't very good.
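To illustrate the partitioning point with a toy: below, retrieval runs per shard with no coordination, but the ranking weights need counts summed over every shard. I'm using plain TF-IDF as a stand-in for real ranking signals (link-based ones like PageRank are even more global); the corpus and function names are made up for the example.

```python
import heapq
import math
from collections import Counter

# A shard maps doc_id -> list of tokens (toy, hypothetical corpus).


def shard_candidates(shard, terms):
    """Retrieval: each shard can answer on its own, no coordination needed."""
    return [doc for doc, toks in shard.items() if any(t in toks for t in terms)]


def global_idf(shards, terms):
    """Ranking input: document frequencies must be summed over all shards."""
    n_docs = sum(len(s) for s in shards)
    idf = {}
    for t in terms:
        df = sum(1 for s in shards for toks in s.values() if t in toks)
        if df:
            idf[t] = math.log(n_docs / df)  # terms in every doc get weight 0
    return idf


def search(shards, query, k=10):
    terms = query.lower().split()
    idf = global_idf(shards, terms)  # needs a view across every shard
    scored = []
    for shard in shards:
        for doc in shard_candidates(shard, terms):
            counts = Counter(shard[doc])
            score = sum(counts[t] / len(shard[doc]) * w for t, w in idf.items())
            if score > 0:
                scored.append((score, doc))
    return [doc for _, doc in heapq.nlargest(k, scored)]


shards = [
    {"a": ["best", "toaster", "toaster"], "b": ["best", "car"]},
    {"c": ["best", "toaster", "review", "site"], "d": ["cat"]},
]
print(search(shards, "best toaster"))  # ['a', 'c', 'b']
```

In a peer-to-peer setup, that `global_idf` step is the one that has no cheap local answer.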


This would be even easier to game than Google SEO.


If anyone wants a real answer for "best toaster": http://automaticbeyondbelief.org/


This toaster has several failings. Probably the biggest one is that it’s always live and will kill you if you touch the inside, even if it’s not toasting. Then there are a number of minor failings, like not having railings that clamp in, so it can’t handle thick-cut toast or crumpets, since it’s fixed to the size of a standard slice of bread.


And the technology connections video about it: https://www.youtube.com/watch?v=1OfxlSG6q5Y


That one has actually been discussed on HN a few times:

https://news.ycombinator.com/item?id=29346596


I bet it still can’t make a toaster strudel that isn’t either burned or cold in the middle.


I bet it can't too, because "toaster strudel" cannot ever work in a toaster. A toaster is specifically designed not to cook bread all the way through. The idea is to caramelise (toast) the outside of a slice of bread to the point where it's golden and crisp while still leaving the inside fluffy and bread-like. Good toast is fundamentally opposed to what a "toaster strudel" is.

To make good strudel you should heat it through consistently in a conventional oven and then grill it.


Huh TIL. Kinda defeats the purpose of calling it a toaster strudels tho, doesn’t it?



The best toaster is having no toaster. Just eat "raw" bread, it's faster and likely healthier.


Probably the most HN response to this question possible. "Well you wouldn't need x if you just did Y instead, which is just better"


"Hacker News does not represent the totality of everyone who searches," he said. "Not all of them are going to be happy if all of our results are a hundred-per-cent Reddit."

How would he know whether they are/would be happy or not? The HN contingent is the only one that actually gives meaningful feedback. It seems like Google does not want that feedback.

Something about Danny Sullivan becoming a spokesperson for Google reminds me of when Microsoft bought out Mark Russinovich.


I thought I was the only one who started adding "Reddit" to some of my searches when I'm trying to do some research and learn about products.


It feels like the New Yorker should pay royalties to Dmitri Brereton? It's a slightly better written rehash of the same points, including the mention of founders' concern about tension between useful search results and profitable ones.


I haven't used Google directly in years, but here on HN I frequently get the idea it's going downhill.

What would be good to have is an independent search engine that both works well, and which is not backed by anonymized Bing or Google results.


'Ukraine on fire'

Google gets you nowhere near a link to that film at this moment because Google is not a search engine.

DDG has it as the 3rd result so you can decide for yourself.


Ah, the New Yorker. Just about a decade late


This "Google Search sucks" discussion is officially getting boring. We get it, it sucks; we already know. It's been discussed ad nauseam. This article is particularly bad because it's a rehash of (and commentary on) content that already appeared here, with no significant new information (much like the original article it's commenting on, which was also a rehash of HN threads and comments made by people popular with the HN crowd).


Just because you're bored and in the know doesn't mean a) the problem has gotten any better or is no longer relevant, or b) that these threads haven't given others, newcomers, a place to express their grievances. The fact that it still persists and still gets upvotes shows people aren't done, and discussions of alternatives still need to be had.


I’m glad it’s not just me. I’ve done so many fairly basic searches and gotten back absolutely awful results. I imagine most of the web isn’t even accessible through it. This is an important topic that deserves even more discussion, especially since there isn’t even another solution yet.


This. The Internet needs search. I took a screenshot the first time Google showed only ads and no organic search.

It's critical to the functioning of the free market that consumers have accurate information.

The fact that Google is crap is less important. The fact that Google squashes competition and there is no unbiased search engine, is of national interest.

Current situation is because of a combination of Google's monopoly abuse and the fact that US govt does not recognise that search engines are(is) the entry point to a significant market. If you believe in the free market, then the state of Google search indicates huge inefficiency. A lot of people treat the term free market to mean "I, personally, am free to get rich, however I can, at the expense of everyone else." Which is pretty much the opposite of what it really means. It really means "you are free to join the market and compete on equal terms." Under those conditions best product/price should win.


I tried searching for kagi, and all I got was jewellery results. This is absolutely true


It was pleasantly surprising to see the article cite Hacker News as a source ("“Google is dead. Long live Google + ‘site:reddit.com’ ”—became the No. 10 most upvoted link ever on the tech-industry discussion board Hacker News."), which likely inspired the pitch to write the article.

The article still adds value as the reporter got a Google spokesperson to comment, and also brings the issue to a much wider audience outside of Hacker News.


Nothing will change until the discussion enters the mainstream and every boring HN thread on the subject helps bring that goal closer.

I switched to DDG more than a year ago and I can honestly say that I've not had to revert to google search more than a handful of times since. A regular user would probably forget about Google in a breath if only they knew that such an alternative existed.


It may be coincidence, but around the same time that the "Google Search sucks" articles and comments started becoming more frequent, I started getting pop-ups on the Google Search results page asking me to rate the results from 1 to 5 stars.


Sometimes I wonder if anyone is running a HN bot that can comment on the repetitive threads that come up on a monthly basis.


I’ve thought about creating HN Tropes, similar to TV Tropes to categorize the repetitive discussions with comments that are easily predicted. I haven’t convinced myself it’d help things, might just fuel the fire.



by the time it's boring for us technical folks, it's just beginning to penetrate in mainstream discussion


>When I recently Googled “best toaster” on my phone

When I search for an opinion...


This site is maliciously messing with Reader Mode.


No love for http://Yandex.com ? It's the only search engine I know of that actually answers the question when searching for "what countries are using ivermectin" instead of giving out a bunch of anti-ivermectin editorials.


How much longer is Yandex going to exist? Aren't they at risk of default?



