Ask HN: Let's build an HN uBlacklist to improve our Google search results?
374 points by sanketpatrikar 14 days ago | hide | past | favorite | 267 comments
For the unaware, uBlacklist [0] is a browser extension that blocks sites from appearing in the Google search results page. You can blacklist sites right from the results page, by regex, or by subscribing to lists hosted elsewhere.
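As a sketch of what a shared list might look like: a uBlacklist subscription is a plain-text file with one rule per line, either a match pattern or a /regex/ entry. The domains below are made-up examples, not real offenders:

```
# Block a domain and all of its subdomains
*://*.example-so-clone.com/*
# Regular-expression rule covering several TLDs
/^https?:\/\/example-gh-mirror\.(com|net)\//
```

Anyone subscribed to the hosted file picks up new rules automatically when the list updates.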

The low quality of results has been a problem for a while now and has become worse lately thanks to all those StackOverflow and GitHub clones. So I was wondering if we could come together and contribute to a single blacklist hosted somewhere, and then import it into each of our browsers. Who knows? We might end up improving the quality of the results we all get.

Lists to get rid of the StackOverflow and GitHub clones already exist. [1]

I would love to contribute to a project like this, but won't be able to be a maintainer due to time constraints. Would greatly appreciate it if someone could host this. A simple txt file on GitHub would do.

What do you say, HN?

[0]: https://github.com/iorate/ublacklist

[1]: https://github.com/rjaus/awesome-ublacklist

>become worse lately thanks to all those StackOverflow and Github clones

A Google search showing some of these leech-type sites:


For me, "farath.com" is outranking stackoverflow.

> farath.com was first indexed by Google more than 10 years ago

This seems pretty suspicious? Is it reporting the first time Google crawled the main domain farath.com? How is that relevant information?

This is the first time it crawled the domain at all. It's been a website since at least 2008[0], but was recently re-registered in 2020[1].

0: https://web.archive.org/web/20080607010730/http://www.farath...

1: https://who.is/whois/farath.com

That's weird. I noticed no ads on this farath.com site. Are they going to monetize the email subscriptions somehow? How are they making money off of this?

This is a great example of why "Google sucks!!11" is mainly FUD. Let's say you're looking for the SO link, which is #2 for Google. Let's compare:

Google ("code that protects users from accidentally invoking the script when they didn't intend to")

Link: https://www.google.com/search?q=%22code+that+protects+users+...

SO - #2

Bing ("code that protects users from accidentally invoking the script when they didn't intend to")

Link: https://www.bing.com/search?q=%22code+that+protects+users+fr...

SO - #2

Brave Search

Link: https://search.brave.com/search?q=%22code+that+protects+user...

SO - Not on page

You.com

Link: https://you.com/search?q=%22code%20that%20protects%20users%2...

SO - Doesn't load

DuckDuckGo

Link: https://duckduckgo.com/?q=%22code+that+protects+users+from+a...

SO - #2 (seems to depend on refresh)

Basically they're all the same. Google is faster, but the order of the results is identical.

If you did a large scale analysis in this manner I doubt Google would lose.

I'm not sure it's a good example, really. It's an "exact phrase search" with quotes, which doesn't happen much in real life.

It was helpful solely to show what some of these leech sites are.

Searching for (without quotes): What does if __name__ == "__main__": do?

Is probably a better test of which search engine has better results for the real-life query. Google might still win, but it should do a better job of screening out the spammy sites. It used to be better at this.
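For context, the idiom in that query is the standard Python guard that keeps a module's script code from running when the module is merely imported; a minimal sketch:

```python
# greet.py: the function is importable; the demo code below only runs as a script
def greet(name):
    return f"Hello, {name}!"

if __name__ == "__main__":
    # Executes only when run directly (python greet.py),
    # not when another module does `import greet`.
    print(greet("HN"))
```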

All the search engines have seen massive increases in SEO and, consequently, spam. Do you believe Google et al. aren't working on it?

It’s obviously a very difficult problem.

I believe they've become more complacent since there is no competitive pressure, yes.

I doubt it. If their results become noticeably worse than Bing (and therefore DDG) results, people will start to switch. But Bing is having the same problems with the crap sites.

So they at least have to keep their quality even with the competition for the average person (HN readers are not the average person).

That matches what I said, really. "Better than current-day Bing" is a much lower bar than they used to shoot for.

Bing has always been very comparable to Google. Especially in certain niches. For example bing was better for social in 2010:


I don't know, a lot of people have claimed this but no one has any proof.

What would proof be? Over a period of time, they have clearly de-prioritized organic search results by pushing them down the page with ads, widgets, and so on. They used to engage more directly with content producers. Matt Cutts was often called the "Head of Web Spam". When he left, they dissolved his role and spread it around to several other people.

Your claim regarding deprioritized search results for example - where’s the proof?

If what you are saying is true it should be trivial to give sample queries so we can test across all search engines and see if it’s true.

"Deprioritized" means moved down the fold of the page. Like, for example, the 3-4 ads at the top that used to not be there. And various other widgets that used to not be there. Ads in SERPS started out in the right hand sidebar only. Such that organic results were at the top. They were slowly moved down over time by a slow rollout of more ads and widgets. There are many queries now with ZERO organic results above the page fold.

None of this is in dispute, so I don't feel compelled to dredge up old screenshots or examples.

Comparing the page layout to competitors isn't especially helpful when Google has 90%+ market share in search. Google is defining the standard for others.

Why don't you give any examples of these queries? There's no point in comparing to, say, 2008 Google; there are many orders of magnitude more sites now. You can't expect old algorithms to keep up. The main thing that matters is how long it takes to find what you're looking for.

Can't say I really get your point. You're willing to complain and go on this long tirade, but not to give a single query example? The one you gave in your original post was already debunked easily enough. You're saying these things as if they are facts; I disagree with you, it is in dispute. If we're going to just take random claims as facts, then I'll say Google's results are better than ever.

Google's market share isn't really relevant. It's very easy to just use Bing or any other search engine. I actually use Bing half of the time since it offers rewards.

I didn't give any query to be debunked. I gave an example of a query to see the copycat sites. It does that just fine. Re-read it with some benefit of the doubt maybe? Your rant seems to be based on the idea that my upthread post was something that it wasn't. I did note that one copycat site was ranking above SO, but that was secondary.

As for a query with a shit ton of ads and widgets? Try vegas hotels, or anything else with lots of widgets and ads. The travel space has a lot of them.

All I’m saying is that claims should have proof

Would you please make your substantive points without degenerating into the flamewar style? You did it repeatedly in this thread, and worse each time. That's the opposite of the direction we want things to go here.


Proof of what? That google didn't used to have ads above organic results? That they added more and more over time? That some queries (vegas hotels) have no organic results above the fold? Those are all common knowledge. Burden of proof of the opposite would be on you.

I don't happen to have a historical screenshot history of SERP results specific to StackOverflow copycats, no. I'll concede that.

> Burden of proof of the opposite would be on you.

What, lol. I'm not making any claims. You're the one saying Google is becoming complacent. I'm asking for proof of this, lol.

>> I believe they've become more complacent since there is no competitive pressure, yes.

I have noticed that Google and Bing seem to present results which link to sites like stackoverflow.com where the questions and solutions are absolute FUD.

I think someone, or some entity, has been engaged in a concerted effort to manipulate the results, if it's not something more nefarious within Google's and Bing's own domain.

Very few entities have the resources to do this, either; it's not something a ragtag band of goat herders could pull off, that's for sure!

In my experience, the you.com apps and overall search results aren't affected by SEO the same way that some of the other engines are, which is why I think their results work for me.

Just tried your search in both Google and DuckDuckGo. On Google, spam copies are ~80% of the links on the first page; on DDG, maybe 40%. Not good, but much better than Google.

I tried you.com[1]. The first few results seem quite relevant. Best part is that you can actually personalize the weights to assign to your search (your very own bubble)


This isn't the same search. The parent post had quotes around the phrase. You.com returns identical copy-cat results if you do the same search.

To be fair, not sure what other results we'd expect if we're going to search for a specific, plagiarized phrase.

Edit: actually, upon review, you.com does indeed give one extra useful result within the top three. So one point to gryffindor.

>To be fair, not sure what other results we'd expect if we're going to search for a specific, plagiarized phrase.

Yeah, I posted an exact search solely so people could see examples of the copycat sites. It's not a good example of a real life query in any way, and not useful for comparisons. It is interesting that Google puts one of the copycats above SO though.

I saw you.com displays some code snippets, but the lines are too short and don't get language highlighting, which makes them harder to read. Nice try anyway.

uBlock Origin supports blocking search results, so I don't need an additional browser extension. I maintain a blocklist for myself, targeting Google and DuckDuckGo [1]. Feel free to contribute more websites or use this list as a template for your own repository.

[1] https://github.com/darekkay/config-files/blob/master/adblock...
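For reference, the same effect can be had with uBlock Origin's cosmetic filters and the `:has()` procedural operator. A rough sketch (the blocked domain is a placeholder, and the result-container selectors reflect current page markup, which changes often):

```
! Hide Google results that link to an unwanted domain
google.com##.g:has(a[href*="example-clone.com"])
! Same idea for DuckDuckGo's result articles
duckduckgo.com##article:has(a[href*="example-clone.com"])
```

These go under "My filters" in the uBlock Origin dashboard.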

On blocking w3schools: I was not sure, but I think you are right, MDN is just much better.

That's an ambitious goal; I'm not sure how that would be maintainable in the long run.

On a much smaller scale, if anyone is interested, I maintain a blacklist focused on those code-snippet content farms that get in the way when you're searching for some error message or particular function: https://github.com/jhchabran/code-search-blacklist.

May I know why my domain (cyberciti.biz) was added to that list? I created my site back in 2000, when there was no StackOverflow or anything. So much for creating original content and then getting labelled as a spammer. In fact, some of the top answers on StackOverflow were copied from my work without giving any credit to me. Some people do give credit, though. But go ahead, block a site that actual humans have maintained for 20+ years. Also check my About[1] and Twitter[2] pages. There is no scraping or spamming on my part.

[1]https://www.cyberciti.biz/tips/about-us [2]https://twitter.com/nixcraft

Interesting, I have your site on my mental blocklist as one of those scrape and rehost sites.

I'll be honest, I don't remember how I came to that conclusion but I suspect I encountered an unsatisfactory answer to a question I was looking to answer, saw the .biz and drew my conclusions.

The noise to signal ratio for most of my queries is so high that I have to start judging a book by its title, not even its cover.

I've noticed cyberciti.biz showing up in my DDG search results but I've always ignored it because of the initial captcha. I will try it now that I've seen your post here!

The .biz definitely does not help, since it hints to me that it's just another one of those worthless reposting sites, as someone else commented below.

Not OP but dot biz is associated with spam in my head for what it’s worth

Some years ago, I was at Google's office and talked with someone who works on search about the .biz extension. They said the domain extension doesn't matter. At that time, they said backlinks are one of the most vital signals, apart from some PR. That was like eight years ago. So I never changed the domain name, despite owning the .com version too. It would break too many backlinks.

Sure, Google might be fine with a .biz. But as a human consuming Google's results, my eyes typically glaze over on seeing .biz and jump to the next search result. It's not that there is anything particularly wrong with .biz, but this is the first legitimately useful site (to me; probably plenty for others) I've heard of using it.

cyberciti.biz is one of the few sites that come up in Google search results for anything code/linux related that has valuable content. I do wonder why someone would block it.

I agree. Valuable site and nixcraft adds a ton of value to the linux community. Thank you nixcraft!

Thank you for your support!

I wanted to stop by and say thanks for cyberciti.biz! I've been using it since 2001-2002 when I got my first Verio Freebsd VPS and had to figure out what was going on.

When I see your site pop up in my search results I know the content is going to be more reliable than most of the others. Thanks for the effort you've put into it.

At first scan your site looks like one of those automated scrape and republish sites. I'm curious what got you on that blacklist (misspelling? bad first impression? automated tool gone awry?) though.

Glad you said something though, I wouldn't have looked at it twice without a human attestation.

I kept it simple on purpose. As a result, it loads faster on both desktop and mobile and passes web.dev PageSpeed insight test too.

For me personally, the titles are what make it suspicious. I've almost never, NEVER found something good in an article titled "(top) xy ways to z", so I've come to immediately avoid any article with such a title.

Yep, and it doesn't break if you have javascript disabled. Good work.

That doesn't really intersect with my observation in any way though. As a stranger I don't see your intent when I see your website, I just see your website.

I'm always happy to see your site in search results, it's one I recognise and trust for CentOS/Linux related information for years. Thank you!

Thanks for the kind words!

Your comment is exactly why spam prevention is difficult. Sorry for that.

Yes, and imagine if this blocklist becomes mainstream and gets used by other major adblockers or extensions. Then there is no central place where I can ask one man's project to remove my domain, and my site will simply vanish. I often read on HN about how centralized the web is, and then we come across resources that kill independent blogs/sites because of an error on the list maintainer's part.

As a user I think I've put your website on my mental "avoid it" list for its design. I've opened a page now and I feel like I'm instantly in a tunnel vision mode. For UX: it's not a pleasure to scroll up & down; maybe there's also a psychological element about the main content area being so slim in width.

The other comment made me remember there was captcha too, right? I had been using my own rented server as a VPN for all my internet access. But I'd have never blocked it for a public list - I've read the 'about me' page.

> some of the top answers on StackOverflow were copied from my work without giving any credit to me

That's really frustrating. I'm building a faster search engine for programming queries and just added your site cyberciti.biz as a recommended and curated source of Unix/Linux material. Hope more devs get aware of your work and you (and your collaborators) receive the credits deserved. Thanks for your work of many years.

Thank you. Do ping me when your work is ready. I will share it on Twitter :)

What CDN do you use? I was immediately asked to solve a captcha from my phone.

Cloudflare sometimes triggers those when it thinks an IP's reputation is not good. It typically happens for data-centre IP ranges, as the WAF has an anti-bot feature. So I know it is a problem for some.

Maybe it's the fact that I don't use one of the three major US ISPs. Hopefully CDNs get used to the idea that there can be more than one fiber provider.

Would you mind sharing the Cloudflare ray ID displayed at the bottom of the screen when you see a captcha? I can look into it, and maybe I will be able to fix it too. Reply here or email me at webmaster@cyberciti.biz. HTH.

Not sure if you changed your Cloudflare settings, or if Cloudflare changed something, but I'm no longer getting the captcha, so that's good, but sadly I can't help debug the original issue.

It's worth a try! Also, thanks for maintaining those lists!

Well there's only one way to find out

Isn't this Google's job? Are developers a small but lucrative target and so the suits at Google don't see the benefit of improving that experience by cleaning up the spam?

Can we just nudge them to do so under the threat of an influential minority leaving due to their use case being affected?

Is there a word for this tendency to say "it's someone else's job" as a justification for doing nothing at all to help or improve one's own circumstances? I see more and more of it in public discourse over the last few years, and it kind of bothers me. I see it a lot in conversations related to poverty or climate change, but as we see here, it is by no means exclusive to those topics.

To the original replyer: you could wait for Google to do something, but if they were going to fix the listicle issue, and it were fixable on their end, they'd probably have done it by now. I'm disappointed in the situation too but if there is a workable solution on our end it would be silly to ignore it because fixing the problem is someone else's job.

To the OP: I worry that the number of domains pumping out crap might be far greater than we know, and that might hamper the effectiveness of this. If the collaborative block list ever got big enough you might also have to deal with spam. But I think it would be a great thing to try. This is one of those issues that annoys me, but it's just below my action potential threshold. My biggest objection right now is the spammy recipe websites.

I think the question was specifically about declining to improve one's own circumstances because someone else could, not declining to help someone else. That's different from the bystander effect as usually conceived. The theoretical bystander effect is something like "Someone is being attacked, I could call 911 but someone else will;" this is perhaps something like "I am being attacked, I could scream for help but other people will notice anyway" - but really more like "I can't cross the street in front of my house because the storm drains by the crosswalk are clogged, I could rake them but the city is supposed to do that."

Skip the abstract and head straight to the caption for figure 3, then go back and read the whole thing.


Fair enough, maybe Learned Helplessness is a better fit.

I feel like the iterated prisoner's dilemma is part of it too -- if you let someone off the hook repeatedly for defecting, you just enable them to continue defecting.

Or somewhat related, the "if you touch it you own it" problem.

So to overcome this learned helplessness effect, we'd need a good strategy to prevent the deterioration of duty of whoever is "supposed" to be fixing a problem, and/or a way to cut out the derelicts entirely.

I suggest this because there can only be so many websites that use SEO to game their way to the top and bury the good results beneath them.

If we manage to block them, we might be able to get a results page with good sites upfront and the other meaningless content below it. I assume Google will also surface good content along with the bad, so our blacklist might enable the good stuff to reach the top.

As for the spam problem, I'm not sure yet, but we might either be able to block enough of it to be satisfied, or it won't pose a problem for most searches that are currently giving bad results.

I like to think of it as solving the problem in the right place.

It’s often possible to work around issues in lower layers, but it's usually at least worth raising it upstream to get it fixed 'properly'.

It'll help me when I don't have a blocklist active, and it'll help new programmers who aren't familiar with the problem. It'll reward good sites with extra traffic and discourage new spammers from entering the market.

In the worst case, if Google really can't or won't address the issue, understanding the upstream problem more fully can help make a better workaround.

> I worry that the number of domains pumping out crap might be far greater than we know, and that might hamper the effectiveness of this.

I'm sure you're right about the number of spam domains, but Pareto suggests that blocking even a small percentage of them might provide a large gain.


Thanks, that's a good thing to remember. The other concern is installing uBlacklist: I'm scared to give more extensions access to google.com. I wonder if I could fork it and restrict its permissions to the SERP.
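If you did fork it, the narrowing would likely live in the extension's WebExtension manifest, limiting content-script injection to the results pages only. A hypothetical fragment (the file name and exact match patterns are illustrative, not uBlacklist's actual manifest):

```json
{
  "content_scripts": [
    {
      "matches": [
        "https://www.google.com/search*",
        "https://duckduckgo.com/*"
      ],
      "js": ["content-script.js"]
    }
  ]
}
```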

I ended up creating a repo with blacklist.txt myself and will add to it for my own usage. I don't see anyone else who'd maintain this. Feel free to use it / contribute to it.


I just added to your repo:

* code-search-blacklist, based on jhchabran's repo [1]

* pinterest.com, based on SwiftOnSecurity's interesting SEO analysis [2]

Maybe your repo could be an opinionated list of things developers find annoying about google search results, where others might value sites like pinterest in their results.

[1]: https://github.com/jhchabran/code-search-blacklist

[2]: https://twitter.com/SwiftOnSecurity/status/12588753334467174...

Internal v External Loci of Control perhaps?

Exactly this. I've seen it around on this site a lot lately.

> Is there a word for this tendency to say, "it's someone else's job"

At the risk of sounding pretentious, I call it "socialist", since they spend their lives telling others what to do, or what is good or not for the rest of us, but rarely do anything about it themselves. Surprisingly, this is the group that is really worried about poverty and climate change, and it does about as much as I do for them, with the difference that I do it by myself, the few times I do it, without requiring the rest to do it.

It is always someone else who will do it. Though the other day I had a conversation with a non-socialist person who had that same attitude ("others should do it") towards what OTHERS should do. I really dislike that attitude, no matter where it comes from.

Case in point: when I want or promote something, I am the first one to do it, whether others do it or not. The rest, no matter the ideology, is all b*llshit.

As imperfect as I am, I try to do what I think is good (and sometimes my imperfection prevents me from doing it) but I do not spend my life telling other people why they are worse than me and telling them what they should do or not. The most I have for someone is good suggestions, never requirements.

This has nothing to do with socialism. Since it is a problem under capitalism why don't we call it "capitalist". Or maybe we could just say lazy.

The irony here about socialism, for me, is that I find plenty of people who would fall under those ideals (not only those, but most of the time) who tell you what is good or bad for everyone else but seldom do what they say. That is why I said "everyone else's job" is an approximation of "socialism" in my view.

I cannot see how that would be a vice of capitalism since capitalism is basically non-intervention and freedom.

>That is why I said "everyone else's job" is an approximation of "socialism" in my view.

I don't think socialism is the word you're looking for. If you're trying to say that the described behaviour ("it's everyone else's job") is common in, or a byproduct of, a socialist system then it would make more sense (though it could be argued).

But that doesn't make it a correct "approximation" of socialism. Socialism is:

>a political and economic theory of social organization which advocates that the means of production, distribution, and exchange should be owned or regulated by the community as a whole.


I know the definition, lol. What I meant is attitudes I found under that thinking quite often. It was just a silly observation.

I did not mean that every socialist is like that, or that all people who do that are socialists.

On top of that, that is just a personal opinion. Nothing else.

I think you should read some more socialist authors. Socialists are very into doing things but would also like to have access to the resources that facilitate getting things done, rather than being forced into the position of constantly petitioning bureaucrats. But socialists who run for office on a platform of opening up that bureaucracy to the public are often denounced as undermining the foundations of society, prosperity etc.

Socialism and scarcity go hand in hand no matter the intentions. It has happened, and it still happens partially.

A centralized economy is not flexible enough to supply the changing demand or to readjust.

The Great Leap Forward was launched, for example, to overtake England (as I recall, correct me if I am wrong) in steel production. In practice, what happened is that people ran desperately for their lives and died of hunger, because they did not have the means to do it.

There are many more examples, this one perhaps being the biggest disaster, but in essence it is impossible to calculate prices and gather good-quality information in a socialist system.

Socialism is not synonymous with central planning (example of a decentralized socialist concept below). I am not an advocate of the latter, though if I were I might point out that highly centralized Amazon seems to operate profitably and efficiently.


Amazon works with private capital and under market laws.

Not with public capital and under coercion, which is how a socialist system works.

Oh wow, an uncritical hot take refers to someone they disagree with as being a "socialist"; you're so revolutionary. I wonder where you could have gotten the idea that all of your problems are the fault of the "socialists".

Oh wow, where did you get the idea that I said all my problems come from socialists?

I described an attitude I dislike and often find, like it or not, at that side of the spectrum: we have to eliminate poverty, stop climate change, and do all good things in the world. At the same time, I write from my big luxury iPhone, in my big house (as I promote equality), and, of course, tell the rich to pay the bill as if their money belonged to everyone else. Not mine, though, I am not rich... I am socialist enough to point to who should do it, not to do it.

I have never, ever seen a person with a capitalist mindset pointing to the wealth of others, or saying that everyone should have the same and be equal for the sake of being equal, and they are quite a bit more restrained about telling all the others what to do. Also, they are often much more frugal people.

> Isn't this Google's job?

Have you searched anything on Google lately? The answer is "no". Their new job seems to be to stuff your results with anything even remotely related (and sometimes related in a way that only machine learning can see) so you have things to click on.

Edit: with the lone exception of "find me this business nearby".

It's very obvious Google is no longer the equivalent of grepping the web. There's some ML/NLP interpretation layer that's rewarded for picking whichever substring or reading of the query returns the most, or highest-ranking, results.

It's very noticeable if your search contains a short keyword that has to be interpreted in the context of the other keywords. As an example, if I search for 'ARM assembly' plus another keyword (macro, syntax, etc.), it will see that 'ARM assembly' without the extra keyword has way more high-ranking results, and happily show me how much it knows about armchairs that don't require assembly, ignoring the fact that the extra keywords are there specifically to limit the search results.

It's tiring. A lot of the time I previously spent browsing the limited but valuable results it returned, I now have to spend mangling the keywords enough to outsmart their ML/NLP interpretation and get it to admit I am actually asking for the thing I am asking for, so I can finally get to the part where I have to solve the modern captcha: click all the results that are:

1. Not stolen/rehosted

2. Not a "Hello World"-level Medium blog

3. Written by an actual human

If I search for a business name, it's normally not the first result any more. The first result is usually an advert for another company in the same space that I don't want to use (basically an offensive/scamming result from the user's perspective), and then also the standard "buy <search term> on Amazon".

I think you're wildly overestimating the influence of the minority within HN (or other similar communities) that actually care enough to switch to another search engine.

This reminds me of the Linux gamers who claim that they can influence game development companies by purchasing games with Linux ports, but wind up being less than 0.5% of sales of most games with Linux ports, which leads manufacturers to ignore that customer base almost completely.

Not disagreeing with you... big companies truly mostly ignore Linux, but there are more than a few indie devs who support Linux as a platform. And I tend to play only indie games these days anyway, because all the big commercial games have been reduced to some kind of click-and-succeed or free-to-play-and-milk-some-whales crap.

I personally am kinda happy where Linux gaming has come to be. Sure it could always be better but I remember times where there were only like 3 games for Linux and you had to compile them yourself..

Game companies might have ignored us, but in the end it created a space for Valve and CodeWeavers to fill.

> cleaning up the spam

I'll be happy to be proven wrong, but I think Google is now fully in the 'optimize for engagement' camp. If that's what they're doing, it's by definition not spam (from their point of view) if people are clicking on it more than the non-spam results.

Again, only my guess as to what's going on. I don't see another good explanation for them only serving cloned Stackoverflow and top X lists for basically everything now.

From a user point of view, a search engine’s job is to link you away from the search engine, so how does a search engine measure engagement? Is it time on page or maybe number of searches performed? When you don’t have viable competitors both of those are improved by worse search results. Even number of ads clicked would be improved with worse search results because ads don’t have as much competition for your attention when there are no relevant results on the page.

> From a user point of view, a search engine’s job is to link you away from the search engine

That hasn't been true for a long while.

Users are there for an answer to their query, not to be directed anywhere. Frequently this comes in the form of a search-vertical response that doesn't lead the user anywhere.

From Google's point of view, you want the user where you can show them ads, full stop. Google does not exist to provide you with answers to your queries, it exists to make money for shareholders.

> When you don’t have viable competitors both of those are improved by worse search results.

I don't think "worse search results" is the way to think of it. From G's point of view they're better because they make them more money.

It is Google's job, but they either aren't doing it or are failing at it. We could do something about it at least until a better alternative or a solution appears.

> Can we just nudge them to do so under the threat of an influential minority leaving due to their use case being affected?

Many influential people have tried and nothing seems to have transpired from it.

Google.com is the most popular website. I don't think the leaving of any minority group we manage to create would even matter to Google, let alone force them to fix the issue. Not that I discourage using alternatives.

Can we just nudge them to do so under the threat of an influential minority leaving

No. This is a classic mistake of intellectual types, who are impressed by each others' cogent arguments. But there is a much wider pool of people who are not, and among whom the intellectual types actually have very little influence, due to being boring and hard to understand (plus, it has to be said, kind of snobbish about how smart they are).

Now, you might reason that Google is full of smart people who should care about cogent arguments. But that assumes as an unspoken premise that Google's internal goal is to maximize the quality of the service and profit from being The Best. They passed that goal years ago and are now so awash in money that it's cheaper to just squash competition than to innovate. They can be moved by threats to advertising revenue got up by angry crowds on social media (a market where they have little direct power), but Google would probably be delighted if grumpy nerds wandered off somewhere else. If they need talent or access to some compelling technology they can just throw a pile of cash at the problem.

>” Can we just nudge them to do so under the threat of an influential minority leaving due to their use case being affected?”

I sense Google is too big to cater to us like this. Despite a steady decline in quality, Google is still the dominant search engine and the competition isn’t even close to its market share. Not only would they not notice many of “us” leaving, the amount of change they would have to implement in order to satisfy our desires would end up changing the product for the rest of the market. On some level, the product managers must be satisfied with the metrics as they stand since Google is continuing with their current course.

>Isn't this Google's job?

Or more fundamentally perhaps this is just the system working as Google intended?

Google's goal isn't to create the best possible search engine, it's to have a search engine that is good enough that people won't actively seek an alternative at the same time that they put as much ad content as possible in there, again before it's so much that people seek an alternative.

I doubt many advertisers like the status quo very much either. They basically have to pay for ad placement to ensure the first results for their product aren't ads for competing products. On mobile when I search for Boox the first result linking to them is an ad. Same for Kobo. In other instances I'll search for company or product and a competitor ad is the first to show. So vendors get stuck paying for ads when their own site should probably be the first organic result, above the ads.

Are developers a small but lucrative target and so the suits at Google don't see the benefit of improving that experience by cleaning up the spam?

Google doesn't make money from people finding what they're searching for. Google makes money by keeping people searching.

Instead of spending energy to change Google, why not just leave them for good?

Start by changing your default search engine to DuckDuckGo or something else, install uBlock Origin and Privacy Badger to disable tracking, and gradually reduce your use of every Google service and application, starting with Chrome.

Be the change you want to see.

I relate to this opinion. There are two reasons why my suggestion might still be useful:

1. DuckDuckGo too is affected by these SEO-gaming sites, so maintaining a blacklist will help us make that experience better too.

2. There are times when only Google can find us what we're looking for, so this will prove useful when we go back to it.

No, google wants more clicks so they would prefer poor results that keep users searching

I think the disconnect comes from people expecting perfect search results as curated by humans, whereas Google necessarily must optimize for automated results. Automated results will never be perfect.

Are you paying them to do it?

Worst case scenario if Google drops the ball I just go back to the library.

I'm sure they have great books on stackoverflow answers, reddit reviews of products and opening times of local stores.

Luckily, stackoverflow, reddit, and your favorite mapping thing have their own searches.

Hey it might take me 30 years to find the answer but at least my eyes will thank me!

Google has a fiduciary responsibility to shareholders, which is so much work as it is! Why are you trying to ask them to do more?

The problem with this is illustrated in another comment where nixcraft's site, cyberciti.biz, was added to a personal block list. The content on the site does seem to be original and productive. I'd guess it was added based on the criteria of "I haven't heard of this site and the domain looks suspicious". I have a feeling that this will be true for other domains on this proposed master list. And the owners of those domains will have no recourse.

Specifically blocking github clones seems doable. Adding anything else needs equally specific criteria or it will quickly become subjective and unfair.

I wonder why Apple is not starting its own search engine. I mean yes, they get >$1Bn per year for making Google the default on iOS+macOS, but they have plenty of cash so they wouldn't need it. They would immediately get ~10% market share at launch, just because it would be made the default on their devices. From there they just need to present better search results than Google (which shouldn't be that hard right now) and can only grow further.

As another commenter here said "Google does not make money by helping you find what you are searching, it makes money by keeping you searching". That only works when there is no competition. But once Apple would be in the game, people would use what presents them with the better results. Right now, I don't feel there is real competition.

Apple is allegedly paid lots of money to not do this: https://www.macrumors.com/2021/08/27/google-could-pay-apple-...

Wow, $20Bn per year. This smells a lot like anti-competitive behavior, I wonder what happened to that lawsuit.

It's legally not anti-competitive behavior (as in, both Google and Apple's lawyers believe so) because it's just a 'default search engine' fee - everyone knows that iOS/Safari is the most lucrative platform to be the default search provider for, so a large number like $20 billion is to-be-expected. I'm sure that, in a lawsuit, Apple would argue "if someone else came and paid us $21 billion, we'd take it and drop Google/start a yearly auction" but no other search engine has that budget.

Apple is surely free to make their own search engine, and to an extent, they do - in Safari, the "suggested sites" feature is a search engine but one that only returns {single result | no result} and only works on iOS/Safari. On that same note, you can prove this by tracking search engine user agents for your website to look for Applebot/0.1 hits (if your site is popular enough).

Apple already has its own search engine. The crawler is known as AppleBot and the results power Siri search suggestions.

It's limited to popular queries, so for many searches you may get 'no results, search the web (google)'.

I made a somewhat buggy web front end for Siri search so I could better play around with the results https://luke.lol/search/

But what would be their incentive to do so? Normally they launch products and make it exclusive to their devices so more people will buy iPhones, but that is difficult to do with a search engine. Otherwise they would have to get into the ad business like Google.

Apple is a publicly traded company, and every company needs to grow into new markets to make more revenue. And they also maintain their own browser Safari, even though on macOS they could just withdraw from market and leave the field to Chrome and Firefox. Even amongst macOS users Safari usage is very low and doesn't make Apple any money.

On the other hand you can see how Google is using its dominance in Search to push its browser and mobile OS - once you login to Google in Chrome on your phone, suddenly they can track you when you use their mobile Apps etc. And Apple is trying hard to grow in the "Services" field, i.e. through Apple Music and Apple TV - both available to Windows and Android users too. Just as they made a buttload of money with iTunes and the iPod because they also targeted Windows users.

Running a search engine is a massive money sink, regardless of its popularity. It's the surrounding ad network which makes money. Competing with Google and Facebook in that regard is an impossible battle, and something Apple has already failed at a couple times now. They have since pivoted into creating a privacy friendly image, so emulating Google simply does not make sense for them.

This is just my own personal preference, but I manage my own list of what is blocked or allowed on my systems. I would be concerned that a group contributed list for this category of blocking could quickly devolve into a group-think censorship dominated by whomever is the most devoted to blocking and extending echo bubbles to peoples browsers.

Seeing "how other people configure their tools" can be interesting. I love seeing how people configure their .bashrc with custom commands.

I don't think I'd want to download a list of the most blocked sites and plug it into one of my tools though, for some of the reasons you mentioned.

That, or it would be gamed by the SEO folks like they do every other thing that was once good.

This looks like a big, time-consuming project that would rely on a private Google API that can change any time. I think it's not worth investing your effort in that. I wish more people would help improve the FLOSS, peer-to-peer search engine YaCy instead, https://yacy.net.

This improves other search engines as well, not just the Google universe. I'm sure even an open-source, peer-to-peer search engine will have similar issues with content-farm material and gamed pages if it becomes large enough to compare with search engines like DuckDuckGo.

On the other hand, it is absolutely ridiculous to conflate the difficulty of occasionally adding a domain to a local filter with helping to build a random unproven search engine. People volunteer their development effort for projects they personally find interesting or challenging. If you want more developers, advocate for the project; don't try to scold people for wanting to spend a small amount of their time refining a solution that works for them.

I’m not sure why you think that a domain blocklist would be harder than custom search engine development.

Plus there’s no private Google API here, just an extension that removes search results from the page. I suppose you could say the extension APIs are from Google (Chromium) but they’re certainly not private and are commonly used.

Doesn't this extension depend on how exactly the ads are presented on the page? Can't this be changed by Google easily?

> I’m not sure why you think that a domain blocklist would be harder than custom search engine development.

I didn't say this. The custom search engine is already created; helping its development is much easier now. AFAIK its main problem is the lack of hosted servers.

I'm not sure about the specifics of how uBlacklist works, but these[0] uBlock Origin filters seem to be very basic and invariant to small changes. Google could certainly make them harder to remove, but given that they don't put up much of a fuss (to my knowledge) over their normal ads (YouTube not included), I don't see why this would be otherwise.

Fair enough about a custom search engine though, I misinterpreted your comment.

[0]: https://github.com/stroobants-dev/ublock-origin-shitty-copie...
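For illustration, entries in a list like that follow uBlock Origin's standard filter syntax. A rough sketch of what such rules look like (the domain is a placeholder, and the `.g` result-container class is an assumption about Google's current markup, which changes over time):

```
! Hypothetical filters in the style of the linked list.
! Hide any Google result block whose link points at a blocked domain:
google.com,google.co.uk##.g:has(a[href*="example-so-clone.tld"])
! Network-level block as a fallback, in case you click through anyway:
||example-so-clone.tld^
```

The `:has()` selector here relies on uBlock Origin's procedural cosmetic filtering, which is why such rules tend to survive small changes to the results page as long as the container class stays the same.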

See comments in this thread for a number of lists in progress.


How about support better search engines instead?

It's usually better to support efforts that have a greater chance of succeeding, than supporting the ideal solution that is unlikely to happen.

...assuming "support" is a substantive action and thus a finite resource, and not just an upvote on a social media page.

Which are better in this regard?

Kagi imo, will be paid though.

And waitlist atm (which I'm on). Having to pay for a search engine would be a feature not a bug, IMO.

It is based on Google and Bing, so you will get the same spam results

Their selling point is that you won't. You will get a lot fewer results with it too. Liking it so far; can't say I've run into spam with it.

Why not use another search engine, such as Kagi, which has built-in support for this? At least for the programming niche, Kagi has worked really great for me for a month now.

Kagi is still in beta as far as I know. I sure hope someone will fill this niche eventually.

Is their plan to have the login screen on landing or is it just a beta thing?

If I have to log in to an account only to search, that'd be a no-go for me.

Indeed, I'm very happy with kagi so far

Sounds like we'd just need an HN uBlacklist subscription. Sourcing and validating submissions to the blacklist is the problematic bit. Perhaps use HN as an OAuth provider (not currently an available feature), use rules based on account age and karma for allowing or scoring submissions, voting system like HN ... sounds like something that might actually do better hosted on HN.
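As a sketch of the karma/age gating idea: the field names below mirror the user objects in HN's public API, but the thresholds and the weighting policy are entirely made up for illustration.

```python
import time

# Hypothetical thresholds; real values would need tuning.
MIN_KARMA = 500
MIN_ACCOUNT_AGE_DAYS = 365

def submission_weight(user, now=None):
    """Return a voting weight for a blocklist submission (0.0 = reject)."""
    now = time.time() if now is None else now
    age_days = (now - user["created"]) / 86400
    if user["karma"] < MIN_KARMA or age_days < MIN_ACCOUNT_AGE_DAYS:
        return 0.0
    # Long-standing, high-karma accounts get a modestly larger say,
    # capped so no single account can dominate the list.
    return min(3.0, 1.0 + user["karma"] / 10_000)

# Example accounts (shaped like HN API user objects):
veteran = {"created": time.time() - 3 * 365 * 86400, "karma": 2500}
newbie = {"created": time.time() - 30 * 86400, "karma": 10_000}
```

A real implementation would fetch these fields after the OAuth step; the point is only that the gating logic itself is trivial once identity is solved.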

There is also https://www.mojeek.com/, I haven't tested it out in a while, so perhaps it has become better, but they should be striving to make it what Google used to be.

This was discussed around a month ago, leading me to this post:


and the consequent uBlock Origin list that is what I'm using as the so far better solution for this problem:


but it will need curation and updates over time, which I'm not sure the author is willing or has the time to do.

Off topic: please note that uBlacklist has nothing to do with uBlock Origin. It just uses uBlock's “brand”. Also, uBlock Origin has a “blacklist” feature of its own, so you don't need uBlacklist if you use uBlock Origin.

Or just use search.brave.com

No problems there.

Comparison [0]

[0] https://brave.com/search/

Huzzah, that extension supports other browser engines as well. It's not nearly as atrocious an issue on DuckDuckGo but there are still some of those re-post heavy sites that aggressively get through, as well as some low quality content farms. It's nice to have a tool available to do local/personal fine grained refinement.

Here are the low-quality sites, if anyone is curious... I'm sure you'll recognise them


Nah. Blocking isn’t the answer. What we need is a better search index.

I find it difficult to believe that relatively beginner NLP projects get posted here all the time, yet no one has adapted that stuff to create a new search index.

Personally I don’t know enough to really do this well, but I can tell just blocking sites from Google’s results isn’t the way.

I am keenly interested in this idea.

I sense that in the near future the paradigm of search engines will go from the current “index everything and become a universal answer engine” to “index a small subset of the Internet and become an answer engine honed towards a specific topic/domain”.

Shouldn't we make a plugin for SearX that learns from the results you click, so that the customization and machine learning is on the client side? That way search becomes a commodity, but the final selection algorithm's behavior is owned by the user.

That'd be pretty sweet, esp if it also could access my personal stuff (notion, bookmarks, etc to search those and put them higher...in context).
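A minimal sketch of the client-side learning idea above: count clicks per domain locally, then stable-sort incoming results by that preference. All names here are invented for illustration; a real SearX plugin would hook into its result pipeline instead.

```python
from collections import Counter
from urllib.parse import urlparse

class ClickReranker:
    def __init__(self):
        # domain -> click count, kept entirely on the client
        self.clicks = Counter()

    def record_click(self, url):
        self.clicks[urlparse(url).netloc] += 1

    def rerank(self, results):
        """Stable sort: preferred domains float up, ties keep engine order."""
        return sorted(results,
                      key=lambda url: -self.clicks[urlparse(url).netloc])

r = ClickReranker()
r.record_click("https://stackoverflow.com/q/1")
r.record_click("https://stackoverflow.com/q/2")
results = ["https://sospam.example/copy", "https://stackoverflow.com/q/3"]
```

Because the sort is stable, the engine's own ranking is preserved among domains you have never clicked, so the personalization only reorders, never hides.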

Why are we spending the effort to fix Google?

We make $0 doing this, they make...astronomical profits screwing it up. So we invest a bunch of time so they can continue to take in astronomical amounts of money while abandoning "don't be evil"?

Absolutely not.

Not sure if this is helpful, but here you go: https://someonewhocares.org/hosts/

Could anyone tell me why we don't add these domains to adblock filter?

Wouldn’t it be easier to simply invert the problem and come up with a whitelist instead? When searching for technical info there are only a handful of sites I use, namely stackoverflow and wikipedia.
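Assuming uBlacklist applies `/.../` lines as regular expressions against result URLs (as its documentation describes) and treats `#` lines as comments, a whitelist can be approximated with a single negative-lookahead rule. The domain list here is just an example:

```
# Block every result whose host is NOT one of the allowed sites.
/^https?:\/\/(?!([^\/]*\.)?(stackoverflow\.com|wikipedia\.org|github\.com)\/)/
```

The drawback is the inverse of the blocklist problem: every useful site you haven't thought of yet disappears from your results.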

I'd love to contribute. If my small contributions collectively have the potential to save many hours of others' time, it may end up the most impactful thing I do this year.

How about we just let the free market decide and just go with another search engine instead of trying to fix the perception of a broken product which we don't own?

Odds are some of things some HN users want blocked are startups started by other HN users. What if w3schools didn't yet exist but applied to YC tomorrow?

I just use duckduckgo with fallback bangs when the results aren't satisfactory to me, usually falling back to Bing.

Google doesn't deserve my time or my eyes.

A reply to those in this thread saying that Google should/will take care of this:

Imagine you had a position in a huge market that was as close to unassailable as there has ever been. Imagine also that you have a controlling position over the mechanisms that allow people to participate in that market.

Now try to make a case against optimizing for squeezing every last cent out at the cost of the user experience.

In 10 years we will regard Google the way we regard cable companies today. Maybe even worse since we need to be able to search for answers more than we ever needed cable TV.

The good news is that competition is much easier in the search market than in the cable TV market (since it's hard to run a new cable into everyone's house).

If comparatively tiny operations like DuckDuckGo or Brave can launch decent search engines, I think there's hope of reducing Google's dominant position in the market.

This could be really helpful for us at Mojeek, and, we think, generally helpful for all search engines, done as a list rather than as an extension.

Google's existing blacklist is half the reason (for me) that the results quality has declined so much. So this is not a great solution.

Blocking bad sites is just one side of the coin. You should also be able to promote sites that you are more interested in.

This is the goal of Entfer (Show HN thread: https://news.ycombinator.com/item?id=29799867)

Entfer will in the future also allow you to bulk export and import your personal rankings, so that they can be shared on GitHub, for example.

uBlacklist? You have that much of a problem getting relevant search results? It seems like bad search technique and paranoia, and crazy to have to maintain a blacklist for a few domains on specific searches. You can "search around" the bad domains with higher-quality search phrases.

Does HN block Google completely? I can't recall any search bringing it up, even with the wealth of info.

Ironically, when testing an example Google search from this thread, this thread came up. So HN doesn't block Google completely, but maybe has poor SEO so it rarely shows.

4th result is HN for me https://www.google.com/search?q=%22code+that+protects+users+...

My method of improving search results is to not use Google.

Shameless plug: I run okeano.com, a privacy friendly search engine. We support natively blocklists [0].

[0] https://okeano.com/blocklist

If only distributed search engine was possible...

How about a search engine that does this already?

Open source software FTW

I think we are attacking the wrong angle here. This should be solved at google

Google has precious little incentive to filter out ad-ridden spam content, given that their entire business model is the very ads these sites are plastered in.

In short, if you find what you are looking for, they get few ad impressions and make less money. Meanwhile, if you have to click through half a dozen spam results first, there are many ad impressions, and they make more money.

Are you sure it hasn't been (from their point of view)? I've got that old familiar feeling that there has been a change internally about what now constitutes a "quality" search result. Google has been moving toward optimizing for engagement over what we normally think of as relevance for a while.

I imagine that Google's incentive is to first prioritize results that have the most profitable Google Ads on their page, and then for quality of result, not the other way around.

Until that incentive structure changes, Google will not be interested in solving this.

I coincidentally tweeted a reply[0] to your comment a couple minutes before seeing it:

> If I end up having to solve a problem that's someone else's fault, yeah, sure, it's unfair.

> But if I end up having to live with a problem because you insist that we wait for the responsible party to fix it, knowing that they never will, that's more unfair.

[0] https://twitter.com/lkbm/status/1478425875578302464

How long are you willing to wait?

Google is a lost cause by now. The only solution is a competitor. DDG is just as bad since they source results from google so that is not the solution.

DDG don't source any results from Google. They rely on Bing and increasingly their own bot.

Their own bot is only used to crawl stuff for their widgets.

> and increasingly their own bot

Except they so obviously don't. They lie through their teeth and they're good at marketing, that's it. They've zero technical ability.

ddg sources from bing

I'm afraid this is potentially dangerously political.

Are you only going to filter obvious spam and sites that republish other’s content, or are you going to block sites that are “harmful” or disseminate “disinformation”?

Who will get to decide which media bubble I’m in?

> I'm afraid this is potentially dangerously political.

It only took 7 days for that other search engine project on HN last week (Mwmbl) to add hard-coded weights for certain news websites, so it does show how careful you have to be with this stuff.


What's the issue with that? Doesn't Google do this too, based on ssl availability for instance?

Well Google testified against having hardcoded weights for news websites in congress.

And the issue is because it creates filter bubbles and introduces bias into how information is discovered.

This is a browser extension that you have to explicitly install and configure. You get to decide which media bubble you are in (setting aside the fact that google may or may not curate their search results.)

That's hilarious.

"dangerously political", as if search ranking is not already political.

> Who will get to decide which media bubble I’m in?

Whoever you rely on for your consumption needs. Your search engines, content aggregators, and news sites already do. None of us have much agency online and pretending like we are operating objectively and rationally is delusional.

I don't pretend. I know what a cesspool of petty control the internet is. I just don't want it to get worse.

It can be made configurable: an option for GitHub clones, another for SO clones, and so on. The folks who want to filter "harmful" content can add their own category.

Who sets the default 90% of people won’t touch?

Someone, give me the late 90’s cesspool of un-curated internet back! Everything is so… trite now.

TL;DR: I strongly suspect that relatively small, personally-curated lists will be much more appropriate and highly effective. These might be augmented with specific classifications, but probably not on a widespread basis.

Though the proposed solution borrows heavily from concepts long used against email and Usenet spam, there are a few critical distinctions in SEO SERP[1] spam which make a widely-crowdsourced listing both less applicable and less necessary.

In the case of email, your inbox is an unlimited resource to the spammers --- there's effectively no limit to how much spam they can throw at it. As there is also an effectively limitless set of source addresses (by either domain name or IPv6 address), and because email/Usenet spam is itself a quantity/numbers game with rapidly shifting origins, collectively sourced and curated blocklists have value.[2]

A SERP is itself a finite resource --- the default is to display 10 results, and not making it into the top ten provides little reward. Moreover, high ranking search takes some effort and time to achieve, it's not like in email where a new server can spin up and immediately start deluging targets.

My experience with annoyances matching this sort (stream-based social media is one example) is that blocking a relatively small number of high-profile annoyances hugely improves signal/noise ratios. And I think that will be the case with SERPs as well. There are a half-dozen or so sites which tend to dominate results in most cases, and those can be individually blocklisted (if the capability exists). If more appear, they can similarly be removed.

The other factor is that quite a few sites which some people find exceedingly annoying and spammish, others find appealing. Coming to agreement on what to block, and classifications of such domains / sites, is likely to be difficult and/or contentious. There may be exceptions in specific instances (hence: specific classifications of unwanted results), but less so in the general case.

I might be wrong. The case of DNS adblocking, with PiHole as the classic example, shows that very large lists can be compiled and used. My own web adblock / malware-block configurations have typically had from ~10k to ~100k entries. That said, the really heavy lifting is typically done by a much smaller fraction of the total. Power laws and Zipf functions work to your advantage here.



1. Search engine results page, that is, what you see in response to a query.

2. Even in the case of email spam, the principal value is largely from curated lists, usually by experts, e.g., Spamhaus.
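The power-law point above can be sanity-checked with a toy calculation. The Zipf distribution here is an assumption about how spam impressions are spread across domains, not measured data:

```python
# If spam-result "impressions" across N domains follow a Zipf
# distribution (frequency of rank k proportional to 1/k^s), what
# share of the junk does blocking only the top_k domains remove?
def zipf_coverage(top_k, n_domains, s=1.0):
    weights = [1 / k**s for k in range(1, n_domains + 1)]
    return sum(weights[:top_k]) / sum(weights)

coverage = zipf_coverage(10, 10_000)  # blocking 10 of 10,000 domains
```

Under these assumptions, blocking just 10 of 10,000 domains removes on the order of 30% of impressions, which is consistent with the claim that a small personal list does most of the heavy lifting.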

In that case, we'd still need a repo bringing together all these individual lists. I couldn't find anything like this.

I suggested a single list to prevent repetition and to limit the imports one needs to make to one.

So, my larger point is that no, that repo doesn't seem to be called for.

For malware and the like, repurposing extant DNS-based blocklists, as is done for uBlock Origin / uMatrix, should be viable and not require an additional curation effort.

Note also that we're looking at a browser extension, and as such, very large lists and memory load would probably carry significant negative impacts.

I've been building something similar to this with https://fantastic.link. Would love to get your feedback!

I think empowering individuals to curate the web would create stronger social and financial incentives to improve online indexing (e.g., Shopify vs. Amazon). 20 years ago we could approximate quality from backlinks from credible sites; in the age of social media it seems this signal has shifted towards what creators, influencers, and online experts endorse.

I'm not seeing the relationship here.


Use a search engine that doesn't fuck up your results instead of trying to unfuck the results it gives you. Why go through this much effort to still give your money to google?

Any recommendation? I use ddg but it's Bing based and the results are no better.

My experience with ddg has been very positive over the years. Although I think my eyes have gotten good at skipping over the noise in the search results. It might be nice to offload that effort to software.

DDG has no idea of the intent of your search based on your previous searches and unfortunately this generates abysmal results for me when searching for something like `ruby xxx`. Those terms rarely return results about the programming language. I unfortunately end up using Google in these situations, always to immediate success.

One of the biggest reasons to switch away from Google is to shed that "previous search" bias, and refine + refilter results till you hone in on what you want.

Leaving it to "the algorithm" to decide what to show you based on what it "thinks you want" from previous interactions turns the tool into no better than Twitter or Facebook optimizing engagement by showing you what they think you should see.

It's safe to say that most people searching for 'ruby <term>' have absolutely no idea what the ruby programming language is, and didn't mean to find results for it. If you're running a non-personalized search engine like ddg, which results do you show?

For reference, Ruby only has 6% in the Stack Overflow dev survey https://insights.stackoverflow.com/survey/2021#most-popular-...

I really like that it doesn't "know" me. I've come to appreciate starting from a well-known shared state, often quite empty, and then ranging out from there in my own unique way. I don't want my view of the world to change; I want my behavior to change, which ultimately yields a view of the world that is under my indirect control.

>ruby xxx

Not sure if the X's were to denote wildcards or if you're after porn actresses named Ruby, but either way it might be worth adding a term to the query to clarify

That's literally the conversation though. Google has enough context to know that you want programming language results, not porn or gemstones.

Oof - muscle memory failure....

I'll take having to write verbose and specific searches over google guessing what I really want any day.

Google seems to have changed to using NLP + "what would the average person potentially mean by this", meaning anything actually specific is nearly impossible to find. I'll have half my search terms completely ignored

Pretty sure you’re using “xxx” as “some generic search term” but if you include literal xxx as a search term, you’re almost definitely going to get some skewed search results.

xxx ** ??? same thing, it's obvious by context

As I indicated in my comment, I was “pretty sure” from context, but fowkswe enclosed their whole search term in backticks. I’m not familiar enough with ruby to know if there’s some valid package called “xxx.” Based on fowkswes response to another comment, it’s clear they meant “some variable search term” and not literal “xxx.” Out of curiosity, I tried a few $language xxx searches on Google & DDG. I get mostly porn results for Rust, Ruby, Julia, Bash, Java in both engines with xxx appended. Interestingly, on Google, Python xxx returns mostly programming results.

I've been using Kagi over the last few days and have actually been pleasantly surprised with it performing better than both Google and DDG for my use cases. It's still free during the beta so might be worth giving it a shot.

I’ve also been using https://kagi.com for the past couple of days and have found the search results to be superior to both Google and DuckDuckGo. I haven’t come across any SEO spam and the results seem to be of better quality overall. They have all of the same bangs DDG uses and get results from Bing, Google, and a bespoke index. Because they query Bing and Google, it will be pretty expensive when it launches (it's in private beta) since it costs $12 per 1000 queries, but since search is so important, it will probably be worth it. They say they're privacy focused too, but it's hard to be sure when a product isn't open source.

Kagi isn't trying to take significant market share from Google or DDG, but find a niche (probably Hacker News type people). The one thing I find they're lacking is maps and location based results (e.g. you can't search for "lunch near me"). If you're interested, I'd recommend checking out their FAQ: https://kagi.com/faq

Hmm, annoyingly some bloody invite based thing instead of just being able to try the damn thing. What is this, 2004?

Just tried it, and it seems we have an underdog here that could be useful for avoiding spammy results. I can totally see why they want this to be invite-only, as they probably need to figure out how to scale well. If it went viral suddenly, they might collapse and the whole effort would backfire.

Edit: This is really promising and interesting. They show a summary of the link when you hover over the crystal ball icon next to it, together with a button for "Block" and "Boost" - i.e. you can vote with your account. These metrics could be used for ranking and kill affiliate marketers.

I'm on the waitlist. AIUI they intend this to be a paid service eventually, which is highly desirable to me. If it can be made to work, I would very much like a search service that is directly paid for by its users, rather than funded via side channels such as advertising or paid results placement that create skewed incentives.

If they're not ready to charge for use yet, a limited preview makes more sense than a public free service.

Signed up; I will gladly pay for better results. Google has sold its soul long ago to the capitalism gods. DDG and Brave are no better.

They also have a browser that’s still in beta. Here’s the direct link to test flight: https://testflight.apple.com/join/DeC8ZDnu (got this after a month on the waitlist)

Could you kindly provide the Mac browser link as well? This link seems to be valid only for iOS and iPadOS apps.

Apparently their invite, once you receive it, is free to share with anyone. Enjoy:


Might be a limit on the number of users - this link now gets an "expired" message after attempting to sign up. Anybody else have one I can try?

It is in closed, invite-only beta.

Signed up for the beta, but their signup page has one annoyance: clicking back doesn't take you back one page in the signup process; it ejects you to the top, though your answers are saved. Fine if it's an honest mistake, weird if it's an aesthetic choice, and not a good idea if it's an indicator of how they plan to be different. It would be much better to prove how smart they are by fixing Google's search than by "fixing" well-established UI patterns.

How long did you have to wait for the invite?

2 days in my case

Thanks for sharing this link! I described my odyssey of finding craft printers in this thread [0], and just repeated the search on kagi.com. The results are better IMHO, even though the first hit is a link to some kind of Minecraft-clone game that lets you craft printers. The second link at least directs to Canon's craft landing page, and there are far fewer spam sites and more manufacturer links within the first 20 results. The sites that are spam are at least a bit more disguised/offer more content than usual, so I will let Kagi get away with that.

[0]: https://news.ycombinator.com/item?id=29772136

Worth mentioning that they actually have the domain-blacklist feature that Google removed years ago.

I have been using Brave Search for the last half year; results seem to be better than DDG's, and you can add !g to your searches, like in DDG, for Google results.

You.com has actually been really good for me.

After trying a few, I currently test Brave Search. It misses DDG's bangs, but results seem to be better.

DuckDuckGo is not Bing based.

It does in fact use the Bing backend for the majority of its web searches:

> DuckDuckGo gets its results from over four hundred sources. These include hundreds of vertical sources delivering niche Instant Answers, DuckDuckBot (our crawler) and crowd-sourced sites (like Wikipedia, stored in our answer indexes). We also of course have more traditional links in the search results, which we also source from multiple partners, though most commonly from Bing (and none from Google).


They most commonly take those traditional links from Bing, but that is only one of their additional sources. They use the results from there, sure, but by no means are they "Bing based".

Their intention is to confuse you, so they're successful on that front.

Their "hundreds of sources" are for very niche topics and appear in separate boxes. The 10 links per page come from Bing which would make them Bing-based.

I don't like Brave Browser at all with its integrated crypto ads, but I've been using Brave Search for a couple of days and it seems nice, much better than DDG:


I use ecosia.org, which is like DDG, but part of the proceeds goes to charity. Not affiliated with them, just like it a lot.

Like DDG, it's just Bing with a gimmick, but it's decent.

> ddg it's just Bing with a gimmick

Is that true? I've read images are served by Bing, not sure about the web search.

DDG is like the "Dr Pepper" of search engines.

Coke/Pepsi create, market, manufacture, distribute their soda.

Dr. Pepper just creates and markets their soda. They don't manufacture or distribute their own soda (believe it or not, Pepsi manufactures & distributes Dr Pepper soda).

Web search is served by the Bing Web Search API.
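For anyone curious what that integration looks like in practice, here's a rough sketch of building a request against Microsoft's Bing Web Search API v7 (the endpoint and header name are from Microsoft's public docs; `build_request` is just an illustrative helper, and you'd need your own subscription key from the Azure portal):

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def build_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Bing Web Search API request."""
    # The API takes the search terms in the "q" query parameter and
    # authenticates via the Ocp-Apim-Subscription-Key header.
    url = ENDPOINT + "?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": api_key}
    )

# To actually send it (requires a real key):
#   with urllib.request.urlopen(build_request("ruby sort hash", key)) as resp:
#       data = resp.read()  # JSON; the organic links live under "webPages"
```

A metasearch frontend like DDG presumably does something equivalent server-side, then reranks and decorates the results with its own instant answers.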

Fully agree; it's an endless cat-and-mouse game, as SEO spammers create new pages on new domains with automated WordPress deployments in no time.

Instead of a blacklist, a whitelist (a curated list of links à la Yahoo) would be a better approach, or something with upvotes/reputation/karma as on HN.
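If I read uBlacklist's docs right, a crude whitelist is actually expressible in its own subscription format: a subscription is just a plain text file of match patterns, and a rule prefixed with `@` un-blocks matching sites. A hedged sketch (the domains are only examples):

```
*://*/*
@*://*.stackoverflow.com/*
@*://*.github.com/*
@*://developer.mozilla.org/*
```

The first pattern blocks every result; each `@` rule then whitelists one curated site, so only listed domains would survive on the results page.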

Funnily enough, years back I poked around Google Custom Search. As a laugh, I used it to index and search Quora (which was still a high-quality site then). But I contemplated setting up a series of specialty search engines (e.g., for JavaScript) such that I would feed Custom Search a list of URLs and/or rules for what to index.

I never took it any further than "what a cool idea".
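For anyone tempted to pick that idea back up: Google's Programmable Search Engine (the successor to Custom Search) exposes a JSON API that can be queried over plain HTTPS. A minimal sketch, assuming you've created an engine in the control panel (where the whitelist of sites to index lives) and have an API key; `search_url` is just an illustrative helper:

```python
import urllib.parse

ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def search_url(query: str, api_key: str, engine_id: str) -> str:
    """Build the Custom Search JSON API URL for one query."""
    # "cx" identifies the custom engine; the set of sites/URL
    # patterns it searches is configured in the engine itself.
    params = {"key": api_key, "cx": engine_id, "q": query}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)

# Fetching this URL returns JSON with the hits under "items".
```

So the "specialty search engine" part is mostly configuration, not code: one engine ID per topic, each restricted to a curated URL list.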

I'm interested in that alternative too. Does a whitelist like that exist?

I wonder whether the alternatives are really fixing the SEO spam problem by improving their algorithms, or just manually pinning sites like StackOverflow to the top.

Considering this method would be through an ad-block extension, it doesn’t seem as if they would be giving their money to Google.

Side question, is there a max number of sites you can block on that list?

Edit: Freudian misread. I thought this was through uBlock Origin.
