Here's my "fix the internet" idea: build a search engine that is itself ad-free,...

52-6F-62 · on May 16, 2020

Wouldn't an ad-free-only search result eventually prioritize for news/info sites that have other forms of sustenance outside of readership?

Digital subscription rates have been rising, but not fast enough to subsist most publications without additional ad revenue.

My concern is that such a search engine would allow private interests to create "news/media/info" sites directly that qualify while long-respected publications are ostracized.

Great idea if the primary intent is to index other types of content outside of that sphere, but based on the history of the modern web I think we could expect that to be gamed pretty quickly in a detrimental way.

Even just writing this comment I keep coming into thoughts and it's making me realize what an interesting subject of discussion it could be. There's a lot of questions in there!

chris_f · on May 15, 2020

I launched a new search engine a few months ago and have been looking at several alternative ways to fund it, but none of them seem to be a viable path.

The best balance of privacy and user experience seems to be display ads related to search term context that don't do any tracking (like billboards in the real world). But even that isn't an easy proposition and will need a custom network built to generate those type of ads.

A lot of people say they will pay for a search engine, but it is such a small niche that I believe the cost would be prohibitively expensive, and the search engine would probably still be subpar to Google in many respect. Would you pay $10/month, what about $29? Those are most likely where the monthly fees would have to be for this type of product.

calvinmorrison · on May 15, 2020

I would love a search engine that had a -ecommerce option. Sometimes I want to buy shocks for a f150, sometimes I want to see how to replace it. What i don't want is to wade through all the ecom sites.

freediver · on May 27, 2020

Can you point to a site that has f150 shocks replacement information that it not commercial in nature? (no product sales/ads/newsletters/affiliate links) I am very curious and have a hard time finding one.

dhimes · on May 16, 2020

This is exactly it. And I could fill in my own profile that I share. And when the switch is on the search engine would have access to it to filter ads and allow marketers to target me properly.

eevilspock · on May 16, 2020

Do you realize how much we are all paying on average already to Google? There is no free lunch. Googles money comes from advertisers who get that money from us -- nearly everything we buy today has advertising and marketing costs baked in. For many things, those costs actually exceed materials and labor for the simple product.

Excerpt from an old comment of mine [1]:

Mozilla Research put the entire web's advertising revenue at $12.70/month per user. In other words, if they are right we are living with the consequences of advertising for a mere $13/month, $13 dollar they still get from us anyway because it's baked into the prices of the advertised products.

---

[1] https://news.ycombinator.com/item?id=10043324

chrsm · on May 15, 2020

I used to work for an ad network (BSA, Carbon Ads, etc) that does not intrusively track and believes in the effectiveness of quality one-ad-per-page. Try contacting @toddo on twitter or email me (in profile) and I'll forward it over.

Anyhow, I'd love to see something like this. I've switched to DDG but often have to reach for Google if I can't find relevant results.

pwdisswordfish2 · on May 15, 2020

https://wiby.me

xrd · on May 15, 2020

I am learning Svelte, so I typed that in:

https://wiby.me/?q=svelte

Two links came back:

"Facts on Farts" "NiceJewishMom.com"

Not a good first impression.

ByteJockey · on May 16, 2020

In it's defense, svelte is actually a word that appears on both of those pages.

afandian · on May 16, 2020

One of the big questions about search engines is "what is it trying to do?"

The simplest is "search for doc that contaons" . But we're used to "search for document about concept" by now. ISTRC Bing called themselves a "decision engine".

I don't know what Google is/trying to be now, but it often thinks it knows my business better than myself, excluding critical words.

I don't think there can be a universal answer. The corpus gathering is a huge barrier to entry, but having a common corpus would still allow room for competition on diversity of querying methods.

ByteJockey · on May 18, 2020

Their niche is to search the "classic web".

Atleast the title in firefox says:

> Wiby - Search Engine for the Classic Web

That's also probably why you didn't find any information on the svelte framework.

cousin_it · on May 15, 2020

Very cool! Though it seems more like a hand-picked directory than a search engine with a crawler, as it doesn't find quite a few good noncommercial pages that I know exist.

burkaman · on May 15, 2020

Looks like pages have to be submitted here to be included: https://wiby.me/submit/

armyofbots · on May 15, 2020

I think humans are better at building such a thing if the aim is not gobbling up the entire web but only light websites that aren't commercial. I think that would be hard or impossible to automate. It definitely has an early 90's vibe in terms of results. Not terribly useful be but interesting nonetheless.

tylerchilds · on May 15, 2020

1. cool.

2. the purple for the links makes me think i've already clicked and seen everything

mlacks · on May 15, 2020

this is such a therapeutic way to internet surf. No distractions - just content. thank you so much

myu701 · on May 15, 2020

Added to my FF search. wonderful.

arm64future · on May 16, 2020

There is a hidden chat server on there.

armyofbots · on May 15, 2020

thank you for posting this, I've spent the last hour pressing surprise me

eevilspock · on May 16, 2020

i would love to see a whole TLD that was ad-free. Or perhaps go one step further create a whole alternate name service so we can ditch corrupt ICANN while we're at it. Either way you'd want clear borders for this ad-free internet and be able to legally deport violators.

34679 · on May 16, 2020

I like your idea, it's along the same lines as something I've been thinking about. I call it "The Web at Large and the Filters of Reduction", shamelessly taken from Aldous Huxley's "Mind at Large and the Filters of Reduction".

Huxley "The Doors of Perception" (1954):

>each person is at each moment capable of remembering all that has ever happened to him and perceiving everything that is happening everwhere in the universe. The function of the brain and nervous system is to protect us from being overwhelmed and confused by this mass of largely useless and irrelevant knowledge, by shutting out most of what we should otherwise perceive or remember at any moment, and leaving only that very small and special selection which is likely to be practically useful.

We've all heard the comparisons between the brain and the internet. And we've all been overloaded at one point by the vast amounts of useless crap on the internet. So what about a site that allows users to customize their filters of reduction? You could have popular profiles premade and ready for tweaking, or you could go raw and witness an endless stream of information raked from all over.

duxup · on May 15, 2020

Is there enough content where searching just the ad-free segment is ... worth it?

What happens if you're a top result on the add free segment and you start incurring some level of costs? Do you drop off if you add an ad?

Granted I really like the idea, I'd love to see it tried, but I wonder what all the unintended consequences would be / skeptical of the value of segregating "ad-free" vs. "has an ad of any sort" vs. the sites that really are a mess.

eloisius · on May 15, 2020

The add-free segment of the internet does indeed sound like a niche market. I wonder if quality of results could be drastically improved by just penalizing domains that contain a lot of referral links and ads. I remember the days when it felt like you could learn anything by searching the web a bit until you found some suitable content to read. Now it's like sifting through reams of affiliate spam that I have to at least skim before I can tell that it's low quality garbage.

Max_aaa · on May 15, 2020

How would the project sustain itself? (Hardware and hosting costs)

ShamelessC · on May 15, 2020

I'd honestly appreciate this enough to pay a subscription for it if there was a policy that meant the subscription fee was explicitly to avoid needing revenue from shady business.

I believe fastmail is like that.

pwdisswordfish2 · on May 15, 2020

Begging and government grants, like Wikipedia?

pkaye · on May 15, 2020

So why doesn't Mozilla go for government grants instead of getting money promoting Google Search?

hawski · on May 16, 2020

There would be drastically less content and the whole operation could be easier, because you probably don't have to worry so much about spam websites. That would keep it low cost already, not single DigitalOcean droplet cheap, but I think it could be sustained from a programmers pocket initially.

Donations/patreon for one, then subscriptions. It should be possible to offer a contentless index for download (I estimate it could be around 50-200GB) and that could be a paid option.

totemandtoken · on May 15, 2020

Not the exactly the same thing, but I had this idea for something I call "newsbetting." Basically, you get a de-titled article and have to bet as to whether it is a right-wing or left-wing biased source. The news source collects the transaction cost of the bet and the nonprofit platform is sustained through grants/donations. The goal was to provide an alternative revenue stream for news organizations which cuts out the need for ads entirely. I don't know if this idea could extend to other types of media but maybe there are ways, like sponsored vs non-sponsored content for tech news, replication prediction markets for scientific news, etc.

hawski · on May 16, 2020

That's an idea I had for a few years now. I started some motions [0], but progress was slow, because of life. I wanted to start with going through the Common Crawl [1] data at first for testing purposes and to calculate a rough percentage of sites being uBlock-Origin clean.

I think that such sites would be in ballpark of a few ‰. That would enable me to offer the contentless index for download. With delta updates and torrent for distribution it could be not that expensive, but that's a thing that I could charge for.

My intention is to use AdBlock rules like easylist to check whether or not indeed the page.

My initial code is fine in Go, but I lost enthusiasm for Go lately and careerwise it's not a good fit for me (I don't have much time to learn something not as useful for me professionally). So I started to rewrite it in Rust, while learning it, you can laugh now (Rust Evangelism Strike Force el oh el). It has an advantage with ready to use rules parser from Brave [2] and presumably high quality tokenizer from html5ever [3].

I want to use a tokenizer instead of a full parser to be able to do stream processing bringing costs down.

Common Crawl data lays on S3 so the processing must be done initially on EC2 to keep it low cost.

[0] Current Go code: https://github.com/hadrianw/abracabra

[1] https://commoncrawl.org/

[2] https://github.com/brave/adblock-rust

[3] https://docs.rs/html5ever/0.25.1/html5ever/tokenizer/index.h...

EDIT:

Also for the search part I want to use something more stand alone than Elasticsearch to offer desktop search with downloaded index. When I started with Go I wanted to use Bleve [4], now I'm not sure, but I think that Bleve is getting mature enough. I will worry when I will have some data to search through.

One of the challenges with this whole enterprise is a small need of JavaScript parsing. There is a common pattern, that for example Google Analytics uses, that uses a snippet of JavaScript to insert a proper script tag. But those snippets are very short so I think they may not need a full JS VM, maybe even a tokenizer would be good enough. Browser AdBlockers base on the site executing JavaScript already.

[4] https://blevesearch.com/