Almost all searches on my independent search engine are now from SEO spam bots

MicahKV · on May 16, 2022

So spammers have latched onto your search engine because they are getting useful results. They are able to systematically discover websites built on certain platforms that allow users to post content containing links, which they can target for link spam. It is very difficult to fight this on a technical level because there is an entire industry built around blackhat SEO, with all kinds of software and services dedicated to thwarting your defensive efforts. Even Google struggles to keep up with this.

However, they are also systematically feeding you their footprint lists. I imagine you could put together a footprint blacklist pretty quickly, and just stop returning results for any obvious spam queries like those containing "powered by wordpress".

It's not a very elegant solution I'll admit. It won't stop the bots from trying, and you may have to circle back periodically to add new footprints as they surface. But it's a potentially quick and easy way to stop rewarding their efforts, and the blackhat world is pretty used to burning out their resources so hopefully they will figure out it's a dead end and move on.
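A minimal sketch of the footprint blacklist idea. Only "powered by wordpress" comes from the comment; the other entries, the normalization rules, and the `handle_search` stand-in are hypothetical. Queries are lower-cased and stripped of double quotes so that trivial variants of a footprint still match.

```python
import re

# Hypothetical starter list. "powered by wordpress" comes from the comment above;
# the others are placeholders to be replaced as real footprints show up in the logs.
FOOTPRINTS = [
    "powered by wordpress",
    "leave a reply",
    "add new comment",
]


def normalize(query: str) -> str:
    """Lower-case, turn double quotes into spaces, and collapse runs of whitespace."""
    query = query.lower().replace('"', " ")
    return re.sub(r"\s+", " ", query).strip()


def is_footprint_query(query: str, footprints=FOOTPRINTS) -> bool:
    """True if any footprint phrase appears in the normalized query."""
    q = normalize(query)
    return any(normalize(fp) in q for fp in footprints)


def handle_search(query: str) -> str:
    if is_footprint_query(query):
        # Make the refusal obvious, so the bot operator sees there is nothing to gain.
        return "Sorry, this search engine no longer supports SEO footprint search queries."
    return f"results for {query!r}"  # stand-in for the real search backend
```

Substring matching will also catch some legitimate queries, so a list like this needs the periodic review the comment already anticipates.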

marginalia_nu · on May 16, 2022

> So spammers have latched onto your search engine because they are getting useful results.

I'm not sure about this. At least with my search engine, it doesn't really seem to matter what response they get; I don't even think they look at the responses. They keep hammering away with tens of thousands of queries per day, even though their requests have gotten nothing but HTTP status 403 since last October or so.

My best guess is they're going after search engines in general in case they forward queries to Google, in order to manipulate its typeahead suggestions.

miohtama · on May 16, 2022

Put a Cloudflare web application firewall in front of the site and then use its rate-limiting / CAPTCHA features to throttle traffic. It is the easiest way to get rid of parasitic scraping and API abuse. Cost is $0.
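Cloudflare's rate limiting is configured on their side, so this is not their implementation. It is a sketch of the same throttling idea for someone self-hosting: a per-IP token bucket. The limits (bursts of 10, then one query every 2 seconds) are arbitrary assumptions.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = None  # timestamp of the previous call

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Refill in proportion to the time elapsed, never beyond capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client IP (assumed limits: burst of 10, then 0.5 requests/second).
buckets: Dict[str, TokenBucket] = {}


def allow_request(client_ip: str, now: Optional[float] = None) -> bool:
    now = time.monotonic() if now is None else now
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(rate=0.5, capacity=10)
    return buckets[client_ip].allow(now)
```

When `allow_request` returns False you would answer with HTTP 429 or a CAPTCHA. In a real deployment, idle entries in `buckets` should be evicted periodically so the dictionary doesn't grow without bound.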

marginalia_nu · on May 16, 2022

Yeah, that's essentially what I've done, except I'm paying for their cheapest non-free tier to have a bit more control over it. I really wish I didn't have to route all my traffic through an untrusted 3rd party like that, but I guess we can't have nice things on the Internet anymore.

B1FF_PSUVM · on May 16, 2022

> I guess we can't have nice things on the Internet anymore

Not since it left the larval stage and became "pay for play", no.

Oh, well, those taxpayer-funded years were nice for those of us who were around.

bcrosby95 · on May 16, 2022

I think I remember wondering, after the dotcom bust, if the whole web thing would actually take off.

The reasoning I vaguely remember reading was that the internet required government subsidy to exist - at first directly, then in the form of universities, and the bust was a sign that it couldn't exist without one.

I don't remember how prevalent the view was at the time though. Obviously it turned out to be wrong.

kordlessagain · on May 17, 2022

Putting authentication on the site would be easier.

There's a rub here, in that people expect to search things without being logged in. But if you don't make people log in, anyone can come calling, including bots. This then pushes you to do things like get a third party to filter the data, which in turn affects users, because their traffic has to be rerouted through someone else just to get rid of the bot visits you don't want.

And round and round.

Simple authentication to the site with tokens might solve the problem. If an IP comes calling without authentication or payment, then hang the connection.
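The comment doesn't say what kind of token. One common stdlib-only option, shown here as an assumption, is an HMAC-signed token that embeds a user ID and an expiry time, so the server can validate it without a database lookup. The token format and `SECRET` are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret; generate it once (e.g. secrets.token_bytes(32)) and keep it out of source.
SECRET = b"replace-with-a-random-server-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id: str, ttl: int = 86400, now: Optional[float] = None) -> str:
    """Token format (an assumption): "<user_id>:<expiry unix time>:<hex HMAC-SHA256>"."""
    expires = int((time.time() if now is None else now) + ttl)
    payload = f"{user_id}:{expires}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> bool:
    try:
        user_id, expires, sig = token.rsplit(":", 2)
        expiry = int(expires)
    except ValueError:
        return False  # malformed token
    # Constant-time comparison so the signature can't be guessed via timing differences.
    if not hmac.compare_digest(sig.encode(), _sign(f"{user_id}:{expires}").encode()):
        return False
    return (time.time() if now is None else now) < expiry
```

A request without a token that passes `verify_token` would then be dropped, as the comment suggests. Tying tokens to payment is left out of this sketch.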

Schroedingersat · on May 17, 2022

> Cost is $0.

Cost is the slow enclosure of the internet by a handful of giant companies and, once attestation is universal, anyone without a locked-down device being locked out of most of the internet unless they provide endless free labour.

FTFY

MicahKV · on May 16, 2022

Huh, well I guess there goes my theory about the incentive. What a bummer. I would have thought that at least with search engine scraping, they would stop expending the effort once the results dried up.

z3t4 · on May 16, 2022

Or put those query results behind an anti-bot/"captcha" test.

MicahKV · on May 16, 2022

That would probably help, but it's also a continuation of the cat and mouse game. There are plenty of captcha breaking services out there; it only costs about $1 to programmatically solve 1000 captchas.

noAnswer · on May 16, 2022

> There are plenty of captcha breaking services out there

Give it a try and see what happens.

People said greylisting against email spam wouldn't work, since spammers would just resend. It has worked for 20 years. To get your IP off the DNSBL NiX Spam you just have to follow a link. People said spammers would automate that process. It hasn't happened in 19 years. Sometimes spammers are just lazy.
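For readers who haven't seen greylisting, a minimal sketch of the usual scheme: the first delivery attempt for a new (client IP, sender, recipient) triplet gets a temporary SMTP failure, and a retry after a delay is accepted. Legitimate mail servers queue and retry; much spamware never does. The 300-second delay is an assumed value, and real daemons also expire old entries and whitelist senders that have passed once.

```python
import time
from typing import Dict, Optional, Tuple

GREYLIST_DELAY = 300  # seconds before a retry is accepted (assumed; real setups vary)
TEMPFAIL = "451 4.7.1 Greylisted, please retry later"

# First time each (client IP, sender, recipient) triplet was seen.
first_seen: Dict[Tuple[str, str, str], float] = {}


def greylist_check(ip: str, sender: str, recipient: str, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    key = (ip, sender.lower(), recipient.lower())
    seen = first_seen.setdefault(key, now)  # records the first attempt
    if now - seen < GREYLIST_DELAY:
        # A temporary (4xx) failure: well-behaved senders will try again later.
        return TEMPFAIL
    return "250 OK"
```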

minsc_and_boo · on May 16, 2022

Sure, but it increases friction that forces a re-eval of cost/benefit of the bot(s).

Newer captcha services produce a prediction score rather than a verification screen, and you can feed polluted data to clients you are certain are bots.

Calavar · on May 16, 2022

Agreed. I suspect that this is an arbitrage game on the part of the SEO spammers. Each search is cheaper for them than it is for a competitor who's using a major search engine with more extensive anti-spammer protections, and that difference equals $$$. A captcha doesn't have to be an unbeatable solution. It just has to provide enough of a barrier to equalize the cost.

MicahKV · on May 16, 2022

I'm not so sure about this. The spammer's goal is to build up as big a list of link spam targets as possible. If one spammer chooses to only scrape minor engines and another only major engines, the one scraping the major engines will probably come out on top despite the higher cost. Whoever is abusing OP's search engine is likely doing it to supplement the data they are already scraping from the major engines.

For OP, I think simply not returning results at all is a more practical measure because it removes the reward completely. Captchas and bot detection keep the reward in play, while taking away the results entirely makes the entire pursuit futile.

jfim · on May 16, 2022

It might be a better idea to return low quality results than nothing at all. The idea is that it's pretty obvious that the bot is banned when it receives no results at all. Having to look at the results manually to determine whether one is banned is a much more time consuming endeavor.

MicahKV · on May 16, 2022

Well what I'm suggesting isn't about blocking the bots, it's about removing the incentive. So in this case, I think the more obvious it is the better. I would want them to realize as soon as possible that they are 100% wasting their time.

If anything, it might be best to return a page that explicitly states "Sorry, this search engine no longer supports SEO footprint search queries."

*edit for typo & wording

bornfreddy · on May 16, 2022

On the other hand, making content difficult to parse is easy to do and a very strong weapon. Make them waste dev time... It is much easier to make variants of HTML than it is to parse it. You can even automate it to some degree.

go_prodev · on May 16, 2022

Deliberately feeding the spam bots into an endless loop of captchas might slowly drain their accounts if they are paying 3rd party captcha farms.

jon_richards · on May 17, 2022

Then monetize by setting up your own captcha farm, but instead of paying for compute, send the captcha to the spam bots, who send it to another captcha farm and solve it for you.

anselmschueler · on May 16, 2022

As I understand it, the main point of CAPTCHAs isn’t to keep out bots completely, but to give enough friction to make automated attacks or uses infeasible, while keeping the friction low enough that normal users can still use it normally.

sylware · on May 16, 2022

... and there are the "click farms" with human beings.

z3t4 · on May 16, 2022

If someone pays people to collect data, you could outright sell the data to them.

tofuahdude · on May 16, 2022

Captcha breaking is SO easy these days; even the modern captchas are easy to defeat.

Ikatza · on May 16, 2022

How about serving bots with one link per page, and taking a minute to serve each page? Would this impact their efficiency?

wolpoli · on May 16, 2022

Considering that as of Mar 12, this search engine only has 1001 sites indexed, I am not sure how useful this site is for getting SEO backlinks. Speaking of which, are backlinks still a thing these days?

bladegash · on May 16, 2022

They are, but the useful ones are those coming from sites with higher domain authority rankings.

That’s why you'll see fluff pieces (aka, paid content) from online publications like Forbes for the better funded entities.

Another approach is to reach out to site operators with offers of writing content or asking them to link to your site’s content in their existing content.

It’s expensive and/or incredibly time consuming to get backlinks that matter.

pstuart · on May 16, 2022

If the confidence was high enough, perhaps return garbage data?

gopher_space · on May 16, 2022

> It is very difficult to fight this on a technical level

It is when your base assumption is that you won't hire outside of engineering. There are more bored teenagers with phones than people creating quality content, so I'm not sure why you wouldn't just brute force checks against bad actors.

pascalxus · on May 16, 2022

Just to throw out ideas: what if he decided to charge for each search, say 1 cent or so? Users could purchase them in bulk, say 100 searches for $1.

The world is getting more and more desperate for a better search engine. The day may come when people are willing to pay for better results.

miniwa · on May 17, 2022

What is the end goal here? I understand it's about making money somewhere down the road, but how?

john-radio · on May 16, 2022

Since everyone in this thread wants to jump down OP's throat about the quality of his web site, another interesting search engine is millionshort.com, which allows you to filter out the top N web sites from the results of your search. It's a great tool for looking past sites with good SEO; all you have to do is fiddle with the value of N.

For example, searching for "electronic music box" as /u/ajnin suggested, with the top 100K web sites removed from the results, filters out the following:

> These 23 sites were removed from your results:

> alibaba.com (1 result removed)

> aliexpress.com (1 result removed)

> allaboutcircuits.com (1 result removed)

> amazon.com (2 result removed)

> apple.com (1 result removed)

> bestreviews.com (1 result removed)

> ebay.com (1 result removed)

> etsy.com (2 result removed)

> facebook.com (1 result removed)

> instructables.com (2 result removed)

> lightinthebox.com (2 result removed)

> lumberjocks.com (1 result removed)

> mapquest.com (1 result removed)

> reverb.com (1 result removed)

> twitter.com (1 result removed)

> wikipedia.org (1 result removed)

> yelp.com (1 result removed)

> youtube.com (2 result removed)

And the top result ends up being https://midiguy.com/.
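Million Short doesn't document how its filter works, so this is only a sketch of the general idea under stated assumptions: given some popularity ranking of sites (hypothetical here; the test ranks are made up), drop results from the top N and report per-site removal counts like the list above.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Hostname with a leading "www." stripped (a simplification, see note below)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def remove_top_n(results: List[str], rank: Dict[str, int], n: int) -> Tuple[List[str], Dict[str, int]]:
    """Drop results from the n most popular sites.

    `rank` maps site -> popularity rank (1 = most popular); unranked sites are kept.
    Returns the surviving results and a per-site count of removed results.
    """
    kept: List[str] = []
    removed: Dict[str, int] = {}
    for url in results:
        site = site_of(url)
        r = rank.get(site)
        if r is not None and r <= n:
            removed[site] = removed.get(site, 0) + 1
        else:
            kept.append(url)
    return kept, removed
```

Stripping only "www." misses subdomains such as en.wikipedia.org; a sturdier version would map hosts to their registrable domain using the Public Suffix List.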

blisterpeanuts · on May 16, 2022

That's an outstanding concept. One problem though: wouldn't it also filter out high quality curated results?

H8crilA · on May 17, 2022

Yes, such as Wikipedia, for example.

synicalx · on May 17, 2022

I assume the idea is this is a secondary search, after Google has failed once again to return anything other than Etsy and Pinterest results.

It also seems fairly customisable, like I can search and include all results but choose to remove ecommerce, or sites with live chat (weird filter, but I like it).

mdoms · on May 16, 2022

Million Short also has an option to remove only e-commerce results which is invaluable if you still want results from sites like Twitter, Wikipedia and YouTube but don't want online shopping spam.

consp · on May 16, 2022

Would this also work for the fake-sites-stealing-text-to-look-legit sites since they quickly end up in the top results?

ajnin · on May 16, 2022

This made me curious to try that search engine so I typed "electronic music box" (first thing that came to mind). As far as I can tell none of the 10+ pages of results include all those 3 words. I mean, you might not have any relevant sites in your database (likely if there are only 1000 sites or so as another of your blog posts implies), and I understand you want to show some result to the user, but if I want irrelevant links I might as well go to google.com...

lubesGordi · on May 16, 2022

What the heck is an 'electronic music box'? I personally wouldn't expect those three words to show up on any sites served by a small search engine.

ajnin · on May 18, 2022

It's a music box that produces sound electronically, as opposed to traditional mechanical ones. I don't think it is that foreign of a concept. It might be present, or not, in the search results; it depends entirely on the niche, and I could not know which it was by just reading the blog post. Anyway, that was not really the point of my test.

thehodge · on May 16, 2022

Yeah same, I searched for Leeds grand theatre and the top result is something titled "June 2012 – Sam's Blog", which just mentions the word grand.

FargaColora · on May 16, 2022

You mention the "Dead Internet Theory" (not heard that phrase before!).

I agree: the WWW Internet is dead, that is your problem. No-one visits websites anymore, everyone has moved to the 10 biggest websites and all data is now siloed there.

If I want to search for something topical and relevant, I go to Facebook, Twitter, Reddit, HackerNews, Instagram, Google Maps, Discord etc.

The general Internet is dead: it's just legacy content and spam.

If you think it's bad for you, imagine what it is like for Google Search! Their entire business is indexing a medium which no longer has any relevancy. People complain that Google no longer delivers good results. But what can Google do? The "good content" is no longer available for them to index.

Want to become rich? Make a search engine which indexes the fresh relevant data from the big siloed websites, and ignores the general dead Internet.

marginalia_nu · on May 16, 2022

I built my search engine in part to explore whether this was actually true, and I don't think it actually is.

There's still a lot of organic human-made content out there, possibly more than ever; it's just not able to compete with the SEO industry that completely displaces it from Google and social media.

ColinHayhurst · on May 16, 2022

Agreed.

> If I want to search for something topical and relevant, I go to Facebook, Twitter, Reddit, HackerNews, Instagram, Google Maps, Discord etc. The general Internet is dead: it's just legacy content and spam.

The "general" Internet is not dead. Though if you only participate in Facebook, Twitter, Reddit, HackerNews, Instagram, Google Maps, Discord you might well think that.

Users of marginalia (author above), Mojeek (disclosure: CEO) and others [0] are well aware that there are riches of organic human-made content; from years back and new. Yes, a lot of noise too, which Google has a bigger (SEO) struggle to compete against. But still there is good and different content available.

To find good content using search, you need to use "search" engines which enable discovery, as Google used to. I stress "search" because the emphasis of Google, Bing, and thus their syndicates is increasingly on being "answer" engines.

[0] https://seirdy.one/2021/03/10/search-engines-with-own-indexe...

Domenic_S · on May 16, 2022

> The "general" Internet is not dead.

For some things it is. Good luck getting a non-sponsored/SEO-gamed review of a kitchen appliance or particular vacation mode such as a cruise. It's flabbergasting.

Most times I just stick "inurl:reddit.com" in my search and try to get discussion threads about the thing I'm researching, but even that's getting filled up with shills.

throwaway894345 · on May 16, 2022

I think search engines are broken, but the Internet itself is probably not "dead". It's just our accessibility to that information. That's not super helpful until we have better search engines (which steer us away from this SEO stuff), but the good news is that building a better search engine is easier than resurrecting the Internet. In particular, there's a good chance that a niche, naive search engine might be able to significantly improve accessibility (e.g., high rankings for pages that answer user queries in the fewest bytes).

marginalia_nu · on May 16, 2022

¯\_(ツ)_/¯

http://www.jitterbuzz.com/indmix.html

http://www.alaska.net/~akpassag/

FargaColora · on May 16, 2022

These websites seem to be last updated decades ago, which is prehistoric to most casual browsers. There's no doubt there is great content on the general internet, but these examples I would classify as "legacy".

marginalia_nu · on May 16, 2022

I can see why the website owners would be interested in getting traffic to recent websites, but why would you be interested in recently updated websites?

metadat · on May 17, 2022

Stores typically stock recently manufactured products. Once the manufacturer discontinues a model and inventory is gone, that's a wrap. Sometimes the product was good and gets replaced by an inferior one (in the spirit of old Burger King vs new post-acquisition Burger King), other times it's just small tech refresh tweaks, and everything in between.

A real litter of inconsistency between unrelated external organizations and varying markets and skill sets.

ColinHayhurst · on May 16, 2022

Result #1 & #2 for kitchen appliance review (your personalised/local results might vary):

Google:

https://www.expertreviews.co.uk/home-garden/home-appliances

https://www.goodhousekeeping.com/appliances/

Bing:

https://www.which.co.uk/reviews/fitted-kitchens/article/plan...

https://www.goodhousekeeping.com/appliances/

DDG:

https://www.goodhousekeeping.com/appliances/

https://www.which.co.uk/reviews/fitted-kitchens/article/plan...

Marginalia:

https://www.infiniteeureka.com/shop-markdowns-on-small-kitch...

http://www.fullyramblomatic.com/essays/sarah.htm

Mojeek:

https://www.appliancesreviewed.net/

https://busybakers.co.uk/category/kitchen-appliance-reviews/

FargaColora · on May 16, 2022

Most of these are spam. They contain affiliate links to Amazon to buy the product which is being reviewed, therefore the review cannot be trusted.

"Which" looks to be the exception, but that is a paid-for service.

It's a sad state of affairs.

kelnage · on May 16, 2022

I understand your opinion about affiliate links - but I use several review websites that use such links for all products they review, and have both positive and negative reviews for products. So I wouldn’t say it necessarily follows that affiliate links = biased reviews.

skinnymuch · on May 16, 2022

How often do they give their best review score or opinion to a product without an affiliate link? Not every product will have an accessible affiliate link.

Isn’t Amazon commonly used for most affiliate links, or has that changed in recent years? Amazon isn’t always the cheapest any more. Nor is its customer support the top any more.

zerd · on May 17, 2022

Also, I've noticed that the list of products reviewed is limited to only those that _have_ Amazon affiliate links. If a product is only available on not-Amazon stores, they don't even get mentioned. Which is a bias in itself.

skinnymuch · on May 25, 2022

Yeah that’s what I was thinking too. A big bias right there.

tmaly · on May 16, 2022

Everyone is trying to game the Google algorithm. The net result is all this long form content and cooking recipes that are 10 pages long.

There seems to be a big disconnect between a typical user's attention span and the length of a post.

ajmurmann · on May 16, 2022

I thought the recipe thing was to be able to copyright them

labster · on May 16, 2022

That’s just gaming a different algorithm.

ajmurmann · on May 17, 2022

Yes, but importantly not one Google or any search engine can do anything about.

mc32 · on May 16, 2022

Sounds like we’re back to AskJeeves and a number of failed answer engines from a couple of decades ago!

ColinHayhurst · on May 16, 2022

AskBERT but now MUM knows best.

alxlaz · on May 16, 2022

This matches my findings 100%. The WWW is active and bubbling, but virtually all the cool websites I've found in the last 10 years or so came through friends, small IRC channels, or more recently through marginalia.nu :-). Google and friends are facilitators for the SEO and tracking industries, so of course they have zero interest in prioritizing these things over content spam; their whole business runs on content spam. But the WWW is as alive as it gets.

pmontra · on May 16, 2022

I take myself as an example.

People that know me and don't meet me regularly might know the URL of my web site and might care to look at it once per year and check if there is something new. Usually pictures and tales from holidays. Covid made those holidays less memorable so I didn't make any update since fall 2019. People that meet me regularly don't need that website, I'm telling them the tales first hand and showing them the pictures without being obnoxious. I guess that this website is a target for your search engine except it's not in English and your search engine seems to want English search phrases.

I don't have anything of value to share on a public chat like Twitter and I don't have an ego to pretend I do. I also don't use Facebook anymore. I go there once per year to like the messages that wish me happy birthday. I think it's polite to do so. All my media production is on WhatsApp or Telegram in group chats with people I know in real life.

If I really cared about producing content for the world I'd probably be using Twitter, Medium or the fad of the year and they'd take care of my SEO (do they?) or I'd be trying to score points on StackOverflow.

To recap: I never intended to compete on SEO. I'm really OK that my website is only for friends and spreads by word of mouth. It probably never did, I bet it's been on a flatline since I created it 20+ years ago.

api · on May 16, 2022

All open systems are destroyed by spam once they become popular enough to be profitable targets. This will eventually happen to the Fediverse too. If there is money to be made pissing all over the commons, the commons will be pissed all over.

It even happens to proprietary silos if they are too open. Look at how many bots and spammers infest social media. Propaganda and disinformation can also be considered a form of spam.

I realize this sounds cynical but don’t shoot the messenger. It’s just something I’ve learned watching the Internet evolve since the middle 1990s. Spam eats everything it can.

IMHO the future is enclaves and invite only communities. The Internet is a dark forest.

pixl97 · on May 16, 2022

It's not cynical; it's how every system in nature works. Everything alive must develop an immune system or it is attacked and eaten.

marginalia_nu · on May 16, 2022

As old open systems are destroyed, new ones are created to replace them. The Internet exists in a constant state of rebirth and transformation. You really can't step into the same river twice.

nonrandomstring · on May 16, 2022

> You really can't step into the same river twice.

I love the maxim and philosophy of eternal refreshment.

Seems like the problem is more akin to having nuclear waste dumped into our rivers though.

pwdisswordfish9 · on May 16, 2022

> This will eventually happen to the Fediverse too.

Oh, don’t worry, the Fediverse will never catch on.

ffhhj · on May 16, 2022

Why? Serious question.

NoGravitas · on May 16, 2022

You are probably right about the future; not necessarily because of spam, though that's a part of it, but just because of the toxicity of global, open to the world, mostly public social media. The Fediverse has mostly coasted by so far on obscurity, but it's not great, and it's bound to get worse. All of my online socializing these days is either through short-lived pseuds on topic-oriented fora, or invite-only Matrix rooms.

indigochill · on May 16, 2022

How do you surface organic human content? I happen to linger around the fediverse/tildeverse sphere where I see organic content from people I personally have a direct (digital) connection to (and I started self-hosting my music after Epic bought Bandcamp), but I'm not clear on how I'd go about digging that kind of stuff up in the more general case.

ysavir · on May 16, 2022

It's not about surfacing organic human content, it's about only indexing organic human content. The problem is automated indexing. So long as indexing works according to defined rules, the advantage will be to those able to shape their content to those rules, and the spammers and scammers will win.

An idea I've had for a few years is making a social-network based index engine. The only pages that get indexed are pages that users themselves mark as worth indexing, and the only pages returned in your results are pages that were marked for indexing by people you added to your circles, or the people in their circles, or the people in those circles, etc (probably up to 5 or 6 degrees of separation).
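A minimal sketch of how that could work, under assumptions the comment doesn't spell out: "circles" form a directed who-trusts-whom graph, a breadth-first search collects everyone within `max_depth` hops, and a page is eligible only if someone in that set marked it. A plain substring match stands in for real ranking; all names here are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def circle_of(graph: Dict[str, Iterable[str]], start: str, max_depth: int) -> Set[str]:
    """Breadth-first search: every user within max_depth hops of start, start included."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if depth[user] == max_depth:
            continue  # don't expand past the trust horizon
        for friend in graph.get(user, ()):
            if friend not in depth:
                depth[friend] = depth[user] + 1
                queue.append(friend)
    return set(depth)


def vetted_search(query: str, searcher: str, graph: Dict[str, Iterable[str]],
                  marks: Dict[str, Tuple[Set[str], str]], max_depth: int = 3) -> List[str]:
    """Return URLs containing the query that someone in the searcher's circle marked.

    `marks` maps URL -> (users who marked it as worth indexing, page text).
    """
    circle = circle_of(graph, searcher, max_depth)
    q = query.lower()
    return sorted(url for url, (marked_by, text) in marks.items()
                  if marked_by & circle and q in text.lower())
```

`max_depth` is the degrees-of-separation knob: each extra hop can widen the trusted set dramatically, so small values keep the vetting meaningful.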

kmeisthax · on May 16, 2022

...so, blogrolls?

ysavir · on May 16, 2022

Not familiar with blogrolls, but not quite. The idea is more to have standard search engine user experience, but with the requirement that each result is vetted by someone the user trusts, or trusts by proxy.

kthejoker2 · on May 17, 2022

> Not familiar with blogrolls

Not directed at you specifically but this is the actual problem.

We already had a good system for these things. Delicious, blogrolls, RSS, the folksonomy ..

nyokodo · on May 16, 2022

> up to 5 or 6 degrees of separation

So basically everyone on earth?

ysavir · on May 16, 2022

Alright, 2 or 3!

bornfreddy · on May 17, 2022

Sounds like a great idea, execution will be key...

marginalia_nu · on May 16, 2022

I do a traditional web crawl and exclude anything that looks too much like it wants a high google ranking. Nothing to it.

ratww · on May 16, 2022

This might be controversial, but I wish Google would exclude those websites too.

Google started punishing keyword spam, then it started punishing black-hat comment spam. Even Youtube backtracked on the "videos have to be 10 minutes to rank".

I wish they would do the same for carefully manicured SEO content farms too, as those sites are causing a harm worse than keyword-spammer sites did.

marginalia_nu · on May 16, 2022

They're probably doing all they can. The problem is that their dominance means both that they effectively have an entire industry looking for loopholes in everything they do, and that they face legal considerations (arbitrarily punishing individual smaller actors might skirt the territory of anti-competitive behavior).

sdoering · on May 16, 2022

I fear that Google also has a conflict of interest here. A lot of these non optimized sites are not interested in making money via ads. So Google wouldn't profit additionally from leading people there.

And a lot of people (myself often times included) are looking for a quick answer. A good enough answer. So good enough, SEO optimized is being surfaced. The result of an optimization war on both sides combined with the inevitable monetary interests.

I don't have a solution. Sadly.

ratww · on May 16, 2022

I think there's two kinds of SEO spam going on.

The black-hat kind is definitely made to extract money from ads. But those are easy to avoid for web veterans IMO. And I also feel that Google is doing its part, even though it's costing them money from those sweet ads!

But the white-hat kind, also known as content marketing, is made to let legit companies save money. Instead of paying for Google Advertisement, they get traffic by means of organic content. Think "Michelin Guide" or "Red Bull". Which is a jolly fine idea and responsible for a lot of good stuff, but the problem is that this has been taken to extremes, and now the web is littered with low-effort content made by freelance writers getting peanuts.

I would personally prefer if those freelance writers were doing 10 interesting Red Bull articles per month rather than 500 rehashes of content from other websites. But who am I to judge.

In the news industry things are also very similar.

Nextgrid · on May 16, 2022

The "white-hat kind" can trivially be filtered out (or deterred) by downranking any of the crap these marketers use to measure their conversion rate - analytics, etc.
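A sketch of that downranking idea, not how any particular engine does it: count `<script src>` tags that load from known analytics hosts and multiply the page's score by a penalty per tracker. The host list and the 0.5 penalty are assumptions; a real version would likely draw on an existing tracker blocklist.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete list of analytics/tracking script hosts.
TRACKER_HOSTS = {"www.googletagmanager.com", "www.google-analytics.com", "connect.facebook.net"}


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def tracker_count(html: str) -> int:
    parser = ScriptCollector()
    parser.feed(html)
    # urlparse also handles protocol-relative URLs like //connect.facebook.net/...
    return sum(1 for src in parser.srcs if urlparse(src).hostname in TRACKER_HOSTS)


def adjusted_score(base_score: float, html: str, penalty: float = 0.5) -> float:
    """Multiply the score by `penalty` once per tracker script found."""
    return base_score * (penalty ** tracker_count(html))
```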

ratww · on May 16, 2022

I love this idea. Would be nice to see it in a search engine, or at least a browser extension showing how much analytics junk a site has before you click it.

Nextgrid · on May 16, 2022

Kagi has a non-commercial filter that I suspect uses the presence of ads/analytics as a signal.

galangalalgol · on May 16, 2022

Does anyone have an ad free search engine? You'd start with blacklists from ublock origin, pi-hole, and similar, don't bother even crawling those, then have easy reporting for new or self hosted ads. Not much money in it if any, but it would be refreshing. Might even have a mode to nix anything with a payment method on the site, or that links to a site with a payment method.
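A sketch of the "don't even crawl those" step, assuming hosts-format blocklists (lines like `0.0.0.0 ads.example.com`) of the kind Pi-hole can consume. uBlock Origin's Adblock-style filter syntax (e.g. `||ads.example.com^`) is not parsed here. Blocking parent domains as well as exact hosts is a design choice of this sketch.

```python
from typing import Set
from urllib.parse import urlparse


def parse_hosts_blocklist(text: str) -> Set[str]:
    """Parse hosts-format lines ("0.0.0.0 ads.example.com") or bare domains into a set."""
    blocked: Set[str] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        # hosts format is "IP host [host ...]"; a single field is treated as a bare domain.
        hosts = parts[1:] if len(parts) > 1 else parts
        blocked.update(h.lower().rstrip(".") for h in hosts)
    return blocked


def is_blocked(host: str, blocked: Set[str]) -> bool:
    """True if the host or any parent domain is on the list (stricter than exact match)."""
    labels = host.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def should_crawl(url: str, blocked: Set[str]) -> bool:
    host = urlparse(url).hostname
    return host is not None and not is_blocked(host, blocked)
```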

ajmurmann · on May 16, 2022

> Does anyone have an ad free search engine

kagi.com, search.marginalia.nu

EVa5I7bHFq9mnYK · on May 16, 2022

Maybe back to Yahoo model of the 90s? Manually created collection of curated links?

datavirtue · on May 16, 2022

Yes. We have enough users now.

ajmurmann · on May 16, 2022

I love your search engine. Should I stop recommending it to friends to keep it safe?

I jest a little bit, but your comment genuinely makes me wonder if Marginalia++ is search results - Google - Marginalia

pixl97 · on May 16, 2022

Welcome to the billion dollar question. Any place that is authentic will face the zombie horde attempting to fake authenticity in order to capture attention.

tomxor · on May 16, 2022

I think you're almost right, but it's not necessarily authenticity... I think it's just money.

Large "authentic" search engines can exist to serve the rest of the web, those personal blogs and other small communities. Those sites have a natural tendency to not be trying to turn everything into a revenue stream, so if that was the prerequisite for an engine, it would be a perfect match and naturally dissuade marketing types.

pixl97 · on May 16, 2022

Authenticity is worth money.

When you have a 'real' community you're talking about real people with real salaries and desires, add in that you tend to develop a real trust between members. Think of this as fertilized soil. You can grow crops in it, but weed seeds will eventually land and try to take over it.

HackerNews is a good example of this. It takes a healthy amount of moderation to keep things on topic, and things like politics get pared back pretty ruthlessly. If for a minute Dang gave in and found ways to additionally monetize the forums (something that would be profitable, for a while at least), things would start down a bad path.

sdoering · on May 16, 2022

I can only agree with my sister comment. I find this industrialized web more and more shallow and taxing to use.

While professionally I need to help (smaller, local) clients to reach their audiences I become more and more weary.

It is like walking through a supermarket with industrialized fast convenience food shouting in bright colors and advertising while ultimately not nourishing me like slow, real food could.

I am still looking for this digital slow food movement.

nonrandomstring · on May 16, 2022

> I am still looking for this digital slow food movement.

https://digitalvegan.net

Please read it, and if you enjoy it please suggest it to friends.

lovskogen · on May 23, 2022

Read the intro. So you find vegans annoying (because they 'are the future'), and you're not a vegan yourself, and you write that digital veganism is more important than actual veganism. Now that's a way to start off well!

fifticon · on May 16, 2022

I second that independent sites exist. I maintain my own website on a personally run server. "There are dozens of us!", to quote a quaint phrase.

dylan604 · on May 16, 2022

And who uses your search? I had never heard of "you" until just now. And there is the problem with "new" search engines. Unless you can come up with what would have to be one of the greatest ad campaigns the world has ever seen, no significant number of users will know you exist. Where does the money to pay for that ad campaign come from? How will a search engine generate money to stay relevant? Once people see you becoming relevant, they will figure out how to game your system. It's just the nature of the beast. I don't think I'm being overly cynical about this either.

marginalia_nu · on May 16, 2022

Why would I need to generate money to stay relevant?

dylan604 · on May 16, 2022

The first "relevant" was the wrong word; "sustainable" would be more appropriate. That's on the assumption that hosting the search engine isn't free, and unless it is supported by a generous benefactor, it will need a way of generating money to keep the servers running.

marginalia_nu · on May 16, 2022

I'm self hosting so my operational cost is like $50/mo.

throwaway14356 · on May 16, 2022

then he must be relevant

_ktx2 · on May 16, 2022

Agreed, the general internet is not dead, but the majority of internet users are on Facebook, Twitter, Reddit, HackerNews, Instagram, Google Maps, Discord etc.

From my perspective, we onboarded many (if not most) people to the internet after 2007 (the explosion of social media). People sticking to big sites really speaks to an inability to explore the larger internet and a lack of knowing why you would even want to.

kthejoker2 · on May 17, 2022

I think the answer is in the name: "social" media.

Most (99%) people use the Internet most (99%) of the time to see or hear what other people are up to. The big sites are where all the other people are. QED.

(This comment falls into that space)

Vladimof · on May 16, 2022

I added it to my list of search engines on Firefox... your favicon is really small, is that on purpose?

boplicity · on May 16, 2022

> No-one visits websites anymore, everyone has moved to the 10 biggest websites and all data is now siloed there.

Really? We make our living running a small web based publication; around 40k readers a month. I know of many other sites like this. Google, and other search engines, depend on niche websites to provide quality search results. Without sites like ours, the internet would truly be dead, and search would be mostly useless. Our "traffic sources" come from a mix of Facebook, Search, Reddit, etc, in addition to our many loyal readers.

Others in our niche are producing blog spam, which looks nearly identical to people who aren't experts in the field, but we have real experts, fact checkers, etc, as part of our production process. This is a big problem: These low quality websites get similar rankings to our own, which does make it much harder for people to get quality information via search. (Hence the general shift towards trusting social recommendations, such as from Reddit.)

In short, the WWW is alive and well, it's just buried under a bunch of #$#$%.

rchaud · on May 16, 2022

> Our "traffic sources" come from a mix of Facebook, Search, Reddit, etc, in addition to our many loyal readers.

40k/mo is a pretty good number for an independent website. As a word of warning though, relying on social media reach is a dangerous game, as there is anecdotal evidence that tweets with outbound links don't get as many impressions as those that link to in-site content, like another Twitter post.

As for Facebook, well, there's a good comic from The Oatmeal (enormously popular on FB back in 2010) that talks about what happened in the long run:

https://twitter.com/Oatmeal/status/923250055540219904

matheusmoreira · on May 16, 2022

The internet itself is probably gonna die soon anyway. Every country wants to impose its own laws on it. I think it'll eventually fragment into multiple segregated continental networks, if not national ones, all with heavy filtering at the borders.

I'm happy to have experienced the free internet. Truly a jewel of humanity.

dreen · on May 16, 2022

I think this was inevitable all along, something similar happened to radio if I'm not mistaken.

However, the good news is that we will never stop reinventing everything. The real value of the old internet was showing us what is possible.

nonrandomstring · on May 16, 2022

> The real value of the old internet was showing us what is possible.

Of equal value is that it showed us what not to do.

We have 30 years of documentation for research on exactly what a successful intra-planetary network needs to be immune to. A successful future network must build in resistance to all forms of human psychopathology from the ground up.

pde3 · on May 16, 2022

This is a nice fantasy, but it's a fantasy. The tech stack and network we have is too dense a forest to be replaced by clean slate designs. But maybe some of the problems could be improved with some new platforms and APIs. Mind you, ML is making so much progress so quickly that what happened over the last thirty years is at best a partial model of the problem we have to solve now, and the tools we have to do it with...

nonrandomstring · on May 16, 2022

> ML is making so much progress so quickly that what happened over the last thirty years is at best a partial model of the problem we have to solve now, and the tools we have to do it with...

Sorry I don't see how ML can help here. It seems like another thing to pin hopes of repairing an already too broken system on.

"We cannot solve our problems with the same thinking we used when we created them." -- Albert Einstein

"A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it." -- Max Planck

We are the dying generation my friend. We built it. They came. It didn't work. Surely if ML can do anything it's telling us that we need to tear down the old system completely and start again, don't you think? Adding sticking tape won't help.

edit: turning a grunt into an honest question

cesarb · on May 16, 2022

> I think it'll eventually fragment into multiple segregated continental networks, if not national ones

That's exactly the world in which the Internet grew. There were multiple segregated national and sub-national networks, and the Internet was built as a means to interconnect them. After some time, the Internet protocols ended up being used even within these networks, but that was not originally the case. And even today, there are still things like the AS (Autonomous System) concept which permeates the core of the top-level Internet routing protocols, which still reflect the Internet being a "network of networks" instead of a single unified network.

That's why I'm not too worried about the Internet fragmenting; we've seen this before. What happens next is gateways between the networks, and there are already shades of these in the VPN providers which allow one to connect as if one were located in a different network, often from a different country.

Whiteshadow12 · on May 16, 2022

This made me sad; the optimist in me believes that some alternative will be built that could take us back to those days. Honestly, I do feel that for most of my life I experienced a mostly American Internet (from South Africa). As long as one can still hop from one internet to another in as simple a manner as possible, it might not be as bad as it could be.

matheusmoreira · on May 16, 2022

I'm sad as well. To me it feels like we're already living in a cyberpunk nightmare, things just keep getting worse and there's nothing anyone can do to stop it.

7sidedmarble · on May 16, 2022

The networking may have been open like that, but I'm not sure the content ever was. It seems to me like a lot of internet users consume mainly the content of sites from their country. Kind of hard to blame them when that content is probably going to download fastest. But the language barrier has also kept the internet from becoming truly global.

kmlx · on May 16, 2022

> I think it'll eventually fragment into multiple segregated continental networks

i think it already has.

the Great Firewall of China is the classic example, but I think the trend started in the west with the Right to be forgotten/right to erasure in Europe, and subsequent HTTP Status 451 Unavailable For Legal Reasons. GDPR just further cemented the split between Europe and the rest, and the new DMA & DSA regulation in the European Union finally makes it clear. The writing is of course on the wall, so countries like India or Australia aren't too far behind. Places like California also have their own "right to be forgotten", and I'm sure the US will not be left behind for too long before we see regulation further splitting their internet from the RoW. And I don't think the RoW will hold off much longer till it also splits into multiple big blocks. It's the start of the new "nationalist" internet, and I'm sure we'll all be poorer because of it.

matheusmoreira · on May 16, 2022

Exactly what I mean. There is no way to have an international network with national borders. Telecommunications providers have always been centralized and have always been in bed with the government. Only way we'll ever be free is if someone invents some kind of decentralized long range wireless mesh network.

eloisius · on May 16, 2022

Good luck, spectrum is highly regulated in every country I can think of. If national governments don’t want you networking across borders, you’re definitely not going to be broadcasting long range radio transmissions that way. In fact, it’s currently illegal to transmit encrypted data or to relay packets via ham radio in the US.

matheusmoreira · on May 16, 2022

Who knows? The whole point of decentralization is for there to be so many nodes in the network they can't possibly take them all down so that it's pointless to even try. What if all smartphones formed a mesh network? There aren't enough prisons in my country for all those criminals.

eloisius · on May 16, 2022

I agree with your ethos, but I don't share your optimism. If the state wants to enforce networking firewalls along national boundaries, no technological solution will save us in general. As a resourceful techie with the right know-how you may be able to sneak your packets through, just like people in Cuba receive a literal packet of data via sneakernet, but if the state doesn't want widespread meshnets circumventing their firewall, they will imprison you for emitting pirate radio signals, they will penalize any electronics manufacturer that makes non-compliant hardware, and rest assured that companies will go right along. Liberty requires more than technical solutions.

I'm saying this as someone who once wrote a decentralized P2P mesh for instant messaging[1]. I was inspired by the HK protests going on ~2014 after hearing that they were using Bluetooth chat apps. Luckily Matrix, Telegram, Signal, etc. mostly solved the problem. Still, I don't think any amount of mesh networking would turn back the tide of Hong Kong now.

[1]: https://github.com/zacstewart/comm/

groby_b · on May 16, 2022

>What if all smartphones formed a mesh network? There aren't enough prisons in my country for all those criminals.

There don't need to be. You publicly gruesomely execute the first 100 or so you catch, and the practice of running a mesh node on your cell phone will fall so far out of fashion that the network breaks.

Societal shortcomings cannot be fixed via tech alone. If you can't build a society resilient to authoritarianism in the first place, tech will not help you. It can be used to increase resilience, but that's far from fixing the problem by itself.

politician · on May 16, 2022

Like Starlink?

matheusmoreira · on May 16, 2022

Starlink is maintained by a company, it's an internet service provider. One visit from the police and they'll censor anything.

The mesh network should be made out of common hardware in order to be viable. I'd suggest phones but those devices are owned before they've even left the factory.

Nextgrid · on May 16, 2022

One visit from the US police. US-unfriendly countries have no leverage over it, and similarly, the US has no leverage over satellite ISPs based in countries they aren't on good terms with.

jrockway · on May 16, 2022

> US-unfriendly countries have no leverage over it

"Star Wars Episode 10: The one that's not fiction."

Nextgrid · on May 16, 2022

Internet censorship isn't worth going to war over and disclosing secret anti-satellite weapons that are better saved for a rainy day.

jrockway · on May 16, 2022

It's probably easier to just cut off outgoing payments to Starlink anyway. They're not a charity, so if they don't get paid, they probably don't want to provide service just to send a message to some random government.

On the other hand, if you want to demonstrate that you have anti-satellite capability it's probably a better idea to shoot down a corporate satellite than a military one. The Soviet Union shot down Korean Air Lines Flight 007 and it didn't start a war, after all.

Nextgrid · on May 16, 2022

> It's probably easier to just cut off outgoing payments to Starlink anyway.

Cryptocurrencies might be a problem in this plan, and satellite internet access itself might become a currency (since, unlike cryptocurrencies, this one has an almost intrinsic value and provides its own infrastructure that's very hard to block, whereas cryptos rely on external sources of Internet access).

It also depends - drugs have consistently won the war on drugs despite being a physical product that needs a local supply chain and various anti-money-laundering and banking/finance regulations that should make it hard to fund the operation. Satellite internet access is likely to be even easier as it doesn't rely on a physical product (if we reach this stage there's going to be clandestine satellite terminals built locally, so blocking shipments of the real thing isn't going to cut it).

The only solution, apart from North Korea-levels of isolation (and even then, NK has the advantage of their population being isolated & indoctrinated since birth, something most other countries won't achieve even if they turned authoritarian overnight) would be detection followed by harsh punishment, but this has the downside of not only wasting the disclosure of detection capabilities (that are useful to the military) but also outsourcing the R&D of evading such capabilities into the open which enemies will no doubt pick up on too and use against you in a conflict.

ricardobeat · on May 16, 2022

Starlink connects to standard internet gateways on the ground. It cannot function without the 'regular internet', unless a replacement appears.

dotnet00 · on May 16, 2022

IIRC there was mention of it providing some p2p network style communication capabilities for Ukraine's military, and one of the reasons it's appealing to the US's military is the ability to route communications entirely within the network (well, with the gen 2 satellites which have laser interconnects).

So it can (at least eventually) function without 'regular internet', although I would still be hesitant to call it a viable infrastructure choice if the goal is to get around government control, simply from how much SpaceX have to appease the government to do anything space related.

black_puppydog · on May 16, 2022

These discussions always make me recall Jacob Appelbaum. Think of him what you want, but this statement of his really stuck with me at the time. Paraphrasing:

The real dark-net is facebook. Everything that goes in there never comes out again and is basically invisible to the world, except if you join facebook yourself.

My own prime example of that used to be pinterest: it seems to be a 100% sink in the directed graph of internet links. But since Appelbaum stated this, instagram (also facebook, of course) is trying hard to push pinterest off that particular throne.

LegitShady · on May 16, 2022

to me this is also discord - which seems to have become the chosen alternative to online forums for many communities and basically hides what used to be the public face of those communities.

Gigachad · on May 16, 2022

Interesting thought. I just went through my browser history and realised that almost every time I use google search, I already know what website I want, I just don’t know the exact link/page. I’ll use google because the search on stack overflow or reddit sucks, but I know I’m looking for a page on one particular site.

Pelam · on May 16, 2022

I realized this too. I disabled search from address bar and started bookmarking everything even remotely sane I see. I often add a few personal keywords to the bookmark bar.

It is starting to pay dividends. Instead of weird stuff thrown up by google when I type in something, I get the "oh yeah, that was the page" from a short list of bookmarks shown to match the words.

npilk · on May 16, 2022

I had the same realization and ended up setting up a simple Cloudflare script to automatically do an “I’m Feeling Lucky” style search to return the first result: https://notes.npilk.com/custom-search
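For anyone curious how such a shortcut can work, here is a minimal Python sketch of the same idea as a tiny local redirect service. It is not the linked Cloudflare script: as an assumption, it swaps in DuckDuckGo's backslash prefix (a query starting with `\` jumps straight to the first result) rather than Google's I'm Feeling Lucky. The names `lucky_redirect_url`, `LuckyHandler`, `serve`, and port 8080 are illustrative choices.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse


def lucky_redirect_url(query: str) -> str:
    """Build a DuckDuckGo URL that goes directly to the first result.

    Assumption: relies on DuckDuckGo's backslash-prefix shortcut,
    not on whatever the linked Cloudflare script does.
    """
    return "https://duckduckgo.com/?" + urlencode({"q": "\\" + query})


class LuckyHandler(BaseHTTPRequestHandler):
    """Answer GET /?q=terms with a 302 redirect to the top result."""

    def do_GET(self) -> None:
        query = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        if not query:
            self.send_error(400, "missing q parameter")
            return
        self.send_response(302)
        self.send_header("Location", lucky_redirect_url(query))
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the redirect service locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), LuckyHandler).serve_forever()
```

After calling `serve()`, registering `http://127.0.0.1:8080/?q=%s` as a custom search engine in the browser would send every address-bar search straight to the top hit.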

shortformblog · on May 16, 2022

I think this is a tad reductive, but I will say that we sure let a lot of big companies convince a huge portion of the population to create all of their content on platforms that they have no real control over.

The problem is, many of them didn’t realize this was a problem until recently.

That said, plenty of exciting stuff is happening outside of the walled garden, as long as you know how to find it.

Gravityloss · on May 16, 2022

And not only did this happen already over a decade ago, a lot of the current internet users have never known anything else.

We had a discussion with coworkers and somebody mentioned irc. Explaining to younger colleagues what it was and that it was not a product of a company, but operators had servers that formed a network, and it was more like infrastructure. Felt weird.

kasey_junk · on May 16, 2022

Most of the kids in my 3rd grader's peer group understand federated infrastructures quite well because of Minecraft.

Perhaps it wasn’t the federated nature of irc that was surprising but the fact that it was irc?

mst · on May 16, 2022

Isn't minecraft more decentralised than federated?

IRC networks usually have multiple servers connected together (historically, often run by a bunch of different people) and I didn't think people self-hosting minecraft servers usually did that?

shortformblog · on May 16, 2022

I think honestly it highlights the power of marketing as much as anything else. In some ways, building an open network is always going to put you at a disadvantage to a company that can throw money at user acquisition and PR teams. That federated networks like Mastodon have seen growth reflects the fact that word of mouth still means something in 2022.

Elvie · on May 16, 2022

isn't Discord a bit like IRC used to be?

ori_b · on May 16, 2022

How do I connect to a self-hosted discord, and then connect it to my friend's self-hosted one?

And where do I get the RFC for the protocol so that I can write my own compatible implementation?

IRC isn't a product. It's a standardized protocol sufficiently simple to implement in a day or two.
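To illustrate the "implement in a day or two" point, here is a minimal Python sketch of the line format from RFC 1459 (refined in RFC 2812): an optional `:prefix`, a command, space-separated parameters, and an optional final parameter introduced by ` :` that may contain spaces. It deliberately ignores IRCv3 message tags, CTCP, and length limits, and the names `IrcMessage`, `parse_irc_line`, and `format_irc_line` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IrcMessage:
    prefix: Optional[str]  # sender, e.g. "nick!user@host"; None if absent
    command: str           # e.g. "PRIVMSG", "PING", or a numeric like "001"
    params: List[str] = field(default_factory=list)


def parse_irc_line(line: str) -> IrcMessage:
    """Parse one line: [':' prefix ' '] command {' ' param} [' :' trailing]."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # The prefix runs up to the first space.
        prefix, _, line = line[1:].partition(" ")
    trailing = None
    if " :" in line:
        # Everything after the first " :" is one parameter, spaces included.
        line, _, trailing = line.partition(" :")
    parts = line.split()
    if not parts:
        raise ValueError("missing command")
    command, params = parts[0].upper(), parts[1:]
    if trailing is not None:
        params.append(trailing)
    return IrcMessage(prefix, command, params)


def format_irc_line(command: str, *params: str) -> str:
    """Build a line to send; the last parameter becomes trailing if needed."""
    parts = [command, *params[:-1]]
    if params:
        last = params[-1]
        # Empty values, values with spaces, or values starting with ':' must be trailing.
        needs_colon = not last or " " in last or last.startswith(":")
        parts.append(":" + last if needs_colon else last)
    return " ".join(parts) + "\r\n"
```

A working client mostly adds a socket loop on top: send NICK and USER to register, then answer every PING with a PONG carrying the same parameter.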

mywaifuismeta · on May 16, 2022

I no longer see Google as a neutral "search engine" the way it used to be. Now it's just another company that owns and promotes certain types of content, no different from reddit. For some things Google has the best content, for some things Twitter or Reddit have the best content.

maxwelldone · on May 16, 2022

Back in 2000s Google used to be the place for any type of search (IIRC).

Now, I've been conditioned to use it only for specific use cases, mostly for convenience. Some examples include:

1. Anything programming related (searching for man pages, error codes etc) is straightforward. (I do have some UBO filters to exclude SO copycats)

2. Utility stuff like currency conversion, finding time in another city, weather etc.

Where Google has really fallen behind is in multimedia search. Not sure if it's due to copyright issues or not but Bing and Yandex provide way better service in this regard.

Not to mention the "reddit" suffix I need to add to any search that even remotely calls for public opinion. In many cases, Google is just a shortcut to take me to the relevant subreddit.

ufmace · on May 16, 2022

Programming-related stuff seems to have gotten a lot worse in the last couple of years. Now most terms, at least for common things, return a ton of blogspam, when the official docs or SO are usually the best source.

LegitShady · on May 16, 2022

another thing seems to be prioritizing current news over past news, which makes searching for old articles you've read quite difficult.

photochemsyn · on May 16, 2022

I find one of the best ways to find interesting content on specific subjects using Google is now to start blocking all their top returns (a lot of SEO spam). This is somewhat tedious (lots of -site:seospam.com) and Google doesn't like automated queries. However, a few rounds of this often turns up interesting content down low in the search results. Just don't take what's on offer on page one of search results, basically.

Where it's gotten really bad is news searches, as Google either now has some kind of shitlist of independent news sites that it won't allow to show up on, for example, site:youtube.com searches - or, it's filtered through a guest list. It's hard to tell which strategy they're using, but news is definitely being heavily filtered based on very dubious propaganda-smelling agendas.
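The manual -site: blocking described above can be scripted so the exclusion list lives in one place. Below is a minimal Python sketch, assuming you open the resulting URL in a browser yourself (Google doesn't like automated queries). `build_filtered_query`, `google_search_url`, and the example domains are illustrative; `q` is Google's standard query parameter.

```python
from typing import List
from urllib.parse import urlencode


def build_filtered_query(terms: str, blocked_domains: List[str]) -> str:
    """Append one -site: exclusion per blocked domain to the search terms."""
    exclusions = [f"-site:{d.strip()}" for d in blocked_domains if d.strip()]
    return " ".join([terms, *exclusions])


def google_search_url(query: str) -> str:
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical personal blocklist; grow it after each round of results.
BLOCKLIST = ["seospam.com", "copycat.example"]
```

Each round of searching then only means adding another domain to the list rather than retyping the exclusions.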

xvello · on May 16, 2022

You might be interested in using uBlockOrigin and https://letsblock.it/filters/search-results to easily block these domains. In addition to your own domain list, you can use the community-maintained SO / github / npm copycat lists.

dixego · on May 16, 2022

Google is an advertising company. It has been for a good while.

big_blind · on May 16, 2022

Yeah I use you.com and kagi.com. No advertising on either. Less SEO spam too it seems.

Cthulhu_ · on May 16, 2022

I don't believe the WWW internet is dead; there's still millions of webpages being made and published every day. However, the traffic numbers are skewed in favor of the big socials and aggregators; I wouldn't be surprised if the 80/20 rule applies there.

pnutjam · on May 16, 2022

There seems to be a tendency towards video that undercuts the "old internet". I prefer instructions in a text or list format, but that's almost impossible to find for things like changing the headlight bulb on my Traverse.

1. turn the wheel so it is pointed hard in the direction of the bulb you are changing.

2. remove the hex screws from the shroud in the wheel well

3. pull the shroud down, it's pretty flexible plastic.

4. reach up and change the bulb. The wires are a bit short so you might need to get both hands in there. I have big hands and I'm able to do it.

There are innumerable videos explaining this process, but very few text directions.

ElevenLathe · on May 16, 2022

I think this is actually because real, fluent literacy is still rare even in highly developed places. It may be easier for a very literate someone to dash off those instructions but most people are 1000x more comfortable making a little video. Same goes for reading vs watching the video.

This is my same theory about meetings being universally preferred to asynchronous email, even when literally all the questions someone asks at a meeting have already been answered in my long form email.

Most people, even if they can read, are not really comfortable with it. Doubly so for writing. There used to be no choice to function in society, but increasingly we can use technology to substitute for reading and writing effectively, so people do.

pnutjam · on May 16, 2022

You're probably right, it's just so frustrating.

I think I'm going to start compiling stuff like this in my git repo.

Jiro · on May 16, 2022

Even something like that flounders on the question "these instructions say to pull down the shroud, what is a shroud?" or "I can't find those hex screws, where are they located?" Repairs are inherently visual, although text with illustrations might work.

captainmuon · on May 16, 2022

But Twitter, Reddit, HN, and most other such places are just websites and can be indexed fine. Same with Wikipedia, which is very much a silo (they don't have regular links in text in the hypertext spirit, but only footnotes).

Facebook and Instagram are more of a walled garden, like Quora, but there is a lot of junk there anyway.

It's sad for the WWW, but I don't really think it is a fundamental problem for search engines. In fact Twitter for example gives a direct pipe to Google. If you tweet something, it is immediately findable. Similar for StackExchange, but there I think the site is so "small" that Google can afford to just continuously index it.

ratww · on May 16, 2022

Twitter and Reddit still can be indexed, but they've also become increasingly hard to use without an account. Reddit doesn't let you fully expand threads when you're unlogged. Twitter limits the amount of things you can read and shows a modal. Both of them heavily limit usage on mobile devices without installing an app.

Sure, an account is free, but it might require giving information you don't want to give. (Twitter asks me for a phone number a few minutes after creating an account, even if I don't post anything.) Reddit at least lets you skip giving an email.

Sure, there are workarounds such as using lite versions (old Reddit, mobile Twitter), but that's not known to all people coming from a search engine.

It feels as if HN is the only one that's not a partially walled garden yet (and Wikipedia, of course).

airstrike · on May 16, 2022

> Reddit doesn't let you fully expand threads when you're unlogged.

that's what old.reddit.com is for!

FargaColora · on May 16, 2022

old.reddit will be gone soon, it is inevitable. Especially once they go public.

aceazzameen · on May 16, 2022

Yup. It's bound to happen. And when it does, Reddit will no longer exist in my eyes.

azemetre · on May 16, 2022

Agreed. IDK how I feel about Reddit. I've been on it since 2010 when Fark lost its spark. I remember some great times but a lot of it was "junk" content that in the end was very meaningless. I wish I could say I used it to develop my career in tech but that isn't true either; I use specific blogs, books, and tutorial sites to learn instead.

I suppose I mostly view it as a continuous party, yeah it's fun if you attend but after a few hours I wish I was doing something more productive.

ntauthority · on May 16, 2022

Isn't it a bit ironic that a site - or its operator - 'going public' means all the content on said site actually 'goes private'?

ratww · on May 16, 2022

Exactly, I mentioned it. But not only is it bound to go away sometime, it's also not trivial to find for anyone who's not an expert Reddit user, unfortunately.

TheRealDunkirk · on May 16, 2022

And isn't it great when you get a link to Reddit or Twitter, click it, try to navigate to the comments for context or the answer, go to click the link to expand it, and then get a demand to log in and install their app? Don't talk about walled gardens and not include Reddit or Twitter just because they let you look at one brick before demanding their tax.

simion314 · on May 16, 2022

This is not true, except maybe for a subset of Internet users.

For example, you have wikis and forums. Wikis are good for communities that are passionate about a topic and collaborate on building content for their passion. Reddit is a valid alternative to forums, but if the community is older and has technically competent members, then they usually have their forum customized for their purpose, and the forum will continue to exist, especially if they want to avoid third-party censorship.

I have never searched for something and found answers on Facebook. Very rarely I find something that points to Instagram blogs/posts, but never Facebook.

Probably depends on your location and what you search for, so it might be possible that 99% of your Internet consumption is satisfied by 5-10 websites.

baxtr · on May 16, 2022

I am not so sure...

I think what happened is this: the WWW was everything back in the day. But in the "old days," only 10% of all people were online, the web elite. Then AOL came, and the rest came online slowly but surely. The so-called "mainstream" people were not geeks; they were "just" ordinary people. Almost all were captured by what you call "big websites".

Now, we see the 100% being dominated by the 90%. That's why "Google results are bad". Bad for us! But maybe (most probably) not for them.

nl · on May 16, 2022

Eternal September was Sep 1993. AOL hit the internet in March 1994.

Netscape didn't launch until December 1994 (and the WWW was nothing before that. I subscribed to a mailing list announcing newly released sites, and on most days I'd visit most of the new websites on the internet with the Cello browser in my uni labs).

AOL users have been there since the beginning of the WWW.

https://en.m.wikipedia.org/wiki/Eternal_September

CWuestefeld · on May 16, 2022

My recollection is that the AOL event you reference was only making usenet accessible - a point that makes good sense in the context of the eternal September.

But when talking about the WWW, that's a very different story. I think that AOL didn't incorporate a web browser until quite some time after that.

nl · on May 17, 2022

The WWW took off when Netscape shipped in late 1994.

AOL users could use Netscape from the beginning.

jspaetzel · on May 16, 2022

This is so incredibly false. I've been working on a project for the last six months, and month over month I've seen a steady increase in usage. Tbh, much, much higher usage than I expected. Most users find my site via Google or Facebook, but they are looking for content that's not in those silos and have no problem leaving them.

If you have high quality content and you get it indexed properly by Google, users will come.

There are reasons users are not using your website.

1. It's not solving a problem people have.

2. Users can't find it.

Who, in their right mind searches for search engines? Nobody I know.

If you want users you have to go out and get them (literally pound the pavement and talk to people) or create a LOT more content ironically, so they can find your site on the search engines they are using today.

psyc · on May 16, 2022

Based on my observations over the past year, I’m certain that Google and Bing choose not to show us most of the web anymore.

I usually find what I’m looking for. It just takes literally three orders of magnitude longer than it used to for the same kind of stuff. I used to use Google a lot to jog my memory about various things I vaguely remembered. Type a few associative words and snippets, press Enter, done. Google’s useless for that now.

If you’re looking for hot pop shit in trendy publications, things to buy, commercial services to subscribe to - G has you covered. That’s what they do now.

jrussbowman · on May 16, 2022

"Want to become rich? Make a search engine which indexes the fresh relevant data from the big siloed websites, and ignores the general dead Internet."

Did that to some degree. Unscatter.com pulls from reddit and twitter to source links.

I found reddit only created an echo chamber bubble of obvious bias and twitter only diluted it a little.

Hnrobert42 · on May 16, 2022

As you describe this, it makes me think about how populations tend to migrate to cities and away from rural areas. There’s even a parallel to white flight in the emerging popularity of the chan/gab fora.

rchaud · on May 16, 2022

> Want to become rich? Make a search engine which indexes the fresh relevant data from the big siloed websites, and ignores the general dead Internet.

That would be a great service, but it certainly wouldn't make you rich. Where's the money going to come from? Google got rich because they acquired an ads platform (DoubleClick) and an analytics platform (Urchin) and started monetizing the vast amounts of data they had. That was years after Google had established goodwill as the best search engine.

big_blind · on May 16, 2022

I use beta search engines. On kagi.com and you.com you can prioritize and filter top sites. There's also no advertising on either. I've just stopped using Google altogether and it's improved search so much.

stackbutterflow · on May 16, 2022

I think you're generalizing your own behavior. I regularly use Google to search for topics that cross my mind, and I end up on many websites that are not one of the giants in your list. It's a fun activity. If people stick to the same 10 websites, that's on them. Nothing prevents you from exploring the web.

MockObject · on May 16, 2022

> Nothing prevents you from exploring the web.

What prevents you from exploring the web is that you can't find anything but the same 10 sites through search engines.

PragmaticPulp · on May 16, 2022

> If I want to search for something topical and relevant, I go to Facebook, Twitter, Reddit, HackerNews, Instagram, Google Maps, Discord etc.

Maybe we’re searching for different content, but I disagree. While Google results are not without noise, I think it’s a huge exaggeration to suggest it’s useless. I still regularly find quality results from a quick skim of the first or second page of Google results.

Meanwhile places like Reddit, Twitter, and Hacker News are full of very strong opinions that feel truthy, but are mostly noise. Unless you go in with enough baseline knowledge to filter out 9/10 underinformed comments to dig out the 10% who actually have direct knowledge of the subject and aren’t just parroting some version of something they read from other comments, skipping straight to social sites becomes a source of misinformation.

altairprime · on May 16, 2022

If you want to be rich, solve search without full-text indexing of sites. PageRank only ever worked because of human curation of webrings. Full-text search made it easier to find content, and opened the door for spammers. The only viable route forward for search will be to replace full-text indexing with human curation, somehow. Solve how to scale that up instead, so that when everyone else realizes we need it for the health of the Web, you're ready.

hn_throwaway_99 · on May 16, 2022

Doesn't this site, and all of the content it links to, pretty much disprove your theory?

Yes, sure, I often do go to the "top sites" when searching for content, but I still usually start at Google. And, despite all the SEO spam, Google still does a fairly decent job of landing me on, for example, the appropriate Wikipedia page, Stack Overflow post, travel site, etc.

Jenk · on May 16, 2022

> If I want to search for something topical and relevant, I go to Facebook, Twitter, Reddit, HackerNews, Instagram, Google Maps, Discord etc.

High chances you will find a link to an external site over content actually on those big-name sites though, right? That tells us the organic web isn't dead, it's just hard to discover and navigate, most probably because of the SEO wars. The problem isn't the lack of content, it's the number of shitty spammy sites standing between you and the sites you actually want to see. Like a sleazy salesman trying to direct you to the crap-laden three-wheeled rust bucket when you were heading toward the family sedans.

samstave · on May 16, 2022

This MUST be the reason that they threw their purchase of Postini in the garbage and my GMAIL INBOX is filled with spam, and my "social" and "promotions" tabs don't filter....

GMAIL is garbage now; I literally use it as my spam email anymore. Which sucks, because I have had it for a really long time.

Anecdote on Yahoo! Mail: years ago I wrote to Yahoo support asking when I created my Yahoo Mail account (I'd had it since the 90s, when it first became available...)

And support told me that they couldn't tell me when my account was created, as that was *proprietary company information*

So I deleted my Yahoo account. I'm about to download all my Gmail and do the same.

mrtksn · on May 16, 2022

It has been dead for a while now and the whole society feels it globally. Things were getting so good, then they became horrible, and whoever cracks the path to the good stuff again will find great riches at the end of the path.

dotnet00 · on May 16, 2022

I agree that this seems way too reductive. I was recently reflecting on this and noticed that I constantly run across new blogs and sites whenever trying to learn something. I just don't usually pay much attention to the site name in the way that I remember HN, Reddit, Twitter etc.

So, while I would agree that some aspects of the old internet are dead (like 'small' ~1000 user forums focused on specific topics having largely been replaced by generally inferior subreddits and discord servers), I think it hasn't gotten as bad as you're making it out to be.

DebtDeflation · on May 16, 2022

Unfortunately, correct. The average Internet user accesses it via a phone, not a desktop, laptop, or even tablet these days. Most of that access is through apps, not a browser. To the extent that a user is looking for a factoid answer and does a search, a Google Knowledge Graph result with a Wikipedia link is probably enough in most cases. If they want a technical question answered, Stack Exchange; a product review, Reddit; nearby restaurants with reviews, Google Maps; etc.

hombre_fatal · on May 16, 2022

I don't get how TFA shows evidence of the Dead Internet Theory just because their site manages to attract ~zero users.

Just host a <form><textarea><button></form> at an IP address and notice it's just spambots submitting it with backlinks, not actual users. Doesn't mean the internet is dead nor that the indieweb is dead.

It doesn't really show anything other than the only people able to extract value from your creation are the spammers.

Shinchy · on May 17, 2022

I think you're thinking too narrowly about general chit-chat content. E-commerce, for example, still very much works through your own website. So, I would say, do documentation, e-learning, SaaS, company information, etc. It's a more purposeful web.

What is dead, though, is the general blog-like content and community platforms of old; the era of WordPress blogs, forums and hobbyist websites is certainly gone.

derefr · on May 16, 2022

> Make a search engine which indexes the fresh relevant data from the big siloed websites, and ignores the general dead Internet

I don’t understand why Google themselves don’t do this. LinkedIn v. hiQ demonstrated that they won’t get in trouble for scraping users’ subjective views of data within these silos and then stitching them together to form a cohesive whole. So where’s the effort to do so? It seems like the obvious step.

omoikane · on May 16, 2022

I think the Dead Internet Theory bit is just bait to get more comments. It's a bit of a stretch to conclude that the internet is mostly robots just because one website sees mostly robots. This extrapolation would be convincing if that one website were a high-ranking website that sees a lot of traffic, but searchmysite.net does not appear to be one of the top websites.

throw10920 · on May 16, 2022

> I agree: the WWW Internet is dead

I've heard this claim a lot, with 0 supporting evidence. Do you have any?

My own experience is that there are thousands of content-rich, high-quality blogs still being written by real humans, because I regularly find and bookmark new ones weekly, without even looking for them, so: please provide evidence for this claim that runs counter to my lived experience.

lkxijlewlf · on May 16, 2022

> If I want to search for something topical and relevant, I go to Facebook, Twitter, Reddit, HackerNews, Instagram, Google Maps, Discord etc.

Interesting. When I search for something topical I search those sites using Google, because almost all of those sites (I don't use some, like FB and Insta) have really shitty search.

Schroedingersat · on May 17, 2022

If they wanted good content, they shouldn't have coerced everyone with good content into turning it into illegible seo spam in order to appear in the top 10 pages. The writing was on the wall the first time a recipe site had to start writing stupid stories about their dog.

dageshi · on May 16, 2022

I agree with you to an extent. The web is less useful than it used to be. BUT I would say a lot of that usefulness has been diverted into YouTube. People who previously would have made sites are making YouTube videos instead, and YouTube is of course owned by Google.

NicoJuicy · on May 16, 2022

The big siloed websites are just indexes of fresh content though.

With a generic way to place comments on it.

ClumsyPilot · on May 16, 2022

> If you think it's bad for you, imagine what it is like for Google Search! Their entire business is indexing a medium which no longer has any relevancy.

Google was the one (among many) that killed it - so I am not gonna shed any tears.

ouid · on May 16, 2022

Google is still pretty good at searching reddit. Maybe reddit can acquire them.

big_blind · on May 16, 2022

site:reddit is just the best search engine at this point. I still don't like Google though.