Hacker News
The internet is an SEO landfill (sendwithses.com)
598 points by itom on June 23, 2019 | 421 comments

Google any recipe, and there are at least 5 paragraphs (usually a lot more) of copy that no one will ever read, and isn't even meant for human consumption. Google "How to learn x", and you'll usually get copy written by people who know nothing about the subject, and maybe browsed Amazon for 30 minutes as research. Real, useful results that used to be the norm for Google are becoming more and more rare as time goes by.

We're bombarding ourselves with walls of human-unreadable English that we're supposed to ignore. It's like something from a stupid old sci-fi story.

Those are the absolute worst. A recipe with someone's jackass life story for 2000 words. You start to wonder if it's even a recipe at all. You have to scroll for a good minute, hoping you didn't miss the recipe. Then come the ads that reformat the size of the page. Everything jumps around and you have to rescroll again.

That's exactly why I've stuck to cookbooks for the past year or so. More and more, "fuck the internet", long live murdering trees.

One thing about cookbooks is that the massive barrier to actually publishing and distributing is a natural filter for credibility; most cookbooks have tried-and-true recipes and are either authentic or a genuinely good adaptation by a skilled chef author.

The internet is full of garbage recipes that don't work, are inauthentic, and are uncreative, all at the same time.

As far as murdering trees go -- let's not forget that libraries and e-books exist.

The flip side of this is that the cookbooks rarely have super niche recipes. Even some of the highest rated "authentic" cookbooks have the flavor level tuned way down to be more "suitable for American tastes".

The best cooking tutorials are barely translated Youtube videos. Or even better, recipes going through Google Translate.

(Or get a multi-lingual friend... :) )

Cook's Illustrated also does some 100% legit international recipes; they are smart enough to A/B test against the country of origin and try to match the flavor profile with ingredients that are available in America, which can be more work than just directly translating to the "closest cultivar of a plant".

I've also enjoyed the organic explosion of low carb cooking that has happened online. I have been witness to the community growing from its very beginnings less than a decade ago when it was just some people trying to make weight loss taste good, to professional level chefs jumping in with their recipes. It is now possible to source low-carb recipes online that can go toe to toe with any other genre of food (e.g. https://alldayidreamaboutfood.com/chocolate-hazelnut-sandwic...).

Of course there is also an explosion of low-carb blog spam. Ugh. I do sometimes miss when everyone in the community was there because they loved just trying new things out and sharing their experiences. The same SEO problems exist...

I absolutely understand what you're getting at, but I have a reason to disagree. I find cookbooks help expand your cultural horizons in cooking, precisely because you can get "authentic" cookbooks from other countries. The Polish ones I have are in Polish, so that's easy. The French ones I have, though in English, were either written originally by Frenchies and translated, or by authors who also know English and had a good editor. Same goes for Japanese, Thai and German. But I also look for keywords about their grandmother. The best gauge of an English-language cookbook's authenticity is how much they admit they're stealing from their grandma. The higher, the better. Especially if they talk about struggling to get the right ingredients.

Honestly, I miss absolutely nothing from the internet when it comes to cooking. But I also wanted to be a chef, so I studied how to cook properly long ago and I cook almost every day... so I'll admit I'm already an outlier. Still, the way the internet treats cooking is retarded. It doesn't surprise me that people don't like to cook if they grew up with the internet as a learning resource. All these people overcomplicate easy recipes and substitute everything because it makes them feel like pretty snowflakes.

> is how much they admit they're stealing from their grandma.

Does that still work? Most people's grandmas today were very much alive and cooking in the 70s, 80s, and even the 90s, and were thus as steeped as anybody in all the international influences and 'foreign' ingredients that those decades brought.

> the way the internet treats cooking is retarded.

And here you and I disagree. I like the way that 'modern' cooking is willing to revisit well-established and 'sacred' truths about food and put them to the test. Perhaps a housewife in the 1920s isn't the authoritative source on the best way to prepare a dish, and even if she is, there is no way to know without someone actually testing that hypothesis. Sites like Serious Eats and all the sites they begat bring both rigor and the joy of experimentation back into cooking. I mean, why not substitute one thing for another just to see what happens, or apply the 'wrong' technique to a standard dish? Many a dish that is considered a 'classic' today no doubt got its start that way.

Culture is just one dimension. I've recently had to change my diet drastically for health reasons, and the niche I fall into (no starch, no dairy, very low sugar, otherwise known as dairy-free keto) is barely catered for, even on the Internet.

Although the trouble sometimes with "authentic" cookbooks is they occasionally depend on a certain level of shared intrinsic knowledge of the cuisine! That's a tough one to solve on your own.

It's amazing, really. My other half's family is Greek and it took me years to get the hang of some of their simplest recipes. The whole time they were saying "It's only olive oil and salt! What can be so difficult?"

> The best cooking tutorials are barely translated Youtube videos.

Yeah I love those. The "random grandma with a gopro" tutorials always work, modulo language barrier. The video editing is crap and you sometimes have to keep fast forwarding while stuff is cooking but the recipes are good.

Could you give us a link to a great random grandma with a gopro video?

I have been to what I thought was deep in the Internet recipe mines and never saw this.

Probably not a GoPro but Chinese food channels seem to fit this description fairly often. Here’s one https://youtu.be/fz_aSsuhg8Y

I learned a dozen things watching that video. Wow, thanks to you and the chef there.

If I live to be 90 I'll still never have the confidence to peel ginger like that...

You may like the "Pasta with Italian Grannies" channel. I recently discovered it and it's the best.

I don't know why, but I find it really relaxing to watch those kinds of slower-paced cooking videos.

It's good to slow down and enjoy those fairly simple things.

>> "I've also enjoyed the organic explosion of low carb cooking that has happened online. I have been witness to the community growing from its very beginnings less than a decade ago when it was just some people trying to make weight loss taste good"

This is way older than a decade. I knew people getting into it and sharing recipes on Usenet. The current boom is new, but online low carb communities are ancient in internet terms.

It wasn't very palatable back then, but people sure did try.

The original low carb fad was Dr. Atkins' diet in the late 90s.

Atkins goes back to the 1980s, and keto diets well preceded him.

There was already an active Usenet community for lowcarb diets in the late 1990s (alt.support.diet.low-carb).

In my experience, ebook cookbooks get annoying when you have dirty fingers and the screen goes dark because you forgot to switch it back to always on. And there's a great tendency that the recipe is always fucking formatted so you have to scroll more or "turn the page". Cookbooks, at least the ones I own, are all formatted so you never have to turn the page. Seems silly, but that's important.

Yeah, the library: check whether you want to buy it, or just write down a handful of recipes.

My strategy is to bookmark the sites of the people who have good published cookbooks. You have a much better chance of getting good recipes from someone who knows what they are doing, as you said, but you also get the additional convenience and variation from being able to search online.

In my experience, these sites tend to be less spammy too, but there's certainly variation there too.

I don't know about that. For every one 'Salt, Fat, Acid, Heat' there are 99 'middle-aged white person's easy diet fad of the year 5-minute meals'. Published cookbooks don't really seem better to me.

There seem to be two distinct 'genres' of cookbooks: one written by people who have studied cooking, and one written by people who have studied marketing.

It gets better. Or, well, worse. Last time I ran into one of those, I wanted to make curry. After the 2000 words gushing about how it was the most delicious thing ever and how they "couldn't even", the recipe was to pour a jar of premade curry over rice.

We started using an app called Paprika that parses the page and extracts the recipe.
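How Paprika parses pages isn't described in the comment. One common way a tool like it can locate the recipe is that many recipe pages also embed machine-readable schema.org `Recipe` data as JSON-LD inside a `<script type="application/ld+json">` tag, which lets software skip the story entirely. Below is a minimal Python sketch of that approach using only the standard library; it assumes the page carries such markup, the class and function names are hypothetical, and `HowToSection` step groupings are not handled.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _nodes(value):
    """Yield every JSON object, descending into lists and "@graph" arrays."""
    if isinstance(value, list):
        for item in value:
            yield from _nodes(item)
    elif isinstance(value, dict):
        yield value
        if "@graph" in value:
            yield from _nodes(value["@graph"])


def extract_recipe(html):
    """Return (name, ingredients, steps) for the first schema.org Recipe, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup rather than failing the whole page
        for node in _nodes(data):
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Recipe" not in types:
                continue
            raw_steps = node.get("recipeInstructions", [])
            if isinstance(raw_steps, str):
                raw_steps = [raw_steps]
            # Steps may be plain strings or HowToStep objects carrying a "text" field.
            steps = [s.get("text", "") if isinstance(s, dict) else s for s in raw_steps]
            return node.get("name"), node.get("recipeIngredient", []), steps
    return None
```

Pages without JSON-LD markup would need a different fallback (for example, HTML heuristics), which this sketch does not attempt.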

Although, I will admit, my youngest daughter and I get a kick out of the stories. We even try to predict which theme the writer will go with. Some are like Christmas Hallmark movies. lol.

Oooh, but look at all that "dwell time"; that's how Google's algorithm knows you're having a great time.

I switched over to DuckDuckGo several months ago. It's just as good as Google for almost everything.

The reality is these stupid recipe websites are just reproductions of other people's recipes anyway. Look for an "adapted from" credit - it will only have minute changes. Maybe they halve the cream somewhere "because it's too rich for their family" or something.

Yep, maybe using magazines and other gatekeepers of quality was not such a bad idea.

Blame the web, not the internet

For most of us "the web" = "the internet"

Re: recipes, one of the potential reasons for this is that bare recipes are not copyrightable by law. Courts see them as algorithms, almost -- e.g., there are only so many ways to cook a grilled cheese sandwich.

> 17 U.S.C. 102(b): In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.

So it's been a tradition in the recipe space for decades to include lots of descriptive "filler" text when you're writing your recipe book/blog. That filler text is copyrightable, which gives you legal recourse against people who might try to wholesale copy your work.

Of course, in the modern world, it's definitely true that they can also help with SEO. And honestly I wouldn't completely discount the idea of people actually reading them -- I know my wife has gone through phases where she has a couple different blogs bookmarked and will read through them to get ideas, but also because she liked the writing of the author.

Subscribing to r/baking has been enlightening. The people who write those pages of text you have to skip over to read their recipes actually think they're adding value and that the readers of their recipe blogs are there for the stories as much as for the recipes.

Maybe they're right and some people do love the stories, but they're definitely not just there for SEO anymore. At the very most cynical interpretation, scrolling through the story is the "cost" of the recipe - the blogger gets an ego boost thinking thousands of people want to read all about their life, and the readers get a recipe.

Many essays-before-recipes summarize the experimentation process: what worked and what didn't, why an unconventional ingredient is used, why no stock is used when traditional recipes call for it. Without that context, many recipes would just seem wrong, and some real gems would be passed over.

When those are the case, I really like it. Things like The Food Lab are joys to read, because they go into the detail of why ingredients are present and what variations/substitutions preserve the flavor of the dish. But those are few and far between. Most of the time, the filler consists of stories and anecdotes about the childhood memories that the recipe invokes, followed by the exact same recipe as is on a dozen other sites.

That's not quite the same as the pattern the parent describes, I think: there, the long post sits before the recipe rather than being integrated into it.

Ya, I believe the OP was talking about the five pages of story-telling before the recipe that requires you to annoyingly scroll until you find the recipe ingredient list. I came here for a cookie recipe, not a story about how you discovered you liked oatmeal cookies when you were 12, but then became a raisin-cookie fanatic, before moving back to chocolate chips, when you suddenly remembered your Grandma baked you peanut butter cookies as a kid. /sic

Oh, you forgot all their allergies. Every cook on the internet has an allergy to everything but the substitute. "That's when I found out I'm allergic to flour, eggs, milk, butter, salt, chocolate, meat and fruit. Here's my healthy recipe for chocolate chip cookies made from tofu, Himalayan sand, gerbil presents and wheatgrass." No! First, it's not a chocolate chip cookie then! It's an unholy abomination of sadness. Wheatgrass is just lawn clippings! Something is wrong with you!

Look, I have the very sad allergy to hazelnuts... I love hazelnuts... but they don't love me. Also soy. Soy is borderline deadly to me even in small amounts, so I can't have Asian food without cooking it myself. Fish sauce sans soy fermentation is my substitute. I do get the allergy issue and being left out of many foods. But I find it so damn hard to believe that all these people are allergic to everything but these "healthy alternatives" that end up being nutritionally empty. Then they wonder why they're suddenly anemic. Then they don't get enough sun because their skin "suddenly" became sensitive. Then they have depression issues. Then they can't shit straight. Then they start cutting back on other foods. Then they have to live on horse pills.

Sorry, rant. Got angry.

Fun argument to have with the next anti-vaxxer you meet. If they argue about mercury or lead being in vaccines, agree with them. Just flat out agree with how terrible it is. Then ask if they like Himalayan sea salt. There's like a 99.9% chance they love it, along with essential oils. "You do know there are more trace amounts of mercury and lead in Himalayan sea salt than in vaccines?" You will then agree with me that mental gymnastics needs to be in the next Olympics. It's amazing how they'll talk around it. Make some popcorn, open a beer and watch the show.

So true. Whenever I look up something now I just add Reddit afterwards. It's been working well so far, but I guess that can be manipulated as well.

There’s a reason that works (sorta, mostly, and when it does).

The infrastructure of web navigation and search is foundationally built on textual, first party published content that enjoys some presumptions of good faith authorship: that it was written to be read, by a reader, and meant to be informative, persuasive/argumentative, or recreational/entertaining.

Think: journal articles, Usenet debates, faq compilations, RFCs, product spec sheets, forum postings, IRC logs, ur-blogs, even the original social media content types (tweets, wall posts, comments).

The cycles and epicycles of architecting atop this “real” content give us mechanisms that reward and incentivize content produced neither for the ends of the writer writing nor for any reader reading, but rather in service of manipulating search and driving UX patterns of various darkness.

All the good stuff on the internet is reborn in a spasm of returning to people creating content they care about for other people who care. Then eventually with success it gets polluted into shit with bots, seo, mass media and agencies, influencers, etc.

One silly rubric I use is: could this page / app / venue have been there in 1999? If so it’s probably worth reading. If not, who knows.

(Of course there are cesspools that qualify too, like the chan world and certain subreddits, but there were cesspools in the early days of the web as well.)

Site searching Reddit can at least give some ideas for different words to try. Most of my problems finding things on Google stem from its declining ability to make a reasonable guess based on what I put in. It's not enough to have a word anymore. You have to know the right keyphrase or it shows a whole different world of results.

It used to show its best guess, then branch off from it with less specificity and more variety and, in general, a result reflecting what I meant within a few pages. Now Google is so desperate to get it right on result #1 that it never admits failure no matter how many pages you go.

When I search for free software I usually put forum in the search terms as it helps filter out the malicious sites and the software where they clearly ripped off the free software but ask for money to unlock the pro version. Plus I get to read why people recommend the software.

Not OP but yes, I’ve also been doing this for a long time and it does help.

Or “wiki”.

or use duckduckgo's bangs

Add site:reddit.com after every search!

Try googling something like "how to do X on Y" or "what's the shortcut key for A in B".

You have to crawl through piles of garbage like a raccoon to find the one-liner you were looking for.

I have a good example for this, try googling how to make Linux ask for a password when using sudo.

Yes, in the beginning Google gained a huge market share because they were able to return better results and didn't return spammy, unrelated ones.

But they are not doing that anymore; it is really hard to find a non-spammy result (in the sense you described) on Google these days.

And other search engines are only copying the Google algorithm. We need to get back to judging a web page by its content; AI should be able to help with that now or soon.

Google actively encourages the recipe sites to add the fluff.

Why doesn't Google disrupt itself -- the most concise, cleanest pages that focus on what the user is actually looking for (recipe, not a narrative) should be #1. It's even in Google's interest!

I think their market share encourages them to do nothing. You have to offer a good product to grow market share; then you slowly ruin it, then users switch to a new product that starts out good, and the cycle repeats.

Nobody would start using Google Search in its current format.

It has, and that's mentioned in the article. If you want to be #1 on the search page, buy an ad, because that will always appear above every "organic" search result.

Ads are Google's bread and butter, and have been for the entirety of its existence as a public company.

Re: recipes, you might be interested in this browser extension, which skips to the actual recipe text:


> Google any recipe, and there are at least 5 paragraphs

I just Googled Peking duck to try something at random. Of the links on the front page, one was to Wikipedia, one was to an article about Peking duck, and the rest were recipe sites (BBC Food, Allrecipes, Serious Eats etc.) or serious-looking/sounding food blogs that had the recipe front and center. I don't know if it's a geographic thing (I'm in Europe) or if Google has learnt my preferences, but finding 5 reasonable-sounding recipes for Peking duck with Google was trivial.

Edit: Played around some more and spotted an interesting pattern. If I looked up typical US dishes, like key lime pie or BBQ brisket, I got results more like the ones you describe. But when I looked up more 'international' dishes like Peking duck or coq au vin, the results were almost all good-quality recipes.

> Google any recipe, and there are at least 5 paragraphs (usually a lot more) of copy that no one will ever read, and isn't even meant for human consumption.

And here I just thought people were basically self-absorbed wankers.

My most recent impression:

"I first learned about this recipe from my dog's aunt who was a lenape indian during the pre-cambrian era and she something something memories of childhood blah blah dinosaur steaks yap yap tedious digression on gritty herbs fluff fluff and so I was talking to my local grocer who put me onto an organic vegan gluten free biomagnetic interspordidial hyperflorpic gmo-free meg-free bpa-free taste-free free-free that reminded me of the time that..."


People are lazy and mostly do not want to write a 1,000-word essay around the recipe they want to share.

It takes a very unique kind of person to write those things. And those are the ones Google is giving all the voice, while taking it away from normal people.

that really made me laugh - i'm glad i had no liquids in my mouth at the time. =)

You could always, stay with me here, pay for content. Cooks Illustrated (no affiliation) is an awesome resource for those interested in recipes that are proven without the garbage.

> Google "How to learn x"

On the other hand, if you Google "How to do X" you will often find a youtube video in the top results. But it will skip the intro and jump right to the important part.

Sure, there is a lot of garbage out there (e.g. everyone uploading 10-minute videos to hit the YouTube optimization algorithm), but it does seem like Google tries to cut through some of the SEO garbage to get you the useful info faster.

The fact that you have to skip the intro to YouTube videos speaks to Google's algorithms. Even if people don't have ten minutes' worth to say about a particular topic, they'll still upload a ten-minute video.

Once people figure out which metrics get more views, those metrics become meaningless.

This starts happening the instant a medium becomes a potential source of revenue. Look at most average TV: endless repetitive nonsense plots, drama, and other filler to serve as a medium for ads.

Don't forget video games and loot crates! Most big-budget games are just a formula to encourage kids to shell out more money.

Is this the internet we were warned wouldn't exist without pervasive ads and spying?

What I don't understand is why Google doesn't hell ban these pages. It seems trivial, at least as a human, to detect them. You don't even need to do this for all of them. Just banning a few hundred will deter everyone else.

> What I don't understand is why Google doesn't hell ban these pages

Because Google doesn't have a financial incentive to show you quick, quality answers to your questions.

Google makes more money from you flailing around and fruitlessly searching over and over than giving you the right answer the first time.

> Google makes more money from you flailing around and fruitlessly searching over and over than giving you the right answer the first time.

That will work right up until somebody comes along and builds a search engine (or other app) that will just "give you the right answer the first time."

For all their size and power, there's really nothing, in principle, stopping somebody from coming along and doing to Google what Google did to AltaVista. Yes, it would be hard to execute, which may be why it hasn't happened... yet.

What exactly is being hellbanned here? Any web page with an uninteresting preamble like most articles and blog posts online?

Spammy websites. A flag tool given to long time Google account holders would be helpful.

To be honest, I just tend to use one recipe site now: BBC Good Food. I ended up subscribing to the magazine since I was getting so many recipes from there. The user ratings on the recipes don't seem to be gamed either, since there is no incentive to do so.

I just add "serious eats" to my searches.

Ya, but how do you deal with it? You're just fighting a tug of war. Unless we go back to curated or paid content, it is always going to be algorithm versus SEO in an infinite race.

See also: most long-form journalism.

Long-form journalism is what it’s advertised to be, at least.

Recipes are really bad on Google. There are also some odd aggregator sites, I guess? They just seem to copy from other places. I don't know what their story is, but they have tons of low-effort recipes.

Also superficially relevant recipes or just regurgitated content.

Bounce rates are supposed to demote this kind of content, even though it doesn't seem to work properly, IMO.

Indeed most of the content on the web is SEO fluff or veiled attempts to push an affiliate link.

Worse, almost every site has a cookie “preference” popup (when everyone knows the preference is just the minimum necessary cookies), a newsletter signup popup, and a browser notification request popup.

Add to that the autoplaying videos that will pop out into their own overlay if you scroll away plus the rotating ads of different sizes which cause the page content to shift up and down... oh and the fancy morphing page headers that hang down over the content you’re trying to read if you don’t scroll enough at first.

The web is currently a shitshow of comic proportions, the likes of which not even the most cynical comedians accurately predicted.

The desktop app situation isn’t as bad, but it follows a similar trend of demanding more from the hardware and user (and network) while providing less.

It might actually be better overall if mobile and desktop performance had not increased in the last decade. I swear the end user experience hasn’t on average improved even a fraction as much as the hardware capability has.

What a dismal road we are on.

I still remember to this day the experience I had with the Miranda IM messenger: I was signed in to at least 6 services (ICQ, AIM, Yahoo and a few others). 40MB of RAM at a time when 1GB of RAM on an average PC had become the affordable norm, no lag ever, instant rendering, everything worked instantly, plus a lot of possible customisation: themes, emoticon packs, you name it.

Same with web. Search for something, get 7 useful results on the first page, get your work done in minutes. No banners, no ads, no consent popups, no constant nagging for signups. Just content and some personal expression -- which, while comical at places, is still vastly preferable to the crap show we are enduring today.

Fast forward 12-15 years and I am absolutely amazed at how awful the state of the desktop and the web is. Things are only getting worse with time.

Miranda was cool. I vividly remember it.

I also vividly remember banner ads at that time, there were a few popular sizes. Some of the first ad-filtering proxies based their blocking algorithms on the sizes.
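Size-based blocking is simple enough to sketch. The version below assumes the filter only looks at the `width`/`height` attributes declared in the HTML (a real proxy might also have inspected the image file itself), and the size list is an illustrative subset of standard IAB banner dimensions, not the list any particular proxy of the time used. Function names are hypothetical.

```python
import re

# Illustrative subset of standard IAB banner sizes as (width, height) in pixels.
# Assumption: which sizes any particular proxy of that era matched is not known here.
BANNER_SIZES = {
    (468, 60),   # full banner
    (728, 90),   # leaderboard
    (300, 250),  # medium rectangle
    (120, 600),  # skyscraper
    (160, 600),  # wide skyscraper
}

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches width=/height= attributes, quoted or not, but not e.g. data-width=.
SIZE_ATTR = re.compile(r"(?<![\w-])(width|height)\s*=\s*[\"']?(\d+)", re.IGNORECASE)


def is_banner(img_tag):
    """True if the tag's declared width and height match a known banner size."""
    size = {name.lower(): int(value) for name, value in SIZE_ATTR.findall(img_tag)}
    return (size.get("width"), size.get("height")) in BANNER_SIZES


def strip_banners(html):
    """Drop <img> tags whose declared dimensions look like ad banners."""
    return IMG_TAG.sub(lambda m: "" if is_banner(m.group(0)) else m.group(0), html)
```

The obvious weakness is that any legitimate image that happens to be 468x60 gets dropped too, while an ad served at a non-standard size slips through.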

When AdWords came it felt almost beneficial, because connections were slow and often metered.

SEO tricks proliferated at the time (say, 2000), too, of more stupid sorts because search engines were not that sophisticated yet, including Google.

Is there an adblock which removes these sites from search results?

Check out millionshort.com
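Million Short takes a different approach: as its name suggests, it lets you drop the most popular sites from results. The narrower thing asked about above, hiding specific domains, amounts to a filter over result URLs. A minimal sketch, assuming the results are available as a plain list of URLs (a real extension would have to work on the results page's HTML instead); the blocklist entries are hypothetical placeholders using the reserved `.example` TLD.

```python
from urllib.parse import urlsplit

# Hypothetical personal blocklist; .example is a reserved TLD used as a placeholder.
BLOCKLIST = {"content-farm.example", "recipe-spam.example"}


def is_blocked(url, blocklist=BLOCKLIST):
    """True if the URL's host is a blocked domain or any subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocklist)


def filter_results(urls, blocklist=BLOCKLIST):
    """Keep only result URLs whose host is not blocklisted, preserving order."""
    return [url for url in urls if not is_blocked(url, blocklist)]
```

Matching on the parsed hostname rather than a substring of the URL keeps a look-alike host such as `content-farm.example.evil.org` from being caught by accident.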

I append "reddit" to any query not meant for a productive task. Except now the internet is mostly reddit for me, which kinda bums me out. Especially given the new js-heavy design. In 2019, if I want reliable search about something I know nothing about I must first search on google, then try again with "reddit" appended, then change www.reddit.com to old.reddit.com, then parse through what are often questionable answers from anon users, possibly still influenced by marketers, and then maybe, I get the answer or next lead that I was looking for.


Most of the time I Google something, I get content-farm rubbish.

Know also that low paid "freelance writers" are often writing a lot of this information that so many people rely on.

I use reddit queries for things that aren't really critical, say a video game recommendation (try Googling that and see how many "top 17 adventure games" you get)

For more serious things like "do I have lactose intolerance?", I google something like [wiki lactose intolerance]

I do wonder why one of the world's most advanced intelligent systems built by thousands of the world's most intelligent people employed by a company which pretty much explodes in a burst of millions of dollar bills when you tap it lightly can't seem to show us results that aren't full of rubbish.

It's like Google's "discover" feed - it shows me trash news from a tabloid.

Google knows _A LOT_ about me, don't they know I despise tabloid news? Gossip about Kylie's breast cancer? No.

> then change www.reddit.com to old.reddit.com

You can opt-out of the redesign in your settings so www.reddit.com and old.reddit.com both show the old reddit, and you can use new.reddit.com to view new-only pages.

I'm logged out by default and only log in to comment, which is rare. I don't like leaving my username up for display if a friend/family member has to borrow my laptop for something. I've engaged in a couple of communities that are a little more counter-cultural than my peers.

You can create another account, or another 10, reddit doesn't mind.

Done this. Eventually one account always becomes my main account and I'm back to square one. I don't care enough about reddit to keep up with 2 separate accounts.

There’s an extension that redirects to old reddit.
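At its core, such a redirect is a host rewrite applied before the page loads. The sketch below shows only that rewrite, in Python for illustration; an actual browser extension would implement it in JavaScript against the browser's request APIs, and `to_old_reddit` is a hypothetical name. It deliberately leaves `new.reddit.com` untouched, in line with the opt-out described above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_old_reddit(url):
    """Rewrite reddit.com / www.reddit.com URLs to old.reddit.com; leave others unchanged."""
    parts = urlsplit(url)
    # urlsplit lowercases .hostname, so "WWW.Reddit.com" is handled too.
    if parts.hostname in ("reddit.com", "www.reddit.com"):
        # Keep path, query string and fragment; swap only the host.
        return urlunsplit(parts._replace(netloc="old.reddit.com"))
    return url
```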

You can opt out of the redesign in settings. I don't know how long that will last. Sometimes it shows the new one, but it's rare.

Yep, I’m aware of that. But the OP said they aren’t logged in. My tip was in response to that.

Google used to have a filter to only show discussions which I used constantly until they removed it. It's the only way to find actual reviews of some products without a million pages of blogspam

Oh, you're right - I was going to say that I often used this filter, but apparently I didn't even notice it's gone.

"site:reddit.com" is more useful, and filters out pages that just happen to mention reddit.

I use ddg, and I use site:reddit.com. Reddit's internal search engine isn't as good for general web searches like this (which is what the ! redirects to).
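The difference between appending the bare word "reddit" and using `site:reddit.com` is that the operator restricts results to that domain instead of adding one more term to match. A minimal sketch of assembling such a query into a search URL; `site_search_url` is hypothetical, and the `https://www.google.com/search?q=` format is assumed rather than taken from the thread.

```python
from urllib.parse import urlencode


def site_search_url(query, site="reddit.com"):
    """Build a search URL whose query is restricted to one site via the site: operator."""
    # urlencode percent-encodes the ':' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": f"{query} site:{site}"})
```

Passing `site="old.reddit.com"` gives the variant suggested below.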

Or even "site:old.reddit.com" if it means that much to him.

I thought I was the only one doing this.

> parse through what are often questionable answers from anon users, possibly still influenced by marketers

I mean, that's Reddit though? No one's making you look there first, it's not "The Internet's" fault that forums are unreliable.

Search the site using the site: operator to get better results; consultants can find a way to SEO those kinds of queries.

The SEO mongers are on to that search already. I get spam blog posts with reddit in the title for no reason.

Isn't that solved by using the more specific filter site:reddit.com?

Am I the only one who thinks Reddit really sucks?

It's a cesspool of kiddie memes, shock, ill informed opinion and political posturing.

Besides, if I wanted to search Reddit I'd go to Reddit. It being front page for _everything_ I search for now is so annoying.

The front page of Reddit is going to be like the front page of YouTube: full of lowest-common-denominator stuff like awful memes, YouTuber drama, superhero movies and video games.

But it is maybe the only major site online where you can at least get a real person's opinion on common questions, like recipes and cooking methods, without the fluff of the productized, Adsense-driven sites that exist only to make money.

Case in point: r/gifrecipes. Recipes shown in gif form, so 60 seconds or less. Discussion can happen in the comments, but there won't be any inane babble about how the recipe has been passed down their family for generations. The lengthy preambles on recipe blogs exist only to circumvent Google's penalties for "thin content", i.e. pages with fewer than 500 words.

Content is being written to serve Google's requirements, not the users.

I was really late to the reddit party and only made an account a few years ago. Thought it would be a great technical resource... Every time I had a more advanced question the place seemed to fall flat. You'd go to a very specific sub thinking there should be some informative people, but you come to find it's full of people in the same boat as you. They wind up guessing or throwing out whatever they can to get karma, which is simply unhelpful. One reply to a question I had about data recovery was "your approach is very amateurish"; I replied with "please explain to me how it is amateurish" and got no response. It's just a big fuck you waste of time.

I tried helping in some subs but was met with hostility. Tried directing someone to the proper channels (mailing lists) to search for help with an OpenBSD hardware issue and got insulted in return because they didn't like my answer. And it wasn't a rude "RTFM" reply but an honest, helpful post explaining the mailing lists and how to search and ask. Fuck that noise. I got better shit to do.

The real purpose of reddit is to aggregate people around faux community to sell ads. You have nothing but circle jerks and fanboyism but no actual meat and potatoes. Even after deleting every single worthless default main sub the subs I try to watch are all "look what I did!". Actual questions are never answered. It's all about showcasing to get circle jerks going so people can live vicariously through the achievements of others. That keeps the eyeballs on the ads.

I find myself going back to IRC where the barrier of entry is much higher so you wind up with people who know a thing or two.

As always: It depends. There are great communities and in-depth discussions on niche topics to be found on reddit, but of course there's a lot of low quality content. Particularly when it comes to product recommendations, many reddit communities seem to have become an echo chamber and settled on the same few products they have been suggesting for a while. Often those aren't necessarily bad recommendations, but they are far from in-depth and rarely backed by any actual experience with or tests of the product in question - instead it's become a popularity contest.

There are some Discord channels that host good stuff, with communities that like diving into harder problems 'behind closed doors', so to speak.

I've found discord even worse in terms of cliques and meme spamming. And the closed off walled-garden nature doesn't sit well with me.

I do visit some niche Discord channels which are well guarded, and the help is certainly there. Though, as another poster mentioned and as I have said before, Discord allows too much distracting visual fluff. IRC keeps the conversation focused thanks to the lack of said visual fluff. It's also extremely cross-platform and isn't hostile to community-made clients.

Reddit has a lot of good communities sharing good info for niche subjects. The top 100 or so subreddits are as you say, but there are some good ones further down. And the reason not to use reddit's search engine is that it is still very bad.

> It's a cesspool of kiddie memes, shock, ill informed opinion and political posturing.

You could make the same generalization about the internet as a whole. Reddit has so many communities and there is a lot of variety. r/fountainpens for example has none of the ills you describe.

The people here aren't talking about finding a meme that contains words in their search.

Reddit is simply the largest site that disincentivises content for content's sake. When you find someone asking a complex question, the person answering it does it because they have a real answer, not because they can make some money by googling the answer and rewording it. That's what the rest of Google's content often devolves to, for certain topics.

Browsing reddit is as described; I never go to r/all for that reason, nor do I feel obliged to log in, comment or vote in most cases. But if you limit your consumption of reddit to a few subs, the difference is huge.

You have to curate your feed, same as Twitter, Facebook, Insta, etc. If you only subscribe and never unsubscribe you will end up in a trash heap.

Most of reddit does suck. There are still really useful communities—my go-to example is /r/askhistorians, which has some of the highest quality analysis and research on the internet.

I like it.

Yet you browse and comment on this website, clearly a bastion of informed opinions and no political posturing.

Really it's worse off for not having the occasional meme.

You can have greasemonkey fix the Reddit links for you.

Are you me? This is me exactly. Google has become almost useless and I now have to append certain domains to find anything valuable.

Protagonist shames people who work in marketing, calls them bottom feeders, low skill workers and spammers. Protagonist’s product offers ‘marketing emails’ service

I'll add to that:

Protagonist is trying to make money selling email marketing services. It isn't working. He wants to make money (it's not a charity)

There are millions of dollars per month to be made in ranking #1 for Mortgages. Protagonist says it is easy. Protagonist isn't making the money that would bring in. But it's easy. That it takes no skill. But he can't make the monies.

Why this article is upvoted is very odd. The article does bring up an interesting topic, there are issues with the incentives SEO places. Just like the incentives that directories placed, anyone remember AAAA Plumbing Services? Or before that, with paying writers per word/line (I'm looking at you books from the 1800s)

Aligning incentives is hard. And it's interesting. But the arrogance of the person who wrote the article is amazing, not to mention the cognitive dissonance he must experience. I personally couldn't get over that to really appreciate the larger topic being covered.

Well, protagonist isn't wrong... they just also happen to be a bottom feeder too.

I noticed that too and am shocked this comment isn't at the top.

Related, I'd find an attempt to measure the amount of waste in SEO blogspam and email spam/ham very interesting. I.e. who gets the real landfill designation.

Does that make their point(s) any less valid?

> Protagonist shames people who work in marketing, calls them bottom feeders, low skill workers and spammers


> Protagonist’s product offers ‘marketing emails’ service

Also true.

Would you be happier if the protagonist was an entomologist or truck driver?

A truck driver's indignation would at least be understandable, but you have to assume that an email blast service has some knowledge about how the sausage of online marketing is made.

It takes one to know one?

There's a bit of a difference between talking to people who have asked to hear from you and obfuscating the way people find information by creating noise.

This is 100% how search engine optimization works, the consultants are correct.

As a marketer, when I search for marketing information on Google 70%-90% of the time all of the stuff on the 1st page generally isn't teaching me anything I didn't already know.

But as someone who does this exact thing, I know how expensive good content is - hiring an author to write 3,000 words of valuable information for a reader can cost $500 - $1000 per page. Add another $400 - $1,000 for the consultant's time on that page.

Or you can hire writers that produce lower value information for $30 - $120 per 3,000 words that ranks just as well as high quality content.

The first thing I thought when I saw Elon Musk's writing AI was how powerful of a search marketing tool this is.

Reducing your content cost to effectively zero gives you a HUGE competitive advantage.

I know a publisher programmatically generating content and generating millions of organic search visits a month for marginal costs reaching 0 - they pivoted to a tech company after identifying their #1 problem was content costs were too high and ad revenue was too low. Now they're selling this technology to other publishers.

What I don't like about your line of thinking is that creating such content should even cost money.

Knowledge should be free at our point in time and supposed evolutional stage. But it's not.

I'd gladly donate some local lectures on the stuff I know if I had the time. But I wouldn't charge money for it.

Knowledge should be free. And the internet should have become mega yellow pages on steroids like a decade ago.

>I'd gladly donate some local lectures on the stuff I know if I had the time.

It sounds like your time is a limited resource and you choose to allocate it to other areas of your life instead of for the good of spreading knowledge. Why then, do you think others should give their time away for free?

Not trying to be hostile, just a thought for you.

No offense taken, yours is a valid criticism -- until you meet 20+ internet marketers in real life.

There are a lot of very greedy people with a lot of time on their hands but they choose to try and strangle the internet instead of bettering it. And I am left saddened that I have a family to support, health to improve, and work on having my own retirement fund. Maybe start a business or two as well.

Trust me I get it. But currently I have to think of the future and there are only 14-15 active hours a day.

Yours remains a valid remark. If we want to have useful internet then maybe a lot of personal lives have to be sacrificed.

I get where you're coming from. But I'm going to offer another perspective. There is a shitload of content being made on the internet, and I believe you too would agree that the vast majority of content on the web could hardly be called "knowledge" and is more likely to have a negative effect than a good one on society. And it's literally all free; they don't get paid for it, it's just what people naturally do, and technological advancements have made it really easy to do.

In such a landscape, how do we incentivize the proliferation of actually useful content and knowledge? It's not as easy as just putting words out there most of the time, is it? The content we consider valuable has significant amounts of time, research, and thought behind it, I would assume. Time is a limited resource, and the people who have the most valuable knowledge to offer often have less of it, while those who post meaningless crap have the most.

>> "Knowledge should be free at our point in time and supposed evolutional stage. But it's not."

We only evolved technologically. Our economic systems are still stuck in the 19th century. The planet's cultures are still recovering from a string of empires that valued domination and assimilation over cooperation.

For now, I have to charge for my skills to live. Some day I hope to be able to have hobbies and share their output without having to monetize them to fund the efforts.

Can't disagree, you're crushingly correct.

I don't want to charge for programming either but yeah, this is the era we're living in.

Knowledge can be free, with the caveat that you'll usually get what you pay for. We think back fondly of the early Internet when there were all types of truly informative sites.

But we forget that few people even had the Internet back then, and it wasn't commerce-oriented in the way it is now. And back then, those with the capability to put websites online, without things like Blogspot, Wordpress, and the like, were a pretty select minority who had the technical chops and a desire to spread information without expecting any compensation.

If you think knowledge should be free, contribute to freely available knowledge in some useful way you can.

I do. My main barrier to being more useful and community active is my limited time. As I said above, I wouldn't charge a penny if I had the time to share more knowledge.

The knowledge is free. It's available with a click from Google.

It’s also available without Google.

But then the people writing high quality content end up drowned in the deluge, and the market drives itself towards the local minima of quality. That is very, very fucked up.

> The first thing I thought when I saw Elon Musk's writing AI was how powerful of a search marketing tool this is.

"Search marketing" is the current euphemism for spamming?

Can you share the publisher's name? My company would be very interested in this service.

This is a direct effect of using search engines to navigate the web. If we'd stuck to following links curated by actual people (link-rings and other nice inventions) then we'd have never had this problem in the first place. Unfortunately the garbage is here to stay and the good content is totally drowned in a sea of trash.

Once the curated list gains critical mass, it sells out, the list turns into garbage, and the cycle repeats itself.

It's happened over and over and over.

Product Hunt comes to mind.

If search engines were never invented the web wouldn't be a fraction as useful or popular as it actually is today.

I know it's fashionable to hate Google nowadays but come on.

The older I get the more I realize that there is often an inverse correlation between popular and useful/valuable.

I was “on the internet” before Google and a little before the web became popular when all you had was Gopher, Usenet, and Veronica. The internet was much less useful then.

On top of that, without the web becoming popular, there wouldn’t have been the investment in fast home internet or fast cellular data.

Would you have needed fast home internet without the web being popular though? Granted, they didn't look as good, but I know plenty of sites that loaded more quickly 15 years ago when my internet speed was less than a percent of what it is now.

Of course, the past is always rosy, especially when you were young, and new things are always exciting and lose their novelty when you get used to them, but I found the internet much more interesting back then.

I was trying to download shareware from various freeware ftp sites (infomac mirror sites). It was painfully slow over 56K dialup.

It took literally hours to download a five minute QuickTime video clip. Streaming audio kind of worked but streaming video with RealVideo was painful.

Surprisingly enough, graphical remote access to a remote Windows computer actually worked decently well over dialup with PCAnywhere.

I think I first had high speed internet around 2002 via FreeDSL and when that went kaput, DirecTV owned a company called Telestream that offered DSL service.

Yeah, I remember the times I didn't dare move because it might kill my download, back before resuming and download managers were a thing. Still, costly things feel more valuable, so I'd take a good hard look at what I really wanted before I committed to an 18-hour download for an installer ISO, and I'd certainly use it. Ubiquitous availability diminishes perceived value, for me at least. The large shift was from dialup/ISDN to DSL, I think, because it gave me more than ten times the speed (instead of the small increases of 9.6 => 14.4 => 28.8 => 56k => 64k).

Sadly, just turning down the bandwidth doesn't turn back time - I've experienced that first hand when somebody killed the box connecting the building to the ISP and I had to fall back on mobile... it was like playing Tetris on level 99 and then being thrown back to level 01.

Google search was and is a big improvement, but we could use another couple of Pandas.


Panda was an update aimed at downranking or removing some of the super thin SEO content from google's index


Just start using curated link sites instead of google. Make your own curated link site about something you know and care about. Nobody is making you google anything.

Sadly, this was LookSmart back in 1998. It was an in-house edited directory, but I told the CEO we should build a pagerank that diffuses to unlisted sites as well.

A pretty good web directory at the time was cheap, about $12 million a year for editors, but Google had 10+ years where it did better with unsupervised algorithms, so nobody spent any time on integrating human curators.

Google exists because the web was built without backlinks, according to Jaron Lanier

It would be nice if every blog had a "Sites I frequent" section.

I just wish search engines would let me block sites. There are maybe a dozen SEO land grabs in my domain that are trashing my search results and provide no value.

I use a user script called Google Hit Hider by Domain that adds this functionality to pretty much every major search engine. Can't imagine searching without it these days.
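For anyone curious how that kind of per-domain hiding works, here is a minimal sketch in Python (the actual user script runs in the browser as JavaScript, and this is not its code). It assumes a hypothetical blocklist and treats subdomains of a blocked domain as blocked too:

```python
from urllib.parse import urlsplit

# Hypothetical blocklist of domains the user never wants to see.
BLOCKED = {"example-spam.com", "contentfarm.net"}


def is_blocked(url: str, blocked: set[str] = BLOCKED) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlsplit(url).hostname or ""  # urlsplit already lowercases the hostname
    # Match the domain itself or any subdomain, but not look-alikes such as
    # "notexample-spam.com".
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_results(urls: list[str], blocked: set[str] = BLOCKED) -> list[str]:
    """Drop results from blocked domains, keeping the original ranking order."""
    return [u for u in urls if not is_blocked(u, blocked)]
```

Matching on the registered domain rather than the full host is the design choice that matters here: blocking `example-spam.com` also hides `www.example-spam.com` and any other subdomain the site spins up.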

This is perfect. Perma-ban, so sweet. Thanks GraemeL!

Too bad search engines aren't using this to improve search results.

Google used to do this, but removed it at some point. As others have noted, the same effect is achievable with browser extensions.

I miss blocking w3schools; now I tend to append `mdn` to my searches.


I'm trying that, and upvoting, on my little search engine project: https://glorp.co

The concept is interesting. The search I tried wasn't helpful: https://glorp.co/Search/vue%20vs%20svelte

Thanks for the feedback! I see a few interesting results on the page that seem relevant to the query (like the jsreport.io link). What did you hope to see that you didn't?

The first n results were not relevant, and Google's were.

Here's what a Google search returns:

* Vue and Svelte — A Lot Alike, But Some Important Differences

* Why SvelteJS may be the best framework for new web devs - Dev.to

* Svelte vs Vue.js | What are the differences? - StackShare

* Svelte.js — First impressions? (vs. React and Vue) – Milosh N. – Medium

* Top 5 Reasons You Should Use Svelte on Your Current Project Right

* Vue and Svelte · Issue #4491 · vuejs/vue · GitHub

There are browser add-ons that do that.

I think search engines need to realize the value of raising their results' quality by allowing users to tell them which sites they think are scummy.

Make it an easy-to-click report button and pretty much every site whose politics I disagree with will get a downvote... every article by a person I disagree with as well. The question is: how useful is that to the search engine, or indeed to myself?

> pretty much every site whose politics I disagree with will get a downvote

If this happens to all sites, wouldn't only the poorer sites stand out? I don't see why one strand of politics would be significantly more likely to downvote.

The more extreme the politics the more able you are to rally the troops for report bombing. You can see this on sites like YouTube.

Great intro, but where’s the actual article?

(Also it’s pretty harsh to say that it takes “little talent” to become an SEO expert. Like all industries, there are charlatans selling snake oil, but there are experts who have invested a lot of time in developing their skill sets, and it’s not nice to be so dismissive.)

From my personal experience with SEO people: the experts make large sums of money running their own sites. The non-experts run around with lofty promises and "consult" for others' sites.

I don't even think the author is talking about charlatans. I think the point is the barrier to entry is pretty low, and how do you define what is an "expert"? "SEO expert" is just a few letters you stick on your LinkedIn profile or business card; some will be experienced, ethical experts that get results for sure. I see it the same way I see network marketing: yes, some turn Mary Kay or essential oils into viable careers, but that doesn't invalidate the common criticisms.

Lol when I thought I got to the actual content, I was one paragraph away from the article ending

If it requires so little talent, why in 2019 is the quality of a lot of websites so poor?

Some examples: why do sites still misuse H tags, or have trouble creating a simple XML sitemap that escapes characters when required?
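On the sitemap point: the sitemaps.org protocol asks for `&`, `'`, `"`, `<` and `>` in URLs to be entity-escaped. A minimal Python sketch, assuming you just have a list of URLs and only need the required `<loc>` element (no `<lastmod>`, `<priority>` and so on):

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# escape() always handles &, < and >; the sitemaps.org protocol also
# asks for single and double quotes to be entity-escaped.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap: one <url><loc> entry per URL."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    for url in urls:
        # Escape every URL; a bare "&" in a query string makes the file invalid XML.
        lines.append(f"  <url><loc>{escape(url, QUOTE_ENTITIES)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

Concatenating the URLs without the `escape()` call is exactly how a link like `?q=a&b=1` ends up producing a sitemap that crawlers reject.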

"but where’s the actual article?"

There isn't. It is another article that will end up in the SEO landfill the author is complaining about along with other "spammy" articles that already exist in the landfill.

In essence, this is another "spammy" article to promote his product cleverly disguised as an anti-spam/anti-SEO article.

I suspect an SEO agency actually told him to write this article to generate conversation (people who resonate with the intro will share it on all social circles and cause it to go "organically" viral). As they say, all publicity is good publicity.

This is as meta as it gets!

We tried paying a big SEO consultant with a long track record of big-name successes (allegedly) $10k/month, and the poor results we were getting had as much to do with the whole model of hiring highly paid consultants who waste your time in weekly meeting busywork as with SEO not working. Not a good idea for small businesses or startups, even if you can afford it, IMO.

SEO is still a long term investment that every company needs to make for the benefit of their users, helping people find you and your information on search engines is a good thing (assuming you’re providing real value).

The only problem is the spam sites that still succeed occasionally on the fringes. That was why we were motivated to get SEO (and SEM) help in the first place, as one of our biggest competitors is a shameless gray/black hat spammer whose poor customers keep finding them on Google.

Search results are so heavily weighted toward commerce, products and services because those are the people that can spend money on SEO. It's made the internet, seen through the eyes of search results, seem like an aggressive, shady market bazaar.

I try searching for anything remotely bike related, bike community, mechanical information, or just general cool bike stuff and I can't find the human community underneath all the fluff articles trying to sell me shit. The internet, as free as it is, has been overflowed with commercial activity. Which pretty closely reflects the real world, but damn that's a shame.

Don't forget the "showing results for Y, click here to search for X" when X is what you actually wanted to search for.

It's ironic that, in trying to make search "human friendly", Google has also succeeded in giving it all the negative traits of a human --- the "human" that is more like a commissioned salesperson.

I know the argument is that humans frequently make mistakes so "we should just show them what we think they wanted", but that's just opening the door for manipulation.

Search was much better when it was closer to a "grep the Internet". IMHO machines should remain "dumb" (for lack of a better adjective) and leave the important decisions to the users, keeping the latter in control.

Grep the internet and sort by pagerank was great. Then everyone removed links that went offsite. Then link farms. The fake blogs and influencers are the evolution of link farms. Unfortunately sellers will game the system. How do we create a way for buyers to game it?

Pagerank was great when it was new because it had never been used before. Once Google got popular enough to be worth gaming, it became an example of Goodhart's Law.
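For reference, "sort by pagerank" boils down to something like the power iteration below. This is a simplified Python sketch of the idea from the original PageRank paper (damping factor 0.85), not what Google runs today, and the page names in the example are made up. It also shows why link farms worked: rank flows to whatever a cluster of pages points at.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank computed by power iteration.

    `links` maps each page to the pages it links to. Pages with no
    outgoing links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the "random jump" share plus its slice of dangling rank.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank
```

With `links = {"f1": ["spam"], "f2": ["spam"], "f3": ["spam"], "blog": ["docs"]}`, the "spam" page ends up ranked well above "docs", purely because three throwaway pages point at it. That is Goodhart's Law in a dozen lines.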


Maybe it’s time to bring back curated web directories like the original Yahoo?

Yeah, I think so.

Curation-as-a-service could even become a viable market. It's arguably already happening to a certain extent through various platforms, creators, and aggregators; but it's not really split off as its own service yet.

Pagerank could be gamed too. Buy an expired domain with high pagerank, set up a forwarder to your domain, and your domain inherits the pagerank.

I thought that lapsed domains got their pagerank reset (or significantly reduced). Is that not the case?

I remember the days of hiding text in the same color font as the background so that grepping would make your page the top result. So I’m not sure those days were better.

The dumber search is, the easier it is for the bad actors to game it and drown out the legit sources.

Maybe page rank (mentioned in a sibling response) helped address this, but then it’s really no longer just a dumb grep.

> I remember the days of hiding text in the same color font as the background so that grepping would make your page the top result.

A simple "-site:somesite.com" removes those, or just scroll past them. Keyword spam is easy to ignore because the title of the page, the domain, or the path often has nothing to do with what you're searching for, and the preview text is nonsensical.

> The dumber search is, the easier it is for the bad actors to game it and drown out the legit sources.

The dumber search is, the easier it is for users to ignore the crap and find what they're looking for, even if it's not the first result. The "smart" omission of results has the effect of removing a lot of useful ones too. Google's claims that they have X number of results for a search is essentially meaningless if you can't actually see them all.

> Search was much better when ...

If you really believe that, maybe it is a nice startup opportunity for you :)

Trying to make money is precisely the reason Google is the way it is. I don't think starting another business is the answer.

I've been thinking about a distributed search where each participant in the network crawls and answers queries related to a part of the web. Make search more like bittorrent. Is there such a project?
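One common way to split the crawl in a setup like that is consistent hashing: each peer owns slices of a hash ring, and each host is crawled and indexed by whichever peer owns its slice. A minimal Python sketch; the peer names, the 64 virtual nodes per peer and sharding by hostname are all assumptions for illustration, not how any particular project does it:

```python
import bisect
import hashlib
from urllib.parse import urlsplit


def _hash(key: str) -> int:
    """Stable 64-bit ring position (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping web hosts to the peers responsible for them."""

    def __init__(self, peers: list[str], vnodes: int = 64):
        if not peers:
            raise ValueError("need at least one peer")
        # Each peer gets several virtual points so the load spreads more evenly.
        self._ring = sorted((_hash(f"{p}#{i}"), p) for p in peers for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def peer_for(self, url: str) -> str:
        """Return the first peer clockwise from the URL host's ring position."""
        host = urlsplit(url).hostname or url
        idx = bisect.bisect(self._keys, _hash(host)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical peers in the network.
ring = HashRing(["peer-a", "peer-b", "peer-c"])
```

Sharding by host keeps a site's pages on one peer, and adding a peer only moves the hosts that land in its new slices. Answering a full-text query would still mean fanning it out to every peer (or partitioning the index by term instead), which is where most of the hard problems are.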

Cool concept. Let me know if you find anything!

I have long wondered how much such a project would cost. I have no actual idea how much hard drive space I would need, nor how much processing power.

Nah, it wouldn't be profitable, at least not in terms of money

Pretty much captures it. That was the founding principle of Blekko which was that humans could curate a core set of 'good' websites for a topic and all the crap would not have a foot hold to show up on the page.

So to share some of the challenges with that (if anyone out there wants to try again) they are as follows:

* 'Search' means different things to different people, and we've been trained that a 'search engine' finds anything on the web (for the most part). The product Blekko built could more accurately be called a 'reference' engine which was used successfully by people trying to find facts or data and were not generally trying to find things to buy.

* Have your own advertising system, all in house, where you don't have to "revenue share" with anyone if you don't want to. At its peak Blekko was serving over 10M queries per day which, if we had owned all the advertising revenue on those searches would have kept us going and growing. That said, building an advertising system is both difficult and fraught with patent risk / bad-actor risk.

* Don't let anonymous users use the service. This is perhaps the hardest thing: most people won't give up an email address for even the most useful of services. However, since you're spending money serving up search queries, you don't want to waste that money serving queries to bots and other bad actors. At any given time when I went through the logs, between 5% and nearly 18% of the queries were 'suspicious' or likely bots. That is up to 18% of your capacity you can't give to "real" humans if you can't control that traffic.

* Build a relevancy ranking system rather than a popularity ranking system. Search results have two metrics of interest, precision and recall. For a reference engine you want to focus on precision over recall. And while existing search engines use the "virtuous cycle" of search & click to track popularity (which can be an indicator of precision but is better at indicating click-baityness) build your ranking engine using NLP based evaluations.

* Your document index size should target 5 billion documents with a goal of 10 billion documents. Scale your cluster and algorithms to process a query against that index in 100 ms or less.
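To make the precision-versus-recall point above concrete, here's a minimal Python sketch that scores one ranked result list against a hand-labelled set of relevant documents (the document IDs are placeholders). Passing `k` scores only the top k results, which is closer to what a reader actually sees on the first page:

```python
from typing import Optional


def precision_recall(ranked: list[str], relevant: set[str],
                     k: Optional[int] = None) -> tuple[float, float]:
    """Score one result list against the set of documents judged relevant.

    precision = relevant results returned / results returned
    recall    = relevant results returned / all relevant documents
    """
    # Only the top k results count when k is given; duplicates count once.
    retrieved = set(ranked if k is None else ranked[:k])
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, `precision_recall(["d1", "spam1", "d2", "spam2"], {"d1", "d2", "d3"})` gives a precision of 0.5 and a recall of 2/3. A reference engine in the sense described above would mostly care about pushing the first number up for small `k`.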

Do that and win :-) Or find the next barrier to creating a useful way to search the web for information.

I've been lamenting this demise of Google over the last year or two, but it's been especially foul the last few months.

Similar to your biking anecdote, I was trying to find any simple troubleshooting help for a home coffee brewer that suddenly stopped working. I couldn't find any results in the first 4-5 pages that weren't trying to sell me a new machine.

The manufacturer is also partly to blame, because I couldn't find anything remotely relevant either on the product page or in the PDF owner's manual.

It's a shame what these tools have become when they could be so much more.

I’ve tried switching over to DuckDuckGo.com a bunch of times over the years, but this time the move has stuck.

I think one of the reasons is that I do not get a bunch of the ‘smart’ stuff that Google tries to do for me. In a way it’s like using a search engine from 10 years ago ... and it’s better.

There are things that I do miss — quick cards for things like currency conversion, flight details, that sort of thing - but the mighty Duck is getting better at those, and if I need to reach in to Google I can always just !g it.

I find that hard to believe. I used the phrase "home coffee brewer stopped working" just now in Google and see nothing but troubleshooting articles and forums from the very first result onward.

If you put in an exact model then I suspect the results will be very different.

The other infuriating thing that's related is if you're trying to find a schematic or service manual, Google thinks you're just looking for the (often useless) user manual. Searching for the user manual does not make it think you're looking for the service manual... it's absolutely idiotic, because who would search for "service manual" but actually want a user manual? Beyond some weird conspiracy theory involving anti-right-to-repair, I can't explain why.

I don't even understand the thread-OP's complaint. Googling "bike forums" gives me pages of what they say they can't find.

Just curious: Did you ever find what you were looking for? If so, can you find it with Bing / DDG? If not, is it on the web at all? Maybe there just isn't a webpage for what you want.

One nice solution would be a search engine that throws out anything related to commerce or linking to commerce.

Not entirely throwing out commerce, but I sometimes use https://millionshort.com/ where you can remove up to the top 1 million websites from the results.

lol, if this becomes popular then everybody needs to reverse their seo efforts and optimize for position 1 million instead.

Amazing! This does something that I wish they would all do — allows me to block a site.

I never, ever, for any reason, want to see results from the Daily Mail. I don’t even care if it’s the most relevant content: I just never want to see that site in my results. I know I can block it in hosts or whatever, but that’s not what I want. I don’t even want to see the link.

I wish DDG would give me this option.

I feel the same about sites like inc.com, fastcompany.com, entrepreneur.com, forbes.com. They are the absolute worst, shallowest content for business journalism, but they're right at the top of search results, and their blogspam fills up my Pocket "Discover" recommendations.

Hey, thanks. That's a great site! You get some very interesting search results and – even better – there's no third-party JavaScript or other trackers.

Jeez, that's surprisingly (and kind of tragically) effective!

Thank you, I already knew about it but had somehow forgotten.

Then the search engine itself would be probably thrown away :-D

My favorite parts of the internet have no relation to ads or commerce at all.... including wikipedia. When you’re looking for information commerce is a massive distraction.

sadly true, but an open source engine like searx could be made to do such a thing

The real cost is in the infrastructure needed to run it unless you go full P2P

There's YaCy already: https://yacy.net

It's important enough, and currently wastes enough of my time (read: SEO crap, Medium and Reddit are top results for... everything) that I'd happily pay for a good search engine!

Yes, Search could use some BitTorrent-like disruption.

I wonder though - is so much of the P2P universe now on mobile devices instead of regular computers that it wouldn’t be feasible?

P2P can be done on mobile but you're fighting against limited resources (storage, bandwidth, CPU). The reason P2P got good on PC is because storage got cheaper, bandwidth was increasing and CPUs kept getting faster.

Right. So texting on P2P is easy and appropriate, but I can’t see how building a search engine on P2P would really work well given those limited resources.

P2P it is then!

How could it be sustained? Search engines are expensive to run.

It could be a subscription service.

Agreed. Much like a spam-filter. Seems straightforward enough. What am I missing?

What you're missing is the fact that large search engine companies have no profit motive to do this. They get the vast majority of their money from commercial ad sales -- ads for commercial results. Why would they purposefully castrate their highest paying customers?

And if you think the answer is to create and host a new, independent engine, you will be hard pressed to develop a good one, and to find the money to keep it running.

user smt88 offered a fabulous idea:

>Wikipedia is basically an information search engine without commerce or social media. The problem is that your results page is the article itself, and you need to scroll to the bottom to get to an external site.

>It's theoretically possible to change the UI or analyze Wikipedia to make a pretty solid search engine powered by millions of person-years of curation.

>user smt88 offered a fabulous idea : >Wikipedia is basically an information search engine without commerce or social media. The problem is that your results page is the article itself, and you need to scroll to the bottom to get to an external site.

Using wikipedia as a data source for a "search engine" does not fulfill gp's (jacquesm) request of "One nice solution would be a search engine that throws out anything related to commerce or linking to commerce."

For example, go to the wikipedia page for the film "Groundhog Day" and look at the external links: https://en.wikipedia.org/wiki/Groundhog_Day#External_links

- one of the external links points to IMDB.com -- a commercial website owned by Amazon.

- another link points to Punxsutawney Groundhog Club website that's advertising a $30 beer tasting event (Hogtoberfest).

- another link points to a souvenir shop

To continue the example of commercial links, a lot of math articles on wikipedia point to MathWorld which has advertising for the commercial Wolfram Mathematica software package.

IMO, I don't believe a search engine based on Wikipedia's list of external sites (even with ignoring the commercial links) is going to be useful for the mainstream audience. I probably do 50+ searches on Google every single day and maybe 1% might be answerable from a "Wikipedia search". For example, I needed a DIY answer to disassemble a Moen faucet and Wikipedia has zero articles with those instructions. In contrast, the Google (and YouTube) search results had the information I was looking for even though they have the unwanted ads and "content marketing" blogs from plumbing brands.

How about a layer over Google? Take Google's search results and filter them. Any obvious barriers to that?

>Any obvious barriers to that?

Rate limits.[0]

One can't really build a robust and comprehensive (server-side) search engine on top of Google's search API. In other words, Google Inc isn't going to provide an API so powerful that one can build an "alternative to Google" with it.

On the other hand, if you're talking about a "client-side" filter instead of a new search engine (e.g. a client-side webscraper that flips through the Google results pages), you'd run into multiple problems, e.g. random CAPTCHAs screwing up the automation script, and the difficulty of creating a "rules engine" (e.g. machine learning) on your local computer that recognizes blog articles with a commercial slant.

[0] https://www.google.com/search?q=google+search+api+rate+limit

see user smt88's idea above.

They are pretty aggressive about detecting that and throwing up captchas.

see user smt88's idea above.

I'm no expert at this, but these days negative SEO has also become part of search engine optimization. What this means is that a competitor can create thousands of bad links to your site, causing your site to incur a penalty in search ranking.

Google has created a disavow tool for this, but someone who runs a blog and isn't super tech-savvy may not know about it, and regardless, it is still a burden for anyone to keep up with all this.

Again, I'm not an expert at this, and I'm doubtful that a billion-dollar search engine can be tricked by such shenanigans, but it may be something worth looking into if your site traffic drops.

For many topics I now go straight to search Reddit. It avoids all those articles that SEO the shit out of a topic where I was just looking for a two sentence answer.

Edit: I'm often surprised how search engines fail to exclude obvious SEO farms. Especially when looking at articles about stocks, you frequently just find an ocean of what seem to be clearly autogenerated articles.

Any time I want to search reddit, I use Google and put “reddit” at the end of the query (I know about the site: query, but there’s really no point to type the 5 extra characters).

Same tactic to find genuine content, but I find Google’s results to be better than Reddit’s own search.
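The two tricks above aren't quite the same: appending "reddit" just adds a keyword, while the `site:` operator restricts results to that domain. A tiny sketch that builds either kind of Google search URL, so the 5 extra characters never have to be typed; the helper name is made up for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(query: str, site: Optional[str] = None) -> str:
    """Build a Google search URL, optionally restricted with the site: operator."""
    if site:
        query = f"{query} site:{site}"
    # urlencode() escapes spaces as '+' and ':' as '%3A'.
    return "https://www.google.com/search?" + urlencode({"q": query})
```

Usage: `search_url("bike forums", site="reddit.com")` gives `https://www.google.com/search?q=bike+forums+site%3Areddit.com`.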

This is what happens when you don’t give users control over how search works. The weighting is blatantly user hostile by default because it’s so directed towards people who pay for your eyes. I would pay a massive amount of money for a search engine with a “no commercial results” switch.

Of course paid search introduces class issues. I think we should view search like a public utility because it’s absolutely necessary for surviving the world.

The original designers of the web envisioned a world where every user could have their own chosen "agent" to navigate the web for them. This is where the User-Agent HTTP header comes from. There would be search agents, data mining agents, and interactive browsing agents.

We missed out on like 90% of what the information revolution was supposed to bring to users, and instead the data giants have captured all the value.

We expect everything on the web, including our user agent, to be free as in no cost. The only way to pay for the development cost is to sell to third parties aka advertisers. If people actually paid money to develop more advanced user agents there would be a market for them.

Why do people prefer free over paid? I think people underestimate the influence of advertisers and underestimate the value of their own time spent researching and dealing with crappy products. It’s almost like a psychological bias.

I like the idea of a better user agent but would I pay for one? Judging by my refusal to pay for quality journalism I’m guessing it’d be a hard sell. That’s another thing I should be willing to pay for but don’t for some weird reason.

The guy who posted the story of splicing a 500kW cable wasn't looking for money. He wanted to amuse people and thanks to him we got amused endlessly. The costs of putting things online are $5 per month at Digital Ocean.

> The costs of putting things online are $5 per month at Digital Ocean.

Plus the knowledge that you can, and non-trivial technical expertise such as registering a domain, configuring DNS, setting up web software (even if "one click"), and maintaining it over time.

> the story of splicing a 500kW cable

Do you have a link to this? I've tried googling "splicing a 500kW cable" (and variants) and haven't found much relevant.

Warning (NSFW?): if you're coming from HN, the link redirects to this somewhat entertaining image https://imgur.com/32R3qLv

> We expect everything on the web, including our user agent, to be free as in no cost.

I don't think this was an accident. Back in the 90's I paid for Netscape Navigator and for email. People even used to pay money to be indexed by search engines (like paying to be in the Yellow Pages). Companies started "giving these things away" in a calculated move that I guess the nascent web population was just not cynical enough to reject.

Web browsers used to cost money. I vaguely remember getting a boxed copy of IE from an MS rep that came to my school as a kid. ISPs started bundling Netscape if I remember right, then MS started giving IE away for free with the OS. In the end we got Firefox, so I'm glad for that, but we can blame the browser wars in part for the unwillingness to pay for agent software.

If you use a Mac, DevonThink and its associated products are great examples of paid user agents that seek out and organize information for you. Their free trials are fairly generous.

I don't use a mac. Is there any equivalent for other platforms?

The closest I’ve ever found is https://www.zootsoftware.com/ - famously championed by James Fallows of The Atlantic. I don’t believe its web crawling tools are as sophisticated as Devon Technologies’, but it can do a lot and has a lot of power tools to process what it ingests. I haven’t used it since 2010, so my information may be out of date.

> I would pay a massive amount of money for a search engine with a “no commercial results” switch.

Yeah but what does that even mean? How do you define commercial? Who is going to go through all of the pages and categorize them into commercial and non-commercial? And how do you keep shills and “influencers” from littering your results with seemingly innocent but actually commercial content?

Agreed that it’s semantically tricky but if you give me a bag of heuristics to work with it’d be worlds better than what we have now.

Keep in mind there is incentive to have a commercial meta tag so that when I’m actually looking for a good or service I’d be able to easily find it (or see the lack of it). The other 99% of the time ads and products are just an annoying distraction and a waste of money to serve.
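To make the "bag of heuristics" idea concrete, here is a toy scorer. Every signal and weight in it is an assumption picked for illustration: affiliate-style query parameters on outbound links, sales phrases in the page text, and a made-up `x-commercial` meta tag standing in for the opt-in tag proposed above (no such standard tag exists). A real classifier would need far better signals; this just shows the shape.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical signals and weights, for illustration only.
AFFILIATE_PARAMS = {"tag", "affid", "aff_id"}
SALES_PHRASES = re.compile(
    r"\b(?:buy now|add to cart|discount code|best price|sponsored)\b",
    re.IGNORECASE,
)
COMMERCIAL_META = 'name="x-commercial"'  # hypothetical opt-in tag, not a standard


def commercial_score(html_text: str, outbound_links: list[str]) -> float:
    """Rough 0..1 score; higher means 'more likely commercial'."""
    score = 0.0
    # Outbound links carrying affiliate-style query parameters.
    affiliate = sum(
        1 for link in outbound_links
        if AFFILIATE_PARAMS & set(parse_qs(urlsplit(link).query))
    )
    score += min(0.5, 0.1 * affiliate)
    # Sales language anywhere in the page text.
    score += min(0.3, 0.05 * len(SALES_PHRASES.findall(html_text)))
    # A self-declared commercial meta tag.
    if COMMERCIAL_META in html_text.lower():
        score += 0.2
    return min(score, 1.0)
```

A "no commercial results" switch could then simply hide anything above some threshold, which is exactly where the shills-and-influencers problem raised above comes back in.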

TBH Google kind of sucks for looking for stuff; amazon + reddit recommendations are my main source of surfacing things I want to buy.

Here's an idea: do the 0.1 version by crawling and indexing the web in a perfectly standard way, apply the usual, run-of-the-mill search algorithms, then diff it against the first results from Bing/Google/etc., which will be biased towards commercial stuff. The higher a page is among their results, the higher the probability that it's commercial in some sense.

So...you want to build a Google except each query starts on the last page of results?

No, it's just a simple heuristic to get started.
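One way to read that heuristic as code: keep your own ranking, but subtract a penalty for pages the big engine ranks highly. The linear decay, `depth`, and `weight` below are arbitrary assumptions, and matching is on exact URLs only, so treat this as a sketch of the idea rather than a working ranker.

```python
def rerank(own_results: list[str], big_engine_results: list[str],
           depth: int = 50, weight: float = 1.0) -> list[str]:
    """Re-order our own best-first results, pushing down pages that a big
    engine ranks highly (treated here as a proxy for 'commercial')."""
    positions: dict[str, int] = {}
    for i, url in enumerate(big_engine_results[:depth]):
        positions.setdefault(url, i + 1)  # keep the first (best) rank

    def commercial_prior(url: str) -> float:
        # 1.0 for the big engine's #1 hit, decaying linearly toward 0 at `depth`.
        rank = positions.get(url)
        return 0.0 if rank is None else (depth - rank + 1) / depth

    n = len(own_results)
    scored = [((n - i) / n - weight * commercial_prior(url), url)
              for i, url in enumerate(own_results)]
    # sorted() is stable even with reverse=True, so ties keep our original order.
    return [url for _, url in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

With `weight=0.0` you get your own ranking back unchanged, which makes it easy to dial the penalty up gradually rather than literally starting on the last page of results.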

In my recent experience, the only thing which causes me to tag a !g onto my duckduckgo searches is a need for recent results (as in, the last couple of hours -- for the results of a sports match for example).

Anything else seems to be served by the other providers just as well/poorly, and occasionally better.

There was a search feature in Google called "discussions" I think. You could find forums. IIRC the same was possible for blogs.

Both were very precious and both are gone. There was no money in them I guess.

This needs to be fixed.

Huh. Maybe we need two distinct search modes: library vs. bookstore.

And not just the "shopping" tab but a fundamentally different set of algorithms for page ranking.

I've noticed the same with song lyrics when I try to find a particular song I've heard somewhere. Hundreds of results of the same handful of "top" pop music industry artists, but anything alternative is hard to find, even if the search query very clearly is not a good match for those popular results. I know the stuff I'm looking for is out there, but it just won't show up.

I wonder if there's a ripe market for a search engine that specializes in "clean" search results. Google grew into its current titan status in part by providing results with higher relevancy and accuracy than the competition, and by avoiding the primitive SEO cheats of those days (like spamming the page with irrelevant keywords to capture traffic). Maybe it's time for an engine that could avoid modern, more sophisticated SEO cheats.

I've had the same experience. Funnily enough, I started using the Russian search engine Yandex a while ago for these use cases. Results are not bad...

>"Which pretty closely reflects the real world, but damn that's a shame."

This sums it up nicely haha

And it has resulted in communities like the ones you mention moving to aggregators like Reddit, Facebook, or even HN. And I’m ok with that - it’s like how communities IRL gather in churches, bars, or other areas.

Why shouldn’t the internet imitate actual life?

Our culture is a frog nailed to the floor. The nail is commerce.

The frog can still sorta function. Wiggle in little circles. It ain't pretty or healthy or free. No hopping.

I wonder what it would be like to not have that nail through us.

Commerce -> revenues -> viable businesses -> sustainable jobs -> peaceful livelihood prospects.

No commerce -> no jobs -> unrest -> violence and wars (citation: centuries of human history).

Maybe, commerce is the nail which constrains humanity both in a good and a bad sense - by providing a peaceful alternative to violent lifestyles and by creating a consumerist culture?

When there's a trade that makes both sides better off that increases the happiness in the world. It's a Pareto improvement.

When marketing convinces someone to buy something that ends up being a bad trade (i.e. the person that buys it is not made better off) then there is no such improvement and the world is not made better off.

This hints at a common sentiment I see on HN. The idea that everything you and I buy and do is very considered and intelligent. But the great unwashed masses don't put such thought into their purchases, and are being "duped" by evil conspiratorial marketing. If only everybody else was as smart as us, right?

Have you considered that maybe the majority of successful companies in the world are fulfilling customer wants and needs and thus making what you call Pareto improvements? And maybe it's just that the other "dumb people" derive joy from different stuff?

The market sorts these things out. It is mostly efficient. Not perfectly. But mostly. A company that is providing net-negative value to the world is a short-lived company.

I don't consider myself immune from hucksters. On the contrary, I've certainly been tricked and manipulated before and I expect I will be again.

Given that hucksters have existed for as long as markets have, I don't have your touching faith in the power of markets to eliminate them.

It’s almost like there could be an in-between between full-throated commerce in every aspect of life and community and no commerce at all.

False dichotomy. There is such a thing as too much commerce, as is evidenced by our current state of ecological self-destruction.

No, commerce doesn't have to be a nail. It's just restricting the utility of the internet at this point, which is limiting commerce. The internet used to be an antidote to the shopping mall, but now it has become one.

Aren't plenty of (most?) modern large scale conflicts fueled by commerce, though?

It might be possible to have both commerce and some form of democracy though.

> (citation: centuries of human history)

That’s not a citation. And given your thesis is a sprawling abstract claim, it really needs some real citations.

It would need citations if it were an academic reference, perhaps. As a response to a comment that speculates without evidence on the similarity between our culture and a nailed frog, it seems sufficient.

The likelihood of an individual dying by violence has been much greater for most of human history, and prosperity and peace typically go hand in hand. I could provide references for these claims, but I think they are well known, and references for the counter claim haven't been provided.

It’s just totally unscientific thinking. You’re connecting things with no evidence of connection.

And evidence aside, your analysis is so coarse it’s basically impossible it could be true. Nothing that coarse is true.

And one last point: I have no doubt your head is filled with examples from history that back up your claim. What you almost certainly don’t have is evidence that there isn’t an equal raft of examples that contradict it.

I find some of my top 'competitor' blogs are worse quality, but somehow perform better due to having tons of links/content.

For something like 'Save Money on Food', I am positive I am best-in-class, but I only make Google's front page on occasion. Without a doubt, I should be the number 1 result, and instead, well-SEO'd mom-science wins.

Btw, does this mean google should be looking for crappy code/design=genuine website?

The catch of course is that anything search engines start looking for, SEO will start optimizing for. If crappy code == genuine website, then get ready for a shitstorm of shitty looking SEO-enhanced pages.

Come to think of it, that might be kind of funny for a day or two.
