Search engines and SEO spam (twitter.com/paulg)
592 points by iamjbn on Jan 3, 2022 | 534 comments

This was in response to mwseibel's thread, which had a big discussion yesterday:

Google no longer producing high quality search results in significant categories - https://news.ycombinator.com/item?id=29772136 - Jan 2022 (1167 comments, spread over multiple pages - note the "X more comments" links at the bottom)

To some extent, I worry that the problem with search engines is that there isn't any data worth returning. Yesterday's thread talked a lot about reviews. Writing a review is hard work that requires deep domain expertise, experience with similar products, and months of testing. If you want a review for something that came out today, there is no way that work could have been done, so there simply isn't anything to find. Instead you'll get a list of "Best TVs 2021" or whatever, with some blurb and an affiliate link, not an actual review. That's what people can make for free with a day's notice, so if you write a search engine that discards those sites, that's fine, you'll just return "no results" for every interesting query.

I guess what I'm saying is that if you want better reviews, you probably want to start writing reviews and figuring out how to sell them for money. Many have tried, few have succeeded. But there probably isn't some JavaScript that will fix this problem.

I think one of the fundamental things that made search work well 1-2 decades ago was that web sites would link to each other, and that those links could vaguely correlate with reputation. There were link spammers, but there was actually some decent organic content as well.

What's happened since then is that almost all the normal "people linking to things they like" has gone behind walled gardens (chiefly Facebook), and the vast majority of what remains on the open web are SEO spammers.

Why have blogs and articles stopped linking to things? I'm reading a restaurant review site, and they won't link to the restaurant. The chef's name is a link to a list of all articles tagged with the chef's name, rather than a Wikipedia link or something useful that can tell me who that person is.

The average website's goal is now to keep you on it as long as possible. According to some metric folks, the longer you stay on a website the more money you spend there. Linking to another website destroys that metric.

Also if you are going to make a purchase somewhere, any website would try to get a cut of the money you spend by actually sending referral links to the product. So small websites that do not allow this service will not get linked so much.

On a meta level, links or connections between items are information. Information is money. And as soon as that became evident, links and connections also became more scarce.

> According to some metric folks, the longer you stay on a website the more money you spend there.

That is really sad. Metric folks inventing metrics for the sake of metrics, which dubiously correlate with the profitability of the company.

Yup, and developers have been allowing the marketing and product teams to break the back button, as well as opening every external link in a new window so users have to keep a tab open on their site. You always had middle-click to do this, but now it's being forced on users.

I just noticed even the goobers at GitHub break the back button when you click a project link too. I don't know why people champion this brand when they have dark patterns and shoehorn 'social' functions into the proprietary platform.

This is a prisoner's dilemma of sorts and the whole free web is losing in this.

Because, years ago, linking to lower reputation sites would drain your page rank.

So everyone worried about SEO became afraid to link to anything except:

1) Their own website

2) High reputation sites like NYTimes, etc.

It's sad. Makes it harder to navigate the web.

Bang on. Saying that "there isn't anything out there anymore" is missing the point: Google's algorithms created this situation, intentionally or not. Before Google, people linked to what they wanted and communities would naturally cluster around topics of interest. Google came in and made reputation into a currency which effectively destroyed all these communities through incentivizing selfishness.

"When a measure becomes a target, it ceases to be a good measure"

-- Goodhart's Law.

Google's algorithms didn't create this situation; people chasing high Google rankings did. Had Google used completely different algorithms yet become equally dominant, people still would have poured their hearts and souls into getting higher rankings.

Basically, an application of the tragedy of the commons. Or: "why we can't have nice things".

But that's taking for granted that Google would have become dominant. Perhaps if they hadn't chosen the algorithm they did then they wouldn't have been as overwhelmingly successful. Instead, I could imagine a world in which there are multiple search engines and none of them are all that good. In fact, that's the world I remember from before Google existed. Search was bad but communities were strong and life was good.

Then Google came along and we all found it a lot more convenient than the bad search engines we were used to. And of course, we all know where that led. In some sense, Google built an 8-lane superhighway and bypassed all the small towns.

We all traded away paradise in exchange for convenience. Now we have neither.

On the glass-half-full side of this: we're getting those communities again! Here on HN, on reddit, for certain topics on various social media (there are pearls there too), on Mastodon, various blog authors, Ars Technica, Quanta, etc. [1]

It's just fragmented - i.e., catering to a specific group. Because if it isn't, it's awesome for 5 minutes and then monetization rot sets in.

[1] None of these work for everyone; conversely, all of these are seen as great things by some and have people who prefer that one thing over others for its quality.

The trouble is, you are no longer "surfing" the Web, you are digging through your RSS feeds and links to interesting sites, fediverse subscriptions, etc. That's not good UX, period.

>Google's algorithms didn't create this situation; people chasing high Google rankings did.

You're technically right. You'd be more right if you said people chased the highest spots on search engines for the widest breadth of queries.

If there were implicit alphabetical ordering of search results, I guarantee you'd end up with a bias toward A's, Z's or otherwise in people trying to get top spots.

>Google's algorithms didn't create this situation; people chasing high Google rankings did.

But lowkey Google incentivized such behaviour by not being open and transparent on how exactly their algorithms work.

That would have allowed people to artificially chase rankings even faster and more efficiently. It makes the problem worse, not better.

How is transparency worse than the smoke screen that we have today? For example, healthy and good websites could rank according to good content, good optimization, variety of multimedia content, decent design and UI, etc. You can't have too much of good things and qualities. That would be something like writing a too good book or making a too good product.

Because the rank algorithms are subjective heuristics, not absolute metrics. All rank algorithms always have been. It started with the link metrics, then people started gaming that. It's been a signal/noise war ever since.

It's also dangerous to ask for the exact criteria because they are ever changing. Google et al don't want to be prescriptive about what a good site is, they want to recognize what a good one is. You make a good one, they'll figure out how to recognize it.

They can't sit down and publish "The Definitive Guide to a Good Website". That's just not their role and it will be out of date before it's published.

I understand that Google cannot prescribe and direct what websites should look like, but more transparency on their part wouldn't hurt.

A big problem is that a lot of community content went behind Facebook. Instead of creating webpages or forums, people started using Facebook pages and Facebook groups. This is the main reason I have been anti-Facebook for over a decade. Not because of privacy reasons, as many are, but because I saw that Facebook would put the web behind its closed doors. Even today some of the best reviews about any product or service are usually in enthusiast community forums. But a lot of that activity has gone behind the closed doors of Facebook and now Reddit. Most of the currently thriving forums are those that predate Facebook.

Surely there is just a different algo that could bring about better communities?

Different, but not better.

The incentives to game the algo remain. People adapt to the environment.

That's why mechanism design [1] exists as a field of study. The whole idea of that field is to provide the proper incentives to steer the participants towards your objective. Yes, considering they will try to "game" the system however they can.

I'm pretty sure google could do strictly better (i.e.: better in all reasonable accounts) than they do now if they focused on the users' experience instead of revenue for a couple terms.

[1] https://en.wikipedia.org/wiki/Mechanism_design

> The incentives to game the algo remain. People adapt to the environment.

Perhaps it could work if the search engine changed its algorithm all the time.

You don’t think Google does this already?

Paul Graham says Google doesn't want to follow anybody down that road (human intervention in search). But ISTM the problem is that even though they don't, they can just throw a giant pile of money at it if they needed to crush a competitor. Also, VC will refuse to invest in anybody doing it because Google.

PaulG doesn’t know what he’s talking about.

I disagree, he's a pretty good guide to how VCs think, though not always in the ways he wants.

Only if implemented by the monopolist.

People's best chance is to stop using Google and push for it to be broken up.

I wonder if punishing presence of advertisements would filter out most pages that are SEO'd to the max and instead promote "labors of love" type pages.

This is an interesting idea because it would create a type of non- or anti-commercial SEO that could counteract the commercial one. However, Google would never do it because they sell most of the ads that would be (not) hosted on these sites.

Google owns the largest online advertising network though, so that’s definitely not where their bread is buttered.

Wouldn't it be reasonable for Google to show how their ranking algorithms work so all webmasters and content creators know how to behave on the web? Now we have a black box that's causing confusion and is misdirecting websites and web users.

I don't think so. If the problem is people gaming the system, making it easier to game isn't going to improve the situation. It's not going to put good content creators on a level playing field with spammers, because good content creators simply don't care as much as spammers about search engine gaming.

But people are already gaming the system, and on any search from product reviews to code snippets, I see SEO-optimized spam populating the top results. Good content creators don't have the time or the technical inclination to reverse-engineer the ranking algorithm (or to brute-force it by creating tens of thousands of sites and seeing what sticks in a giant multivariate test).

Knowing the actual rules might give them a fighting chance, since the bad guys already know these rules anyway.

> It's sad. Makes it harder to navigate the web.

Some would even say it killed the web by centralizing all the content in the hands of a few [0].

Which is the direct consequence of everybody optimizing to better show up on Google/Facebook/Amazon/Microsoft and ultimately even migrating all their hosting to these companies.

[0] https://staltz.com/the-web-began-dying-in-2014-heres-how.htm...

There aren't as many blogs now as there used to be.

That will get worse yet, most likely. Younger people no longer produce public text to the extent they did prior to the smartphone-heavy era. Supply of that blog style content will continue to dwindle as the producers age out. I'm sure there's a stability point it may reach, of course, because some tiny percentage of people will always want to write long-form.

Younger people TikTok, they Instagram, they chat in private conversations with each other, they occasionally post short messages in walled gardens like Facebook, they YouTube, they listen to music, they watch Netflix & Co. That's what they do. They do not persistently write LiveJournals, Tumblrs, blogs. That pre video/audio-focused era is over and it's not coming back (even if there's occasionally a bubbling up of hipster fakery centered around how cool it is to write text).

I heard an interesting theory the other day: blog viability declined because Google killed Reader. Which indirectly ends up poisoning Google's biggest well, since blogs are an important source of relevant cross-domain links.

I'm somewhat skeptical, it seems a little too poetic to blame Google's ultimate downfall on a decision that was notably hated at the time. But it's plausible. If you want it to be a conspiracy theory, you can posit that killing off independent blogs was the intent, to convince bloggers to migrate to Google Plus.

Google used to prioritize blogs and original content like forum posts in search results, but they don't anymore.

What do they prioritize now, "reputable" news organizations enrolled in Trusted News Initiative?

Blog spam and pages filled with AdSense ads.

I find that claim surprising considering how many more people there are simply using the internet at all.

Fewer unique blog domains due to “blogging” sites that aggregate users? Sounds plausible. Fewer people blogging overall? I’m not convinced yet.

I'd believe it. As an IT consultant, I interact with a lot of people who are semi-techs themselves- mostly small business owners who are used to wearing a lot of hats, and also the type to have been motivated to run their own personal blogs about diving/photography/conlangs/quilting/gardening/whatever their personal hobbies are.

Ten years ago, the majority(!) had at least something up and running, where they would post essays, thoughts, whatever came to mind.

Nowadays? All gone. All! When asked why, the answer almost always is along a mix of ever-increasing negative feedback and harassment from randos, and aggressive automated spamming of their forums. Loss of the pseudo-anonymity plays a large role as well. Many have deleted years' worth of work, simply because they are afraid of someone trolling through their posts to find something to harass them with.

I was never a blogger myself, but I am sad about the change. There was a lot of good stuff out there for a while, and sometimes it just plain made me happy to read someone joyfully nerding out on a favorite subject of theirs.

I think a lot of people are still writing this kind of content, but you have to look elsewhere for it: Reddit, Facebook, Twitter; to name the obvious ones. It’s also harder to find, but you can find all kinds of personal content written in comments and posts on these sites.

I realize that this is a hard thing to 'prove', but I am personally certain that the amount and quality of such things has dropped significantly from a decade ago.

Not to zero. You can still find things tucked away in a post on reddit or the like. Almost never, as far as I have experienced, on Facebook or its ilk, as the affordances are different. I genuinely think there has been a loss.

It used to have positive utility: you were getting acquainted with people you would otherwise have had literally nil chance of meeting.


Nope. Putting anything out there is basically just doing the rest of the world's Open Source Intel for them. Maybe it isn't the Net that changed. It's just there's way more sharks out there that can't just leave well enough alone.

I frequently append site:reddit.com to searches for a niche search term these days. I think a lot of people who would have blogged or commented on blogs are posting there instead.
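For anyone who wants to script that trick, here is a minimal sketch of building such a query. The helper name `site_search_url` and the example query are made up for illustration, and the base URL with its `q` parameter is an assumption about the search engine's query format, not something taken from this thread.

```python
from urllib.parse import urlencode


def site_search_url(query, site, base="https://www.google.com/search"):
    """Build a search URL that restricts results to one site.

    The base URL and the "q" parameter are assumptions; swap in
    whatever search engine you actually use.
    """
    # The site: operator is simply appended to the query text.
    params = {"q": f"{query} site:{site}"}
    # urlencode escapes spaces as "+" and the colon as "%3A".
    return f"{base}?{urlencode(params)}"


# Hypothetical niche query, restricted to reddit.com.
url = site_search_url("water softener review", "reddit.com")
print(url)
```

The same helper works for any forum domain you trust more than the general results, which is really the whole point of the habit.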

I wonder if they'll do a walled garden after their IPO. I've always found the site pretty useless, outside the 'old.reddit.com' version. On the bright side, maybe this will open up space for one of the federated clones to grow.

I think the bigger issue now is that more content is inside social media "silos" like twitter, instagram or youtube. I don't have the numbers though.

Why is this a problem? Can't google index social media silos?

Which ones? They can index their own, but for the others only the public stuff. Facebook has a lot of things private so nobody can see them except your friends. (they are by no means perfect, but a lot of things are private and only seen by friends - most of it isn't of interest to a search engine anyway but comments of the form "I love X product" could in a perfect world be indexed as a sign of what people find good)

> I find that claim surprising considering how many more people there are simply using the internet at all.

Most of these many more people are mobile users, where creating long-style text content can be quite bothersome.

What ain't bothersome, with a smartphone, is taking pictures and videos to slap filters over them. Alas, that's why we are where we are, with TikTok, Instagram and Twitter dominating large parts of the web.

It's even noticeable in a lot of online text discussions outside of these communities; the average length of forum posts feels like it's gotten way shorter over the decades. People have less attention to read anything that looks longer than a few sentences, often declaring it a "wall of text" based on quantity of text alone.

Imho it's a big part of what drives misinformation; doing any kind of online research on a small phone screen is extremely bothersome compared to the workspace an actual computer/laptop, particularly with multi-monitor, gives.

There's also the difference in attention; when I sit down at my laptop/desktop, I actively decide to spend and focus my attention on that task and device.

Smartphone usage, by contrast, is mostly dominated by short bursts of "can't do anything else right now". I don't choose to take out my phone and surf the web; it's something I do when I'm stuck in some place with nothing else to do and no access to an actual computer.

But for the majority of web-users [0], that smartphone access to the web is all they know, which then ends up heavily shaping the ways they consume and contribute to it.

[0] https://techjury.net/blog/what-percentage-of-internet-traffi...

> TikTok, Instagram and Twitter dominating large parts of the web

For my part, I'm glad these fora aren't indexed well; I don't want my search results dominated by single-sentence posts and photos. In particular, I don't have accounts on any of these services.

I'd be happy if search engines would decline to index sites behind paywalls. Links to Medium, Substack and Washpo are very common, and if the first thing I see is a popup demand for payment, that browser-tab gets closed.

I wonder if it would be possible to have a big filter button “commercial” or “non-profit” or something along those lines. So you get results that are not deemed commercial or are.

Don’t know how hard it would be to know which is which. Maybe non-commercial : don’t run ads, don’t sell a product or service and provide information only.

I wonder if the majority are moving to vlogging instead?

The original Google algorithm was a clever hack but it relied on web being a hypertext, and links being used contextually.

The algorithm made linking valuable. So instead of writing hypertext people tried to create isolated sites and boost their rank. Remember rel="nofollow"?

Eventually bigger sites took over the small ones.
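For readers who don't remember rel="nofollow": it is an attribute a site puts on a link to tell crawlers not to count that link as an endorsement. Below is a rough sketch, using only Python's standard library, of how a crawler might separate the two kinds of links. The class name `FollowedLinkParser` and the sample HTML are invented for illustration, and real crawlers certainly do much more than this.

```python
from html.parser import HTMLParser


class FollowedLinkParser(HTMLParser):
    """Collect href targets, split by whether the link is marked nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow sponsored".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Hypothetical page snippet: one plain link, one link marked nofollow.
sample_html = """
<p>Dinner at <a href="https://example.com/restaurant">the restaurant</a>,
sponsored by <a rel="nofollow sponsored" href="https://example.org/ad">a partner</a>.</p>
"""

parser = FollowedLinkParser()
parser.feed(sample_html)
print("counted as endorsements:", parser.followed)
print("ignored for ranking:", parser.nofollow)
```

Only the links in `followed` would feed a link-based ranking, which is why blanket nofollow on outbound links starves the graph of exactly the signal described above.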

Noticed this with online newspapers too, to the extent that they report on a website or product and don't include a URL to it.

I agree very much with this. It seems that between the walled gardens and also people being so reluctant to have “their” audience leave their site/page/etc the discoverability of the web has dropped dramatically.

That's an interesting observation. IMO, we stopped linking to good content because Google was good at finding it. Now Google is suffering, and we need to go back to doing more links.

Yup, early Google relied on a lot of unpaid, unseen human intervention and choices. I ran some weblinks and curatorial sites during the search wars, and PageRank could only work because there were people behind the sites choosing links based on their usefulness to their audience.

I wish FB would be more open, but since they have all this walled garden info, are they well placed to start a competing search engine? Would be interesting if their activity could help filter out seo hackers.

FB search seems to have gotten worse and worse. Unless I can remember the specific Group where I saw something, it's very unlikely that I can find it again. And they know which posts I've been highly engaged on...

This explains why running a search engine on the original Google PageRank algorithm would not work as well today as it did back then.
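To make the dependence on organic links concrete, here is a minimal sketch of the textbook simplified PageRank computed by power iteration. It is not Google's production ranking; the damping factor of 0.85, the iteration count, and the toy link graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outbound links: spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three blogs link to a restaurant page,
# and the restaurant links back to one of them.
toy_web = {
    "blog_a": ["restaurant"],
    "blog_b": ["restaurant"],
    "blog_c": ["restaurant", "blog_a"],
    "restaurant": ["blog_a"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

Pages that nobody links to end up with only the baseline share, so if the blogs stop linking out, the restaurant page has nothing to distinguish it from a spam page with the same text.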

But Google doesn’t run on PageRank anymore. PageRank is merely one of hundreds of signals they use to sort results.

Does this mean that Facebook is the only company well poised to take on google search?

The main driver of SEO spam, and online scams in general are countries that have little to no opportunity for economic growth. There are literally millions of Internet savvy people who would be able to survive on what we would consider barely anything profit-wise in adsense revenue, which also usually pays out in US dollars. In this currently terrible global economy, desperation turns the most intelligent minds bound into poverty into bootleg SEO engineers, online catfishers, scammers, and ransomware creators, and God bless their creativity...

Instead of creating income opportunity and crowdsourcing people in foreign countries for common (more positive) good, companies rarely create opportunities for the people who would normally turn into spammers and scammers, and that's what creates an endless army of people that constantly destroy online communities like Soundcloud, FaceBook, Twitter, and TikTok with stolen content, trend scams, fake news, and spam messages.

Google search has been invalidating and subverting their most accurate search results based on abstract SEO rules for quite some time now. It was likely done so that they could implant paid ads first into content, because that makes them the most profit. Doing that has destroyed their reliability and reputation as a search service leader, and they're never going to admit it, but payola is the undertone that is ruining their search results... There is a certain type of corruption that occurs when a company turns away from upholding customer service and value towards a monopolistic "profit-first economic stranglehold" business model... That strategy never ultimately works out well for both companies AND users in the long run. The next leader will likely be a search that avoids the same pitfalls until they themselves become a profit-driven monopoly.

There is no algorithm that will usefully and fairly counter spam based on desperation, companies need to realize that creating opportunity for people to operate equally on their platforms is the best move, otherwise, spam will drive any community of rule abiding users away or into madness.

>The main driver of SEO spam, and online scams in general are countries that have little to no opportunity for economic growth.

Not quite right, because cybercrime (aka hacking, cracking, spamming, etc.) originated in the US, not in Eastern Europe, Russia and the third world countries that dominate the hacking and spamming scene today. The main motivation of cybercriminals is quick money and the ease of getting away with it, since you are not committing the crime physically but digitally/electronically.

It is quite right. Those are the main drivers and it's due to a lack of economic opportunities.

Hacking heavily originated in the US because the US practically built the entire modern tech universe from the ground up. The US was far out in front when it came to utilizing the Internet and the Web, so of course unethical people in the US pioneered various types of online crime, the US was the early adopter.

If you're an elite engineer in the US, you can make millions of dollars doing legal work for big tech. It helps in a big way to drain the labor pool as it pertains to criminal activity online. You generally can't do that today in the countries that dominate SEO spam, online scams, etc. In those countries elite engineers suffer terrible wages doing legal work compared to what they should be able to earn for their abilities; commonly they can earn a lot more doing illegal work instead, it's a very potent lure.

You're an elite engineer in Russia, top ~1%-3% globally. What do you do? Earn several thousand dollars per month doing legit software development in Russia (with either zero or little consequential equity compensation); flee Russia for a more affluent market; or do illegal work where the rewards can be dramatically greater. It would be difficult to resist if you were unable or unwilling to leave Russia.

Furthermore, the penalties for cyber crimes are better defined, and the ability to track footprints is more accessible, for authorities in the US and UK when their own citizens hack and abuse US & UK systems. That makes enforcement against US and some EU hackers harsher and less complex, and makes those hackers more likely to be apprehended... Many users in economically disadvantaged countries use older devices like PCs running older software, routers that don't allow MAC-level reporting, and well past EOL cell phones that don't leave the kinds of footprints that modern devices do (on top of the well documented, now legacy, security measures they can take).

>You're an elite engineer in Russia, top ~1%-3% globally. What do you do? Earn several thousand dollars per month doing legit software development in Russia

Become software entrepreneur?

And many international software companies have software development teams and presence in Russia.

In places with a less-established legal system it's harder to make money by above-board entrepreneurship and keep it instead of handing it over to local strongmen (two colorful examples that have stayed in my memory and unlike many others have become public and have also been described in non-Russian media - https://www.independent.co.uk/news/world/europe/valery-pshen... and https://abcnews.go.com/International/wireStory/us-embassy-ru... , but of course those are the exceptions because the usual result is complying with threats and handing over your business or most of it). But it's not really about Russia, it's a general issue with parallels in other countries as well. And of course, there's the issue of the local market; the financial advantages for a skilled tech person going towards entrepreneurship legitimately are less attractive in most places compared to USA; heck, even EU potential tech entrepreneurs often just go 'across the pond' to start their business.

If you can't get a work visa to a first world country, you do have fewer options than someone already living there; and the salaries offered by first-world "international software companies" in their remote subsidiaries tend to be 'according to local market rates' (the same "several thousand dollars per month" mentioned by the parent poster is a decent rate) and thus not as competitive with "black entrepreneurship" which pays according to global standards.

> Become software entrepreneur?

Exactly. Hacking for hire, making cheats, botnets, SEO farms, selling exploits and hacked social media accounts; practically anything you can think of that US software engineers can't be bothered with, as they already earn a healthy salary. That is entrepreneurism.

It's hardly only shady stuff; Kaspersky, ABBYY FineReader, VKontakte, Telegram are Russian software products that come to mind.

Russia also has its own SaaS enterprise sector with companies like SKB Kontur or Diasoft.

Just like the US has to this day warez and cracking groups, where it's for the longest time mostly been about scene prestige, and not making the big bucks.

I wasn't speaking about that kind of entrepreneurism but about making legal software and legal web services that solve problems and are useful. So many Russian hackers got arrested when they travelled somewhere outside Russia and now they are serving 10 or 20 year sentences in US jails.

> So many Russian hackers got arrested when they travelled somewhere outside Russia

How many? 20? 30? 50? IMHO the cases are rare (and get widely publicized whenever that happens, creating a disproportional visibility), you get a couple captures per year but the number is just a tiny fraction of the actual participants, more like an exception than the rule.

My assumption is beyond 20. The US is only after big time cybercriminals [0]; smaller ones get away.

[0] https://www.fbi.gov/wanted/cyber

> making legal software and legal web services

SEO is perfectly legal. So is spamming, regrettably.

I think Goo no longer cares about the quality of search results; they have other business priorities, so SEO works. Spam is another thing again; we still, after 30 years, don't have an agreed definition of spam. We still don't have a flawless spam filter - far from it. So it astonishes me how much email spam I get promoting SEO services for my non-existent website.

>SEO is perfectly legal. So is spamming, regrettably.

SEO is legal, but spam is not, at least not in the US and many other jurisdictions.

>I think Goo no longer cares about the quality of search results; they have other business priorities, so SEO works.

Google cares about spam but there is so much data and information on the web that it is impossible to figure out what is spam and what is not. Another big problem is fake data and information that is also very hard to figure out. Generally Google prefers popularity over quality because it is easier to detect what is popular than what is of good quality.

>Spam is another thing again; we still, after 30 years, don't have an agreed definition of spam. We still don't have a flawless spam filter - far from it.

Definition of spam is unsolicited message. So if I get pharma emails in my inbox that I didn't ask for, they are considered spam. SMS messages are another example: if I get an SMS message promoting free coupons that I didn't ask for, then it is spam.

Considering how much spam Google has seen in the last 20 years, both on the web and in Gmail, they should have some decent machine learning/AI algorithms which could flag spam pretty efficiently.

> Definition of spam is unsolicited message

That may be your definition. As I observed, not everyone agrees with it. Most definitions of spam include the word "bulk", for example.

Bulk unsolicited messages?

Reviews are a special category. It suffers from a couple of issues:

1. You need to have enthusiastic reviewers (people who care enough about a product category to review them semi-thoroughly.)

2. Proper reviews can take time and may need domain knowledge.

3. Competition. When there were one or two people doing reviews on some category of products, maybe the economics worked out. Once you have hundreds or thousands competing with you, the time demand may be overwhelming and not worth it.

4. If you are a trusted reviewer or site, you will get economic pressure to review a particular thing or brand you may not like very much but the money may be good. So you will begin to experience conflicts of interest.

5. If reviews are just a hobby and not a way to make money, eventually you will slow down or move on, opening a hole that gets filled up by spammers.

6. Some things are timeless (a pipewrench, let's say) and some are seasonal (consumer electronics, toys, etc). The former deserves a thorough review, while the latter deserves less but may get the bulk of interest due to seasonal demand. Does it really matter if the latter's latest iteration has 2% increased battery life to discuss?

I'm sure there is a lot I didn't think of. But it's a doomed category, unless people are willing to pay for professional reviews (Consumer Reports types and other independents).

> If reviews are just a hobby and not a way to make money, eventually you will slow down or move on, opening a hole that gets filled up by spammers.

In my experience, the best reviewers are hobbyists. The thing is, it's not reviews that are their hobby. Rather, they review the products that go along with their hobby.

So, for my hobbies (espresso and aquariums), there are tons of easily accessible reviews on all kinds of aquarium gear and coffee machines, grinders, etc. On the other hand, nobody does plumbing or HVAC as a hobby (that I know of) so it's very difficult to find high quality reviews of water softeners or furnaces. It takes a very special rare sort of person who would install these things just to review them. The closest thing I could find was this video [1] on a DIY water filtration system by an RV/off the grid type hobbyist (from what I can tell).

[1] https://www.youtube.com/watch?v=WCC4TOYYGF8

People like to talk about their work too. There's plenty of those sort of reviews out there. Mostly on reddit because like others have mentioned organic search results are completely gamed.

> Reviews are a special category. It suffers from a couple of issues

Review sites suffer from a singular problem. They are overwhelmingly SEO spam content farms. People go find some product niche and pay some Fiverr/whatever people to write literally fake reviews of products. Because they're pulling all the SEO tricks and are in a niche category, they shoot to the top of search results for that niche.

Their reviews sound realistic and viable but they're pure fantasy. The writers never touch the products being reviewed. Many times they'll pull details from Amazon listings (including factual errors) and even other "review" sites.

Once they get established in their niche they'll accept paid placement from product manufacturers without marking it as such. A single scammer might own dozens of these sites, even supposedly competing ones.

I pay for Consumer Reports. I'd encourage more people to too. I don't trust it completely but it's a good companion to manual searches on Reddit/HN/car forums, etc.

Someone pointed out yesterday on that other search thread that [most?] libraries provide free access to Consumer Reports through a membership. I just looked at the San Francisco Public Library and it does indeed give me access to the magazine and a searchable database.

One way out is to grow a community of enthusiastic reviewers. LibraryThing succeeded for book reviews. LibraryThing book reviews are better because LibraryThing caters to bookworms.

Goodreads also has a community of enthusiastic reviewers, but because Amazon owns it I'm afraid they don't have much incentive to improve the site or change anything.

While I agree with everything you said in general, Ken Rockwell seems to buck the trend for photography gear, especially lenses.

> If you want a review for something that came out today, there is no way that work could have been done, so there simply isn't anything to find.

That's not strictly true, given that reviewers are often sent pre-release versions of things in order to do that work before release day.

Not sure why you're being downvoted, as you're correct. I'd point out, however, that there seems to be a trend where reviewers are only given pre-release versions if they practically always give favourable reviews to the products they list, especially if they're provided the product for free. There doesn't have to be an express relationship or contract between a reviewer and a company either. It's the reverse of how Bill Gates apparently has given $200 million+ to different news channels/media organizations, so that they're less likely to freely share negative news about him or perhaps his organizations. This makes me think that, similarly to how stock sales by CEOs (etc.) must be pre-planned to avoid shenanigans like market manipulation, anyone giving large sums of money to any media/journalism organization should have to divide the amount up over 20-40+ years, so that the organization at least has a runway and is not dependent on larger "dopamine hits" at shorter intervals.

Yeah, that's been a problem with reviews for a long time. In fact it's what Consumer Reports used initially to differentiate themselves: their "thing" was that they only reviewed products bought anonymously at retail (no free samples or manufacturer-provided review items) and didn't accept any advertising from manufacturers either.

Sites that receive free review samples and are supported by affiliate links are kind of the exact opposite model.

It does in a funny way provide something of a metric for how willing the site is to be critical. Several video game reviewers I follow have stopped receiving product from some studios, which I think is a badge of honest review. Although it's not something you'd know easily so it doesn't help much in terms of finding good reviewers.

I trust DC Rainmaker's reviews of fitness tech products because he always returns products back to the manufacturers after writing reviews. So there's no conflict of interest based on free products.


If companies don't like his reviews, they'll stop sending review units. That hits both in the pocketbook and the race to be one of the earlier reviewers of a new product. Reduced conflict, perhaps, but not none.

If companies don't send him review units then he just buys them retail. He has already done this for many products.

Yes, I'm aware. That's less money in his pocket, and less ability to have the review be available on or before the product launch. There's still some conflict of interest, even if it's lessened.

Only purchasing review units at retail would remove this conflict.

Incorrect. He can then return them.

Depends, if someone is popular you can't afford not to have them review your things. At a certain point a bad review will still generate more money than no review at all. Few reach that level though, most reviewers don't have that much of a following.

This presupposes that companies think their products are bad. If you have (what you believe to be) a good product, you definitely want DC Rainmaker to review it. I think this is a reasonably general point across industries - companies want to get their products into the hands of the most reputable reviewers.

In DC Rainmaker's case it is probably the opposite. A fitness product not reviewed by him is a bad signal.

Any serious publisher has that policy. Here's Wirecutter's (NYT) take on it: https://www.nytimes.com/wirecutter/blog/yes-i-work-at-wirecu...

Search engines are pretty good at solving the problem they were designed to solve, which is "finding pages which contain all the query words". But they are pretty bad at solving the much harder problem of rating the trustworthiness & authenticity, intentions of the owner, monetization of the site, etc.

One possible solution to this could be:

- Let the community vote on the most trusted sources

- Include results from enthusiasts that have little incentive to write biased reviews (Reddit, HN, expert forums)

- Look at the ownership of the site and how transparent they are about it

- Regularly reassess these criteria

This wouldn't scale for a generic search engine, but I'm working on a service that does this for many product verticals/niches.
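A rough sketch of how the criteria above could be combined into a single score per source. Everything here is an assumption for illustration: the `Source` fields, the `trust_score` function, and the weights are hypothetical and are not how the service mentioned above works.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    upvotes: int                # community votes in favour of the source
    downvotes: int              # community votes against it
    is_enthusiast_forum: bool   # e.g. Reddit, HN, expert forums
    ownership_disclosed: bool   # is the site transparent about who owns it?


def trust_score(src: Source, w_votes: float = 0.5,
                w_forum: float = 0.25, w_owner: float = 0.25) -> float:
    """Weighted mix of the criteria; the weights are made up for this sketch."""
    total = src.upvotes + src.downvotes
    # With no votes yet, fall back to a neutral 0.5 share.
    vote_share = src.upvotes / total if total else 0.5
    return (w_votes * vote_share
            + w_forum * src.is_enthusiast_forum
            + w_owner * src.ownership_disclosed)


sources = [
    Source("hobbyist-forum", 90, 10, True, True),
    Source("content-farm", 10, 90, False, False),
]
# "Regularly reassess" then just means recomputing as votes and ownership change.
ranked = sorted(sources, key=trust_score, reverse=True)
print([s.name for s in ranked])
```

The hard part is of course collecting honest votes and ownership data, not the arithmetic.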

Agreed here, but in your second bullet, people have great incentive to write good quality reviews on Reddit, HN, expert forums... karma/recognition etc. It just so happens that these "forums" have built in voting systems that they spend time preventing from being gamed so the search engine doesn't have to.

Not sure if this is a good model for a search engine, but it does work to a small degree in those forums.

> people have great incentive to write good quality reviews on Reddit, HN, expert forums... karma/recognition

Internet points are a terrible reason to write anything. They're completely meaningless. We should all judge comments on their own merit and not because the author has a lot of karma. Apart from mine, obvs.

This resonates with my experience. A couple of years ago I invested more time than I'm proud of into buying the right bluetooth headset for me. I found a site with pretty detailed reviews and tested their reviews by standing in stores and trying dozens of headsets out. I also bought 3 headsets on Amazon and sent back all of them later. My impression was that the reviews on this particular site were 100% unbiased, whereas all the other reviews I read just wanted to sell whatever product was in focus.

I wonder how a search engine could distinguish between "honest & professional" and "fake & amateur" headset reviews without having a head and two ears?

This resonates with me a lot. A few months back I upgraded my desktop's insides. New motherboard, CPU, graphics card, etc. That was the first time in about seven years I've gone looking for reviews of that sort of stuff.

I remember doing the exact same thing in the past and being overwhelmed with information. The detail and data in reviews would take a long time to collate and make sense of. But this time even the big name sites seem to be much shallower. Fewer models reviewed, less testing and benchmarking, more regurgitated press releases and other news.

Last time it took me a while to sort out all of the information, this time all my time was spent trying to find any that wasn't 100% fluff.

To address your issue, we can simply accept that certain information cannot ever be available day and date. The best results I've ever got from Google come from doing

> site:reddit.com <QUERY>

If I want to know about headphones, or TVs, I'll find better answers in the sidebar than anywhere else on the Internet. But, I will not be able to find those quality answers until a reasonable period of time has passed where products can be tried and reviewed by real people.
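For anyone who wants to script the `site:` trick, here is a minimal sketch that builds the search URL. The helper name `site_search_url` is made up for illustration; the only things it relies on are Google's ordinary `/search?q=` URL and the `site:` operator from the query above.

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str = "reddit.com") -> str:
    """Build a Google search URL restricted to one site via the `site:` operator."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{site} {query}"})


print(site_search_url("best over-ear headphones"))
```

Swapping the `site` argument for a niche forum's domain works the same way.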

The issue is the immediacy. We want answers now, but we won't ever have them until later. This requires a cultural shift in consumerism that companies will not like: "Wait and see."

The same problem happens for people pre-ordering video games that end up releasing in a state of complete trash. You can only get reliable information once the early adopters have tested it. If you are an early adopter, you are shit out of luck, but you are doing the rest of us a service that we appreciate greatly.

I do the same reddit.com trick, but for TVs and headphones (and anything else they cover), I can wholeheartedly recommend rtings.com.

They're in Montreal. They buy their products retail. They get their funding through subscriptions and affiliate links. Overall ratings are formulaic based on measurements applicable to the category. The formulas are available on the site. So are all the test designs.

They're surprisingly thorough too--when I was shopping for some over-ear headphones I really appreciated that they measured the clamping force (when you wear glasses, something that clamps onto the arms too hard gets really uncomfortable pretty quickly) and breathability (temperature differential between your ears and ambient when wearing the headphones). These are pretty important for all-day comfort but don't really factor in most of the time.

Their methodology is all available:

Breathability: https://www.rtings.com/headphones/tests/design/breathability Clamping Force: https://www.rtings.com/headphones/tests/design/comfort#compa...

Maybe there's a hardware engineer out there with a decade of experience shipping and reviewing TVs publishing his thoughts to his blog. He's heard about the latest and greatest and he's offering his expectations based on the promotional material, his friends at the company, history from the brand - whatever. Maybe, if he's built a reputation of good reviews he's got a big audience. Big audience? TV brands give him an early review model.

Modern Google actually makes the content problem worse. When our notional TV blogger is starting out in our world he publishes two or three essays, nobody reads them, he stops putting in so much effort, posts occasionally, dwindles off. In a world with a perfect search engine his early essays get some attention to encourage him to post more, a feedback loop starts, and before you know it he's a full time TV reviewer.

> If you want a review for something that came out today, there is no way that work could have been done, so there simply isn't anything to find.

I think in practice this is actually largely untrue -- with technology products, video games, movies, and just about anything I can think of, most well known reviewers are given early access to the product so that reviews can come out on or before day 1 of general availability. That said, this does create a dearth of 100% trustworthy reviews on day 1, since companies are naturally disincentivized from giving early access to reviewers who they know are going to write a negative review.

There are still good reviews. For TVs, RTINGS produces high-quality reviews (although they're not listed super straightforwardly). For computer internals, AnandTech does even better. You don't have to talk about the absolute latest product out that same day for you to have quality reviews of other options in the meantime.

Everyone just makes blogspam because it's far less work than actually buying products and developing expertise and testing them and writing out a whole thorough review. Google's algorithms just can't tell a quality review from a surface-level, uneducated take.

It's a double-edged sword. Reviews take effort, so you want to make them easier for customers to write. But making them easier for your users also makes them easier for those trying to game the system. This is why Amazon's product reviews are useless, as is pretty much any other community based review system.

But on top of that you do have the problem of whether or not someone is really qualified to write a review. So Joe User thinks product X is good. What is their metric for good? It reminds me of an LTT review of the Amazon TV from a few months ago. They gave it an awful rating but noted that the reviews on the product page were generally very positive. And their reasoning was that the people buying these TVs and reviewing them didn't have a good comparison point for what a good TV actually is. They are probably comparing it to a much older and less advanced product not to a contemporary one.

So then you think the answer must be get reviews from industry related media. But then you fall into the classic problems of unethical journalists or simply ones that are out of touch.

It's not a question of whether someone is qualified or not, everybody is more than qualified to write about their own feelings toward a product they bought or service they used. In fact no one is more qualified to talk about their own feelings and experience. How useful that review is to you is a combination of the writing quality and depth, and how similar the reviewer is to your own experience and preferences.

Professional critics usually try to distinguish themselves by producing well written in-depth reviews but not from their own perspective but that of a hypothetical everyman who, ideally, is similar enough to a critical mass of their audience.

So it always interests me when people complain about popular gaming review sites being out of touch because almost always it's the reader that's out of touch but doesn't realize their bubble. It's not an absolute rule but I'm in enough niche hobbies to realize that my desires for products are way out of whack.

There’s a lot of good quality reviews on YT on launch date of pretty much anything these days.

It’s not a problem of doing the review, it’s that there’s not much of a market for written reviews, most people would rather watch a video instead.

Interesting. I never watch video reviews. They're painfully slow and impossible to search.

> It’s not a problem of doing the review, it’s that there’s not much of a market for written reviews, most people would rather watch a video instead.

I'd say it's more that YouTube offers a clearer path to content monetization than text does. YT is a much more lucrative platform for the same level of effort as SEO for their text blogs.

There’s not much money in written reviews, and people can’t find them amongst the automatically written SEO/affiliate crap.

How do you compare 3 or 4 videos before watching them? Watching video reviews of the reviewers?

Like most things, it’s a reputation ladder.

There’s top channels like LTT and if what you are looking for is out of their niche, you look for the biggest channels in that niche and go mostly by association (who they have made collaborations with,..).

EDIT: of course the big win of video reviews is that you can see the thing working.

I mentioned subreddit searching in another post - this is a good way to find reviews. I often have other forums I will google search inside for opinions on something. It's more effort than a review site, but it is due diligence. Since there is money in it for someone lying to you, unfortunately it's a "why we can't have nice things" scenario. It is why you can't ask the car dealer what the best car is for you. If you want a good understanding of something, you need to dig. If I see a site with Amazon links, I close it in < 500ms.

Developing common standards/protocols for everything required for a quality review vs. a "candy" or shallow hype review would be a good place to start, making it culture that everyone educated knows about to follow - and then they can only go to or support reviewers who list what testing protocols they follow.

Industry has already done this with the "food pyramid" - influencing, capturing governments to make the food pyramid more based on economic reasons and much less on science - with the government putting it out and distributing it into schooling of different levels, giving it an unearned or undeserved authority which then people blindly trust/follow - not understanding that or when systems and their output or oversight have been captured; why the pandemic bringing the classroom home via Zoom, so parents could see/hear the learning material has outraged many parents - an example I've heard, where white children are being taught to feel guilty about their 'white privilege', or parents being upset their children are being taught at a very young age that they can decide what gender they are; I'm not stating what I believe here, just giving examples I've heard of.

This capturing of the government is why I think ultimately the government should be developing and maintaining such platforms, as per law, and requiring individuals and organizations to in real-time add and update their data (a simple example being restaurants, their menu's ingredients, their open hours) - in part to de-risk the government having an unnatural power as "the single arbiter" of truth, perhaps instead to de-risk capture that the government funds multiple independent organizations at the federal level - that States can decide which ones they follow, if necessary, part of why States exist - to de-risk the potential capture of the Federal umbrella; however the system is in an imbalanced, broken, captured state - with the duopoly evolving to be more extreme, led or formed by the establishment, with a broken voting system in arguably most countries of the world, and mainstream media being captured by for-profit industrial complexes that fund MSM through ad revenues - which further develop or mould our culture and narratives/talking points and beliefs, whether truthful or not; without fixing these, the other platforms/systems excelling won't be possible.

Regarding finding reviews, I think we have to look at the problem in a different way.

Instead of having 1 search engine which returns the same results for everyone (depending on interests, etc, like Google does), we could have trust networks. E.g. you trust a few people, those people trust other people. From this network you could build something like PageRank, which computes some kind of transitive closure of trust for one given person. This will then determine the search ordering for that person.
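One way to make this concrete is personalized PageRank: a random walk over "who trusts whom" that keeps restarting at the person doing the search. To be clear, this is a swapped-in technique and a sketch only, not a literal transitive closure and not the actual PageRank as Google runs it. The function name, the names in the example graph, the damping value, and the handling of people who trust nobody are all assumptions for illustration.

```python
def personalized_trust(trust, source, damping=0.85, iterations=50):
    """Personalized PageRank over a trust graph, restarting at `source`.

    `trust` maps each person to the list of people they trust directly.
    Returns a dict of person -> trust score as seen from `source`.
    """
    nodes = set(trust) | {p for trusted in trust.values() for p in trusted}
    score = {n: 0.0 for n in nodes}
    score[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += 1.0 - damping          # restart mass goes back to the searcher
        for person, s in score.items():
            trusted = trust.get(person, [])
            if trusted:
                share = damping * s / len(trusted)
                for t in trusted:
                    nxt[t] += share           # split trust evenly among trustees
            else:
                nxt[source] += damping * s    # nobody trusted: hand mass back to source
        score = nxt
    return score


# Hypothetical network: "me" trusts alice and bob, who both trust carol.
trust = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol"],
    "mallory": ["spamsite"],   # not reachable from "me"
}
scores = personalized_trust(trust, "me")
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

Search results would then be ordered by the score of whoever wrote or recommended them. Anyone outside your network scores zero no matter how many sock puppets vouch for them, which is the property that makes this harder to game than a single global ranking.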

Well said - it is among my biggest annoyances with the web. Reviews are almost always packaged into best-of or top-X lists. The quality of the Wirecutter is gradually trending down but it is still the website I use to find the "best" of something. I don't have to waste time sorting through hundreds of list-spam sites.

I always liked the wire cutter for just kind of cutting through the crap and saying “this is the one”. I wonder if we need some sort of thing for reviews where humans filter out the sites that are credible.

It’s a bit funny because this was sort of done by Jason Calacanis’ Mahalo back in the day - but maybe he was just ahead of the SEO curve.

Jason Calacanis was never ahead of anything, except a few more recent idiots.

"Obvious, unactionable, or wrong" is a term of art created specifically to talk about Calacanis' advice.

"Mahalo", a Hawaiian term loosely translated as "who is this freak and why does he think our language is there to promote his shitty startup?" was the answer to the single most terrible thing VCs could identify at the time: that someone would create Wikipedia without a plan to monetize every aspect of it. And yet Calacanis did a worse job than the "original fool", Jimmy Wales, whose similar attempt at least didn't end up as a parked domain of SEO spammers.

Companies like Sweetwater do this right. They have “sales engineers” that help you find what you’re looking for over the phone or text message or email. It probably doesn’t scale but as a customer, I don’t care as it saves me so much research time and I consistently get what I’m looking for.

"SEO" spam is "Google SEO" problem. So SE ranking Optimization is not (yet) so much a problem for other Crawler/index SEs (Bing, Mojeek, Gigablast). You might say that Amazon (in eCommerce) and TRIP (in Travel) have cracked the problem of combining good/deep Content/Reviews and Category expertise with Search.

We regularly see partnership opportunities with customers interested in our API [0]. I presume Bing see the same, though their terms are more fixed and require you sharing more data. Definitely big opportunities in other categories, which are often squandered through a naive, if understandable route, of choosing a Scrape and index route.

[0] https://www.mojeek.com/support/api/

> Why not try writing a search engine specifically for some category dominated by SEO spam?

Back in the olden days, there were lots of organizations that collated high quality content from the best writers. They nurtured expert writers and paid them well. They fact-checked the content and employed diligent editors and proofreaders so it was accurate and well-written. Over the years, they'd build a reputation for reliability and trustworthiness that kept people coming back for more. If you wanted to learn about fitness, or cars, or cooking, or science, you'd find a reputable author and publisher and buy their magazines or books.

But then, in the early 2000s, the geniuses from SV "disrupted" the publishing industry and its financial model. They brought us a much better way to find content, the search engine. Because they were so much better than the old-fashioned publishers, search engines gobbled up the advertising money and became the dominant gateway to content. Publishers had to abandon expensive high-quality writing because rankings and eyeballs now mattered more than quality and trustworthiness. Instead of investing in writers, they invested in marketers and SEO specialists.

The result: worthless content, writers banging out garbage for peanuts, and useless search engines.

Two decades later, looking at the barren wasteland they had created, the SV geniuses thought: I know what we need, more search engines, but smaller ones that collate high-quality content from the best writers. There must be money in that, right?

It's been really sad to grow up and watch the cool techie optimism of the 2000's internet get sucked dry by profit motives and left to rot. The change has occurred pretty much entirely within my adult lifetime (I'm only 27 and I still remember when Google was the cool new thing on the internet).

It went from "search engines and the web will usher in a new era of wisdom and democracy" to "useful content is dying at the hands of monetization schemes, and also the internet will be the death of liberal democracy, woe unto us all" in about 15 years.

I tried creating a search engine for recipes. It works well and people like it, but the struggle is no one remembers that it exists and Google is just their default for search.

So from an individual developer perspective, it's very hard to get people to change their habits. And Google/duck/Bing is the one stop shop.

It's still out there, but I haven't worked on it much lately. I always think that if I had some good advertisers, a better UI, and a salary coming in, maybe it could take over some of Google's usage!

> I tried creating a search engine for recipes. It works well and people like it, but the struggle is no one remembers that it exists and Google is just their default for search.

Link please

Just an idea but what about making it easier for folks to remember to use your search somehow‽

I like and use Duck Duck Go‘s !bangs [1] all the time, maybe try to add your site with a rememberable name.. may I suggest !garlic ?

[1] - https://duckduckgo.com/bang [2] - https://duckduckgo.com/newbang

Just submitted it, let's see what happens!

This is really cool! Bookmarked!

One piece of feedback: It seems to somewhat liberally fuzz the term, doesn't tell me about it, and doesn't offer a toggle for it in the UI. I searched for "Natto" and got some great results at the top (Spicy Kimchi Natto!). However, a few results down the recipes start including "Nattu", which seems to be an Indian chicken dish, and then quickly have nothing to do with the term at all and give me stuff like Tafelspitz.

Yeah, that's the typo detection in search. The key thing is that "Natto" is prioritized, so that part is working correctly!

Tweaking search can be a whole full time thing. I'll write it down and maybe look into whether I can tweak the search so that alternatives aren't shown when there are exact matches.
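A minimal sketch of that tweak, assuming a plain list of recipe titles and word-level matching: return exact matches only, and fall back to typo-tolerant matching when there are none. This is not how the site's search actually works. The `search` function, the sample titles, and the 0.8 cutoff are made up for illustration; `difflib.get_close_matches` is from the Python standard library.

```python
import difflib


def search(query, titles, cutoff=0.8):
    """Return titles containing the query word exactly; use fuzzy matches only if none do."""
    q = query.lower()
    exact = [t for t in titles if q in t.lower().split()]
    if exact:
        return exact                      # exact hits exist: skip typo alternatives
    fuzzy = []
    for t in titles:
        words = t.lower().split()
        # Keep the title if any of its words is close enough to the query.
        if difflib.get_close_matches(q, words, n=1, cutoff=cutoff):
            fuzzy.append(t)
    return fuzzy


titles = ["Spicy Kimchi Natto", "Nattu Kozhi Curry", "Tafelspitz"]
print(search("Natto", titles))   # exact match only, no "Nattu"
print(search("Nato", titles))    # typo: falls back to fuzzy matching
```

A real index would also need to handle punctuation, multi-word queries, and ranking, which this deliberately ignores.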

This is great. I've been juicing a lot lately and every site you go to is full of ads and useless content and you have to click `Jump To Recipe`.

1 thing that would be helpful is to add in the ingredient amounts.

Yeah, that was part of the long term monetization plan. Paid for user accounts ($3ish/month) that allow people to clip recipes, view previews, discuss, modify, and share.

Never quite got around to it unfortunately, as I wanted more users before new features.

Have bookmarked. I shall try to remember to use it.

Always nice to see other sites using Svelte!

(1) What is it?

(2) Do you have a bang on DuckDuckGo? I'm pretty aggressive with bangs, and I suspect a lot of DDG users end up being aggressive with them as well.

Linked above. I didn't know you could just submit a random site to DDG to be included in bangs.

I think the name may be part of the problem there. Most things people want recipes for probably don't contain or go with garlic. E.g. if I wanted to make a cake, garlic is one of the last things I'd think of.

Something else that has largely disappeared is that there used to be a fair amount of organization of content, whereas now a lot of content is just thrown into a big pile and the user is left to go fishing on their own with search engines, whose ability to search seems to be declining (ie: Google often seems to no longer support mandatory include/exclude search parameters). Generally speaking, the result seems to be decreasing order and increasing chaos.

Of course, the massive volume of content creates a fundamental problem, but user curation & categorization on sites like Youtube would be possible, were Google to provide the software support so people could do that. Whether this and similar decisions are deliberate or accidental is likely one of those things that we will never know.

> Google often seems to no longer support mandatory include/exclude search parameters

I've noticed this, and it's frustrating. I have assumed it's intentional. I am left to guess as to what a change in this behavior would accomplish.

>Because they were so much better than the old-fashioned publishers, search engines gobbled up the advertising money

No, what happened is that the publishing industry lost their monopoly. They could no longer extract monopoly rents from advertisers.

One problem with search engines is affiliate marketing. If Google de-indexed the junk affiliate sites, the web would be much less polluted with affiliate spam.

> No, what happened is that the publishing industry lost their monopoly. They could no longer extract monopoly rents from advertisers.

Publishing was not a monopoly. Google/FB are a duopoly. If publishers capture rent from advertisers, they plow it back into content, aka the thing consumers actually want. If Google/FB capture rent, they don’t provide a living wage to content creators and plow the money into buying other startups and whatever “metaverse” is.

An "industry" can, almost by definition, not be a monopoly.

It can if there are only a few big players and they're all in cahoots.

Popular HN post circa 2015: “Content Marketing Handbook“

Popular HN post in 2022: “Search engines and SEO spam”.

Inevitable popular HN post 2025: “How to avoid getting flagged when content marketing“.

I cannot find the link now, but there is a great one-page graph / chart that shows all the categories of Craigslist that have been cloned/displaced by start-ups. Does anyone have a link to that report?

I think you're unfairly putting blame on Silicon Valley. Publishers were only able to produce high-quality content because, with no conversion metrics, advertisers were willing to overpay for placement. Tech undermined publishers' revenue, but what it revealed was that people don't actually want high-quality journalism, they want entertainment, and they're definitely not going to pay a premium for it. This was hidden behind publishers' business model.

>Publishers were only able to produce high-quality content because, with no conversion metrics, advertisers were willing to overpay for placement.

This implies that big budget advertisers (the CPGs, like Coke and P&G) are buying Google/FB because they have better conversion metrics. That isn't true today; only SMBs and gaming companies care about conversion metrics. There are interns in LA/NY probably collectively spending millions on FB for P&G and only reporting the number of likes back to their bosses. Google and FB have never meaningfully delivered on conversions past anything like app downloads.

Tech undermined publishers' revenue because the internet cratered distribution costs. Advertising revenues for big media crashed because the eyeballs moved away, not because it was any less efficient.

> the CPGs, like Coke and P&G

Did any of these heavily buy newspaper ads before 2000? Definitely TV, possibly magazine, but newspaper? I just don't remember seeing ads for Tide in newspapers.

I thought for a moment about your reply. Do you remember Saturday / Sunday coupon sections? These are essentially adverts. Those are 100% consumables -- like branded health, food, and cleaning products.

That's a fair point; I forgot about those. That said, the earlier comment was on conversion metrics. Unlike brand advertising, brands track conversion metrics on coupons closely.

I stand by my point that tech just exposed a bad business model. Newspapers were only viable because they were the ~only game in town.

Now what actually killed newspapers wasn't search, it was Craigslist killing the classifieds...at least according to a study Google paid for.

I guess there's also the rise of influencers in the mix here. The commoditization of publishing means content creators can more easily work independently.

Those guys covered literally nothing compared to what I can get recommendations for with “product type Reddit”. No thanks.

You may not be aware, but the written word can be used for more than product reviews.

Oh, they were just usually wrong on everything else. Fortunately, these days we have individuals debunking the nonsense. Back then, people just uncritically believed total horseshit.

The invariant has always been: find people who make falsifiable predictions and improve. Back then the pool was small and you had no choice. Now, fortunately we have a choice.

What does SV stand for?

Greed, mostly.

Silicon Valley, i.e., the California tech scene.

Silicon Valley

I've been troubled by the just plain awful results being delivered by Google search over the last few years. I think these are just plain hard problems to solve and that Google is not incentivized to solve. Google wants you to click on ads at the end of the day, full-stop.

Often times I find myself searching for "best ($product|$thing_to_do)" which I think many other people do as well because we all want the best. Other times I'm looking for a music or book recommendation with some depth. This of course nearly always leads to SEOd trash. There is no relevance nor is there trust. So I, like others, use keywords like "reddit" or "forum" to get to real humans whom I trust and whose intentions are not to sell via affiliate links.

These issues often lead to the need to find trust in real human-centered recommendations that stem from real human interests and needs. I've never found an algorithmic solution to this problem. This is why I think college radio stations or those south-of-the-dial end up being so, so much better. And why beer recommendations from your local brew-shop owner are better than anything you can find on the net.

I think building search verticals that are hand-curated would be very interesting to see. But I also think we need to build more communities which allow recommendations to be shared without an incentive to get hits via search and aren't paid for by large corporations and where community impact/quality _is_ incentivized. I do worry that those days may be gone and there may just not be enough folks (not in tech) willing to spend so much time online and contributing to niche communities. A lot of folks spend much of their time in walled-gardens like Facebook, Instagram or Twitter, so it'll be challenging to be sure.

> Often times I find myself searching for "best ($product|$thing_to_do)" which I think many other people do as well because we all want the best.

I do too. I wonder why Google didn't invest some effort into "best X" searches. I bet they could extract such information from the web and correlate various sources. They already answer all sorts of semantic knowledge questions.

> So I, like others, use keywords like "reddit" or "forum" to get to real humans whom I trust and whose intentions are not to sell via affiliate links.

And therein lies the problem. Reddit makes very little money. Forums probably make negative money nowadays. Google has decided to demonetize the organic internet and subsidizes SEO crap and AMP or whatever dumb thing their signals consider valuable. We get what we incentivize, and right now the incentives in almost all of tech are pretty atrocious.

Did forums ever make money?

yes i think. until they were demonetized

What do you mean by demonetizing forums?

> I think building search verticals that are hand-curated would be very interesting to see.

That was my inspiration behind a side project I made a few years ago — a decentralized, hand curated "search engine" [0]. Never got beyond the side project stage. But I see promise in this in the future. Eventually we'll figure out that moderated crowd-sourced curation is better than the best machine learning. The filtering capabilities have to be pretty sophisticated to make it work, though.

[0] https://github.com/emwalker/digraph

In some ways paid search disincentivizes Google from delivering quality organic results.

The larger the gap between paid results vs organic results, the more users click the paid results.

Not sure how to solve this problem.

But paid results do not always, if ever, answer the search query better in any shape or form.

So this would end up with displeased users and bounce backs.

>This may not just be a problem with Google but possibly also the recipe for beating Google. A startup usually has to start with a niche market. Why not try writing a search engine specifically for some category dominated by SEO spam?

>You might need to do a lot of manual spam fighting initially. That could be both the thing-that-doesn't-scale, and the thing that differentiates you by being alien to Google's DNA. (They must hate manual interventions; so inelegant).

Is he describing...Yahoo circa 1994? A manually curated directory service.

Some of the examples used in the Twitter thread Paul was referring to would be better served by a manually curated directory service with a possible addition of a search engine only surfacing content from the sites in the directory.

For health information and recipes in particular there are only a handful of really high quality sites that have quality content for 95% of the information most people need. I bet if you wanted to increase the coverage to 99%, that list would expand to less than a thousand sites. At those numbers manually curating the information would be easily achievable.

How to get people to use your top notch Google replacement instead of Google, however. That's the hard problem.

Isn't that what Google Programmable Search is?

https://cse.google.com/cse?cx=dc408db269da4e769 (try searching for something you want a review of)

Make a search, whitelist the domains. Every time you run into a good review site, add it to the searchable list.

That's all fine and dandy, but the goal isn't to just make some good sites a bit easier to find, it's to keep the top of your search results from being interspersed or superseded by SEO spam. Unless I misunderstood your suggestion.

It's really hard to get people to use something other than Google. If you were to launch such a product, it would have to be so much better that people recommend it organically to other people.

That was what Google was to Yahoo/Altavista back in the day, a 10x improvement. Reading this thread, people feel enough pain to do all sorts of hacky stuff - appending 'reddit' or 'forum' to queries, blacklisting spam domains, switching search engines depending on topic. If G keeps declining and a new product does things better, the penny will drop and people will swap.

Seibel and PG see blood in the water no doubt, they see G's market share and want to fund companies to take some of this.

Should be able to run on top of Google in a browser extension to insert itself only when the topic allows.

And makes me think that StumbleUpon had a similar curation ability, in that the value qualifier is how often [hopefully] real people interact with content - tracked by who's using SU and agreed to allow tracking; can't remember if sharing that was optional or not?

The gamification of the system then would have to come through onboarding fake users, pretending/mimicking real user behaviour to send that signal into the system; not sure if SU ever ran into that problem or was actively paying attention to trying to identify and remove fake or suspicious signals from their output?

I feel a much better system is easily within reach, it's simply getting the right structure to it, the right foundation, and then it will quickly take off due to the quality difference. I've already figured out a design pattern that Twitter and Facebook have indoctrinated us with, making us think it is normal - and keeping us blind to an actual normal way of organizing or communicating, but that isn't conducive to control or ad revenues - and so extending my future plans to include a better search-directory system would fit snugly into my efforts.

SU was a great way to surface random interesting stuff. I bet most blog entries today could be picked from Twitter, even if they are unlinked.

I always used DMOZ more than Yahoo! Directory. It looks like [dmoz](https://en.wikipedia.org/wiki/DMOZ) became https://curlie.org/ which is still active.

He's right as often as he's wrong

What has he been wrong about?

Okay. I was gonna respond with something snarky about what a crappy mod he was, or how he was a Saint and you don't deserve to worship at his functional feet. But I'll tell you what he was wrong about: He was, as a leader and a human and a mod, petty. He pied pipered himself into a sweet spot and no one would deny he's a good coder, but there the ego took off and forever left behind a skidmark. The cool exterior, the sense of self-importance, the punching down, above all the love of spreading one's revelatory wisdom to the poor little guy; you can love that sort of thing too much, and he did. Perhaps you weren't here or didn't interact with him directly. In my view he became dismissive and derogatory toward people who worshiped him (like you) once he acquired a small degree of fame.

What in the world are you babbling about? Who here does worship who?

I'm going to guess Paul Graham, who wrote the tweet. One of the founders of YCombinator and HN (just in case people who read this don't know).

I'm starting to think Yahoo circa 1994 might be better than Google today.

I wouldn't just complain about Google. Google search results mostly reflect a deeper problem with the web today. I do miss the simplicity of the 2000s.

The funny thing is that if the people who worked on spam at Google were free to talk about it, I'm sure it would become evident that they know more about spam and anti-spam efforts than anybody else in existence. It's a ridiculously hard problem, especially when people are targeting you directly. But they aren't free to talk about it, because if they did it would just give more assistance to the spammers, and make the problem worse.

I'm not saying that curated search results for particular verticals are a terrible idea (though I'm sure like anything the devil is in the details), but on the whole Google search is very, very good considering the constant assault they are under from spammers (which most other search engines are not, at least directly).

The problem isn't that Google doesn't employ these people or invest in their activities.

It's that Google has destroyed their own search results in order to continue to expand their revenue opportunities.

If Google:

- Enabled downvoting on results, like YT videos. (Has its own spam problems, just like YT)

- Allowed you to block certain domains from your search results, like YT videos. (If they added some kind of "coordinated network detection" and down-ranked domains coordinating with ones you've blocked, that'd be pretty cool).

- Allowed you to create your own custom search engines, like "Programmable Search Engine".

That would be incredibly valuable. They already have most of the tech. They could even create a subscription service around custom search engines if they really wanted. Plenty of people would find something like that incredibly valuable.

Anyhow, buried in there is your startup idea. Remember: your startup doesn't have to generate the same revenue or profit as the incumbent on day one to be successful.

The biggest perverse incentive for Google is that making better search results can mean less clicks to ads (clicking an ad because results are crap, going thru more pages of results means more ads). Clicks are revenue which is much easier to optimise for.

Internal Search owners can push for better algos, but what if the algo causes revenue to fall? Are there forces strong enough within the organisation to ensure that search quality prevails?

If this is the case, the problem is existential. It can only be arrested at the very top.


> The biggest perverse incentive for Google is that making better search results can mean less clicks to ads

This gets close to the real root of the issue -- attention is monetizable independently of the quality of content. There would be much less incentive to create SEO spam if search engines negatively weighted pages with ads and affiliate links, and if manufacturers were barred (e.g. by the FTC) from owning or imitating reviewers.

> The biggest perverse incentive for Google is that making better search results can mean less clicks to ads

This is also something that Google can control if competitors come along.

i.e. If a reasonable competitor comes along that is willing to sacrifice ad revenue for better search result quality than Google, Google can just adjust their search quality upwards to knock them out (and then adjust it back once the competitive threat is gone).

Perverse incentives from Google are all over the place - Searching for the delivery business "Just Eat" in the UK for instance returns an ad for their competitor Deliveroo above the legitimate organic search result for me - and I can also see that JustEat are trying to pay for their own brand name just to compete - and IMO this sort of behaviour is anti-competitive, borderline extortion considering Google is the de-facto way of searching for a business, and shocking from a search-quality perspective (where the wrong result is intentionally shown at the top because they paid more money).

If I had to pay $10/month for good search results, I absolutely would. I think most people would. Get rid of the ads and spam, and you have a service worth a premium. The solution is to make it user-centric instead of advertiser(spammer)-centric.

Some kind of browser or extension that re-ranks and filters search results on the web.

The custom search engine is harder than you'd think.

Google's search algorithm is tuned up for searching the whole web. It turns out the heuristics you need are very different depending on the size of the collection.

When Gerard Salton was doing IR experiments with punched cards he was working with collections of as few as 70 documents and in that case you are going to be very concerned about recall and not precision. Maybe there is 1 relevant document and if you miss it you failed.

If you had 70 billion documents you might have 10,000 relevant documents and if you lost 60% of them you still have 4,000 documents. The end user gets more results than they can sift through.
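To put rough numbers on the recall point, here is a small Python sketch. The function name `surviving_relevant` is made up for illustration, and applying the same 60% miss rate to the small collection is my own assumption; only the 10,000-document figure comes from the comment above.

```python
def surviving_relevant(relevant_docs: int, miss_percent: int) -> int:
    """Relevant documents still returned after the engine misses a percentage.

    Integer arithmetic is used so a single document is either found or lost.
    """
    return relevant_docs * (100 - miss_percent) // 100


# Web-scale case: 10,000 relevant documents, 60% of them missed.
web_scale = surviving_relevant(10_000, 60)

# Small-collection case: one relevant document and the same miss rate.
small_collection = surviving_relevant(1, 60)

print(web_scale)         # 4000
print(small_collection)  # 0
```

At web scale the user still has far more hits than they can read, while in the tiny collection the one document that mattered is gone, which is why the two cases want different tuning.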

Thus I always groan when I see a site is using "Google Site Search" because the relevance is usually worse than you'd get with the alternatives.

Connected with that is the tuning work: Google has sufficient data to tune up a big model for everybody but true personalized search eludes them because they don't have enough data from you to tune up a model for you.

I agree with you that "true personalized search eludes them because they don't have enough data from you to tune up a model for you". That's what Larry Page said as well: "Google doesn't know what you know". His ultimate goal is an Answer Machine powered by AI, but that's not happening anytime soon. I think internet search engines that we are using today are primitive compared to what we will have in the future.

The problem with all of this is it would help us greatly, but it would be useless to the 99% that the internet is increasingly being designed for. Modern UI trends are becoming obsessed with removing as many options and features as possible so the dumbest humans bordering on smartest vegetables can still use the service.

And customization breaks caching.

It does not if there are common interests and characteristics among users. Let's say for example I'm a young African-American girl who wants to learn how to code and I query "how can African-American girl learn coding?" and Google shows me Black Girls Code, a non-profit organization that focuses on providing technology education for African-American girls. Considering that Google knows that I'm an African-American girl and that I want to learn coding, how many other African-American girls want to learn coding? Probably many, so customization and personalization don't break caching as long as Google knows my characteristics and interests and the characteristics and interests of other people that are similar to mine.

It doesn’t have to actually, there are some pretty advanced caching mechanisms that allow you to combine cached elements together. For the web at least, you could do this back in the day with Server Side Include, and a place I worked at used it to cache logged in content.
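A minimal Python sketch of that idea: cache on (query, cohort) rather than on the individual user, so everyone in the same cohort shares one cached ranking. The names `ranked_results` and `search`, the `cohort` field, and the fake result string are all hypothetical; this is not how any real search engine is known to do it.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def ranked_results(query: str, cohort: str) -> tuple[str, ...]:
    # Stand-in for an expensive ranking call. The cache key is the
    # (query, cohort) pair, not the individual user.
    return (f"{query} [ranked for {cohort}]",)


def search(query: str, user_profile: dict) -> tuple[str, ...]:
    # Users without a known cohort fall back to a shared "general" bucket.
    cohort = user_profile.get("cohort", "general")
    # Normalise the query so trivial differences still hit the cache.
    return ranked_results(query.strip().lower(), cohort)


first = search("Learn Coding ", {"cohort": "students"})
second = search("learn coding", {"cohort": "students"})
print(first == second)              # True
print(ranked_results.cache_info())  # one miss, then one hit
```

The trade-off is granularity: the coarser the cohorts, the better the hit rate and the less personal the results.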

> That would be incredibly valuable. They already have most of the tech. They could even create a subscription service around custom search engines if they really wanted. Plenty of people would find something like that incredibly valuable.

Why would they do this? Google's customers are the advertisers, not the end-users. And no one is going to pay for a search engine, it's been tried and has failed.

> And no one is going to pay for a search engine, it's been tried and has failed.

Always curious about things like this. I certainly would pay for this; it sounds like many other people here as well would. I'm curious if the constraint is that there aren't enough people to actually pay for the investment required for the service, or if there aren't enough people willing to pay to meet the standard VC notions of success. We seem to have a problem with building and supplying services for niche (read: "not expressible as an integer percent of the world's population") customer bases, and I'm never sure if that's a business problem or a cultural problem.

The people most able to pay for a service like this are the people that advertisers most want because they’re the people with enough discretionary budget to spend on things like better Google search results. Allowing someone to buy something like this also reduces your attractiveness to your advertising clients.

I think you have to look at it more like Amazon Prime.

Nobody is going to pay for /just/ a search engine. But they might pay for, say, a /better/ search engine, plus additional features around gmail/gcal/gdrive.

Think of it more as subscribing "to google" and less as subscribing to "google search".

Regardless, the point isn't to "fix" google. It's to highlight a possible path for a new market entrant.

... If an existing player wanted to make a move here, I would say that both Mozilla and Apple are well positioned to add "personalized search" to a subscription service. Same with Microsoft. DDG could also make moves here if they expanded beyond search.

You only need ~1/10000th of Google's revenue to be a financially successful startup. 1/1000th and you'll have a great business, and at 1/100th you'll be somewhere between a unicorn and a decacorn.

sure, but you'd need a better search engine if people are going to pay for it.

A company with an objectively superior search engine could make even more money with ads so now you’re back to the beginning

I don’t know if that follows. Google has been maximizing revenue at the expense of the search result quality.

An “objectively superior” search engine, from an end user’s perspective, might have to make engineering choices that come at the expense of ad revenue.

But we’re all just talking hypothesis, it’d be cool to see someone launch a startup to get some answers.


If you think about it, Google provides advertisers a customized search engine to find customers. So it is not you searching the web, it is the web's advertisers searching for leads.

There is a smaller niche market. SEMrush is a tool used in the digital marketing industry that is now public and has a multi-billion dollar market cap. It originally started as a search engine. When they didn't gain traction they used the tech to monitor Google and interface it for customers who are tracking their performance in search results (and much more).

Can I ask what you mean by public?

It's not open source as far as I know and there's only the free trial way to try it.

Probably that they completed an IPO (initial _public_ offering)

How do you fight brigading, the organization of groups elsewhere to collectively vote on something? Eg white supremacist groups get together and vote down everything by people of color, and vote up their pages about how great they are?

Randomly select votes that are actually recorded. Then add in metavoting that votes on the votes with random sampling. At Google's scale with a sufficiently random sampling you'd be extremely hard pressed to successfully brigade or spam the voting.

Google could easily use its current fingerprinting to constrain (to an extent) multiple votes. Even knowing only a portion of the population will participate in the voting they can use a Wilson confidence interval[0] or similar to properly weight votes.

Random sampling works here since you're not guaranteed one vote per user per page and the outcome is binomial: seen and downvoted, or seen and not downvoted.

[0] https://www.mikulskibartosz.name/wilson-score-in-python-exam...
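For anyone curious what the Wilson interval looks like in practice, here is a minimal Python sketch of the lower bound applied to "seen and downvoted" counts. The function name `wilson_lower_bound`, the default z of 1.96 (roughly 95% confidence), and the sample counts are my own assumptions for illustration, not anything Google is known to use.

```python
import math


def wilson_lower_bound(downvotes: int, views: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the downvote rate."""
    if views == 0:
        return 0.0  # no observations, so no evidence either way
    p = downvotes / views
    z2 = z * z
    centre = p + z2 / (2 * views)
    margin = z * math.sqrt(p * (1 - p) / views + z2 / (4 * views * views))
    return (centre - margin) / (1 + z2 / views)


# Same observed 50% downvote rate, very different amounts of evidence.
few = wilson_lower_bound(5, 10)
many = wilson_lower_bound(500, 1000)
print(few < many)  # True
```

With only 10 views the bound sits well below the observed rate, so a small burst of downvotes counts for much less than the same ratio sustained over a large sample.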

easy, voting blocs, you assign yourself to the results of people who vote similarly to you. additionally there'd be local and regional blocs too. I can't think of a reason that the naive everyone sees everything everyone else is doing would work in the long run. That's Twitter, and it's garbage.
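A rough Python sketch of what I mean by blocs, under heavy assumptions: votes are +1/-1 per page, "votes similarly" is just the share of co-voted pages where two users agree, and people who mostly disagree with you are ignored rather than inverted. The names `agreement` and `personalized_score` and the sample users are invented for illustration.

```python
def agreement(a: dict[str, int], b: dict[str, int]) -> float:
    """Agreement on co-voted pages, from -1 (always opposite) to 1 (always same)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    same = sum(1 for page in shared if a[page] == b[page])
    return (2 * same - len(shared)) / len(shared)


def personalized_score(me: dict[str, int], others: dict[str, dict[str, int]], page: str) -> float:
    """Sum other users' votes on a page, weighted by how much they agree with me."""
    score = 0.0
    for votes in others.values():
        if page in votes:
            # Clamp at zero so users outside my bloc have no influence.
            score += max(0.0, agreement(me, votes)) * votes[page]
    return score


me = {"a": 1, "b": -1}
others = {
    "ally": {"a": 1, "b": -1, "c": 1},
    "brigader": {"a": -1, "b": 1, "c": -1},
}
print(personalized_score(me, others, "c"))  # 1.0
```

The brigader's downvote on "c" has no effect on my ranking because their history never lines up with mine.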

This is a great point. I would think Google could rank users from low quality to high quality in terms of the quality of the websites which they recommend or downvote. Tricky business and could be difficult to control, but basically the same thing they currently do for websites, but extended to humans.

How does Google already handle this exact problem on YT?

They don't. There's a lot of pretty obvious manipulation that goes on in YouTube recommendations and search results.

> - Enabled downvoting on results, like YT videos. (Has its own spam problems, just like YT)

They're going the OPPOSITE DIRECTION from this!! They recently removed all downvote visibility of YouTube videos from the user, so now downvotes only feed into their algorithm. So in the last line of defense of me ending up watching a shitty video, one of the most valuable tools has been removed by my betters. It's preposterous that people think that Google is doing a good job. They're actively getting worse, and ignoring everyone saying so.

They're doing a great job. I'm so happy dislike visibility has been removed. It removes the effectiveness of pile-on coordinated harassment which many youtubers have fallen victim to.

Except the ratio is visible for the video creator, so they know nonetheless. And there's the possibility of disabling the like/dislike for each video.

All the big channels I follow are mad at this change, and there's a coordinated effort to bring it back.

>Enabled downvoting on results, like YT videos.

you mean the dislike counter they just disabled to force people to sit through more low quality content and pre-roll ads to claim increase in platform engagement and viewership?

The only thing that matters is revenue, and Google had increases in acquisition costs in prior revenue reports. Expect to see the data points for the latter metrics to be highlighted on the earnings announcement, and a record quarter for YT coming out of the change.

> Allowed you to block certain domains from your search results

I would love for Google to build this in. Until they do, there is a WebExtension that does this: https://addons.mozilla.org/en-US/firefox/addon/hohser/ ("Block or Highlight Search Engine Results"). I use it to block stuff like W3Schools so when I search for something, MDN is always #1. Saves me a lot of time having to add "MDN" to the end of every query.
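The core of per-user domain blocking is small. Below is a Python sketch of the filtering step only; it is not how the extension above is implemented, and the helper names `is_blocked` and `filter_results` and the sample URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit


def is_blocked(url: str, blocked_domains: set[str]) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def filter_results(urls: list[str], blocked_domains: set[str]) -> list[str]:
    """Drop blocked results while keeping the original ranking order."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]


blocked = {"w3schools.com"}
results = [
    "https://www.w3schools.com/js/",
    "https://developer.mozilla.org/en-US/docs/Web/JavaScript",
]
print(filter_results(results, blocked))  # only the MDN URL remains
```

Matching on the exact host or a dot-prefixed suffix avoids accidentally blocking unrelated domains that merely end with the same characters.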

Those shitty SEO spam sites exist only to serve ads, and Google has a monopoly on internet ads. So there is no real incentive for them to solve the problem.

Google has 28.9% share, Facebook 25.2%, and Amazon has 10% and growing fast. Not a monopoly, and the incentive is there: if search results are consistently bad, people will stop searching as much, and revenue and market share decline.

Google had the same incentives in 2011, 2012 when they built and released Panda and Penguin.

Real review sites serve ads too. I don't think Google has any incentive to make things worse, and they still want people to google reviews instead of just asking friends or people on reddit for reviews.

Some kinds of "spam" can improve search results.

Things have changed in the past few years, now that Google has developed advanced transformer models, but for a long time Google's question answering facility has been: "let spammers make 10^8 pages where the title is the question and the answer is in the page".

The trouble is that there's a fine line between "answer is in the page" and "word salad!"

> - Enabled downvoting on results, like YT videos. (Has its own spam problems, just like YT)

Not convinced this would help. The spammers would just hire people to dislike competitors

> - Allowed you to block certain domains from your search results

This I would use. Never show me results from collider, watchmojo, ranker,

> - Allowed you to create your own custom search engines, like "Programmable Search Engine".

I think this would lead to people writing highly polarized engines. The Red Pill engine for example and we'd have a new problem, the proliferation of popular highly biased results. Of course that's not to say Google's results aren't already biased but they certainly are trying to cover everyone.

> Enabled downvoting on results, like YT videos. (Has its own spam problems, just like YT)

Are there any search engines that do this? It's a great, simple idea.

Not really that simple, I see a lot of potential for abuse - using bots and brigading to mass downvote your competitors or political opponents.

A couple of positions up or down in Google results for somewhat popular and valuable keywords can mean a difference of thousands of dollars per day of ad or affiliate revenue. I suspect it would get pretty wild if Google launched something like this. There already are black-hat SEO methods and services, but something so simple and direct would turn it up to 11.

> I see a lot of potential for abuse

They already have the tech to fight this on YT. They, in theory, are supposed to be doing the same thing to detect inauthentic behavior on ad placement and click abuse.

Downvotes could apply just to future recommendations for search results you see, and not apply to advertisements.

> Allowed you to block certain domains from your search results

Blocking Pinterest would be a dream come true.

This 100%. In travel, we see Google constantly tweaking its algorithms, and compared to Bing, Google surfaces a ton more small, well-written travel blogs [1]

Not only that, Paul and Michael have seen plenty of startups, and at least in recent memory, the number of vertical search and consumer startups that Y Combinator has funded hasn't been that high.

As a consumer startup, I know this issue firsthand. Paul and Michael assume that if you build a better product, they will come! That's simply not true these days.

Instead, you need to:

- Build a better product

- Option 1: Figure out a channel with enough growth on an existing platform. This likely means you're doing SEO for your new search engine

- Option 2: Get your customer lifetime value high enough so you can pay for ads. This is tough, since it's a bit of a chicken and the egg problem since most search engines are monetized with ads

As the founder of Wanderlog (YC W19; https://wanderlog.com), a consumer vacation planning app [1], I definitely remember the idealistic days when I thought the best consumer product on its own would win! But growth doesn't just come, and the same can be said of vertical-specific search engines.

[1] Try searching "[your city] itinerary" on Google vs. Bing: it's much more likely you'll find a small blog rather than Lonely Planet or the local travel bureau as the top result

Hi! I used Wanderlog to plan a recent month-long group trip, which was definitely the most complex vacation I've had to plan. For context I am very active when traveling (e.g. multiple activities each day); so not sure how my experiences map to others.

The best part of it was (going to a foreign country) being able to find / identify all the attractions relative to each other, so I could go to cluster A on Monday, cluster B, on Tuesday, etc.

The hardest part of it (and why I needed to create a separate google sheets anyways) was--once I figured out opening hours of different locations, hard-to-book activities with limited reservations--the ease of moving things around more fluidly e.g. cluster B on Monday, cluster A on Tuesday, etc. and having a more information-dense view so I could see larger portions of the itinerary at once.

It would be cool to have an "input everything" --> "input time restrictions / unmovable things" --> output planned activity cluster type workflow.

[1]: both signed in, but with the profile image removed

Bing: https://i.judge.sh/ShareX/2022/01/www.bing.com_search_q%3Dat...

Google: https://i.judge.sh/ShareX/2022/01/www.google.com_search_q%3D...

Interestingly Google didn't have a top-result ad and the google.com/travel carousel is 4th from the bottom.

For the actual results, both thefearlessforeigner.com and paigemindsthegap.com seem to be actual travel blogs (the pictures didn't appear in a reverse image search, so they are probably organic), but they're clearly geared towards being a 'faq' for visiting the city and have affiliate links where appropriate. Bing went straight for discoveratlanta.com, and frommers.com is well-thought-out but not a personal travel blog.

>> - Option 2: Get your customer lifetime value high enough so you can pay for ads. This is tough, since it's a bit of a chicken and the egg problem since most search engines are monetized with ads

nonoonononooonono. No. Don't monetize anything for the first 10 years. That's the only way it can work. Then you can go monetize it and buy an island and not give a shit if you destroy what you created.

Oh but don't worry. You'll have investors.

Also, I'd be very surprised if they didn't have tens of thousands of workers aiding in spam review already.

The hard part in all of this isn't finding and stopping spam - it's defining what spam is. Are all the pie recipes where there's a 2000 word essay about their grandma at the top 'spam'? They still have the recipe, and Google Home devices pick up the recipe instructions just fine so people end up not reading it, but many people would still consider that spam since it adds such an obstacle to getting the information you want. Same for cnet articles like "Best smart home devices to buy in 2022" - it's a reputable brand with a list of smart home devices, but it's hardly a review and exists to funnel people to their Amazon affiliate link.

> The hard part in all of this isn't finding and stopping spam - it's defining what spam is.

This is one area where Google could use personalised results to provide a better experience for the user. Let me decide what spam is for me. Let me mark results as good or bad, so that the algorithm knows what kind of pages should be prioritised or filtered out the next time. Google SearchWiki was a step towards this but they killed it off.

Is conservative leaning info spam or not spam? What about liberal leaning info?

We have seen what this leads to inside the social networks as well as YouTube, and at a macro scale I think we might want to have a shared concept of what constitutes a good search result for a given query.

At micro scale, it can seem more optimized to get exactly the type of result you want, but if we take an absurd example like an Apple Pie recipe, shouldn't we all have a shared understanding of what types of ingredients would make for an Apple Pie?

The shared understanding, I believe, is core to communication. If all of us have our own specific ideas of Apple Pie, then who is actually right on what an Apple Pie really is? What happens when your search results insist that an Apple Pie doesn't actually have apples in it, but instead pears?

Let's have niches where the content is hand curated by human beings instead of pure statistics by machines.

Hmm, why stop there? Let's actually make the users do the curating and even the content creation by rewarding them with social validation. Let’s have hard-working moderators who work on the community full time.

Then we could just build a search engine over it. We could call it Reddit. Or HackerNews.

Maybe the users aren't all as good as professionals at curating the information. Let's hire professionally trained curators, pay them well, and we could call them newspapers. Then we can come in, disrupt them, and replace them with an algorithmic marketplace that eventually becomes infested with clickbait.

AFAIK the 2000 word essays in recipes are Google's fault - it prioritizes pages with a lot of content, so you have to add that junk to the top in order to rank highly. While I'm sure there's more going on behind the scenes than I'm aware of, it does seem like the rules could be altered on a category-specific basis where a lot of text isn't necessarily a positive.

This reminds me of the page inflation that struck tech books during the late 1990s / early aughts. The Marketing Wisdom was that fat books sold (or took up more shelf space), so texts got padded with weak writing, gratuitous puffery, and other elements, which (much as the recipe essays) simply got in the way of delivering actual informative content.

(The fact that many of these books were rushed out with very poor quality control also didn't help.)

Recipe intro text is useful for contextualizing the recipe and copyright purposes. In RSS days, it was a way to get readers to click through, so the author got the ad views. Also people who write recipes like to write about food.

I'd say it can be useful, but that's not often the case (especially not to the tune of 1000+ words).

This one is hard because it does actually seem to be the case that the cruft around the recipe is valuable if the content is right. Most of the recipe blog stuff is garbage, but if you look at youtube it is clear that creators who add extra flair around the recipe are a powerful force.

prioritizes is correct, but in some ways it's not the best descriptor.

Google's algos, while advanced, still rely a ton on text to actually tell what the page is about. They need it.

If they just relied on other factors (title, links, website, etc.) they would end up with worse results for users. I'm sure they've tested it.

Google's core algo in a lot of ways is much simpler than people think (in other ways of course it's very complex).

While I think the essays are excessive, I appreciate that some of them document that the blogger actually made the recipe, with progress pictures. With the more basic recipe websites, I wonder if anyone's actually made it before or if the recipe is from some scraped database of unknown origin and quality.

Years ago I wanted to pursue micro blogging, but this "feature" of Google search stopped me from doing it.

What's the point of writing succinct, to-the-point mini articles about problems and solutions if nobody finds them on Google?

This is largely because micro-blogging means less content, and less content means you could write five 300-word blog posts instead of one 1,500-word post.

I've done blogging for the last 10+ years, and many of those I spent as a freelancer working with startups/brands/editorials. Everyone is after "word count" and I absolutely hate it.

Whenever I work on articles for my own blog, I just don't consider word-count at all. I think if your content is great and informative, then readership will be natural.

This is a very interesting approach. Do you have traffic data collection on your blog?

I collect post views, but not using Google Analytics or anything like that. I built a pretty substantial developer blog (tips, resources, etc.) back in 2014. I think it peaked at around 350,000 monthly visitors after 12 months.

Later on, I sold it because I needed the money. Not so much that I didn't want to keep working on it. Unfortunately, the new owners didn't have any idea how to maintain a "healthy" content blog, and it has plummeted down to around 30,000 monthly visitors. All the content they're publishing now is some thin headline-clickbait bullshit.

I even gave them free advice on how to fix it, but I think that for a lot of people, they just don't care and will mindlessly pump out as many pieces of content as possible. And such blogs can be identified from a mile away.

And therein lies the problem with Google SEO at the moment. Even myself, someone who has done SEO work for more than a decade, I can see that results are getting worse. In some niches, the same crappy articles that dominated 6-7 years ago are still dominant today.

I guess we're stuck in time, or so Google thinks.

Could it also be due to a reduction of public interest in blogs over the past few years? Most stuff is now published in the form of vlogs instead of blogs. I do miss the good old blog era, tho, and I wish there were still high quality blogs around.

It's two-fold. If Google prioritizes pages with a lot of content that's one thing, but longer content also means more space for ads, or more scroll events to trigger ads, etc.

Incidentally, prioritizing long content seems odd to me, in my experience the best pages are short and get right to the point, at least in the context of something like a recipe or other "how to" resources.

Yeah, the newest nuisance seems to be sites that clone GitHub Issues and StackOverflow with a crappier interface. Somehow they rank higher than the original sources. I'd say it's spam but it's definitely not traditional spam.

I'm not going to say solving spam programmatically is easy, but the gitmemory garbage site (for one example) has been around long enough that there's no excuse for not downranking or removing it. How hard could it possibly be for humans to spot these few sites and nuke em? I'm sure Google engineers see them all the time.

And the strange Wikipedia mirrors that are shown in Google Verbatim searches instead of the original. If I disable Verbatim, they disappear and I get regular Wikipedia instead.

Somehow you got downvoted by their creators here :)

> and Google Home devices pick up the recipe instructions just fine so people end up not reading it

I think this isn't entirely related, but that's perhaps the beginning of a bias you might end up having that everyone experiences technology in the same way as it marches on. I've yet to encounter a Google Home in the wild, I imagine far more people are consuming recipes on phones, tablets and PCs.

Compounding the problem, the 2000 word essay is sometimes really useful if it's describing a technique used in the recipe (cf Stella Parks' recipe for homemade bagels on Serious Eats: https://www.seriouseats.com/homemade-bagels-recipe). But somehow only spammy blogs with plagiarized recipes, AI-generated "essays," and affiliate links for every ingredient and tool used make it into the first page of results on Google (or DDG, for that matter).

At some point, Google must have moved away from using site-level reputation in search rankings, as I almost never see recipes from reputable sources like King Arthur Baking, Serious Eats, or Food52 in the first page of results.

Your point is good but I'm not sure I'd say very good given how easily the same SEO spam domains can stay at the top of search results for ages simply by scraping someone else's content. What I'd be most interested in knowing is what their success metrics are defined as — for example, how much of a problem does Google's management consider it if someone searches, finds the answer they were looking for on someone's Stack Overflow rip-off, and stops searching? I could easily believe that a significant amount of what we're seeing here is that they're focused on some kind of user frustration metric which doesn't include things like damage to other businesses.

Yes, I've noticed this particularly with technical results. A lot of sites seem to have scraped StackOverflow and GitHub issues, put a crappy ad-loaded interface around them, and somehow out-rank the original SO/GitHub content.

It's like the bad-old-days of ExpertsExchange, which somehow was never delisted by Google for its shady SEO tactics.

You just have to look at Google's profit motive here. Their motive isn't to provide quality search results. Their motive is to show users ads, either in the search results themselves or on the destination sites via their ad network. The SEO spam sites aren't a bug, they are a feature of Google's profit algorithm. Google's search quality will never improve so long as their motivation is to show you ads. Why should it? Competition may help here, either by an outsider like the OP suggests, or via breaking Google up with anti-trust enforcement, or both (my preference).

As a user, your best personal and ethical move is to install an ad-blocker, to make ad-based business models less viable, which will help promote business models that don't abuse the customer.

The core problem, I guess, is that search engines view all their results as ads. That’s why they got into the ad business in the first place.

> The core problem, I guess, is that search engines view all their results as ads. That’s why they got into the ad business in the first place.

This seems a bit overly cynical. Some search engines only served ads, but they're long gone. The survivors are those who dedicated themselves to finding links which were responsive to people's search intent. They seem to have gotten into ads because it was the best business model in this market.

> It's like the bad-old-days of ExpertsExchange, which somehow was never delisted by Google for its shady SEO tactics.

This is really what made me suspect that Google was teetering on the edge of the MBA death spiral: these problems run for years when they'd be easy to block, which suggests to me that whatever metric gets you a bonus / promoted doesn't include things like that which are long-term threats to their core business even if it's selling a lot of ads short-term.

> A lot of sites seem to have scraped StackOverflow and GitHub issues, put a crappy ad-loaded interface around them, and somehow out-rank the original SO/GitHub content.

Some even made slideshows of SO screen captures and put them on YouTube, with a fake video or spoken intro to make you believe actual content will be discussed... A number of shameless people will go to any length to grab bits of money anywhere and anyhow, and I've hit those links a couple of times.

They outrank the original content because google is corrupt.

I said this in a similar thread yesterday, but I think this is an unsolvable problem because much of the content either no longer exists in website form or is old.

To put it simply, a new generation of the people who used to make the reliable niche websites that not just answered your questions but also helped you learn a particular topic have moved to youtube instead.

Google search is hollowing out as a result: the meat is going, and what remains is SEO'd fluff that kinda answers the question, but ONLY the direct question being asked, with none of the wider expertise that people more educated in the topic would bring.

Of course google owns youtube as well.. so perhaps they just see it as an inevitable transition.

Just a note on that, youtube search is finally getting better, yesterday I noticed it was able to find key words in the middle of a lecture that had nothing in the title or comments. I always wonder about their AI transcription service, it's gotten so good, if they're storing all that audio as text, I guess their search is going to get excellent?

Is that...essentially a Proof-of-Work system...

The problem, IMO, might be the monoculture we have around search. Because Google is so big, it's enough for spammers to target it and they get the vast majority of the search visibility. If we had better, more diverse competition, that might manifest as a tradeoff: presumably the engines would have competing and diverse criteria, so you would probably not be the top result on _all_ dominant search engines. SEO spam needs upkeep and attention to the latest algos, else it decays. Competing algos would yield better results for everyone. Maybe Google is just ripe for a shakeup.

Doesn't your model predict that Bing would have substantially less SEO-gamed results?

(Disclosure: I work at Google, but not on search)

Well... Yes, it should. But, no, it seems it does not. I thought about this when typing it but did it anyway, maybe because I thought there is still something worthwhile there.

I still think the model could work if the algorithm is sufficiently different than Google's. Ideally, people would go "I did not find anything I cared about on Google, I know, I'll use Bing!" - but nobody does this, because the results are consistently worse.

Don't get me wrong, I like G as a company, I think they do worthwhile things! But they have let things slip and need competition in this field, I mean real competition, then maybe they would actually address issues.

Maybe the issue is also on the incentive level as well. I mean more searches means more eyeballs and more money for Google. If someone searches one thing and they are done that is less interaction! I hope they don't work like this, but it's possible.

And another possible problem is the opposite. Maybe Google is optimizing search for what it thinks people want, but it uses the wrong metric. Or it gives people what they want but not what they need.

This x100000.

There is no scenario - none - where thousands of engineers at Google working on search wake up in the morning and say "we sure have made it good enough w.r.t. spam. I think I'll have another Danish."

I agree with this and the grandparent comment wholeheartedly. That said, there's a kind of institutional blindness that can build up in companies—especially ones that dominate their sector. It may have roots in intransigent upper management, ossified and inflexible process, wide-scale burnout, a culture of passing the buck, or any number of other pathologies.

I don't claim that Google has any of these and certainly have no insight into their search group. But I've personally been at powerful companies with best-of-the-best talent that were blind to the decay in their own living room, so I would caution against immediate dismissal of PG's take.

Especially since "made a change that improved search result relevance by X%" is an extremely compelling story for promotions. If indeed there is a launch-driven culture for promos at Google then there'd be extra incentive for new mechanisms to reduce low quality search results.

When the cafes were open, you can bet they said, "I'll have another Danish, and then get back to work on this problem that never seems to go away."

I sure wish I had problems that were totally unsolvable, they are so easy to measure progress on. /sarcasm

I think it’s more likely that they are just building hundreds of tiny tweak experiments, and it’s someone else who decides what to build and whether it even worked. Search quality is such a meta-problem that there is no real hope of working on it in anything beyond a piecemeal, trial-and-error fashion on their dataset.

What is a danish?

A breakfast pastry something like a donut.


The curated search results business model doesn't work. Google gives "aggregators" and other search engines the death sentence for organic search traffic from economically meaningful queries, so you'd get no free traffic. This is one of the major antitrust complaints against Google in the EU. Since you get no organic search traffic, you need to build a brand using advertising, and once you start down that road you need to monetize the first click which compromises the quality of your site.

> This is one of the major antitrust complaints against Google in the EU.

The complaints I've read are from exactly the kind of generated content farms people are complaining about in this thread.

I'm sure they know all about it, but are prevented from doing anything by the business model. Pinterest has been spamming up my search results for years. Maybe other people find it helpful, but I do not. It's obvious I am never going to get value from Pinterest. Let me click a button to add it to my block list. One single click would have given me years of massively improved results.

The fact that this feature does not exist shows that there is something deep within Google's core that is preventing them from addressing SEO spam, just like there is something deep within Airbnb that makes it difficult to filter out Airbnbs with problem reviews.

Google has been coasting for a good long time and now major players are realizing they are wide open for disruption.

> But they aren't free to talk about it, because if they did it would just give more assistance to the spammers, and make the problem worse.

The reality is more that some Google engineer will come up with an algorithm change that makes the result 40% better, but it will come at the expense of making that search 3ms slower so the change won't get merged. Or it will make the results worse for some niche set of queries that the business team really cares about, so again it won't get merged.

There are lots of consumers who would gladly pay $1 a month or whatever in order to use a couple extra milliseconds of compute power per search in exchange for drastically better results, so there is lots of room for a startup to compete.

> There are lots of consumers who would gladly pay $1 a month or whatever in order to use a couple extra milliseconds of compute power per search in exchange for drastically better results

Google has a paid-for Search API, so they could do that if they chose to pursue it. And then they could let Google One users opt-in to the same thing via ordinary Search. I'm not sure whether Bing has anything equivalent.

I think the problem is just that the solution isn't in Google's wheelhouse: There is no algorithmic ranking system that can't be gamed. Human moderation and curation is the only way to provide true quality, and Google is allergic to solutions that don't automate and scale.

I think a really good search engine would still algorithmically search its index, but the content library should be human-curated with a goal of ingesting content via author, not via platform. Once a given author was human-approved as a quality source of information, content they produce could be automatically ingested going forwards, and conditionally re-reviewed by a human if there were reports the quality had decreased.
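
To make the workflow concrete, here is a minimal sketch of that approve / auto-ingest / re-review loop. It is an illustration under assumptions, not a description of any real system: `AuthorRegistry`, `report_threshold`, and the plain-list index are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorRegistry:
    """Tracks which authors a human reviewer has approved (hypothetical model)."""
    report_threshold: int = 3                          # assumed cutoff for re-review
    approved: set = field(default_factory=set)
    reports: dict = field(default_factory=dict)
    review_queue: list = field(default_factory=list)

    def approve(self, author):
        """A human reviewer signs off on this author."""
        self.approved.add(author)
        self.reports[author] = 0

    def report(self, author):
        """Record a quality complaint; enough of them sends the author back to review."""
        if author not in self.approved:
            return
        self.reports[author] += 1
        if self.reports[author] >= self.report_threshold:
            self.approved.discard(author)
            self.review_queue.append(author)

    def ingest(self, author, doc, index):
        """Add doc to the index only if its author is currently approved."""
        if author in self.approved:
            index.append((author, doc))
            return True
        return False


registry = AuthorRegistry(report_threshold=2)
index = []
registry.approve("alice")
registry.ingest("alice", "how to proof bagel dough", index)     # accepted
registry.ingest("mallory", "top 10 bagels 2022", index)         # rejected, never approved
registry.report("alice")
registry.report("alice")                                        # second report triggers re-review
```

The expensive human step happens once per author rather than once per page, and reports only pull a human back in when something looks wrong.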

This was Yahoo in the late 90s and early 2000s. They had a human-curated directory search where one could look up something like "kayaking" and find a bunch of sites on kayaking. Then if you wanted to search on a keyword, it was outsourced to AltaVista and later Google. AltaVista results were terrible and were almost nothing more than a keyword search (i.e. the word you were searching for appeared on this page). Google got much better at the general search and the rest was history.

I think the death of the directory search dramatically dropped the number of self-curated, informative sites from a domain expert that were common in the early internet. Now instead of making a website, many people are on content silos like Reddit/FB

I do still think we could adapt this model on top of content silos... assuming we can index them! Rather than just ingesting all Reddit content, one could ingest new posts from particular users who write quality posts on Reddit.

Assuming a method also existed for an author to authenticate themselves with the search engine, one could also enable an author to help identify their content across multiple platforms, as well as suggest other quality authors to consider.

> The funny thing is that if the people who worked on spam at Google were free to talk about it, I'm sure it would become evident that they know more about spam and anti-spam efforts than anybody else in existence.

That may be true, but I think one of the good points made on the OP is that it might actually be cultural constraints that keep them from solving the problem:


> You might need to do a lot of manual spam fighting initially. That could be both the thing-that-doesn't-scale, and the thing that differentiates you by being alien to Google's DNA. (They must hate manual interventions; so inelegant).

Google has some very smart and knowledgeable people, but the things they do have to fit into certain boxes, which means there are some problems they just can't fix, e.g.

* Everything has to be automated at scale, which leads to consistent poor user experience (unappealable account closures initiated by inscrutable algorithms, SEO spam).

* You get promoted by building new products, not maintaining existing ones, which leads to self-defeating churn outside of core areas (e.g. abandoning Google Talk and squandering their position in the messenger market).

* etc.

I understood PG's point differently. My understanding is that he is suggesting an angle of attack in which carefully crafted manual reviews (that do not scale) can be used to bootstrap a product that does scale thanks to something else (e.g. collaborative filtering). All of this being on a niche domain where you can drive a wedge into the mediocre performance of Google (online shopping probably being the worst possible choice, but there are many others).

But why is Google even dealing with spam? What if they (or someone else) curated top websites for a given category? For instance, when I search for a programming-related term, I already know that I want to see the answer on either Stack Overflow or one of a few reference documentation sites. It is possible that some other site could have the answer instead, but in practice the random sites that often show up at the top of the results are usually SEO spam. A search engine that figured out or let you select the semantic space you are in and then promoted known websites - maybe ones you curate yourself! - would be a big improvement.

Of course you can always hardcode the site you want in the Google search results but this is hacky and not very expressive.
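
For reference, the "hardcode the site" workaround amounts to something like the sketch below, which tacks `site:` filters joined with `OR` onto a query. The helper names are made up for illustration, and the exact way a given engine parses combined operators is an assumption based on common usage rather than a documented guarantee.

```python
from typing import Sequence
from urllib.parse import urlencode


def restrict_to_sites(query: str, sites: Sequence[str]) -> str:
    """Append site: filters joined with OR to a plain query string."""
    if not sites:
        return query
    filters = " OR ".join(f"site:{s}" for s in sites)
    return f"{query} {filters}"


def search_url(query: str, sites: Sequence[str]) -> str:
    """Build a search URL carrying the restricted query in the q parameter."""
    return "https://www.google.com/search?" + urlencode(
        {"q": restrict_to_sites(query, sites)}
    )


trusted = ["stackoverflow.com", "docs.python.org"]
restricted = restrict_to_sites("asyncio timeout", trusted)
url = search_url("asyncio timeout", trusted)
```

This also shows why it feels hacky: the "curation" is a flat list of domains glued onto every query, with no notion of which topic a site is trusted for.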

Legitimate sites could help a lot by adding machine-readable descriptions of their content, per the schema.org spec. The richness of these descriptions means that this is effectively a "hard", non-forgeable claim to being a worthwhile, non-spam source (quite unlike the old META tags that got abused to death pre-Google). Of course spam sites could simply lie in their schema.org tags, but the lies are easy to spot (with combined machine- and human-review) and then they just get banned. It makes it a lot harder (and hopefully infeasible) to SEO-spam by just copying random content.
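
As a hedged illustration of the machine-review half of that idea, the sketch below pulls schema.org JSON-LD out of a page and flags `recipeIngredient` entries that never appear in the visible text. `PageParser` and `unsupported_claims` are hypothetical names, and the substring check is deliberately naive; a real reviewer would need something much more robust.

```python
import json
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects JSON-LD blocks and visible text separately."""

    def __init__(self):
        super().__init__()
        self._script_type = None   # None when not inside <script> or <style>
        self._buffer = []
        self.items = []            # parsed JSON-LD objects
        self.visible_text = []     # text outside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._script_type = dict(attrs).get("type") or ""
            self._buffer = []

    def handle_data(self, data):
        if self._script_type is None:
            self.visible_text.append(data)
        else:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._script_type is not None:
            if self._script_type == "application/ld+json":
                try:
                    self.items.append(json.loads("".join(self._buffer)))
                except json.JSONDecodeError:
                    pass  # malformed markup is simply ignored in this sketch
            self._script_type = None


def unsupported_claims(html):
    """Return recipeIngredient entries claimed in JSON-LD but absent from the visible text."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.visible_text).lower()
    missing = []
    for item in parser.items:
        if isinstance(item, dict) and item.get("@type") == "Recipe":
            for ingredient in item.get("recipeIngredient", []):
                if ingredient.lower() not in text:
                    missing.append(ingredient)
    return missing


page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Apple Pie",
 "recipeIngredient": ["apples", "pears"]}
</script></head><body><p>Peel the apples and slice them.</p></body></html>"""
flagged = unsupported_claims(page)
```

A mismatch like this would only be a signal to queue the page for human review, not proof of spam on its own.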

A lot of what counts as spam these days isn't something like "I search for bicycle reviews and get penis enlargement pills", it's more like "I search for bicycle reviews and get some blog who searched Amazon for the 5 most popular bikes and posted links to them with a little blurb and called it a 'Review'".

These sorts of things are easy to spot, but only if you actually have a basic amount of familiarity with the topic. It's hard to spot with "AI" or super-cheap labor.

You talk about this "constant assault from spammers" like it's not Google's fault and it's an intractable problem. That is not a correct characterization. There is plenty of low-hanging fruit that could easily be detected and deranked, for instance scraped Stack Overflow spam. But Google chooses not to deprioritize these results. The reason they don't is that they make money on ad clicks, which many responses have already elaborated on.

The search results markedly worsened in the last 5 years. Why could they keep up with SEO spam until 5 years ago, and now they can't? Their revenue has been growing dramatically, so they could proportionally increase the allocation. It's probably because the focus of their HR/changing workforce is now elsewhere: maybe fighting "disinformation": both COVID and political. Those efforts were non-existent 5 years ago.

I think it is also no longer in their interest. If you look at their mobile results now, there are sometimes no search results for webpages, just ads, and their automatically extracted data. So, it is in their interest now to have the search for non-advertisers to be bad. Eventually people will consider those results junk and just use the google extracted data/people who paid to go up.

Most comments focus on the technical side of things, whereas I'm sure there are also legal restrictions involved in this. If Google delists a website on the grounds that it's a copycat of stack overflow, or because they have low quality content according to Google's taste, there might be lawsuits filed against Google, claiming that the company is discriminating.

In which countr[y|ies] does Google not have the discretion to decide that certain sites / pages / etc. "belong considerably further down" in Google's search results pages? Seems to me that sorting the search results to #1, #2, #3, etc. is pretty well baked into their basic product.

I think the issue is that these crappy results are kind of good for revenue. It’s not just organic results impacted but all the affiliate ads.

Google is smart so I assume they crunched the numbers and figured out they make more money from people filtering through crappy results that include viewing and clicking ads than by surfacing good content.

I think Google is optimizing for ad revenue, not for good search.

I agree it’s a hard problem. I don’t agree it’s “really really good”. I regularly encounter obviously scammy websites. With Google’s JS execution capabilities I’d assume they can detect that. I’m talking about the VPN install pop-ups and so on. Right now there’s a whole bunch of github.io-hosted sites that are doing that. It’s not even porn. It’s home decoration stuff.

> I'm sure it would become evident that they know more about spam and anti-spam efforts than anybody else in existence


I can point you to Hard Problems that have been solved better at little startups than at Google - or, indeed, at any other bigco. That's why acquisitions happen.

Why does Google having 1000 engineers working on a problem automatically mean they are the smartest?

Spam, yes, but Google has also made meaningful shifts that are clearly directed from the top-down. It's much harder now (imo) to get specific results, they've overall started looping SERPs into broad answers.

This is def a user-engagement strategy -- but it has cons as well.

Part of the complaints in the thread were spam related, others were something deeper.

I don’t doubt it is hard but I’m forced to sign into Google now pretty much, just let me rate results and ban domains again etc. You will solve the seo problem really quick and start giving me results I want.

Highly OT, but if a technical person (not at a managerial level) involved in tackling spam at Google were to leave the team, are they allowed to work on the similar problem space at a different company?

I agree. You can say a lot of bad things about Google, but they definitely have some of the smartest and highest paid engineers working on their search. Plus there are already a lot of people trying to compete with Google and so far, no one seems to provide consistently better results.

The only advantage a startup might have is that they could do completely new concepts, such as specifying what area you search in, allow you to modify their classification of your query and/or moderating sites you include - which is probably necessary anyway, since you'll hardly have the budget to fully index the web. I'm not saying it's impossible, but it's not going to be easy at all.

And after all of that, you still need a way to make some money.

Really? I don't think the bureaucratic bloat at Google cares and the original authors of the search engine in its current incarnation are probably long gone. It is maintenance mode and I don't think they dare touch too much.

It takes time and effort to build up a spam site's ranking but it is trivial to blacklist those who get to the top.

A lot of people forget that one of the inputs to the Google ranking algorithm is input from human quality raters who work off of an extensive, 172-page guide that Google publishes and updates for anyone to read: https://static.googleusercontent.com/media/guidelines.raterh...

Apparently the "human quality raters" never found the sites reported in this thread.

How hard is spam, really, if you're Google? Here's what I would do as a heuristic (uh, not evil?): We know everything about you and everywhere you've visited and everyone you've talked to in the last 60 days. We know all their phone numbers and email addresses. We even know the phone number of the girl you met at the bar, who didn't give you her phone number. So if any of those people email you, we'll categorize that as "not spam". Also, if it's your boss or a coworker, "not spam". If it's a major company that's existed for more than ten years, not spam. Everyone else, spam. Done.

This is hyperbolic, right? But they can solve spam in a split second, if they just admit they're watching you all the time.

[edit] /s thx for reading to the end, folks.
