The top-ranking HTML editor on Google is an SEO scam (casparwre.de)
1743 points by caspii 7 months ago | 395 comments

As a web developer, I have a strong feeling that we are writing the web for Googlebot and not for people. For every website I have created, I get a list from SEO of what to add: 200 links at the bottom of each page, different titles, headers, metas, human-readable URLs without query params, all those canonical URLs, nofollow rules, etc. Most of these things are invisible to users and created only for Googlebot.

It never seemed important to the CEO of a previous company I worked for that we had something to say, only that we gave off the impression that we had something to say. We hired an outsourced blog writing service to fill our wordpress instance with generic, inoffensive platitudes and listicles poorly cribbed from Wikipedia and the ONS. Squint a little bit and you could convince yourself there was value to it, but nobody with any experience in the problem space would treat it as anything more than marketing fluff. His hope was always that one day we would get rewarded by the great Google algorithm and appear on the first page for search terms we were convinced our users were looking for, but the end result was that our blog was largely designed to be read by robots.

It's the same thing as the tweaks you have to perform for SEO: some have questionable value to the end user, but you jump through the hoops anyway because it's what is done. By pleasing the robots you're rewarded with a higher search position.

Fortunately with GPT3 and the like I’d imagine this approach will soon have had its day. Not that I’m optimistic about whatever will replace it.

My sense is that the Jevons paradox should mean the blogpap business will explode. Google will be filled with even more pithy, business-topic supporting SEO blather as human writers put GPT to work for a lot more clients than they had before GPT3.

Much of it is driven by cargo cult SEO, throwing everything and the kitchen sink into the page in the completely unproven hope that it'll somehow game the rankings.

It is cargo cult, but it's cargo cult because it is the way to "success". Company A have great page ranking, and blog about how they think they got there. Company B also have great page ranking, but think they did something different to Company A, so they blog about it, too. Everyone else reads both blogs, intersects what both companies did, and implements those changes. Iterate for every difference you encounter and voilà: you now have your rubber stamp SEO method.

Even (or rather, especially) if every piece of SEO advice is correct, it still means that Google effectively has a lot of control over the shape of the modern web, through indirect pressure via SEO alone.

I run every piece of SEO advice as an experiment before implementing it across my network, and a lot of the advice actually brings results.

I know that it's being done, but I don't know if it's necessary. I frequently find good old unstyled HTML pages from the 90's internet (the ones with Prev/Next/Up links, like this: https://tldp.org/LDP/abs/html/here-docs.html) at the top of Google results.

I didn't check, but to my recollection that domain is pretty old; domain age is supposed to be a principal metric for trust (which in turn is a strong signal for page rank). So, ...

I mean it's pretty reasonable, if a site has been around a long time it's going to be generally 'good'.

Some of the technical SEO is good though, like simply making the page crawlable and content being in a logical order.

The "fiddle with H1" or "write X amount of words" or "buy Y number of links with a % of anchor text" is silly.

> Some of the technical SEO is good though, like simply making the page crawlable and content being in a logical order.

Semantic HTML was created to help screen readers and browsers understand content organization; its being hijacked by search engines is just a side effect.

The point I was meaning to make is that it's quite easy to make a site uncrawlable, and therefore unfindable in search engines.

e.g. Google always had problems indexing Flash websites. It historically had issues with sites heavily relying on Javascript. Nowadays it's less of a problem, at least for Googlebot.

Though a useful side effect of SEO people finding it useful is that what they are doing for their own gain may help overall accessibility. Too often the opposite is the case: when people trying to game systems accidentally affect accessibility, it is usually negatively.

Yes, but even for those, it means that we are left to hope that what's good for a crawler and what's good for e.g. a screen-reader will still align in the future. Right now it feels almost coincidental.

The problem is that the internet at its conception was just a way to host content, not a way to discover content. When discovery was done via word of mouth or extra-internet means, the websites themselves were just for the people that viewed them.

Now, when the website needs to not only contain content, but also be its own advertisement, writing it in a way that will maximize virality is the natural course of action to make sure the site actually gets seen.

This will likely remain true until there is a method of finding webpages that is not based on automated scraping or on the page itself.

On the contrary, the Web, being a hypertext system, was definitely always about discovering content. If you found an interesting website, it would typically link to other interesting sites. There used to be ways to systematize these ad-hoc linkings, such as Web rings. And the first attempts to catalogue and categorize the contents of the (then tiny) Web were in the form of human-curated directories à la Yahoo. It’s just that in just a few years it became apparent that this approach could not scale, and search engines based on automatic crawlers became the norm – but again, critically, these too are of course fundamentally dependent on the Web’s discoverability by following hyperlinks!

This works well for random exploration, or exploration of related topics; however, it is basically useless for finding information on a new topic, as you don't have anywhere to start.

The only way would be to keep finding links like Wiki Game and hoping to get closer to the intended target. Luckily there are huge robots who have done this for you and can tell you which links lead to your destination.

Yeah I also don't really remember this extra-internet thing. Perhaps the author is talking about a very early period of the internet (which I don't know)? What I remember was that before 'real' search it was indeed what you describe, just an endless chain of links from one site to the other and sites aggregating links.

Also, I remember web rings being helpful for content discovery in the mid-late 1990s. Different authors for a given subject would cooperate with each other and put something like a banner ad at the bottom of their page with "next" and "previous" links, so you'd get a circular doubly-linked list of cooperating sites for a given subject.
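For anyone who never saw one, the "next"/"previous" structure described above can be sketched roughly like this. It is only an illustration: the `RingSite` class, the helper names and the `.example` URLs are made up here, and real web rings were usually driven by a central redirect script rather than by in-memory pointers.

```python
class RingSite:
    """One member site of a web ring (hypothetical structure for illustration)."""

    def __init__(self, url):
        self.url = url
        self.prev = None  # site reached by the "previous" banner link
        self.next = None  # site reached by the "next" banner link


def build_ring(urls):
    """Link the sites into a circular doubly-linked list; return the first site."""
    if not urls:
        raise ValueError("a web ring needs at least one site")
    sites = [RingSite(u) for u in urls]
    n = len(sites)
    for i, site in enumerate(sites):
        # Modulo arithmetic closes the ring: the last site points back to the first.
        site.next = sites[(i + 1) % n]
        site.prev = sites[(i - 1) % n]
    return sites[0]


def walk(start, steps):
    """Follow "next" `steps` times; a negative count follows "previous" instead."""
    site = start
    for _ in range(abs(steps)):
        site = site.next if steps > 0 else site.prev
    return site


ring = build_ring(["a.example", "b.example", "c.example"])
print(walk(ring, 3).url)   # three "next" clicks in a ring of three: back at the start
print(walk(ring, -1).url)  # one "previous" click from the first site: the last site
```

The point of the circular shape is that a visitor can enter at any member site and still reach every other member by clicking in one direction.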

worse: paid content farms / ai to generate crap "articles" by the boatload, targeting every organic search term 5 different ways.

The result is that ACTUALLY USEFUL articles are buried on page 5. Any slightly helpful bit of content in the top articles is repeated (using different grammar of course) in all the other "top" articles.

What I tell every client is that 90% of SEO is in writing good, relevant content. Technical SEO is more like housekeeping. Adding footer links is redundant if you have a sitemap and good navigation. If your users can find stuff easily, crawlers can too. The biggest technical things that I make a stink over are canonical URLs and https.

On the other hand, Google over the years has tweaked their algorithms and recommendations to match up with what makes a good site, in terms of content and markup.

Human readable URLs don't sound that bad.

Extremely useful for when a link dies and there is no useful archive.

As a mobile developer (sometimes) I rarely/never see apps that don't have Google SDKs bundled either..

That sounds...like a great reason not to get into "mobile" dev and stick to PWAs.

Well, yes, because googlebot is the gatekeeper of popularity and income for websites. Got to appease the decision maker.

Apart from stuffing 200 links in the footer why is this bad?

Well, that is unfortunately.

In 2015 I was fired over some issues on a site I was working on and some friction with the company owner. Two months before I was fired, I reported that some links to other sites unrelated to our service were on the home page (some porn and some scam pages). Afterwards I heard from my ex-coworkers that a manager from another area of the company said I was fired because I was linking to porn on some pages of our service. I didn't know at the time that those tools existed, and only today did I realize that this is a possible explanation.

I was really upset with that manager and didn't understand why they would lie to my friends about the reason for my dismissal. But it is nice to know what may have caused the issue. Better late than never hahaha.

Wow that sucks. It's not just HTML cleaners though. A few years ago (before Snowden) I analyzed free proxy servers and found that most of them blocked HTTPS and many even injected JS or HTML into all requests [1].

I also wrote a tutorial on how you can build an infecting proxy too [2]. Doesn't work anymore though since HTTPS is everywhere. Thank god

[1] https://blog.haschek.at/2015-analyzing-443-free-proxies
[2] https://blog.haschek.at/2013/05/why-free-proxies-are-free-js...

I find it hilarious that this made its way into an Amazon listing for some waterproofing chemical. https://web.archive.org/web/20210607233655/https://www.amazo...

I find it even funnier that it appears in a research paper:


(It's on page 24, at the bottom of the References section.)

And a Seventh Day Adventists sabbath lesson, as the only portion in English.


Serious question: are you guys trolling? Is this like describing Rick Rolling by actually Rick Rolling people?

I only ask because when I click on these links, I get a whole bunch of legitimate text, but nothing actually useful. Am I missing something?

An (obvious) injected link, as described in the article, is at the bottom of the Sabbath lesson I posted.

In the case of the last one it's immediately above the "Related posts" at the bottom. In all cases, this is a job for ^F. (I was genuinely surprised all the random links cited on the website _still work_, and had to search the page for them.)

go to "view page source" and then search
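If you would rather not eyeball the page source, the same check can be roughed out in a few lines. This is only a sketch under stated assumptions: `LinkCollector`, `external_links` and the `.example` hostnames are names invented here, and it only catches plain `<a href>` links in the HTML you feed it, not anything added later by scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, own_hosts):
    """Return hrefs pointing at hosts that are not in `own_hosts`.

    Relative links have no hostname and are treated as internal.
    """
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        host = urlparse(href).hostname
        if host and host not in own_hosts:
            found.append(href)
    return found


page = (
    '<p>Our <a href="/about">about page</a> and '
    '<a href="https://mysite.example/blog">blog</a>.</p>'
    '<p>Keep score with this <a href="https://unrelated.example/scoreboard">'
    "scoreboard</a>.</p>"
)
for link in external_links(page, {"mysite.example"}):
    print(link)
```

Running something like this over content before and after it has passed through an online editor would show whether any new outbound link appeared.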

This is fantastic. I added this example to the post. Thanks!

It's even funnier that "SEO" appears as a related search at the bottom of that search URL. It has nothing to do with it, other than that a lot of people (us) are coming from an SEO article.


SEO is so broken, it's not about website content or website quality. It's about how much money you pay to some punks - "SEO experts" who are hacking a system. I'm so sick of that.

If you Google stuff like "opening hours of ..." in Turkish (probably in other languages too), for many years now the search results have been nothing but news websites spamming Google, including the Turkish franchise of CNN, CNN Turk.

The format goes like this: Lately people are searching for XYZ but is it safe to search for XYZ? What experts say for XYZ? To find out continue to read our article.

Then it's followed by a wall of text made of keywords (in sentences that don't make sense); if you are lucky, the opening hours (which are often not accurate) will be somewhere down the text.

But it doesn't stop there. Even actual news articles are written for consumption by Googlebot: the sentences often don't make sense, and they are repeated multiple times with a synonym swapped in for one of the words, producing a lengthy article that has no meat beyond the title.

I argue that the problem is not SEO experts with low ethics, the problem is the way the business is structured. SEO experts don't do it for the sake of the art but because they are paid to do it. They are paid to do it because it has a positive ROI on bringing eyeballs and people pay Google for eyeballs, then Google pays those who generate the eyeballs.

Isn't it better for Google and everyone involved if you can't find what you are looking for, continuing your search brings more eyeballs? It's not like you are going to switch to Bing? You are also not going to abandon the internet and go to a library.

I've not seen it for opening times (UK here) but the same pattern is very visible elsewhere.

Entertainment/news sites are chock full of pages like "<whatever>, what we know so far, release date, cast, will it be renewed, has it been cancelled..." pages that spend many paragraphs saying "we know nothing, randomly plucking crap out of thin air we could guess something-or-other but that remains to be confirmed". A new news story, film, show, or even just a hint of something, and the pages go up to try capture early clicks. Irritatingly they are often not updated quickly when real information becomes available or that information changes (particularly over the last year that has affected release dates). I have several sites DNS blocked because that annoys me less than getting one of these useless/out-of-date pages more often than not when I follow one of their links.

Oh, tell me more about it. It's a painful endeavour to gather information about upcoming TV shows precisely because of the tactics you described.

BTW, the news websites in question are not doing it only for opening times but for any popular search phrase they can come up with. It would be such a shame if outlets like BBC, WSJ and others adopted that kind of SEO.

> It's not like you are going to switch to Bing

From personal experience, I switched to another tool (DDG) a couple of years ago. When I occasionally try Google, for 95% of common requests I'm appalled by the results: the top is only SEO garbage. For very specific and precise searches (where people are not trying to game the system), Google is still the best, though.

Huh, you've given me a realisation - I don't do 'generic searching' on google anymore. I hear people say "google is broken" and I always think "it's fine for me" but that's because I'm searching for specific things, error messages, function calls etc. If I am searching for general interest stuff I tend to search reddit, hacker news or some other topic specific community rather than just search google.

I just realised I do something similar - almost every term I search will have the word "Reddit" appended to it. It's not perfect, but at least the content is intended for human consumption.

Same for me, `site:reddit.com` for almost everything that has to do with product recommendations or reviews.

Putting this here makes it even less likely marketers will miss "gaming reddit" as part of their strategy.

They already do, but mods have a vested interest in keeping communities clean.

Me too, but DDG is using Bing under the hood.

I thought that was just for image search?

I agree. Although DDG isn't exactly a bed of roses either.

DDG's refusal to honor booleans is putting a gun to its own head.

The best is minus operators acting more like plus or quotes.

> Then it's followed by a wall of text made of keywords.

I've noticed a rise of that as well. With some searches such spam is all I've received. But that's really a problem in all languages Google supports I think.

There's even malware that infects websites and generates such content; not sure what the point of that is. Does anyone know?

I'm guessing if even legitimate websites have similar content it's difficult to distinguish between fake and real content for an automated system?

> It's not like you are going to switch to Bing?

I changed the default search engine from Google to Bing and DDG in all browsers. Google does have better results, so sometimes I still need to use them. But for 90% of generic queries such as the weather, product information, or finding a company's website, Bing is good enough.

I used DDG as primary engine for a while and it was more like 30% effective

Fix the system? People who comment online seem to think the concept of the "search engine" cannot be improved, except by Google. The list of inactive search engines at https://en.wikipedia.org/wiki/Search_engine is depressing. The problem for us is that the supposed innovator Google has little financial incentive to improve the system regarding "content" or "quality". As long as the traffic keeps coming, the ad revenue keeps coming. Their best bet is to promote what's "popular" ("top-ranking"). Because the traffic keeps coming no matter what Google does, "content" and "quality" are not really their major concerns. There are no true alternatives for users. Bing is basically a Google clone. No new ideas. Other search engines, like DDG, just piggyback off Google or Bing crawlers. Not sure about Baidu, Yandex or others but I suspect they are more or less Google clones as well. In every case, advertising dictates design. No new ideas.

If a topic or search term is present in any form of news article (If one paper has them, they all have them shortly after), the search results are just extremely bad. You know that Google promotes its media friends and by now Google results look like an ad list. They haven't stopped innovating, they are moving backwards.

It would need an option to ignore any form of news media in search results.

I try alternative search engines from time to time and they are much weaker than Google. So yeah, at this stage I'd bet on them to improve. The others first need to catch up.

A long time ago (~2005) my French telco employer had a search engine (Voila). They were worried by Google's influence, so they ran a test campaign to see how Google's results differed from those of their own search engine.

The result was astonishing: on the first page most results were similar, except for the order. Specifically, the first result in Google was only second on the first page of the company's search engine. But overall the difference was mostly in the presentation, not in the results.

There was something Spartan in Google's page UI that made it more credible and informative. At the time, for most people including academics, they were the good guys and we (telcos) were the bad boys.

I guess academics' advice was very influential on the young adults who would shape the world in the following years.

I guess the erratic management by France Telecom also had something to do with the demise of Voila.fr.

I thought this a really interesting story because, as I remember it, Google was quite well-established and the dominant search engine in 2005.

True to some extent but it is improving with Google updates. Now, there is a way to go still, and some legit websites get hit by updates unfortunately, but overall fewer and fewer scams pass through.

SEO used to be extremely gameable (seniority of site, keyword stuffing, backlinks), but these levers aren't as obvious now, if at all.

That is great, but can google change their algo to some point where it works differently? Their ad business is there in the web.

Google changes their algo frequently. It's a cat and mouse game. Sometimes Google have the lead, and sometimes the Blackhat SEOs do.

the game doesn't change ... only the players

I wonder why Google is not more rigorous about that. Google search has been riddled for years with "optimized" content nobody wants. It's become so bad even my non-techie friends are beginning to switch to DuckDuckGo -- which is not better per se (probably worse at contextualizing).

Getting stuff on PageRank feels like a game. Getting stuff out of Google feels like a game too. To the point that moving to an alternative feels worth it, at least to try.

Everyone wonders about that. Googling most phone numbers returns nothing but pages of spam links.

A decade from now, Google will have made no improvement.

That's why for certain things Google is useless. Have to add certain keywords to avoid the SEO content to get comparisons, reviews, forums.

One day Google may introduce multiple search rankings, where one of them is SEO and another is the "useful things". But I don't hold my breath.

I still do this but I'm 99% sure Google and DDG max out at around 3 keywords these days. I just get results for the top 3 SEO keywords, no matter how much I try to refine my search.

Maybe it's just because I'm searching for technical stuff but DDG and Google are both a big source of frustration for me.

DDG thinks I mistype most of my queries and will desperately try to correct my 'mistake' because "surely nobody is really searching for documentation about ARM32 bootloaders, they just mistyped when they were really trying to look for a webshop that sells 32 different ARMchairs and ARMy boots.".

Google will understand my input at least half of the time but uses that power to show me the power of websites that do some article/keyword scraping and run GPT on it, or this great new Medium blogpost with two paragraphs of someone copying a Wikipedia summary of what ARM is and copy pasting build instructions from a GitHub README.

I've tried searching github.com itself but that's just a nice way to find out that apparently most of the data they store is just scraped websites, input for ML models or dictionaries and they will happily show me all 9K forks of the one repo that contains the highest density of these keywords.


It's also so frustrating to get results for websites which present themselves in the search results with "Results for <your query>"...only to show "no results found" when you actually click on them.

Good thing /etc/hosts has no size limit.
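For what it's worth, turning a personal blocklist into hosts entries is easy to script. A minimal sketch, assuming you keep the offending domains in a plain list; the `hosts_block_lines` helper and the `.example` domains are made up for illustration, and `0.0.0.0` is just one common choice of sink address.

```python
def hosts_block_lines(domains, sink="0.0.0.0"):
    """Turn a list of domains into hosts-file lines that point them at `sink`.

    Entries are normalised (trimmed, lowercased), blanks are skipped, and
    duplicates are dropped while keeping first-seen order.
    """
    seen = set()
    lines = []
    for raw in domains:
        domain = raw.strip().lower()
        if not domain or domain in seen:
            continue
        seen.add(domain)
        lines.append(f"{sink} {domain}")
    return lines


blocklist = ["Spam-results.example", "spam-results.example ", "", "no-results-found.example"]
print("\n".join(hosts_block_lines(blocklist)))
```

Two caveats: the hosts file has no wildcard syntax, so every subdomain needs its own line, and blocking name resolution only stops you from loading the site. It does not remove the site from the search results.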

The useful thing would instantly become useless because people would start gaming it.

I doubt it. A lot of SEO drivel appears easy to detect - recipes for example.

Recipes would ultimately be a list of ingredients, concise instructions and maybe a picture or two. It should be trivial to train a classifier to detect SEO spam in this context.

I think Google doesn't really have an incentive to do this, as SEO spam typically includes ads which can contain Google ads or analytics/Google Tag Manager which helps Google, thus prioritizing better results would work against their bottom line.

> Recipes would ultimately be a list of ingredients, concise instructions and maybe a picture or two.

So, if Google altered their algorithm such that "recipe" content had to be shorter-form in order to perform better in SERPs, how would this change anything? The sites that profit from search traffic would be the ones with their fingers on the pulse of the algorithm, and the resources to instantly alter their content in order to ensure that they continued to rank for the terms that were driving traffic.

Well, if Google ranks user-friendly content higher then sites will either adjust to be more user-friendly or get outranked by new sites that are user-friendly. The user wins.

Agreed, I heard that before.

What about trust-based systems? You choose who you trust and get information that they found not to be SEO-garbage, like trust-rings. When the system can't do it alone, user-centric feedback may work. That could give interesting inputs besides the ones Google already gets using its standard metrics.

When you place a tangible value on trust, trust becomes a commodity to be bought and sold. See:

1. Old domain names bought solely for their old SEO rank.

2. Apps on mobile app stores are sold, and updates begin to include shady privacy-invading malware.

3. Old free software projects on various registries (npm etc.) are sold, with the same result as (2).

Agreed, being able to become part of any group makes this problematic. Without repercussions, it seems difficult. Detection of ownership and the following loss of trust seems to be also in order. Or make the trust innate, not sellable to others, under the assumption that you cannot sell yourself.

Otherwise, it seems really like a cat and mouse game. Another option may be to force SEO to be indistinguishable from the best content. Is that the current goal?

Yes, but it feels like an approach that would not allow you to do anonymous searches. I guess pseudonymous searches may be good enough.

I suspect this is actually one of those fundamentally hard problems.

Or .... how much money you pay Google. This is working as intended for a free search engine.

This is not how it generally works, I would say. It is more about how much you pay Google and how good your page is. I worked with several SEO experts and none of them suggested scams like this. The risk of doing something like this is too high for many companies.

Amazing how such a simple approach can achieve content injection on a diverse network of unrelated websites, to the point of raising the profile of the vector and increasing the chances of further spread.

I hope someone figures out which other campaigns were run with these tools. Also, whether you can find output with the link injections in source code, like on GitHub or distro packages.

I suspect this will only get worse over time. There was a time when, if you wanted to put a site online, you (or somebody that represented you) made a point of understanding everything that went into it. But, even as what's considered a professional web site has gotten exponentially more complicated, too many people see setting up an online presence as something like printing a brochure: details irrelevant. Somebody who does understand the details is going to use them to their advantage.

> There was a time when, if you wanted to put a site online, you (or somebody that represented you) made a point of understanding everything that went into it.

I've been making websites for 24 years. Making a website has always been quite hard, especially for a nontechnical user, and there have always been scammers happy to take their money. What's worse is that a lot of the time the scammers believe they're actually selling a good service. There have always been people happy to chuck any old rubbish up on a domain and call it a website, even if it was full of scammy links, stuffed keywords the same color as the background or in tiny text, with JS that overwrote your browser history and blocked the back button, with no context menu, etc etc.

It's annoying, and sad, for those of us who care and consider ourselves professional. But it definitely wasn't any better years ago.

True. A company we bought in the very early 2000's was paying $1000 a month to an SEO "expert". The expert hadn't noticed that the site had a robots.txt file that was excluding all search bots but was still happy to take their money and produce faked up reports about how busy they'd been pushing search terms around.
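A check for that kind of blanket exclusion takes a few lines with Python's standard `urllib.robotparser`. This is a rough sketch, not a full SEO audit: the `is_crawlable` helper and the `shop.example` URL are illustrative names, and it parses robots.txt text you already have instead of fetching it.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The kind of file described above: every bot is shut out of the whole site.
blocks_everything = "User-agent: *\nDisallow: /\n"
# An empty Disallow rule permits everything.
allows_everything = "User-agent: *\nDisallow:\n"

print(is_crawlable(blocks_everything, "https://shop.example/products"))
print(is_crawlable(allows_everything, "https://shop.example/products"))
```

Running this against the live file before paying anyone for keyword work would have flagged the problem immediately.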

I have had two clients in two years that have had that exact issue. In both cases they were WordPress websites, my friend and I refer to them as 'WordPress Specials'. It is obscene considering the amount the clients originally paid, but it works well for me as the client immediately sees a dramatic jump in the SERPs as soon as the new site goes live, and that's before any of the general improvements in navigation, content and structure!

I agree, there has been a clear, negative direction of stacking complexity in Web development for the past 20 years. It's one of the primary reasons WordPress has 1/3 of the Web and there is a cottage industry of developers that specialize in just hacking at WordPress to make it do things it's not particularly great at. Most people and most businesses can't come remotely close to building their own high-functioning sites (from scratch) in a cost effective manner, while getting all the critical details (eg building for SEO) right. So you get an obese do-everything CMS, and throw in some plug-ins, to sort of shim the problem.

Why is Shopify worth $150 billion? Well, other than the bubble, this effect is why. People can't easily build their own ecommerce sites, can't integrate everything they need to, in a way that doesn't cost them a small fortune.

Wix is a pretty mediocre service, clunky and slow. It's worth $15 billion? How in the world does that happen. Well, building sites is super difficult for most people. The opportunity to make that problem better is, apparently, huge.

What they value is the users, not the platform as such.

Feels similar to "Reflections on Trusting Trust".

Could someone inject links into content in such a way that you cannot find the link in your own source or even your hosting stack?

You could modify the web server to modify the code in a similar way to the reflections paper.

But even more imaginative would be to work it into the kernel or the ssl layer somehow.

Could it be that Scorecounter is paying for their links to be embedded, as opposed to them being the owner/developer of both sites? If so, and provable, can they be flagged in some way?

Doesn't say much for Google's ability to determine relevancy in linking or recognizing suspicious link growth. Or perhaps it just takes some time ...

Google used to impose manual penalties for unnatural links BUT this gave rise to, you guessed it, competitors buying unnatural links for their enemy and waiting for the penalty to be given.

Nowadays, unnatural links are mostly ignored.

Probably. It'd be weird for an SEO spammer to put the effort into building a popular HTML editor/optimizer just to inject links to a few sites they own and operate. It's far more likely that they're offering that link injection as a service.

If I’m not mistaken, paying for links is still very much against Google’s policies. Whatever weight that should carry... in my opinion you should always try to be as independent from Google as possible. It’s such a huge liability.

Clearly. But I guess it is not outright proven that they are technically buying links. Though they would likely fall under some form of bad behavior in Google's eyes.

And, buying or otherwise, I am not sure what the mechanism is for bringing this to Google's attention.

I doubt there is another acquisition channel for a project like this that would compare to SEO (and not just Google).

> paying for links is still very much against Google’s policies.

Quite a strange thing to say about a company whose business is based on selling links (to ads).

I believe dstick meant to say "paying [someone else] for links is still very much against Google’s policies."

> as opposed to them being the owner/developer of both sites?

If they're not owned by the same entity, then this blog post is rather odd: https://html-online.com/articles/scoreboard/

(To be fair, that entire blog seems odd...)

Agreed. Sure seems that way. Though that may actually make it less likely to be a violation than if one was paying the other for the links. Not within the spirit of the terms, but may not be a violation either.

This is apropos...

Google's old link-based authority algorithm, PageRank, isn't analysing the same web anymore. I think there's barely any signal in links these days.

The first major event was Google itself. Once you use something as a metric, it becomes currency. SEO vs anti-spam became a defining cat and mouse game. This kind of stuff was born then, and antispam was meant to curb it.

The second major event was user generated content. The old link pages and blogrolls died slowly. Comments, Twitter, and such became the way links are shared. High signal, but extremely spam prone. Google tapped out of this early, and mostly ignores user generated content.

The third major event is facebook, and facebook like ways of doing things. This made most regular people's content unindexable. Search for esoteric keywords used to return a lot of forum results. Still does, to an extent. The thread is usually years, or decades old. What's left on the open web is a subset, a non random subset.

Wikipedia is one of the last sites that does "hypertext" the way pagerank assumes the web works.

In any case, I feel like search (or what search used to be) is in decline. There isn't as much web to search anymore, in a sense. The broad brush way of doing antispam (eg user generated content is just ignored) makes more sense. Why deal with all that noise/spam, just to search what's left of the old web.

What's left? User behaviour, a la analytics. That makes for more feedback loops and winner takes most dynamics. Localisation became localisation to your bubble. Meanwhile "officialness" measures aren't against Google's ethic/aesthetic anymore. They got burned by the "fake news^" crisis, and the quick fix was officialness. In for a penny. In for a pound.

Meanwhile, web search is increasingly just another thing that google search does. It searches "your" data, content of your devices, search history and NN generated whatnot. It searches news, ads, returns answers to questions, does math... There's nothing new about seo scams, antispam just isn't Google's primary solution anymore. Just default to other ways of returning results.

I'm calling it. Web search is dead. Long live the new websearch.

^Circa 2015 usage, not the current

>first major event

IIRC with PageRank there were very specific values associated with 'toolbar PageRank', e.g. a PR7 link could be sold for $1K a month. Understandable because at that time there was no context to PageRank at all, it was simply about being linked to by an "authority". This was 20 years ago though.

Google is getting worse and worse. It's harder than ever to find real information. All you get is SEO scams trying to lure you in and sell you stuff. It's tragic. I miss the old internet.

This: As an electronic engineer I would often search for component data sheets. Usually the sheet I wanted would be the first hit. These days however I get pages and pages of crap sites that want to sell me the data sheet. Or even pages that say that they don't actually have it.

To be fair, this is how the web has changed too. High-value content has been duplicated and hidden behind pay walls when in the early days (ie my early days on the internet/web) everyone seemed to come with their own content and share freely.

There's an inverse to this, too: Low-value content is far easier to distribute because distribution is now effectively costless, a situation deliberately created by online platforms that want bottom dollar works.

This effect isn't limited to web searches, either. Social media is way worse - at least Google pays you in presumably useful web traffic. Facebook and Twitter want to trap you on platform as long as possible. Even platforms like YouTube which pay their creators have this problem. So does Amazon, which encourages dropshipping cost-optimized products from China under weird, fly-by-night brand names. Their business model is to outsource the financial risk of creating new works to someone else so they can get "content" (or in the case of Amazon, actual products) for cheaper.

In the olden days, a publisher was a corporation that took on the financial burden and legal risk of publishing your work; with the caveat that only a limited number of things would be published. Thanks to a number of 90s era liability limitations, online service providers were given broad leeway on pretty much everything a traditional publisher would need to worry about: defamation, product liability, copyright infringement, and so on. This flipped the publisher model on its head, creating the "platform model": one where you publish everything with no up-front cost or prior restraint, monopolize your creators' audiences, and make your money by taking cuts of whatever revenue streams your creators happen to establish after-the-fact.

Publishers had financial incentives to make their creative works more valuable. Platforms do just the opposite: their financial incentive is to devalue content. How do they do this? First off, they call it "content", as a generic catch-all term for anything their users publish. Second, they have no quality control mechanism, allowing literally anyone to submit content and have it promoted by their platform. Third, they run their platforms off of algorithms that use user-submitted feedback (reviews, upvotes, and so on) to judge group tastes in lieu of actually having taste. And finally, sometimes they'll just outright take money away from their creators in favor of their own stuff.

The reason why people were even putting high-value content on the web for free was because nobody knew how any of this would play out. Advertisers were paying far too much for banner ads, so it made perfect sense to just put all your content online, make sure people could see it, and get a lot of money. You used to be able to run a whole YouTube channel purely off of AdSense revenue! That's all gone away, now. Advertising networks pay out a lot less than they did even a decade ago, and at least in the case of Google, are also competing against their own creators for ad space to sell.

(This also implies that we will never actually go back to "the web as it used to be" until everyone alive has died and we can repeat the mistakes of the past. Hell, if you ask the copyright maximalist nutters, we've already repeated the mistakes of the past - publishers of centuries past acted a lot more like Internet platforms do today than modern publishers did pre-Internet.)

Yes, and what I always wonder about is that while I can understand the crappy sites that want money for an otherwise freely available documentation, I cannot understand the reason behind those sites (they are not only related to electronics) that come up in search (as they do have the very specific keyword/part number searched for) only to say "Sorry we don't have any of these, nor anything related".

Totally agree with all the comments here, seo broke google, and they don't care. Probably sells more adwords in the end.

I found uBlacklist from this thread, and the subscription functionality enables some collaborative effort.

So I've started making a list, but unfortunately there aren't many uBlacklist subscription lists out there yet.

Be interested to see how far this could go: https://github.com/rjaus/awesome-ublacklist/
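For anyone who wants to start their own list, here is a minimal sketch of turning a list of domains into rules. It assumes uBlacklist subscriptions are plain text files with one match pattern per line (of the form `*://*.example.com/*`), which is my understanding of the format; check the uBlacklist README before relying on it. The function name and the `.example` domains are made up for illustration.

```python
def to_ublacklist_rules(domains):
    """Turn an iterable of domains into match-pattern rules, one per line.

    Assumption: a subscription is a plain text file of match patterns.
    """
    seen = set()
    rules = []
    for domain in domains:
        domain = domain.strip().lower()
        # Drop a leading "www." so the wildcard rule covers every subdomain.
        if domain.startswith("www."):
            domain = domain[4:]
        if not domain or domain in seen:
            continue
        seen.add(domain)
        rules.append(f"*://*.{domain}/*")
    return "\n".join(rules) + ("\n" if rules else "")


# Hypothetical content-farm domains, purely for illustration.
blocklist = to_ublacklist_rules(
    ["www.content-farm.example", "scraper.example", "Content-Farm.example"]
)
print(blocklist, end="")
```

Host the resulting text file somewhere public (a raw GitHub URL works) and others can subscribe to it.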

> and my personal favorite: a blog post on Kaspersky.com

Wow, embarrassing for Kaspersky as a computer security focused site to be a victim of this.

When I searched for "Rubiks" as it said to do, I couldn't find it though. Has the Kaspersky post been changed?

Yeah, looks like they removed it.

Embarrassing but understandable. Computer security isn't about perfection, which is impossible. It's about vigilance, resilience, backups, and responding quickly. I'd say they nailed it, here.

It's worth noting the scam site is the top result in Bing and DuckDuckGo as well

Yep this! How can you really beat SEO when people can just try new things all day and see if it helps their rankings? I don't feel there's a solution here. Everyone suffers under SEO types just trying to bring scammy things to the top of the results page.

The thing is, I have a site in a very competitive niche that's full of black hat SEO tactics, and I am doing my white hat best hoping that Google tanks these sites when they update algos over time, and I'd then be the best placed to take their spots over.

But in the meantime, yep... It sucks.

Same story for Chrome a while back. I formatted my father's computer because he had a bunch of malware. The first thing he did was to google “chrome” and download the first result. Which was an ad. Which was malware.

The old Google would have hunted these down mercilessly (Panda update in 2011). What happened to Google these days?

They have no competition to care anymore. Their closest competitor, Bing, has a 2.24% market share which consists mostly of people who don't bother to change their default browser's default search engine. Competition is necessary to breed innovation. See for example, IE6.

That is true! Why should google do something? They say "use ads", to make money.

Using other search engines is the only way to do something.

"SEO scammers got you down? Call Google Ads now!"

I’ve always wondered if Google AdWords hurts your SEO. Let’s say you sell widgets and searching for widgets you are ranked 5. You buy AdWords to be on the top. Since people click your AdWords ad that’s on top, they are less likely to click the organic listing, thus penalizing your organic listing since it’s not getting clicks. Google factors in organic listing click counts when determining ranking, since it is a strong signal.

Most companies will buy Google Ads (formerly Adwords) as insurance for their SEO efforts. At the barest minimum, you don't want links to competitor sites at the top of the search results. With ads, you can at least claim the top spot for those search keywords.

Most people don't click on ads, so getting visitors to your site from organic search terms is more likely to convert them into returning users.

Interesting idea.

Also funny that Google Page Speed Insights was complaining about the Google Analytics JS and its caching duration.

Probably different teams and competing in a strange manner with each other.

Yes they would have. OMG, that was 10 years ago, and... what is new in Google search these last 10 years? Maybe a lot, but I don't see it. I just see ads and, when I do some long tail query, most of the results are just random sites in Russia or wherever with keyword salad (is there a word for that kind of site?)

No great fan of Google, but a large component of the problem is the Library of Babel phenomenon: there’s just too much crap being published.

Let’s face it... the early internet was interesting because the only people who could use it (and publish on it) were smart eccentrics. That was its charm. The technological hurdle served as the curator: you might have been a crazy white supremacist, anarchist, conspiracy theorist, or ‘expert’ in how to grow radishes or some other bizarrely eclectic field... but all of them were necessarily a bit smarter than the average bear just by virtue of knowing how to host content and access it; not a trivial task in the late 90’s.

Maybe it’s time to think up some convoluted alternate network that is a royal pain-in-the-ass to use. Perhaps there the eclectic and useful content creators will once again arise (and searching their trove will be a snap as most everything there will be fresh, unique, and interesting.) It will exist, I suppose, for a few years before tools are made to enable grandma to easily use it.

I think that’s somewhat the promise of Web 3.0 at this point. Painful to use and relatively empty. However, it’s mostly people hyping random crypto instead of actually creating value.

I have little to no experience in SEO. Does Google have a history of weighing in on situations like this and manually penalizing bad actors? If so, I would love a link to read about.

I agree with some of the other comments, Google's actions on SEO are always shrouded in a little "algorithmic" mystery. That said, they do apply "manual action" penalties to individual websites.

Using google search console you can determine if a manual action has been applied to your own website: https://support.google.com/webmasters/answer/9044175?hl=en

Rather than determine the ranks, these actions remove / punish offending websites from the ranks, effectively making room for 'good' actors.

Manual actions often come after a significant change in ranking algorithm or policy, and can be reverted / resolved in some cases. This usually requires removing or disavowing (in the case of unauthorized or unresponsive sites) the links pointing to a website.

You may want to dig into http://www.seobook.com/blog for an opinionated (albeit typically objectively correct) perspective on many things related to the SEO industry. There are a few studies about Thumbtack (with GV investment), RapGenius and eBay penalties and their subsequent recoveries.


wow that's amazing, I guess I sort of quit reading blogs like this when all the RSS readers died.

People seem to have stopped producing blogs like this ever since Facebook ate the world.

I wonder how much of modern search crappiness is because much of the good content that used to be in small blogs is now locked away behind facebook’s logins.


>Google issues a manual action against a site when a human reviewer at Google has determined that pages on the site are not compliant with Google's webmaster quality guidelines. Most manual actions address attempts to manipulate our search index. Most issues reported here will result in pages or sites being ranked lower or omitted from search results without any visual indication to the user.

https://www.wsj.com/articles/how-google-interferes-with-its-... I'm sure that the title tells you that the article has an opinion (not unbiased), but I think it is a useful source.

They state that they don't manually pick results, but improve their algorithms to solve these problems. They prefer to share the least amount of details though, since it would better inform SEO spammers.

Not true, you can get penalized and you may be notified about it in the Google Search Console.

Thanks for the correction. I remembered it wrong. In this article for instance Matt Cutts details how they go about flagging individual pages [1]

[1] https://searchengineland.com/googles-cutts-we-dont-ban-sites...

It probably started with the guy adding something like "Edited using XXXX Editor tool" to make himself some publicity. Seeing that it worked, he started selling those backlinks for a fortune.

Circa 1999 I was running a web design studio. We added that link to all the websites we designed, then the next logical step was to make it link to a page with our entire portfolio which in turn linked to our website. That boosted the SEO of all our customers, and in turn boosted ours exponentially.

I've heard so much about how PageRank isn't that important to Google anymore -- but there are many reports of SEO tricks that get people on the first page of Google for common queries. It seems like it's still quite important after all.

They may have abandoned the actual page-rank scoring system (a quite specific implementation) without wholly abandoning the idea of using "who links to who" as a quality signal.
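To make the "who links to who" idea concrete, here is a minimal sketch of the textbook PageRank power iteration. This is the classic published formulation, not whatever Google runs today; the damping factor of 0.85, the iteration count, and the toy graph (three client sites with a footer link back to a studio, as in the comment above) are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links out to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small baseline (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three client sites carry a footer link to a studio.
toy_web = {
    "client-a": ["studio"],
    "client-b": ["studio"],
    "client-c": ["studio"],
    "studio": ["client-a"],
    "bystander": [],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy graph the studio ends up ranked highest purely because of the footer links, and the one client it links back to outranks the others, which is the kind of effect footer backlinks were meant to exploit.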

Those can both be true. PageRank is a relic of a time when search engines more consistently returned the same results for the same query. These days we're all filter bubbled with personalized results

Same story for various Wordpress plugins and widgety things that live in site footers.

Google has turned into a cesspool. Half the time I find myself having to do ridiculous search contortions to get somewhat useful results - appending site: .edu or .gov to search strings, searching by time periods to eliminate new "articles" that have been SEOed to the hilt, or taking out yelp and other chronic abusers that hijack local business results.
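For what it's worth, the contortions can be scripted. Below is a rough sketch of a query builder using the `site:`, `-site:` and `before:` operators; how Google actually honours them is outside my control and may change. The function names and the example search terms are hypothetical.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, exclude_sites=(), before=None, append=()):
    """Assemble a search string from plain terms plus search operators."""
    parts = [terms]
    parts.extend(append)  # e.g. extra keywords such as "reddit"
    if site:
        parts.append(f"site:{site}")  # restrict results to a domain or TLD
    parts.extend(f"-site:{domain}" for domain in exclude_sites)
    if before:
        parts.append(f"before:{before}")  # date in YYYY-MM-DD form
    return " ".join(parts)


def search_url(query):
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(
    "pizza oven repair", exclude_sites=["yelp.com"], before="2015-01-01"
)
print(query)
print(search_url(query))
```

Saving a few of these as browser keyword searches takes some of the pain out of it, though it does nothing about the underlying result quality.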

Also phone problems: Google a problem with a phone and the top hit will be a whole bunch of churned out articles with generic copy on the cause (sometimes there are bugs in the software, so reboot your phone).

Any technical issue, really. There's a ton of autogenerated content out there with low-effort troubleshooting tips. A lot of it is used as lead generation for scammy antivirus/antimalware/"cleaner" software, paid tech support, or outright tech support scams.

These results are incredibly frustrating. Google should de-rank these autogenerated tech troubleshooting sites.

Yes, I clicked the link because it exactly referenced my issue. But it's not helpful to just see the same 5 tips copy pasted from elsewhere by an algorithm.

>These results are incredibly frustrating. Google should de-rank these autogenerated tech troubleshooting sites.

Why? Google makes money from advertisements either way, it's not in their interest to improve search results. If anything, terrible search results make users more likely to click on ads, which now look better by comparison.

Google became very popular very quickly because it gave much better results much faster. The more that Google allows quality to decline, the faster they approach a non-recoverable tipping point. Just ask Yahoo how quickly that can happen. Google may seem entrenched, but they have a shaky hold on search that is only as strong as its result quality. They are entrenched in advertising, but only because that's where searchers go to search.

Users may be entrenched in other Google products-- Gmail, gcal, docs, etc-- but not search. Someone using all those other Google products could change their default search engine and have zero impact on the rest of their digital life.

I'm shopping around for a preferred alternative right now, I just haven't settled yet.

That was pre-IPO Google. That company doesn't exist anymore. Money is their God now. Every Googler's high salary depends on it.

Yep, not disagreeing. My point is that a short term pursuit of money over at least a reasonable quality of search will destroy what they have built very quickly if quality gets low enough to make it easy for an upstart rival to have obviously better search results. And the evidence for that is in the history of their own rise to search dominance.

>The more that Google allows quality to decline, the faster they approach a non-recoverable tipping point. Just ask Yahoo how quickly that can happen.

Do you think we're in the same situation now as we were fully 20 years ago? I don't. Facebook killed MySpace, but Facebook is now too big to be disrupted, same with Google. The word "google" is a verb now. This is why the quality of their search results doesn't matter, people are too entrenched to switch now, which was not true in 2001.

With respect to getting users to switch, Facebook and MySpace are much more complicated services in terms of user interactions and the need for network effects. Search is literally a text box you type into, and its usefulness does not directly depend on how many other people use it.

In that respect, not much has changed in 20 years. Switching your search bar is a very low friction activity, and if quality of results is too low then people will look elsewhere. There's only so many times someone will tolerate seeing the exact same copy/paste useless answers to questions as most of the first page of results.


In General:

The tech industry is filled with examples of companies that had an entrenched product end up failing very rapidly. I think Google probably understands this well enough to ensure search quality remains better than a scrappy under funded startup can accomplish, but then again Google achieved search dominance by coming up with a different way to determine results, relevancy, etc. There's no reason to believe that someone couldn't come up with something superior now either.

I think the most significant threat to that possibility is 1) FAANG companies buying up many of the most talented people. 2) If a competitor did come along, buying them up as well.

But it's also hard to predict the anti-trust future. Microsoft had an extremely long run with the most dominant web browser, longer than Chrome has held that crown, but they got knocked down very quickly. I doubt that would have happened as easily if not for their anti-trust issues. Of course it doesn't help that IE grew into a slow bloated mess, but in that respect, refer back to what I said about search quality: Microsoft was entrenched, if sliding, in the browser space even after its anti-trust issues, but it let its quality slip too much for users to accept. Given viable options, users switched.

That switch was truly remarkable due to the much higher friction. IE still came bundled with Windows, Chrome did not. Every home computer with Chrome requires a user to ignore the option right in front of them and choose Chrome instead. Now just think about how much easier it is to use a different search engine.

I'm not saying Google is doomed, but 20 years of market dominance guarantees nothing. The "big 3" US automakers owned the market for longer than Google's founders have been alive, but those days are now just another cautionary tale of poor quality and unassailable arrogance.

The entire reason Google is the most successful search engine is that people don't use search engines that behave this way.

They obviously do use search engines, like Google, that behave this way.

The last few weeks I've started noticing a very specific type of SEO that pops up when I'm doing technical search, where the first result will be a Stack Overflow page, and the 3rd or 4th result will be from some content farm, copy-pasted from SO, sometimes translated into French.

It's a little unsettling.

It can be worse than that when those sites get a full multi-line result billing whereas the original stackoverflow answer gets a single-line subheading under some other SO result.

If you start getting a little esoteric in your searches you’ll get tons of results that are clearly crawled from personal blogs, and hosted on personal-blog-looking domains that redirect to godawful garbage. Especially bad on mobile because Google truncates the URLs.

That's a years old scam, but occasionally a new site pops through Google's filters.

I haven't gotten real SO as google result in years, only those content farms, constantly. Nowadays the same even happens for github issues, they're also mostly outranked by content farms copying from them.

If I search on mobile, often all my results are these content farms. (Google used in English from Germany)

That's why I append reddit, stackoverflow, superuser when I search for technical solutions. At least those sites are still full of user-generated content with good answers upvoted to the top.

You know, I was joking the last few times the subject came up, but I'm getting seriously worried that the more people mention using that kind of trick on HN, the faster advertisers will catch on and start building reddit-based SEO strategies.

Not sure how we should react :/

Reddit has been gamed by guerilla advertisers for years, everyone knows it, and the admins there don't seem to care/are unable to do anything about it.

r/HailCorporate used to be about calling out stealth marketing/advertising but it's morphed into just discussing how things can inadvertently act as an advertisement aka society is full of branding and consumerism. It's a shame because it used to be a very high quality sub.

Prefer resources that have some governance and aren’t entirely crowdsourced. For example if I’m looking for web tech answers my first search is ‘[whatever topic] mdn’.

Oh, it's no secret. Google's autocomplete will actually suggest appending "reddit" to certain queries. For example, let's take one of the most SEO-spammy queries imaginable, "best mattress 2021". Google will suggest:

- best mattress 2021

- best mattress 2021 consumer reports

- best mattress 2021 reddit

- best mattress 2021 for back pain

- best mattress 2021 wirecutter


But of course Reddit is already rife with shills. Not sure about CR.

Don't search for "best". That's specifically requesting spam.

I use colloquial language to try and target actual human reviews on forums. "Are audio-technica any good?"

Mostly works, but Google drops keywords pretty quickly now so you still get lots of spam or shopping sites.

I remember in the late 2000's I had a CR account. I had two weeks left on the period I had paid for. But when I cancelled the account... poof. My access was revoked immediately. Very much not consumer friendly. I was done enough with their crap that I didn't even bother with an email.

FWIW I signed up for CR recently when I was car shopping, and I canceled my subscription within the first month. They assured me that I would still have access for the remainder of the period. Of course, you're forced to subscribe rather than buy access for a set period, and they sent me a couple dozen emails during the time I was signed up, so they're not completely innocent... but at least that part felt reasonable.

I've been trying to unsubscribe from CR email spam for months now to no avail. Looking at the browser tools, it seems that their api can't handle the fact that I registered with a single letter first/last name so therefore my attempts to unsubscribe silently fail. There also appears to be no way to change my name since the api for that also fails on the single letter first/last name. I wish ungood things to happen to the people who 'designed' this Kafkaesque rubbish and in the meantime, thank GMail's mark-as-spam feature for throwing away their unrelenting pablum to the memory hole. This experience has led to me canceling my print subscription to CR plus my donations to their organization.

I keep getting results to a site 'gitmemory.com' which is just GitHub issues scraped. Super annoying that they outrank the actual GitHub issues they've taken the content from.

How is this not just spam and duplicate content? I remember when I was punished by G for duplicate content on my very small private blog when I was using Jekyll and had the markdown sources and the code stored in GitHub. I didn't know of the canonical tag back then and was punished because the GitHub domain had more trust.
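For anyone else who hasn't met it: the canonical tag is a `<link rel="canonical" href="...">` element in the page head that tells crawlers which URL is the preferred copy. Here is a small sketch, using only the Python standard library, for checking whether a page declares one. The class and function names and the example URL are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page markup, purely for illustration.
page = """
<html><head>
<title>My post</title>
<link rel="stylesheet" href="/style.css">
<link rel="canonical" href="https://blog.example/my-post/" />
</head><body>Hello</body></html>
"""
print(find_canonical(page))
```

A canonical link is only a hint to the search engine, so it is no guarantee that the copy on the stronger domain won't still win.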

It is sad, but nowadays I often just directly jump onto page 3 at Google or use other "tricks" to get okayish results.

"Google has turned into a cesspool."

That's a bit harsh but I agree that it is starting to fail to live up to the expectations I had with Google when it came out and destroyed Altavista in a spectacular shower of sparks.

Could I tender: "uBlacklist" as a stop gap, amongst others as we await Google being given a right old kicking?

Despite being a staunch Arch Linux user I have to deal with rather a lot of MS Windows related stuff. Being able to filter out that bloody awful Microsoft Social thing gets me closer to decent results. The majority of the next 10-100 results will be CnP clones of someone's blog but a human is able to get in reasonably quickly. I'm toying with blocking Stackoverflow and other cough stalwarts to see if results get better for me.

In my opinion: the www has hit a crossroads or perhaps a Spaghetti Junction or a Magic Roundabout for the last five years or so and continuing. However the exits are connected to the entrances on these road systems (take a look at them - they are real junctions. The MR is particularly terrifying but it works really well.)

I still won't use words like cesspool for this but I am increasingly losing my patience over the standard of results from Google. Those featured things (not the Ads - that's fine) at the top which add #blah_blah to the URL to colour search terms yellow are not working for me. The quality of the results featured in a box is often rubbish too. It would be nice to be able to turn all that stuff off.

I understand that Google are trying to "be" the internet to try and keep the stock ticker pointing north but there seems to be a point when they have overreached themselves and I think that was passed several years ago. I also increasingly feel that Google thinks that it knows best and has removed many choices from their various UIs - that comes across as a bit arrogant.

Many years ago I left Altavista behind for Google. I will move again if I feel I have to. Of course that's not much in the grand scheme of things and I'll probably only take around 100,000 people with me but they have friends - still probably not a big deal.

I appreciate a lot of what you're saying in this comment but I disagree with this sentiment:

> not the Ads - that's fine

In my strongly held opinion, push advertising is not fine and it's the root cause of all the problems you are discussing. We will only exit this mess that the web has become when everyone blocks push advertising by default. People should only see advertising when they are interested in being advertised to, e.g. sites you consciously choose to go to that advertise products & services, like the old Yellow Pages phonebooks.

I don’t think Google is the cesspool, I think Google is a search engine for an internet that is the cesspool.

We’re moving to the vision of information services that were pioneered by AOL, Prodigy, etc. Honestly, we’re there already.

We were already there when Google was the hot thing all the nerds loved. At the time their search was a way to cut through that, not the primary window into it. The cesspool isn’t Google, now it’s just hosted by them.

I wish I could have 2010 Google search as an alternative to 2021 Google search.

Problem is, I expect 2010 google search would be considerably worse now than it was in 2010, because "SEO" has had another decade to evolve.

There was already SEO stuff going on back then; people were just less aware of it. I can remember during the height of the Iraq war people manipulated Google to display George Bush as the top result for "Miserable Failure", and there were other exercises like that happening.

It's hard for me to pick a sweet spot for the internet; in many ways I feel like I've grown up with it.

I can remember the web of circa 1995 to 1997 with Gif's that wouldn't render properly in internet explorer, HTML marquee scrolling text and the dreaded blink tag being used everywhere. You needed to play search engine bingo with Altavista, Metacrawler, Yahoo, Infoseek, Lycos etc etc. And it was a crap shoot if search engines would give you useful results.

I can remember the web of 1998 to 2000 where every web developer seemed to discover html frames at the same time. We had good search with Google but pop up ads were so rife that the internet was borderline unusable. I can remember all the free webmail sites like hotmail, yahoo etc. ICQ chat was massive (whatever happened to that - it was a staple of my teen internet).

In the early 2000s Firefox came along and saved the internet by virtue of its built in popup blocking. But there was a mishmash of "Applets" and "Plugins" everywhere: Flash Player, Java Applets, Real Player etc. Video (and audio) on the web was terrible: half the time it would complain about missing codecs, it would buffer forever, and if something did load it would be the size of a postage stamp and look pixelated as all hell. I remember Gmail came out and everyone went gaga over its interface.

The last period that really stands out is the mid to late 00's with the development of big social media sites: Facebook, Twitter, YouTube etc. The web got more and more JavaScript heavy. Web video streaming finally became usable. Google Chrome came out and Flash Player finally died despite Microsoft trying to revive it with Silverlight.

I kind of feel like this last 10 years are a continuation with increased surveillance and tracking.

I think matt_cutts or someone who was active at the same time used to say that.

But it still doesn't defend not blocking sites that don't contain anything except autogenerated content.

And it still doesn't defend ignoring my keywords.

No, the keyword ignoring stems more from catering to the majority of people who don't know how to logically formulate a search for a search engine that expects every word to match. Most people will intuitively just try to ask the search engine a question (even if not literally phrased as such), and so Google has adapted to fill that need. Which even for those of us who would prefer something a bit more clear cut, is honestly handy a lot of the time.

I think using +plus +before +keywords still works for situations when you don't want any words ignored?

Certainly agree it seems like they could do a better job of burying auto-generated sites though. (Although I'm sure it's a difficult problem!)

How so? I haven't seen much change apart from that crappy yellow streak of piss thing that dribbles on pages.

How do you recall 2010 search? (I suspect I've lost it a bit - I'm 50.5 years old)

In general I had more relevant results on my first search query compared to now. Admittedly that's hard to prove, as I can't rerun the searches side by side for a comparison.

Additionally, ads were firmly separated into a colored box away from actual results.

As mentioned, I think removing the rose colored glasses won't put lipstick on this pig. Google Search (and not sure how Bing or similar would do better, barring their censorship problems) is increasingly a minefield...

This is the same problem with something like WoW classic... you can get the game that existed 15 years ago. But even if it is the exact same game, the world itself isn't. Online walkthroughs, videos, modding knowledge, theory crafting, etc. Those things are much more fleshed out today so even if the system didn't change 1 bit, WoW Original vs WoW Classic are really two separate games.

Likewise... if you dropped Google Original down today? I'd love to see how fast it would get owned by these sorts of operations that have had a decade+ of practice in skills like SEO that didn't exist in 2010.

You had more relevant results? That wouldn't change because companies live and die off of SEO now and didn't then. Highlighted ads are such a small thing on the website when compared to getting a full front page of the same Stack Overflow answers in 20 different websites that all have SO cloned and reskinned.

Yandex.com is 2010 Google search, IMHO. It's not filtered at all and seems to have that pure PageRank feel of the old Google search engine, while the modern Google seems to be hand tweaked quite a bit to only quote "authoritative sources". Search for politically controversial topics all you want on Google and you will not have your first couple of pages being debunking or fact check sites. Compare Google's search results for "who is zhengli shi" vs. the Yandex.com results for example. You can even find Putin scandals and "Tank Man" on there, even though it's a search engine based in Russia.

I'm amazed that there isn't anything like uBlock Origin for search results.

"My eyes are bent, my back is grey etc"

I think we have loads of tools to play with but fundamentally there is a problem when you are fighting with your search engine to find stuff you want to find.

My laptop (Arch) still has Chromium as default with uBlock Origin, Privacy Badger, uBlacklist and a few others running. I will be moving back to FF and running a sync server because I am that pissed off and able to do so. I'll also take a few others with me (between 2 and rather more)

When I say move back to FF, I'm talking about something like reverting a 10-15 years change.

I've always had FF available but it fell short back in the day for long enough for me to move to the Goggle thing. Now I think I'll go back.

No one at G will lament their loss, I'm not even a rounding error. I'm sure that all is fine there.

> I'm not even a rounding error. I'm sure that all is fine there.

I'm already here :-)

If 5 or so devs read it and change too and they start mentioning it then we have a fast chain reaction.

Just look at WhatsApp or even Microsoft or IBM: they seemed unstoppable but are very much just another alternative today.

If you're referring to user-curated search result blocking, that's very easy with DuckDuckGo and uBlock Origin (just block elements like [data-domain="w3schools.com"]; see my comment to the GP). I don't know of any large extant lists like this though.
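
As a rough illustration of how such a list could be maintained, here is a minimal Python sketch that turns a plain list of domains into uBlock Origin cosmetic filters of the `[data-domain="..."]` kind mentioned above. The function name `ddg_hide_filters` and the example domains are made up for illustration, and it assumes DuckDuckGo still puts a `data-domain` attribute on each result as described in the comment.

```python
from typing import Iterable


def ddg_hide_filters(domains: Iterable[str]) -> str:
    """Build uBlock Origin cosmetic filters hiding DDG results per domain.

    Assumes each result element carries a data-domain attribute, as the
    comment above describes. The domains passed in are the caller's choice.
    """
    # Normalise, de-duplicate and sort so the generated list is stable.
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    lines = ["! Hide selected domains on DuckDuckGo (generated)"]
    for domain in cleaned:
        # hostname##selector is uBlock's syntax for a site-specific cosmetic filter.
        lines.append(f'duckduckgo.com##[data-domain="{domain}"]')
    return "\n".join(lines)


if __name__ == "__main__":
    print(ddg_hide_filters(["w3schools.com", "example.org"]))
```

The output can be pasted into uBlock Origin's "My filters" pane; sharing it is then just a matter of passing a text file around.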

That won't do much if every result on the first page is blocked. Ideally a filter list like this could be pushed to the server side as a per-user preference to go with your query, so that if e.g. the top 10000 results were all filtered out, then you wouldn't have to click through (or infinite-scroll autoload) 100 empty pages before getting anything.

DDG will add more results, if enough are hidden. If I search "w3schools" with my filter, there are only two results on the first page that are not hidden, so it immediately displays the second page below. It seems that they planned for this use case.
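
The server-side version of this idea, where a per-user blocklist travels with the query and hidden results are topped up from later pages, might look roughly like the sketch below. It is not how DDG or any other engine actually does it; `fetch_page`, `wanted` and `max_pages` are hypothetical names introduced only for the example.

```python
from typing import Callable, List, Set
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(url: str, blocked: Set[str]) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in blocked)


def filtered_results(
    fetch_page: Callable[[int], List[str]],  # hypothetical: page number -> result URLs
    blocked: Set[str],
    wanted: int = 10,
    max_pages: int = 50,
) -> List[str]:
    """Collect up to `wanted` unblocked results, pulling further pages as needed."""
    kept: List[str] = []
    for page in range(max_pages):
        urls = fetch_page(page)
        if not urls:
            break  # the engine has run out of results
        for url in urls:
            if not is_blocked(url, blocked):
                kept.append(url)
                if len(kept) == wanted:
                    return kept
    return kept
```

The `max_pages` cap is there so a blocklist that matches nearly everything cannot turn one query into an unbounded crawl.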

https://millionshort.com/ tries something like this.

I used to have an automatic google search-domain blocker. It was just front-end though so if a page would have website domains that were useless, it would only have 1 or 2 results on it unfortunately. Something a little better integrated would be nicer.

Comparing Google now to Alta Vista is not very helpful. They don't get to rest on their laurels. Search is less helpful now, and it's not clear to me that they care enough to do something about it.

You mean besides spending far more on people and computers than any other company, perhaps combined?

You're giving their entire search budget credit for dealing with spam results? My observation is that it's bad and has been for some time. They are either unable or unwilling to solve the problem.

Anecdotally DuckDuckGo seems to have fewer sponsored sites than Google. DDG also makes it easy to block low-quality sites because it adds a data-domain attribute to the root of every search result. I recently started this mini uBlock Origin filter list for that (suggestions welcome!):

    ! Hide low-quality results on DuckDuckGo
    !! Stack Exchange mirrors

Great idea. Though I've noticed DDG promotes "blogspam" articles more often than the authoritative sources.

Let's say, if I search for a python builtin library, I want to go to the python website, not some "Python 101" blog post about it.

Great tip! I've been using DDG's official addon but this means one less addon. Thanks!

pinterest.com would clean up another large chunk of crap

The reason for that is actually rational: when Amit Singhal was in charge the search rules were written by hand. Once he was fired, the Search Quality team switched to machine learning. The ML was better in many ways: it produced higher quality results with a lot less effort. It just had one possibly fatal flaw: if some result was wrong there was no recourse. And that's what you are observing now: search quality is good or excellent most of the time while sometimes it's very bad and G can't fix it.

I wouldn't call that rational. There is no reason you can't apply human weighting on top of ML.

Honestly, I don't believe for a minute they "can't fix it." They do this sort of thing all the time, for instance when ML shows dark skinned people for a search for gorilla, they obviously have recourse.

You do know that Google basically slapped a patch on that one right?


I’m confused. I read that article and it has this:

> But, as a new report from Wired shows, nearly three years on and Google hasn’t really fixed anything. The company has simply blocked its image recognition algorithms from identifying gorillas altogether — preferring, presumably, to limit the service rather than risk another miscategorization.

Is that not an example of human intervention in ML?

Yes but then they fixed it right.

Fixing it right would be re-training the ML algo.... they basically told the algo to never ID anything as a gorilla (even actual gorillas)

> G can't fix it.

Yes, they can. They should simply stop measuring only positives, and start measuring negatives - e.g. people that press the back button of their browser, or click the second, third, fourth result afterwards...which should hint the ML classifiers that the first result was total crap in the first place.
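
To make the "measure negatives" idea concrete, here is a small Python sketch that computes, per result URL, the share of clicks where the user came back quickly. It says nothing about what Google really does; the log format (URL plus dwell time in seconds) and the 10 second threshold are assumptions picked for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def quick_return_rates(
    clicks: Iterable[Tuple[str, float]],  # hypothetical log rows: (result URL, dwell seconds)
    threshold_s: float = 10.0,            # assumed cut-off for "bounced straight back"
) -> Dict[str, float]:
    """Return, for each URL, the fraction of clicks with dwell below the threshold."""
    total: Dict[str, int] = defaultdict(int)
    quick: Dict[str, int] = defaultdict(int)
    for url, dwell in clicks:
        total[url] += 1
        if dwell < threshold_s:
            quick[url] += 1
    return {url: quick[url] / total[url] for url in total}
```

A high rate for a top-ranked page would be the kind of negative signal described above, though on its own it cannot tell a scam page from one that simply answers the question in five seconds.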

But I guess this is exactly what happens when your business model rewards leads to sites where you serve the ads: it gives you weird ethics, as your company profits from those scammers more than from legit websites.

From an ML point of view google's search results are the perfect example of overfitting. Kinda ironic that they lead the data science research field and don't realize this in their own product, but teach this flaw everywhere.

They have been already doing this for a loooong time, it's a low hanging fruit.

Take a look sometime at the wealth of data google serp sends back about your interactions with it

The fact that they do collect data does not mean that they use that data in any meaningful way or at all.

They ought to see humongous bounce rates with those fake SEOd pages. Normally, that would suggest shit tier quality and black-hat SEO, which is in theory punishable. Yet, they throw that data away and still rank those sites higher up.

You mean to say that no one at Google has even heard of "external SEO", which is nothing more than fancy way of saying link farming? They do know, this is punishable according to their own rules, yet it works, because either they cannot fix it or do not care to.

They'll never tell how they use the data for obvious reasons and I also can't go into any details. But any obvious thing you can think of almost certainly has been tried, they've been doing it for 20+ years and ranking alone is staffed with several hundreds of smart engineers. Mining clickthrough logs is a fairly old topic itself, has been around since at least early 2000s.

Please provide proof for this theory that google measures this also.

I worked in ranking for two major search engines. They all measure this, this is really low hanging fruit - how much time did it take you to come up with this idea? Why do you think so lowly of people who put decades of life into their systems that you assume they didn't think of it?

Technically just open google serp in developer tools, network tab, set preserve/persist logs option, and watch the requests flowing back - all your clicks and back navigations are reported back for analysis. Same on other search engines. Only DDG doesn't collect your clicks/dwell time - but that's a distinguishing feature of their brand, they stripped themselves of this valuable data on purpose.

Again, this is not about data being collected, we do know how much data Google collects, it is all about what is being done with the data and by extension how good the end result is.

This touches the broader subject of systems engineering and especially validation. As far as I am aware, there are currently no tools/models for validation of machine learning models and the task gets exponentially harder with degrees of freedom given to the ML system. The more data Google collects and tries to use in ranking, the less bounded ranking task is and therefore less validatable, therefore more prone to errors.

Google is such a big player in search space that they can quantify/qualify behavior of their ranking system, publish that as SEO guidelines and have majority of good-faith actors behave in accordance, reinforcing the quality of the model - the more good-faith actors actively compete for the top spot, the more top results are of good-faith actors. However, as evidenced by the OP and other black hat SEO stories, the ranking system can be gamed and datums which should produce negative ranking score are either not weighted appropriately or in some cases contribute to positive score.

Google search results are notoriously plagued with Pinterest results, shop-looking sites which redirect to Chinese marketplaces and similar. It looks like the only tool Google has to combat such actors is manual domain-based blacklisting, because otherwise, well, they would have done something systematic about it. It seems to me that the ranking algorithm at Google is given so many different inputs that it essentially lives its own life and changes are no longer proactive, but rather reactive, because Google does not have sufficient tools to monitor black hat SEO activity to punish sites accordingly.

So they do collect it, they only ignore it - just like the 10 - 30 (or more) clicks I've spent on the tiny tiny [x] in the top corner of scammy-looking-dating-site-slash-mail-order-bride ads that they served me for a decade?

My impression is that the ML algorithms at Google have the goal of increasing profitability from search. If that is the case, the quality of search will tend to be secondary to displaying pages that bring more revenue.

Blatantly false that Google has "no recourse", Google can put on penalty and bring domains down.

"Request manual review of search results" button?

Since this is now the top spot here on H/N I suspect it just got the attention of some Googlers who I’m sure will review it.

They may not give the site a manual action, though. They’d rather tweak the algorithm so it naturally doesn’t rank. Google’s algo should be able to see stuff like this.

I know that I’ve seen sites tank in the rankings because they got too many links too quickly. It could be that the link part of the algorithm hasn’t fully analyzed the links yet.

I’d be interested in seeing what the Majestic link graph says about this site, ahrefs doesn’t have tier 2 and tier 3 link data.

I really don't like how easy it is to fake a "new" article on Google. You can just re-publish an old article and stick a new date on it, and Google takes it at face value and uses the new date.

You can also do the opposite: post something today and say it was up on your site in 2003.

Makes it really difficult to find old pages about something that recently exploded in popularity, because the age filter just doesn't work.

I ran into this for the first time yesterday when trying to find out new info about a footy player. Some article from 15 years ago talking about how he had a good first game, tagged as 5th June 2021. Like, wtf?

I have been seeing this a lot recently too. Especially with the first result or two. Or the section up top that gives you a partial answer without having to click through. All of them always seem to have been freshly written like some made to order meal at a restaurant. It’s just too suspicious really.

Google Search is ripe for disruption. It's been over 20 years now and they are not dynamic or interesting at all anymore.

I still think that the "Yahoo!" style web directory is a good model. A catalogue of hand-curated links has increasing value as the quality of Google results goes down.

I was briefly going to write "I'm surprised that DMOZ[1] still exists" but it says "Copyright 2017 AOL" at the bottom so maybe it doesn't.

Edit: ...and using the search box results in a 404 so I guess it's really dead huh.

Edit 2: Apparently this is the successor! https://curlie.org/en

[1]: https://dmoz-odp.org

The creation and maintenance of such a directory might additionally be more feasible now because, sadly, there are far fewer personal or independent websites, with most content instead hosted on large platforms.

I just tried to use both to look up pharmacies via navigation.. With Dmoz after my second try I was able to find CVS, but I wasn't able to find it with Curlie..

It's not a bad idea to have a curated dataset of information. But clearly there are much better ways to navigate said information, which would include search, but also dynamic filters, predictive text, sorting algorithms, context awareness, etc. All of which... is built into modern search engines.

So perhaps what we really want is a Wikipedia/OpenStreetMap of curated, indexed, semantic content/links, that anyone can consume and write their own search interface for. Basically, an open data warehouse of website information.

> A catalogue of hand-curated links has increasing value as the quality of Google results goes down.

Who will pay for its creation, maintenance and hosting? Who will judge ranking, disputes, hacks?

Who will have an eye on discrimination issues? Whose jurisdiction will be relevant (think GDPR or the Australian press "gag order" law in the case of that cleric accused of fondling kids)?

Who will take care that the humans who will get exposed to anything from generic violence over vore/gore to pedo content get access to counseling and be fairly paid? Facebook, the world's largest website, hasn't figured out that one ffs.

These questions are ... relatively easy to bypass with an automated engine (all issues can be explained away as "it was the algorithm" and IT-illiterate judges and politicians will accept this), but as soon as you have meaningful human interaction in the loop, you suddenly have humans that can be targeted by lawsuits, police measures and other abuse.

> as soon as you have meaningful human interaction in the loop, you suddenly have humans that can be targeted by lawsuits, police measures and other abuse.

In theory, you could have a curated directory whose hosting works like ThePirateBay, and whose maintainership is entirely anonymous authors operating over Tor (even though the directory itself holds nothing the average person would find all that objectionable.)

Of course, there's no business model in that...

TPB is not a good example since they're allowing everything except pedo content, thus drastically shrinking their moderation workload.

A site that wants to be compliant to the law in the major jurisdictions (US, EU) can't operate that way, not with NetzDG, copyright and other laws in play.

It doesn't need to be a corporate enterprise that has to worry about all those things. People already share directories of links via Google Docs, Notion notebooks and the like.

The irony being that 20 (more like 25?) years ago Yahoo search was ripe for disruption... by Google :)

Halt and Catch Fire [1] (as a nerd, I can say it's one of the few TV series that got the hacker spirit right) had a few episodes about the Google disruption.

Like some people often say here, things come and go in circles...

[1]: https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(TV_series...


I am in the pre-release program. The hardest initial thing to get used to was not immediately scrolling down to the bottom to avoid all of the spam.

I suspect that their methods are not much different than Google, but the experience has been so much better.

I'm also testing neeva, do you know what they use to get the search results?


I would rather not have a required sign in to a search engine, but looks interesting.

That just implies locking into an ad supported model. Personally, I would prefer to pay. Stuart Russell wrote in his book that when surveying humans the value they ascribed to not being able to google for a year was something like $17,000 per year. Just some absurd number.

It is not an ad supported model - it is a subscription model. I just signed up for it.

I just signed up for a trial with them after reading this post.

It's so easy to do better! Just look at what a rousing success Cuil was.

Nobody said it would be easy. Industries ripe for disruption are often very hard to break into. Being ripe for disruption is more about giving up on innovating so you stagnate.

Free WordPress* themes are particularly bad in this regard. Since they're expected to contain HTML anyway, it's altogether too easy for the author of a theme to include a couple of links to a site they want to promote. Some themes take this to the next level by obfuscating the code that generates the promotional links, and/or including other code which makes the site not work properly if the links are removed.

*: and themes for other web applications, but mostly WordPress these days

Hmm. I would agree about all the crap being mixed in there, but in terms of overall results (both wrt. SEO crap and other irrelevant stuff), my experience has been that the quality troughed something like 2-3 years ago and then came back (my guess is that they're incorporating all of the AI they've been doing throughout the company into search). To me it feels like it's about 80% of its best right now.

I bet it's that we do different types of searches.

Ugh Pinterest results.

I swear, Pinterest must have employees working undercover in the Image Search team for Google to have let them destroy image search results the way they have.

It's literally never the original source for anything, but you can bet it's most of the first 10 pages of results. Then it doesn't even let you right click to open the image file, and dumps you to a login prompt if you click on anything. THAT'S NOT EVEN YOUR IMAGE STOP TELLING ME WHAT I CAN DO WITH IT.

Really makes you wonder if the people at google actually use their own product. Anyone who has ever used google image search in the past couple of years will have noticed that it's filled to the brim with garbage results from pinterest.

I have fallen in love with Yandex image similarity search (search by providing a query image, not text). You can find so much more with it, it's like Pinterest but without the crap. For example I could find images for my ML model but also furniture ideas for my house and check if my kid is objectively cuter than average (lol, yeah, objectively!).

And if it is not a Pinterest link it is an AMP link, which is equally bad in my experience. I just want to link a picture. Not a link to a page that might have the picture but might also have the entire article/reddit discussion and not the image which I was searching for.

When I'm reverse image searching something it's often to find the original artist of an illustration, photo, or whatever. I want to know who made it, see their other work, and find it in its original quality without 15 generations of jpg recompression artifacts.

But no, Pinterest has better SEO than the artist does, so it's just endless reposts upon reposts and never the original work.

Occasionally you get lucky and it's not the sort of image that Pinterest users share. Then you might actually find where it came from.

THIS. So much this. Time was when you could actually discover the provenance of an image. Almost every time, when I’m doing a reverse image search, that is my intent. It used to work. It seldom does these days.

In my recent experience, Bing, Tineye, and Yandex are all better at finding image sources than Google Images. But who knows how long that will last.

Try using tineye.com. It has noise too, but seems to be easier to find the original source than Google these days, at least for me anyway.

And the interesting thing about that is, you'd think it would be (relatively speaking) straightforward for Google to keep track of the first place a given image was indexed (or possibly the first few places, or everywhere it was seen over the first X period of time since you couldn't guarantee the very first would always be the original). Assuming that original was still online, it would seem to be the place to direct searchers to, regardless of pagerank or whatever.
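
A toy version of that bookkeeping, keeping the first few places an image was seen, could look like the Python sketch below. The class name `FirstSeenIndex` and its methods are invented for illustration. It keys on an exact SHA-256 of the file bytes, which is an assumption that would not survive recompressed reposts; a real system would presumably need some perceptual hash instead.

```python
import hashlib
from typing import Dict, List, Tuple


class FirstSeenIndex:
    """Remember the earliest few URLs at which each image was crawled."""

    def __init__(self, keep: int = 3) -> None:
        self.keep = keep  # how many earliest sightings to retain per image
        self._seen: Dict[str, List[Tuple[float, str]]] = {}

    @staticmethod
    def _key(image_bytes: bytes) -> str:
        # Exact-content hash: only byte-identical copies share a key.
        return hashlib.sha256(image_bytes).hexdigest()

    def record(self, image_bytes: bytes, url: str, crawled_at: float) -> None:
        """Note that `url` served this image at time `crawled_at` (epoch seconds)."""
        entries = self._seen.setdefault(self._key(image_bytes), [])
        if all(existing != url for _, existing in entries):
            entries.append((crawled_at, url))
            entries.sort()            # oldest sighting first
            del entries[self.keep:]   # drop everything beyond the earliest few

    def earliest(self, image_bytes: bytes) -> List[str]:
        """Return the retained URLs for this image, oldest first."""
        return [url for _, url in self._seen.get(self._key(image_bytes), [])]
```

Keeping a handful of early sightings rather than only the first reflects the caveat above: the very first crawl is not guaranteed to be the original.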

I find this helps: https://addons.mozilla.org/nl/firefox/addon/view-image/

It puts the "view image" button back.

I'd expect a company like Google, who tracks what kind of socks you have on every day, to also track their own search engine... user mistakenly clicks on a Pinterest link, user immediately clicks back and looks for something else... is it so hard to assume that they don't want Pinterest results, because they're useless, and somehow lower their SEO score? Nooo, of course not, just put the Pinterest results near the top, until the user puts "-pinterest" in the search bar.

> Google has turned into a cesspool.

All these same sites appear near the top of Bing searches too. There's nothing particularly Google-specific to this story. It's about SEO hacking that will work against anyone with a PageRank-style system.

Indeed. I recently noticed this while relying on DDG for documentation for Common Lisp, a language I'm still learning. The top-ranking site for any Common Lisp function was an SEO scam site, where clearly someone had hired freelancers to take preexisting CLisp documentation and rewrite it – in poor-quality English – until it would no longer be detectable as copyright violation, then loaded it with ads.

(I just checked and this copycat documentation site has, thankfully, now been pushed down a bit in DDG results.)

Note that, as I quite recently learned, DDG has support for a bunch of bang-commands listed at [1]. There are a bunch of them for documentation sites for all kinds of programming languages, including a couple for Lisp, it seems.

[1]: https://duckduckgo.com/bang_lite.html

For learning Common Lisp, I highly recommend https://github.com/ashok-khanna/common-lisp-by-example

I think it's high time we had a webring resurgence. It's impossible to get anywhere with plain search anymore; what we need is curated websites where other domain owners are happy to say "I endorse the people running this site, so if you like my stuff you'll like them too".

I'd like to see that, and I'd be happy to post some various web rings and blogrolls.

One of the things that killed them imho is when Google started penalizing sites that linked to some other sites.

This was compounded by the expired-domain market.

WordPress even took out linkrolls around that time; people that had them in sidebar widgets would have them disappear unless they installed a new plugin to bring them back.

Webrings that auto-add the "nofollow" attribute I guess could make them okay for people again.

Might be cool to have a GitHub-type page with a list of rings to recommend: a script auto-pulls it into your page, adding nofollow, and then other people could copy your list or clone/fork it.
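
For what it's worth, the "script pulls the list into your page, adding nofollow" part is only a few lines. The Python sketch below assumes a made-up plain-text format of one `Title | URL` entry per line (with `#` comments), which is not any existing standard; fetching the file from a shared repository is left out.

```python
from html import escape
from typing import List, Tuple


def parse_ring(text: str) -> List[Tuple[str, str]]:
    """Parse lines of the hypothetical form 'Title | URL' into (title, url) pairs."""
    links: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        title, _, url = line.partition("|")
        if url.strip():
            links.append((title.strip(), url.strip()))
    return links


def render_ring(links: List[Tuple[str, str]]) -> str:
    """Render the links as an HTML list, marking every link rel="nofollow"."""
    items = [
        f'<li><a href="{escape(url)}" rel="nofollow">{escape(title)}</a></li>'
        for title, url in links
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Escaping titles and URLs matters here precisely because the list is meant to be copied and forked by strangers.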

If they can inject a random link into a page, why couldn't they also be able to inject a web ring link?

Isn't that what people go to social media for?

Social media is gamed the same way? Sharebots, etc

Do you suppose Web rings wouldn't be when there's money in it? There was plenty of that when they were just for fun.

It’s fundamentally about trust models. That is to say, about the audience.

Everybody gave up trusting webrings because Google provided better results. Now that Google results are shit, there’s room for other information vendors to come along, even if it’s in narrow areas.

Actually, HN is already this for me in some respects.

In my opinion this site is not really so different from, say, Reddit, beyond having more focused rules and being smaller. So I don't think my idea that social media have supplanted the Web ring is wide of the mark.

This is my view too. Yes, I’d love to go back to a time when Google’s algorithms were unknown enough for SEO to be futile but those days are gone and the problem isn’t limited to Google.

I also noticed that Apple users see way more fake online shop results than Linux users, from the same IP, with regularly cleared browser cache and identical search terms.

Those fake shops are part of discussions in politics right now. Usually they're registered in Ireland or Malta as companies due to their specific banking laws. They make millions with those scams and people can't differentiate between legit online shops and fake ones - because the legit ones actually look crappier than the fake ones when it comes to the website designs.

In Germany, we have at least for hardware the "geizhals" website which is kind of an index for all kinds of electronics shops and they try to verify as much as possible.

But for other online shop sectors (e.g. clothing or home stuff) I wouldn't trust anything. Even on Amazon I got scammed a lot and heard absurd things from others...like getting packages with no content in them and Amazon refusing to see that the seller is a scammer etc.

Google is a cesspool because the spammers and SEO-hackers are in full force, and Google is only reactive to these threats these days. I mean, does it really matter if they are making hundreds of billions of dollars a year? They seem to be doing something right.

The only time something will change is when traffic starts decreasing to their site, but it's good enough such that people won't change. Look at Facebook, I don't know anyone who uses it as much as they used to 10 years ago, but it's making the most money it ever has. Why on earth would any behavior change? From their points of view, everyone is happy with it!

I don’t like google and don’t really want to defend it, but this is more of a lots of crappy websites problem than a google problem.

Google, to justify its huge capital worth, should deal with that crap. Why else bother?

google isn't the cesspool, people who want to appear at the top of a list of search results are doing whatever it takes to create a cesspool, because that's what it takes to earn more money.

being willing to make other things worse in order to have more money always creates cesspools.

Google’s mission was “organize the world’s information and make it useful” and they are doing a poorer job now than historically.

Of course there are scammers, that’s part of what makes organizing so hard.

Cynically, I think that Google is worse at filtering scammers because they care less now. Half the page is ads so they make money either way.

Google is a cesspool because it’s their job to fix it and they failed. I stopped using Google search because of how far it’s fallen.

If it's the only way to make money, it doesn't really feel like the burden is on the people to make a cleaner pool

there is never only a single way to make money. some ways are easier. some ways let you take advantage of others; these are of the variety that create cesspools.

> Half the time I find myself having to do ridiculous search contortions to get somewhat useful results - appending site: .edu or .gov

A great opportunity for students and public servants to sell premium URLs.

don't forget adding quotes to things to stop the random "did you mean to spell this?" crap

basically, like everything in modernity, it's a race to the bottom of the infinite dullards of popular

I wish duckduckgo had better results. google still better

I can't remember the last time I searched on Google without appending "reddit" to the end.
