CNET is deleting old articles to try to improve its Google Search ranking (theverge.com)
801 points by mikece 11 months ago | 574 comments

So Google's shitty search now economically incentivizes sites to destroy information.

Can there be any doubt that Google destroyed the old internet by becoming a bad search engine? Could their exclusion of most of the web be considered punishment for sites being so old and stable that they don't rely on Google for ad revenue?

I'll just assume you neglected to read TFA, because if you had, you would have discovered that it links to an official Google source that states CNET shouldn't be doing this.[1]

[1] https://twitter.com/searchliaison/status/1689018769782476800

I could imagine CNET's SEO team got an average-rank goal instead of an absolute traffic goal. By removing low-ranked old pages, the average position of their search results moves closer to the top even though total traffic sinks. I've seen stuff like this happen at my own company as well, where a team's KPIs are designed such that they'll ruin absolute numbers in order to achieve their relative KPI goals, like getting an increase in conversion rates just by cutting all low-conversion traffic.

In general, people often forget that if your target is a ratio, you can attack the numerator or the denominator. Often the latter is easier to manipulate.
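A toy sketch of that denominator attack, with made-up numbers: dropping the low-converting traffic segment triples the conversion rate while total conversions fall by two thirds.

```python
# Two hypothetical traffic segments (numbers invented for illustration).
segments = [
    {"visits": 10_000, "conversions": 200},  # low-intent traffic
    {"visits": 1_000, "conversions": 100},   # high-intent traffic
]

def conversion_rate(segs):
    # Conversions divided by visits across the given segments.
    return sum(s["conversions"] for s in segs) / sum(s["visits"] for s in segs)

rate_before = conversion_rate(segments)      # 300 / 11_000 ≈ 2.7%
rate_after = conversion_rate(segments[1:])   # 100 / 1_000 = 10.0%

total_before = sum(s["conversions"] for s in segments)     # 300
total_after = sum(s["conversions"] for s in segments[1:])  # 100
```

The ratio KPI looks much better after the cut, even though the business converted fewer customers in absolute terms.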

Even if it's not a ratio. When any metric becomes a target it will be gamed.

My organization tracks how many tickets we have had open for 30 days or more. So my team started to close tickets after 30 days and let them reopen automatically.

Lower death rate in hospitals by sending sick people to hospice! https://www.nbcnews.com/health/health-care/doctors-say-hca-h...

Meanwhile that's not necessarily a bad outcome. In theory it makes the data better by focusing on deaths that might or might not have been preventable, rather than making every hospital look responsible for inevitable deaths.

Of course the actual behavior in the article is highly disturbing.

This is why KPIs or targets should NEVER be calculated values like averages or ratios. The team is then incentivized to do something hostile such as not promote the content as much so that the ratio is higher, as soon as they barely scrape past the impressions mark.

When deciding KPIs, Goodhart's law should always be kept in mind: when a measure becomes a target, it ceases to be a good measure.

It's really hard to not create perverse incentives with KPIs. Targets like "% of tickets closed within 72 hours" can wreck service quality if the team is under enough pressure or unscrupulous.

Sure they can, e.g. on-time delivery (or even better shipments missing the promised delivery date) is a ratio. Or inventory turn rates, there you actually want people to attack the denominator.

Generally speaking, an easy solution is to attach another target to either the numerator or the denominator, a target that requires people to move that value in a certain direction. That might even belong to a different team than the one with goals on the ratio.

> Sure they can, e.g. on-time delivery (or even better shipments missing the promised delivery date) is a ratio. Or inventory turn rates, there you actually want people to attack the denominator.

These are good in that they’re directly aligned with business outcomes but you still need sensible judgement in the loop. For example, say there’s an ice storm or heat wave which affects delivery times for a large region – you need someone smart enough to recognize that and not robotically punish people for failing to hit a now-unrealistic goal, or you’re going to see things like people marking orders as canceled or faking deliveries to avoid penalties or losing bonuses.

One example I saw at a large old school vendor was having performance measured directly by units delivered, which might seem reasonable since it’s totally aligned with the company’s interests, except that they were hit by a delay on new CPUs and so most of their customers were waiting for the latest product. Some sales people were penalized and left, and the cagier ones played games having their best clients order the old stuff, never unpack it, and return it on the first day of the next quarter - they got the max internal discount for their troubles so that circus cost way more money than doing nothing would have, but that number was law and none of the senior managers were willing to provide nuance.

Selling something in one quarter, with the understanding that the customer returns it the next, is also clean cut accounting fraud.

Yeah, every part of this was a “don’t incentivize doing this”. I doubt anyone would ever be caught for that since there was nothing in writing but it was a complete farce of management. I heard those details over a beer with one of the people involved and he was basically wryly chuckling about how that vendor had good engineers and terrible management. They’re gone now so that caught up with them.

That can be gamed as well: you could either change the scope or cut corners and ship something of lower quality.

I mean, ideally you just have both an absolute and a calculated value to ensure both trend in the right direction.

This is exactly how Red Ventures runs their companies. Make that chart on the wall tv go up, get promotion.

That only says that Google discourages such actions, not that such actions are not beneficial to SEO ranking (which is equal to the aforementioned economic incentive in this case).

So whose word do we have to go on that this is beneficial, besides anonymous "SEO experts" and CNET leadership (those paragons of journalistic savvy)?

Perhaps what CNET really means is that they're deleting old low quality content with high bounce rates. After all, the best SEO is actually having the thing users want.

In my experience SEO experts are the most superstitious tech people I ever met. One guy wanted me to reorder HTTP header fields to match another site's. He wanted our minified HTML to include a line break just after a certain meta element, just because some other site had it. I got requests to match variable names in our minified JS just because Google's own minified JS used those names.

> In my experience SEO experts are the most superstitious tech people I ever met.

And some are the most data-driven people you'll ever meet. As with most people who claim to be experts, the trick is to determine whether the person you're evaluating is a legitimate professional or a cargo-culting wanna-be.

I’ve always felt there is a similarity to day traders or people who overanalyze stock fundamentals. There comes a time when data analysis becomes astrology…

> There comes a time when data analysis becomes astrology.

Excellent quote. It's counterintuitive but looking at what is most likely to happen according to the datasets presented can often miss the bigger picture.

This. It is often the scope and context that determine the logic. It is easy to build bubbles and stay comfy inside them. Without revealing much: I asked a data scientist, whose job it is to figure out bids on keywords and essentially control how much money is spent advertising something in a specific region, about negative criteria. As in: are you sure you wouldn't get this benefit even if you stopped spending the money? His response was "look at all this evidence that our spend caused this x% increase in traffic and y% more conversions", and that was two years ago. My follow-up question was: okay, now that the thing you advertised is popular, wouldn't it be the more organic choice in the market, and can't we stop spending there? His answer was: "look at what happened when we stopped the advertising in this small region in Germany 1.5 years ago!" My common-sense validation question still stands. I still believe he built a shiny bubble two years ago and refuses to reason with the wider context and second-order effects.

The people who spend on marketing are not incentivised to spend less :)

> There comes a time when data analysis becomes astrology...

or just plain numerology

Leos are generally given the “heroic/action-y” tropes, so if you are, for example, trying to pick Major League Baseball players, astrology could help a bit.

Right for the wrong reasons is still right.

Right for the wrong reasons doesn't give confidence it's a sustainable skill. Getting right via randomness also fits into the same category.

My data driven climate model indicates that we could combat climate change by hiring more pirates.

Some of the most superstitious people I've ever met were also some of the most data-driven people I've ever met. Being data-driven doesn't exclude unconscious manipulation of the data selection or interpretation, so it doesn't automatically equate to "objective".

The data analysis I've seen most SEO experts do is similar to sitting at a highway, carefully timing the speed of each car, taking detailed notes of the cars appearance, returning to the car factory and saying that all cars need to be red because the data says red cars are faster.

One SEO expert who consulted for a bank I worked at wanted us to change our URLs from e.g. /products/savings-accounts/apply by reversing them to /apply/savings-accounts/products on the grounds that the most specific thing about the page must be as close to the domain name as possible, according to them. I actually went ahead and changed our CMS to implement this (because I was told to). I'm sure the SEO expert got paid a lot more than I did as a dev. A sad day in my career. I left the company not long after...

Unfortunately though, this was likely good advice.

The yandex source code leak revealed that keyword proximity to root domain is a ranking factor. Of course, there’s nearly a thousand factors and “randomize result” is also a factor, but still.

SEO is unfortunately a zero sum game so it makes otherwise silly activities become positive ROI.

But that's wrong... Do breadcrumbs get larger as you move away from the loaf? No!

It's just a URL rewrite rule for nginx proxy.
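For what it's worth, a minimal sketch of such a rewrite in nginx, using the hypothetical bank URLs from the grandparent comment (untested, and the segment names are assumptions):

```nginx
# Hypothetical: redirect /products/<category>/<action>
# to /<action>/<category>/products.
location ~ ^/products/(?<category>[^/]+)/(?<action>[^/]+)$ {
    return 301 /$action/$category/products;
}
```

This keeps old links working via 301s, though every such navigation now costs an extra round trip.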

If you want all your canonical urls to be wrong and every navigation to include a redirect, sure.

Even if it measurably improves the ranking of your website, it would still be a bullshit job. It also invites side effects, especially on the web.

I think you're largely correct, but Google isn't one person, so there may be somewhat emergent patterns that work from an SEO standpoint without a solid answer to why. If I were an SEO customer I would ask for some proof, but that isn't the market they're targeting. There was an old saying in the tennis instruction business that there was a bunch of "bend your knees, fifty please". So lots of snake-oil salesmen, but some salesmen sell stuff that works.

That's a bit out there, but Google has mentioned in several different ways that pages and sites have thousands of derived features and attributes they feed into their various ML pipelines.

I assume Google is turning all the site's pages, js, inbound/outbound links, traffic patterns, etc...into large numbers of sometimes obscure datapoints like "does it have a favicon", "is it a unique favicon?", "do people scroll past the initial viewport?", "does it have this known uncommon attribute?".

Maybe those aren't the right guesses, but if a page has thousands of derived features and attributes, maybe they are on the list.

So, some SEO's take the idea that they can identify sites that Google clearly showers with traffic, and try to recreate as close a list of those features/attributes as they can for the site they are being paid to boost.

I agree it's an odd approach, but I also can't prove it's wrong.

Considering their job can be done by literally anyone, they have to differentiate somehow

>our minified HTML

Unreadable source code is a crime against humanity.

Is minified "code" still "source code"? I think I'd say the source is the original implementation pre-minification. I hate it too when working out how something is done on a site, but I'm wondering where we fall on that technicality. Is the output of a pre-processor still considered source code even if it's not machine code? These are not important questions but now I'm wondering.

Source code is what you write and read, but sometimes you write one thing and people can only read it after your preprocessing. Why not enable pretty output?

Plus I suspect minifying HTML or JS is often cargo cult (for small sites who are frying the wrong fish) or compensating for page bloat

It doesn't compensate for bloat, but it reduces the bytes sent over the wire, the bytes cached in between, and the bytes parsed in your browser, for _very_ little cost.

You can always open dev tools in your browser and have an interactive, nicely formatted HTML tree there with a ton of inspection and manipulation features.

In my experience the bigger difference is usually made by not making it bloated in the first place, as well as progressive enhancement, non-blocking loads, serving from a nearby geolocation, etc. I see projects minify all the things by default, while it should literally be the last measure, with the least impact on TTI.

It does stuff like tree shaking as well; it's quite good. If your page is bloated, it makes it better. If your page is not bloated, it makes it better.

Tree-shaking is orthogonal to minification tho.

That's true.

And does an LLM care? It feels like minification doesn't stop one from explaining the code at all.

The minified HTML (and, god forbid, JavaShit) is the source from which the browser ultimately renders a page, so yes that is source code.

"The ISA bytecode is the source from which the processor ultimately executes a program, so yes that is source code."

I suppose the difference is that someone debugging at that level will be offered some sort of "dump" command or similar, whereas someone debugging in a browser is offered a "View Source" command. It's just a matter of convention and expectation.

If we wanted browsers to be fed code that for performance reasons isn't human-readable, web servers ought to serve something that's processed way more than just gzipped minification. It could be more like bytecode.

I find myself using View Source sometimes, too, but more often I just use devtools, which shows DOM as a readable tree even if source is minified.

I'm actually all for binary HTML – not just it's smaller, it can also be easier to parse, and makes more sense overall nowadays.

Let's be honest, a lot of non-minified JS code is barely legible either :)

For me I guess what I was getting at is that I consider source the stuff I'm working on - the minified output I won't touch, it's output. But it is input for someone else, and available as a View Source so that does muddy the waters, just like decompilers produce "source" that no sane human would want to work on.

I think semantically I would consider the original source code the "real" source if that makes sense. The source is wherever it all comes from. The rest is various types of output from further down the toolchain tree. I don't know if the official definition agrees with that though.

>If we wanted browsers to be fed code that for performance reasons isn't human-readable,

Worth keeping in mind that "performance" here refers to saving bandwidth costs as the host. Every unnecessary whitespace character is a byte that didn't need to be uploaded; hence minify, save on that bandwidth, and thus $$$$.

The performance difference on the browser end between original and minified source code is negligible.

Last time I ran the numbers (which admittedly was quite a number of years ago now), the difference between minified and unminified code was negligible once you factored in compression because unminified code compresses better.

What really adds to the source code footprint is all of those trackers, adverts and, in a lot of cases, framework overhead.
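That's easy to sanity-check with the standard library. A rough sketch, using an invented toy JS snippet (real-world numbers depend heavily on the code):

```python
import gzip

# The same toy function, written readably and hand-minified.
readable = """
function add(firstNumber, secondNumber) {
    // Return the sum of the two arguments.
    return firstNumber + secondNumber;
}
console.log(add(2, 3));
"""
minified = "function add(a,b){return a+b}console.log(add(2,3));"

raw_saving = len(readable) - len(minified)
gzip_saving = len(gzip.compress(readable.encode())) - len(gzip.compress(minified.encode()))

print(f"raw bytes saved: {raw_saving}, gzipped bytes saved: {gzip_saving}")
```

Compression consistently narrows the gap between the two versions; whether the remainder is negligible is exactly the judgment call being debated here.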

I was thinking transfer speed, although even then, the difference is probably negligible if compressing regardless.

The way I see it, if someone needs to minify their JavaShit (and HTML?! CSS?!) to improve user download times, that download time was horseshit to start with and they need to rebuild everything properly from the ground up.

> It could be more like bytecode.

Isn’t this essentially what WebAssembly is doing? I’ll admit I haven’t looked into it much, as I’m crap with C/C++, though I’d like to try Rust. Having “near native” performance in a browser sounds nice; curious to see how far it’s come.

If you need to run it through a prettifier to even have a chance of understanding the code, is it still source code?

About the byte code: You mean wasm? (Guess that's what you're alluding to.)

If you need syntax highlighting and an easy way to navigate between files to understand a large code base, is it still source code?

Turtles all the way down

Nobody tell this guy about compilers.

Minifying HTML is basically just removing non-significant whitespace. Run it through a formatter and it will be readable.

If you dislike unreadable source code I would assume you would object to minifying JS, in which case you should ask people to include sourcemaps instead of objecting to minification.

So I guess you think compiled code is even worse, right?

I mean, isn't that precisely why open source advocates advocate for open source?

Not to mention, there is no need to "minify" HTML, CSS, or JavaShit for a browser to render a page unlike compiled code which is more or less a necessity for such things.

Minifying code for browsers greatly reduces the amount of bandwidth needed to serve web traffic. There's a good reason it's done.

By your logic, there's actually no reason to use compiled code at all, for almost anything above the kernel. We can just use Python to do everything, including run browsers, play video games, etc. Sure, it'll be dog-slow, but you seem to care more about reading the code than performance or any other consideration.

I already alluded[1] to the incentives for the host to minify their JavaShit, et al., and you would have a point if it wasn't for the fact that performance otherwise isn't significantly different between minified and full source code as far as the user would be concerned.

[1]: https://news.ycombinator.com/item?id=37072473

I'm not talking about the browser's performance, I'm talking about the network bandwidth. All that extra JS code in every HTTP GET adds up. For a large site serving countless users, it adds up to a lot of bandwidth.

Somebody mentioned negligible/deleterious impacts on bandwidth for minified code in that thread, but they seemed to have low certainty. If you happen to have evidence otherwise, it might be informative for them.


Glad to see the diversity of HN readers apparently includes twelve-year-olds.

Anyway, you do realise plenty of languages have compilers with JS as a compilation target, right? How readable do you think that is?

>Glad to see the diversity of HN readers apparently includes twelve year olds.

The abuse of JavaScript does not deserve the respect of being called by a proper name.

>Anyway, you do realise plenty of languages have compilers with JS as a compilation target, right? How readable do you think that is?

If you're going to run "compiled" code through an interpreter anyway, is that really compiled code?

>In computing, a compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language).

Well, no, but it can speed up loading by reducing transfer.

Is that still true with modern compression?

If someone releases only a minified version of their code, and licenses it as free as can be, is it open source?

According to the Open Source Definition of the OSI it's not:

> The program must include source code [...] The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor [...] are not allowed.

The popular licenses for which this is a design concern are careful to define source code to mean "preferred form of the work for making modifications" or similar.

It’s also a crime against zorgons from Planet Zomblaki.

Google actually describes an entirely plausible mechanism of action here.[1] Old content slows down site crawling, which can cause new content not to be refreshed as often.

Sure, one page doesn’t matter, but thousands will.

[1] https://twitter.com/searchliaison/status/1689068723657904129...

It says it doesn't affect ranking and their quote tweet is even more explicit


This is the actual quote from Google PR:

>Removing it might mean if you have a massive site that we’re better able to crawl other content on the site. But it doesn’t mean we go “oh, now the whole site is so much better” because of what happens with an individual page.

Parsing this carefully, to me it sounds worded to give the impression removing old pages won’t help the ranking of other pages without explicitly saying so. In other words, if it turns out that deleting old pages helps your ranking (indirectly, by making Google crawl your new pages faster), this tweet is truthful on a technicality.

In the context of negative attention where some of the blame for old content being removed is directed toward Google, there is a clear motive for a PR strategy that deflects in this way.

The tweet is also clearly saying that deleting old content will increase the average page rank of your articles in the first N hours after it is published. (Because the time to first crawl will decrease, and the page rank is effectively zero before the first crawl).

CNet is big enough that I’d expect Google to ensure the crawler has fresh news articles from it, but that isn’t explicitly said anywhere.

And considering all the AI hype, one could have hoped that the leading search engine's crawler would be able to "smartly" detect new content based on a URL containing a timestamp.

Apparently not if this SEO trick is really a thing...

EDIT: sorry, my bad, it's actually the opposite. One could expect that a site like CNET would include a timestamp and a unique ID in their URLs in 2023. This seems to be the "unpermalink" of a recent CNET article.

Maybe the SEO expert could have started there...


I did the tweet. It is clearly not saying anything about the "average page rank" of your articles because those words don't appear in the tweet at all. And PageRank isn't the only factor we use in ranking pages. And it's not related to "gosh, we could crawl your page in X hours therefore you get more PageRank."

It's not from Google PR. It's from me. I'm the public liaison for Google Search. I work for our search quality team, not for our PR team.

It's not worded in any way intended to be parsed. I mean, I guess people can do that if they want. But there's no hidden meaning I put in there.

Indexing and ranking are two different things.

Indexing is about gathering content. The internet is big, so we don't index all the pages on it. We try, but there's a lot. If you have a huge site, similarly, we might not get all your pages. Potentially, if you remove some, we might get more to index. Or maybe not, because we also try to index pages as they seem to need to be indexed. If you have an old page that doesn't seem to change much, we probably aren't running back every hour to index it again.

Ranking is separate from indexing. It's how well a page performs after being indexed, based on a variety of different signals we look at.

People who believe in removing "old" content generally aren't thinking that's going to make the "new" pages get indexed faster. They might think that maybe it means more of their pages overall from a site could get indexed, but that can include "old" pages they're successful with, too.

The key thing is that if you go to the CNET memo mentioned in the Gizmodo article, it says this:

"it sends a signal to Google that says CNET is fresh, relevant and worthy of being placed higher than our competitors in search results."

Maybe CNET thinks getting rid of older content does this, but it's not. It's not a thing. We're not looking at a site, counting up all the older pages and then somehow declaring the site overall as "old" and therefore all content within it can't rank as well as if we thought it was somehow a "fresh" site.

That's also the context of my response. You can see from the memo that it's not about "and maybe we can get more pages indexed." It's about ranking.

Suppose CNET published an article about LK99 a week ago, then they published another article an hour ago. If Google hasn’t indexed the new article yet, won’t CNET rank lower on a search for “LK99” because the only matching page is a week old?

If by pruning old content, CNET can get its new articles in the results faster, it seems this would get CNET higher rankings and more traffic. Google doesn’t need to have a ranking system directly measuring the average age of content on the site for the net effect of Google’s systems to produce that effect. “Indexing and ranking are two different things” is an important implementation detail, but CNET cares about the outcome, which is whether they can show up at the top of the results page.

>If you have a huge site, similarly, we might not get all your pages. Potentially, if you remove some, we might get more to index. Or maybe not, because we also try to index pages as they seem to need to be indexed.

The answer is phrased like a denial, but it’s all caveated by the uncertainty communicated here. Which, like in the quote from CNET, could determine whether Google effectively considers the articles they are publishing “fresh, relevant and worthy of being placed higher than our competitors in search results”.

You're asking about freshness, not oldness. IE: we have systems that are designed to show fresh content, relatively speaking -- a matter of days. It's not the same as "this article is from 2005 so it's old, don't show it." And it's also not what is generally being discussed in getting rid of "old" content. And also, especially for sites publishing a lot of fresh content, we get that really fast already. It's an essential part of how we gather news links, for example. And and and -- even with freshness, it's not "newest article ranks first" because we have systems that try to show the original "fresh" content, or sometimes a slightly older piece is still more relevant. Here's a page that explains more about the ranking systems we have that deal with both original content and fresh content: https://developers.google.com/search/docs/appearance/ranking...

Dude, like who is Google? The judicial system of the web?

No. Google has their own motivations here, they are a player not a rule maker.

Don’t trust SEOs, as no one actually knows what works, but certainly don’t think Google is telling you the absolute truth.

Ha, I actually totally agree with you, apparently my comment gave the wrong impression. I was just arguing with the GP's comment which was trying to (fruitlessly, as you point out) read tea leaves that aren't even there.

While CNET might not be the most reliable source, Google telling content owners not to play SEO games is also too biased to be taken at face value.

It reminds me of Apple's "don't run to the press" advice when hitting bugs or app review issues. While we'd assume Apple knows best, going against their advice totally works and is by far the most efficient action for anyone with enough reach.

Considering how much paid-for unimportant and unrelated drivel I now have to wade through every time I google to get what I am asking for, I doubt very much that whatever is optimal for search-engine ranking has anything to do with what users want.

Wrong, the best SEO is having what users want and withholding it long enough to get a high average session time.

And I suppose a corollary is: "claim to have what the users want, and have them spend long enough to figure out that you don't have it"?

See: every recipe site in existence.

> That only says that Google discourages such actions

Nope. It says that Google does not ding you for old content.

"Are you deleting content from your site because you somehow believe Google doesn't like "old" content? That's not a thing!"

Do the engineers at Google even know how the Google algorithm actually works? Better than the SEO experts who spend their time meticulously tracking the way the algorithm behaves under different circumstances?

My bet is that they don't. My bet is that there is so much old code, weird data edge cases and opaque machine-learning models driving the search results, Google's engineers have lost the ability to predict what the search results would be or should be in the majority of cases.

SEO experts might not have insider knowledge, but they observe in detail how the algorithm behaves, in a wide variety of circumstances, over extended periods of time. And if they say that deleting old content improves search ranking, I'm inclined to believe them over Google.

Maybe the people at Google can tell us what they want their system to do. But does it do what they want it to do anymore? My sense is that they've lost control.

I invite someone from Google to put me in my place and tell me how wrong I am about this.


Once upon a time, Matt Cutts would come on HN and give a fairly knowledgeable and authoritative explanation of how Google worked. But those days are gone, and I'd say so are the days of standing behind any articulated principle.

I work for Google and do come into HN occasionally. See my profile and my comments here. I'd come more often if it were easier to know when there's something Google Search-related happening. There's no good "monitor HN for X terms" thing I've found. But I do try to check, and sometimes people ping me.

In addition, if you want an explanation of how Google works, we have an entire web site for that: https://www.google.com/search/howsearchworks/

Google Alerts come to mind.

The engineers at Google do know how our algorithmic systems work because they write them. And the engineers I work with at Google looking at the article about this found it strange anyone believes this. It's not our advice. We don't somehow add up all the "old" pages on a site to decide a site is too "old" to rank. There's plenty of "old" content that ranks; plenty of sites that have "old" content that rank. If you or anyone wants our advice on what we do look for, this is a good starting page: https://developers.google.com/search/docs/fundamentals/creat...

>The engineers at Google do know how our algorithmic systems work because they write them.

So there's zero machine learning or statistical modeling based functionality in your search algorithms?

There is. Which is why I specifically talked only about writing for algorithmic systems. Machine learning systems are different, and not everyone fully understands how they work, only that they do and can be influenced.

It's really hard to get a deep or solid understanding of something if you lack insider knowledge. The search algorithm is not something most Googlers have access to, but I assume they constantly observe what their algorithm does, in a lot of detail, to measure what their changes are doing.

"Are you deleting content from your site because you somehow believe Google doesn't like "old" content? That's not a thing!"

I guess that Googler never uses Google.

It's very hard to find anything on Google older than or more relevant than Taylor Swift's latest breakup.

I think in this context, saying that it's not a thing that google doesn't like old content just means that google doesn't penalize sites as a whole for including older pages, so deleting older pages won't help boost the site's ranking.

This is not the same as saying that it doesn't prioritize newer pages over older pages in the search results.

The way it's worded does sound like it could imply the latter thing, but that may have just been poor writing.

"poor writing" is the new "merely joking guys!"

That Googler here. I do use Google! And yeah, I get sometimes people want older content and we show fresher content. We have systems designed to show fresher content when it seems warranted. You can imagine a lot of people searching about Maui today (sadly) aren't wanting old pages but fresh content about the destruction there.

Our ranking system with freshness is explained more here: https://developers.google.com/search/docs/appearance/ranking...

But we do show older content, as well. I find often when people are frustrated they get newer content, it's because of that crossover where there's something fresh happening related to the query.

If you haven't tried, consider our before: and after: commands. I hope we'll finally get these out of beta status soon, but they work now. You can do something like before:2023 and we wouldn't show pages from before 2023 (to the best we can determine dates). They're explained more here: https://twitter.com/searchliaison/status/1115706765088182272

"Taylor Swift before:2010"

With archive search, the News section floats links like https://www.nytimes.com/2008/11/09/arts/music/09cara.html

Maybe not related to the age of the content, but more content can definitely penalize you. I recently added a sitemap to my site, which increased the number of indexed pages, but it caused a massive drop in search traffic (from 500 clicks/day to 10 clicks/day). I tried deleting the sitemap, but unfortunately it didn't help.

Ye. I am flabbergasted by people that are gaslighting people into not being "superstitious" about Google's ranking.

How many pages are we talking about here?

100K+. Mostly AI and user generated content. I guess the sudden increase in number of indexed pages prompted a human review or triggered an algorithm which flagged my site as AI generated? Not sure.

Just because someone says water isn't wet doesn't mean water isn't wet.

The contrived problem of trusting authority can be easily resolved by trusting authority

Claims made without evidence can be dismissed without evidence.

"The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command."

-George Orwell, 1984

it seems incredibly short-sighted to assume that just because these actions might possibly give you a small bump in SEO right now, they won't have long-term consequences.

if CNET deletes all their old articles, they're making a situation where most links to CNET from other sites lead to error pages (or at least, pages with no relevant content on them) and even if that isn't currently a signal used by google, it could become one.

No doubt those links are redirected to the CNET homepage.

Isn’t mass redirecting 404s to the homepage problematic SEO-wise?

Technically, you're supposed to return a 410 or a 404, but when some of the deleted pages have those extremely valuable old high-reputation backlinks, that's just wasteful, so I'd say it's better to redirect to the "next best page," like maybe a category page, or to the homepage as a last resort. Why would it be problematic? Especially if you do a sweep and only redirect pages that have valuable backlinks.

I was only talking about mass redirecting 404s to the homepage, which I've heard is not great, I think what you're saying is fine -- but that sounds like more of a well thought out strategy.
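To make the distinction above concrete, here's a sketch of the "next best page" approach as an nginx config fragment. The URLs, patterns, and redirect targets are entirely hypothetical, not anything CNET actually uses:

```nginx
# Hypothetical handling of deliberately deleted article URLs.

# A deleted page with valuable backlinks: permanent redirect to the
# most closely related surviving page, not the homepage.
location = /reviews/old-laptop-roundup-2009/ {
    return 301 /laptops/;
}

# Other deliberately removed pages: 410 Gone, which signals to
# crawlers that the removal is intentional and permanent (a plain
# 404 only says "not found" and may be recrawled for longer).
location ~ ^/reviews/.*-200[0-9]/$ {
    return 410;
}
```

The key design choice is per-page triage: only pages with backlinks worth preserving get a redirect, and each redirect points at genuinely related content, which avoids the mass 404-to-homepage pattern discussed here.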

Hi. So I'm the person at Google quoted in the article and also who shared about this myth here: https://twitter.com/searchliaison/status/1689018769782476800

It's not that we discourage it. It's not something we recommend at all. Not our guidance. Not something we've had a help page about saying "do this" or "don't do this" because it's just not something we've felt (until now) that people would somehow think they should do -- any more than "I'm going to delete all URLs with the letter Y in them because I think Google doesn't like the letter Y."

People are free to believe what they want, of course. But we really don't care if you have "old" pages on your site, and deleting content because you think it's "old" isn't likely to do anything for you.

Likely, this myth is fueled by people who update content on their site to make it more useful. For example, maybe you have a page about how to solve some common computer problem and a better solution comes along. Updating a page might make it more helpful and, in turn, it might perform better.

That's not the same as "delete because old" and "if you have a lot of old content on the site, the entire site is somehow seen as old and won't rank better."

Your recommendations are not magically a description of how your algorithm actually behaves. And when they contradict, people are going to follow the algorithm, not the recommendation.

Exactly this. It's no different from how they behave with YouTube; it seems deceptive at best.

Yeah, Google's statement seems obviously wrong. They say they don't tell people to delete old content, but then they say that old content does actually affect a site in terms of its average ranking and also what content gets indexed.

"They say that old content does actually affect a site in terms of it’s average ranking" -- We didn't say this. We said the exact opposite.

Sorry if I’m misconstruing what was said, but then it seems that what was said isn’t consistent with what actually happens.

What the Google algorithm encourage/discourage and what google blog or documentation encourage/discourage are COMPLETELY different things. Most people here are complaining about the former, and you keep responding about the latter.

No one has demonstrated that simply removing content that's "old" means we think a site is "fresh" and therefore should do better. There are people who perhaps updated older content reasonably to keep it up-to-date and find that making it more helpful that way can, in turn, do better in search. That's reasonable. And perhaps that's gotten confused with "remove old, rank better" which is a different thing. Hopefully, people may better understand the difference from some of this discussion.

I think you have misread the tweet. It says it does not work _and_ discourages the action.

Exactly. Google also discourages link building. But getting relevant links from authority sites 100% works.

This is another problem with the entire SEO industry. Websites trust these SEO consultants and growth hackers more than they trust information from Google itself. Somehow, it has become widely accepted that the best information on Google ranking comes from those third parties, not from Google.

I'm not sure it is so cut and dried. Who is more likely to give you accurate information on how to game Google's ranking: Google themselves, or an SEO firm? I suspect that Google has far less incentive to provide good information on this than an SEO firm would.

Google will give you advice on how to not be penalized by Google. They won’t give you advice on how to game the system in your favor.

The more Google helps you get ahead, the more you end up dominating the search results. The more you dominate the results, the more people will start thinking to come straight to you. The more people come straight to you, the more people never use Google. The fewer people use Google, the less revenue Google generates.

> The more you dominate the results, the more people will start thinking to come straight to you.

This is a possible outcome but there are people that type in google.com and then the name of their preferred news site, their bank, etc, every day.

The site with the name they search dominates that search but they keep searching it.

I would like to know what dollar amount Google makes on people typing things like “Amazon” into google search and then clicking the first paid result to Amazon.

Search in isolation provides a very small share of profits. Google makes most of its money via ads on the websites it surfaces.

It’s the same on YouTube - the majority of the people who work there seem to have no idea how “the algorithm” actually works - yet they still produce all sorts of “advice” on how to make better videos.

I get the feeling the YT algorithm is sticky and "chooses" who to promote in some self-reinforcing loop.

There's an easy proof that those SEO consultants have a point: find a site that, according to Google's criteria, should never rank, yet has rocketed to the top of the search rankings in its niche within a couple of months. That's a regular thing, and it proves that there are ways to rank on Google that Google won't advise.

Of course SEO consultants are trusted more than Google. They often ignore what Google says and bring good results for their clients.

Google has a vested interest in creating a good web experience. Consultants have an interest in making their clients money.

Link building is a classic example where good consultants deliver value. (There are more bad consultants than good ones, though.)

It could be premature to place fault with the SEO industry. Think about the incentives: Google puts articles out, but an SEO specialist might have empirical knowledge from working across a variety of web properties. It's not that I wouldn't trust Google's articles, but specialists might have discovered undocumented methods for giving sites a boost.

They certainly want you to believe that.

The good ones will share the data/trends/case studies that would support the effectiveness of their methods.

But the vast majority are morons, grifters, and cargo culters.

The Google guidance is generally good and mildly informative but there’s a lot of depth that typically isn’t covered that the SEO industry basically has to black box test to find out.

> Websites trust these SEO consultants and growth hackers more than they trust information from Google itself.

That's because websites' goals and Google's goals are not aligned.

Websites want people to engage with their website, view ads, buy products, or do something else (e.g. vote for a party). If old content does not serve those goals, or detracts from them, then, they and SEO experts say, it should go because it's dragging the rest down.

Google wants all the information and for people to watch their ads. Google likes the long tail; Google doesn't care if articles from the 90's are outdated because people looking at it (assuming the page runs Google ads) or searching for it (assuming they use Google) means impressions and therefore money for them.

Google favors quantity over quality, websites the other way around. To oversimplify and probably be incorrect.

Google actively lies on an infinite number of subjects. And SEO is a completely adversarial subject where Google has an interest in lying to prevent some behaviors. While consultants and "growth hackers" are very often selling snake oil, that doesn't make Google an entity you can trust either.

Hey, don't do that. That's bad. But if you keep doing it, you'll get better SEO. No, we won't do anything to prevent this from being a way to game SEO.

Words without action are useless.

"Google says you shouldn't do it" and "Google's search algorithm says that you should do it" can both be true at the same time. The official guidance telling you what to do doesn't track with what the search algorithm uses to decide search placement. Nobody's going to follow Google's written instructions if following the instructions results in a penalty and disobeying them results in a benefit.

If Google says one thing and rewards a different thing, guess which one will happen.

They say "Google doesn't like "old" content? That's not a thing!"

But who knows, really? They extract features nobody outside of Google knows about as proxies for "content quality," then run them through pipelines of lots of different, not-really-coordinated ML algorithms.

Maybe some of those features aren't great for older pages? (broken links, out-of-spec html/js, missing images, references to things that don't exist, practices once allowed now discouraged...like <meta keywords>, etc). And I wouldn't be surprised if some part of overall site "reputation" in their eyes is some ratio of bad:good pages, or something along those lines.

I have my doubts that Google knows exactly what their search engines likes and doesn't like. They surely know which ads to put next to those maybe flawed results, though.

I don’t know man, I read it but I’ve learned to judge big tech talk purely by their actions and I don’t think there’s a lot of incentive built into their system that supports this statement.

A few tweets down, they qualify this, saying that it might improve some things, like indexing of the rest of the site:


My understanding is that if you have a very large site, removing pages can sometimes help because:

- There is an indexing "budget" for your site. Removing pages might make reindexing of the rest of the pages faster.

- Removing pages that are cannibalising each other might help the main page for those keywords rank higher.

- Google is not very fond of "thin wide" content. Removing low quality pages can be helpful, especially if you don't have a lot of links to your site.

- Trimming the content of a website could make it easier for people and Google to understand what the site is about and help them find what they are looking for.

Google search ranking involves lots of neural networks nowadays.

There is no way the PR team writing that tweet can say for sure that deleting old content doesn't improve rank. Nobody can say that for sure. The neural net is a black box, and its behaviour is hard to predict without just trying it and seeing.

Speaking from experience as someone who is paid for SEO optimization there's a list a mile long of things Google says "doesn't work" or you "shouldn't do" but in fact work very well and everyone is doing it.

I remember these kinds of sources, right from the inside in the Matt Cutts era 15+ years ago, encouraging and advising so many things that were later proven not to be the case. I wouldn't take this at face value just because it was written by the official guide.

Google says so many things about SEO which are not true. There are some rules which are 100% true, and some which they just hope their AI thinks are true.

RTFA does not include reading through every linked source.

Never has your username been so accurate as to who you are.

There's an awful lot of SEO people on Twitter that claim to be connected to Google, and the article he links on the Google domain as a reference doesn't say anything on the topic that I can find. I'm reluctant to call that an official source.

Journalist here. Danny Sullivan works for Google, but spent nearly 20 years working outside of Google as a fellow journalist in the SEO space before he was hired by the company.

He was the guy who replaced Matt Cutts.

The 1st paragraph is correct, the 2nd not quite: Matt Cutts was a distinguished engineer (looking after web spam at Google) who took on the role of search spokesperson. It's that role Danny took over, as "search liaison."

No. But it's also complicated, as Matt did things beyond web spam. Matt worked within the search quality team, and he communicated a lot from search quality to the outside world about how Search works. After Matt left, someone else took over web spam. Meanwhile, I'd retired from journalism writing about search. Google approached me about starting what became the new role of "public liaison of search," which I've done for about six years now. I work within the search quality team, just as Matt did, and that type of two-way communication role he had, I do. In addition, we have an amazing Search Relations team that also works within search quality; they focus specifically on providing guidance to site owners and creators (my remit is a bit broader than that, so I deal with more than just creator issues).

thanks, Ernie!

I'm the source. I officially work for Google. The account is verified by X. It's followed by the official Google account. It links to my personal account; my personal account links back to it. I'm quoted in the Gizmodo story that links to the tweet. I'm real! Though now perhaps I doubt my own existence....

He claims to work for Google on X, LinkedIn, and his own website. I am inclined to believe him because I think he would have received a cease and desist by now otherwise.

He claims to work for Google as search "liaison." He's a PR guy. His job is to make people think that Google's search system is designed to improve the internet, instead of being designed to improve Google's accounting.

I actually work for our search quality team, and my job is to foster two-way communication between the search quality team and those outside Google. When issues come up outside Google, I try to explain what's happened to the best I can. I bring feedback into the search quality team and Google Search generally to help foster potential improvements we can make.

Yes. All this is saying that you do not write any code for the search algorithms. Do you know how to code? Do you have access to those repos internally? Do you read them regularly? Or are you only aware of what people tell you in meetings about it.

Your job is not to disseminate accurate information about how the algorithm works but rather to disseminate information that google has decided it wants people to know. Those are two extremely different things in this context.

I work on these kinds of vague "algorithm"-style products in my job, and I know that unless you are knee-deep in it day to day, you have zero understanding of what it ACTUALLY does, what it ACTUALLY rewards, what it ACTUALLY punishes, which can be very different from what you were hoping it would reward and punish when you built and trained it. Machine learning still does not have the kind of explanatory power to do any better than that.

No. I don't code. I'm not an engineer. That doesn't mean I can't communicate how Google Search works. And our systems do not calculate how much "old" content is on a site to determine if it is "fresh" enough to rank better. The engineers I work with reading about all this today find it strange anyone thinks this.

Probably not; anyone can claim to work for these companies with no repercussions, because is it even a crime? Maybe if they're pricks who lower these companies' public standing (libel), but even that requires a civil suit.

But lying on the internet isn't a crime. I work for Google on quantum AI solutions in adtech btw.

He’s been lying a long time, considering that he’s kept the lie up that he’s an expert on SEO for nearly 30 years at this point, and I’ve been following his work most of that time.

with a gold badge, half a million followers, as well as a wikipedia page that mentions he works at google?

Did you notice that nowadays a lot of websites have a lot of uninteresting drivel giving a "background" to whatever the thing was you were searching for before you get to read (hopefully) the thing you were searching for?

People discovered that Google measures not only how much time you stay on a webpage but also how much you scroll to define how interesting a website is. So now every crappy "tech tips" website that has an answer that fits in a short paragraph now makes you scroll two pages before you get the thing you actually wanted to read.

I've noticed something similar on youtube.

I search for "how to do X", and instead of just showing me how to do it, which might take 30 seconds, they put a ton of fluff and filler into the video to make it last 5 minutes.

Typical video goes something like:

0 - Ads, if you're not using an ad blocker

1 - Intro graphics/animation

2 - "Hi, I'm ___, and in this video I'm going to show you how to do X"

3 - "Before I get in to that, I want to tell you about my channel and all the great things I do."

4 - "Like and subscribe."

5 - "Now let's get in to it..."

6 - "What is X?"

7 - "What's the history of X?"

8 - "Why X is so great."

9 - finally... "How to do X"

Fortunately you can skip around, but it's still a bunch of useless fluff and garbage content to get to the maybe 30 seconds of useful information.

What makes this worse is that there's an increasing trend for how-tos to be only available on video.

As someone who learns best by reading, I'm already at a disadvantage with video to begin with. To make it worse, instructional videos tend to omit a great deal of detail in the interest of time. Then when you add nonsense like you're pointing out, it makes the whole thing a frustrating and pointless activity.

The nonvideo content is still there but shitty search is prioritizing video. Obviously video ads pay better.

Notice how Google frequently offers YouTube recommendations at the top of things like mobile results, or in those little expandable text drop-downs? My guess is that clicking those lets them serve a high-intent video ad at a higher CPM than a search text ad.

As someone who is Deaf, many of these videos are not accessible. They rely on shitty Google auto captions which aren't accurate at least 25% of the time.

It gets even better when you subscribe to YouTube Premium.

You get no random ad content cutting into the feed at will, which makes for a somewhat better experience. But there's still the inevitable "NordVPN will guarantee your privacy" ad, or the "<some service which has no ads and was made by content creators, so you don't have to look at ads if you subscribe, but hey, all our content is on YT with ads, and here is an ad>" ad.

There is no escape. I actually pay for YT premium and it's SO much better than being interrupted by ads for probiotic yoghurt or whatever. I know there are a couple of plugins out there which I have not tried (I think nosponsors is one of them) but I really don't think there is any escape from this stuff.

uBlock Origin and Sponsor Block provide a better experience than YouTube Premium.

... on desktop. My kingdom for Youtube ad blocking on my TV.

SmartTubeNext, if it's an Android smart TV.

That is explicitly encouraged by YouTube: if your video is at least 8 minutes (I think), you're allowed to add more ads.

I've noticed that any video that is 10:XX minutes long almost always is useless

They're probably reaching some YT threshold to have more ads show on it

I can only tolerate modern YouTube at 1.5 or 2x playback speed. Principally because speaking slower to stretch video length has become endemic.

Same. I think it's a mix of that and this "presenter voice" everyone thinks they have to use. My ADHD brain doesn't focus on it well because it's too slow, so it's useless to me. But all my life I've been told that when presenting I should speak slowly and articulately, while the reality is that watching anyone speak that way drives me nuts.

What's great about writing is that readers can go at their own pace. When speaking, you have to optimize for your audience and you probably lose more people by being too fast vs. the people you lose by talking too slow. I have to say I appreciate YouTubers that go a million miles an hour (hi EEVBlog). As a native speaker of English, I can keep up. But you have to realize, most people in the world are not native speakers of English.

(The converse is; whenever I turn on a Hololive stream I'd say that I pick up 20% of what they're saying. If they talked slower, I would probably watch more than every 3 months. But, they rightfully don't feel the need to optimize for non-native speakers of Japanese.)

> What's great about writing is that readers can go at their own pace.

100%, and you can skim it so that your pace subconsciously varies depending on how relevant or complex that section is.

This is why I hate the trend of EVERYTHING being made into a video. Simple things that mean I have to watch 4-5min of video and have my eardrums blasted by some dubstep intro so some small quiet voice can say "Hi guys, have you ever wanted to do _x_ or _y_ more easily?" before finally just giving me the nugget of information I came for.

I wish more stuff were available in just text + screenshots..

Some of those people seem to be speaking so slow that it is excruciating to listen to them. When I find someone who speaks at a normal speed and I have to slow the video down, they usually have more interesting things to say.

That said, tinkering before and after youtube has been two different worlds. I really like having video to learn hands-on activities. I just wrapped up some mods to a Rancilio Silvia, and I noticed my workflow was videos, how-to guides and blog posts, broader electrical information documentation, part specific manuals / schematics, and my own past knowledge. I felt very efficient having been through the process before, and knowing when to lean on which resource. But the videos are by far the best resource to orient myself when first jumping in to the project, and thus save me a lot of time.

I mean, people are bad at editing. "I didn't have time to write a short letter, so I've written a long letter instead." I don't think it's a conspiracy.

I definitely write super long things when I consciously make the decision to not spend much time on something. Meanwhile, I've been working on a blog post for the better part of 2 years because it's too long, but doesn't cover everything I want to discuss. If you want people to retain the content, you have to pare it down to the essentials! This is hard work.

> I mean, people are bad at editing. "I didn't have time to write a short letter, so I've written a long letter instead." I don't think it's a conspiracy.

Making a long video isn't like writing a rambling letter. It takes work to make 10 minutes of talk out of a 1-minute subject. And mega-popular influencers do this, not just newbs who haven't learned how to edit properly yet.

"Tell me everything you know about Javascript in 1 minute." Figuring out what not to say is the hard part of that question. Rambling into the camera for an hour is easy.

But we're not talking about people taking 10 minutes to summarize a complex topic. We're talking about people taking 10 minutes to deliver 30 seconds of simple, well-delineated info.

This is something that happens a lot. I'll Google a narrow technical question that can be answered in three lines of text--there's literally nothing more of value to say about it--and all the top hits are 5+ minute videos. That doesn't happen by accident.

There's certainly a wide gamut of creators out there, and the handymen I've seen have videos like you mentioned. I imagine the complaints above are about the far more commercialized channels that do in fact model their videos after YT's algorithm.

It doesn't have to be a literal conspiracy. Why do you reject the possibility that people and organizations are reacting to very real and concrete financial incentives which clearly exist?

Certainly there are a lot of people that stretch their videos out to put in more ads, but not everyone with a long video is playing some metrics optimization game. They're just bad at editing.

I think the situation that people run into is something like "how do I install a faucet" and they are getting someone who does it for a living explaining it for the first time. Explaining it for the first time is what makes it tough to make a good video. Then there are other things like "top 10 AskReddit threads that I feel like stealing from this week" and those are too long because they are just trying to get as much ad revenue as possible. The original comment was about howtos specifically, and I think you are likely to run into a lot of one-off channels in those cases.

SponsorBlock is great for cleaning this crap up. Besides skipping sponsor segments, I have it set to autoskip unpaid/self-promotion, interaction reminders, intermissions/intros, and endcards/credits, with filler tangents/jokes set to manual skip.

SponsorBlock is basically mandatory to watch YouTube now. I can’t even imagine what it would be like without Premium.

One thing I really wish sponsorblock would add is the ability to mask off some part of the screen with an option to mute. More and more channels are embedding on-screen ads, animations, and interaction "reminders" that are, at best, distracting.

> More and more channels are embedding on-screen ads, animations, and interaction "reminders" that are, at best, distracting.

Do you have an example of a video that does this? Seems like an interesting problem to solve.

uBlock Origin is my extension of choice for this. It makes it really easy to block those distractions, and there are quite a few pre-existing filters to choose from.

I think you missed what the poster is asking for. They want to block a portion of the video itself. For example when you watch the news on TV, there is constant scrolling text on the bottom of the screen with the latest headlines. They want to block stuff like that.

I believe you're right! I can't think of any extension that would be able to modify the picture of a stream itself in real-time. What came to my mind was the kind of 'picture-in-picture' video that some questionable news sites display as you scroll down an article, usually a distracting broadcast which is barely related to the news itself.

There’s a channel that explains how to pronounce words that is a particularly bad offender. They talk about the history of the word up front, but without ever actually saying the word. They only pronounce it in the last few seconds, right as the thumbnail overlay appears.

You forgot the sponsor stuff that's 20% of the video's length.

Yeah, I usually skip up to half of a typical video until they get to the point, sometimes more. People feel like just getting down to business is somehow wrong; first they need to tell the story of their life, how they came to the decision of making this video, and why I may want to watch it. Dude, I am already watching it, stop selling it and start doing it!

> Did you notice that nowadays a lot of websites have a lot of uninteresting drivel giving a "background" to whatever the thing was you were searching for before you get to read (hopefully) the thing you were searching for?

I know you came here looking for a recipe for boiled water, but first here's my thesis on the history and cultural significance of warm liquids.

It’s interesting that water is often boiled in metal pots. There are several kinds of metal pots. Aluminum, stainless steel, and copper are often used for pots to boil water in.

Water boils in pots with different metals because only temperature matters for boiling water. If the water is 100c at sea level, it will boil.

Finally, that SA writing skill we learned in school can be put to practice!

Also, you might want to look into beer as a replacement for water, or mineral oil.

Tea made with mineral oil is awful.

Also, "jump to recipe" could be a simple anchor tag that truly skips the drivel. But for some reason it executes ridiculous JavaScript that animates the scroll just slowly enough to trigger every ad's intersection observer along the way.
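For comparison, the plain-anchor version needs no JavaScript at all; the `id` and link text below are made up for illustration:

```html
<!-- Plain fragment link: the browser jumps instantly, no scripting. -->
<a href="#recipe">Jump to recipe</a>

<!-- ...pages of life story and ad slots... -->

<section id="recipe">
  <h2>The recipe</h2>
  <!-- the content the visitor actually came for -->
</section>
```

The animated-scroll version replaces that one-line anchor with a script purely so every ad between the top of the page and the recipe gets a chance to count as "viewed."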

I hate when you click a search result and it doesn't even have the keyword(s) you searched...

There should be a search engine penalty for loquacious copy.

I want the most succinct result possible when I search the web.

This has been going on for about a decade now. This alone has caused me to remove myself from Google's products and services. They have unilaterally made the internet worse.

How do they know how much you scroll? Does this mean you get penalized in search results if you don't use Google Analytics?

I mean, penalizing sites in their search platform for not using their advertising platform would be blatantly anticompetitive behavior, right?

Surely Google is too afraid of our vigorous pro-competition regulatory agencies and would never do such a thing.

and the cherry on top is that they also own the browser. helps to thwart attempts to "scam" google analytics and track those poor a-holes that don't use it.

Hello from Google! I work for our search ranking team. Sadly, we can't control publishers who do things that we do not advise and do not recommend.

We have no guidance telling publishers to get rid of "old" content. That's not something we've said. I shared this week that it is not something we recommend: https://twitter.com/searchliaison/status/1689018769782476800

This also documents the many times over the years we've also pushed back on this myth: https://www.seroundtable.com/google-dont-delete-older-helpfu...

Are the employees at Google working on Search aware of how bad search results have become in the past year or two? Literally almost everyone I know, inside and outside of tech, has noticed a significant downgrade in quality from Google search results. And a lot of it is due to artificially inflated SEO techniques.

We've been diligently working to improve the results through things like our helpful content system, and that work is continuing. You can read about some of it in a recent post here (and it also describes the Perspectives feature that's live on mobile): https://blog.google/products/search/google-search-perspectiv...

It's great that you responded to the question. Is there a reason you didn't answer it, though?

"We've been diligently working to improve the results" was the response to the question of "Are the employees at Google working on Search aware of how bad search results have become in the past year or two?" I thought that was a clear response.

To be more explicit, yes, we're aware that there are complaints about the quality of search results. That's why we've been working in a variety of ways, as I indicated, to improve those.

We have continued to build our spam fighting systems, our core ranking systems, our systems to reward helpful content. We expanded our product reviews system to cover all types of reviews, as this explains: https://status.search.google.com/incidents/5XRfC46rorevFt8yN...

We regularly improve these systems, which we share about on this page: https://status.search.google.com/products/rGHU1u87FJnkP6W2Gw...

The work isn't stopping. You'll continue to see us revise these systems to address some of the concerns people have raised.

It was clearly a response, yes, but an answer is always better than a response. Thank you for answering!

I am of the opinion that it's just the internet becoming more spammy and unhelpful rather than Google search becoming bad. Every Tom and his mom seems to have a blog/website which they don't even write themselves. Most of the content on the internet is now for entertainment rather than purpose or knowledge. So I do wonder if it's just the state of the internet these days. As a layman, these days I just go directly to Wikipedia/Reddit/YouTube rather than searching on Google.

The Internet is becoming spammy and bad because of Google's rules for ranking. The fact that Google favors newer content and longer pages with filler text is why people are making the content lower quality.

> Are the employees at Google working on Search aware of how bad search results have become in the past year or two?

I would assume they didn't answer this because the answer is either "No" because echo chamber or "Yes" but they don't want to say that publicly.

Because politics, not solutions, drive big tech.

My strong impression is that in the last two years a couple of changes were rolled out to search that sent it straight into the sewer - search seemed to be tweaked to crassly, crudely put any product name above anything else in the results. But since then, it seems like quality has crept back up again. Simple product terms still get top billing, but more complicated searches aren't nerfed.

So it seems the search quality team exists but gets locked in the closet by advertising periodically.

I know you can't verify anything directly but maybe we could set a system of code for you to communicate what's really happening...

You are also talking to someone who is on the PR team. This term gets thrown around a lot, but in this case it is factually true: you are literally talking to a shill. I mean no disrespect to Danny, but you are not going to get an honest and straightforward answer out of him.

If you think I am exaggerating, try to prompt him to see if you can get him to acknowledge that Google's current systems incentivize SEO spam. See if he passes the Turing test.

Don't kick the messenger. It's already good that someone (allegedly) from a department related to the situation could give some input. No need to dump all your frustrations on them

You realize they have to combat an entire fleet of marketers and writers who are trying to leverage their algorithms?

Facebook doesn't have guidance telling content creators to publish conspiracy theories, but their policies are willfully optimized to promote it. Take responsibility for the results of your actions like an adult.

We don't have a policy or any guidance saying to remove old content. That said, we absolutely recognize a responsibility to help creators understand how to succeed and what not to do in terms of Google Search. That's why we publish lots of information about this (none of which says "old content is bad"). A good place to review the information we provide is our Search Essentials page: https://developers.google.com/search/docs/essentials

> Take responsibility for the results of your actions like an adult.

"your actions", give me a break. The parent commenter doesn't own Google, and you aren't forced to use the platform.

Are people on this site really convinced that an L3 Google engineer can flick the "Fix Google" switch on the search engine?

> Are people on this site really convinced that an L3 Google engineer can flick the "Fix Google" switch on the search engine?

No, it's just when someone speaks on behalf of the company with the terms "we," they are typically addressed with "you." That doesn't mean we think they're the CEO. Are you unfamiliar with this concept? I can send you an SEO guide on it.

You're missing the point; This guy has zero power over what Google does so publicly berating him is not going to accomplish anything.

And anyways, the sentence "Take responsibility for the results of your actions like an adult" actually does imply he has some personal responsibility here. It's not helpful to the discussion and it's rude.

If you choose to throw yourself on a public forum doing PR for a company doing dumb things and you also insult everyone's intelligence by lying to them, people are gonna be a little rude

"choose to throw yourself"

"a company doing dumb things"

"insult everyone's intelligence"

"lying to them"

That's a little hyperbolic, don't you think? Do you even hear yourself? I fully understand Google hate but directing it at one person who is literally just doing their job (and hasn't lied to anyone despite your allegation) is childish and counterproductive. Save that for Twitter.

Danny has been here since 2008. Your account was created in 2022.

And also, "people" aren't being rude, you are. Own your actions.

No, I'm not being hyperbolic. There is one reason for the SEO algorithm to reward longer articles, and that's ad revenue. To paint it as anything else is lying. And you opened up this conversation extremely rudely with "OMG are you so dumb you think he owns Google."

How long I've been here is irrelevant.

>you opened up this conversation extremely rudely

That wasn't me. Maybe pay closer attention?

>reward longer articles

The age of articles was being discussed, not article length. Maybe pay closer attention?

Actually... you know what, never mind.

The age of the articles was discussed in the original article, but when I was speaking to this engineer, I was talking about the length of articles which is the main criticism levied against Google SEO. I'm aware you didn't read any of it

I followed the thread just fine. You accused me of being rude (it was someone else) and also accused the other commenter of lying. Neither of which are true.

You did that, not me. It's you who seem to be having a problem with understanding the thread.

You said L3 so I was curious. I looked up the guy's LinkedIn [0] and honestly an L3 engineer would have a lot more context about Google's search. Danny, what do you even do?

[0] https://www.linkedin.com/in/dannysullivan/

Before Google, Danny Sullivan was a well respected search engine blogger/journalist. As far as I know, he isn't an engineer. There's no need to be rude.

I work for our search quality team, directly reporting to the head of that team, to help explain how search works to people outside Google and bring concerns and feedback back into team so we can look at ways to improve. I came to the position about six years ago after retiring from writing about search engines as a journalist, explaining how they work to people from 1996 onward.

So you're PR

Yes, I believe you are correct.

That’s impressive! Congrats.

Making statements that you wish publishers wouldn't do various things, doesn't change the actual incentives that the real-world ranking algorithms create for them.

I mean, saying that you should design pages for people rather than the search engine clearly hasn't shut down the SEO industry.

This is the usual if a hazard isn't labeled, it isn't a hazard fallacy.

It doesn't matter if your guidance discourages it, your SEO algorithm is encouraging it. What you call "helpful" in your post is what is financially helpful to Google, not what's helpful to me.

There's no denying Google encourages long rambling nonsense over direct information

No one has demonstrated that getting rid of "old" content somehow makes the rest of the site "fresh" and therefore rank better. What's likely the case is that some people have updated content to make it more useful -- more up-to-date -- and the content being more helpful might, in turn, perform better. That's a much different thing than "if you have a lot of old content, the entire site is somehow old." And if you read the CNET memo, you'll see there's a confusion of these points.

But there's the rub, you're not making content more helpful. You're making it longer and more useless so we have to scroll down more so Google can rake in more ads. The fact that you're calling it more "helpful" is insidious. That's why garbage SEO sites are king on the internet right now. It's the same thing you guys do with Youtube, where you decreased monetization for videos under a certain length. Now every content creator is encouraged to artificially inflate the length of their video for more ads.

You're financially rewarding people for hiding information.

This is our guidance about how people should see themselves to create helpful content to succeed in Google Search: https://developers.google.com/search/docs/fundamentals/creat...

That includes self-assessment questions, including this:

"Are you writing to a particular word count because you've heard or read that Google has a preferred word count? (No, we don't.)"

That's not telling people to write longer. Our systems are not designed to reward that. And we'll keep working to improve them.

Google is destroying the internet is a good way to put it. AD dollars are their only priority. I hope Google dies because of it.

What a fantasy. It does not show any sign of profit decrease. How would a company die with $279.8B revenue, steadily increasing yearly?

I think the theory of Google's death is that they are "killing the golden goose." The idea is that they are killing off all the independent websites on the internet. That is, all the sites besides Facebook/Instagram/Twitter/NetFlix/Reddit/etc. that people access directly (either through an app or a bookmark) and which (barring Reddit) block GoogleBot anyway.

These are all the sites (like CNET) that Google indexes which are the entire reason to use search. They are having their rankings steadily eroded by an ever-rising tide of SEO spam. If they start dying off en masse and if LLMs emerge as a viable alternative for looking up information, we may see Google Search die along with them.

As for why their revenues are still increasing? It's because all the SEO spam sites out there run Google Ads. This is how we close the loop on the "killing the golden goose" theory. Google uses legitimate sites to make their search engine a viable product and at the same time directs traffic away from those legitimate sites towards SEO spam to generate revenue. It's a transformation from symbiosis/mutualism to parasitism.

Edit: I forgot to mention the last, and darkest, part of the theory. Many of these SEO spam sites engage in large-scale piracy by scraping all their content off legitimate sites. By allowing their ads to run on these sites, Google is essentially acting as an accessory to large-scale, criminal, commercial copyright infringement.

Directing traffic away not to generate revenue but to generate revenue faster this quarter, in time for the report. They could make billions without liquefying the internet, but they would make billions slowly.

> Google uses legitimate sites to make their search engine a viable product and at the same time directs traffic away from those legitimate sites towards SEO spam to generate revenue.

[Disclosure: Google Search SWE; opinions and thoughts are my own and do not represent those of my employer]

Why do you assume malicious intent?

The balance between search ranking (Google) and search optimization (third-party sites) is an adversarial, dynamic game played between two sides with inverse incentives, taking place on an economic field (i.e. limited resources). There is no perfect solution; there’s only an evolutionary act-react cycle.

Do you think content spammers spend more or less resources (people, time, money) than Google’s revenue? So then the problem becomes how do you win a battle with orders of magnitude less people, time, and money? Leverage, i.e., engineering. You try your best and watch the scoreboard.

Some people think Google is doing a great job; some think we couldn’t be any worse. The truth probably lies across a spectrum in the middle. So it goes with a globally consumed product.

Also, note, Ads and Search operate completely independently. There are no signals going from Ads to Search, or vice versa, to inform rankings; Search can't even touch a lot of the Ads data, and Ads can't touch Search data. Which makes your theory misinformed.

> Why do you assume malicious intent?

Not GP, but to me, admittedly a complete non-expert on search, there is so much low-hanging fruit if search result quality were anywhere on Google's radar that it is really difficult not to assume malicious intent.

Some examples:

- why is Pinterest flooding the image results with absolute nonsense? How difficult would it be to derank a single domain that manages to totally game Google's algorithm?

- why is there no option for me to blacklist domains from the search results? Are there really some challenges here that couldn't be practically solved with a couple of minutes of thinking?

- does Google seriously claim they can't differentiate between Stack Overflow and the content-copying rip-off SEO spam sites?

> why there is no option for me to blacklist domains from the search result?

You might already be aware of this, but you can use uBlock Origin to filter google search results.

Click on the extension --> Settings --> My Filters. Paste at the bottom:


Every time I get misled into clicking onto an AI aggregator site, my filter list grows...
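The comment above doesn't include the filters themselves, so for illustration only (the blocked domains here, and the assumption that Google wraps each result in a `.g` container, are mine, not the parent's actual list), uBlock Origin static cosmetic filters for hiding search results might look like:

```
! Hide Google result entries linking to example spam domains
! (`:has()` is a uBlock Origin procedural cosmetic operator)
www.google.com##.g:has(a[href*="pinterest."])
www.google.com##.g:has(a[href*="example-seo-farm.com"])
```

Each line hides any result block that contains a link to the named domain; the list grows one line per offending site.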

The issues you pointed out might be due to a company policy of not manually manipulating search results and leaving it all to the algorithm. It can be argued that this leads them to improve their algorithm, although at this point I don't think any algorithm other than a good and big LLM/classifier-transformer can solve the ranking problem, and that is probably not economical or something. But OTOH they manually ban domains they deem to be not conformant to the views of the Party. (not CCP, 1984)

> Also, note, Ads and Search operate completely independent

That’s the mistake. They should be talking. Sites that engage in unethical SEO to game search rankings should be banned from Google’s ad platform. Why aren’t they? Because Google is profiting from the arrangement.

> Why do you assume malicious intent?

There doesn't have to be any malicious intent, just an endless chase for increased profit next quarter. SEO spam has more ads, thus generates more income for Google. Even if Ads and Search operate "completely independently", there must be a person in the corporate hierarchy who has control over both and could push the products to better synergize and make that KPI tick up.

Actually deranking sites which feature more than three Google Ads banners would improve search quality (mainly by making sites get to the point rather than padding a simple answer into an essay like an 8th grader at an exam) - but it would reduce Ads income so you cannot do it, no matter how independent you claim to be.

I think dismissing the relationship and impact adtech and search continue to have on web culture is an incredibly pointy-headed misstep. It's the sort of willful oversight that someone makes when their career relies on something being true.

Unless you have a clear view by leadership of what they desire the web should be and are willing to disclose it in detail, then there's not much to add by saying you work in Search.

When I enter a search query, that goes into Ads so that half the page can be relevant Ads instead of search results. That's a signal.

I've also kinda been wondering if Google has been ruining its search to bolster YouTube content.

> What a fantasy. It does not show any sign of profit decrease. How would a company die with $279.8B revenue, steadily increasing yearly?

At one time both buggy whips and Philco radios had hockey stick growth charts, too.

You must not be old enough to remember when people thought MySpace would always drive the internet.

> You must not be old enough to remember when people thought MySpace would always drive the internet.

You know there's a difference between "people thought" and dollars.

Some people think the earth is flat. Opinions can change very quickly - like 5 years ago, people thought Elon Musk was the hero of the internet.

Here are the stats on MySpace revenue: it generated $800 million in revenue during the 2008 fiscal year.

And MySpace generates about 10% of that now.

Google could be generating $27 billion in revenue in 2038 and be considered a massive failure compared to what it is now.

I fail to see the point you are trying to get at.

Their search results are declining rapidly in quality. "<SEARCH QUERY> reddit" is one of their most common searches. Their results are filled with SEO spam and bots now.

At some point a competitor will emerge. The tech crowd will notice it and begin to use it. Then it will go widespread.

I suspect that the revenue increases have more to do with the addition of new users in developing markets rather than actual value added. Once all potential users have been reached, Google will have to actually improve their product.

No such suspicion is necessary. Google's revenue from the United States has only increased as a share of its total revenue over the past decade. https://abc.xyz/assets/4c/c7/d619b323f5bba689be986d716a61/34...

I'm reminded of the 00's era joke that Microsoft could burn billions of dollars, pivot to becoming a vacuum cleaner manufacturer, and finally make something that doesn't suck.

I don't think Google dying would be good (lots of things would have to migrate infra suddenly), but the adtech being split off into something else would certainly be a welcome turn of events, IMO. I'm tired of seeing promising ideas killed because they only made 7-figure numbers in a spreadsheet where it'd have been viable on its own somewhere it wasn't a rounding error.

Before someone suggests a new search engine where the ranking algorithm is replaced with AI, I would like to propose a return to human-curated directories. Yahoo had one, and for a while, so did Google. It was pre-social-media and pre-wiki, so none of these directories were optimized to take advantage of crowdsourcing. Perhaps it's time to try again?


> Before someone suggests a new search engine where the ranking algorithm is replaced with AI, I would like to propose a return to human-curated directories. Yahoo had one, and for a while, so did Google. It was pre-social-media and pre-wiki, so none of these directories were optimized to take advantage of crowdsourcing.

False. Google Directory (and many other major-name, mostly now defunct, web directories) were powered by data from DMOZ, which was crowdsourced (and kind of still is, through Curlie [0], though while some parts of the website show updates as recently as today, enough fairly core links are dead or without content that it's pretty obviously not a thriving operation). Also, it was not pre-wiki: WikiWikiWeb was created in 1995, DMOZ in 1998. It was pre-Wikipedia, but Wikipedia wasn't the first wiki.

[0] https://curlie.org/

Interestingly, a static snapshot of DMOZ is still out there: https://dmoztools.net/ | https://web.archive.org/web/20180126194656/http://dmoztools....

Actually, several static snapshots exist (a benefit of open licensing) despite the fact that attempts to fork and continue have been not so successful. In addition to the one upthread there are also:



One issue there was that human-curated directories are everywhere. Hacker News is one. Reddit is another. And during the Yahoo era, directories were made everywhere and all over the place. Which one is authoritative? There are too many of them out there.

That said, in NL a lot of people's home pages were for a long time set to startpagina.nl, which was just that: a cool directory of websites that you could submit to. It seems to still exist, too.

I don't think we need any "AI" in the modern sense of that word. It would be an improvement to bring google back to its ~2010 status.

Not sure if the Kagi folks are willing to share, but I get the impression that PageRank, tf-idf, and a few heuristics on top would still get you pretty far. Add some moderation (known-bad sites that e.g. repost Stack Overflow content) and you're already ahead of what Google gives me.
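For readers unfamiliar with the term, tf-idf is simple enough to sketch in a few lines. This is a toy version of the classic scheme (my own illustration, not Kagi's or Google's code); a real engine would layer link analysis like PageRank and many heuristics on top:

```python
import math
from collections import Counter


def tf_idf_rank(query: str, docs: list[str]) -> list[int]:
    """Return document indices ranked by a basic tf-idf score for the query."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)

    # Document frequency: in how many docs does each term appear?
    df = Counter()
    for tokens in tokenized:
        for term in set(tokens):
            df[term] += 1
    # Inverse document frequency: rare terms are worth more.
    idf = {term: math.log(n / count) for term, count in df.items()}

    scored = []
    for i, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        # Sum tf * idf over the query terms (unknown terms contribute 0).
        score = sum(
            (counts[q] / len(tokens)) * idf.get(q, 0.0)
            for q in query.lower().split()
        )
        scored.append((score, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

With three toy documents and the query "parking ticket", the document actually about parking tickets comes out on top; the hard part, as the comment says, is the moderation and spam-fighting around a scorer like this, not the scorer itself.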

I feel like they presume I'm a gullible person they need to protect who is just on the Internet for shopping and watching entertainment.

Wasn't the point of them tracking us so much to customize and tailor our results? Why have they normalized everything to some focus-group persona of Joe Six-pack?


Let's try an experiment

Type in "Chicago ticket" which has at least 4 interpretations, a ticket to travel to Chicago, a citation received in Chicago, A ticket to see the musical Chicago and a ticket to see the rock band Chicago.

For me I get the Rock band, citation, baseball and mass transit ticket in that order.

I'm in Los Angeles, have never been to Chicago, don't watch sports, and don't listen to the rock band. Google should know this with my location, search and YouTube history but it apparently doesn't care. What do you get?

Or it knows way too much. An alternative explanation could be:

It knows you're in LA and did not look up "Chicago flight", so you probably aren't looking for flights there.

Chicago musical isn't playing in LA so probably not the right kind.

Probably why most people get parking ticket listed higher. It would be interesting to see the results in a city where the band, team or musical has an event soon.

Google tracks you to "customize search results" and I even have a 100% Google'd phone (Pixel), but when I'm searching for restaurants it still shows me stuff from Portland, Oregon instead of Portland, Maine. This despite literally having my "Home" in my Google account marked as Portland, Maine.

Are you ignoring the ads?

My ads are:

- Flights
- Chicago the Musical
- Flights
- More Flights

My search results are:

- Citation
- Citation payment plan
- News report on lawsuit regarding citations in Chicago
- Baseball

I also live very far from Chicago, and the only time I was there was for a connecting flight some time in the '90s.

I get four top results for paying a parking ticket in Chicago, a city I’ve never been to.

Same here. Never been to Chicago, live in Germany.


Going deep into personalization on Google.com (answering queries using your signed-in data across other Google properties) feels like high risk, low reward. In a post-GDPR, right-to-be-forgotten environment, they know they have targets on their backs. Is super-deep personalization really worth getting slapped with megafines?

Where you'll see integrations like this used to be Assistant, but is now Bard. Both have lawyer-boggling EULAs and a brand they can sacrifice if need be.

Aren't they both doing the economically incentivized thing, though? Are you saying maybe some things should be beyond economic incentives?

Yes, some things should be beyond economic incentives. Destroying the historical record, for instance. We have plenty of precedent around that, now that we realised it's bad.

This is interesting, as I am actually doing the same thing with a site I have: I noticed my crawl budget has shrunk, especially this year, and fewer new articles are being indexed.

I suspect this is a long-term play for Google to phase out this search and replace with Bard. Think about it all these articles are doing now is writing a verbose version of what Bard gives you directly unless it’s new human content.

Google has in essence stolen all their information by scraping and storing in a database for its LLM and is offering its knowledge of this directly to users, so in a way, this is akin to Amazon selling its own private label products.

An article about reduced quality was pretty popular on HN a few years ago: that Google results look like ads. But I believe we have hit a new low recently. Perhaps that is true for the overall quality of publications on the net. The number of approved news sites without significant content, and of outright click farms, is immense. Even for topics that should yield results. A news site filter would already help a lot, but even then the search seems to only react to buzzwords. Sometimes it even reacts to terms you didn't search at all that are often associated with said buzzwords.

They could just noindex them.

Google still needs to crawl them to see the noindex tag. And when Google is crawling a lot of pages on your site, it'll be slow.

This is only an issue if you have millions of pages

Also, it will slow down the crawl frequency if you noindex it

So it's a non problem

>This is only an issue if you have millions of pages

You know, like a news website that's been on the internet since the 90s

> Also, it will slow down the crawl frequency if you noindex it

Eventually Google stops crawling noindexed pages.
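For reference, the noindex signal being discussed can be delivered two ways (per Google's robots documentation): in the page itself, which still requires the crawler to fetch and parse the HTML, or as an HTTP response header, which a server can send without the crawler reading the body:

```html
<!-- Page-level: goes in <head>; Google must still fetch the page to see it -->
<meta name="robots" content="noindex">

<!-- Server-level alternative: send an HTTP response header instead, e.g.
       X-Robots-Tag: noindex
     which avoids parsing the page body at all -->
```

Either way, as noted above, the crawler has to request the URL at least until it learns the page is noindexed, which is why crawl budget is only a concern at very large page counts.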

Even if we pretend for a moment that your statement that Google's search is "shitty" is universally accepted as truth, you can't blame this one on Google.

People have been committing horrifying atrocities in the name of SEO for years. I've seen it firsthand. And it spectacularly backfired each time.

This could very well be yet another one of those cases.

Can Google tell the difference between old relevant information and old irrelevant (or outdated) information? I'm not seeing any evidence of that. A search engine is not a subject matter expert of everything on the Internet, and it shouldn't be.

In all fairness, there is some old information I would love to see disappear eventually. Nothing is quite as frustrating as having a question and finding that all the tutorials are for a version of the software that is 15 years old and behaves completely differently than the new one.

To be fair, the old internet wasn't killed. It just passed away. The curious voices that were prevalent back then are buried now.

I don’t see why google gets the blame when spam is what forces google to search how it searches.

> Can there be any doubt

This is such terrible, low-quality, manipulative content. It does not belong on HN.

I stopped using search, I have switched 90% over to chatgpt.
