
Since Matt is responding here, I figured this is worth a shot, no harm in asking. Matt, would love a response from you if you get a chance, since the Webmaster Tools appeals process gives no insight whatsoever into our situation.

Following on from one of the comments here, namely the idea that "value is in the eye of the beholder", I'd like to raise our own plight. I run a number of aggregator sites - the largest and oldest of them being celebrifi.com, which was a PageRank 5 until Google de-indexed us in December (along with some of our other sites, but interestingly not all).

A little background - the purpose of the sites is to aggregate, organize, rank and add context to what's happening in the news, with each site focusing on a specific vertical. Think Techmeme, but with more context.

I'll be the first to admit, there is no original content, but I strongly believe that we "add value" by figuring out what exactly is going on in any given story or blog post.

We add value to publishers by always linking to the original source (indeed, many publishers directly request that we add their feeds to the sources we track); we respect copyright by displaying only a short snippet of the original text and only thumbnail images; and we add value to users by giving them easy access to a lot more content on the same topic/story, all in the same place.

Google's Quality Guidelines clearly state that duplicate content is penalized, and that is totally fine with us, but is it right to totally de-index a site for duplicate content? I wouldn't even want to rank above the original source for any given piece of content, as I respect the hard work that writers and publishers put into creating quality content, but aggregators who add value have a role to play in the content ecosystem. Digg, for example, uses the "wisdom of the crowd" to aggregate and rank content - hence adds value. Topix takes a local approach to aggregating content, and uses comments to rank content - hence adds value. We take a verticalized approach to aggregating and ranking content, and hence I believe that we add value.

As mentioned above, we got de-indexed in December, and despite going through the appeals process, fixing a few things on our end to do with sitemaps, and clearing out some older "low quality" sources that we were tracking, we received no clarity into what our crime was.

Matt, I'd like to raise this issue with you - both as it relates to us, but also as a general industry question - are all aggregators going to be de-indexed? And if not, which aggregators are and which aren't? What are the criteria, and who decides? If it's algorithmic, then I am very curious to know what on our sites triggered the de-indexing. And even more curious to know why some of our sites got de-indexed, and some didn't.

I have great respect for Google's efforts to clean up spam and low-quality content - and would always expect to see original content ranking higher than aggregated content. But to completely de-index an established aggregator site and strip it of its PageRank seems very draconian.

I would love to hear your/Google's position, and look forward to some more clarity on both our situation, and the future of news/content aggregators.

Respectfully yours, Niko

Respectfully - Sites like yours ruin the internet. You admit it yourself: you produce no content of your own. I do NOT want to see results from pages of that ilk when I google for things. Google made the right choice to de-index it.

Google (in the context of search) produce no content of their own, are they ruining the internet?

The goal from the user's perspective is to get to the content they want as quickly as possible. A search engine helps in that, as presumably you don't know where the content you want is if you're visiting a search engine. A search engine that links to an aggregator site doesn't - the search engine should just send you to the original content directly.

Presumably, aggregator sites by themselves also help in content discovery. I find a lot of content through Hacker News. But they should do so by being good enough to be a destination in themselves. An aggregator that needs to be found by a search engine isn't doing users any favors.

I understand what you're saying, but doesn't Google News do the exact same thing?

Google News provides snippets of content, and helps people discover the news by providing direct links to the original source of news. For Google to deindex a site like celebrifi while running a competing product (Google News) smells a bit of monopolistic behavior. I suspect it's unintentional, but Google is going to have to walk a very very fine line as you start deindexing certain sites.

When News results show up on the search result page, they link directly to the story, they don't link to the Google News landing page or category where you'd then have to click on a link.

Google News itself falls into the second category - an aggregator that stands on its own, as a destination. Much like Hacker News. I personally don't use it, but the people that do go because they find it lets them discover a bunch of content that they otherwise wouldn't know about.

I know that you're saying that Google News is a dedicated site, and separate from the search results. But Google links to its own aggregation service from its results on a regular basis, and at the very top of almost every results page in Google there is a link to the Google News version of the search.

Google News does stand alone as an aggregator, but you have to admit that it is promoted heavily by Google search. If GOOG keeps doing stuff like that, I suspect there are going to be a lot more companies that start to take umbrage, and start challenging this behavior in court claiming that it's anti-competitive behavior.

No, because Google search doesn't intentionally pollute other sites with aggregated content. I don't see Google search pages in Bing's results, outranking original content. I only see Google results if I go to google.com and specifically request them. That's the difference.

I go to Google when I want to find something. I click on a link when I think I have found what I'm looking for. When I click on that link, I want to go straight to my destination, not get lost in a maze of sites that do nothing but link to each other and dilute the content.

I go to Google because it provides value to me. Aggregator / republishing sites trick you into visiting them with the lure of what looks like actual, original content. This makes the internet less useful because you end up reading the same content over and over.

It depends what you think the goal is. I expect Google's goal with searching is to get you to the source data. What is the point in going to an aggregator for a query? Isn't that what google is in a nutshell? I'm not advancing a position, just genuinely posing the question. What role should aggregated results play in a search query if any at all?

The point of using an aggregator is that they supply MORE information surrounding your topic than the original source might. The original article or video might be great, but have no surrounding contextually relevant information that might also be useful or helpful. I personally love using sites that aggregate information for me about my hobbies, whether it be games or sports. I find out more about what is going on because that site has done the "value added" work of finding and organizing it for me. It saves me time, because there are millions of sites out there that might have great original content, but I will never find them for many reasons, not the least of which is that their SEO might stink. So thank you to aggregators whose SEO chops are good!

I have often found myself landing on an aggregator for a particular query that has been a helpful resource for related problems.

Of course original content has priority; the original poster explicitly agreed with that, and there are some sites that produce no value at all. But saying that every site that purely uses other sites' content is ruining the internet is obviously very wrong, considering this is Google we are talking about.

From looking at some Google Places pages, I would say they are the king of scraping content and definitely help contribute to the ruining of the Internet.

An example would be my wife's company, a large non-profit science center that helps educate kids and adults about science and has a wonderful website that they pay a lot of money to maintain and support. However, Google doesn't see any problem with having scraped my wife's company's website to get their contact data, hours, and a few other points that they then put on their own page (Google Places - whatever that is), along with their own Google ads, as well as links to my wife's company's competitors, all for the purpose of keeping visitors on their site so that they can get the ad dollars while adding no additional value of their own.

To me this is the biggest scam in the world and makes them look like giant lying crooks on the web, because they steal content from others while banning competitors that do the same thing, all the while telling people that they are doing it for the good of the web.

Yeah right?!? Google is doing it for the good of your pocket books. Here's a thought: why doesn't Google start paying sites, or splitting profits with the sites that they scrape data from and then make money off of? They are stealing visitors, traffic, and ad dollars while providing no additional content or benefit to the reader. I believe Google is getting really close to anti-trust issues and should be looked at very closely by the US Gov. They broke Ma Bell up in the '80s because they got too big and I see something like this coming down the line for Google.

People are starting to get concerned that their hands are dabbling in many areas that cross-support each other, putting a lot of control and power into one company's hands. Google is not the Internet, they make no original content of their own, they regularly police the web and try to tell us what is relevant or not, similar to a dictator in a closed society, and they make money off all of this, so of course it's in their best interest to keep it going and make us think we need them to show us the web. Truth be told, I can find anything I'm looking for on any engine, and often I find good and bad results on all of them, so to me Google is just another search engine, and if they keep going down their slippery slope they will wind up like Alta Vista.

Here's a suggestion. Drop your bunk PR and link-structured algo and develop a new tool that doesn't support spamming the web with paid links. Google created this monster when they put so much emphasis on links, and people all over the web buy and sell links to push their rankings up. This model is so old and out-dated that it doesn't make sense. Now Google is putting band-aids all over their old bunk algo trying to keep their out-dated ways going instead of putting in the time, dollars, and effort to make a new model that will actually produce good natural results instead of manipulated results that have to be regularly policed and manipulated to keep it going. The fact that Google has to regularly update their algo shows that they know there are problems with it, but they just keep sticking more patches and band-aids on it instead of building a new model that will not support the spam techniques on the web.

All I can say is CONGRATS on putting another band-aid on the OLD ALGO. Maybe this one will only affect a small number of legit sites and, heck, those are casualties of the war on spam, right?

Good luck to all the legit sites that are trying to jump through Google's hoops! Just remember, if you don't make it through today you will be gone tomorrow.

Like I said, and others have said as well, value is in the eye of the beholder - if you follow your own logic, then every single aggregator, including the likes of HuffPo (something like 60-70% of their content is little "curated snippets" from around the web), is ruining the internet. There is very little "truly original" content out there; virtually every single blog rehashes the content of others. Is Techmeme of no value? Is Digg of no value? Is Topix of no value?

Also, we do actually produce some original content, we have a small staff of writers who create "featured posts" every day - sold ourselves a little short there.

Those services add value to the content - comments, votes, opinion, and/or journalistic research.

You guys have it, but it's buried under a pile of spam. Lose the spam and you'll recover your integrity.

The original content you do have just seems to be short rewritten versions of other online sources and any comments they have appear to just be random Tweets that use one of the same keywords.

I'm curious, what is your opinion of TechMeme?

Techmeme, like Google, is a great and valuable resource.

However, the debate is whether or not these aggregator sites should show up in search results instead of the original source.

I use Techmeme all day long, but don't expect it to ever show up in a search result when I am looking for a specific topic. I want the original site. I can always go to Techmeme directly to get the "value-added" context on my own.

Aggregator sites are great destinations. They shouldn't be in search results (at least, not above the original sites).

I disagree. Sites that aggregate information make it easier to peruse what you are specifically interested in without poring through an RSS feed or visiting site after site. And if only a portion of the text is presented, with a link to the original post, you can always visit the originating site to get the full article.

> Sites like yours ruin the internet. You admit it yourself: you produce no content of your own

Are you talking to the parent post or Matt?


Though sarcastic, I think this succinctly summarizes the issue.

Aggregators, it's time to wake up. Google is not your friend, Google is your competitor.

We can agree that Bing and Google are competitors, right? And we can agree that Bing labels themselves a "Decision Engine" and not a "Search Engine" right?

The line between a Search->Decision->Topical Aggregation Engine is so blurry I'm not sure it exists anymore. Google's stated mission is to "organize the world's information." - they are going to add social and they are going to use your preferences to figure out which stories/pages are interesting to you.

As an aggregation site, your SEO shouldn't expect to outrank the content you're sourcing - that's unethical. At best, your SEO should focus on the service you provide.

Looking at your site, it's easy to see why Google bumped you down:



It looks like your service is nothing like Techmeme, adding ad-related pages prior to accessing significant content. And the context-aware parts seem to lack any form of editorial choice, such as the Google-image Vodafone photos. In short, you have no insight, let alone opinion, and thus add nothing of value to the information.

Now aware of your site, I don't find any benefit and would ignore it or add it to my block list in my custom Google search.

Let me address these points one by one:

1. We don't expect our SEO to outrank the original source, I was very clear about that in my original post. SEO is a tool to be used among many other tools to ensure that content is properly "classified", nothing more than that. When a search engine visits, you want that search engine to immediately know what any given page is about, and we do that very well.

2. The Vodafone image concern I don't understand - we identified Vodafone as one of the main entities in the story (Vodafone shut off service to Egypt), and that is why the image is showing up there. If you visit the Renesys blog (the original source) you will see that Vodafone is mentioned there. The Vodafone image is not an ad, it is there to identify what the story is mainly about - context.

3. Regarding adding insight, opinion or value - we believe that the role of algorithmic news aggregation is not to have an opinion, but instead to uncover news that you may not otherwise have been aware of, give you context in the form of links to the main entities in that story, or other stories on the same topic. We believe we do that very accurately, and are always working on making it better and smarter.

4. Regarding the comparison to Techmeme - everyone has their favorite aggregator, and I am a big fan of Techmeme as well. I would, however, point out that Techmeme also displays ads in order to make a living, and they also have "content pages" that you can find through Google search. Indeed, virtually all aggregators run ads against the aggregated content they display on their sites, and all aggregators have "content pages" above and beyond a homepage.

The issue I am trying to uncover is not whether you like our sites or not, but rather what exactly we have done wrong, when compared with other aggregators, that caused us to not just have our pages bumped down in ranking, but to be totally de-indexed and have our PageRank stripped away (from a respectable 5 on Celebrifi). That, and to understand better what the future of content or news aggregation might be - should we expect all aggregators to become de-indexed? Are there guidelines that should be followed, or changes that should be made, to not fall afoul of Google? Are all aggregators on a level playing field, or will the lesser-known ones be shut off while the chosen few with established brand names survive?

These are valid concerns, not just for us but for many others in this space, and we are more than prepared to put in whatever changes might be needed to get back into Google's good books.

Radley is spot on. Instead of trying to counter his points you need to read them over and over. If I want to find out about Charlie Sheen's cocaine and pussy habit I will go to TMZ. Your 'magic algorithms' are NOT what decides what's hot and what's not. It's money through ads on duplicate content. http://informifi.com/company.php?page=about

The big aggregators will win and rise to the top. The knock-offs will be de-indexed. There's only so much room for news and gossip sites that don't add value. Don't take it personally, just move on.

Perhaps you're looking at it the wrong way. You haven't been de-indexed, Google simply found other sites that were more relevant.

It's pretty clear to me that if I were to Google for something, I'd want to see the original source of the content over a link to a page on an aggregator site.

On the other hand, if I wanted to see an aggregator site like Huffingtonpost or Digg, then I could Google for a site like that, or even Google for a keyword about a discussion that happens on the aggregator site. That seems legitimate to me, but showing up when I'm googling for something related to the original source is obviously not.

It's because when I'm searching for something I want a link to the original source. I don't want a link to your page, really. The results page would become polluted if it contained links to all aggregators such as yours.

It's unfair to be in the results page with somebody else's content when the people searching for that content are actually targeting the original source, not yours.

I have to second Niko's position here. Our main business model is "adding value through aggregation". We provide users in many markets aggregated content from a variety of publicly available sources on specific topics. While much of the content on our sites is technically "duplicate", so is much of the content on the NY Times. They syndicate content from the AP, as do many news sites. We do the same, and always take care to credit the authors and provide valuable links back to the original content source. Does this algorithmic change affect sites like ours?

So, which SEO forum was a link to this thread posted to?

Can you elaborate on your use of the word "algorithmic change?" I'm not sure you're using it in the sense that I'm used to and I'm interested in your assertion that simple aggregation adds value.

That said, I'd be careful of analogizing aggregators to wire services. The AP actually employs their own reporters.

I see you lost a half a million visitors in traffic after the de-ranking… Ouch!

IMO, you are kind of hiding the original source and do make it hard to find. Yes, you show the source and your call to action is "Read More", but your site does not give the appearance of a news aggregator.
