Following on from one of the comments here, namely the idea that "value is in the eye of the beholder", I'd like to raise our own plight. I run a number of aggregator sites - the largest and oldest of them being celebrifi.com, which was a PageRank 5 until Google de-indexed us in December (along with some of our other sites, but interestingly not all).
A little background - the purpose of the sites is to aggregate, organize, rank and add context to what's happening in the news, with each site focusing on a specific vertical. Think Techmeme, but with more context.
I'll be the first to admit, there is no original content, but I strongly believe that we "add value" by figuring out what exactly is going on in any given story or blog post.
We add value to publishers, by always linking to the original source (indeed, many publishers directly request that we add their feeds to the sources we track), we respect copyright by only displaying a short snippet of the original text and only displaying thumbnail images and we add value to users by giving them easy access to a lot more content on the same topic/story, all in the same place.
Google's Quality Guidelines clearly state that duplicate content is penalized, and that is totally fine with us, but is it right to totally de-index a site for duplicate content? I wouldn't even want to rank above the original source for any given piece of content, as I respect the hard work that writers and publishers put into creating quality content, but aggregators who add value have a role to play in the content ecosystem. Digg, for example, uses the "wisdom of the crowd" to aggregate and rank content - hence adds value. Topix takes a local approach to aggregating content, and uses comments to rank content - hence adds value. We take a verticalized approach to aggregating and ranking content, and hence I believe that we add value.
As mentioned above, we got de-indexed in December, and despite going through the appeals process, fixing a few things on our end to do with sitemaps, and clearing out some older "low quality" sources that we were tracking, we received no clarity into what our crime was.
Matt, I'd like to raise this issue with you - both as it relates to us, but also as a general industry question - are all aggregators going to be de-indexed? And if not, which aggregators are and which aren't? What are the criteria, and who decides? If it's algorithmic, then I am very curious to know what on our sites triggered the de-indexing. And even more curious to know why some of our sites got de-indexed, and some didn't.
I have great respect for Google's efforts to clean up spam and low-quality content - and would always expect to see original content ranking higher than aggregated content. But to completely de-index an established aggregator site and strip it of its PageRank seems very draconian.
I would love to hear your/Google's position, and look forward to some more clarity on both our situation, and the future of news/content aggregators.
Respectfully yours, Niko
Presumably, aggregator sites by themselves also help in content discovery. I find a lot of content through Hacker News. But they should do so by being good enough to be a destination in themselves. An aggregator that needs to be found by a search engine isn't doing users any favors.
Google News provides snippets of content, and helps people discover the news by providing direct links to the original source of news. For Google to deindex a site like celebrifi while running a competing product (Google News) smells a bit of monopolistic behavior. I suspect it's unintentional, but Google is going to have to walk a very, very fine line as you start deindexing certain sites.
Google News itself falls into the second category - an aggregator that stands on its own, as a destination. Much like Hacker News. I personally don't use it, but the people that do go because they find it lets them discover a bunch of content that they otherwise wouldn't know about.
Google News does stand alone as an aggregator, but you have to admit that it is promoted heavily by Google search. If GOOG keeps doing stuff like that, I suspect there are going to be a lot more companies that start to take umbrage, and start challenging this behavior in court claiming that it's anti-competitive behavior.
I go to Google because it provides value to me. Aggregator / republishing sites trick you into visiting them with the lure of what looks like actual, original content. This makes the internet less useful because you end up reading the same content over and over.
Of course original content has priority; the original poster explicitly agreed with that, and there are some sites that produce no value at all. But saying that every site that purely uses other sites' content is ruining the internet is obviously very wrong, considering this is Google we are talking about.
An example would be my wife's company, which is a large non-profit science center that helps educate kids and adults about science and has a wonderful website that they pay a lot of money to maintain and support. However, Google doesn't see any problem with the fact that they've scraped my wife's company website to get their contact data, hours, and a few other points that they then put on their own page (Google Places - whatever that is), along with their own Google ads, as well as links to my wife's company's competitors, all for the purpose of keeping visitors on their site so that they can get the ad dollars while adding no additional value of their own.
To me this is the biggest scam in the world and makes them look like giant lying crooks on the web because they steal content from others, while banning competitors that do the same thing and all the while telling people that they are doing it for the good of the web.
Yeah right?!? Google is doing it for the good of their own pocketbooks. Here's a thought: why doesn't Google start paying sites, or splitting profits with the sites that they scrape data from and then make money off of? They are stealing visitors, traffic, and ad dollars while providing no additional content or benefit to the reader. I believe Google is getting really close to anti-trust issues and should be looked at very closely by the US Gov. They broke Ma Bell up in the 80's because they got too big and I see something like this coming down the line for Google.
People are starting to get concerned that their hands are dabbling in many areas that cross-support each other, putting a lot of control and power into one company's hands. Google is not the Internet, they make no original content of their own, they regularly police the web and try to tell us what is relevant or not, similar to a dictator in a closed society, and they make money off all of this so of course it's in their best interest to keep it going and make us think we need them to show us the web. Truth be told, I can find anything I'm looking for on any engine, and often I find good and bad results on all of them, so to me Google is just another search engine and if they keep going down their slippery slope they will wind up like Alta Vista.
Here's a suggestion. Drop your bunk PR and link structured algo and develop a new tool that doesn't support spamming the web with paid links. Google created this monster when they put so much emphasis on links and people all over the web buy and sell links to push their rankings up. This model is so old and out-dated that it doesn't make sense. Now Google is putting band-aids all over their old bunk algo trying to keep their out-dated ways going instead of putting in the time, dollars, and effort on making a new model that will actually produce good natural results instead of manipulated results that have to be regularly policed and manipulated to keep it going. The fact that Google has to regularly update their algo shows that they know there are problems with it, but they just keep sticking more patches and band-aids on it instead of building a new model that will not support the spam techniques on the web.
All I can say is CONGRATS on putting another band-aid on the OLD ALGO. Maybe this one will only affect a small number of legit sites and, heck, those are casualties of the war on spam, right?
Good luck to all the legit sites that are trying to jump through Google's hoops! Just remember, if you don't make it through today you will be gone tomorrow.
Also, we do actually produce some original content, we have a small staff of writers who create "featured posts" every day - sold ourselves a little short there.
You guys have it, but it's buried under a pile of spam. Lose the spam and you'll recover your integrity.
However, the debate is whether or not these aggregator sites should show up in search results instead of the original source.
I use Techmeme all day long, but don't expect it to ever show up in a search result when I am looking for a specific topic. I want the original site. I can always go to Techmeme directly to get the "value-added" context on my own.
Aggregator sites are great destinations. They shouldn't be in search results (at least, not above the original sites).
Are you talking to the parent post or Matt?
Aggregators, it's time to wake up. Google is not your friend, Google is your competitor.
We can agree that Bing and Google are competitors, right?
And we can agree that Bing labels themselves a "Decision Engine" and not a "Search Engine" right?
The line between a Search->Decision->Topical Aggregation Engine is so blurry I'm not sure it exists anymore. Google's stated mission is to "organize the world's information." - they are going to add social and they are going to use your preferences to figure out which stories/pages are interesting to you.
Looking at your site, it's easy to see why Google bumped you down:
It looks like your service is nothing like Techmeme, adding ad-related pages prior to accessing significant content. And the context-aware parts seem to lack any form of editorial choice, such as the Google-image Vodafone photos. In short, you have no insight, let alone opinion, and thus add nothing of value to the information.
Now aware of your site, I don't find any benefit and would ignore it or add it to my block list in my custom Google search.
1. We don't expect our SEO to outrank the original source, I was very clear about that in my original post. SEO is a tool to be used among many other tools to ensure that content is properly "classified", nothing more than that. When a search engine visits, you want that search engine to immediately know what any given page is about, and we do that very well.
2. The Vodafone image concern I don't understand - we identified Vodafone as one of the main entities in the story (Vodafone shut off service to Egypt), and that is why the image is showing up there. If you visit the Renesys blog (the original source) you will see that Vodafone is mentioned there. The Vodafone image is not an ad, it is there to identify what the story is mainly about - context.
3. Regarding adding insight, opinion or value - we believe that the role of algorithmic news aggregation is not to have an opinion, but instead to uncover news that you may not otherwise have been aware of, give you context in the form of links to the main entities in that story, or other stories on the same topic. We believe we do that very accurately, and are always working on making it better and smarter.
4. Regarding the comparison to Techmeme - everyone has their favorite aggregator, and I am a big fan of Techmeme as well. I would, however, point out that Techmeme also displays ads in order to make a living, and they also have "content pages" that you can find through Google search. Indeed, virtually all aggregators run ads against the aggregated content they display on their sites, and all aggregators have "content pages" above and beyond a homepage.
The issue I am trying to uncover is not whether you like our sites or not, but rather what exactly we have done wrong, when compared with other aggregators, that caused us to not just have our pages bumped down in ranking, but to be totally de-indexed and have our PageRank stripped away (from a respectable 5 on Celebrifi). That, and to understand better what the future of content or news aggregation might be - should we expect all aggregators to become de-indexed? Are there guidelines that should be followed, or changes that should be made, to not fall afoul of Google? Are all aggregators on a level playing field, or will the lesser-known ones be shut off while the chosen few with established brand names survive?
These are valid concerns, not just for us but for many others in this space, and we are more than prepared to put in whatever changes might be needed to get back into Google's good books.
The big aggregators will win and rise to the top. The knock-offs will be de-indexed. There's only so much room for news and gossip sites that don't add value. Don't take it personally, just move on.
On the other hand, if I wanted to see an aggregator site like Huffingtonpost or Digg, then I could Google for a site like that, or even Google for a keyword about a discussion that happens on the aggregator site. That seems legitimate to me, but showing up when I'm googling for something related to the original source is obviously not.
It's unfair to be in the results page with somebody else's content when the people searching for that content are actually targeting the original source, not yours.
That said, I'd be careful of analogizing aggregators to wire services. The AP actually employs their own reporters.
IMO, you are kind of hiding the original source and do make it hard to find. Yes, you show the source and your call to action is "Read More", but your site does not give the appearance of a news aggregator.