Google Forecloses on Content Farms with "Farmer" Algorithm Update (searchengineland.com)
221 points by InfinityX0 on Feb 25, 2011 | 103 comments

Here's a specific query moultano: "ikea malm bed" before: http://www.google.com/search?num=100&hl=en&safe=off&... after: http://www.google.com/search?num=100&hl=en&safe=off&...

The ProductWiki page moves from rank 5 to 9, beaten out by eHow and Scribd.

The ProductWiki page contains more than a dozen reviews and a slew of comments/discussions.

Seems like we got hit as a "low quality" site while scribd and eHow didn't. Amazing.

Ok, been looking at some other searches and finding some interesting things, at least in the shopping vertical.

I present another example "HP 2310m" Before: http://www.google.com/webhp?num=100&hl=en&safe=off&#... After: http://www.google.com/webhp?num=100&hl=en&safe=off&#...

Major differences are that other than the CNET review the new listings are completely dominated by retailers. All the way to page 3. Compare that with the old listings where you see a mix of retailers and other product-centric sites (review sites and forums specifically).

The other major difference is the presence of a Google Products "One Box" in the new listing. I saw this in other examples as well where the Google Products section shows up a lot more often. And product centric sites are being dinged.

I'm seeing that we (ProductWiki) and sites in the same space (Retrevo and TestFreaks) have been dinged. The difference between us and those other two sites is that we have unique original content from our community. Traditionally this has served us well by usually getting us better rankings (we follow the "make a good site and Google will reward you" philosophy), but it seems like we're being lumped in with those guys now.

From an optics standpoint this really doesn't look good. Shut out the product listing sites and start promoting Google's own product listings a lot more aggressively.

I'm disappointed at the general trend of ranking shopping sites higher than review and information sites when searching for a product name. If I'm looking for a product, most likely I need information about it, and I don't need to be bombarded by what amounts to circulars. If I want to buy it I can go directly to Amazon's site or any other online store. I think results from shopping sites should be kept inside the Shopping search (unless they also contain product reviews, I suppose).

I disagree. When I enter just a product name, I am looking for the product. When I want reviews, I append 'reviews' to the product name. I would venture to guess that I am not the only one expecting it to work that way if that's the way Google has changed things.

Unfortunately it's rare that you find a product listing which hasn't got a "review(s)" section at the end so this isn't an effective strategy.

You would have thought it was in Google's interests to do this. If the organic search tended to be full of the reviews (and if I am searching for a product I am generally going to want to find out about it, especially user reviews, not where to buy it), then that incentivizes the online shops to buy paid ads.

Everyone seems a winner if they do this, so it seems strange they may have gone in the other direction.

Tangentially, I was looking at your query parameters -- you're just using gl=it to get the "before" results, correct? (because they've only rolled this out to US, I assume)

Also, what's the mcdonalds coupon string in there for?

Oh I just took the links that mvandemar provided and then did my own query with them. Didn't think it would preserve his original search.

McDonalds coupons?

I don't know if it's live yet, but my concert listings site is now number one for a lot of common searches (philly concerts, philadelphia concerts).

It used to be like 2 or 3 pages down below a bunch of content farms, very glad about this.

Edit: Actually, now I'm number 4 for philadelphia and number 1 for Philly, still pretty happy though.

Enjoy the ride :) You may have hit the google lottery!

We launched our music start-up (now dead) and it was getting a few thousand uniques/day. Then overnight we started seeing thousands of visitors from Thailand. And each day, literally, the number would increase by a few grand. Turns out YouTube was blocked in Thailand and we were getting a good chunk of YouTube's traffic from Google. Eventually YouTube was unblocked (six months later) and our traffic flat-lined.

This could be very painful for the likes of Mahalo. I remember Jason Calacanis mentioning, perhaps when he first noted a change of direction for Mahalo to high-quality content, that he'll make sure Mahalo is the number one Google result for "how to cook a turkey" and similar queries, where they've spent hundreds of dollars (maybe more) on quality content, notably videos. I just Googled "how to cook a turkey", without quotes, and Mahalo is nowhere to be seen! Not sure if that's a good thing or a bad thing, but the guys at Mahalo might just be freaking out right now.

One of Mahalo's "pride and joy" results was "How to Play Guitar". They were #2 for a while, now they're 5-6.

Their "How to Play Guitar" page is actually pretty good. It should easily be 3 or 4, ahead of wikiHow, guitarreference, and about.com.

Mahalo is the top result here. Brazilian Google, but with language set as English. Removing the .br from the url takes Mahalo to the second place.


So I started tracking rankings on 164 of eHow's top keywords (selected based on SEMRush report of their most-valuable, SearchVol * CPC).

Anyway, here's the downward movement since the algo started rolling out:

http://www.google.com/search?q=how+to+build+a+robot+from+scr... (-1 ranking)

http://www.google.com/search?q=eurotop+bed (-1)

http://www.google.com/search?q=watch+live+cable+tv+online (-1)

http://www.google.com/search?hl=en&q=find+answers+to+cro... (-1)

So while this may have hit eHow to some degree, 4/164 doesn't seem like a massacre. That said, SEOmoz only updates rankings weekly, so maybe next Wednesday will be a different story.
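For anyone who wants to reproduce this kind of tally, here is a minimal sketch in Python. The keyword names and rank numbers are made-up placeholders (only the 164-keyword count and the four one-position drops come from the comment above), and `rank_drops` is a hypothetical helper, not part of any SEO tool.

```python
def rank_drops(before, after):
    """Return {keyword: change} for keywords whose rank got worse.

    Ranks are 1-based positions, so a larger number is a worse rank.
    The change is reported as a negative number, matching the "(-1)" notation.
    """
    drops = {}
    for keyword, old_rank in before.items():
        new_rank = after.get(keyword)
        if new_rank is not None and new_rank > old_rank:
            drops[keyword] = old_rank - new_rank
    return drops


# Hypothetical data: 164 tracked keywords, all at rank 1 before the update.
before = {f"keyword {i}": 1 for i in range(164)}
after = dict(before)
for i in range(4):  # four keywords slip one position
    after[f"keyword {i}"] = 2

drops = rank_drops(before, after)
share = len(drops) / len(before)
print(f"{len(drops)}/{len(before)} keywords dropped ({share:.1%})")
```

With these placeholder numbers the share works out to roughly 2.4%, which is the scale behind the "doesn't seem like a massacre" remark.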

Thanks for this research. ;)

"best digital camera under 300"

Before: http://www.google.com/search?num=100&hl=en&safe=off&...

After: http://www.google.com/search?num=100&hl=en&safe=off&...

ReviewGist page moves from rank 1 to 5.

ReviewGist listing might not have original content but it is the most relevant and accurate. Every other page lists the best cameras under $300 for the previous years, from 2008 to 2010. Only ReviewGist page has the cameras that you should buy right now for under $300 as we update our lists every week. Ask any shop keeper who knows the latest models and they will agree with ReviewGist recommendations more than any of the sites listed from 1 to 5.

I can't help but wonder if this small change will upset the economic incentive to publish oceans of garbage onto the web. At the end of the day it's about the money, and the farms have a very good understanding of how much they can make off of their various "offerings".

What has been troublesome over the last two years is not so much that Google seemed to look the other way (until now), but that larger media companies like AOL and Yahoo were turning to this kind of behavior as a "viable" strategy for the future. It's amazing how many people will work for very, very little an hour writing garbage as opposed to minimum wage with possible tips at a restaurant. The allure of easy money has corrupted people's incentive from the top to the very bottom.

For once I see an actual way to compete with Google. Bing could outright ban sites that produce garbage and make their search results look pretty good by comparison. The question is whether Microsoft is willing to drop the pretense of objectivity to do so. Would users care? Would advertisers?

Search for "Share Bookmarks": an AOL/TC page about Google is No. 1 while Delicious is No. 2. Interesting.

Mahalo is No. 3 when you search for it. http://www.google.com/search?q=mahalo

Many outdated sites are on the 1st page for other terms.

This is absolutely fantastic. I just googled a few programming topics and the difference is very noticeable. Kudos to Google.

Would love links to the queries if you can remember them.

Not his, but mine: "split string python"

1-4: python.org (good, though one result is for version 2.3)

5: java2s.com (terrible. no content, huge ads. Thank you, flashblock)

6: tutorialspoint.com (w3schools-like. Borderline content farm)

7: stackoverflow.com (pretty relevant, good result)

8: diveintopython.org (should be the first non-python.org result)

9: diveintopython3.org (good)

10: oreilly.com (sample chapter for "Learning Python", excellent)

So, 1-4 are python.org, results 5-6 blow, and 7-10 are superb and should be nearer the top. Not a bad performance, but not perfect.

I would argue that results 1-4 are more confusing than 'good'. Most users in that situation are just going to open each one of those 4 in tabs.

Compare to DDG[1]:

1. An extract from a StackOverflow answer showing how to split a string into a list in python. It uses tokenize and isn't exactly what was being asked, but good.

2. python.org - A single result

3. Stackoverflow answer (a better answer)

4. java2s (urgh)

5. A shitty mailing list archive page from the dev group[2]

6. Another mailing list archive

7. A good tutorial, should be second result

8. A forum thread that is outdated

9. Wikibooks - a good answer, should be further up

Not one result in any search engine links directly to the str split() entry in the official docs.

[1] http://duckduckgo.com/?q=split+string+python

[2] Pages like this should be purged from all search engines, I hate finding them: http://bytes.com/topic/python/answers/473717-string-split
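For reference, the behaviour everyone was searching for is small enough to show inline. This is a minimal sketch of Python's built-in `str.split`; the sample strings are made up for illustration.

```python
# str.split with no arguments splits on runs of whitespace
# and ignores leading/trailing whitespace.
words = "  split   string python ".split()

# With an explicit separator, every occurrence splits, so empty
# strings can appear between adjacent separators.
fields = "a,b,,c".split(",")

# maxsplit limits the number of splits, counting from the left.
head, rest = "key=value=more".split("=", 1)

print(words)       # ['split', 'string', 'python']
print(fields)      # ['a', 'b', '', 'c']
print(head, rest)  # key value=more
```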

Thx for the specific example. Presumably http://duckduckgo.com/?q=split+python is more what you were after with the 0-click function reference?

We've been trying a lot harder to get those into 0-click, but it doesn't show up yet when you put string in the middle.

Yeah, that's it, it's just that the phrasing of the original example was a bit different. Are you guys able to go through search logs to see different permutations of how visitors search for such terms?

ie. "python how to split", "split string python", "str split python" should all = "split python"
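One naive way to collapse those permutations, sketched in Python. This is only an illustration of the idea, not how DDG does it: the `FILLER` word list and the `canonical_query` helper are both assumptions invented for this example.

```python
# Hypothetical filler words; a real list would have to come from search logs.
FILLER = {"how", "to", "a", "the", "string", "str"}


def canonical_query(query):
    """Reduce a query to a sorted tuple of its non-filler words."""
    tokens = query.lower().split()
    return tuple(sorted(t for t in tokens if t not in FILLER))


queries = [
    "python how to split",
    "split string python",
    "str split python",
    "split python",
]
keys = {canonical_query(q) for q in queries}
print(keys)  # {('python', 'split')}
```

Sorting the remaining words makes word order irrelevant, which is the point of the example, but it would also merge queries where order matters, so a real system would need something smarter.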

Did you see the wikibooks result? That was excellent, I didn't even know about it. Would it be possible to highlight that in the same way stackexchange is?

Those mailing list archive results are a real pain, they only have the answer <1% of the time and are hard to read.

DDG is awesome - it has been my default search engine for a little while now. Can I just add one suggestion - put the 'make default browser search' somewhere on your homepage after detecting the user's browser? I remember it took me 10+ clicks and links to work out how to make DDG the default in Chrome.

Awesome, yeah, that's what we're trying to do now (wrt to the permutations), but we don't have a lot of data so it's a bit difficult.

Noted on the other results and thx for analyzing it so deeply. That really helps.

There is an 'Add to Chrome' on the homepage, but it goes away after you click on it. Unfortunately, Chrome doesn't make it easy to switch providers. I wrote this up here (about 5 major browsers): http://ye.gg/addto

Interesting post! That explains it - thanks

if you go into more specific questions (e.g. jsf select menu or oracleresultset tomcat), the top result used to be from efreedom, which rips off from stackoverflow. now, no more efreedom! freeedommm!

roseindia is still around though. Its contents are usually a rewritten form of javadocs, and it's like the epitome of horrible UI.

If this gets rid of "experts exchange" I will be very happy. At least content farms usually give some sort of answer to your question...

On the contrary, searching for javascript technical topics still throws up a lot of nonsense. developer.mozilla.org rarely comes up among the top results. For example [javascript offsetwidth] shows java2s.com results on position 2 and 3 for me (may be just my anecdotal case though, css queries seem better off).

It will be interesting to see if this negatively affects the traffic to some of the 'newspapers' (Daily Mail, Telegraph) who just recycle PR in the guise of journalistic endeavour. cf. http://churnalism.com

Does the algorithm update also apply to Google News?

Does the Daily Mail even show up in Google News? I thought they were like the Onion, but not as funny.

Well, unless you count some of the badly Photoshopped pictures they've released.... See http://www.psdisasters.com/search/label/Daily%20Fail if you don't know what I mean.

This intrigues me as well due to the fact that many news and sports websites simply post official Associated Press material. I would think that this sort of thing is not frowned upon as strongly as simple content farming due to licensing agreements.

But of course that is relative to the observer.

Google's listings aren't really about being "frowned upon" or not. It's about the best results for a given search. An RSS aggregator like the Planet X sites or local newspaper simply running AP stories isn't doing anything wrong, but that still doesn't mean Google should return them to you when it could send you to the original source. Scrapers and content farms are parasites, and again to the extent possible I mean that term morally neutrally. It's simply what they are. Google doesn't have to make a moral call about parasitism to determine that it isn't in their interests to return those pages.

I don't think there is an "original source" URL for an AP story. The content on hosted.ap.org is more like a scrape of various client newspaper sites than vice versa and it's certainly not the most usable interface to AP content.

Frankly, a lot of local newspaper sites are looking more and more cookie cutter. You will see that many papers use very similar CMS systems & their national news is almost identical. They are also plastered with ads as well. They do have defining local content though, but their national coverage is generic and usually outsourced. I am not sure I would rank the quality of some of their pages as being better than a content farm that got their info from the same source.

It'll be interesting to see where this goes in the next 24 hours: http://money.cnn.com/quote/quote.html?symb=DMD. And by "where this goes", I mean how much it tanks.

I will be watching my AdWords spend, since it is dominated by farmed content with no conceivable source of traffic other than Big Daddy G.

Edit to add:

Let me quantify "dominated" for you. Here's the top ten sites BCC ads show on -- all stats 2/1/11 ~ 2/25/11:


Of the ones not marked as content farms or outright spam, 2 (maybe 2.5) are sites I'd be happy to have my mother visit, and the other ones are ad-filled monstrosities whose sole saving grace is that they are not MFA spam or content farms.

Sadly, all of these are quite profitable for me. cries

Which category do those people fall into?

* Wanted to find something like BCC, ended up on those sites and saw your ad.

* Wound up on those sites, saw your ad and said "oh, now that you mention it..."

In other words, is that site putting itself between you and your customers, or is it actually attracting people that might not have otherwise bothered?

Judging by the URLs, they searched for e.g. [how to make bingo cards] and found an eHow page. In an ideal world, I'd rank higher for that than eHow, but I cannot make a page for each of sixty ways to phrase that without essentially copying their farming methods.

In other words, your ads are actually more relevant for them than the search results they found?

Somehow, that just seems wrong.

That's the problem, right? It's a system that's profitable for you, profitable for the content farms, and profitable for Google.

No good for the end user though.

Why do people rag on Demand all the time? I thought Cracked.com's lists were in vogue. Or is it just eHow people are mad at? Genuinely curious.

I personally never thought the content farm problem was as big as it's blown up to be - but I am genuinely curious to see how far their stock drops after this update. This is mostly because I have little awareness as to how markets work, or how much something like this could potentially impact their stock price.

"Tanks" is just a superfluous term that makes it seem like I personally have it out for them, when in actuality that's not the case. As an SEO, though, they (and other content farms) do compete with many of the websites I work on, so I am glad that they have dipped - although I have no particular ill will against them.

I don't think enough sophisticated investors hold Demand stock for this change (the news of which Demand did a good job of spinning) to have a great impact on the price.

It is mostly pension and mutual funds that make up the shareholding. There aren't even any dedicated analysts on DMD atm.

It helps them that Google didn't mention any companies by name when referring to low-quality content and farms.

Hmm... knowing exactly how much DM will be affected could give someone a trading edge.

Anyone here willing to admit they've done this?

When you guys are talking about stocks tanking (or conversely doing very well), do any of you actually short the stock?

I'm asking because I'm a college student and don't have much money, but an opportunity like this looks good.

Buuut I also don't know enough about the market to feel comfortable making a bet like this. My main concern is if people who trade stock often can set their shorts and sell extremely early (maybe minutes after the market opens) leaving the average trader only able to buy shares once the price has fallen too far for a short to be a good strategy.

I poked around on Google but it would also be great to hear what people here think.

SEOMoz CEO Rand Fishkin suggested a few weeks ago that many SEOs were doing this (I didn't, I've yet to delve into the stock game) - and it made perfect sense to do.


It looks like it was gaining a lot at close yesterday and then backed off just a bit... wonder what that gain was about.

Looks like they released Q4 results yesterday. I suspect investors will react far more strongly to hard data like the Q4 than the FUD created by Google.

Actually, I was wrong. The results are US only, so (for now, anyways) you can view what the results looked like before the update by changing the language parameter (&hl=) or the &gl= parameter in the url. For instance, pre-algo rankings (Mahalo #1):


Same query after the update (Mahalo at #7):


There are other confounding effects when you change the language parameter. Consider, for example, the query [bank] using your method:

http://www.google.com/search?gl=en&q=bank (top results: bankofamerica.com, bankfashion.co.uk)

http://www.google.com/search?gl=it&q=bank (top results: wikipedia.org, bancaditalia.it)
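For anyone comparing many queries this way, the URLs can be generated rather than hand-edited. A minimal sketch in Python, assuming only the `q`, `gl` and `hl` parameters mentioned in this thread; `search_url` is a hypothetical helper, and nothing here says how Google interprets those values.

```python
from urllib.parse import urlencode


def search_url(query, gl=None, hl=None):
    """Build a Google search URL, optionally setting the gl/hl parameters."""
    params = {"q": query}
    if gl:
        params["gl"] = gl
    if hl:
        params["hl"] = hl
    # urlencode escapes spaces as "+" and joins the pairs with "&".
    return "http://www.google.com/search?" + urlencode(params)


for country in ("en", "it"):
    print(search_url("bank", gl=country))
```

As the [bank] example shows, the two result sets differ for reasons that have nothing to do with the algorithm update, so any before/after comparison built this way should be read with that in mind.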

Coincidentally (or maybe not, if someone at Google has a wicked sense of humor) PaidContent published an interview this morning with the CEO of Demand Media:


Check out his response to the question: What happens if the company you’re most synergistic with turns you off? Is that something you think about? Do you have to make sure you have other revenue that isn’t reliant on this synergy with Google?

His response (for those who don't want to read another article):

"That could happen but it would be against their best interest and the consumer’s best interest. It’s kind of like Zynga just got a $9 billion valuation. Facebook could turn them off at any time. The iPhone could have been turned off by Verizon or AT&T (NYSE: T). There are a lot of synergistic partnerships that make sense for both parties that last a very long time.

We are diversifying our traffic because the internet is moving that way. We’re aggressively focusing on diversifying traffic. We had 100,000 individual eHow articles receive traffic in December alone just from Facebook. We receive traffic from Twitter. We receive traffic from Digg. We receive traffic from all across the web. We receive direct traffic and traffic from apps like Livestrong. We are naturally diversifying our revenues—not because we’re afraid of Google but because that’s where people are spending their time."

He must have missed the memo. Apple and Zynga have contracts signed in blood that keep them from being "turned off". As we saw today, Google has no reason to keep Demand Media's lights on.

Google would lose money too, but they could take the hit; the Demand Media model would become completely unviable.

But would Google lose money? If the big content farms shut down tomorrow how would that affect Google? There would instantly be fewer ad views. Wouldn't the fewer views be worth more (same number of people placing ads with fewer places to put them), causing an increase in ad price to cover the difference?

Depends on whether there is a non-Demand Media site to capture the audience for every search query where they provided an answer. Ads could be more expensive up to a point, but many would probably not want to run ads more expensive than the fairly cheap rates they get through the long tail of the display network.

If you take the top several dozen or so most-blocked domains from the Chrome extension, then this algorithmic change addresses 84% of them

I wonder what those domains are.

I'm especially curious about the 16% the new algorithm doesn't get rid of.

When folks here posted their blocklists, there were a few borderline sites that I noticed. About.com is the notable one that comes to mind.

One of the surprises I encountered setting up my custom search engine* for anime/manga searches was that I couldn't simply toss about.com into the blacklist - because while they did mirror Wikipedia material, they also had hired or otherwise gotten a passel of legitimate manga & novel reviewers!

* http://www.google.com/cse/home?cx=009114923999563836576:1eor...

about.com's content has noticeably improved in the last year or two.

I remember them being an absolutely scuzzy site, just no useful content at all. Lately that has definitely changed, and definitely for the better.

I googled "sharpen a knife" yesterday early afternoon and I'm pretty sure I got ehow high on the front page. Now it's halfway down the second page.

Appropriately, the summary for eHow is: "on 3/31/2009 This article provides a high level overview of sharpening, but doesn't provide enough detail to enable a beginner to sharpen a knife ..."

Is there some tool SEO people use to compare results before/after Google algorithm changes? Would be interesting to see.

kalvin, there would be no way to do that unless you were prepared and happened to store the queries beforehand. However, if you take a look at the Alexa page for one of the sites in question you can see what some of the top queries were that used to bring those sites traffic, and then manually go see where they rank now. For instance, Mahalo used to be in the #1 spot for many of the queries listed on its page:


If you look now though you can see they are at #5 for [mcdonalds coupons] (which is still higher than they deserve), #6 for [how to play guitar], more than 5 pages deep for [bed bath and beyond], etc. Check ehow.com and you can see similar results no longer ranking:


I don't know of any resource of ranking information, so you would need to have collected the data beforehand. That being said, I've been collecting data on some queries that showed a lot of eHow results, and I suspect something related to knife sharpening may have been on my list (but I'll have to double check.) Data is stored locally at my office; I'll post a quick update here tomorrow if I have something interesting.

I don't see any major dip on WiseGeek, which is directly-measured traffic:


The results may be directly measured but they are not real time. Click on the 7 Day view and you can see that the latest numbers are from Feb 22, before the change was rolled out. You would have to wait until Sunday or Monday, I would think, before seeing the difference.

see my comment above about the 164 eHow keywords I started tracking last week in anticipation of something like this.

How you view this algo change, I guess, depends on whether you control websites that might be classified as "content farms".

Job sites, classifieds sites, news archive sites, social sites like HN or reddit etc.

There are a few legit business models (as in, not against Google TOS) that feel the heat from this update.

I am all for banning scraper sites, especially if they outrank the source. But I don't like this update at all: There are still too many what-if's and classification problems (where do you stop?). Do the giants get a free pass, and do the new sites have to fight an uphill battle?

What do I tell new clients? I've seen the same with Keyword-In-Domains outranking more established sites. What is a whitehat SEO to do but claim a few Keyword-In-Domains? Now KIDs are becoming more and more greyhat. Not because claiming a KID is so bad, but because Google has a problem ranking relevancy over KIDs.

Having a curated content farm is not, in itself, a problem and is perfectly whitehat. Whether it's a good idea after this update, time will tell. I would really like to know if curated content farms with an editorial staff will be hurt by this update. I don't feel safe right now at all.

P.S. I guess I've found the first blackhat technique to combat being classified a low-quality non-unique site. Google says to add value. So you pull in content from multiple sources, instead of a single source, you article spin the content a little, you add reviews and comments, and then you comment on/review your own stories. Content farms will turn into comment/review farms, and no one will be the wiser.

Also affiliate sites (Google always had you in her sights) and ecommerce sites that used the supplied product descriptions will have a harder time now. Realistically that would include smarter affiliate sites like hackerbooks.com (no unique content, just an Amazon storefront for all Google cares)

I have a decent list of queries I've been collecting data on to compare once word came that a change was rolled out. This is an interesting one that I saw in a comments thread somewhere...


The Demand Media angle is really interesting too. I love the bit quoted in the Searchengineland article from the CEO, wondering how they got tagged with the "content farm" label. Pretty sure this is where I first saw it...


eHow still shows as number one for me in Google. It's nowhere to be found in duckduckgo.

That eHow article actually seems quite helpful and straightforward. Is there a problem with it? Hopefully Google can rank the results on a case-by-case basis instead of condemning an entire site that actually has some useful material.

This result looks the same to me as it always has - I see two eHow articles above the State Department's passport site (one on how to renew an expired passport, and one on how to renew an expired US passport in person.) While I'd agree that they are decent as far as eHow content goes, I think the Dept of State site really is a fantastic and authoritative resource that deserves to be ranked higher. The DOS isn't doing themselves any favor with that crummy title ("Passports"), but I'd have to say that this is a case of relevancy winning out over authority in a nonsensical way.

So a question came up tonight - "How to use an oscilliscope" - googled it and no ehow. Wonderful.

Today will be an interesting day to watch web stats.

I'm really happy they are doing something about the programming/scraping sites. Asking the same question and getting the same answers from top 3 or 4 results was driving me bonkers.

Playing contrarian, though, I wonder how much of these changes are generated by actual user feelings? I am concerned that there is a very vocal minority (which is probably represented the most strongly inside the hacker community) who is now starting to determine what makes a good site or not. If Google starts getting swung around by 2% of its user base simply because they're the loudest, I don't think that would necessarily result in a better product for all -- even though so far, so good.

I'm not sure the "getting pregnant" advice was such a good example. Of course people will laugh when advice involves sex. If the 4 paragraphs had their order changed, so if the first two were switched with the last 2, and perhaps the two paragraphs involving sex were edited down to one, the complaint wouldn't hold water. I posit that such content is just a lightweight overview, which is badly edited and doesn't have too many specific points of action. Compared to other useless scraper content I've seen, it's not that bad.

Neat, the example I posted is fixed. In fact, me talking about the search in another thread shows up fairly high on the first page.


Only downside is that the explainextended.com article shows up below stackoverflow.com which is too bad because that article would teach you far more than the stackoverflow question would.

12% is HUGE. I fear the inevitable false positives.

If you have a site triggering a false positive, maybe you need to go back and look at your content, where it originates from and how.

Even with some false positives, I think it's still worth it.

Please please roll this out in Australia too.

Can you post some examples of searches that you think it will improve, along with your current results from them?

does this get rid of expertsexchange? <cross fingers>

I wish.

Hopefully this means no more "Big resource gettin' bigger" in my search results ...

See the drastic drop in alexa: http://www.alexa.com/siteinfo/efreedom.com#

Anyone think the Huffington Post is glad it sold two weeks ago? Could you imagine what a 12% shakeup would do to a republisher?

I bet huffpost gets a lot of direct traffic.

News results look much improved. Yellowpages.com and variants (yp.com) continue to clutter up local search results.

Google is now as far ahead of their competition as they were in 2003.

That depends who you consider their competition to be. I've been using duckduckgo for a while now, and I've rarely seen any of the quality problems everyone has been complaining about with Google.

Until Bing starts getting "signals" from what users are clicking on Google reflecting Google's algorithmic changes...

Bing is now as far ahead as Google was ten clicks ago.

I'm vaguely disturbed by all the talk about "sites people want to see fall". That sounds like almost manual bias against certain sites to me. Didn't hear anything about language analysis and figuring out what is high quality vs low quality content.

The official Google post on this update addresses that. This change is purely algorithmic, not based off feedback data from their recently launched spam tool.

I didn't say that it was based off of the spam tool. But going in, Google clearly had targets. It's not like they couldn't figure out who the farmers were without the tool.

I wonder how much longer it will be after Google's changes go live that we start to see the exact same changes "magically" appear in Bing as well.


Does Bing have a problem with over-ranking content farms?
