
Google algorithm change launched - Matt_Cutts
Earlier this week Google launched an algorithmic change that will tend to rank scraper sites or sites with less original content lower. The net effect is that searchers are more likely to see the sites that wrote the original content. An example would be that stackoverflow.com will tend to rank higher than sites that just reuse stackoverflow.com's content. Note that the algorithmic change isn't specific to stackoverflow.com though.

I know a few people here on HN had mentioned specific queries like [pass json body to spring mvc] or [aws s3 emr pig], and those look better to me now. I know that the people here all have their favorite programming-related query, so I wanted to ask if anyone notices a search where a site like efreedom ranks higher than SO now? Most of the searches I tried looked like they were returning SO at the appropriate times/slots now.
======
RealGeek
Will this change affect sites like filestube.com and freshwap.net? FilesTube
ranks for the majority of long-tail keywords, even those not related to
downloads/torrents/rapidshare.

I see filestube's auto-generated search listing pages ranking on Google all
the time. Pages like: <http://www.filestube.com/m/matt+cutts>
<http://www.filestube.com/g/google+scraper>

Same goes for freshwap: <http://www.freshwap.net/387/dl/Google+Matt+Cutts>

These sites will give out an auto-generated page for every keyword you enter
into it. Apparently, Google loves to index them... there are 126 million pages
of FilesTube indexed in Google. I thought indexing search listing pages of
other search engines was against Google's policies.

~~~
xenophanes
There's some really annoying torrent sites like this. I mean, sites that
pretend they have search results for whatever torrent you're searching for.
Those show up on Google a lot and they're useless.

~~~
moultano
This is another class of problems we're working on. Expect some changes here
in the next few months.

~~~
skinnymuch
Your account page doesn't say anything. I assume you work for Google?

~~~
moultano
Yep, in fact I wrote the change we're talking about in this thread. :)

~~~
skinnymuch
Hah. Good to know. I guess I'm your 'enemy' then since if there was a label
for me it'd probably be blackhatter. Though my 'spam' tends to be of much
greater quality than what most of the stuff BHW sort of people produce (I have
some original content written and excluding my bottom of the barrel sites, the
others have their automated parts like scraping edited/checked by a hired
person).

~~~
rhizome
_Though my 'spam' tends to be of much greater quality than what most of the
stuff BHW sort of people produce_

You threw a vitamin pill into a bucket of mud?

------
seanalltogether
Matt, I just went through my search history because I remembered a very
specific instance of seeing this. Here's the query.

[http://www.google.com/search?q=nstoolbar+bottom+bar&ie=u...](http://www.google.com/search?q=nstoolbar+bottom+bar&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a)

You'll notice that efreedom.com shows on the first page with content taken
directly from stackoverflow. While stackoverflow does show in the results, the
exact page that efreedom copies does not. Anyway, I'm glad you guys are taking
this seriously.

For reference here is what I see right now -
<http://dl.dropbox.com/u/1437645/googlesearchresult.png>

~~~
Matt_Cutts
It looks like we've got SO above efreedom for that query, but it's always nice
to find a url that we didn't have that we'd like to be indexed. That lets us
check whether we can improve our crawling/indexing. Thanks for the example!

~~~
seanalltogether
Yes, SO is above efreedom in this instance, but the SO results are actually
worse than the efreedom result based on the query.

~~~
chaosmachine
This is a situation I've seen many times in the past. Often the right site
_is_ on top, but it's showing the wrong result, meanwhile the scraper site
surfaces the correct one.

It seems like just showing a few more results from the "real" site would solve
the problem.

------
Alex3917
Not exactly a scraper site, but if you do a search for "learn to hack" the top
result is just a list of SEO keywords:

<http://www.learn-to-hack.com/>

Several of the other results are rather dubious as well. The reason I bring it
up is because the Squidoo lens that comes up is something I made, and while
certainly not perfect it's still much better than many of the SEO spam sites
and fake eBooks that rank above it. (And plus the ad revenue is going to
charity rather than some shady organized crime ring.)

Anyway sorry if it's a faux pas to complain about my own stuff, but I feel
like it's a legitimate problem with the way Google works.

~~~
moultano
>Anyway sorry if it's a faux pas to complain about my own stuff, but I feel
like it's a legitimate problem with the way Google works.

Definitely not a faux pas. Thanks for the example. My biggest annoyance in
threads like these is people who write essays about their site losing traffic
but then aren't willing to provide a URL for people to check out.

------
runjake
FYI:

Don't flag this because there's no link or "citation". Matt Cutts is the web
spam guy at Google.

~~~
sjs382
Here's the accompanying link:
<http://www.mattcutts.com/blog/algorithm-change-launched/>

~~~
Matt_Cutts
Yup, I actually posted over here at HN a little bit before I did a post on my
personal blog.

~~~
jjcm
I'd like to mention that this is really an impressive way of working with the
community Matt. Companies say all the time that they value their customers'
opinions, but rarely do you see a grievance that's posted on a social news
site being a.) initially responded to by the guy who's responsible for it, and
b.) amended by that person and his team with a request for review. It's
heightened my faith that maybe Google can actually keep that small-company-
feel no matter how large they get. I really hope you guys continue in this
fashion, and thanks for the search fix.

~~~
Matt_Cutts
HN has been a really high signal/noise site in discussing these issues, so it
only seemed fair to give folks a heads-up here and see what other issues
people were seeing. But thank you. :)

------
brown9-2
Is this change restricted to programming-related queries only?

I noticed today that a search for "mubarrek london" returns a page of results
where every result on the first page, besides the top one, is spam from
www.88searchengines.com, www.30searchengines.com, www.70searchengines.com,
etc.

I know this might not be related to the topic of scraper sites directly, but
I'm not sure how else one can easily report these types of things.

~~~
treeface
I can't seem to replicate this:

<http://i.imgur.com/kDwPd.png>

Are you sure there isn't something else going on?

~~~
brown9-2
Perhaps it's related to location; it's reproducible with incognito search (to
make sure no personalized settings are being applied):
<http://i.imgur.com/Yd7FN.png>

------
bhavin
I would like to point out one interesting thing I noticed today. I was looking
for "gcc optimization flags for xeon".

The following query is from google.com and contains no efreedom on the front page.
[http://www.google.com/search?hl=en&q=gcc+optimization+fl...](http://www.google.com/search?hl=en&q=gcc+optimization+flags+for+xeon&aq=f&aqi=g-v1&aql=t&oq=)

Now, the same query from google.ie (Ireland site) contains 2 efreedom results
on the top page!

[http://www.google.ie/search?hl=en&q=gcc+optimization+fla...](http://www.google.ie/search?hl=en&q=gcc+optimization+flags+for+xeon&btnG=Search&aq=f&aqi=&aql=&oq=)

Why this strange search behavior for a query which has no relevance to the
user's location?

P.S. I was logged on to my google account while searching, not sure if that
has any effect whatsoever.

~~~
bhavin
UPDATE: I am even more surprised to see that when I logged out of my account
and searched again on google.ie, one of the efreedom results disappeared!!

So, does this mean that Google search now takes into account my past search
result clicks (or rather mis-clicks)? Or ranks content in some way that proves
efreedom is somehow more relevant to me?

~~~
Matt_Cutts
If you're using personalized search, we do look at past clicks to change your
ranking. So if you clicked on efreedom in the past, that could affect your
ranking. You can add "&pws=0" (or use incognito mode in Chrome without logging
in) to turn off personalized search and see whether that's the factor for you.

~~~
bhavin
Adding &pws=0 turned the personalized search off and hence got rid of the
efreedom.

Also, I cleared my web history and searched again. The results are the same
now, signed in or signed out. Problem solved! :)

Thanks!

~~~
Matt_Cutts
Glad to hear it. &pws=0 is handy to diagnose whether personalized search is
causing something to rank higher for you.
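
A minimal sketch of adding that parameter to an existing search URL, using only
Python's standard library. The helper name `without_personalization` is
hypothetical; the only thing taken from this thread is the `pws=0` parameter
itself, and whether it changes the results is up to Google.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def without_personalization(url: str) -> str:
    """Return `url` with pws=0 set in its query string (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every existing parameter except an earlier pws value, if any.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "pws"
    ]
    params.append(("pws", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(without_personalization(
        "http://www.google.ie/search?hl=en&q=gcc+optimization+flags+for+xeon"
    ))
```

Comparing the result page for the original URL with the one for the rewritten
URL is the same check described above: if a site only shows up in the first,
personalization is the likely cause.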

------
Aaronontheweb
Glad to see that Matt went out of his way to help us geeks and the
Stack Overflow community!

In the future is there going to be some way for webmasters to do something
like rel="canonical" across domains so if I want to syndicate a piece of
content across two properties I own I can indicate which one is the original
source? My understanding is that rel="canonical" is only meant to be used
between pages on the same root domain today but I could be mistaken.

~~~
sjs382
rel="canonical" does work across domains.

~~~
gregable
[http://googlewebmastercentral.blogspot.com/2009/12/handling-...](http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html)
was the official announcement.
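
For anyone who wants to check what a syndicated page actually declares, here is
a rough sketch that pulls the rel="canonical" link out of a page's HTML and
reports whether it points at a different host. The class and function names and
the example hosts are made up for illustration, and it compares hostnames only
(not registrable root domains).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def canonical_is_cross_domain(page_url: str, html: str) -> bool:
    """True if the page declares a canonical URL on a different host."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return False
    # Resolve relative hrefs against the page URL before comparing hosts.
    target = urljoin(page_url, finder.canonical)
    return urlsplit(target).hostname != urlsplit(page_url).hostname


if __name__ == "__main__":
    page = '<head><link rel="canonical" href="http://original.example/post"></head>'
    print(canonical_is_cross_domain("http://mirror.example/post", page))
```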

------
miah_
This is awesome! Now if only I could completely remove certain sites from my
search results ala the Google Wiki stuff. I'd love to drop swik, and
expertsexchange and a few other annoying sites. It's possible this algorithm
change will make these sites less annoying to me though.

~~~
gregable
Bonus trick. Experts Exchange actually often has some good stuff on it. The
pages are designed to make you think you have to buy a subscription to see the
answers, but if you simply scroll down beyond the ads for their service the
answers (which aren't scraped from elsewhere) are visible near the bottom of
the page. Yes annoying, I wish they didn't work this way, but there are times
that I've found experts exchange to be the most useful result.

~~~
X-Istence
This only seems to be the case when coming from Google, if you refresh the
page or paste the link to a co-worker the results are gone.

As for the long list of crap between the question and the answer, use AdBlock
to remove the various DIVs.

------
kqueue
Here's a query that shows efreedom above SO. I did site:efreedom.com and
picked a few titles until I found one.

query: Mailengine with .NET API

The right page that should show up is

[http://stackoverflow.com/questions/1720900/mailengine-with-n...](http://stackoverflow.com/questions/1720900/mailengine-with-net-api)

If you google the link below you'll see that the page _is_ indexed.

[http://stackoverflow.com/questions/1720900/mailengine-with-n...](http://stackoverflow.com/questions/1720900/mailengine-with-net-api)

However, the page doesn't show up when you use the query I mentioned.

------
WillyF
Matt, this is great news.

How about sites that rank well with no content, just navigation? Here's an
example:

[http://www.collegegrad.com/entryleveljob/entrylevelaccountin...](http://www.collegegrad.com/entryleveljob/entrylevelaccountingjobs.shtml)

It's generally a high quality site, but that page has absolutely no relevance
to the query except for a title tag and some internal anchor text. The search
terms aren't even on the page.

If I remember correctly, it used to rank #1 for "accounting entry level jobs,"
and now it's down to #8. My question is why is it even ranking at all? It's
not even low quality content. It's no content.

~~~
tivaceous
I notice that you run a competing site to collegegrad.com - namely
onedayonejob.com. Obviously you are likely to have paid attention to some of
the things your competitor has been doing. But are you sure you're not just
using this as an opportunity to stick it to them? If so, that would strike me
as a distasteful use of this forum, especially since Matt has been very
gracious to give this opportunity to the HN community.

~~~
wizard_2
Even if they were direct competitors with the exact same market and product
(which they don't appear to be), that could mean that he's got a much better
idea of how his competitors may be doing spammy things than we do. Since these
are algorithmic changes, not site-specific changes, any resulting fix would
get applied to his own site as well.

The motives here don't offend me.

------
Nickwiz
Hey Matt, thanks for the update. I wanted to know how it would work in this
case:

[http://www.google.com/#q=major+online+dating+sites+koopa&...](http://www.google.com/#q=major+online+dating+sites+koopa&hl=en&client=firefox-a&sa=N&rls=org.mozilla:en-US:official&fp=1)

We submitted an article to ezinearticles from our blog.koopa.com. Just
wondering why our blog is not listed at all, while the ezinearticles copy of
the article we posted from our blog is? How does this work when our authors
submit to article directories?

On that same note, I notice a bunch of other sites just below which have
copied the ezinearticles article to a tee and are ranking higher than our
original blog that posted it.

Is this related to the current algorithm change or just that our blog may not
be indexed yet? Thanks for the update once again. I love the fact that you and
the google team are constantly updating and changing your algo to give value
to rightful content owners.

~~~
moultano
What is the url of your blog?

~~~
Nickwiz2
<http://blog.koopa.com>

------
akie
Yes - I've been doing a lot of YII related searches the past week, and I've
noticed that a lot of times a site called 'devcomments.com' pops up somewhere
in the first 5 results. Usually with a bogus page that does contain the
keywords you were looking for, but not the actual discussion/forum thread. It
appears they have copied their content directly from the official site,
yiiframework.com.

Example:
[http://www.google.com/search?q=Any+yii+way+to+get+the+previo...](http://www.google.com/search?q=Any+yii+way+to+get+the+previous+URL%3F)

In that particular instance it is result number 3 (and the original is number
1), but on more than one occasion it was the top result and it's _never_ what
you are looking for.

------
hsmyers
While I applaud such an improvement, I still despair over Google's inability
to handle context. For instance the search phrase 'tex decorative rules' is
totally mismanaged. First it overrides the search and changes tex to text.
Humorously enough, when you counter-override back to tex, the search is even
worse. We won't even speak of what happens when you change tex to latex;
porno doth ever rise to the top, I guess. While we may be in a new millennium,
some things remain far behind. And yes, I know that things are being worked on
by those who were academics and are now working at Google, but still...

~~~
Matt_Cutts
tex is a hard one. And [latex decorative rules] is even trickier--most people
wouldn't expect technical documentation if they just saw that as a random
query. That's more of an issue to pass on to the synonyms team and not in the
scope of this change, but I'll pass this one on to the right folks.

~~~
runningdogx
Would it be feasible to introduce a search customization setting: "lower the
estimated probability that I mistyped"? I assume google uses something like
two cut-offs: A < B, where if P(mistype)>A, it suggests but doesn't search for
the correction, while if P(mistype)> B, it displays auto-corrected results.
Simply a user option to nudge P(mistype) lower would probably suffice, or an
option to increase B.
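
To make the two-cut-off guess concrete, here is a small sketch of it. This only
illustrates the assumption in the paragraph above and says nothing about how
Google actually decides; the function name, the default values of A and B
(`suggest_at`, `autocorrect_at`), and the `user_bias` knob are all hypothetical.

```python
def correction_policy(
    p_mistype: float,
    suggest_at: float = 0.5,       # cut-off A (made-up value)
    autocorrect_at: float = 0.9,   # cut-off B (made-up value)
    user_bias: float = 0.0,        # user setting that nudges P(mistype) lower
) -> str:
    """Decide what to do with a query given an estimated P(mistype)."""
    if not suggest_at < autocorrect_at:
        raise ValueError("expected A < B")
    # Apply the user's nudge, keeping the result a valid probability.
    p = min(1.0, max(0.0, p_mistype - user_bias))
    if p > autocorrect_at:
        return "autocorrect"   # show corrected results, offer the original
    if p > suggest_at:
        return "suggest"       # show results as typed, offer the correction
    return "as_typed"          # no correction offered


if __name__ == "__main__":
    for bias in (0.0, 0.1):
        print(bias, correction_policy(0.95, user_bias=bias))
```

With these made-up numbers, a query scored at 0.95 is auto-corrected by
default, but the same query with a 0.1 nudge only gets a suggestion. Raising B
instead would have the same effect for that query.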

A major usability problem for me with auto-corrected results is the two lines
with links: "Showing results for <link>" and "Search instead for <link>".
Interpreting those two lines takes me out of the flow of analyzing search
results. I haven't yet been conditioned to click on the second link. I never
want to click on the first link, because google is already showing me those
results.

I frequently search for unusual acronyms, terms, variables, made-up words, and
gibberish that shows up in logs, among other things that google likes to
"correct" for my benefit. I realize those use cases are not typical, but
current auto-correction behavior can be extremely aggravating in those cases.

Auto-suggest a correction all you want, but auto-correcting and suggesting the
original is going too far I think, unless there are zero hits, or some
equivalently strong algorithmic determination is made that I couldn't possibly
want the query the way I typed it.

~~~
Matt_Cutts
Personally, I think this would be a good idea. You could learn how
tech/Google-savvy different users are and adjust the search results
accordingly. Someone who has done a site: search is probably more likely to
know their way around Google, for example.

------
spidaman
Nice to hear about this change Matt. "Fuckin' efreedom..." is an oft heard
missive around our office, godspeed in ranking them and their ilk down.

------
nikothefinn
Since Matt is responding here, I figured this is worth a shot, no harm in
asking. Matt, would love a response from you if you get a chance, since the
Webmaster Tools appeals process gives no insight whatsoever to our situation.

Following on from one of the comments here, namely the idea that "value is in
the eye of the beholder", I'd like to raise our own plight. I run a number of
aggregator sites - the largest and oldest of them being celebrifi.com, which
was a PageRank 5 until Google de-indexed us in December (along with some of
our other sites, but interestingly not all).

A little background - the purpose of the sites is to aggregate, organize, rank
and add context to what's happening in the news, with each site focusing on a
specific vertical. Think Techmeme, but with more context.

I'll be the first to admit, there is no original content, but I strongly
believe that we "add value" by figuring out what exactly is going on in any
given story or blog post.

We add value to publishers, by always linking to the original source (indeed,
many publishers directly request that we add their feeds to the sources we
track), we respect copyright by only displaying a short snippet of the
original text and only displaying thumbnail images and we add value to users
by giving them easy access to a lot more content on the same topic/story, all
in the same place.

Google's Quality Guidelines clearly state that duplicate content is penalized,
and that is totally fine with us, but is it right to totally de-index a site
for duplicate content? I wouldn't even want to rank above the original source
for any given piece of content, as I respect the hard work that writers and
publishers put into creating quality content, but aggregators who add value
have a role to play in the content ecosystem. Digg, for example, uses the
"wisdom of the crowd" to aggregate and rank content - hence adds value. Topix
takes a local approach to aggregating content, and uses comments to rank
content - hence adds value. We take a verticalized approach to aggregating and
ranking content, and hence I believe that we add value.

As mentioned above, we got de-indexed in December, and despite going through
the appeals process, fixing a few things on our end to do with sitemaps, and
clearing out some older "low quality" sources that we were tracking, we
received no clarity into what our crime was.

Matt, I'd like to raise this issue with you - both as it relates to us, but
also as a general industry question - are all aggregators going to be
de-indexed? And if not, which aggregators are and which aren't? What are the
criteria, and who decides? If it's algorithmic, then I am very curious to know
what on our sites triggered the de-indexing? And even more curious to know why
some of our sites got de-indexed, and some didn't.

I have great respect for Google's efforts to clean up spam and low-quality
content - and would always expect to see original content ranking higher than
aggregated content. But to completely de-index an established aggregator site
and strip it of its PageRank seems very draconian.

I would love to hear your/Google's position, and look forward to some more
clarity on both our situation, and the future of news/content aggregators.

Respectfully yours, Niko

~~~
optimusclimb
Respectfully - Sites like yours ruin the internet. You admit it yourself: you
produce no content of your own. I do NOT want to see results from pages of
that ilk when I google for things. Google made the right choice to de-index
it.

~~~
daleharvey
Google (in the context of search) produce no content of their own, are they
ruining the internet?

~~~
nostrademons
The goal from the user's perspective is to get to the content they want as
quickly as possible. A search engine helps in that, as presumably you don't
know where the content you want is if you're visiting a search engine. A
search engine that links to an aggregator site doesn't - the search engine
should just send you to the original content directly.

Presumably, aggregator sites by themselves also help in content discovery. I
find a lot of content through Hacker News. But they should do so by being good
enough to be a destination in themselves. An aggregator that needs to be found
by a search engine isn't doing users any favors.

~~~
iamelgringo
I understand what you're saying, but doesn't Google News do the exact same
thing?

Google News provides snippets of content, and helps people discover the news
by providing direct links to the original source of news. For Google to
deindex a site like celebrifi while running a competing product (Google News)
smells a bit of monopolistic behavior. I suspect it's unintentional, but
Google is going to have to walk a very very fine line as you start deindexing
certain sites.

~~~
nostrademons
When News results show up on the search result page, they link directly to the
story, they don't link to the Google News landing page or category where you'd
_then_ have to click on a link.

Google News itself falls into the second category - an aggregator that stands
on its own, as a destination. Much like Hacker News. I personally don't use
it, but the people that do go because they find it lets them discover a bunch
of content that they otherwise wouldn't know about.

~~~
iamelgringo
I know that you're saying that Google News is a dedicated site, and separate
from the search results. But Google links to its own aggregation service
from its results on a regular basis, and at the very top of most every results
page in Google there is a link to the Google News version of the search.

Google News does stand alone as an aggregator, but you have to admit that it
is promoted heavily by Google search. If GOOG keeps doing stuff like that, I
suspect there are going to be a lot more companies that start to take umbrage,
and start challenging this behavior in court claiming that it's
anti-competitive behavior.

------
tristanperry
Sounds good, thanks for the update Matt. I must ask though: I wonder whether
you've seen the "January 26 2011 Traffic Change - Back to 'Zombie Traffic' "
discussion over at Webmaster World? A number of webmasters there (who own
websites with fully unique content) are reporting that they've seen lower
quality content sites and/or content scrapers rank above their established
sites with good content, starting from around the 26th Jan.

There are no specific queries/websites talked about there (I might be wrong
but I think Webmaster World has some rules preventing specific discussion of
websites/queries), although I thought I'd flag it up since some webmasters
have noticed adverse effects from a Jan 26th algo change; and it sounds like
this might be the cause.

Anywhoo, that being said: it's great to see Google continuing to be on the
ball and responding to the recent feedback from various blogs and other
sources (e.g. here).

~~~
Matt_Cutts
I love Webmaster World, but one frustrating aspect of webmaster forums is that
specific sites and queries are rarely given. It can be very hard to assess
what's really going on since you don't know the specific site that's being
discussed.

------
cperciva
_searchers are more likely to see the sites that wrote the original content_

This is great news, but I have to wonder: How do you figure out which site
wrote the original content?

I'm wondering based on what happened to Tarsnap last week, where
hackzq8search.appspot.com outranked tarsnap.com on a search for "tarsnap".

------
jswinghammer
Ok here is the search that first brought this issue to my attention:

mysql spatial index example lft rgt

I see the following order:

1: <http://planet.mysql.com/entry/?id=23512>

2: [http://efreedom.com/Question/1-1743894/Mysql-Optimizing-Find...](http://efreedom.com/Question/1-1743894/Mysql-Optimizing-Finding-Super-Node-Nested-Set-Tree)

3: [http://explainextended.com/2009/09/29/adjacency-list-vs-nest...](http://explainextended.com/2009/09/29/adjacency-list-vs-nested-sets-mysql/)

4: [http://stackoverflow.com/questions/1743894/mysql-optimizing-...](http://stackoverflow.com/questions/1743894/mysql-optimizing-finding-super-node-in-nested-set-tree)

------
23david
Will this new change impact the issue where scrapers that take videos and
video descriptions from youtube and turn them into 'blog posts' show up higher
than the youtube page that contains the actual original content? I worked for
nearly a year producing a few hundred videos only to find that spam sites
(usually running adsense ads) were showing up ahead of the video on
youtube.com and my own site. Our site's domain name was even plastered all
over the description and it still didn't matter. The spam sites still showed
up way ahead of us in the search results.

------
seles
This is good news, but I feel like it is only treating a symptom not the
actual disease.

If the algorithm properly detected site relevance, importance and viewer
satisfaction, those copycat sites should never have ranked higher in the first
place. In a way this is admitting that it is impossible to stop the gaming of
search engine optimization, and that the only way to deal with it is to "win"
in some special cases.

That being said, I provide no real solution; this is a huge problem with
millions behind each side.

Although this is good news it is also gloomy news.

~~~
LiveTheDream
How would an algorithm detect viewer satisfaction?

~~~
seles
How long the user stays on the page (a hint would be how soon they make their
next search/click), whether or not they continue looking for something under a
similar search query, having actual buttons users can click to rate, etc.

~~~
moultano
Spam gets upvoted to the top of reddit on a regular basis. Most people are
pretty apathetic to legitimacy most of the time. All we ever have is proxies
for quality, the hope is that we have enough of them to cover the space.

------
giberson
I imagine there's not much you can be specific about when talking about
Google's algorithm, but can you at least disclose if the identification of
"original" content providers is "determined" (by some automatic process) or is
it "specified" (manual intervention)?

Stack Overflow is an obvious beneficiary of this new change; I'm just
wondering if smaller content providers might benefit as well?

~~~
moultano
There's nothing manual about this. Everybody who authors their own content
should be helped. :)

------
bretthellman
Fantastic, though I'd rather see Google block sites like efreedom altogether.

~~~
jjclarkson
I think that would be a dangerous precedent to set. Whose opinion should
decide what sites add value to content that may not be completely original?
I'm not saying efreedom is not "being evil", but I could foresee someone
somewhere using another's original content and making it more accessible. For
example someone could easily improve the accessibility of the content from
experts-exchange.com.

~~~
benologist
Google already makes that decision about lots of search results, usage on
their services etc.

------
kqueue
The fact that efreedom results are showing up in the results is irritating by
itself. We all know efreedom is spam, and so does Google.

Now I am always on alert when clicking on a link in the results to avoid the
spam pages like efreedom, expertexchange and whatnot. Why not just remove them
from search results?

They are causing enough bad publicity to Google.

------
joelhaus
"scraper _sites_ or _sites_ with less original content"

To clarify, this change measures unique content site-wide, not just for a
particular web page/url?

I may be too focused on semantics, but it would be important if you are trying
to maximize visibility for a single page in Google's search results and your
other pages have a significant amount of duplicate content.

For instance, one page on your site is about a WordPress plugin you've created
(it's totally unique), but on most of your other pages, you've copied and
organized relevant sections of the WordPress Codex so that your users can
easily find the documentation needed to customize the plugin. Is your unique
webpage about the plugin safe? I was always under the impression that search
rankings were determined on a page-by-page basis rather than site-wide.

P.S. Sorry if this was already asked and I missed it.

------
samd
I don't suppose you'll tell us how you know whether a site's content is
original or not.

What if you have a blog with lots of quotes from other sites; will that hurt
your rankings because Google sees "unoriginal" content?

Is there some ratio of original to unoriginal content that must be met to keep
from being flagged as a scraper?

------
cd34
[http://www.google.com/search?sourceid=chrome&ie=UTF-8...](http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=draggable+jquery+revert#hl=en&sugexp=ldymls&xhr=t&q=draggable+jquery+reset&cp=22&qe=ZHJhZ2dhYmxlIGpxdWVyeSByZXNldA&qesig=_boNwazhD7dikYEGk18Yzg&pkc=AFgZ2tkkEndGT60KCdrxZwmd61AqPYPe0cczjEDCb9-zT8cumR5bv3R-CJqxMHe3aN4KcPD7E9DzrvWLdTJCC6KUZqxNKKej3g&pf=p&sclient=psy&safe=off&source=hp&aq=f&aqi=&aql=&oq=draggable+jquery+reset&pbx=1&fp=fca90e9507624f80)

for me, the 10th result is:

www.questionhub.com/StackOverflow/3910933 - Cached

I think it is safe to say they are scraping, and, while there are some SO
answers, that particular answer does correctly interpret what my intention
was.

In fact, the SO page scraped shows up as the 12th entry.

------
endergen
Matt, would you comment on Google's thought process on curation? There are
sites that are generally strongly disliked.

Examples: Demand Media in general. Or say w3schools.com for
JavaScript/CSS/HTML.

I can understand that there would likely be a deluge of legal action against
Google if this was done in a heavy handed way.

Or is it a principled thing where everyone should be treated equally? If so,
isn't that a lot of algorithm ideology? In practice, hard rules without human
judgment lead to ridiculous edge cases or bypassing the intent of the rules.

I always assumed you curated in a way via having teams that specialized in
creating topic specific search techniques which then all get combined together
via topic detection or some other meta algorithm. This would be a good balance
between manual curation and having it be a scalable approach.

------
jjclarkson
I ran across this list just now of sites that may be using SO content:
[http://meta.stackoverflow.com/questions/24611/is-it-legal-to...](http://meta.stackoverflow.com/questions/24611/is-it-legal-to-copy-stack-overflow-questions-and-answers/48962#48962)

------
badwetter
Bravo!

~~~
Matt_Cutts
I'll be curious to collect programming-related queries where we're not
returning Stack Overflow or some other site that we should. Computer science
and programming queries are easy for engineers to assess and say "Ah, here's
something we need to do better on."

~~~
dminor
What's the best way to get these to you (after this article has dropped off
the front page)?

~~~
Matt_Cutts
I'll still circle back to this page for quite a while, or you can tweet them
to me. If it's long feedback, you can blog it and tweet a link to the blog
post.

~~~
cryptoz
This comment made me chuckle. I agree all this should be discussed out in the
open, but blogging and then tweeting links to the blog post....what happened
to email!? :)

~~~
Matt_Cutts
I get way too much email already, but the main reason is that email is a poor
use of my cycles for doing support. In the 10 minutes that I would use to
reply to one person, I could make a webmaster video that 1000+ people would
benefit from, or 1/6th of a blog post that could answer other people's
questions for a year or more. Even Twitter is 1 to many instead of 1 to 1. I do
get and reply to a ton of email, but when possible I try to communicate in
ways that will help multiple people at once.

~~~
mceachen
This is the best legitimate advertisement for twitter that I've ever read.

------
MelissaLB
Will there be any penalty for sites that use Wikipedia content? Specifically,
I mean a commerce site that would NOT show up in the same search results as a
Wikipedia page but that uses some unaltered Wikipedia content in its product
descriptions.

I've seen many sites use this method of adding text to an image-heavy site, and
I have watched one particular site that uses it drop dramatically in the search
results since October 28th and again around Dec 28th.

Would you advise discontinuing the use of Wikipedia content even though the
targeted keywords differ from those of the actual Wikipedia page?

------
jmikel
Anyone else notice that even Matt's site ranks below other sites that have
copied the content after this change? Here's a search for the 3rd paragraph of
content in the original blog post:

<http://goo.gl/vVs8A>

Some sites, including this one:
[http://www.boonebank.com/brc/SBR_template.cfm?Document=headl...](http://www.boonebank.com/brc/SBR_template.cfm?Document=headlines.cfm&article=1106)

... which _links back to the original post_ (is that not supposed to
acknowledge the original source of the article?) and quotes Matt, rank higher
in my search results.

------
ironmanjakarta
If a site ranks high in a SR, isn't it because it must have good backlinks?

Shouldn't that be more important than whether it has original content or not?

Maybe an aggregator displays the content in a more useful way than the
originator so it gets linked to more than the originator.

If the content creator doesn't like his content being copied, he can take it up
with the copier. It's not Google's job to get involved in that.

Google's job is to give the searcher a list of the sites that matches his
keywords in an unbiased way. They should do that mostly on what the internet
thinks is the best site, not what Google thinks is the best site.

------
dcdan
At what point does Google cross over from "ranked by algorithm" to "ranked by
algorithm selection as editorialized by Googlers and bloggers?"

Publicly discussing algorithm changes like this seems like a potential PR
problem.

~~~
Matt_Cutts
The whole history of Google is trying to find algorithms that encode our
philosophy and mental model of what we think users want. We've been discussing
algorithmic changes with people online since 2001, when GoogleGuy would show
up on webmasterworld.com to dispel misconceptions.

~~~
dcdan
This algorithm change could come across as more editorial and less empirical.
It's publicized as highly targeted ("slightly over 2% of queries"), responding
to small tech discussion (Atwood, SO, HN, quality launch meeting), and
seemingly rolled out in a short period of time. The media could easily boil
this down to "a small number of people felt sites they liked were under-ranked,
so Google moved them up a week later."

I mention this because Google often talks publicly about being entirely
algorithmic, and elements of this narrative feel human.

~~~
nostrademons
2% of queries is _huge_. If you average that across a population (which of
course isn't how it's actually distributed...), a searcher could expect to run
across such a query every few days.
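
To put rough numbers on that averaging argument, here is a small sketch. The
2% figure is from the thread; the queries-per-day values are purely
illustrative assumptions, and the function name is made up for this example.

```python
def expected_days_between_hits(queries_per_day: float,
                               affected_fraction: float = 0.02) -> float:
    """Average number of days between affected queries for one searcher.

    Assumes affected queries are spread uniformly across all searches,
    which (as noted above) is not how they are actually distributed.
    """
    if queries_per_day <= 0:
        raise ValueError("queries_per_day must be positive")
    if not 0 < affected_fraction <= 1:
        raise ValueError("affected_fraction must be in (0, 1]")
    # Expected affected queries per day is rate * fraction; invert for days.
    return 1.0 / (queries_per_day * affected_fraction)


if __name__ == "__main__":
    # Hypothetical search volumes, not figures from the thread.
    for rate in (5, 10, 25):
        days = expected_days_between_hits(rate)
        print(f"{rate} queries/day -> one affected query every {days:.1f} days")
```

Under these assumed volumes, someone running about 10 searches a day would hit
an affected query roughly every 5 days.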

------
pilooch
One potential solution to spam and personalization of results lies in doing
the job on your machine, much like you can get rid of spam in your local
mailbox.

I'd recommend 'Seeks' (<http://www.seeks-project.info/>). It requires that you
run it on your machine or use a public node. When it runs on your machine, it
'learns' from your navigation and re-ranks the results based on this local
data. Additionally, I use regexps to remove websites I don't want to hear
about, like expertexchange.com.
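
As a rough illustration of that kind of regexp filtering (this is not Seeks's
actual configuration or API, just a standalone sketch with made-up function
names), result URLs can be matched against a list of blocked host patterns:

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist: matches the domain itself and any subdomain.
BLOCKED_HOST_PATTERNS = [
    re.compile(r"(^|\.)expertexchange\.com$", re.IGNORECASE),
]


def is_blocked(url, patterns=BLOCKED_HOST_PATTERNS):
    """Return True if the URL's host matches any blocked pattern."""
    # URLs without a scheme have no parsed hostname and are never blocked.
    host = urlparse(url).hostname or ""
    return any(p.search(host) for p in patterns)


def filter_results(urls, patterns=BLOCKED_HOST_PATTERNS):
    """Keep only the result URLs whose host is not blocked, in order."""
    return [u for u in urls if not is_blocked(u, patterns)]


if __name__ == "__main__":
    results = [
        "http://www.expertexchange.com/q/1",
        "http://stackoverflow.com/questions/1",
    ]
    print(filter_results(results))
```

Anchoring the pattern on the host rather than the whole URL avoids dropping
pages that merely mention the blocked domain in their path or query string.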

------
rgrieselhuber
This is great news, thanks very much.

------
Galaxis
My current favourite spammy search (Windows process names) seems slightly
improved, but still produces mostly unusable results.

Try something like: <http://www.google.com/search?q=hidfind.exe>

Another similar search area (Windows drivers for various hardware) has
improved a lot judging from the few examples I just tried - actual hardware
manufacturers are now amongst the top results for a change.

So that's quite a step forward...

~~~
quag
Repeating a quick search I was doing a week or so ago to find drivers, I'm not
seeing an improvement. Asus actually has a great driver site, that is way
better and less hassle than the spam sites, but I've yet to see it show up in
search results. I only found it because one of the spam sites was using asus's
servers to serve up the drivers.

[http://www.google.co.nz/search?sourceid=chrome&ie=UTF-8&...](http://www.google.co.nz/search?sourceid=chrome&ie=UTF-8&q=sis+asus+motherboard+drivers)

------
forgotAgain
When I search for "rabbitmq exchange declare" (no quotes) I noticed the
following.

Mailing list entries from [http://lists.rabbitmq.com/cgi-
bin/mailman/listinfo/rabbitmq-...](http://lists.rabbitmq.com/cgi-
bin/mailman/listinfo/rabbitmq-discuss) show up from old.nabble.com before the
original source.

From a quick look there is significant improvement.

------
tmsh
One caveat, though I imagine this has been thought of before, is that mobile
versions of sites often have the same content as full-browser versions of
sites.

So ideally, perhaps m.google.com would be able to sort through this and not
penalize the duplicated nature of the mobile version. Anyway, something to
think about if you haven't already.

------
rodh257
So does this mean that Stack Overflow could remove the first tag from the
start of the title and they would still rank above the scrapers? I find it
really distracting and often do a double take on an SO result in Google because
I've scanned the first word and seen a spammy-looking tag first.

------
PHPAdam
I run a letting agency (estate agency). I put the properties on my low-ranking
website. The content is not scraped, but it is re-published to several property
portals, some of them big names.

Who gets penalised: the higher-ranking, more frequently indexed property
portals, or my local website?

------
jacquesm
Now that's good news. Thanks!

------
JWilder
Matt, kudos! This is great news.

A question I have always pondered: what is Google's approach to more clever
forms of rehashed content that involve photoshopping or cropping an original
image? Is this something that Google looks at?

Example source photo:
([http://shoes.n-sb.org/img/thumbs/472153d5fc674fa1c685f3c7814...](http://shoes.n-sb.org/img/thumbs/472153d5fc674fa1c685f3c7814ecee9.jpg))

rehashed image: ([http://images.sneakernews.com/wp-
content/uploads/2011/01/nik...](http://images.sneakernews.com/wp-
content/uploads/2011/01/nike-sb-blazer-low-end-theory.jpg))

------
EGreg
I got 99 comments but of which this ain't one :)

------
alexsherrick
thanks matt this is awesome!

------
eurohacker
Maybe this is a good place to ask, and maybe someone can answer: what does the
term "original content" actually mean? Original in what sense?

If you have a blog about wines, what do you need to write in order to be an
"original" blog about wines? Does it mean mostly:

1) expressing original opinions about wines,
2) original sentence structure and wording, but the same opinions that 10
other sites write about,
3) original brands of wine that you talk about (original names mentioned),
4) original content in the sense that you have not copied the text 100% from
someone else,
5) a combination of the factors mentioned above, or
6) something else that makes your content original?

~~~
socialmediaking
In order to be considered "original" from the search engine perspective, it
needs to be around 30% new content. There are websites like
<http://www.copyscape.com/> that can tell you how original something appears.

Internet marketers use software to "spin" text and make it appear unique.
Sometimes the text is annoying for humans to read, but search engines eat it
up. You can take one article and turn it into 50 with the click of a couple of
buttons.

------
bkaid
While you are at it, can you go ahead and de-index this site:
<http://www.google.com/search?q=site:livestrong.com>

------
klbarry
Hi Matt, I'm sure you're busy and don't want to answer every SEO-related
question anyway. I wanted to ask, though, what Google thinks about the
importance of exact anchor text in rankings, as remarked upon in
[http://www.seomoz.org/blog/how-organized-crime-is-taking-
con...](http://www.seomoz.org/blog/how-organized-crime-is-taking-control-of-
googles-search-results)?

------
avstraliitski
Very happy to see the end of copied Wikipedia sites.

------
hananc
Please don't take my word for it.

Hit <http://duckduckgo.com> with your programming queries.

No - I am not affiliated and am not getting paid to write this.

~~~
iPadDeveloper
Then why the advertisement? This is about Google's search result algorithms.

~~~
WilliamC
Because he couldn't find a better place to spam with traffic.

------
angelbit
Hi Matt. Why doesn't Google test the results after changes to the algorithm? A
simple program could store and compare n SERP results before and after the
changes.
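
A minimal sketch of what such a before/after comparison might look like. This
is only an illustration of the idea in this comment, not a description of how
Google evaluates changes; the function name and the sample URLs are made up,
and each result list is assumed to contain unique URLs.

```python
def compare_serps(before, after):
    """Compare two ranked result lists for the same query.

    Returns a dict with:
      moved   - {url: (old_rank, new_rank)} for URLs whose rank changed
      dropped - URLs present before but missing after
      added   - URLs present after but missing before
    Ranks are 1-based positions in each list.
    """
    before_rank = {url: i for i, url in enumerate(before, start=1)}
    after_rank = {url: i for i, url in enumerate(after, start=1)}

    moved = {
        url: (before_rank[url], after_rank[url])
        for url in before
        if url in after_rank and before_rank[url] != after_rank[url]
    }
    dropped = [url for url in before if url not in after_rank]
    added = [url for url in after if url not in before_rank]
    return {"moved": moved, "dropped": dropped, "added": added}


if __name__ == "__main__":
    # Hypothetical stored results for one query, before and after a change.
    before = ["scraper.example/q1", "stackoverflow.com/q1", "other.example/a"]
    after = ["stackoverflow.com/q1", "scraper.example/q1", "new.example/b"]
    print(compare_serps(before, after))
```

Running this over the top n results for a fixed set of queries, stored before
and after a change, gives a simple diff of which sites moved up or down.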

~~~
Matt_Cutts
We do.

~~~
fzk390
I still see sites with low quality duplicate content but highly stuffed
keywords. Here's one such example
[http://www.google.com/search?hl=en&q=internet+phone+serv...](http://www.google.com/search?hl=en&q=internet+phone+service)
Check the listing for www.internetphoneguide.org Also for
[http://www.google.com/search?hl=en&q=voip+phone+service](http://www.google.com/search?hl=en&q=voip+phone+service)
see the listing for www.zimbio.com which is not an authority on the topic.
Last but not least
[http://www.google.com/search?hl=en&q=home+phone+service](http://www.google.com/search?hl=en&q=home+phone+service)
The site www.freelifelinephone.com jumped to Page 1 overnight with the latest
update.

