
Google search and search engine spam - jsm386
http://googleblog.blogspot.com/2011/01/google-search-and-search-engine-spam.html
======
mmaunder
"we’re evaluating multiple changes that should help drive spam levels even
lower, including one change that primarily affects sites that copy others’
content and sites with low levels of original content."

"As “pure webspam” has decreased over time, attention has shifted instead to
“content farms,” which are sites with shallow or low-quality content. In 2010,
we launched two major algorithmic changes focused on low-quality sites. "

Looks like 2011 is the year that Google kills the scrapers. Look for an uptick
in the sensitivity of the duplicate content penalty.

~~~
ryanwaggoner
I don't think they're talking about duplicate content so much as stuff like
Demand Media (ehow), Associated Content, Hubspot, ezinearticles, etc, etc.

~~~
seunpy
What's wrong with eHow?

~~~
bmastenbrook
This is HN, so I'll give an example related to my startup. We make a wireless
(802.11g) flash drive called the AirStash, which works like an ordinary USB SD
card reader on a PC and uses HTML5 for the interface on wireless devices.
Here's an example of spam content from eHow that talks about our product:

[http://www.ehow.com/how_6861903_install-wireless-flash-drive...](http://www.ehow.com/how_6861903_install-wireless-flash-drive.html)

"Wireless flash drives communicate with wireless devices using wireless
protocols."

You don't say?

"Advanced wireless flash drives stream data to more than one device at a
time."

Well, we're the only one out there, so I guess they're all advanced!

"Insert the USB portion of your wireless flash drive into an available USB
port on your computer. Your computer should automatically recognize the
device. If not, click the "Start" button and then click "My Computer." Double
click the flash drive in the removable media section. This opens the drive and
displays the files. Double click the executable file (start.exe, for example)
to start the installation process."

Uh, no. No software installation is ever required, a point we make abundantly
clear on our web site.

Basically, it's all wrong.

~~~
cosgroveb
I just googled the AirStash and it looks awesome!

------
eps
Metrics-shmetrics. Once I stop seeing StackOverflow clones listed above
StackOverflow's original pages I will gladly believe that Google's search
quality is "better than ever before."

~~~
Matt_Cutts
I've been tracking how often this happens over the last month. It's gotten
much, much better, and one additional algorithmic change coming soon should
help even more.

I'm not saying that a clone will never be listed above SO, but it definitely
happens less often compared to several weeks ago.

~~~
yellowbkpk
My experience is the exact opposite: I am seeing many, many more clone sites
in my search results in the last few months. It feels like it increases when I
accidentally click a clone site.

This happens for more than StackOverflow clones. Mailing lists, Linux
man-pages, FAQs, published Linux articles, etc. all have clone pages that are
obvious link farms (sometimes they even include ads that attempt to harm my
computer) that rank higher than the "official" (or at least less-noisy)
pages.

Ideally, I'd like to be able to completely remove domains from results, as
has been discussed elsewhere on HN. Hopefully this upcoming push for social
networking that Google has will reintroduce a better-implemented "SearchWiki"
feature...

~~~
lincolnq
Try DuckDuckGo. Gabriel has been aggressive about removing unsavory domains
and I've been fairly impressed. I think that Google probably can't be nearly
as aggressive for political reasons.

~~~
Sephr
The reason Google isn't doing the same thing as DuckDuckGo is most likely
because manually banning a domain instead of improving their algorithms to
avoid unwanted behaviors will only temporarily work, and only in select cases.
There will always be new spam and content farm sites.

~~~
hessenwolf
There seem to be a small number of large content farms (perhaps suggesting
economies of scale are pretty important). In this case, manually killing them
will work well for Herr Weinberg.

~~~
greglindahl
Over at blekko, we leave in a few large but marginal sites like eHow, and let
users kill them with their personal spam slashtags. For smaller spam websites,
we can frequently use Adsense IDs to kill them in groups.
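
The grouping trick described above could be sketched roughly as below. This
is only an illustration of the idea, not blekko's actual pipeline; the regex,
publisher IDs, and domain names are all made up:

```javascript
// Group domains by the AdSense publisher ID found in their pages, so a
// whole network of spam sites sharing one monetization account can be
// acted on at once.
const PUB_ID_RE = /pub-\d{16}/;

function groupByPubId(pages) {
  // pages: array of { domain, html }
  const groups = new Map();
  for (const { domain, html } of pages) {
    const m = html.match(PUB_ID_RE);
    if (!m) continue; // page carries no AdSense ID; nothing to group on
    const id = m[0];
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(domain);
  }
  return groups;
}
```

Any group with many low-quality domains behind a single ID is then a
candidate for a bulk action.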

------
abrahamsen
Google have two strong incentives to weed out AdSense drivel sites in the
search results.

1. They diminish the value of Google Search as an advertising platform. And
Google Search is likely the most valuable virtual estate on the net. I more
often click on ads in Google Search than I click on ads on all other sites
combined. This is because when I'm on Google Search I'm actually searching for
something, so I might click on a relevant ad.

2. They diminish the value of AdWords content network ads. People pay Google
to display their ads because they believe they get better return for their
money there than on the alternatives (Yahoo and Microsoft). Ads on low quality
sites are unlikely to be competitive, so these sites decrease the relative
value of AdWords.

That is, high-ranked low quality sites with AdSense are a double threat to the
main source of income for Google, and I expect Google to make it their main
priority.

Why, then, aren't they more successful? My guess: because the problem is a
lot harder than any armchair designer would believe. Problems tend to seem a
lot simpler when you are not the one who must solve them.

~~~
rfergie
I agree with you on point 1; if Google's search quality is not the best, then
people will (eventually) go elsewhere.

I disagree on point 2. Users on low-quality AdSense sites almost certainly
arrived there from a search engine, so if I can display my adverts on a
user's landing page, it will be almost as good as if they had arrived on my
site straight from Google.

~~~
storborg
In fact, AdSense ads on sites with shitty content are even more likely to be
effective, because users won't find what they're looking for in the page
content.

~~~
zaidf
Bingo. When our indie music site's stream server would go down, our AdSense
CTRs would skyrocket. The people who come from Google go apeshit and start
clicking anything--especially AdSense units--when the content they were
expecting isn't found or isn't functional.

------
dpapathanasiou
And yet Mahalo is still tolerated, somehow.

E.g., this query -- "travel agent vermont" (which I got from this post
complaining about Mahalo spamming the web and Google not enforcing its own qc
standards
[http://smackdown.blogsblogsblogs.com/2010/03/08/mahalo-com-m...](http://smackdown.blogsblogsblogs.com/2010/03/08/mahalo-com-meet-the-new-spam-worse-than-the-old-spam/))
_still_ returns a Mahalo result in the top 10.

~~~
Matt_Cutts
Google has taken action on Mahalo before and has removed plenty of pages from
Mahalo that violated our guidelines in the past. Just because we tend not to
discuss specific companies doesn't mean that we've given them any sort of free
pass.

~~~
dpifke
On a similar note, how is the expert sex change site still in your index?
They very clearly are serving different content to the crawler (as evidenced
by the "cached" link) than they are to people who click through on the SERPs.
I thought this was a big no-no?

For an example (which was submitted as search feedback a month ago), try
searching for "XMPP load balancing" and look at the third organic link.

(Edit: actually, in that case it appears they're using JavaScript to hide the
indexed content. Same effect, however: the cache link shows the "solution" but
clicking the search result displays an ad.)

~~~
fname
While I'm not a fan of that site either, it's not true -- scroll to the
bottom of the page. Sneaky? Absolutely, but the content and solution are there.

~~~
roc
It's _no longer_ true.

Short version is: they used to, and got busted for, serving answers to the
spiders and ads and pitches to the surfers. So _now_ they show the answer at
the bottom of a pile of ads and pitches.

But they still suck. Horribly. And are the number one example I hear when
people say "I wish Google would let me blacklist domains".

~~~
dhruvbird
I can't believe no one has created a FF plugin for expert sex change (yet)!!

Edit: Even a GM script to remove all the leading spammy divs would do...
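
A Greasemonkey-style script along those lines might look like the sketch
below. The class-name hints are hypothetical placeholders, since the site's
real markup changes over time; the decision logic is kept as a pure function
so it can be checked outside a browser:

```javascript
// Hypothetical class-name fragments that mark the leading ad/pitch blocks.
const SPAM_CLASS_HINTS = ['promo', 'upsell', 'answerTeaser'];

// Pure decision function: an "element" here is any object with a
// className string, so the logic works on real DOM nodes or plain objects.
function isSpamBlock(el) {
  const cls = (el.className || '').toLowerCase();
  return SPAM_CLASS_HINTS.some((hint) => cls.includes(hint.toLowerCase()));
}

// In an actual userscript you would wire it up roughly like this:
// document.querySelectorAll('div').forEach((div) => {
//   if (isSpamBlock(div)) div.style.display = 'none';
// });
```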

~~~
lamnk
FYI:

On Chrome you have Search Engine Blacklist
[https://chrome.google.com/extensions/detail/jiicbcimbjppjbck...](https://chrome.google.com/extensions/detail/jiicbcimbjppjbckmoknagndlhjbeohb)

On Firefox you can use the filter option of Optimize Google
[https://addons.mozilla.org/en-US/firefox/addon/optimizegoogl...](https://addons.mozilla.org/en-US/firefox/addon/optimizegoogle/)

------
noibl
_One misconception that we’ve seen in the last few weeks is the idea that
Google doesn’t take as strong action on spammy content in our index if those
sites are serving Google ads._

That's not quite what I've been reading. I believe the more common claim is
that Google has a disincentive to algorithmically weed out the kind of drivel
that exists for no other reason than to make its publisher money via AdSense.
It's about aggregate effects, not failure to clamp down on individual sites.
Or, put another way, it's not _if_ certain sites are serving Google ads, it's
_because_ that kind of content is usually associated with AdSense.

AdSense is definitely a problem for search quality. It creates the same
imperative for the content farm as Google Search has: get the user to click
off the page as soon as possible. And the easiest way to do that is to create
high-ranking but unsatisfying content with lots of ad links mixed in.

~~~
trevelyan
I agree. Also interesting to see that Google defines webspam as "pages that
cheat" or "violate search engine quality guidelines." By this definition,
scraper sites are not spam at all. Nor are the spammy sites in my field which
super-optimize for keywords in ways that make it difficult for legitimate
content to rise to visibility.

If Google did not operate AdSense, it seems hard to believe the company would
not have penalized this sort of behavior ages ago. A love for AdSense is
probably the single largest thing spam sites have in common worldwide.

~~~
Matt_Cutts
"By this definition, scraper sites are not spam at all."

Disagree. Our quality guidelines at
[http://www.google.com/support/webmasters/bin/answer.py?hl=en...](http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769)
say "Don't create multiple pages, subdomains, or domains with substantially
duplicate content." Duplicate content can be content copied within the site
itself or copied from other sites.

Stack Overflow is a bit of a weird case, by the way, because their content
license allowed anyone to copy their content. If they didn't have that
license, we could consider the clones of SO to be scraper sites that clearly
violate our guidelines.

------
jonpaul
You know, sometimes it really makes me mad that it's difficult to get in
contact with a person at Google for support of their products. But I've got
to hand it to them: they could have issued an impersonal public statement
like most companies and signed it "Google Search Team". Instead, there is a
personal touch. It's public statements like these, with just a bit of
personal touch, that make people love them. My 2 cents. Entrepreneurs take
note.

~~~
jsherry
The reason you can't get in contact with people at Google is that you're not
the customer; paying businesses are the customers. Not just trying to sound
snarky - it's true.

~~~
Matt_Cutts
No, the reason why it's difficult to contact people at Google is that 1
billion+ users visit us each week, and we only have ~20,000 Google employees.
Even if every single employee did nothing but user support 24/7, each Google
employee would need to do tech support for 50,000 users apiece.

Likewise, there are 200,000,000+ domain names. Even if every single employee
did nothing but webmaster support 24/7, each Google employee would need to do
tech support for 10,000 domains apiece. The same argument goes for supporting
hundreds of thousands of advertisers.

The problem of user, customer, and advertiser support at web-wide levels is
very hard. That's why we've looked for scalable solutions like blogging and
videos. I've made 300+ videos that have gotten 3.5M views:
<http://www.youtube.com/user/GoogleWebmasterHelp> for example. There's no way
I could talk to that many webmasters personally.

So we haven't found a way to do 1:1 conversation for everyone that has a
question about Google. That's not even raising the back-and-forth that some
people want to have with Google. See
[http://www.google.com/support/forum/p/Webmasters/thread?tid=...](http://www.google.com/support/forum/p/Webmasters/thread?tid=5d498a633ec07950&hl=en)
and
[http://www.google.com/support/forum/p/Webmasters/thread?fid=...](http://www.google.com/support/forum/p/Webmasters/thread?fid=21e50ed1333526fc00049a55576e8089&hl=en)
to get a glimpse at the sort of prolonged conversations that people want to
have with Google. In short: it's a hard problem.

~~~
antirez
Of course not everybody should be able to contact Google 1:1, but at least all
the people that were subject to an action that required human intervention
from Google.

Example: my AdSense account or site gets banned in a non-automated way
because there is some problem with the content: that is, not by an algorithm,
but because somebody looked at my site.

I should, in that case, have a chance to communicate with Google. This is
inherently scalable, since everything started with a 1:1 action.

~~~
jlees
You do, though. I recently had my AdWords account suspended and got to talk to
a human at Google to handle the issue (through normal channels). For Chrome
OS, we have an actual call center with real people sitting in it.

~~~
antirez
That's very good. This way it is balanced: nobody with minimal business sense
can expect Google to reply 1:1 to every user who happens not to find what
they want in the search engine.

------
dustingetz
Google has taken a lot of criticism on HN and elsewhere for an apparent
perverse incentive: directing searchers to content farms running AdSense ads
instead of to the original source (like StackOverflow or Amazon reviews).

I'm skeptical, because spammy-ad clickthrough rates are already low and
trending lower, and I speculate google has great incentive to send people
where they want to go lest their competitors get stronger.

~~~
Matt_Cutts
Google also has a decade+ of track record of choosing the right long-term
thing for users instead of short-term revenue: 1) not running annoying punch-
the-monkey banner ads in the early days when everyone else was doing it. 2)
not running pop-up ads when everyone else was doing it. 3) little-known fact:
if the Google ads aren't ready by the time your search results are ready, you
just don't see ads for that query. We don't delay your search results in order
to show you ads.

It just wouldn't make sense for Google to suddenly abandon that (very
successful) strategy and say "let's keep spammy/low-quality sites around and
send users there because we make money off the ads." We make more money long-
term when our users get great results. That makes them more happy and thus
more loyal.

~~~
ryanwaggoner
Regarding #3, why are these two things coupled in terms of page load? With all
the js stuff you guys are doing now with instant, it seems like you could load
up the ads on the right a split second after the search results and no one
would notice or care...but I'm guessing you've tried this and it didn't test
as well :)

~~~
Matt_Cutts
Great question. The "don't wait for ads" policy has been around since ~2001,
way before AJAX became common. In theory you could make it so that the ads
loaded when they were ready, but that could also generate a visual "pop" that
I imagine would annoy many users.

My preference is just to enforce a hard time deadline. If the ads team starts
to miss that deadline and revenue decreases, then they're highly motivated to
speed their system up. :)

------
snewman
To me, the most interesting aspect of this situation is the conflict between
Google's view and the blogosphere's view. On the one hand, "...according to
the evaluation metrics that we’ve refined over more than a decade, Google’s
search quality is better than it has ever been...". On the other hand, you
can't open an RSS reader today without tripping over someone griping about
content farms polluting the search results. There are intelligent, thoughtful
people on both sides of the debate. Why such disparate viewpoints?

As Matt's post suggests, it could simply be that people's expectations are
rising -- search results are getting so good in general (which they are) that
we notice the problems more. Or it could be that Google is focused on a narrow
definition of "spam" that doesn't cover content farms. It could even be that
both sides are "right" -- that overall search quality is rising even as the
content farm problem worsens, if Google has been successfully reducing other
causes of low search quality.

I'd love to see some hard analysis of this. For instance, pick a reasonably
large set of sample queries, and show what the results looked like five years
ago versus what they look like today. Of course, you'd first have to find a
set of sample queries and results from five years ago.

~~~
Matt_Cutts
We do have some data on this. I'll ask about whether we can do some
comparisons of "Google today" vs. "Google three years ago."

------
bmastenbrook
Beyond the flood of SEO spam and Demand Media-style content mills, there's
another search quality problem I have with Google: torrent sites. I will
frequently search on the exact name of a song or album on Google in order to
find out more information about that song or album, but lately most of the
results have been links to torrents, including results on the first page. This
applies even if I add "review" to the search query. I will even see links to
torrents ranking above links to iTunes.

These songs and albums are not available legitimately through torrents. What
value is there in providing links to pirated content? I understand that Google
is not under any legal obligation to remove these results, but as a non-pirate
these results are significantly lowering my perception of the quality of
Google's search results.

~~~
klbarry
Google tries to make no judgments about the sites that rise to the top of its
rankings.

~~~
bmastenbrook
Isn't this the whole problem? 90% of the web is crap. If Google can't deliver
the non-crap, I'll be looking elsewhere.

------
theoretical
I'd be interested in a further explanation of "Google absolutely takes action
on sites that violate our quality guidelines [...]".

Does that mean that Google manually decrease rankings of spammy sites that
their algorithms haven't caught? Does this entail decreasing the rank of the
entire domain, the IP? Does blacklisting ever happen?

I ask since Google have previously[1] said they don't wish to manually
interfere with search results.

[1] "The second reason we have a principle against manually adjusting our
results is that often a broken query is just a symptom of a potential
improvement to be made to our ranking algorithm" -
[http://googleblog.blogspot.com/2008/07/introduction-to-googl...](http://googleblog.blogspot.com/2008/07/introduction-to-google-ranking.html)

~~~
Matt_Cutts
"Does that mean that Google manually decrease rankings of spammy sites that
their algorithms haven't caught?"

Although our first instinct is to look for an algorithmic solution, yes, we
can. In the blog post you mentioned, it says

"I should add, however, that there are clear written policies for websites
recommended by Google, and we do take action on sites that are in violation of
our policies or for a small number of other reasons (e.g. legal requirements,
child porn, viruses/malware, etc)."

As the quote mentions, we do reserve the right to take action on sites that
violate our quality guidelines. The guidelines are here, by the way:
[http://www.google.com/support/webmasters/bin/answer.py?hl=en...](http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769)

------
powrtoch
Am I the only one who was really hoping for some specifics about what they're
doing and plan to do about content farm rankings? Without that, the article is
virtually devoid of content other than "we're really not so bad!"

Edit: By specifics, I don't necessarily mean implementation details, just
anything more informative and plan-of-action than acknowledging the problem.

~~~
Matt_Cutts
Our policy in search-quality is not to pre-announce things, but we did give
some pretty strong hints about planned improvements to search quality in that
post (e.g. talking about scraper sites). I'll be happy to talk more about them
soon when they launch.

~~~
jeremycolins
Hey Matt, is there anything y'all can do about the content farm sites where
someone buys an old high-PR domain, sells hundreds of links on it, and drops
them in between tons of content? Here's a perfect example of that:
<http://www.dcphpconference.com/>.

~~~
Matt_Cutts
That is clearly webspam. We only indexed five pages from that site, but
there's no reason for that site to show up at all. Thanks for mentioning it.
It won't be in our index for much longer.

~~~
jeremycolins
Hey Matt, thanks for the response. I actually have a huge list of these types
of sites that I've submitted through the spam report, and 90% of them are
still indexed and have PageRank. Is there another way I can send them in?

------
ericb
While I applaud the direct personal response, I feel like the content says "we
don't see a problem." If users see a problem and you don't, smaller
competitors can eat your lunch. I'm kind of hoping for some competition in the
field.

In terms of AdSense, if you really think about it, AdSense content on a page
should probably be a slightly negative ranking signal (not just not a
positive signal). The very best quality pages have no ads. Think of
government pages, nonprofits, .edu sites, quality personal blogs, etc. If no
one is making money off a page (no ads), then whatever purpose it has, it is
likely to be non-spammy.

~~~
moultano
_I feel like the content says "we don't see a problem."_

We see a virtually unbounded number of problems with our search results, and
we're working constantly to fix them. Most of the people I talk to who work on
search have the attitude that Google is horribly broken all the time, it's
just also measurably the best thing available.

Google as a company, and search quality in particular, does not rest on its
laurels. The people who hate Google's search results the most all work at
Google. If you think you hate Google's search results as much as we do, you
should come work for us. :)

------
jpalomaki
I believe it is very hard to implement algorithms that can tell the
difference between stackoverflow.com and a rip-off, or between a legitimate
Apache mailing list archive and a rip-off.

Why not allow the community to sort this out? "Google Custom Search" already
exists. Google could extend it in the direction of letting people customize
Google search to exclude certain sites from the results (right now it is only
possible to specify a list of sites to include in the search).

Blacklists for at least specific "fields of searching" would emerge very
quickly. People could select what blacklists to use, if any.
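
The exclusion idea could be as simple as post-filtering result URLs against a
user-maintained domain blacklist. A minimal sketch (the domain names are
invented for illustration):

```javascript
// Filter a list of result URLs against a user-maintained domain blacklist.
// Subdomains of a blacklisted domain are excluded too.
function hostOf(url) {
  return new URL(url).hostname.toLowerCase();
}

function filterResults(urls, blacklist) {
  const banned = blacklist.map((d) => d.toLowerCase());
  return urls.filter((url) => {
    const host = hostOf(url);
    // Drop the result if its host is a banned domain or a subdomain of one.
    return !banned.some((d) => host === d || host.endsWith('.' + d));
  });
}
```

Shared blacklists for specific fields of searching would then just be lists
of domains that users opt into.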

------
mmaunder
As a pointer: Matt_Cutts is the head of the webspam team at Google. He's been
very active on this thread. Please search below for his posts.

------
eitland
There are a few signals that should be possible to pick up. Examples:

- When power searchers start adding -somedomain.xyz to their searches

- Increase spam reporting by adding some kind of feedback to the spam
reporting feature. I think I'd love to get an automated mail saying something
like: "The site somespamdomain.xyz that you and others reported x days ago is
now handled by our improved algorithms". Submitting spam reports really
doesn't feel useful when it seems like nothing ever happens.

- Adding weight to spam reports. You know a lot about us, and I guess you can
identify who the power searchers are. This could help stop people from gaming
the system into blocking competitors.
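
The first signal above could be approximated by scanning query logs for
manual domain exclusions. A sketch; the token heuristic is an assumption, not
how Google's query parser actually works:

```javascript
// Extract "-somedomain.xyz" style exclusion tokens from a search query.
// A token counts as a domain exclusion if it starts with "-" and the rest
// contains a dot (a rough heuristic).
function excludedDomains(query) {
  return query
    .split(/\s+/)
    .filter((t) => t.startsWith('-') && t.slice(1).includes('.'))
    .map((t) => t.slice(1).toLowerCase());
}

// Tally exclusions across a log of queries; a rising count for one domain
// is the "power searcher" signal the comment describes.
function tallyExclusions(queries) {
  const counts = new Map();
  for (const q of queries) {
    for (const d of excludedDomains(q)) {
      counts.set(d, (counts.get(d) || 0) + 1);
    }
  }
  return counts;
}
```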

------
jmount
Google AdSense for Domains ( <http://www.google.com/domainpark/> ) gives the
lie to not wanting useless content. They designed a revenue source for
parkers/squatters.

~~~
drm237
How often have you searched Google and been sent to one of those pages?

They don't want useless content in their serps. If someone goes directly to
the domain, that's in no way related to search quality.

~~~
jmount
It is very related. Google made spam sites and splogs profitable (through
their various ad networks). Therefore they are the major funder of spam
sites -- even if they themselves do not promote them.

------
jonknee
The timing of this is interesting... About a week before Demand Media's IPO.
Must be a bad day for the investment bankers.

------
bitskits
I think there might be something else at work here: our rising expectations of
how search engines should work.

In years past, Google's results were measurably less relevant than they are
today. In the time between "then" and "now", we've grown accustomed to
high-quality, fast, relevant results. I think this makes small problems in
search seem bigger than they are.

It would be great if there were a "Google of 2004" to test this side by side,
but I don't think that is possible. :)

------
mwilton13
@Matt I like that you pinpointed some specifics here, but is the algorithm
going to be strong enough to easily pick up things like this:
<http://posterous.com/people/YrCushFlSet> This single account alone is
feeding 100 different websites that all feed hundreds of others. This is
helping fuel numerous sites in the plastic surgery niche, and it's quite
disturbing.

~~~
Matt_Cutts
mwilton13, I passed this on to my team when you mentioned it to me on Twitter
recently. Have you mentioned the site to Posterous too?

~~~
mwilton13
Thank you Matt. I am following up with Posterous and the other sites involved
on Monday. We reported one of the content farms on blogger last year too.

------
gojomo
Please, exile half or more of Demand Media's pages from the index _before_
their imminent IPO!

------
tmsh
I'm just impressed with how this was handled. Consider how much more
technocratic this is than a news release from ten years ago.

Google issues an announcement via blog post. TC and others start to pick it
up. And the original author of the blog post takes questions and provides
technical answers, where allowed, in HN.

------
vannevar
Google cannot escape their fundamental conflict of interest: they make money
by selling web traffic to advertisers, then buying that traffic back at
discounted rates and re-selling it again, over and over. None of their revenue
comes directly from search, though search is their primary source of raw
traffic. Their search results don't have to be good, they just have to be good
enough to sustain traffic. And right now there are so many people who
reflexively use Google out of habit that their results could deteriorate
substantially (and many would argue already have) before it impacts their
shell-game revenue stream.

------
nowarninglabel
Curious, though, what metrics they use to evaluate effectiveness against
spam. Google could have just as much (or as little) spam indexed as it had 5
years ago, and in some comparisons that would be valid, but what if much of
the spam had moved from being evenly distributed throughout the results to
being concentrated in the top positions? Then one could say spam was lower
than ever in total quantity, but it would be even worse in terms of user
experience.

That said, I agree with Google, users' expectations have skyrocketed, and it
is tough to keep pace with them.

~~~
Matt_Cutts
I actually feel quite comfortable with our metrics. Back in 2003 or so, we had
pretty primitive measures of webspam levels. But the case that you're
wondering about (more spam, but in different positions) wouldn't slip past the
current metrics.

~~~
nolok
How do you interpret the recent backlash from users? In your eyes, have we
become more used to "perfect" results, or are the fewer bad results left more
insidious and thus more harmful (despite the overall level of quality being
higher)?

Personally I tend to find what I'm looking for by adding a few more words, but
in the case of reviews and tech stuff it doesn't always work and I often have
to rewrite my query one or more times to get something valuable.

~~~
CWuestefeld
_in the case of reviews and tech stuff it doesn't always work_

This is one of my pet peeves. If I search for "product X review", most of the
results I get back are of the form "be the first to review product X", which
is absolutely not what I want.

------
mssfldt
Spam shows up not only in organic search but also in image search. I observed
a site that steals 140,000 (!) images by hotlinking (including some of my
pictures). At first the pages themselves seemed "clean". They only set
hotlinks to blended-search images, and they got a lot of traffic, that's for
sure. Then they switched the site: at the top there are two porn ads (it was
xslt*us). I wrote a spam report and posted it in a webmaster forum, but it
took about 10 days until the site was removed. Hope this gets better... And:
hotlinking is a big problem.

------
zone411
First, I haven't noticed any significant increase in the frequency of spam
sites appearing in Google search results. The biggest problem that I did
notice is social bookmarking sites, like reddit and digg, outranking original
content. They often have nothing more than a copied-and-pasted paragraph,
sometimes supplemented by low-quality comments (as is common with these types
of sites). Since this site is very similar, not many people here will be
concerned about this issue.

The second biggest issue is poor Wikipedia articles appearing in the top
results for almost any reference-type query. Many less frequently updated
Wikipedia articles are nothing but regurgitated content lifted from other
quality sources. What makes it worse is that Wikipedia uses nofollow for its
links, so even if those sources are linked in the reference section, they
won't get any credit. It's interesting to see so many people complain about
low-quality content on commercial sites but never mention Wikipedia, which is
a much bigger offender (I guess this might be because Wikipedia gets its
content for free and doesn't have ads, while the other sites pay for their
content and do have ads).

Third, I hope Google doesn't make any changes without checking very carefully
that good sites will not be negatively affected. For example, newspapers will
often have the exact same articles from the AP, but also original content
based on their own reporting. Punishing them for having duplicate content
would not work well. There are many similar possible pitfalls.

------
beefman
"The short answer is that according to the evaluation metrics that we’ve
refined over more than a decade, Google’s search quality is better than it has
ever been in terms of relevance, freshness and comprehensiveness."

The long answer is that without Wikipedia results, Google's search quality
would be at an all-time low in terms of relevance, freshness and
comprehensiveness.

------
kellysutton
When a company needs to write a blog post in this tone, they are definitely
losing ground.

What you are saying != how you are performing.

------
darksaga
This is just lip service. Google's quality has dropped off and it's really
obvious. Recently, I've been regularly comparing the results I get from Google
with the ones I get from Bing. Needless to say, Bing's are far more relevant.
The biggest points people have made are about money searches like "MP3
player," but the searches I've been comparing are local queries and
programming questions like "show/hide text boxes in JavaScript." In Google all
I get is links to Amazon and other random results to link farms like
javascriptworld.com. In Bing, I get links to forums and tutorials, which is
what I'm looking for.
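The kind of answer such a tutorial query is after is only a few lines of DOM
scripting. A minimal sketch, assuming a hypothetical element with id "myBox":

```javascript
// Toggle an element's visibility by flipping its CSS `display` property.
// Works on any object exposing `style.display` (a DOM element in a browser).
function toggleVisibility(el) {
  el.style.display = el.style.display === "none" ? "" : "none";
  return el.style.display;
}

// In a browser you would call:
//   toggleVisibility(document.getElementById("myBox"));
```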

Time and time again Google has failed. I've already moved on to Bing and
DuckDuckGo, and I would recommend you do too, unless you like digging through
hordes of useless SERPs.

~~~
dori
Consider that possibly the problem isn't Google; perhaps it's you.

For instance, <http://www.javascriptworld.com> (which I run) is NOT a link
farm. Rather, it's there to support the book "JavaScript & Ajax for the Web:
Visual QuickStart Guide, 7th edition" (by Tom Negrino and myself).

Numerous college & university courses use our book as a required textbook.
Consequently, many teachers & professors link to it from course websites to
let their students know where to download the book's code examples. Yes, that
probably helps the site's Google rank, but that was never one of our goals.

As a result, when you search for certain JavaScript examples, you may run
across my site. And in your particular case, my site doesn't help because you
don't own our book. But just because it doesn't solve your particular problem
doesn't mean that the site isn't useful for other people.

I'm not sure why you thought it was a link farm. Perhaps you should have taken
a closer look at the site before using it as a bad example?

------
paul9290
A recent Google search for the Walmart being built 2 miles away from me led me
nowhere. Google listed two pages of job sites with open positions at this
Walmart. I almost gave up my search, but decided to search Twitter, and in
doing so I found what I was looking for!

~~~
Matt_Cutts
I'd be curious to know where the Walmart is and what searches you tried.

~~~
paul9290
On Jan. 15th I was curious whether the Walmart was going to be a supercenter,
so I searched "fallston md walmart" and "will fallston md walmart be a
supercenter." Neither query showed the most relevant and recent info; instead
the results were littered with prominent job sites. On the search for "will
fallston md walmart be a supercenter," I do see that the sixth result is
[http://www.belairnewsandviews.com/2011/01/job-ads-out-
this-w...](http://www.belairnewsandviews.com/2011/01/job-ads-out-this-week-
for-fallston-walmart-indicate-it-will-be-a-supercenter-with-groceries.html),
but that information was weak: it merely deduces from all the job listings
that yes, this will be a Super Walmart. Since I found the results lacking, I
immediately went to Twitter and found an even more detailed article
([http://belair.patch.com/articles/fallston-walmart-may-
open-b...](http://belair.patch.com/articles/fallston-walmart-may-open-by-mid-
february#photo-4446601)) with pictures that I think should have appeared in
one of my queries. As you can see, that article was published Jan. 14th, yet
Google did not point me to the most recent, up-to-date info; it favored
prominent job sites' listings, while Twitter did not.

Pardon if this seems nitpicky, but I just wanted to share one of my recent
experiences where Google failed me while Twitter did not (I have another
example too, but that search was personal).

~~~
Matt_Cutts
This is interesting. Part of the problem is answering questions when there's
not much content on the web. I think a lot of the job sites showed up because
Walmart is hiring for that location in preparation for opening.

I'm just now testing this query, but when I did [fallston md walmart], there
is a really comprehensive article at
[http://www.exploreharford.com/news/3074/work-start-
fallston-...](http://www.exploreharford.com/news/3074/work-start-fallston-
walmart/) that shows up at #5. That article mentions that "The new road will
lead into a 147,465-square-foot Walmart, which has been planned with a
possible 57,628-square-foot expansion." Then I did the search [walmart
supercenter size square feet]. The #1 and #2 results are both pretty good,
e.g. the page at <http://walmartstores.com/aboutus/7606.aspx> says that an
average store is 108K square feet, and supercenters average 185K square feet.

As a human, I can deduce that 147K square feet (with 57K square feet of
potential expansion) implies that it will probably be a supercenter, but it's
practically impossible for a search engine to figure out this sort of multi-
step computation right now. My guess is that in a couple months when the store
opens up, this information will be more findable on the web--for example,
somewhere on Walmart's web site. But for now, this is a really hard search
because there's just not much info on the web.
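That multi-step deduction is trivial to express in code once the facts have
been extracted. A sketch using the figures quoted above (the nearest-average
classification rule is just an illustrative heuristic, not anything Google
does):

```javascript
// Averages quoted from walmartstores.com: ~108K sq ft for a regular
// discount store, ~185K sq ft for a supercenter.
const AVG_DISCOUNT_STORE = 108000;
const AVG_SUPERCENTER = 185000;

// Planned Fallston footprint: 147,465 sq ft, with a possible
// 57,628 sq ft expansion.
const plannedSqFt = 147465 + 57628; // 205,093 sq ft fully built out

// Classify by whichever published average the footprint is closer to.
function likelyFormat(sqft) {
  return Math.abs(sqft - AVG_SUPERCENTER) < Math.abs(sqft - AVG_DISCOUNT_STORE)
    ? "supercenter"
    : "discount store";
}

// likelyFormat(plannedSqFt) → "supercenter"
```

The hard part for a search engine is not the arithmetic but extracting those
facts from two different pages and realizing they answer the question.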

I appreciate you sharing this search. I think it's a pretty fair illustration
of how people's expectations of Google have grown exponentially over the
years. :)

------
Benvie
For the love of god, just give us the tools for effective personal blacklists.
With Google's constant changes to the search site and the difficulty of
efficiently and effectively filtering live search results via browser
extensions, it's been hit or miss at best. Whether that comes in the form of
an API that can be tapped to build a good extension or having it built into
the browser, I don't care.
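A minimal sketch of the kind of filter such an extension would need; the
domain list here is hypothetical:

```javascript
// A hypothetical personal blacklist of domains to hide from search results.
const blacklist = new Set(["example-content-farm.com", "scraper.example"]);

// Check a result URL's hostname (ignoring a leading "www.") against the list.
function isBlacklisted(url) {
  try {
    const host = new URL(url).hostname.replace(/^www\./, "");
    return blacklist.has(host);
  } catch (e) {
    return false; // malformed URL: let it through
  }
}

// A browser-extension content script could then hide matching results:
//   document.querySelectorAll("a").forEach(a => {
//     if (isBlacklisted(a.href)) a.closest("li")?.remove();
//   });
```

The hard part, as the comment notes, is keeping the DOM selectors working as
Google keeps changing the result-page markup, which is why an official API
would be preferable.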

Google, of all companies, I would have thought would understand and respect
the importance of giving people power over their own technology experiences.

------
spiffworks
I can't help but think that this is almost an exact parallel of the iPhone
antenna problem. Both companies had minor to medium problems in their flagship
products, both problems were vastly overblown by the media, both problems
spurred an unbelievable spate of batshit conspiracy theories, and to take the
cake both companies responded with the same "This is not a problem, but here's
the solution." Good problems to have.

------
antirez
What sounds odd about all this is that I suspect the spam sites and "content
farms" generate a lot of ad clicks for Google. Will they really take
appropriate action against these sites if it means a significant cut in their
earnings?

I played with AdSense a lot in the past, and if you did too you should know
how spam sites generate a lot more clicks than sites where the user is
actually focused on reading content...

------
tomotomo
Why don't we collaboratively blacklist or push down domains from our Google
results? This could be a stopgap measure until Google incorporates such a
feature including the collaborative database or magically puts an end to spam.

A proposal: [http://www.saigonist.com/content/google-spam-content-farm-
fi...](http://www.saigonist.com/content/google-spam-content-farm-filter)

------
jacoblyles
In other industries, the government mandates a "Chinese Wall" (no
communication or shared personnel) between segments of a company that have
conflicts of interest. In this case, search and advertising qualify. I expect
to see proposals for this kind of regulation in the next 10 years as the FCC
begins regulating the internet.

~~~
Klinky
Good luck, see the Comcast & NBC merger. The FCC is impotent.

------
PaulHoule
This reads just like the nutrition label on Nestle products: the ones that
boast that the product has "3 vitamins AND minerals", and, on a distant part
of the package, lists two minerals and one vitamin and what the health
benefits of them are.

They'd be a lot more credible without the corpo-speak junk in the first
paragraph.

~~~
Matt_Cutts
I've got the results of a test run of search results I did in October 2000
sitting on my computer. The Google search results from today are much better
than from October 2000. I do think it's fair to make the point that overall
Google is much better--and overall spam is much lower than 5 or 10 years ago--
before we drilled down into responding to the blog posts from the last couple
months.

~~~
PaulHoule
Matt,

I'm sure that Google does better than it did in October 2000. There'd be
something horribly wrong if that wasn't the case. That's not relevant to the
specific concerns people have about what's happening here and now.

~~~
Matt_Cutts
Sorry, I was responding to the "They'd be a lot more credible without the
corpo-speak junk in the first paragraph." by trying to defend that the
statements in the first paragraph were true.

Regarding the specific concerns that people have been raising in the last
couple months, I think the blog post tried to acknowledge a recent uptick in
spam and then describe some ways we plan to tackle the issue.

------
mbesto
What are people's thoughts on companies that do create content farms? From the
perspective of being a successful company, rather than "I hate the spam and I
hope they all DIAF."

Personally, I think any type of "scheming" in technology will eventually get
caught, and then all of a sudden there goes your business model.

~~~
tristanperry
It depends on the quality of the content, in my opinion. Ultimately magazines
are 'content farms'. Various game review sites are 'content farms', since
they're both designed to 'churn out' content and articles. (To give a couple
of quick - albeit silly - examples). The difference is that the content in
these two cases is high quality, usually containing images and possibly
videos, and with lots of unique ideas and opinions given.

So I guess it depends. If a 'content farm' produces good, helpful content,
then that's great and should be encouraged (even if it is done on a massive
scale). But when it comes to a case like WiseGeek, where content is spat out
en masse and that content is crappy and really short, it becomes a problem.

------
rhwd2003
@Matt_Cutts: starting around page 4 of the results for the term [loans], what
is with all the .edu sites? It seems like a ranking error for all these .edu
sites to rank for a financial term such as [loans], yet all the results are
.edu after page 4.

~~~
Matt_Cutts
Typically those .edu sites have been hacked at some point.

------
jrmg
The problem isn't just rank though. After I've seen the original source of
duplicated content, I don't just want sites that copy it to rank below it, I
want to not see them at all, so that the rest of my results are filled with
/different/ things.

------
fzk390
I also have one spam site example.
<http://www.google.com/search?q=internet+phone+service> Look at the 3rd
result, internetphoneguide.org.

------
franksinatra2
I was hoping to see something in the latest blog post from Google about
spammers using Exact Match Domains to get huge algorithmic boosts. EMD + tools
like xrumer usually = first page results :(

------
Darin81
What about sites like EzineArticles? Since they do not pay for content, isn't
that more like organic content from experts? They are not content machines.

------
dcdan
What is the technical difference between Google being able to accurately
measure the volume of spam, and Google removing the spam?

------
huertanix
tl;dr: Our algorithms already stop resultspam. Y'all are trippin'.

------
klbarry
I just want to thank Matt Cutts for always being classy. His blog posts and
comments always set a high bar.

------
ddemchuk
It doesn't appear that Cutts & Co. are looking to address any of the more
popular blackhat link building methods that all popular SEO bloggers
continually say "work but you shouldn't use them yourself because they're
bad".

Until the results for the keyword "buy viagra" aren't littered with forum link
spam, comment spam, and parasite pages, Google's algo is still not "fixed."

~~~
Matt_Cutts
People can make tens of thousands of dollars a month if they rank highly for
certain phrases, so tons of SEOs (and spammers) are trying to rank for phrases
like that. Other search engines might hard-code the results for [buy viagra],
but Google's first instinct is to use algorithms in these cases, and if you
pay attention, the results for queries like that can fluctuate a lot.

With a billion searches a day, I won't claim that we'll get every query right.
But just because we don't "solve" one query to your satisfaction doesn't mean
we're not working really hard on the problem. It's not an easy problem. :)

~~~
tristanperry
How about the more general case of gray and black hat link building
techniques? Especially in internet marketing and certain SEO communities,
everything from forum profile backlinking to blog comments and mass article
submissions are done daily to rank sites in all niches, not just the really
spammy viagra-type sites.

There's a growing concern/consensus that in loads of non-spammy niches, the
only way to get decent rankings is to build links.

I know this is naturally something that would be very hard for Google to
tackle (since if 'junk' backlinks = domain penalty, black hatters would simply
spam their competitors' sites with backlinks and get their competitors
deindexed), but are there active efforts being made so that gray and black hat
link building campaigns aren't (in some cases) pretty essential to a site
getting good rankings?

~~~
kyle6884
Not to mention link building via satellite site networks. A competitor of ours
has hundreds of keyword-niche sites that link back to their URL but also link
to several other legitimate domains on each page. How could Google penalize
the domain responsible without penalizing sites that have nothing to do with
it?

------
hessenwolf
According to our metrics we are great; pity-about-you, unless you can "Please
tell us how we can do a better job."

I expected more. It reads like a content farm.

~~~
brown9-2
Did you read past the first paragraph? They mention some recent changes
they've made:

 _To respond to that challenge, we recently launched a redesigned document-
level classifier that makes it harder for spammy on-page content to rank
highly. ... We’ve also radically improved our ability to detect hacked sites,
which were a major source of spam in 2010. And we’re evaluating multiple
changes that should help drive spam levels even lower, including one change
that primarily affects sites that copy others’ content and sites with low
levels of original content._

And they are saying pretty clearly they think they can do a better job on
content-farms:

 _Nonetheless, we hear the feedback from the web loud and clear: people are
asking for even stronger action on content farms and sites that consist
primarily of spammy or low-quality content. ... The fact is that we’re not
perfect, and combined with users’ skyrocketing expectations of Google, these
imperfections get magnified in perception. However, we can and should do
better._

~~~
hessenwolf
 _To respond to that challenge, we recently launched a redesigned document-
level classifier that makes it harder for spammy on-page content to rank
highly_

When is "recently"?

 _We’ve also radically improved our ability to detect hacked sites_

Since when? All the complaints I have read are from the last few weeks,
including what I saw when I looked up a medical complaint this week.

 _And we’re evaluating multiple changes that should help drive spam levels
even lower_

Oh, are you now? Of course... Translation: "We are looking into it." Duh.

 _that copy others’ content and sites with low levels of original content_

This last part is the only datum I got from the article: they explicitly
respond to Stack Overflow, etc. It is still fluff, though.

What I would have preferred: "I made a change on 2011-01-15 and you should see
it here, here, and here." 'Here' can be broadly defined.

