
Well-intentioned websites get caught in Google’s algorithmic penalty box - kyle6884
http://www.seerinteractive.com/blog/googles-panda-algorithm-creating-endangered-species/
======
greglindahl
The client site,
[http://www.autoaccessoriesgarage.com](http://www.autoaccessoriesgarage.com),
is engaging in cloaking.

Go to [http://www.autoaccessoriesgarage.com/Seat-Covers/](http://www.autoaccessoriesgarage.com/Seat-Covers/)

Use the picker to pick a particular make and model:
[http://www.autoaccessoriesgarage.com/Seat-Covers/_Acura-RDX?...](http://www.autoaccessoriesgarage.com/Seat-Covers/_Acura-RDX?year=2008)

So far so good, no problem. Your browser now has a cookie that says you're
interested in just this make and model. Now for the problem: use the nav links
to go to "Cargo trunk liners", and where do you land?

[http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners](http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners)

That's cloaking -- it's not showing you all of the liners, just the ones
relevant to the make and model you picked earlier. Instead, the site should
add _Acura-RDX?year=2008 to the URL, just like before.

Why do search engines care about this stuff? Now imagine you type in [auto
accessories cargo trunk liners] into your favorite search engine, and the
result is [http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners](http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners) ... what does
the search engine think you'll see? It has no idea, really.
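
To make the suggested fix concrete, here's a rough sketch of the two linking styles in a hypothetical Flask app (none of this is the site's actual code; route names and data are made up):

```python
# Hypothetical sketch (not the site's real code) of the two linking styles.
from flask import Flask, request

app = Flask(__name__)

LINERS = [
    {"name": "Liner A", "fits": "Acura-RDX"},
    {"name": "Liner B", "fits": "Ford-F150"},
]

# What the site does now: one URL, different content depending on a
# cookie set earlier in the visit. A crawler without the cookie sees
# a different page than the user who followed the same link.
@app.route("/Cargo-Trunk-Liners")
def liners_cookie_filtered():
    vehicle = request.cookies.get("vehicle")  # e.g. "Acura-RDX"
    items = [l for l in LINERS if vehicle is None or l["fits"] == vehicle]
    return {"vehicle": vehicle, "items": items}

# The suggested fix: carry the filter in the URL itself, so the same
# address always means the same filtered page for every visitor.
@app.route("/Cargo-Trunk-Liners/_<vehicle>")
def liners_url_filtered(vehicle):
    items = [l for l in LINERS if l["fits"] == vehicle]
    return {"vehicle": vehicle, "items": items}
```

With the second style, the URL alone determines what the page shows, so a crawler and a user sharing that URL get the same content.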

~~~
TomAnthony
Google disagree with your assessment:

[https://productforums.google.com/forum/#!searchin/en/cookie$...](https://productforums.google.com/forum/#!searchin/en/cookie$20cloaking/webmasters/6fQVXZ3ERVY/z4LyhfaGpCUJ)
(see the response marked best answer by Matt Cutts, head of web spam at
Google).

If I have never been to the site, I land on the unfiltered page, which is a
good result. If I have a cookie (which seems to be a session cookie from a
quick look), then I was likely at the site recently, so the filters are
probably relevant; if not, they are easy to change.

'Cloaking' has negative connotations and is more of a concern when there is an
attempt to mislead search engines. In this instance, there is a big problem
with your suggested fix -- the Panda algorithm would see many very similar
pages, which might actually make things worse (which I agree is silly, as your
solution would otherwise have some upsides, but there is often a trade-off in
these situations).

~~~
greglindahl
That's a simplistic way of thinking about the problem -- as a search engine
professional (not SEO), I'd never recommend something that depends on
GoogleBot figuring out that I'm not really cloaking.

The duplicate content problem you describe is fixable (edit: and it's already
a problem; I'm only recommending changing links, not adding any pages to the
site).

And by the way, there are plenty of websites that force crawlers to use
cookies in order to crawl the site. I don't know how GoogleBot deals with
that, but I bet it involves crawling with cookies... no matter what the forum
post says.
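
For illustration, a cookie-aware fetch takes only a few lines in Python's requests; this is a hypothetical sketch, not anything a particular search engine actually does:

```python
# Hypothetical sketch of a crawler fetch that honors cookies: accept
# whatever the site sets and replay it on later requests to that host.
import requests

def fetch_all(urls):
    session = requests.Session()  # stores Set-Cookie values and resends them
    return {url: session.get(url, timeout=10).text for url in urls}

# Note: carrying cookies across fetches is exactly what makes the
# cookie-filtered pages ambiguous for a crawler.
pages = fetch_all([
    "http://www.autoaccessoriesgarage.com/Seat-Covers/",
    "http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners",
])
```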

~~~
TomAnthony
Yeah - I don't deny there is possibly some level of risk. But if
your concern is "GoogleBot figuring out that I'm not really cloaking" based on
the presence of cookies, then I'd challenge (what I think is) your implication
that having cookies on your site means Googlebot might suspect you of
cloaking.

As to Googlebot's use of cookies - there is debate and folklore, but in the
tests I have run I have not seen Googlebot ever send back a cookie that I have
sent it.
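
A test along those lines is straightforward to reproduce; here's a hedged sketch (hypothetical Flask endpoint and cookie name) that sets a marker cookie on every response and logs whether any request identifying itself as Googlebot ever echoes it back:

```python
# Hedged sketch of the test: set a marker cookie on every response and
# log whether a request whose User-Agent claims to be Googlebot ever
# sends it back. Endpoint and cookie name are made up.
import logging
from flask import Flask, request, make_response

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

@app.route("/cookie-test")
def cookie_test():
    ua = request.headers.get("User-Agent", "")
    echoed = request.cookies.get("marker")
    if "Googlebot" in ua:
        # In a real test, also verify the requester via reverse DNS;
        # the User-Agent header alone can be spoofed.
        logging.info("Googlebot request, marker cookie echoed: %r", echoed)
    resp = make_response("ok")
    resp.set_cookie("marker", "1")
    return resp
```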

Google do manual reviews of pages, and I am confident the site in this example
(for the case in question, at least) would pass that without a problem.

I'm (genuinely) interested in your proposed solution for dealing with the
duplicate content problem. The trouble with the Panda algorithm is that it
tends to be a bit touchy, and it seems easy to fall foul of it even in
innocent situations like this one.

~~~
greglindahl
That's not my implication, nor what I said! I said that this website should
choose a link method which is unambiguously not cloaking. Then there's no
chance that you'll confuse search engine bots.

The duplicate content issue is not in play for my suggestion; as my edit above
states, I'm only recommending changing links, not creating any new URLs.

------
johng
This is a great article and I wish it would get more attention. Very often a
site gets penalized without any rhyme or reason and the end user has no way to
find out why or how. It's just darkness, and Google doesn't care.

~~~
Kalium
I submit that they do care, but that the costs of telling people how the
penalty system works are higher than the benefits of doing so.

~~~
cgingrich
I agree there, Kalium... Google is a company of scale and they think in broad
strokes (part of why they are so successful). I'm not asking them to tell us
everything about how it works, but they managed to create a messaging system
for Penguin penalties, so what is it about Panda that makes that a harder
task? I'm definitely more attached to this than most, so I know it might be an
anomaly, but as someone who is encouraging clients and sites to create the
type of content Google wants and values, it's hard to see a site trying to do
it right, potentially making missteps, and being completely lost on what to do
next.

~~~
Kalium
Perhaps they learned from previous messaging that it helps spammers more than
it helps people like you.

------
bhartzer
I think the problem is that the "Panda" algorithm seems to essentially have
the same criteria for all sites. Depending on the topic of the site (e.g., an
ecommerce site selling products or a ticket site selling tickets), lots of
pages tend to have the same content. Some product pages have the same product
in a different color, or the ticket site is selling the same tickets some
other site on the web is selling.

I get that Panda can be helpful in identifying "low quality" content. But the
true definition of "low quality" changes depending on the industry and the
category of products being sold.

A good algorithm should be able to distinguish between the various sites or
topics of sites, and apply said algorithm differently, right?

~~~
cgingrich
For sure - I've seen eCommerce sites intentionally bloat category pages with
tons of written content for the sake of not appearing thin. That's definitely
not the experience that users want. I will say, though, that I've tended to
see a bit more objectivity specifically on eComm sites, with the algorithm
being more forgiving on sheer volume of content (though not in this case).

It's definitely hard and algorithms will get better with time as Google
understands more and more of the web, but in the meantime, give us a heads up
so we can fix it :)

~~~
bhartzer
Intentionally bloating category pages with tons of written content,
unfortunately, is the result of Panda. Sure it's not good for users, but
that's what happened, simply because those ecommerce sites had to adapt if
they were going to compete or have a leg up over their competition.

And I'm not sure if Google took that into consideration when they launched
Panda. If they had, we wouldn't be seeing the intentional bloat.

~~~
bhartzer
The same exact thing happened when Google launched Penguin. I bet Google
didn't realize that they would start a whole new industry, the shady industry
of making site owners pay money to get low quality links removed. But I
digress...

------
josephjrobison
@cgingrich - Perhaps you guys are only consulting on this penalty, but I found
it strange that, for all the focus on SEO basics, the home page has no H1 tag
(maybe seen as deprecated and unimportant by your team) and the main slider's
text is all hidden in the image, with no plain text for the crawler to crawl!

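Both points are easy to check mechanically. Here's a small hedged sketch of what a crawler can actually extract from the home page (assumes requests and BeautifulSoup; the checks are illustrative, not how Google parses pages):

```python
# Hypothetical audit sketch: what plain text does a crawler get from the
# home page? Does it find an <h1>, and do slider images carry alt text?
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.autoaccessoriesgarage.com/", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

print("h1 tags found:", len(soup.find_all("h1")))

# Text baked into an image is invisible to a text crawler unless it is
# repeated in alt text or in nearby markup.
for img in soup.find_all("img"):
    if not img.get("alt"):
        print("image with no alt text:", img.get("src"))
```
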
But good write up overall, love these case studies.

------
Animats
Google has, in the sense that quantitative finance people use the term,
"burned" their data. That is, they're using statistical methods to extract
signal from noise, and they've done this so much that they're nearing the
noise threshold. When a data set is over-analyzed in this way, the impact of
irrelevant data items becomes excessive. That's what's happening here.

Search spam detection has improved over the years, but it's fundamentally
aimed at detecting sites that "look like spam". In response, search engine
optimization has become more about making clickbait sites look less like spam,
even to humans. It's now hard to tell a clickbait journalism site, one filled
by low-paid article rewriters, from one that has actual reporters. (Business
Insider is owned by the founder of DoubleClick.) Looking at the superficial
properties of a site is no longer a reliable spam indicator.

The big search indicator used to be links. That's what "PageRank" was about.
Links stopped working because most links to business sites now come from
social media and blogs, and those are really easy to spam. Anyone who runs a
blog now can watch the phony signups and posts come in. There's a whole
industry selling phony Google and Facebook accounts for SEO purposes. Google
has responded by disallowing many sources of links, with the result that the
remaining link data is sparse for many sites.

Google isn't looking at the business behind the web site. Here, Auto
Accessories Garage sells auto parts. Find the business behind their web site,
and you can verify that they are in the auto parts business. Their site is
full of auto parts. Therefore, not spam. Google doesn't do that. That's why
they failed Auto Accessories Garage.

At SiteTruth, we look at the business behind the web site. Here's what we're
able to find out for Auto Accessories Garage.[1] This is the internal details
page; users rarely look at this. We give them a good rating. We didn't,
unfortunately, get a proper match to corporate records because their corporate
name is Overstock Garage, Inc. (We don't have a full D/B/A business name
database for dealing with such problems yet.) SiteTruth picked up the Better
Business Bureau seal of approval on the site, cross-checked it with the BBB
for validity, and noted the "A+" rating there. Not a spam site.

The process is completely transparent. The link below lets you see all the
data SiteTruth looked at for Auto Accessories Garage. Because it's checking
against hard data from external sources the site can't control, there's no
need to be mysterious about how it works. There's a vast amount of data
available on businesses. If you tap into Dun and Bradstreet (we can do this,
but can't turn it on for public viewing by free users) you get in-depth
financial data on companies. That allows real supplier evaluation, far beyond
what Google can do.

The SiteTruth approach does a good job on real businesses that sell real
stuff. There are objective measures for such businesses - revenue, years in
business, BBB ratings, even credit data. Google doesn't use those, and Google
fails real-world businesses because they don't.

If you want to try looking at SiteTruth ratings, try our browser add-on from
"sitetruth.com". We put those ratings on search results from Google, Bing,
Yahoo, DuckDuckGo, etc. Now on Firefox for Android, too. End self-promotion.

[1]
[http://www.sitetruth.com/fcgi/ratingdetails.fcgi?url=www.aut...](http://www.sitetruth.com/fcgi/ratingdetails.fcgi?url=www.autoaccessoriesgarage.com&details=true)

~~~
CPLX
How come your site thinks this company is the same thing as AutoZone? The
original blog post above identifies it as an independent, family-owned
retailer.

~~~
Animats
Those are possible matches. None of them matched on address, but they matched
on partial business name, city, state, and ZIP. We didn't get a solid match
because they use a different D/B/A name than their company name, and we don't
have a US D/B/A name list. We're using a marketing-quality database of US
businesses for free demo purposes. The high-quality database to get this
consistently right costs about $800K a year with daily updates.
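
For the curious, the kind of matching described here boils down to a weighted score across fields; below is a simplified, purely hypothetical sketch (not SiteTruth's actual code, and the values are made up for illustration):

```python
# Simplified, hypothetical sketch of scoring a candidate business record
# against what was scraped from a site: exact matches on location fields,
# fuzzy partial match on the name (D/B/A names often differ).
from difflib import SequenceMatcher

def name_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(site, record):
    score = 0.4 * name_similarity(site["name"], record["name"])
    score += 0.2 * (site["city"] == record["city"])
    score += 0.1 * (site["state"] == record["state"])
    score += 0.1 * (site["zip"] == record["zip"])
    score += 0.2 * (site["address"] == record["address"])
    return score

# Made-up example values: partial name match plus city/state/ZIP, but no
# address match -- a plausible candidate rather than a solid match.
site = {"name": "Auto Accessories Garage", "city": "Exampleville",
        "state": "IL", "zip": "60000", "address": "unknown"}
record = {"name": "Overstock Garage, Inc.", "city": "Exampleville",
          "state": "IL", "zip": "60000", "address": "123 Example Rd"}

print(round(match_score(site, record), 2))
```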

------
mahouse
Of course. Computer algorithms are not perfect.

~~~
cgingrich
No doubt... I think the main question, though, is how Google can encourage and
help out well-meaning site owners when the algorithm gets it wrong.

~~~
matthucke
Even just re-running it regularly (and letting site owners know that it's been
re-run) would be helpful - as the issues pointed out in your article were
largely fixed many months ago, the lack of any sort of movement is unfair and
disheartening.

------
crxgames
I've come to the same exact conclusion in the same vertical. My sites were hit
in almost the exact same way.

