
Digg banned from Google? - trevin
http://martinmacdonald.net/digg-banned-from-google/
======
Matt_Cutts
This has nothing to do with Reader. We were tackling a spammer and
inadvertently took action on the root page of digg.com.

Here's the official statement from Google: "We're sorry about the
inconvenience this morning to people trying to search for Digg. In the process
of removing a spammy submitted link on Digg.com, we inadvertently applied the
webspam action to the whole site. We're correcting this, and the fix should be
deployed shortly."

From talking to the relevant engineer, I think digg.com should be fully back
in our results within 15 minutes or so. After that, we'll be looking into what
protections or process improvements would make this less likely to happen in
the future.

_Added_: I believe Digg is fully back now.

~~~
relix
If this happened to a less popular site, what chance would the site owner
have of getting attention to the problem and getting it fixed?

~~~
Matt_Cutts
Hey relix, it took an unfortunate chain of corner cases for this to happen,
and for this situation it was actually more likely for the corner cases to hit
a larger site rather than a less popular site.

In general, when a member of the webspam team directly applies a manual
webspam action against a site, we also drop a note to the site owner at
<http://google.com/webmasters/>. That helps the site owner tell whether
something is going on with manual spam vs. just algorithmic ranking. Then any
site can do a reconsideration request at the same place or post in our
webmaster forum at <https://productforums.google.com/forum/#!forum/webmasters>.

People like to scrutinize Google, so I've noticed that a "Google unfairly
penalized me" blog post usually makes its way to us.

~~~
WalterGR
Hi Matt,

That doesn't match my experience. Could you explain the penalty against
onlineslangdictionary.com?

Showing citations of slang use[1] caused what appears to be an algorithmic
penalty. The correlation between showing citations and the presence of a
penalty is apparent:

<http://onlineslangdictionary.com/static/images/panda/overview-charts.png>

Missing from those 3 charts is the one showing that citations were once again
removed over 120 days ago, yet the penalty remains. It would appear that the
algorithmic penalty was turned into a manual penalty.

I've followed all procedures including those listed in your comment, without
resolution.

[1] By citations of slang use, I mean short (1-3 sentence) attributed excerpts
of published works, shown within the appropriate definitions, as evidence of
the correctness of those definitions. All citations were gathered and posted
by hand.

~~~
Matt_Cutts
Hi Walter, the only manual webspam action I see regarding
onlineslangdictionary.com is from several years ago (are you familiar with a
company called Web Build Pages or someone named Jim Boykin?), but that no
longer applies here.

You're affected by a couple of algorithms in our general web ranking. The first
is our page layout algorithm. See
<http://googlewebmastercentral.blogspot.com/2012/01/page-layout-algorithm-improvement.html>
or <http://searchengineland.com/google-may-penalize-ad-heavy-pages-100601> for
more context on that. In particular, comparing a page like
<http://onlineslangdictionary.com/meaning-definition-of/compy> to a page like
<http://www.urbandictionary.com/define.php?term=Pepperazzi&defid=6571433>,
your site has much more prominent ads above the fold than Urban Dictionary.

Your site is also affected by our Panda algorithm. Here's a blog post we wrote
to give guidance to sites that are affected by Panda:
<http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html>

~~~
Matt_Cutts
P.S. One other quick thing. I saw you sending me tweets, but the tweets looked
fairly repetitive, and you hadn't chosen a Twitter avatar. I get a lot of
tweets from bots, and this looked fairly close to bot-like to me:
<https://twitter.com/mattcutts/status/315232934040846337/photo/1>
That (plus the fact that the site had no current manual webspam actions, plus
the fact that I wasn't sure what you meant by citations) meant that I didn't
reply. Hope that helps.

~~~
WalterGR
Makes sense. All the tweets were by hand. I tried to tweet every weekday but
missed some, then eventually gave up.

I just didn't know what to do upon getting no feedback from you guys after
posting to the Google Webmaster forums, filing reconsideration requests,
contacting friends at Google, posting to and commenting on Reddit about it,
commenting on HN about it, posting to Facebook, blogging, and tweeting about
it, and putting a yellow box at the top of all pages on the site mentioning
the penalty and linking to a page with the details.

Thanks again for talking with me about this. (I'd still like to hear about Web
Build Pages / Jim Boykin and the rest -
<https://news.ycombinator.com/item?id=5444996> ...)

------
blauwbilgorgel
Doing a site:digg.com/news/ search on Bing shows a lot of pages like these:

<http://digg.com/news/gaming/ing_bank_i_ilanlar>

and even more duplicate tag and rss pages for "site:digg.com/tag/" and
"site:digg.com rss".

These /news/ pages 302 redirect to many different sites (some are bound to
contain spam or be of lower quality).

Using 302 redirects for these links is bad practice. Some link shorteners (ab)use
302 Found (instead of 301 Moved Permanently) to hoard content that doesn't
belong to them. The content for these links can't be found on digg.com, so
they too use the wrong redirect and associate themselves with all pages they
link to.
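To see which kind of redirect a link actually returns, one option is to send a single request without following redirects and look at the status code. The sketch below is only an illustration: `classify_redirect` and `head_status` are hypothetical helper names, and the grouping of status codes into "permanent" and "temporary" follows the HTTP specification rather than anything a search engine has documented about how it treats them.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Status codes grouped by the HTTP specification's meaning.
PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def classify_redirect(status: int) -> str:
    """Label an HTTP status code as a permanent or temporary redirect."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return "not a redirect"


def head_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects, so the status of the first
    response is what gets reported.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `head_status` on one of the /news/ URLs and passing the returned status to `classify_redirect` would show whether the server answers with a 301 or a 302. Some servers respond differently to HEAD than to GET, so a GET may be needed to confirm.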

Besides that: Digg.com acts like a single page webapp for most of its content.
There are no discussion pages or detail pages for the stories. The content
that does appear is near duplicate to other content on the web, especially
with popular stories, where many blogs just copy the title and the first intro
paragraph.

~~~
ChuckMcM
I saw a similar pattern in the links that remain. It's sad to see the link
authority of a site get plundered, and it seems like something inside Google's
indexer realized that was going on and deleted it.

As a search engine, it's something you have to do if you want to consider site
authority in your ranking model.

------
MiguelHudnandez
The article is down, but here is the text from Google's cache:

    
    
        --snip-- 
    

_Something interesting has just come across one of my networks (hat tip to
datadial), just a few days after Digg have announced that they are building a
replacement for the much loved Google Reader, they have (coincidentally?)
disappeared from the primary google index.

[image unavailable]

Is it an SEO penalty for links? That seems to be the number one reason that
brands are getting booted from google’s index these days… Some conspiracy
theorists will no doubt be proclaiming it’s something to do with their
announcement to build a replica of the now defunct Google reader, but
personally I really can’t see that having any effect. Could it?

Doing a site: search for Digg certainly demonstrates that they are no longer
in the index:

[image unavailable]

It’s likely (only if it is link based however) that it would be down to what
individuals who submit content do after the fact – ie. sending spammy links at
their posts to try and build the pagerank, and create “authority” which they
then pass back to their own sites. Digg has long been listed in every
“linkwheel” seller’s handbook, and if that is the reason then what does it mean
for every community site on the internet?

Will we have to manually approve all new links soon at this rate? Come on
Google – WTF – let the internet know what you’re doing please._

    
    
        --end snip--
    

Without the images, maybe I am missing some context. But it seems hyperbolic,
considering digg is not serving a robots.txt anymore [1]. It is probably just
a blunder on Digg's part.

[1]: At the time of my comment, digg was serving an xhtml document with status
404 at /robots.txt. Now it appears to be a valid robots file.

PS: I am enjoying the irony of having a copy of the article, even though the
site is down, because of Google's cache. Need to pontificate about how Google
is potentially evil but can't keep your server running? Don't worry, people
can read it via Google's cache.

~~~
perlgeek
I just tried digg.com/robots.txt, and for me it says:

      User-agent: *
      Disallow:

Which explains everything without conspiracy theories.

~~~
philip1209
Yes, this seems to explain it.

<http://www.robotstxt.org/robotstxt.html>

~~~
dredge
No, it doesn't. Your linked example contains:

    
    
      User-agent: *
      Disallow: /
    

While the (new) digg.com/robots.txt contains:

    
    
      User-agent: *
      Disallow:
    

Those are very different. The former essentially disallows bots from crawling
the entire site, while the latter disallows nothing - effectively allowing
everything. The syntax is unusual, granted, for historical reasons.
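The difference is easy to confirm with Python's standard `urllib.robotparser` module. This is a minimal sketch: `can_crawl` is a hypothetical helper name, the test URL is just an example path on digg.com, and the result reflects how Python's parser reads the file, which may not match every crawler's behavior.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "Disallow: /" blocks every path; an empty "Disallow:" blocks nothing.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

print(can_crawl(BLOCK_ALL, "http://digg.com/news/"))  # False
print(can_crawl(ALLOW_ALL, "http://digg.com/news/"))  # True
```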

------
Alex3917
It's possible (though not necessarily likely) that they finally got banned for
their toolbar, which is basically designed to scam Google.

~~~
sjs382
I'm not familiar with their toolbar. How does it scam Google?

~~~
Alex3917
It makes it look like you're still on Digg even when you click a link to
go to another site. So essentially it makes it look like people are spending a
lot more time on Digg than they actually are.

------
niggler
I'd wait for an official Digg statement, because Digg may have requested to be
removed from Google search results (the net result would be the same).

------
dredge
From my experience, when Google de-indexes a site, they also suppress any
PageRank the Google Toolbar would have shown for it; that doesn't appear to be
the case here, digg.com is still PR8.

Having toolbar PageRank[1] and yet no cached page[2] is not something I've
seen before.

[1]
<http://toolbarqueries.google.com/tbr?features=Rank&sourceid=navclient-ff&client=navclient-auto-ff&iqrn=UgnC&ch=8f25a5d62&q=info:http%3A%2F%2Fdigg.com>

[2] <https://www.google.com/search?q=cache:digg.com>

~~~
jasongill
Your experience may be limited, because this is very common. PageRank and
indexed status operate independently of each other; it's not uncommon to see a
site that was deindexed still maintain PR for quite some time (and often,
indefinitely). However, if your site is deindexed, PR means "nothing" because
your links no longer provide juice.

Valid PR while being deindexed is one of the (many) tricks that Google has
added in the last couple of years to try to reduce the usefulness of getting PR
for blackhat purposes.

------
jcampbell1
Maybe they nuked themselves:

<http://digg.com/robots.txt>

~~~
highace
It's cloaked so only google can see it - try changing your user agent to match
googlebot's.

~~~
cpeterso
Using the googlebot User-Agent string gives me:

    
    
      User-agent: *
      Disallow:
    

This robots.txt should allow all bots to search the entire website. However, I
think Google also penalizes websites that serve different content to googlebot
than to non-bot user agents.
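One way to check for this kind of user-agent cloaking is to fetch the same URL under two User-Agent strings and compare the bodies. The sketch below is only illustrative: `fetch_body` and `same_content` are hypothetical helper names, and both User-Agent strings are example values. A server can also identify the real Googlebot by IP address, so matching responses here would not prove the absence of cloaking.

```python
import urllib.request

# Example User-Agent strings; both are assumptions chosen for illustration.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0"


def fetch_body(url: str, user_agent: str, timeout: float = 10.0) -> bytes:
    """Fetch `url` with the given User-Agent header and return the raw body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    404 on /robots.txt surfaces as an exception rather than a body.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def same_content(body_a: bytes, body_b: bytes) -> bool:
    """Compare two bodies, ignoring line endings and trailing whitespace."""
    def normalize(body: bytes) -> bytes:
        return b"\n".join(line.rstrip() for line in body.splitlines()).strip()

    return normalize(body_a) == normalize(body_b)
```

Fetching `http://digg.com/robots.txt` once with `GOOGLEBOT_UA` and once with `BROWSER_UA`, then passing both bodies to `same_content`, would show whether the server varies the file by User-Agent.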

~~~
charonn0
This is the same as what I see, using a standard-issue Firefox UA string.

------
joetek
Google admitted the mistake: “We’re sorry about the inconvenience this morning
to people trying to search for Digg. In the process of removing a spammy link
on Digg.com, we inadvertently applied the webspam action to the whole site.
We’re correcting this, and the fix should be deployed shortly.”

<http://thenextweb.com/google/2013/03/20/google-seems-to-have-de-indexed-digg/>

------
qompiler
Google has rules concerning SEO. A warning for the rest of us not to try to
cheat our way to a better ranking.

------
rodion_89
Their robots.txt file clearly asks to not be crawled at all.

    
    
      User-agent: *
      Disallow:
    

<http://digg.com/robots.txt>

~~~
blauwbilgorgel
Incorrect. To configure your robots.txt to not be crawled at all use:

    
    
      User-agent: *
      Disallow: /
    

Allow indexing of everything with:

    
    
      User-agent: *
      Disallow:
    

It seems they are at this very moment struggling to change things.

~~~
rodion_89
You are totally right. Disregard everything I said.

------
Karunamon
Betteridge's law of headlines strikes again.

------
kiallmacinnes
<http://digg.com/robots.txt> <-- 404 Not Found

I know a robots.txt 404 shouldn't de-list you, but I would have expected digg
to have one?

Maybe they requested de-listing and removed their robots.txt? Who knows!

~~~
randomdata
There is one there now:

    
    
      User-agent: *
      Disallow:

~~~
kiallmacinnes
Yup - It's back now.. But it 100% didn't exist when I posted earlier :)

------
eiderv
What I love most about this situation is the multiple theories, when in
essence Google puts it very simply: WE F***ED UP. Sorry!

------
nicholassmith
If you assume conspiracy then you're most likely ignoring the simple fact of
life in the technology world. Someone made a screw up.

------
AznHisoka
It doesn't matter if they were banned - they didn't care about SEO in the
first place.

All those old URLs they had for years? All disappeared as they wanted to start
fresh.

------
hugbox
And nothing of value was lost.

