

Tell HN: The front page of Hacker News has been deindexed from Google - Roedou

You can confirm this by searching for 'hacker news' in Google; the #1 ranking URL is /newest, rather than the front page. This isn't term specific - the site doesn't appear for other terms that it usually ranks well for, such as "news.ycombinator.com" or "hn".

I've checked the usual technical reasons (html head canonical/robots meta tag, http headers, robots.txt issues) but I don't see anything untoward.

I'll keep looking into it, but I'm posting this here in case the admins/mods have made any changes recently that could have had an effect. There's a possibility that the URL has been removed by Google for some particular reason, though I can't think of many pages that deserve it less than HN.

I'll update this thread if I see anything, but hopefully someone else will post an answer before I figure it out....
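
For anyone who wants to repeat the robots.txt part of that check, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and URLs below are made up for illustration (they are not HN's actual robots.txt), and `is_crawl_allowed` is a hypothetical helper name. It only covers robots.txt, not meta tags or HTTP headers.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, for illustration only.
EXAMPLE_ROBOTS_TXT = "User-agent: *\nDisallow: /private\n"

if __name__ == "__main__":
    for url in ("https://example.com/", "https://example.com/private/x"):
        print(url, is_crawl_allowed(EXAMPLE_ROBOTS_TXT, "Googlebot", url))
```

To test a live site you would fetch its robots.txt first and pass the text in; the parser's matching is simpler than what a real crawler does, so treat the result as a first pass.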
======
Matt_Cutts
It's not that PG has a grudge against Google (or vice versa) or anything like
that. I believe that search engine bots crawl Hacker News hard enough that PG
blocks most crawling by bots. In the case of Google, he does allow us to crawl
from some IP addresses, but it's true that Google isn't able to crawl/index
every page on Hacker News.

Here's a link where I answered the same question about three weeks ago:
[https://news.ycombinator.com/item?id=5837004](https://news.ycombinator.com/item?id=5837004)
, so this isn't a new issue. In fact, PG has been blocking various bots since
2011 or so;
[https://news.ycombinator.com/item?id=3277661](https://news.ycombinator.com/item?id=3277661)
is one of the original discussions about this.

And to show this isn't a Google-specific issue, note that Bing's #1 result for
the search [hacker news] is a completely different site, thehackernews.com:
[http://www.bing.com/search?q=hacker+news](http://www.bing.com/search?q=hacker+news)

In general, I think PG's priority is to have a useful, interesting site for
hackers. That takes precedence and is the reason why I believe PG blocks most
bots: so that crawling doesn't overload the site.
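
To make "allow crawling from some IP addresses" concrete, here is a rough sketch of an allowlist check. Nothing in this thread says how HN actually implements its blocking, so everything here is an assumption: the function name is hypothetical and the networks are reserved documentation ranges, not real crawler addresses.

```python
import ipaddress

# Hypothetical allowlist. These are reserved documentation ranges,
# not addresses used by any real crawler.
ALLOWED_CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def crawler_allowed(remote_addr):
    """Return True if remote_addr falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(remote_addr)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to scan.
    return any(addr in network for network in ALLOWED_CRAWLER_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.5", "2001:db8::1"):
        print(ip, crawler_allowed(ip))
```

A crawler hitting the site from an address outside the list would be refused, which from the search engine's side can look like the site being partly unreachable.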

~~~
Roedou
Thanks for that Matt; I didn't see that recent post or your comment, so sorry
for dragging you back here to repeat yourself.

Looks like I'm going to have to stop relying on searching 'hn' when using a
different computer, and start typing in the full URL. First world problems are
such a burden.

~~~
Matt_Cutts
No worries at all. I don't think the HN thread from three weeks ago made it to
the front page (I happened to see it while browsing on /newest). I figured
someone would notice and ask about this, so I'm happy to have the chance to
explain.

~~~
mkbrody
Hey Matt,

I'm sorry to reach out to you directly on a public forum like this, but my
company's website encountered a major negative SEO attack last month and we
were hit with a manual penalty by Google today. I thought you might be
interested to hear about what happened, and of course I would like to
resolve it as I do my best to always keep my company's SEO efforts within
Google's guidelines. Please reach out via email to me at mbrody@myclean.com if
we can help each other fix this! Thanks again for everything you do to help
make the web a better place, and in advance I understand if you're too busy to
respond.

Best regards,

Mike B.

~~~
nicholasreed
Don't apologize to just Matt; your pseudo-apology and
better-sent-as-an-email question pissed me off. Why would you take up three
extra lines with a BS platitude and a signature? Please keep a personal
request for assistance to better-suited channels.

~~~
karolist
Just downvote and move along. Why the hostility?

------
jlgreco
Mmm, seems kind of like a feature. In fact, maybe PG should robots.txt google
entirely. It seems like HN has been getting mentions in other media with
increasing frequency. If you can't find the site just because Google doesn't
list it, then I have to wonder what you are actually doing here. This
wouldn't be the first way that HN sets a bar for new users either; the "Create
Account" form is already hidden under "submit".

HNSearch works great for HN specific searches anyway.

------
JoeCortopassi
This has happened before, and usually has a non-pitchforky reasoning (e.g. PG
pulled it temporarily because of network/server issue). I'm sure it will be
back soon, and we will have a rather reasonable answer as to why. There are
way too many Google employees who frequent and enjoy HN for it to be banned
for some arbitrary reason.

~~~
AsymetricCom
And of course, the network has specific functions for censorship as required
by child protection laws. "Just a network error" really doesn't guarantee that
the network wasn't doing something nefarious itself.

------
gee_totes
If you are using DuckDuckGo, you can use the !hn bang to send your query to
hnsearch.com

~~~
eliben
This is trivial to do in any modern browser without DDG.

------
Roedou
I found this old thread, where pg had blocked most of the Google bots, and it
caused Google to think the site was down:

[https://news.ycombinator.com/item?id=3277661](https://news.ycombinator.com/item?id=3277661)

Could be a similar issue? I'll take a look.

~~~
Roedou
pg also commented that he doesn't want traffic from Google anyway:
[https://news.ycombinator.com/item?id=5808990](https://news.ycombinator.com/item?id=5808990)

In which case, he should add: <meta name="googlebot" content="noindex"> to the
html head of every page.

(I have to say, that's a smart way of avoiding any Eternal Septembering, but
it'd be a shame. I often use Google to find old HN threads that I vaguely
remember from months or years ago.)
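
As a rough illustration of what that tag does from the crawler's side, here is a small sketch that checks whether a page's HTML carries a `noindex` directive aimed at a given bot. `NoindexDetector` and `blocks_indexing` are hypothetical names, and real crawlers apply more rules than this (HTTP headers, for one), so take it as a simplified model.

```python
from html.parser import HTMLParser


class NoindexDetector(HTMLParser):
    """Flag pages whose meta tags tell a given bot not to index them."""

    def __init__(self, bot="googlebot"):
        super().__init__()
        self.bot = bot
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = (attrs.get("content") or "").lower()
        directives = {part.strip() for part in content.split(",")}
        # "robots" applies to every crawler; a bot-specific name applies to one.
        if name in ("robots", self.bot) and "noindex" in directives:
            self.noindex = True


def blocks_indexing(html, bot="googlebot"):
    detector = NoindexDetector(bot)
    detector.feed(html)
    return detector.noindex


if __name__ == "__main__":
    page = '<html><head><meta name="googlebot" content="noindex"></head></html>'
    print(blocks_indexing(page))
```

Note that a crawler has to be able to fetch the page to see the tag at all, so this only works for bots that are not already blocked.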

~~~
dnautics
you may want to consider using hnsearch.

~~~
saurik
(Google's search is often better for this purpose: it has features like
synonym/typo fixing and, because it indexes entire pages, it lets you match
keywords across entire threads. hnsearch is myopic, matching only individual
posts.)

~~~
snogglethorpe
Yup...

Even sites with good search functions are often still way outclassed by google
search with "site:..."—and most sites don't have good search functions...

------
glitch273
Matt Cutts browses this site. Maybe he knows the reason why?

~~~
jffry
Indeed, he showed up:
[https://news.ycombinator.com/item?id=5955374](https://news.ycombinator.com/item?id=5955374)

------
mattparlane
The .org site hasn't been delisted, so it's obviously not based on content:

[https://www.google.co.nz/search?q=site:news.ycombinator.org](https://www.google.co.nz/search?q=site:news.ycombinator.org)

------
meritt
This is most likely the same reason digg's frontpage was deindexed. There's no
"content" per se, it's just links. Someone will notice, add an exception, and
all is well.

Unlike Digg, HN has a substantial amount of content in the comments pages
though, which are heavily indexed.

Edit - All the comment pages are still indexed just fine. It's /only/ the
front-page. Which, imo, doesn't really matter anyway.

~~~
aidscholar
Sounds like overaggressive spam detection.

~~~
joepawl
This sounds like the case. Google is getting aggressive with its Panda
updates, and as a previous commenter noted, the HN homepage is just links.
Since that triggers Panda, it's a good bet that Google went a little overboard
(not unprecedented).

~~~
mahranch
> the HN homepage is just links. Since that triggers Panda

To be more specific, Panda is triggered by low quality/duplicate content.
'Penguin' is triggered by spammy/bad backlinks.

I'm not saying you're wrong (a page of links would look pretty low quality to
google's algo), I just wanted to add on for clarity's sake.

~~~
joepawl
Yes, I see where I was unclear. It's not the links themselves, but the lack of
original, robust content.

------
eli
_Please don't post on HN to ask or tell us something (e.g. to ask us
questions about Y Combinator, or to ask or complain about moderation). If you
want to say something to us, please send it to info@ycombinator.com._

[http://ycombinator.com/newsguidelines.html](http://ycombinator.com/newsguidelines.html)

------
malandrew
I too had noticed this. It's unfortunate because searching via Google with
site:news.ycombinator.com in the query is much better than HN's own search
when you have a good idea what you're looking for (spearfishing search vs BFS)
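
For reference, a site-restricted query like that can be built as a plain URL. This is only a sketch: `site_search_url` is a hypothetical helper and the search terms are placeholders.

```python
from urllib.parse import urlencode


def site_search_url(site, terms):
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f"site:{site} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("news.ycombinator.com", "panda update"))
```

Saving a URL of this shape as a browser keyword search gives a one-word shortcut for it.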

------
chacham15
This isn't the first time this has happened and I suspect that it won't be the
last.

------
gscott
The pagerank has fallen from a 6 to a 3 as well.

------
godgod
Google is evil. Screw them. I refuse to use Google or their services. Make the
switch. They deindex a lot of sites they don't agree with. Not saying that is
the case here but they've been known to do it.

~~~
quantumpotato_
Link backing up your claims?

