
Suddenly, Hacker News is not the first result for 'Hacker News' - AretNCarlsen
https://www.google.com/search?q=hacker+news
======
Matt_Cutts
I think I know what the problem is; we're detecting HN as a dead page. It's
unclear whether this happened on the HN side or on Google's side, but I'm
pinging the right people to ask whether we can get this fixed pretty quickly.

Added: Looks like HN has been blocking Googlebot, so our automated systems
started to think that HN was dead. I dropped an email to PG to ask what he'd
like us to do.

~~~
pg
I sent you an email about this.

(A couple weeks ago I banned all Google crawler IPs except one. Crawlers are
disproportionately bad for HN's performance because HN is optimized to serve
recent stuff, which is usually in memory.)
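For context, that kind of per-IP block at the firewall looks roughly like this (purely illustrative, not my actual configuration; note Googlebot crawls from many addresses in its published ranges, which is why this turns out to be a blunt instrument):

```shell
# Illustrative only. Whitelist a single crawler IP first
# (inserted rules match before appended ones)...
iptables -I INPUT -s 66.249.66.1 -j ACCEPT
# ...then drop the rest of a published Googlebot range.
iptables -A INPUT -s 66.249.64.0/19 -j DROP
```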

~~~
pierrefar
Hi Paul,

A site can be crawled from any number of Googlebot IP addresses, and so
blocking all except one doesn't help in throttling crawling.

If you verify the site in Webmaster Tools, we have a tool you can use to set a
slower crawl rate for Googlebot, regardless of which specific IP address ends
up crawling the site.

Let me know if you need more help.

 _Edit_ Detailed instructions to set a custom crawl rate:

1\. Verify the site in Webmaster Tools.

2\. On the site's dashboard, the left hand side menu has an entry called Site
Settings. Expand that and choose the Settings submenu.

3\. The page there has a crawl rate setting (last one). It defaults to "Let
Google determine my crawl rate (recommended)". Select "Set custom crawl rate"
instead.

4\. That opens a form where you can choose your desired crawl rate in crawls
per second.

If there is a specific problem with Googlebot, you can reach the team as
follows:

1\. To the right hand side of the Crawl Rate setting is a link called "Learn
More". Click that to open a yellow box.

2\. In the box is a link called "Report a problem with Googlebot" which will
take you to a form you can fill out with full details.

Thanks!

Pierre

~~~
aw3c2
I would like to set that crawl rate but do not see why I must register at
Google to do so. Why can't Google support the Crawl-Delay directive in
robots.txt for this?
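
For reference, the directive looks like this (Bing and Yandex honour it, for example):

```
# robots.txt -- illustrative example
User-agent: *
Crawl-delay: 10    # ask compliant crawlers to wait 10 seconds between fetches
```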

~~~
Matt_Cutts
My plane is about to take off, but very briefly: people sometimes shoot
themselves in the foot and get it way, way wrong. Like "crawl a single page
from my website every five years" wrong.

Crawl-Delay is (in my opinion) not the best measure. We tend to talk about
"hostload," which is the inverse: the number of simultaneous connections that
are allowed.
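
A rough sketch of the difference, with made-up names (this is not Googlebot's actual code):

```python
import threading

# "Hostload" thinking: instead of a fixed delay between requests
# (Crawl-Delay), cap the number of simultaneous connections to a host.
class HostloadLimiter:
    def __init__(self, max_connections):
        self._sem = threading.Semaphore(max_connections)

    def fetch(self, url, do_request):
        # Blocks whenever max_connections fetches are already in flight.
        with self._sem:
            return do_request(url)

limiter = HostloadLimiter(max_connections=2)
result = limiter.fetch("http://example.com/", lambda u: "fetched " + u)
```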

~~~
officemonkey
I would think that the number of people who (a) know how to create a valid
robots.txt file, (b) have some idea of how to use the "crawl-delay" directive
and (c) write a "shoot-themselves-in-the-foot" worthy error is vanishingly
small.

~~~
Matt_Cutts
I alluded to some of the ways that I've seen people shoot themselves in the
foot in a blog post a few years ago: [http://www.mattcutts.com/blog/the-web-
is-a-fuzz-test-patch-y...](http://www.mattcutts.com/blog/the-web-is-a-fuzz-
test-patch-your-browser-and-your-web-server/)

"You would not believe the sort of weird, random, ill-formed stuff that some
people put up on the web: everything from tables nested to infinity and
beyond, to web documents with a filetype of exe, to executables returned as
text documents. In a 1996 paper titled "An Investigation of Documents from the
World Wide Web," Inktomi's Eric Brewer and colleagues discovered that over 40%
of web pages had at least one syntax error".

We can often figure out the intent of the site owner, but mistakes do happen.

~~~
adbge
The number of webpages with HTML that's just plain wrong (and renders fine!)
is staggering. I often wonder what the web would be like if web browsers threw
an error upon encountering a syntax error rather than making a best effort to
render.

If you're writing HTML, you should be validating it:
<http://validator.w3.org/>

~~~
MatthewPhillips
Google.com has 39 errors and 2 warnings. Among other things, they don't close
their body or html tags.

Is there any real downside to having syntax errors?

~~~
thristian
The downside is maintainability. If your website follows the rules, you can be
pretty confident that any weird behaviour you see is a problem with the
browser (which is additional context you can use when googling for a
solution). If your website requires browsers to quietly patch it into a
working state, you have no guarantees that they'll all do it the same way and
you'll probably spend a bunch of time working around the differing behaviour.

Obviously, that's not a problem if you already know exactly how different
browsers will treat your code, or you're relying on parsing errors so elemental
that they must be patched up identically for the page to work. For example, on
the Google homepage, they don't escape ampersands that appear in URLs (like
href="[http://example.com/?foo=bar&baz=qux](http://example.com/?foo=bar&baz=qux)
— the & should be &amp;). That's a syntax error, but one that maybe 80% of the
web commits, so any browser that couldn't handle it wouldn't be very useful.
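
The fix is a one-liner in most templating setups; in Python, for instance:

```python
from html import escape

# Ampersands inside attribute values should be written as &amp; in valid HTML.
url = "http://example.com/?foo=bar&baz=qux"
escaped = escape(url)
print('<a href="%s">link</a>' % escaped)
# The raw & is technically a syntax error, even though browsers recover from it.
```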

~~~
yuhong
Particularly before HTML5.

------
hahla
Does anyone else find it disturbing that Google employees are bending over for
pg/HN? Seriously, if any other webmaster blocked Google's bots they wouldn't
change their algorithms to accommodate, or see how they could use less of our
resources.

It's pg's fault, not Google's, and I don't see why they should care. Maybe from
their standpoint it would be more beneficial to Google users who are used to
typing in 'hacker news' to visit this site, but since when did that matter to
Google?

Also, don't get me wrong, I love both Google and Hacker News. I just find
what's going on in this thread interesting.

~~~
AznHisoka
Yes, it's VERY disturbing, no question about it. Matt can sound all helpful,
and cuddly, and oxytocin-inducing all he wants, but he's basically stepping in
and helping 1 site over another, when the whole process should NOT favor
anyone. There are lots of webmasters out there who do stupid stuff like
accidentally overriding a robots.txt, yet I don't see Matt sending them an
email asking them to check up on it. C'mon... this is lame!

~~~
apetresc
Are you kidding? Matt is basically a superhero who helps anyone he can find.
He's solved problems for people complaining on Twitter and on Google+, he
hosts regular video office hours where anyone can come to him with problems,
etc. It may be the first time YOU'VE seen him rush in to help someone, but
this is Matt Cutts' standard MO.

~~~
AznHisoka
No, he's not a superhero, and no, he doesn't help everyone who asks for it.
I've brought up many issues over the years to him via Twitter, blog comments,
emails, etc., and none of them.. count them.. ZERO have been addressed by him.
He just gives preferential treatment to those sites that make a lot of noise,
and might give Google or Google's quality team a lot of negative press. That's
all he does. I've even brought up 100% apparent, clear spammy sites and he
doesn't respond, and doesn't do anything about them. They're still there.

Lots of folks here like to give Matt Cutts this aura of an angel, or some sort
of saint. Those who have known his actions over the past 5-10 years know
better. Just ask guys like Aaron Wall and Rand. Matt Cutts is just a pawn that
is there to do damage control for Google. That's all he is.

~~~
Matt_Cutts
Hi AznHisoka. A lot of people send me spam reports, and I'm happy to pass them
on. But normally we investigate the spam reports without sending back specific
feedback of what action we took in response.

P.S. Just curious--is this the same AznHisoka from BlackHat World, the
"Blackhat SEO forum"? [http://www.blackhatworld.com/blackhat-
seo/members/137345-azn...](http://www.blackhatworld.com/blackhat-
seo/members/137345-aznhisoka.html)

~~~
code_duck
If it is, apparently the person in question doesn't enjoy the Blackhat World
forum very much, having made one post total.

~~~
Matt_Cutts
And yet AznHisoka still seems to have time to recommend a colon cleansing
product on Yahoo Answers to lots of different people:
<http://answers.yahoo.com/activity?show=EYIYH5R8aa> :)

~~~
code_duck
There's no hiding from Matt Cutts!

------
pierrefar
Hi

I work at Google helping webmasters.

It seems something has been blocking Googlebot from crawling HN, and so our
algorithms think the site is dead. A very common cause is a firewall.

I realize that pg has been cracking down on crawlers recently. Maybe there was
an unexpected configuration change? If Googlebot is crawling too fast, you can
slow it down in Webmaster Tools.

I'm happy to answer any questions. This is a common issue.

Pierre

~~~
joelhaasnoot
Why is it that "our algorithms think the site is dead" so soon, yet when I
search for a keyword, I still get sites that are dead or no longer contain the
given keyword? Bad algorithms, or is this "thinking dead" a time-delayed thing?

~~~
Matt_Cutts
Normally there's a lag between when a site goes dead and when we crawl the
site to see that it's dead.

It's also tricky because you don't want a single transient glitch to cause a
site to be removed from Google, so normally our systems try to give a little
wiggle room in case individual sites are just under unusual load or the site
is down only temporarily.

------
mmaunder
PG: welcome to the woes of being an Alexa top-1000 site with over 1 million
pages of dynamic content.

HN has roughly 1.3 million pages indexed by Google.

1.3M pages at 43k per page is 53 gigs to cache static versions of all pages on
the site. Quadruple that for a worst case scenario and it'll still easily fit
on a single drive.

When your site gets this popular you tend to have to re-architect your
application to solve perf issues. You could serve googlebot UA's 1 week old
cached pages for example.

I'd encourage you to start thinking of yourself as a utility providing a
valuable and necessary resource to the Net and take the time and energy to
solve this properly.
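
A sketch of the bot-cache idea (names and structure are made up; a real implementation would live in your web server or CDN layer):

```python
import time

CACHE_MAX_AGE = 7 * 24 * 3600  # one week, per the suggestion above

cache = {}  # path -> (timestamp, html)

def handle(path, user_agent, render_page):
    """Serve crawlers a stale cached copy; humans always get a fresh render."""
    is_bot = "Googlebot" in user_agent
    cached = cache.get(path)
    if is_bot and cached is not None:
        ts, html = cached
        if time.time() - ts < CACHE_MAX_AGE:
            return html          # crawler gets the week-old snapshot
    html = render_page(path)     # humans (and cache misses) hit the app
    cache[path] = (time.time(), html)
    return html
```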

~~~
resnamen
I'd like him to fix the experience for non-bot users first, namely the
"Unknown or expired link" issue.

------
markerdmann
Has Google been making significant changes to the search ranking algorithms in
the past couple months? I've noticed a significant decline in the quality of
results, to the point that (for the first time in years), I've bounced over to
DuckDuckGo or Bing to try my luck there. I love Google as a company, so (if
this isn't just in my head) I'd love to see things get better again.

EDIT: Looks like the change to HN's ranking is related to a change that pg
made, so my comment is now less relevant to the parent post. I still stand by
it, though. :-)

~~~
Matt_Cutts
If you can look back in your search history to find the specific searches that
didn't do well for you, we're always happy to get really concrete examples.

The best reports look like "I did a search for [Bavarian red widgets] and the
results weren't good because e.g. you were missing a specific page X that you
used to return or should return, or you returned Austrian red widgets" or
whatever.

Lots of Googlers are clearly hanging out on HN over Thanksgiving while they're
stuck at relatives' houses. :)

~~~
rumpelstiltskin
Here's a search that tripped me up -

I'm in Houston, TX. When I searched for 'windshield repair houston', the 1st
result - wwwDOThoustonwindshieldrepairDOTnet - looked promising so I made an
appointment with them to get my windshield repaired. When I went to their
place of business, it was just a guy in a pickup truck in the parking lot of a
strip mall, with a 'windshield repair' sign on the back of the truck.

Turns out he's running a scam where he gets people to file claims with their
insurance, and when they pay him, he kicks back 50% to the customer.
Having insurance pay for damages is common enough; plenty of businesses do it.
But this guy was trying to get me to file claims for damages that I didn't
even have. When I asked for a cash price just to repair the damage I did have,
he refused saying that 'it wasn't enough money'.

I was pissed. Not only is this illegal, I couldn't believe he was ranked #1.
When I did some digging, _it turns out the guy is gaming google with a ton of
paid backlinks_. For example,
[http://www.searchpicks.com/business/automotive/patsco-
windsh...](http://www.searchpicks.com/business/automotive/patsco-windshield-
repair-l1519.html) (click 'suggest listing' for the price).

I'm sure plenty of other searchers ended up wasting their time with this SERP
just like I did.

~~~
Matt_Cutts
We'll check into this--thanks for the detailed report.

~~~
rumpelstiltskin
You're welcome. I did a little more digging and
wwwDOTpatscowindshieldrepairDOTcom is a mirror site with the same info that's
ranking for the same terms, and the backlink profile for this site also shows
lots of paid backlinks.

------
lhnz
I actually think this might be positive for the quality of the conversation on
this site.

~~~
_ndrw
True. No reason not to keep HN on the hush hush, eh?

~~~
billpatrianakos
I disagree. I mean, we don't want to keep it a secret but at the same time
this site has always been really close knit and the people all really just
"get it". If HN grows too large I fear for it. The web is full of cretins and
people who think their thoughts are so worthy they must be heard. Lots of
ignorance out there. I've seen some of it here already. I learned the culture
and
studied the etiquette and what qualifies as a useful submission or comment
before I really joined in. Not everyone is so thoughtful and it could really
sour a once great community. You're obviously new around here so I'm not
surprised you'd say that. I mean no disrespect - I do think you'll eventually
change your mind though.

------
w1ntermute
Out of curiosity, how did the OP even find out that this was going on? He
appears to be a long-time user of the site, and therefore would have no reason
to Google "Hacker News".

The most common other reason would be that some people use Google as their URL
bar - instead of typing "hackerne.ws" or "news.ycombinator.com" into the URL
bar, they type "Hacker News" into Google and click on the first result.
However, I would've thought that the types of people using HN would have the
tech savvy to use a keyword bookmark, or at least the URL.

~~~
pbhjpbhj
Sometimes I want to find HN but am using a different device, sometimes that
device is also a touchscreen and typing is a pain. Usually Google is very
close to hand and the autocomplete saves typing out a full address or full set
of keywords.

So I'd only need to type "hacke" in Google and click "I'm feeling lucky" in
the drop down rather than type <http://news.ycombinator.com/> into an address
bar.

Didn't know about the hackerne.ws domain name however.

------
ludwigvan
A few months ago, I was talking about the Stanford AI/ML classes to one of my
friends. He asked me if there were any more classes and I said "Yes, take a
look at the front page of Hacker News, there are some links. I visit that site
frequently, it's very helpful."

There was one little issue though.

The poor guy didn't know what Hacker News was, so he found that site
(hackernews.com). He then scanned the Twitter stream on that site several
times to find those links, and kept visiting the site for several days to
find those other helpful links.

When I saw him again a few days later, he told me: "What a silly site
HackerNews is! And I couldn't find the links to those classes over there."

He also told me that he was disappointed in me for visiting such a silly site.

Now, can you guess the look on his face when I told him that he was visiting
the wrong site for the last few days?

~~~
tillk
great story, bro

------
lowglow
Matt Cutts is my hero. I've never seen anyone from google (or any company) be
as proactive in explaining, interacting, and helping this community. Thanks.

------
cletus
Raised here already:

<http://news.ycombinator.com/item?id=3277365>

FWIW I've raised this issue.

~~~
Matt_Cutts
Thanks, cletus. I'll file one too, just to help make sure people check it out.

------
test5625
Don't link to Google search results!

It's personalized - everyone sees different results. Even if you don't have a
Google account.

For me <http://news.ycombinator.com> is the top page. But when I use TOR,
<http://www.hackernews.com> and <http://thehackernews.com/> are on top.

I don't think it's possible to get a real "invariant" result page. It all
depends on which computer you use (cookies, language setting, ip address).

~~~
waitwhat
[http://googlesystem.blogspot.com/2007/04/how-to-disable-
goog...](http://googlesystem.blogspot.com/2007/04/how-to-disable-google-
personalized.html)

------
AretNCarlsen
Or even on the first page. (I notice because I always get to HN this way.
Today is the first time I have _ever_ seen it below #1 or #2.)

------
deepkut
Screw it--don't crawl HN. Keep the community small and passionate :)

------
bezza1
Bottom line is that if you don't trust Google & use their tools (GWT) then you
can end up in sticky situations like this one.

My experience with the Crawl Rate feature via GWT is that they do honour it
pretty strictly, but for large sites Gbot can cause a lot of extra load even
if pages are static.

A good CDN and stateless cache server will help but for sites as large as HN
every request adds up!

------
alpb
I remember a time a few months ago when Hacker News was not the first result
either. Just wanted to point this out.

------
gaza3g
I think it makes a lot of sense for Matt Cutts to intervene in this case
since:

1) Matt browses HN.

2) HN is a high-volume site, and whatever suggestions are discussed and
implemented here can be noted and learned from by everyone else.

------
yllus
I definitely didn't expect to see my own blog as the last result on page one.
Or is that because I've shared it via Google Plus and it's being injected into
my personal results?

------
kizel
Yeah, I usually get here that way, but today was the first time I haven't seen
it on the front page, let alone first result.

------
williamle8300
I want to "Learn ethical hacking training."

~~~
braga
Stay away from venture capitalists ;)

------
BadassFractal
I don't think it's all bad, we don't want this community to turn into another
reddit.

------
rplnt
Nor second, nor anywhere in the top five (it's seventh).

------
baby
It has never been the first result for me.

------
skillachie
Lol, I checked Google twice, then checked to make sure it was actually Google
that I was searching "Hacker News" on.

------
chorola
Yes. What's wrong?

------
Craiggybear
Still number one via DuckDuckGo.

Google is now officially useless.

~~~
noduerme
DDG is using Bing for most general web results. My site mysteriously
disappeared from Bing and DDG a few days ago at the same time; but Gabriel
said that if I'm not cool enough to get a top ranking on HN for six hours to
complain about it, I'll just have to go through the normal process Bing sets
up for webmasters.

------
TastyFish
Oh shit, now I'm gonna have to remember the url.

------
tpr1m
That website sucks too :\

------
StatHacking
This is an example of why Goog's search algorithms (and others') should be
open: <http://news.ycombinator.com/item?id=3268371>

A subtle attack could work by making bots stop indexing a site, or by using
SEO practices to push it down far enough that it becomes unsearchable, and
therefore non-existent.

Or just crack into Google...

~~~
StatHacking
UPDATE: [http://www.itworld.com/software/228393/free-software-
activis...](http://www.itworld.com/software/228393/free-software-activists-
take-google-new-free-search-engine)

