mjwalshe 699 days ago

I have also seen a messed-up (404-erroring) robots.txt file cause a site to get deindexed out of the blue.


pierrefar 698 days ago

That's a misconception. A 404 on robots.txt has no effect on crawling: it's treated the same as an empty robots.txt file, which allows all crawling.

But it's different for 5xx HTTP errors. As Googlebot is currently configured, it will halt all crawling of a site whose robots.txt returns a 5xx status code, and the block continues until Googlebot sees an acceptable status code for robots.txt fetches (HTTP 200 or 404).
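
To make that concrete, here is a minimal sketch in Python of the decision logic described above. The function name and return strings are illustrative only; this is not Google's actual implementation:

    import urllib.request
    import urllib.error

    def robots_txt_crawl_decision(site):
        # Sketch of the policy described above (not Google's code):
        #   200 -> crawl, obeying the fetched rules
        #   404 -> crawl everything (treated as an empty robots.txt)
        #   5xx -> halt all crawling until a 200 or 404 is seen
        url = site.rstrip("/") + "/robots.txt"
        try:
            with urllib.request.urlopen(url, timeout=10):
                return "crawl, obeying the fetched rules"  # HTTP 200
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return "crawl everything (treated as empty robots.txt)"
            if 500 <= err.code < 600:
                return "halt all crawling until robots.txt returns 200 or 404"
            return "other status code; not covered in this thread"
        except urllib.error.URLError:
            # An unreachable robots.txt (e.g. a firewall blocking the
            # fetch) is treated as an error too; see the Help Center
            # note further down.
            return "halt all crawling (robots.txt unreachable)"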

-----

mjwalshe 698 days ago

Interesting; that needs to go into the webmaster guidelines. I was not seeing 500s, nor was it reported in GWT as an error, on the site where this happened.

-----

pierrefar 698 days ago

It doesn't belong in the guidelines, but it is described in the relevant section of the Help Center:

http://support.google.com/webmasters/bin/answer.py?hl=en&...

In summary: if for any reason we cannot reach the robots.txt file due to an error (e.g. a firewall blocking Googlebot, or a 5xx error code when fetching it), Googlebot stops crawling the site, and this is reported in Webmaster Tools as a crawl error. The Help Center article above is about that error message shown in Webmaster Tools.
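
If you want to see for yourself what status code your robots.txt returns, here's a quick check in Python (example.com is a placeholder for your own site):

    import urllib.request
    import urllib.error

    try:
        with urllib.request.urlopen("http://example.com/robots.txt", timeout=10) as resp:
            print(resp.status)      # e.g. 200
    except urllib.error.HTTPError as err:
        print(err.code)             # e.g. 404 or 503
    except urllib.error.URLError as err:
        print("unreachable:", err.reason)

Note this checks reachability from your own machine, not from Googlebot's network, so a firewall rule that blocks only Googlebot would not show up here.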

Given that you said you did not see errors being reported, that suggests something else was going on. If you need more help, our forums are a great place to ask.

-----

mjwalshe 698 days ago

Cheers. I am off on leave for a week; I'll get this put into our best-practice guide for our devs and IS guys when I am back.

Funny thing was, I tried resubmitting the main page in GWT and all the traffic came back almost instantly.

-----



