
As I said in my post below, spiders that request article_url/reddit.png or article_url/google-analytics.com/ga.js do not get a 301 from me, because they are not following the href of an <a> tag. They are guessing at URLs that never existed, so those are legitimate 404 responses.
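To make the 301-versus-404 distinction concrete, here is a minimal sketch of that decision. The `MOVED` and `EXISTING` tables, the example paths, and the `respond` function are all hypothetical, not taken from any real server configuration: a path gets a 301 only if it once existed and moved, and anything a spider merely guessed at falls through to a 404.

```python
from typing import Optional, Tuple

# Hypothetical data for illustration only.
MOVED = {"/old-article": "/articles/new-article"}  # old path -> new path
EXISTING = {"/articles/new-article"}               # paths served today


def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) for a requested path."""
    if path in EXISTING:
        return 200, None
    if path in MOVED:
        # The URL really existed once, so point the client at its new home.
        return 301, MOVED[path]
    # Never existed (e.g. a spider appending reddit.png to an article URL).
    return 404, None
```

Under these assumptions, `respond("/old-article")` yields a redirect, while `respond("/old-article/reddit.png")` yields a plain 404.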

I feel sorry for John_Onion. The comments here are coming from people who have never looked at server logs (e.g. 'how do you know those are all spiders?'). Looking at my own logs, I would say at least 75% of the hits on my site (juliusdavies.ca) are spiders. They visit every single page a few times a year to see if it has changed. For my own purposes I mirror some open source manuals and specifications (http://juliusdavies.ca/webdocs/). These have been on my site, unchanged, for at least 3 years, and the spiders still come every couple of months and check every page.

These hits will never (and should never!) translate into even a single real user in my case.

Wouldn't exclusion via robots.txt be appropriate in this case?
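As a rough sketch of what that exclusion could look like, the snippet below feeds a two-line robots.txt to Python's standard `urllib.robotparser` and checks which URLs a compliant crawler would be allowed to fetch. The robots.txt contents, the `ExampleBot` user agent, and the `some-manual.html` page name are assumptions made for illustration; only the /webdocs/ path comes from the comment above. robots.txt is advisory, so this only affects spiders that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to skip the mirrored docs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /webdocs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)


# Page name below is made up; only the /webdocs/ prefix matters.
print(allowed("http://juliusdavies.ca/webdocs/some-manual.html"))
print(allowed("http://juliusdavies.ca/"))
```

With these rules, anything under /webdocs/ is off limits to well-behaved spiders, while the rest of the site stays crawlable.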
