
I feel sorry for John_Onion. The comments here are coming from people who have never looked at server logs (e.g. 'how do you know those are all spiders?'). Looking at my logs, I would say at least 75% of my hits (juliusdavies.ca) are spiders. They visit every single page a few times a year to see if it has changed. For my own purposes I mirror some open source manuals and specifications (http://juliusdavies.ca/webdocs/). These have been on my site, unchanged, for at least 3 years, and the spiders still come every couple of months and check every page.

These hits will never (and should never!) translate into even a single real user in my case.

Wouldn't exclusion via robots.txt be appropriate in this case?
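
For what it's worth, here is a rough sketch of what that exclusion might look like, checked with Python's standard `urllib.robotparser`. The rule set is an assumption on my part (it blocks all crawlers from `/webdocs/` and nothing else), and `ExampleBot` and `index.html` are made-up names for illustration. It only affects spiders that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: ask every crawler to skip the mirrored docs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /webdocs/
"""


def is_allowed(url, agent="*"):
    """Return True if a well-behaved crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "ExampleBot" and "index.html" are placeholders, not real names.
    print(is_allowed("http://juliusdavies.ca/webdocs/index.html", "ExampleBot"))
    print(is_allowed("http://juliusdavies.ca/", "ExampleBot"))
```

The two-line rule would be served as `/robots.txt` at the site root. Compliant spiders should then stop re-checking the mirrored pages, while the rest of the site stays crawlable.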
