

Ask HN: A good list of bots/crawlers ip addresses? - rksprst

I have a URL shortener and I don't want to count visits by bots. Is there a good, comprehensive list of bots/crawlers?

Possibly in CSV format?

Either IP addresses, user agents, or both.
======
madhouse
Another option would be to use robots.txt to stop bots from accessing a
particular URL (for example, a 1x1 image or some such). Hide that somewhere in
every page, and only count visits where the image was requested.

This does require that the URL expansion works as a display + redirect, so an
intermediate page is shown. If it doesn't work like that...

Well, you can simply exclude the bots and crawlers with robots.txt. The
downside of that is that then they won't index your shortened links either,
which may or may not be a problem.
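
A rough sketch of the tracking-image idea, in Python. The path /pixel.gif, the
ROBOTS_TXT contents, and the helper name are made up for illustration; the
only real API used is the standard library's urllib.robotparser, which here
just checks that the rules say what they are meant to say. Note that this only
filters bots that actually honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path of the 1x1 tracking image (an assumption, not from the thread).
PIXEL_PATH = "/pixel.gif"

# robots.txt that asks every crawler to stay away from the image only,
# so the shortened links themselves can still be crawled.
ROBOTS_TXT = f"""\
User-agent: *
Disallow: {PIXEL_PATH}
"""


def allowed(user_agent: str, path: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

With rules like these, a compliant crawler never requests the image, so
counting image requests (rather than redirects) approximates human visits.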

------
jedberg
Lists like this aren't generally shared, because then the nefarious bots would
know they had been caught.

Well-behaved bots tend to use user-agent strings that identify them fairly clearly.
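
As a minimal sketch of that heuristic in Python: the keyword list below is an
assumption chosen for illustration, not a comprehensive or authoritative list,
and it will only catch bots that announce themselves.

```python
import re

# Assumed markers that self-identifying crawlers commonly put in their
# user-agent string; illustrative only, extend from your own logs.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Guess whether a user-agent string belongs to a self-identifying bot."""
    if not user_agent:
        # Assumption: a missing user agent is treated as suspicious.
        return True
    return bool(BOT_PATTERN.search(user_agent))
```

A visit would then be counted only when looks_like_bot(ua) is false.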

The best bet is to watch your logs for an IP or agent that seems to hit more
URLs than anyone else, and then investigate by hand.
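
One way to do that first pass over the logs, sketched in Python. It assumes
the log has already been parsed into (client, url) pairs, where client is
either an IP address or a user-agent string; the function name and the input
shape are hypothetical.

```python
from collections import defaultdict


def top_clients(hits, limit=5):
    """Rank clients by the number of distinct URLs they requested.

    hits: iterable of (client, url) pairs, where client is an IP address
    or a user-agent string (assumed to be pre-parsed from the access log).
    """
    urls_by_client = defaultdict(set)
    for client, url in hits:
        urls_by_client[client].add(url)
    ranked = sorted(
        urls_by_client.items(), key=lambda item: len(item[1]), reverse=True
    )
    return [(client, len(urls)) for client, urls in ranked[:limit]]
```

The clients at the top of the list are only candidates; they still need to be
investigated by hand before being excluded from the counts.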

