
There is no silver bullet for stopping web scrapers. If you only block User-Agents or IPs, all you are doing is putting a small hurdle in their way. You have to combine a lot of different tools to make the wall high enough that they actually stop trying to scrape you. Some of the key things you need to do are:

behavioral modeling - rate limiting, bandwidth restrictions, etc

identity verification - make sure clients are really running the browser they say they are, allow Google and other search engines by whitelisting their IPs, block others that are pretending to be Google, etc

code obfuscation - make it hard for them to parse your pages. Change up the CSS and markup periodically, etc.
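
To make the first two points a bit more concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not how any particular product works. The names TokenBucket, is_fake_crawler and CRAWLER_NETWORKS are made up for this example, and the 192.0.2.0/24 range is a documentation placeholder, not a real search engine range. In practice you would load the ranges each search engine documents and probably key the limiter on something stronger than a bare IP.

```python
import ipaddress
import time


class TokenBucket:
    """Per-client rate limiter: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.now = now        # injectable clock, which makes testing easier
        self.buckets = {}     # client id -> (tokens left, time of last request)

    def allow(self, client_id):
        t = self.now()
        # Unknown clients start with a full bucket.
        tokens, last = self.buckets.get(client_id, (self.capacity, t))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (t - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, t)
            return True
        self.buckets[client_id] = (tokens, t)
        return False


# ASSUMPTION: placeholder allowlist (RFC 5737 documentation range).
# Replace with the ranges published by the crawlers you want to let in.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_fake_crawler(user_agent, ip):
    """True if the client claims to be Googlebot but is not on the allowlist."""
    if "googlebot" not in user_agent.lower():
        return False  # makes no crawler claim, so nothing to verify here
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in CRAWLER_NETWORKS)
```

You would call is_fake_crawler first and reject impostors outright, then call allow(client_id) for everyone else and return a 429 (or slow the response down) when it comes back False. Neither check is enough on its own, which is the whole point of layering them.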

Or you can use an automated service to do all of this for you. Check out www.distil.it. Full disclosure: I'm the CEO of Distil.



