
Rather than blocking a bot, it would make much more sense to CAPTCHA an IP that is producing a lot of traffic in a short time. Scraping has always been part of the web, and one should not assume that the information on a website is only going to be available on that website.
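For illustration, here is a minimal sketch of that idea: count requests per IP in a sliding window and flag the IP for a CAPTCHA once it passes a threshold. The names (`needs_captcha`, `WINDOW_SECONDS`, `MAX_REQUESTS`) and the numbers are assumptions made up for this example, not anything a particular site uses.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10.0  # assumed window length, tune for your traffic
MAX_REQUESTS = 50      # assumed number of requests allowed per window

# ip -> timestamps of that IP's recent requests (in-memory, single process)
_recent = defaultdict(deque)


def needs_captcha(ip, now=None):
    """Record one request from `ip`; return True once it exceeds the threshold."""
    now = time.monotonic() if now is None else now
    hits = _recent[ip]
    hits.append(now)
    # Drop timestamps that have fallen out of the sliding window.
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    return len(hits) > MAX_REQUESTS
```

A request handler would call `needs_captcha(client_ip)` on each request and serve the challenge page when it returns True. This only looks at one IP at a time, so it says nothing about a scraper that spreads its requests over many addresses.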



This approach only stops the most basic and laziest scrapers. Some people have tens of thousands of diverse IP addresses to use for scraping. Many of them will not give a shit about your bandwidth or server constraints and will cause your server to hit bottlenecks, making it slow and useless for everyone.


I guess the best approach would be to captcha everything until we've captcha'd ourselves back into dial-up times for content delivery. /s


Have you used Tor lately?


> it would make much more sense to CAPTCHA an IP that is producing a lot of traffic in a short time.

CAPTCHAs are useful, but they're an XY problem in the same way that this headless detection is: trying to detect human vs bot, when the real solution would be to slow down (a portion of) the traffic.

Hashcash would seem like a better solution, since that doesn't lock anybody out (human or bot), it just slows them down to reduce server load. If some clients are higher priority than others (e.g. human users vs poorly-programmed bots) then use info like IP, cookies, etc. to slow down the low priority requests, or even adjust the difficulty depending on how likely the client is to be causing load.
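As a rough sketch of what that could look like: the server hands out a challenge string, the client has to find a nonce whose hash has enough leading zero bits, and the server verifies it with a single hash. This is hashcash-style rather than real Hashcash (which uses SHA-1 and a specific stamp format); the `challenge:nonce` layout, the SHA-256 choice, the function names, and the `suspicion` score with its `base`/`extra` bits are all assumptions for the example.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def _digest(challenge: str, nonce: int) -> bytes:
    # Assumed stamp layout: "<challenge>:<nonce>", hashed with SHA-256.
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty hashes on average)."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash, so checking is cheap however hard solving was."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty


def difficulty_for(suspicion: float, base: int = 8, extra: int = 12) -> int:
    """Map a hypothetical 0..1 suspicion score (from IP, cookies, etc.) to bits."""
    suspicion = min(max(suspicion, 0.0), 1.0)
    return base + round(suspicion * extra)
```

Each extra bit doubles the expected client work, so `difficulty_for` is where low-priority requests get slowed down without anyone being locked out. A real deployment would also need the challenge to be unpredictable, tied to the client, and single-use so solved stamps can't be replayed.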



