
What if you're scraping with AJAX? Wouldn't each individual user's IP take the hit rather than the scraper's own server IP?



You can't scrape other sites directly with AJAX because of cross-domain security restrictions (the browser's same-origin policy).

One potential way to obey robots.txt might be to spawn multiple small EC2 instances with different IPs and have them coordinate with each other to share the crawling without individually running over the limits. (This is also useful for scraping sites that have rate limits.)
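A minimal sketch of that coordination idea, assuming a shared Redis work queue (the "coordinator" hostname and the key names are hypothetical): each instance pops URLs from a shared frontier and enforces its own per-IP delay, so no single IP runs over the site's limit.

    import time

    import redis
    import requests

    PER_IP_DELAY = 10  # assumed per-IP politeness delay, in seconds

    r = redis.Redis(host="coordinator", port=6379)  # shared coordination point

    while True:
        url = r.lpop("crawl:frontier")        # shared work queue across instances
        if url is None:
            break                             # frontier exhausted
        resp = requests.get(url.decode(), timeout=30)
        r.hset("crawl:pages", url, resp.text)  # store the page where any instance can read it
        time.sleep(PER_IP_DELAY)              # this instance stays under its own rate limit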


robots.txt doesn't enforce itself, and its limits aren't per-IP; splitting the crawl across instances is still a violation, no better than simply lowering the delay on a single scraper.
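Honoring robots.txt is entirely the crawler's job; a minimal sketch of a voluntary check using Python's stdlib parser (example.com and the "my-crawler" user agent string are placeholders):

    from urllib import robotparser

    rp = robotparser.RobotFileParser("https://example.com/robots.txt")
    rp.read()  # fetch and parse the file; nothing here is enforced by the server

    if rp.can_fetch("my-crawler", "https://example.com/some/page"):
        delay = rp.crawl_delay("my-crawler") or 1  # fall back to a polite default
        print(f"allowed; wait {delay}s between requests")
    else:
        print("disallowed by robots.txt")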


The AJAX request looks to be getting proxied through this guy's server. You have to do something like that because of cross-domain AJAX restrictions.
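A minimal sketch of that proxy pattern, assuming a Flask endpoint (the /fetch route and "url" query parameter are hypothetical): the browser's AJAX call hits your own domain, and the server fetches the remote page on its behalf, sidestepping the same-origin restriction.

    import requests
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/fetch")
    def fetch():
        target = request.args.get("url")
        if not target:
            return "missing url parameter", 400
        resp = requests.get(target, timeout=30)  # server-side fetch, not subject to same-origin policy
        return resp.text, resp.status_code       # relay the remote body back to the browser

    if __name__ == "__main__":
        app.run()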



