For those who want a Java-based solution, I invite you to check out Tales, my open source web scraper that tolerates IP blocking and runs on top of AWS and Rackspace. Tales is designed to be easy to deploy, configure, and manage. With Tales you can scrape tens or even hundreds of domains concurrently.
https://github.com/calufa/tales-core