I like this article but for its discussion of these libraries. On another note...
Am I the only one who dislikes Scrapy? I think it's basically the iOS of scraping tools: it's incredibly easy to set up and use, and then as soon as you need to do something even minutely non-standard it reveals itself to be frustratingly inflexible.
I do a lot of scraping specific pages and often have to auth, form-fill, refresh, recurse, use a custom SSL/TLS adapter, etc., in order to get what I'm after. I'm sure Scrapy would be great if I just had a giant queue of GET requests. Also, don't get me started on the Reactor.
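For concreteness, here is a minimal sketch of the kind of session that workflow needs (cookies for auth, a form POST, a custom TLS context), assuming only the Python standard library. The helper names `make_session_opener` and `form_request`, the URLs, and the form field names are all hypothetical, and the network calls are left commented out.

```python
import ssl
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener(min_tls=ssl.TLSVersion.TLSv1_2):
    """Return (opener, jar): an opener that keeps cookies and uses a custom TLS context."""
    context = ssl.create_default_context()
    context.minimum_version = min_tls  # the "custom SSL/TLS" knob; adjust as needed
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context),
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def form_request(url, fields):
    """Build a POST request that submits `fields` as a URL-encoded form."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data, method="POST")


if __name__ == "__main__":
    opener, jar = make_session_opener()
    # Hypothetical login URL and field names, for illustration only.
    login = form_request(
        "https://example.com/login", {"user": "me", "password": "secret"}
    )
    # opener.open(login)                            # session cookies land in `jar`
    # opener.open("https://example.com/private")    # later requests reuse those cookies
```

Each step is an ordinary function call, so refreshing or recursing is just a loop around `opener.open`.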