
I like this article, if only for its discussion of these libraries. On another note...

Am I the only one who dislikes Scrapy? I think it's basically the iOS of scraping tools: it's incredibly easy to set up and use, and then as soon as you need to do something even minutely non-standard, it reveals itself to be frustratingly inflexible.




Scrapy is about as flexible and extensible as you can get... Care to elaborate on "frustratingly inflexible"?


I do a lot of scraping of specific pages and often have to authenticate, fill in forms, refresh, recurse, use a custom SSL/TLS adapter, etc., in order to get what I'm after. I'm sure Scrapy would be great if I just had a giant queue of GET requests. Also, don't get me started on the Reactor.
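
For illustration, the kind of hand-rolled session described above (cookie-backed auth, form posts, a custom TLS configuration) can be sketched with the Python standard library alone. This is only a minimal sketch under stated assumptions, not the commenter's actual code and not Scrapy: the names `build_session` and `form_request`, the example URL, and the field names are all hypothetical.

```python
import ssl
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_session(min_tls=ssl.TLSVersion.TLSv1_2):
    """Return (opener, jar): an opener with a custom TLS context and cookie storage.

    Hypothetical helper; the TLS floor is an assumed example of a "custom" setting.
    """
    ctx = ssl.create_default_context()
    ctx.minimum_version = min_tls  # refuse anything older than the given TLS version
    jar = CookieJar()              # keeps login cookies across requests
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=ctx),
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def form_request(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data, method="POST")


# Nothing is sent here; opener.open(req) would perform the actual login.
opener, jar = build_session()
req = form_request("https://example.com/login", {"user": "a", "pw": "b"})
```

Calling `opener.open(req)` would submit the form, and later requests through the same opener would reuse whatever cookies the server set.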


The image pipeline converts to JPG automatically. That cannot be disabled without writing all the image code yourself.


I don't really know what you mean. Scrapy is in Python. You can do whatever you want.


Haha, what specifically are you talking about?



