Hello,
I've been trying to use 80legs to crawl about 120,000 pages to extract lawyer bios from the websites of 200 top US law firms. I created a simple Django app with about 400 lawyers in the database (SQLite). The app pretends to be a who-knows-who database by matching schools, memberships in professional associations, etc. Now I want to add those 100,000+ lawyers to make the app a bit more interesting.
I love working with Django and Python, but I don't know Java, and 80legs requires the app to be written in Java (see http://80legs.pbworks.com/80Apps). Shion Deysarkar of 80legs mentions that they offer custom coding, and he is willing to give me an estimate. Before doing that, I wanted to ask HN whether anyone here has used 80legs and written an 80app to analyze the crawled pages, and whether they could give me instructions and advice about the process.
In fact, I would appreciate any general advice on the best way to approach this project.
Thank you.
Crawling "only" 120k pages can be done easily with a pure Python solution over a normal home or office internet connection. The urllib, urllib2, robotexclusionrulesparser, and lxml packages are a good start.
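For illustration, here is a minimal sketch of that stack in Python 2 (urllib2 is a 2.x module). The user agent, host, URL, and XPath below are placeholders I made up; every firm's markup will differ:

    import urllib2
    import lxml.html
    import robotexclusionrulesparser

    USER_AGENT = "LawyerBioCrawler/0.1 (contact: you@example.com)"  # placeholder

    # Honor robots.txt before fetching anything from a host.
    robots = robotexclusionrulesparser.RobotExclusionRulesParser()
    robots.fetch("http://www.example-lawfirm.com/robots.txt")  # placeholder host

    url = "http://www.example-lawfirm.com/attorneys/jane-doe.html"  # placeholder
    if robots.is_allowed(USER_AGENT, url):
        request = urllib2.Request(url, headers={"User-Agent": USER_AGENT})
        html = urllib2.urlopen(request).read()
        doc = lxml.html.fromstring(html)
        # Hypothetical XPath; you would write one per firm's bio template.
        name = doc.xpath("//h1/text()")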
Important: Don't forget to implement a crawl rate limit.
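The simplest version is a fixed delay between requests; a sketch, with an arbitrary delay value and a hypothetical helper wrapping the fetch-and-parse code above:

    import time

    CRAWL_DELAY = 2.0  # seconds between requests; arbitrary, tune per site

    for url in bio_urls:       # hypothetical list of bio page URLs
        fetch_and_parse(url)   # hypothetical helper, see the sketch above
        time.sleep(CRAWL_DELAY)

At 2 seconds per request, 120k pages take roughly three days of continuous crawling, which is why a normal home connection is enough.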