I can't say for sure that there are none, but I've done quite a bit of research. If there really were an excellent web crawling framework, it should have bubbled up to the top.

I don't remember the names of all the projects I looked at, but the main ones were Nutch, Heritrix, Scrapy and crawler4j. I've come across several companies/startups that built their crawlers in-house for the same reasons (e.g. http://blog.semantics3.com/how-we-built-our-almost-distribut...).
