

Ask HN: Open source focused crawler? - cookerware

Is there an open source crawler/library that will recursively follow only links under a certain xpath and ignore the rest?

I don't want to do an exhaustive crawl of every single link, I want something that will only follow links under a main content area.
======
sheraz
I highly recommend Scrapy ([http://www.scrapy.org](http://www.scrapy.org)).

From their site:

Scrapy is a fast high-level screen scraping and web crawling framework, used
to crawl websites and extract structured data from their pages. It can be used
for a wide range of purposes, from data mining to monitoring and automated
testing.

------
techaddict009
Check this out: [http://commoncrawl.org/](http://commoncrawl.org/)

It's not exactly what you are looking for, but it might help you.

------
forkrulassail
Have you tried BeautifulSoup?
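
Note that BeautifulSoup is an HTML parser rather than a crawler, and it uses CSS selectors instead of XPath, so the fetching and recursion would have to be written separately. A minimal sketch of just the link-filtering part, assuming the content area can be matched by a CSS selector; the `div#content` selector and the `links_in_container` name are made up for illustration:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def links_in_container(html, base_url, selector="div#content"):
    """Return absolute URLs of links inside the first element matching `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.select_one(selector)
    if container is None:
        # No content area on this page, so there is nothing to follow.
        return []
    # Only anchors that actually carry an href are considered.
    return [urljoin(base_url, a["href"]) for a in container.find_all("a", href=True)]
```

A crawl loop would fetch a page, call this function, and queue any returned URLs it has not visited yet.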

