Show HN: Plugin Based, Batteries Included, Web Scraper (github.com/get-set-fetch)
5 points by a1sabau on Jan 20, 2021 | 1 comment

I started this library to isolate my own scraping logic from all the boilerplate a scraping project requires.

- Start with a working set of built-in plugins capable of identifying, scraping (based on CSS selectors) and storing binary or HTML web resources.
- Insert your own plugins containing your own custom logic.
- Query the scraped data directly from one of the supported databases or export it as CSV or ZIP.
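
To make the plugin idea concrete, here is a minimal Python sketch of a pipeline where a built-in style plugin extracts content, a custom plugin adds its own logic, and the result is exported as CSV. This is not the get-set-fetch API: every name in it (`Resource`, `ExtractPlugin`, `UppercaseTitlesPlugin`, `run_pipeline`, `to_csv`) is hypothetical, the page is an in-memory string rather than a fetched URL, and the "selectors" are bare tag names standing in for real CSS selectors.

```python
import csv
import io
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Resource:
    """Hypothetical container for one scraped page."""
    url: str
    html: str
    content: dict = field(default_factory=dict)


class TagTextParser(HTMLParser):
    """Collects the text inside every element with the given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.matches[-1] += data


class ExtractPlugin:
    """Stand-in for a built-in scrape plugin (assumption: bare tag selectors only)."""

    def __init__(self, selectors):
        self.selectors = selectors

    def apply(self, resource):
        for selector in self.selectors:
            parser = TagTextParser(selector)
            parser.feed(resource.html)
            resource.content[selector] = [m.strip() for m in parser.matches]
        return resource


class UppercaseTitlesPlugin:
    """Example of custom logic inserted after the built-in step."""

    def apply(self, resource):
        resource.content["h1"] = [t.upper() for t in resource.content.get("h1", [])]
        return resource


def run_pipeline(plugins, resource):
    """Pass the resource through each plugin in order."""
    for plugin in plugins:
        resource = plugin.apply(resource)
    return resource


def to_csv(resources, columns):
    """Export one row per resource, joining multiple matches with '; '."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", *columns])
    for r in resources:
        writer.writerow([r.url, *("; ".join(r.content.get(c, [])) for c in columns)])
    return out.getvalue()


page = Resource("https://example.com/a", "<html><h1>First post</h1><p>Hello</p><p>World</p></html>")
scraped = run_pipeline([ExtractPlugin(["h1", "p"]), UppercaseTitlesPlugin()], page)
csv_text = to_csv([scraped], ["h1", "p"])
print(csv_text)
```

The point of the design is that the order of the plugin list is the whole configuration: swapping, removing or adding a step does not touch the other plugins.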

The initial GitHub commit from two months ago is misleading: if you take a look at the first project (now obsolete) under the get-set-fetch GitHub organization, you'll see it all started a few years back. If I don't run out of steam, I'll port the features from the extension next (yes, there's also a browser extension :)), mainly the ability to click and scrape dynamic JavaScript pages.

What functionality would you like to see in such a project? More built-in plugins covering a wider range of scraping scenarios? More storage options? Additional browser support?

Any feedback is appreciated. Thank you :)
