
Show HN: Apify SDK – A scalable web crawling and scraping library for JavaScript - jancurn
https://github.com/apifytech/apify-js
======
jancurn
Hey guys, today we’re showing HN a new open-source library that we have been
working on for almost a year. It incorporates lessons learned from scraping
thousands of websites over the last 4 years. We figured there was no such
universal library for JavaScript, while Python, for example, has one
([https://scrapy.org/](https://scrapy.org/)). That wasn’t fair, because
JavaScript is THE language of the web :)

Anyway, we hope you’ll give it a shot, and we’re really looking forward to
hearing what you think about it. All feedback welcome!

------
rajangdavis
I wish I could upvote this more. This solves a huge problem for me, and I will
definitely be taking a peek at it over the weekend.

Thank you so much for making and sharing this!

------
darekkay
Thanks, this looks solid, with really extensive documentation. I will give
it a try for my next crawling/bot project :)

~~~
jancurn
Awesome, looking forward to hearing what you think :)

------
raitom
This comes just in time, right when I needed to replace an old scraper!

Does it have to run on an instance, or can we also use a serverless
environment?

~~~
jancurn
The SDK runs anywhere you have Node running. And if you can also run headless
Chrome with Puppeteer there, then you can use it with the SDK too. This might
require installing several libraries and adjusting some configuration
settings. If I’m not mistaken, Google Cloud Functions support Puppeteer by
default, while AWS Lambda does not. With any Docker-based serverless platform
such as Zeit Now or Apify Cloud, you just need to use the right Docker image.

------
pdxandi
I'm a huge fan of Apify and look forward to exploring this new SDK. Thanks
y'all.

~~~
jancurn
Thank you so much!

