
Show HN: Run Puppeteer on AWS Lambda - alixaxel
https://github.com/alixaxel/chrome-aws-lambda
======
sriram_iyengar
What is the advantage of running automated tests on Lambda? Automated test
runs are typically long-running processes, so Lambda's execution model may not
be suitable. Lambda's cold start time is another challenge. What is a good
practice/model for running suites: one test per Lambda, or one spec per
Lambda? I'm still inclined to have EC2 instances created and destroyed via
DevOps tools like Terraform to run automation. Thoughts, please.

~~~
alixaxel
Well, puppeteer can be used for more than test suites (think screenshots, PDF
rendering, proxified APIs, ...).

But for running long automated tests, I'd probably look into alternatives like
Fargate, where the billing model is per-second with a one minute minimum.
Terraform + EC2 spot instances works too, obviously. :)
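
To make the billing model concrete, here is a minimal sketch of per-second billing with a one-minute minimum. The function name and the round-up of partial seconds are assumptions for illustration; actual prices are not modeled.

```typescript
// Per-second billing with a one-minute minimum, as described above.
const MINIMUM_BILLED_SECONDS = 60;

// Hypothetical helper: returns the number of seconds that would be billed
// for a task that ran for `runtimeSeconds`.
function billedSeconds(runtimeSeconds: number): number {
  // Assumption: partial seconds are rounded up before the minimum applies.
  return Math.max(MINIMUM_BILLED_SECONDS, Math.ceil(runtimeSeconds));
}

console.log(billedSeconds(5));    // a 5 s run is billed as 60 s
console.log(billedSeconds(90.2)); // a 90.2 s run is billed as 91 s
```

Under this model a job that finishes in a few seconds still pays for a full minute, which is why it fits long test runs better than short ones.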

~~~
sriram_iyengar
Thanks. The example you’ve mentioned will be useful.

------
tnolet
I run a ton of Puppeteer jobs (300k in the last month), currently on EC2 and
DigitalOcean VMs, mostly due to the subtle difficulties of running Puppeteer
on Lambda.

Will certainly have a look at this project and contribute where possible.

My main concern is not so much cold start time, which is not a huge issue for
my use case, but the performance of Chrome on AWS Lambda boxes. Rendering,
navigation, etc. need to be snappy.

~~~
thesandlord
Google App Engine and Google Cloud Functions got native support for Puppeteer
a few months ago as well. Let me know what you think if you try it out.

[https://news.ycombinator.com/item?id=17795626](https://news.ycombinator.com/item?id=17795626)

(I work for Google Cloud)

~~~
alixaxel
The performance of puppeteer is super bad on GCF (you can read more about it
here
[https://github.com/GoogleChrome/puppeteer/issues/3120](https://github.com/GoogleChrome/puppeteer/issues/3120)).
It would actually be great to have someone really improve this situation
instead of dismissing it as a weird IO problem.

~~~
thesandlord
Did some research internally, this is being tracked but still no root cause
AFAIK :(

------
dschep
Here's another alternative: a Lambda layer containing headless Chrome, with a
Puppeteer example:
[https://github.com/RafalWilinski/serverless-puppeteer-layers](https://github.com/RafalWilinski/serverless-puppeteer-layers)

------
nailer
This is fantastic.

\- I'm just getting started with Lambda so pardon if this is ignorant, but
what's the cold start time of Chromium? Or can you warm start it somehow?

\- Since scraping often depends on state, wouldn't you hit a timeout doing
longer scraping jobs?

~~~
alixaxel
Thanks!

So usually with Lambda, you want your jobs to be as atomic/quick as possible,
as Lambda is stateless and has a maximum duration of 15 minutes.
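
One way to keep jobs within the time limit is to process work in small units and stop before the deadline, handing back whatever is left for a follow-up invocation. The sketch below assumes a context object exposing `getRemainingTimeInMillis()`, which the Node.js Lambda context provides; `processWithinBudget`, `work` and the 10-second safety margin are hypothetical names and values chosen for illustration.

```typescript
// Minimal shape of the Lambda context that this sketch relies on.
interface TimeBudget {
  getRemainingTimeInMillis(): number;
}

interface BatchResult<T, R> {
  done: R[];      // results of the items that were processed
  remaining: T[]; // items left over for a later invocation
}

// Process items one by one, stopping once the remaining time drops to the
// safety margin or below, so the function returns before it is killed.
async function processWithinBudget<T, R>(
  items: T[],
  work: (item: T) => Promise<R>,
  context: TimeBudget,
  safetyMarginMs: number = 10000,
): Promise<BatchResult<T, R>> {
  const done: R[] = [];
  let processed = 0;
  for (const item of items) {
    if (context.getRemainingTimeInMillis() <= safetyMarginMs) {
      break;
    }
    done.push(await work(item));
    processed += 1;
  }
  return { done, remaining: items.slice(processed) };
}
```

Because Lambda is stateless, any state the next invocation needs (here, the `remaining` items) has to be passed along explicitly, for example through a queue or the invocation payload.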

As for the warm-up times, the decompression of Chromium with Brotli takes
about 700ms on a 1.5GB Lambda (this is faster than Gzip/Zip). Launching
Chromium itself and opening a new tab takes another 400ms or so. If you keep
your Lambdas warm (by registering a scheduled CloudWatch event every 15
minutes, for instance) your startup time will effectively be those 400ms.
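
A rough sketch of the keep-warm idea: scheduled CloudWatch events arrive with `source` set to `"aws.events"`, so the handler can return early on those pings without launching Chromium. `makeHandler`, `doWork` and `estimatedStartupMs` are hypothetical names, and the timings are just the approximate figures quoted above.

```typescript
// Approximate figures from the paragraph above (1.5GB Lambda), in ms.
const BROTLI_DECOMPRESS_MS = 700;
const CHROMIUM_LAUNCH_MS = 400;

// Scheduled CloudWatch events carry source "aws.events".
function isWarmupPing(event: unknown): boolean {
  if (typeof event !== "object" || event === null) {
    return false;
  }
  return (event as { source?: unknown }).source === "aws.events";
}

// Cold starts pay for decompression plus launch; warm starts only for launch.
function estimatedStartupMs(isColdStart: boolean): number {
  return isColdStart
    ? BROTLI_DECOMPRESS_MS + CHROMIUM_LAUNCH_MS
    : CHROMIUM_LAUNCH_MS;
}

// Wrap the real job (hypothetical `doWork`) so warm-up pings return at once.
function makeHandler(
  doWork: (event: unknown) => Promise<string>,
): (event: unknown) => Promise<string> {
  return async (event: unknown): Promise<string> => {
    if (isWarmupPing(event)) {
      return "warmed";
    }
    return doWork(event);
  };
}

console.log(estimatedStartupMs(true));  // 1100
console.log(estimatedStartupMs(false)); // 400
```

The wrapper keeps the warm-up check separate from the browser code, so the scheduled ping only keeps the container alive and never pays the Chromium launch cost itself.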

~~~
jpambrun
If you keep your Lambda warm, shouldn't you just use something like
browserless
([https://github.com/joelgriffith/browserless](https://github.com/joelgriffith/browserless))?

~~~
e1g
I run the browserless Docker container on-prem and it works very well for us.
Fire&forget, +1.

