Will certainly have a look at this project and contribute where possible.
My main concern is not so much cold start time, which is not a huge issue for my use case, but the performance of Chrome on AWS Lambda boxes. Rendering, navigation, etc. need to be snappy.
(I work for Google Cloud)
Granted, there are some issues with rendering (fonts, emojis and whatnot), but by now there are solutions available that could be explored.
Feel free to try it out and share your specific challenges on GitHub; I'll do my best to come up with solutions for them.
Out of curiosity, what is it that you do that demands so many sessions? Just web scraping?
But for running long automated tests, I'd probably look into alternatives like Fargate, where the billing model is per-second with a one minute minimum. Terraform + EC2 spot instances works too, obviously. :)
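To make the "per-second with a one minute minimum" part concrete, here's a rough sketch of how the billed duration works out. The function name and the default are my own, and it deliberately leaves out prices, since those vary by region and task size.

```typescript
// Billed duration under per-second billing with a minimum charge.
// Sketch only: billedSeconds and its default are illustrative names,
// not part of any AWS SDK.
function billedSeconds(runtimeSeconds: number, minimumSeconds = 60): number {
  // Partial seconds round up; anything shorter than the minimum is
  // billed as the minimum.
  return Math.max(minimumSeconds, Math.ceil(runtimeSeconds));
}

console.log(billedSeconds(12.3));  // a 12 s run is billed as a full minute
console.log(billedSeconds(600.2)); // long runs are billed per second
```

So short runs pay for the full minute, which is why this model suits long test runs better than quick one-off jobs.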
The advantage is that you can run a lot of tests concurrently at a relatively cheap cost.
The company I work for offers VMs that are created/destroyed automatically after each test. There's no cold start, and no time limit. Plus you can choose to run headless like Puppeteer or test in an actual OS like Windows or macOS.
It's been quite difficult to get it right, with Puppeteer being young, etc., but it chugs along nicely now at around 10k-15k runs per day spread over 4 regions.
Meaning you’ll get at least 10 to 14 minutes of headless testing.
As for recommendations on how to do so: unless your testing is super long, you should just run the entire thing in one function. Otherwise, decouple your testing based on the various modules of your app (i.e. one module per function).
- I'm just getting started with Lambda so pardon if this is ignorant, but what's the cold start time of Chromium? Or can you warm start it somehow?
- Since scraping often depends on state, wouldn't you hit a timeout doing longer scraping jobs?
So usually with Lambda, you want your jobs to be as atomic/quick as possible, as Lambda is stateless and has a maximum duration of 15 minutes.
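For longer jobs, one way to stay inside that limit is to stop early and hand the leftover work to the next invocation. A minimal sketch, assuming the work can be split into independent items: `processWithinBudget`, `work` and the 10 s safety margin are made up for illustration. In a real handler, `remainingMs` would come from `context.getRemainingTimeInMillis()` on the Lambda Node.js context object.

```typescript
// Process items until the remaining time drops to the safety margin,
// then return whatever is left so it can be re-queued.
function processWithinBudget<T>(
  items: T[],
  work: (item: T) => void,
  remainingMs: () => number,
  safetyMarginMs = 10000, // assumed margin; tune for your own jobs
): T[] {
  let next = 0;
  while (next < items.length && remainingMs() > safetyMarginMs) {
    work(items[next]);
    next += 1;
  }
  // Unprocessed items; empty when everything fit into this invocation.
  return items.slice(next);
}

// Usage with a fake clock: each item "costs" 8 s out of a 30 s budget.
let budgetMs = 30000;
const leftover = processWithinBudget(
  ["a", "b", "c", "d", "e"],
  () => { budgetMs -= 8000; },
  () => budgetMs,
);
console.log(leftover); // the items that did not fit
```

How the leftovers get passed along (queue, another invoke, etc.) depends on your setup, so that part is left out.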
As for the warm-up times, the decompression of Chromium with Brotli takes about 700ms on a 1.5GB Lambda (this is faster than Gzip/Zip). Launching Chromium itself and opening a new tab takes another 400ms or so. If you keep your Lambdas warm (by registering a scheduled CloudWatch event every 15 minutes, for instance) your startup time will effectively be those 400ms.
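Roughly, the keep-warm trick looks like the sketch below. Assumptions: `BrowserLike`, `InvocationEvent`, `once` and `makeHandler` are names I made up, and the real launch call (Puppeteer or similar) is injected rather than shown. The one AWS-specific detail is that scheduled CloudWatch events arrive with `source` set to `"aws.events"`.

```typescript
// Hypothetical event shape; only `source` and `url` are looked at here.
interface InvocationEvent {
  source?: string;
  url?: string;
}

// Stand-in for whatever browser handle your launcher returns.
interface BrowserLike {
  title(url: string): Promise<string>;
}

// Create a value on first use and reuse it afterwards. Called at module
// scope, the cached value survives across warm invocations.
function once<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

// Scheduled CloudWatch events have source "aws.events".
function isWarmUpPing(event: InvocationEvent): boolean {
  return event.source === "aws.events";
}

function makeHandler(launch: () => Promise<BrowserLike>) {
  const getBrowser = once(launch);
  return async (event: InvocationEvent): Promise<string> => {
    if (isWarmUpPing(event)) {
      // Pay the decompress/launch cost now, then return without doing work.
      await getBrowser();
      return "warm";
    }
    const browser = await getBrowser();
    return browser.title(event.url ?? "about:blank");
  };
}
```

Note that this caches the launch promise as-is, so a failed launch would stay cached until the container is recycled; a real handler would probably want to reset it on error.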
Depending on your use case you can also disable security and open as many iframes as you want in a single tab. Not sure how this compares to multiple tabs though.
Of course, you'll run into cold starts again when Lambda has to scale.
I have been considering moving my pool of Chromium workers to Lambda functions so we can avoid API slowdowns caused by a high number of concurrent parsing jobs.
Are there any other side effects of running headless Chromium in a Lambda function?