
Headless Chrome support in Cloud Functions and App Engine - idoco
https://cloud.google.com/blog/products/gcp/introducing-headless-chrome-support-in-cloud-functions-and-app-engine
======
rmdashrfroot
I wrote a collection of Dockerfiles for images running Python 2.7 or Python
3.6 + Selenium with either Chrome or Firefox and using Xvfb for the X display
(necessary for running Selenium headlessly).

[https://github.com/seanpianka/docker-python-xvfb-selenium-
ch...](https://github.com/seanpianka/docker-python-xvfb-selenium-chrome-
firefox)

Using this, in conjunction with AWS Step Functions, Lambda, and ECS, it became
merely cents a month to run a headless scraper task in the cloud.

~~~
lars_francke
Can you elaborate a bit? Sounds interesting. I had never heard of AWS Step
Functions before.

What does your workflow look like?

~~~
tjbiddle
Not OP, but to elaborate on AWS Step Functions:

In short - this gives you the ability to pass the output of one lambda
function to the input of another lambda function.

One example I've written regularly creates a new copy of our Production RDS
database in Ireland as a Staging RDS database in Oregon.

1\. CloudWatch Event starts Step Function on the 15th

2\. Copy last Production snapshot from Ireland to Oregon

3\. Restore this snapshot as a new RDS instance (It will fail until the
snapshot is available and retry with exponential backoff - this is a step
function feature)

4\. In parallel:

    
    
      - Add tags to the instance (Once it's available)
    
      - Delete the snapshot copy (When finished restoring)
    
      - Modify the new instance with security groups and subnets
    
         - In parallel:
    
            - Run a SQL query to anonymize all of PII columns for GDPR compliance as data has now left the EU.
    
            - Call out to the Cloudflare API to update our DNS entry with the new RDS endpoint.
    
               - Delete the old Staging database instance
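
For readers new to Step Functions, here is a rough sketch of how a workflow like the one above might be written in Amazon States Language, built as a Python dict. The state names, the ARN prefix, and the retry numbers are all made up for illustration, and the nested parallel step is left out to keep it short; this is not the actual state machine described above.

```python
import json

# Hypothetical Lambda ARN prefix; substitute real function ARNs.
ARN = "arn:aws:lambda:us-west-2:123456789012:function:"


def task(name, next_state=None, retry=False):
    """Build a Task state that invokes one (hypothetical) Lambda function."""
    state = {"Type": "Task", "Resource": ARN + name}
    if retry:
        # Retry with exponential backoff: the wait doubles after each attempt.
        state["Retry"] = [{
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 30,   # assumed value
            "MaxAttempts": 10,       # assumed value
            "BackoffRate": 2.0,
        }]
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def branch(*names):
    """Chain Task states one after another into a branch of a Parallel state."""
    states = {}
    for i, name in enumerate(names):
        following = names[i + 1] if i + 1 < len(names) else None
        states[name] = task(name, following, retry=True)
    return {"StartAt": names[0], "States": states}


state_machine = {
    "StartAt": "CopySnapshot",
    "States": {
        "CopySnapshot": task("CopySnapshot", "RestoreSnapshot"),
        # Fails until the copied snapshot is available, then succeeds.
        "RestoreSnapshot": task("RestoreSnapshot", "PostRestore", retry=True),
        "PostRestore": {
            "Type": "Parallel",
            "Branches": [
                branch("AddTags"),
                branch("DeleteSnapshotCopy"),
                branch("ModifyInstance"),
            ],
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))
```

The `Retry` block is what lets a step fail repeatedly while it waits for a resource, without any polling code inside the Lambda itself.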

~~~
ocdnix
Do you run into problems with Lambda's 5-minute maximum execution time for
those kinds of operations? I'd like to do something similar to this for both
RDS and DynamoDB, but the execution time will often surpass 5 minutes, meaning
I'd have to run a Step Functions worker on EC2 or ECS. That opens up a whole
bunch of complexity with managing the worker code and its deployment, which
I'd rather avoid if possible.

~~~
tjbiddle
With the current implementation, no problems hitting the limit. As mentioned
in my below comment, our query for anonymization would be the heaviest - but
it's designed to be quick as we don't care about unique values for most data.

If we did though - Fargate is a great solution for it, but you wouldn't be
able to feed data back into the next step without some additional complexity -
Maybe have the next step pull an SQS queue, or an S3 file, or look for a
database entry, etc. as its next bit of data that it needs - and just fail
until it finds it, and once the Fargate (Or whatever) has done its job and
placed it in your method of choice, then it could continue.
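
A minimal sketch of that "fail until it finds it" idea, in Python. The `fetch_result` callable, the `result_key` field, and the `ResultNotReady` exception are hypothetical names; in practice the lookup would hit S3, SQS, or a database, and the state's retry policy would decide how often the step is re-invoked.

```python
class ResultNotReady(Exception):
    """Raised when the out-of-band worker has not published its result yet."""


def make_handler(fetch_result):
    """Build a Lambda-style handler around a lookup function.

    fetch_result(key) is a hypothetical callable that returns the stored
    result, or None if nothing has been written yet.
    """
    def handler(event, context=None):
        key = event["result_key"]
        result = fetch_result(key)
        if result is None:
            # Failing here hands control back to the workflow's retry policy.
            raise ResultNotReady(key)
        return {"result_key": key, "result": result}
    return handler


# Usage with an in-memory dict standing in for S3/SQS/a database table.
store = {}
handler = make_handler(store.get)

try:
    handler({"result_key": "job-1"})
except ResultNotReady:
    print("not ready yet; the step would be retried")

store["job-1"] = "staging-db.example.internal"  # the worker finishes its job
print(handler({"result_key": "job-1"}))
```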

------
mostlystatic
I tried to use Headless Chrome on Cloud Functions for a project I'm working
on, but even on the fastest instances loading pages was sometimes really slow
(pages timing out after waiting for 60s).

It seems sometimes JS execution was taking a long time, so I guess that was
preventing requests from being made. In a single CPU cloud function you have
network requests, JavaScript execution, rendering, and the Node process
controlling the browser all competing for resources.

That being said, it was super simple to get started!

~~~
python999
Do simple hello-world HTML pages render ok? I found that rendering was slow-
ish but totally acceptable for reasonably heavy HTML pages, so long as we
weren't flooding page re-renders (eg by using React without any optimisation
of render calls)

~~~
mostlystatic
Yes those were fine.

------
iamjustlooking
Among other things with puppeteer we do screenshot generation using GKE on
Google Cloud @ [https://screenshots.cloud/](https://screenshots.cloud/)
scaling up and down running instances depending on demand. We keep browser
instances running constantly as the startup time is significant. I will be
interested to see what the startup time is for puppeteer on this, will
definitely be giving it a try.

~~~
exikyut
Nice reference immediately above "used by us" :)

One completely unrelated thing. On Chrome 68.0.3440.84, I noticed the large
icons (particularly the Kubernetes one) looked "weird", with jagged edges that
didn't make any sense. Some poking with the devtools revealed that 'backface-
visibility: hidden' seems to be disabling antialiasing.

Suggest opening the following in new tabs so you can flip back and forth
between them:

\- As is right now:
[https://i.imgur.com/nYzsukI.jpg](https://i.imgur.com/nYzsukI.jpg)

\- Nicer-looking:
[https://i.imgur.com/GNlvx7Z.jpg](https://i.imgur.com/GNlvx7Z.jpg)

I noticed disabling this has an effect on the animation at the top (the edges
of the moving webpage slides don't have constantly-moving jaggies).

There may well be a valid reason you have this enabled, perhaps for added
performance. Or perhaps React added it in for you? :P

~~~
iamjustlooking
Thanks for spending time to let me know because I don't think I would have
noticed it otherwise! I can't see it on retina but I can on my non retina
display. We'll have to make the CSS rule more specific. As to why it's there I
believe Firefox 57 or around that version had an issue with the sliding
animation on the top of the page causing images to tear or not render at all
when they scrolled in. This bug must have been solved recently because
disabling backface-visibility on the image doesn't cause the same tearing.

~~~
exikyut
I was very curious what was causing the non-antialiasing, it was fun.

And you can repro :) cool. Makes a lot of sense you can't see it on retina.

Interesting FF bug you hit. CSS3 GPU-accelerated animations are incredibly
complex... heh, adding the rule fixed Firefox ~57, but now Chrome 68 is
glitching out _because_ the rule is there. I wonder if Google realizes yet.
_Ponders complexity of creating minimal testcase, versus waiting for someone
else to notice :P_

------
bergie
Nice to see this concept get into the big cloud platforms. We built something
similar a couple of years ago, primarily to get a sandbox for some compute jobs
we were running on Heroku:

[https://github.com/flowhub/jsjob](https://github.com/flowhub/jsjob)

------
k__
Does this mean that HC is preinstalled in the runtime?

Because as far as I know you can already run HC with other FaaS solutions, but
having this out of the box would be really nice.

~~~
tnolet
Running Headless Chrome / Chromium is a bit of a hassle on AWS Lambda and
other FAAS providers. Chrome requires some specific bindings/binaries to work.
I think the Chrome guys and girls convinced their Cloud coworkers to provide
these in the underlying Linux machines that run the Cloud Functions.

~~~
mylesborins
exactly this. The base operating system comes with the system libraries
necessary to support headless chrome out of the box.

(disclaimer: I work for Google Cloud)

~~~
giancarlostoro
What's to really stop other providers from doing the same thing though? :)

~~~
manigandham
Nothing. It's about developer convenience, not technical possibility.

------
mrskitch
If Google Cloud ain’t your jam, then check out browserless
([https://browserless.io/](https://browserless.io/)). It can be considerably
cheaper under certain situations, and we’ve been up and running for almost a
year. Happy to answer questions if anyone has any.

EDIT: We’ve got stuff on GH:
[https://github.com/joelgriffith/browserless](https://github.com/joelgriffith/browserless),
and startup is under 100ms most of the time. Fonts and other things “just
work” as well, plus there’s a slew of REST APIs for common stuff as well.
Selenium webdriver support landing soon!

~~~
jotto
...and if browserless ain't your jam, check out
[https://www.prerender.cloud/](https://www.prerender.cloud/)

cheap, because we optimized for just 3 things:

pre-rendering, screenshots, or PDFs for $0.000365 per API request

    
    
       curl https://service.prerender.cloud/screenshot/https://google.com/ > out.jpg
    
       curl https://service.prerender.cloud/pdf/https://google.com/ > out.pdf
    
       curl https://service.prerender.cloud/https://google.com/ > out.html

~~~
Operyl
Pricing is in tiers, not per request as you seem to imply?

~~~
jotto
Correct, only the final tier is variable rate; apologies for the confusion.

Under 20,000 monthly requests = $9 flat rate ($0.00045)

Under 100,000 monthly requests = $40 flat rate ($0.0004)

>= 100,000 monthly requests = variable rate @ $0.00036/req
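
As a quick sketch, the three tiers listed above can be written as a small Python function. The function name is made up, and the numbers are only the ones quoted in this comment, not an official price list.

```python
def monthly_cost(requests):
    """Monthly price in USD for a given request count, per the tiers above."""
    if requests < 20_000:
        return 9.0                 # flat rate
    if requests < 100_000:
        return 40.0                # flat rate
    return requests * 0.00036      # variable rate per request


for n in (5_000, 50_000, 250_000):
    print(n, round(monthly_cost(n), 2))
```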

~~~
Operyl
Yeah, that's completely different than what you were trying to imply. Still a
cool service, though.

------
schappim
I wish Google would support Ruby!

If you do too, check out the petition over at [https://www.serverless-
ruby.org](https://www.serverless-ruby.org)

~~~
mrskitch
Something somewhat related: I think more Cloud providers need to start doing
Docker-functions. It’s always going to be a waiting game for runtimes and
upgrades. Docker is portable and can run just about anything, so why not
support that?

[https://zeit.co/blog/serverless-docker](https://zeit.co/blog/serverless-
docker) is an example of what I mean

~~~
hugelgupf
See the "Serverless Containers" section on [https://www.google.com/amp/s/gweb-
cloudblog-publish.appspot....](https://www.google.com/amp/s/gweb-cloudblog-
publish.appspot.com/products/gcp/cloud-functions-serverless-platform-is-
generally-available/amp/)

It's still in alpha, so Google is not advertising it wide. It was announced
about a week before Zeit's Serverless docker at GCP Next.

Disclaimer: I work at Google and I've had some involvement in the tech behind
this.

~~~
exikyut
That's an interesting domain:
'[http://gweb-cloudblog-publish.appspot.com/](http://gweb-cloudblog-publish.appspot.com/)'

(it just redirects to cloud.google.com)

------
antoncohen
I think it would be pretty cool to use Cloud Functions as a Selenium Grid,
sort of like Zalenium
([https://github.com/zalando/zalenium](https://github.com/zalando/zalenium))
does with Kubernetes. If you could parallelize end-to-end tests enough, you
could get massive burstable capacity to run parallel tests.

------
wslh
Not perfect but HtmlUnit anyone? I used it for scraping in the past with mixed
experiences.

~~~
ksahin
HtmlUnit API is really cool. It's fine for most use cases. But obviously the
Javascript support is not perfect. [Shameless plug] I wrote a book about web
scraping where I talk about HtmlUnit & headless chrome with Java:
[https://www.javawebscrapinghandbook.com](https://www.javawebscrapinghandbook.com)

------
tnolet
I've been screwing around with running Headless Chrome & Puppeteer on
Lambda/Serverless/FAAS solutions. It's all a bit of a mixed bag. You CAN run
Headless Chrome on AWS Lambda, but the cost involved is pretty crazy as you
need ~1500 MB of RAM to comfortably run any code with Chrome.

Google Cloud of course has "inside knowledge" and I would love to switch to
them for my SaaS [https://checklyhq.com](https://checklyhq.com), were it not
that Google Cloud Functions is just offered in four (!) regions...

~~~
mrskitch
Great to see you here Tim, love that chrome extension! We should chat a bit
more sometime. I’d love to back checkly’s infrastructure.

------
pwaai
that's it...im moving to GCP

sorry but Rekognition rekt it for any type of computer vision on AWS.

Great infrastructure...after all I do have an AWS Solution Architect Associate
certification....which means jack shit

Great move by GCP, I'm also very pleased with Firebase and its integration
with cloud functions....

BUT my biggest reservation still in 2018 when it comes to serverless is the
cold start up time...

I built a token based API on AWS Lambda and registering, signing up took
forever when the app was not at peak. that was 2014 tho.

~~~
aviv
We use GCP heavily. It's really great.

Latency between Google Cloud Functions and other GCP products improved
significantly in the past week, as did start up and execution time.

However we only use GCFs for background tasks and not for any web API/micro
services, etc. Still too much latency for that use case.

------
guiomie
Ok, I was just looking into this 1 week ago and was gonna spin up a VM to do
headless. Now I get to keep my firebase project in cloud functions only. Much
cleaner architecture.

------
defied
My company provides a similar service, with both Chrome and FireFox headless
support for Automated Testing/Screenshots:
[https://testingbot.com/support/getting-
started/headless.html](https://testingbot.com/support/getting-
started/headless.html)

We run each test in a new VM, running on our own private cloud (dedicated
servers).

Note: we use the Selenium protocol for this, not yet Puppeteer.

------
patd
I'm currently using a headless Chrome for my latest project www.blockedby.com
(still in alpha stage, looking for feedback)

I've been looking at a non-local solution. I'm using Python and this article
hints that Puppeteer is not the only way to invoke this. But I don't see any
documentation on the DevTools protocol.

Does anyone know if it's supported? Or any providers that do?

~~~
transreal
You should just need to get a handle on the host and port to use Devtools
protocol to talk to the launched headless chrome instance. If you use node.js
to start it, you can get the port like this:
[https://github.com/GoogleChrome/puppeteer/blob/master/docs/a...](https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#browserwsendpoint)

Then you can use PyChromeDevtools to connect to that host/port:

[https://github.com/marty90/PyChromeDevTools/blob/master/READ...](https://github.com/marty90/PyChromeDevTools/blob/master/README.md)
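
If you just want to see what there is to connect to, here is a rough standard-library sketch. It assumes Chrome was started with a remote debugging port (9222 below is only an example) and that the port is reachable from where the Python code runs; Chrome then serves a small JSON document at `/json/version` that includes the browser's WebSocket endpoint. The sample payload is made up for illustration.

```python
import json
from urllib.request import urlopen


def parse_ws_url(version_json):
    """Extract the browser-level WebSocket endpoint from /json/version output."""
    return json.loads(version_json)["webSocketDebuggerUrl"]


def browser_ws_url(host="127.0.0.1", port=9222, timeout=5.0):
    """Ask a running Chrome for its DevTools WebSocket endpoint over HTTP."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return parse_ws_url(resp.read().decode("utf-8"))


# Illustrative payload only; a real one comes from browser_ws_url().
sample = json.dumps({
    "Browser": "HeadlessChrome/68.0.3440.84",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/example-id",
})
print(parse_ws_url(sample))
```

The returned `ws://` URL is what a DevTools protocol client connects to.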

------
kwerk
I’ve tested HC on GCF and GAE standard with the IO launch. Sadly they’re 3-10x
slower than App Engine Flex (same vm size on App Engine Flex vs Standard).
Even a screenshot of google.com takes 6+ seconds on GCF / GAE Standard vs 2
seconds for Flex. I hope they fix this as spinning to zero is important for me
but the latency is too high right now.

------
nojvek
I don’t know what this means for browserless.io but I hope he still retains a
strong niche and has a desirable product.

------
eknkc
Tried HC just yesterday on cloud functions. For some reason, it runs extremely
slow. Did some comparisons to AWS lambda with similar memory / cpu sizes and
basic “load page and screenshot” jobs would take 2x - 3x more time on google
cloud functions.

I’ll dig deeper soon but this is a bad start.

~~~
leesalminen
Is it still within say 5 seconds?

Do you get the impression there’s some config tuning to do or something you
can’t control?

------
dakom
Can someone please explain what the advantage of running a snapshot service
via GCF would be vs. AppEngine Standard (w/ node)?

~~~
wereHamster
Lower costs. With AppEngine you pay for a whole instance, regardless of how
much utilisation it gets. With cloud functions you only pay for the time the
function is being executed. If your code isn't being executed often, GCF are
much cheaper.

~~~
dakom
Oh I see... AppEngine is also "pay for what you use" but it seems it's rounded
to 15 mins rather than CloudFunctions which is 100ms. Thanks!
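
To make the granularity difference concrete, here is a tiny sketch of rounding a run up to the billing unit. The 15 minute and 100 ms figures are just the ones mentioned in this comment, and the 2.3 second job is an invented example.

```python
def billed_ms(duration_ms, granularity_ms):
    """Round a duration up to the next multiple of the billing granularity."""
    units = -(-duration_ms // granularity_ms)  # ceiling division on integers
    return units * granularity_ms


run = 2_300  # a hypothetical 2.3 s screenshot job, in milliseconds
print(billed_ms(run, 100))           # billed in 100 ms steps
print(billed_ms(run, 15 * 60_000))   # billed in 15 minute steps
```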

------
isuckatcoding
I wonder how the features/pricing compare with Browserless?

~~~
mrskitch
Our small instance ($30/month) is roughly similar to their $44.38 instance on
App Engine. If you were to run a full 10 concurrent sessions constantly, which
a small browserless instance can max-out at, this would cost roughly $427 a
month in Functions. So depends on your use-case

~~~
fefb
For reference, how did you get $427? Thanks in advance

~~~
mrskitch
Running 10 concurrent Google Functions at 1GB/1CPU. Assumes 30 days in a month

~~~
fefb
I am getting a different price from GCP calculator. Almost $2000. In your
example, 10 functions per second, so each function is taking 1000ms to finish.
2592000s in a month * 10 invocations = 25920000 invocations a month, running
with a 1GB function and 500kb of network bandwidth (out).
[https://cloud.google.com/products/calculator/#id=555e1af7-c8...](https://cloud.google.com/products/calculator/#id=555e1af7-c858-4573-9d8a-a7d87475c236)

~~~
mrskitch
Ah, I wasn’t calculating their other fees like invocations and networking
costs. Crazy it’s almost an order-of-magnitude
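
For anyone who wants to redo the arithmetic, a rough Python sketch follows. The three rates are assumptions, not quoted from the price list or from the linked calculator, and free tiers are ignored; swap in current prices before relying on any of it. With these assumed rates, compute alone lands near the $427 figure and the total lands close to $2000.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # 2,592,000, assuming a 30-day month
CONCURRENCY = 10                     # one 1000 ms invocation per slot per second

# ASSUMED rates in USD; check the current price list.
COMPUTE_PER_100MS = 0.00000165       # assumed rate for a 1GB function
PER_MILLION_INVOCATIONS = 0.40       # assumed invocation fee
EGRESS_PER_GB = 0.12                 # assumed outbound network rate
EGRESS_KB_PER_CALL = 500             # from the comment above

invocations = SECONDS_PER_MONTH * CONCURRENCY
compute = invocations * 10 * COMPUTE_PER_100MS       # 10 x 100 ms per call
invocation_fee = invocations / 1_000_000 * PER_MILLION_INVOCATIONS
egress_gb = invocations * EGRESS_KB_PER_CALL / 1_000_000
egress = egress_gb * EGRESS_PER_GB
total = compute + invocation_fee + egress

print(f"invocations: {invocations:,}")
print(f"compute:     ${compute:,.2f}")
print(f"invocations: ${invocation_fee:,.2f}")
print(f"egress:      ${egress:,.2f}")
print(f"total:       ${total:,.2f}")
```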

------
techsin101
How Google is behaving recently... I can feel it becoming Oracle. I want to
stay 10 miles away from it. Learned my lesson with Google maps.

------
benatkin
I'm not buying it, Google Cloud just moved to node 8 earlier this year as the
post says, but now it's node 10. It's just not good tech, it's unnecessary
lock-in. Docker on anything is better, this is similar:
[https://zeit.co/blog/serverless-docker](https://zeit.co/blog/serverless-
docker)

~~~
chrisabrams
It’s wonderful tech that has saved us thousands a month. The lock in is very
little as it’s running a JS function. We’ve written our functions so that the
export to the GC function merely passes in arguments to another function that
is required. We could move to AWS, Zeit, or anywhere else with little
friction.
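
A sketch of that thin-adapter pattern, shown here in Python rather than JS. Every name is hypothetical and the core function is a stub; the Lambda adapter assumes an API Gateway proxy-style event, and the Cloud Functions adapter assumes an HTTP function that receives a Flask-style request object.

```python
import json
from types import SimpleNamespace


def take_screenshot(url, width=1280):
    """Provider-independent core logic (stubbed; a real one would drive Chrome)."""
    return {"url": url, "width": width}


def gcf_entry(request):
    """Cloud Functions adapter: unpack the request, delegate, serialize."""
    args = request.args
    return json.dumps(take_screenshot(args["url"], int(args.get("width", 1280))))


def lambda_handler(event, context):
    """Lambda adapter: same core call, different event shape."""
    params = event.get("queryStringParameters") or {}
    result = take_screenshot(params["url"], int(params.get("width", 1280)))
    return {"statusCode": 200, "body": json.dumps(result)}


# Local usage with stand-ins for the provider-specific inputs.
fake_request = SimpleNamespace(args={"url": "https://example.com"})
print(gcf_entry(fake_request))
print(lambda_handler({"queryStringParameters": {"url": "https://example.com"}}, None))
```

Only the two small adapters know about the provider, so moving elsewhere means writing one more adapter and leaving `take_screenshot` alone.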

~~~
benatkin
Please do, then, it's better for everyone in the node ecosystem if people
aren't running node 4.x in 2018.

~~~
antonvs
Would you be happier if people switched to some other language for their cloud
functions? Node is not that important.

------
jancurn
It's nice to see serverless platforms adding support for headless Chrome. But
there's still one problem with AWS Lambda / Cloud Functions / Zeit Now - the
run time is limited to a few minutes. If you want to run any longer job, e.g.
a web crawler, you need to either spin up the instances yourself or use a
platform like Apify, which allows running arbitrarily long jobs, provides
pre-built Docker images for headless Chrome or Xvfb, and provides an SDK to
simplify state persistence, access to proxies, etc.

For example, a simple actor to convert HTML to PDF looks like this:

[https://www.apify.com/jancurn/url-to-pdf](https://www.apify.com/jancurn/url-
to-pdf)

More info:

[https://www.apify.com/docs/actor](https://www.apify.com/docs/actor)

[https://www.apify.com/docs/sdk/apify-runtime-
js/latest](https://www.apify.com/docs/sdk/apify-runtime-js/latest)

[https://www.apify.com/library?type=acts](https://www.apify.com/library?type=acts)

Disclaimer: I'm a co-founder of Apify

