Hacker News
Puppeteer: Headless Chrome Node API (github.com)
406 points by uptown on Aug 16, 2017 | 100 comments



One of the biggest wins here is this little tidbit:

> When you install Puppeteer, it downloads a recent version of Chromium (~71Mb Mac, ~90Mb Linux, ~110Mb Win) that is guaranteed to work with the API.

A lot of the Chrome interface libraries available at the moment require you to maintain your own instance of Chrome/Chromium and launch the headless server yourself from the command line, or require a precompiled version, which can quickly get out of date (https://github.com/adieuadieu/serverless-chrome/tree/master/...). Having this taken care of is a blessing.


There is complexity here you may not be seeing, namely if you are on a platform without X, it will not work.

Normal Chromium requires a bunch of X libraries to be present. Headless mode doesn't use them, but for things like headless testing it's a massive pain, since the apt-get (or equivalent) install is generally many hundreds of megabytes.


X: The First Fully Modular Software Disaster

http://www.art.net/~hopkins/Don/unix-haters/x-windows/disast...


> If you're using the Motif self-abuse kit

hehehehehe


(Hi Tim, I use your GitHub corner :) thanks for making it) - There are still some differences between Chromium and Chrome, for example MP4 video playback, which Chromium lacks because of paid codec licensing. But Puppeteer being easily configurable to use Canary or an existing Chrome installation addresses this gap.
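A minimal sketch of pointing Puppeteer at an existing Chrome or Canary build instead of the bundled Chromium. `headless` and `executablePath` are real `puppeteer.launch()` options; the `CHROME_PATH` environment variable and the `buildLaunchOptions` helper are hypothetical names used only for illustration.

```javascript
// Build options for puppeteer.launch(). If the (hypothetical) CHROME_PATH
// environment variable is set, use that Chrome/Canary binary, e.g. to get
// proprietary codecs such as MP4; otherwise Puppeteer uses its bundled Chromium.
function buildLaunchOptions(env = process.env) {
  const options = { headless: true };
  if (env.CHROME_PATH) {
    options.executablePath = env.CHROME_PATH;
  }
  return options;
}

// Usage (requires `npm install puppeteer`):
//   const puppeteer = require("puppeteer");
//   const browser = await puppeteer.launch(buildLaunchOptions());
```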


Just grab the most recent Chrome/Chromium .deb from the dev/canary/stable channel via a cron job. Piece of cake. We do it for our services.

That said, headless still has a mountain of bugs open against it, so most parties will be best suited by sticking with ChromeDriver for now.


... nightmarejs did that for you, too.


NightmareJS is based on Electron.


This is great, but it's sobering to see how hard it is to get a nice, "complete" PDF screenshot out of a modern site.

Here's a quick hack: https://gist.github.com/rcarmo/cf698b52832d0ec356c147cf9c9ad...

I'm using The Verge for testing because it lazy loads images, and am being clumsy about the scrolling, but it mostly works - I can get 90% of the images to show on the finished PDF.

What I can't seem to get right, though, is creating a single-page PDF where the page height matches the document height perfectly - it always seems to be off by a bit, at least on this site (mine works fine, _sometimes_).

Anyone got an idea of why this is so?


If I'm reading [1] right, you want to use getBoundingClientRect() instead of clientHeight, which does not include borders (getBoundingClientRect() does, though neither includes margins).

1: https://developer.mozilla.org/en-US/docs/Web/API/CSS_Object_...


Well, no, really. Seems like this is due to the way Chrome deals with print CSS and PDF generation rather than any kind of calculations I do prior to invoking PDF generation...
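A rough sketch of the scroll-then-measure approach discussed above, assuming a Puppeteer `page` that has already navigated to the site. `fullHeightPdf` and its step/pause defaults are hypothetical; `width`, `height`, `printBackground` and `pageRanges` are real `page.pdf()` options. As noted in the reply, print CSS can still make the result differ slightly from the on-screen height, so this is not a guaranteed fix.

```javascript
// Hypothetical helper: scroll a Puppeteer `page` in fixed steps so lazy-loaded
// images get a chance to load, then print the whole document as one PDF page.
async function fullHeightPdf(page, path, { step = 800, pauseMs = 250, width = 1280 } = {}) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  // scrollHeight covers the full document, not just the visible viewport.
  const measure = () => page.evaluate(() => document.documentElement.scrollHeight);

  const initialHeight = await measure();
  for (let y = 0; y < initialHeight; y += step) {
    await page.evaluate((top) => window.scrollTo(0, top), y);
    await sleep(pauseMs); // give lazy loaders time to fire
  }
  await page.evaluate(() => window.scrollTo(0, 0));

  // Re-measure: lazy-loaded content may have made the document taller.
  const height = await measure();
  await page.pdf({
    path,
    width: `${width}px`,
    height: `${height}px`,
    printBackground: true,
    pageRanges: "1", // keep only the first page if print CSS still overflows
  });
  return height;
}
```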


So first there was Selenium's JSON Wire protocol, then came the W3C WebDriver spec and now we're back to browser-specific implementations? As someone who's tried/is trying to automate Firefox/Chrome/Safari/IE in a consistent fashion, my only question is: WHY?


Based on a quick read of the API, my interpretation is that this is not targeting people who are trying to automate every browser, but those who need to automate any browser.

In that context, it's dead-simple to use, and someone with very little experience should be able to get a working prototype in under 5 minutes.

For my use case, it's closer to "wget/curl with JS processing" than "automating a user's browsing experience". I don't particularly care which browser is doing the emulation, with the ease-of-use of the API making the biggest difference.

It seems very similar to PhantomJS, but to be honest, it's more attractive from an ongoing-support standpoint simply because it's an official Chrome project.
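The "wget/curl with JS processing" use case boils down to loading a URL and serializing the DOM once scripts have run. A minimal sketch, assuming an already-open Puppeteer `page`; `renderedHtml` is a hypothetical name. `networkidle2` is the `waitUntil` value in current Puppeteer releases; the very first releases used a different `networkidle` option, so check the docs for your version.

```javascript
// Sketch of "wget/curl with JS processing": load a URL in a Puppeteer `page`,
// wait until the network is mostly idle, and return the serialized DOM.
async function renderedHtml(page, url, { timeout = 30000 } = {}) {
  await page.goto(url, { waitUntil: "networkidle2", timeout });
  return page.content(); // HTML after scripts have run, not the raw response
}

// Usage (requires `npm install puppeteer`):
//   const browser = await require("puppeteer").launch();
//   const page = await browser.newPage();
//   console.log(await renderedHtml(page, "https://example.com"));
//   await browser.close();
```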


Exactly, this looks perfect for taking a screenshot of a page[1], or converting a page to a PDF[2] in just a few lines of code.

If you have an existing web service, this appears suitable for actual production usage to deliver features like PDF invoices and receipts, on-demand exports to multiple file formats (PNG/SVG/PDF) etc., which has quite different requirements compared to an automated testing framework.

[1] https://github.com/GoogleChrome/puppeteer/blob/master/exampl...

[2] https://github.com/GoogleChrome/puppeteer/blob/master/exampl...
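For the invoice/receipt use case, a rough sketch of rendering an HTML string and exporting it to both PNG and PDF with real Puppeteer calls (`page.setContent`, `page.screenshot`, `page.pdf`). `exportHtml` and the file naming are hypothetical; note that `page.pdf()` only works in headless mode.

```javascript
// Hypothetical helper: render an HTML string (say, an invoice template) in a
// Puppeteer `page` and export it as both a PNG screenshot and an A4 PDF.
async function exportHtml(page, html, basePath) {
  const pngPath = `${basePath}.png`;
  const pdfPath = `${basePath}.pdf`;
  await page.setContent(html);
  await page.screenshot({ path: pngPath, fullPage: true });
  await page.pdf({ path: pdfPath, format: "A4", printBackground: true });
  return [pngPath, pdfPath];
}
```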


+1 to this

The primary use-case for headless-chrome is to support stuff like scraping/crawling JavaScript-dependent sites and services, and emulating user workflows to retrieve data or trigger side effects that couldn't otherwise be achieved with something more low-level (curl, manual HTTP requests w/ Node's HTTP/S API etc).

headless-chrome would be used for the functionality of a server-side microservice rather than for automated testing of UI/UX; there are already more appropriate projects for that.


If you just need wget/curl with JS, you can actually do that with the Chrome CLI now. Just run your Chrome binary with the arguments --headless --disable-gpu --dump-dom <url>
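The same `--dump-dom` invocation can be driven from Node without Puppeteer at all. A sketch using `child_process.execFile`; the Chrome binary path is whatever your system uses (nothing here locates it for you), and `dumpDom`/`dumpDomArgs` are hypothetical helper names.

```javascript
const { execFile } = require("child_process");

// Arguments from the comment above: headless mode, no GPU, print the DOM.
function dumpDomArgs(url) {
  return ["--headless", "--disable-gpu", "--dump-dom", url];
}

// Run a local Chrome binary (path supplied by the caller) and resolve with
// the serialized DOM it prints to stdout.
function dumpDom(chromePath, url) {
  return new Promise((resolve, reject) => {
    execFile(chromePath, dumpDomArgs(url), { maxBuffer: 50 * 1024 * 1024 }, (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout);
    });
  });
}
```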


Chrome DevTools Protocol (CDP) is a more advanced API, and I've been leading an effort on getting multiple browsers to look at CDP under https://remotedebug.org


Get them to actually version their protocol first, maybe?

As someone with a bunch of tooling around the dev-tools API, it's a huge pain in the ass to not be able to tell what functions the remote browser supports. There's a version number in the protocol description, but it's literally never been incremented as far as I've seen.

https://github.com/ChromeDevTools/devtools-protocol/issues/6
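Until the version number means something, one workaround is to feature-detect against the protocol descriptor itself. Recent Chrome builds serve it at `/json/protocol` on the remote-debugging port; this sketch assumes port 9222 and the descriptor's `domains[].commands[].name` layout. `fetchProtocol` and `supportsCommand` are hypothetical helper names.

```javascript
const http = require("http");

// Fetch the protocol descriptor a running Chrome exposes when started with
// --remote-debugging-port (port 9222 here is an assumption).
function fetchProtocol(port = 9222) {
  return new Promise((resolve, reject) => {
    http.get({ host: "127.0.0.1", port, path: "/json/protocol" }, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () => {
        try { resolve(JSON.parse(body)); } catch (e) { reject(e); }
      });
    }).on("error", reject);
  });
}

// Feature-detect "Domain.command" against the descriptor instead of
// trusting the protocol version number.
function supportsCommand(protocol, qualifiedName) {
  const [domainName, command] = qualifiedName.split(".");
  const domain = (protocol.domains || []).find((d) => d.domain === domainName);
  return Boolean(domain && (domain.commands || []).some((c) => c.name === command));
}
```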


File a bug for this.

It should be possible to get the running chrome version and to do feature detection over this protocol.

If it isn't, it's a bug


Did.... did you notice the link to the bug report in my comment?


This looks great, never seen this before. Thanks for your work.


Because WebDriver just doesn't do enough.

For an example of an impossible task, try to retrieve request headers using Selenium. Browser automation gets more and more complicated the more cases you try to cover, and in my impression WebDriver is definitely not enough. Who knows, perhaps some new version of WebDriver that I haven't heard of will catch up once the functionality gets properly defined.
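For comparison, reading headers in Puppeteer is a matter of listening to the page's `request` and `response` events; no interception is needed just to read them. A minimal sketch; `collectHeaders` is a hypothetical name, and `url()`, `headers()` and `status()` are methods in Puppeteer 1.x and later (the earliest releases exposed some of them as properties).

```javascript
// Record request and response headers from a Puppeteer `page`.
// These listeners only observe traffic; they don't block or modify it.
function collectHeaders(page) {
  const log = [];
  page.on("request", (request) => {
    log.push({ type: "request", url: request.url(), headers: request.headers() });
  });
  page.on("response", (response) => {
    log.push({
      type: "response",
      url: response.url(),
      status: response.status(),
      headers: response.headers(),
    });
  });
  return log; // filled in as the page loads
}
```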


Yes, I wholeheartedly agree, it was a stupid decision by the Selenium devs not to make request headers etc. accessible. But why throw all the standardisation efforts overboard?


I think this is more "Selenium is actively antagonistic to its major use-case" than trying to throw everything away. There have been multiple attempts to convince the Selenium people to revisit their decision w.r.t. headers, and they're completely unwilling.

Given that the Selenium leadership is apparently uninterested in improvements, and given its many limitations, trying to improve things there is more effort than it's worth.


I didn't follow that development. Can you share why Selenium maintainers chose not to implement headers? Is it that they want to restrict the tool to simulate what a normal user can do with a browser and not hacks such as overriding headers? Thanks in advance!


"Something something something normal users can't do that something something".

Basically, they're still stuck in this idea that they're ONLY for emulating user-available input (and the dev-tools don't exist).

In reality, there is tremendous interest in more complex programmatic interfaces, but apparently they're unwilling to see that, and are instead only interested in their implementation's "user-only" ideological purity.


> For an example of an impossible task, try to retrieve request headers using Selenium.

Selenium can't do that itself but Selenium can drive a browser that uses a proxy and you can retrieve everything about the request from that. It's a lot more challenging if you're testing things that use SSL but it's not impossible with a decent proxy app (eg Charles).


(I don't have the full history, but from my experience...) When I first started using Selenium, it used browser-specific drivers that communicated directly with bespoke extensions/plugins/add-ons/what-have-yous. Then came Selenium Server which allowed for testing on remote machines. Then Grid allowed for simultaneous testing of many remote machines.

Due to the nature of browser plugins, they could only access information that the browser made available. Also, the devs have consistently been of the opinion that they only care to simulate an end-user experience. (End users only care about the webpage's presentation, not which headers were present per transaction.) That said, I suspect that early restrictions influenced that viewpoint.

It wasn't until much later that browsers started implementing their own automation channels; Chrome's Automation Extension and Debugging Protocol, Firefox's Marionette, etc. At the same time, browsers started putting additional security measures around plugins making it even more difficult to have consistent features across Selenium's drivers.

Which is why WebDriver became an open specification instead of various driver implementations. I believe Microsoft was the first to implement their own driver, InternetExplorerDriver, for IE7+. Then came ChromeDriver (powered by the Chrome Automation Extension), GeckoDriver (Firefox's translation layer to their Marionette driver), and SafariDriver, which is now baked into Safari 10.

Marionette is possibly the most interesting, but I lack experience with it. TMK it allows for automating the entire browser: both the webpage and the 'chrome' interface. Whereas Selenium, at best, could only simulate actions, like the back button, through JavaScript. But even with Marionette's feature richness, you still don't get access to request and response information.

I think for most developers ("devs", "qa", "scrapers", etc.) there's very little appeal in moving away from Selenium because it would require maintaining multiple test suites. It gives consistent results and just works. If you want lower-level information, it's fairly simple to either 1) just use a CLI client (curl, wget, etc.), 2) use a library (libcurl, Requests (Python), Net::HTTP (Ruby), etc.), or 3) set up a proxy server. I do all of the above, and each has its own downsides: clients and libs don't do any rendering themselves, and proxies tend to rewrite transactions (e.g. strip compression and add/remove/alter headers such as 'Content-Encoding: gzip').


Puppeteer automates headless / visible Chrome today. However, the foundational way it does this is through the DevTools Protocol, so I believe it might become browser-agnostic in time. There are various (some successful, some failed) adapters for bridging the DevTools Protocol to IE, Edge, Firefox, Safari, etc. So I really do think cross-browser support is not an impossibility.

But your point resonated with me: there is just so much code and so many automation assets already written. Exploring new tools usually happens when a new project starts, rather than by recoding an existing project entirely. For an existing code base, perhaps contributors from the community could write a parser to translate it to the Puppeteer API?


I do a lot of web scraping that requires a full browser for some websites. This sounds perfect for me. I often use Selenium, but it's a lot of complexity and quite buggy if I just want to run chrome (or any one browser, but not all browsers).


Have you tried TestCafe? I have had a good experience with TestCafe on a recent project. The development team is very responsive about bug fixes and stability.


Hadn't heard of it before, I have an upcoming project that's going to need some JS scraping, so I'll give this a shot as well as Puppeteer. Thanks!


I think this can be ok. It's pretty obvious that the Webdriver protocol was insufficient. Let browsers "take it home" for remodelling and see if a new standardization effort will happen down the line.


I've found Chromeless to work great for that task. But then again, if this is from the Chrome DevTools team, it's a good guarantee to have.


There's also a docker image for headless chrome automatically updated from the trunk if you're looking to test against the latest -

https://hub.docker.com/r/alpeware/chrome-headless-trunk/

Full disclosure: I'm maintaining the image.


I'm really loving headless chrome so far. I have around 650 tests which are mostly dealing with iframes and popup windows, and they run flawlessly. The first release seemed to have a memory leak which wasn't present in non-headless chrome, but that seems to have been fixed in version 60.

Even sped things up a little versus phantom.


Hi bluepnume, if you don't mind, could you share how you manage iframes and popups? I'm using the DevTools protocol directly over a WebSocket to communicate with Chrome. I have to do a lot of context handling (frames) and use Target.sendMessageToTarget (popup windows) to deal with them. These two features seem to be the hardest parts of the interaction layer to handle when doing automation with the DevTools Protocol. Thanks in advance!
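In Puppeteer, much of that frame and popup bookkeeping is wrapped for you: `page.frames()` lists every attached frame (including nested iframes), and the browser emits `targetcreated` when a new tab or popup opens. A sketch of both; `findFrame` and `waitForPopup` are hypothetical helper names, and later Puppeteer releases also added a `popup` event on the page itself.

```javascript
// Find a frame (including nested iframes) whose URL contains `fragment`.
function findFrame(page, fragment) {
  return page.frames().find((frame) => frame.url().includes(fragment)) || null;
}

// Resolve with the Page object of the next new tab/popup the browser opens,
// ignoring non-page targets such as service workers.
function waitForPopup(browser) {
  return new Promise((resolve) => {
    const onTarget = async (target) => {
      if (target.type() !== "page") return;
      browser.removeListener("targetcreated", onTarget);
      resolve(await target.page());
    };
    browser.on("targetcreated", onTarget);
  });
}

// Usage: start waiting *before* the click that opens the popup.
//   const popupPromise = waitForPopup(browser);
//   await page.click("#open-popup");
//   const popup = await popupPromise;
```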


I'm a little concerned about confusion in the market with another product in the automation space called "Puppet"-something.

The project itself looks exciting.


I can imagine using Chrome Puppeteer in conjunction with Google Puppeteer https://github.com/google/puppeteer

The latter having been renamed from "puppet" when Google open-sourced it.


I believe they're referring to https://puppet.com

EDIT: Left the tab open too long, beaten to the punch by 20 minutes.


I can imagine provisioning the testing boxes for that whole thing with Puppet. https://puppet.com/


It's funny how open the market for browser automation is. Everyone needs to switch away from PhantomJS in the next few years and is looking for the easiest option.

Sadly, there's probably no money on the line. But you will get your buggy software used by huge corporations for years to come!


I'm not super familiar with this space, can you explain further why everyone needs to switch from PhantomJS?


Core developer stepped down: https://news.ycombinator.com/item?id=14105489


There were still some commits in early July, but I haven't seen any new updates since. Maintaining a browser project is a mammoth task. Even with an established startup such as Segment behind it, NightmareJS has lots of issues that have gone unanswered for months. I guess it's primarily because of a potentially large user base and all kinds of edge-case requirements from different users. On top of that, it's supposed to work on all OSes. Nightmare.


The important bit:

> Who maintains Puppeteer?

>

> The Chrome DevTools team maintains the library


Does anyone know if it's possible to take a screenshot of a specific DOM element? (instead of the entire page)


Yes, using the `clip` option of screenshot(). You can get these dimensions from an element's getBoundingClientRect().
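A sketch of that approach, assuming (as Puppeteer's own element-screenshot code does) that `clip` is in page coordinates, so the scroll offset is added to the viewport-relative rect. `screenshotElement` and its `padding` parameter are hypothetical; newer Puppeteer releases also offer `elementHandle.screenshot()`, which does this bookkeeping for you.

```javascript
// Screenshot a single DOM element by clipping to its bounding box.
async function screenshotElement(page, selector, path, padding = 0) {
  // Runs in the browser: getBoundingClientRect() is viewport-relative,
  // so add the scroll offset to get page coordinates.
  const rect = await page.evaluate((sel) => {
    const el = document.querySelector(sel);
    if (!el) return null;
    const r = el.getBoundingClientRect();
    return { x: r.left + window.scrollX, y: r.top + window.scrollY, width: r.width, height: r.height };
  }, selector);
  if (!rect) throw new Error(`No element matches ${selector}`);

  // Grow the box by `padding` on every side, without going past the page origin.
  const x = Math.max(0, rect.x - padding);
  const y = Math.max(0, rect.y - padding);
  const clip = {
    x,
    y,
    width: rect.x + rect.width + padding - x,
    height: rect.y + rect.height + padding - y,
  };
  await page.screenshot({ path, clip });
  return clip;
}
```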


Full example of screenshotting a DOM element:

https://github.com/GoogleChrome/puppeteer/issues/306#issueco...


Thank you. That's perfect.


Currently only PDF printing of the entire page view-port is available. File an issue with your use-case so the team can assess whether or not to look into including that kind of behavior.


It is already supported, as Paul Irish said


Looks like the Chrome DevTools Protocol (which this library uses) can; see the docs here:

https://chromedevtools.github.io/devtools-protocol/tot/Page/...


The API looks nice and clean, but I'm puzzled by this from the FAQ:

> Puppeteer works only with Chrome. However, many teams only run unit tests with a single browser (e.g. PhantomJS).

Is this true? Do teams write unit tests but only test them in a single browser? With test runners like Karma and Testem, running tests concurrently in multiple browsers is easy. You'd be throwing away huge value if for some reason you only decided to test in one vendor's browser.


From my experience most teams only test against a single browser, yes. It's however nice to have the option to switch to another browser when debugging a browser-specific bug.

From what I've seen in practice, for a lot of teams and projects, running tests concurrently across multiple browsers isn't as easy as you claim. Once you have a database involved and not a perfectly clean setup, concurrent testing can become a hassle and is rarely worth the effort of fixing.


It's also quite easy to run Chrome and/or Firefox for free in CircleCI or TravisCI. Safari and Edge are non-trivial to test in those environments. (I think you can get Safari if you tell CircleCI you're an iOS app and hack your way over to Safari from there, but I don't know if you can even test Edge in the popular CI stacks.)


For unit tests, yes. (In fact, we use Jest which uses JSDom and Node). For integration or end-to-end automation, we normally run them in a wider suite of browsers.


Has anyone found something similar but for Python? The few I found all seemed to be abandoned or too limited in capability.


I've been working on something similar (Headless Chrome via DevTools protocol) called Webfriend. It is a Python wrapper to the DevTools protocol, as well as a simplified imperative scripting environment which is specifically built for ease of use by people with a technical-but-not-programming background (lovingly called Friendscript).

It's by no means done, but it is functional and I'm hoping to see the project grow beyond a toy if there's interest in the community.

Things to note:

- Documentation is there, but there are gaps (especially w.r.t. the Python API). I'll eventually get around to wrestling with Sphinx et al., but have not as of yet.

- Targets Google Chrome / Chromium 58.x - 60.x. No testing outside of those versions has occurred.

- Is Chrome only for the moment, but may evolve to work with WebDriver and other browser APIs in the future.

https://github.com/ghetzel/webfriend

Also, PRs and issues are welcome, but my time to work on this is limited at the moment (which does speak to the point made elsewhere about corporate backing vs. individual maintainers, but this was built largely to scratch an itch.)


Thanks ghetzel for sharing, nice work! I saw that you've made your own scripting language to make it user-friendly :)

Just went through your Friendscript syntax, the amount of work going into defining that language is impressive....



> Crawl a SPA and generate pre-rendered content (i.e. "SSR").

I've been maintaining (thanks to this team and Headless Chrome) a convenience API based on this feature. Some additional features:

  * React checksums for v14 and v15 (v16 no longer uses checksums)
  * preboot integration for clean Angular server->client transition
  * support for Webpack code splitting
  * automatic caching of XHR content
https://www.prerender.cloud/

and for the crawling or to delegate your single-page app hosting and server-side rendering entirely https://www.roast.io/


Great move! One of the biggest advantages of PhantomJS was its easy-to-use, high-level API.


How can this work on Cloud Functions/Amazon Lambda/Azure Functions without installing the dependencies each time it has to run?

Haven't done anything before with those serverless approaches.


It probably will not run on AWS Lambda as the version of Node required by Puppeteer is too recent.

From puppeteer readme[1]

> Puppeteer requires Node version 7.10 or greater

From AWS website[2]

> AWS Lambda supports the following runtime versions:

> Node.js – v4.3.2 and 6.10.3

[1] https://github.com/GoogleChrome/puppeteer/blob/master/README...

[2] http://docs.aws.amazon.com/lambda/latest/dg/current-supporte...
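A quick way to check whether a given runtime meets a documented minimum like this is to compare version strings numerically. A sketch; `meetsMinimumNode` is a hypothetical helper defaulting to the 7.10 requirement quoted above, and in real code a library such as `semver` would do the same job.

```javascript
// Compare a Node version string (with or without a leading "v") against a minimum.
function meetsMinimumNode(version, minimum = "7.10.0") {
  const parse = (v) => v.replace(/^v/, "").split(".").map((n) => parseInt(n, 10) || 0);
  const have = parse(version);
  const need = parse(minimum);
  for (let i = 0; i < Math.max(have.length, need.length); i++) {
    const diff = (have[i] || 0) - (need[i] || 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // equal versions satisfy the minimum
}

// Usage: meetsMinimumNode(process.versions.node)
```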


Sigh... I checked Azure Functions and Cloud Functions and they are running 6.x versions of Node as well.


AWS Lambda presents a base image ready to be provisioned as multiple instances.

If you are familiar with Docker you can think about Lambda image as a small Docker image, while real work will be done in instances (Docker containers) created from this image.

The usual scenario is to provide a ready-to-go image (with all source code, npm packages installed, headless Chrome, etc.). Then AWS/Azure will run VM instances based on the image for almost every function request. Most of the time, spinning up such lightweight VMs takes no more than a couple of seconds.


Are there any new python drivers for headless chrome? Selenium seems to still be lacking several features exposed by the DevTools protocol (e.g. print PDF)


Has headless Chrome enabled file downloads yet? (I don't mean navigating to a known url and saving content, but when a site pops open a save dialog)


This issue is being tracked here and moving along fine - https://bugs.chromium.org/p/chromium/issues/detail?id=696481

The default behavior, by design, is to block automated headless downloads for security reasons. But the above issue tries to address this important use case.


This is the Puppeteer issue tracking downloads - https://github.com/GoogleChrome/puppeteer/issues/299


Using Puppeteer to Google-search the keywords "chrome puppeteer" will return a link to this HN page itself. :-)

This github page was generated by a markdown file created by a test.js running in a Puppeteer docker container:

https://ontouchstart.github.io/170817/home/puppeteer-test/


Is there an equivalent of this for C#? I use Selenium a lot and find it annoying. Be interested to see if this is much better.


Under the covers this uses the DevTools protocol which has a number of projects around it: https://github.com/ChromeDevTools/awesome-chrome-devtools/#c...

A C# binding was added just recently: https://github.com/BaristaLabs/chrome-dev-tools


Yes, I understand it's just a protocol, but it's a pain writing it by hand, especially when I just want to test it out on some existing projects vs. Selenium.

Thanks for the binding link. Will check it out this weekend.


Not this, but on the C# front there's https://github.com/lefthandedgoat/canopy to ease Selenium use (Designed for F#, but exposes a C# API too)


Pretty psyched to see this. A lot of my automated testing headaches came from the intersection of using PhantomJS with transpiled code - which makes sense, since the Phantom team was always forced to play catch-up with the browsers being emulated.


Awesome! This will make the Boozang/CI integration much smoother. I was using npm-headless-chromium before, but the Xvfb dependency was error-prone and difficult to setup properly. Thank you very much!


Would be nice if it was possible to change the proxy settings for each request or for each session. Last time I checked it was only possible with the C API.


The request interception API might be able to satisfy your usecases: https://github.com/GoogleChrome/puppeteer/blob/master/docs/a...

Outside of that, Chrome doesn't currently support changing a network proxy dynamically, let alone for individual (and overlapping) requests.
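A minimal sketch of the request interception API mentioned above: once `page.setRequestInterception(true)` is on, every request waits until the handler calls `continue()` or `abort()`, and `request.continue({ headers })` can also override headers per request. `blockResources` and its default list are hypothetical; method names in the earliest releases differed slightly, so check the API docs for your version. This does not change the proxy, which Chrome only takes at launch.

```javascript
// Abort requests for heavy resource types and let everything else through.
async function blockResources(page, blocked = ["image", "media", "font"]) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (blocked.includes(request.resourceType())) {
      request.abort();
    } else {
      request.continue(); // could pass overrides, e.g. { headers: {...} }
    }
  });
}
```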


I'm looking forward to seeing the PDF printing features develop. They are lacking a few of the formatting features of wkhtmltopdf.


Has anyone found something similar to Selenium Grid, but for managing headless versions of Chrome? Thanks a lot!


This is really nice. Thanks for your work.


Has anyone been able to get headless mode to work with a SOCKS5 proxy yet?


Does anyone know if it supports screen capture for WebGL?


Awesome! This will simplify the Boozang Jenkins integration significantly (I was using npm-headless-chromium before, but there were many manual steps, and the Xvfb dependency was annoying). Thank you very much!


I wonder if anyone knows a Ruby equivalent?


Very cool. In a previous job we ran our JS unit tests on PhantomJS, using a tool that allowed arbitrary code to be piped to Phantom for execution and redirected Phantom's console to stdout. (We did all this so our tests ran on something resembling a real browser, as opposed to jsdom.) Something similar should be possible with this.
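With Puppeteer, that PhantomJS setup maps onto the page's `console` event plus `page.evaluate()`, which accepts either a function or a string of source. A minimal sketch; `runInPage` is a hypothetical name, and `msg.type()`/`msg.text()` are methods in Puppeteer 1.x and later.

```javascript
// Forward the page's console output to Node's stdout, then run a string of
// (test) code inside the page and return its result.
async function runInPage(page, source) {
  page.on("console", (msg) => {
    console.log(`[page:${msg.type()}] ${msg.text()}`);
  });
  return page.evaluate(source);
}
```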


I think simple-headless-chrome is further along: https://github.com/LucianoGanga/simple-headless-chrome


We're fans of LucianoGanga's project (and of Chromeless, Doffy, Chrominator, Chromy, Navalia). I can tell you from personal experience that dealing with the raw DevTools Protocol isn't ideal for a developer writing an automation script, so it's clear there's demand for libraries with this higher-level API.

Would love to know if there's a feature parity concern you have or what you'd like to see from puppeteer (or any of these projects). (My personal feature unicorn: generate not a static screenshot but a proper video of a page session)


Hi @paulirish! I started with your unicorn haha, I wrote only a few lines of code, to use the screencast events to save the frames of a page. Here's the branch with that: https://github.com/LucianoGanga/simple-headless-chrome/tree/...

And here is the thread I'm gonna use to discuss this: https://github.com/LucianoGanga/simple-headless-chrome/issue...

If someone has an idea of what the "ideal implementation" would look like, I'd really appreciate it!
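The screencast approach rests on two DevTools Protocol pieces: the `Page.startScreencast` command and the `Page.screencastFrame` event, each frame of which must be acknowledged with `Page.screencastFrameAck`. A rough sketch of collecting frames, assuming a CDP session object with `send()` and `on()` (Puppeteer's `CDPSession`, e.g. from `page.target().createCDPSession()` in later releases, has both). `recordScreencast` is a hypothetical name, and turning the frames into an actual video is left out.

```javascript
// Collect screencast frames over a raw DevTools Protocol session.
async function recordScreencast(client, onFrame, { format = "jpeg", quality = 80 } = {}) {
  client.on("Page.screencastFrame", async ({ data, metadata, sessionId }) => {
    // Frame data arrives base64-encoded; metadata carries an optional timestamp.
    onFrame(Buffer.from(data, "base64"), metadata.timestamp);
    // Acknowledge each frame so Chrome keeps sending new ones.
    await client.send("Page.screencastFrameAck", { sessionId });
  });
  await client.send("Page.startScreencast", { format, quality });
  return () => client.send("Page.stopScreencast"); // call to stop recording
}
```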


Chromeless allows for the setting of cookies. I noticed in an issue for Puppeteer that it wasn't seen as a high priority. Our use-case (session-based cookies) relies on a cookie being set in order for our PDF screenshots to work.

Any chance you could give us some insight on when this may be available?


One of the puppeteer devs tells me he started on cookies last night. You'll see a PR soon. :)
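Once cookie support lands, the session-cookie-then-PDF flow could look roughly like this. A sketch assuming a Puppeteer version that has `page.setCookie()` (later releases do); `pdfWithSession` is a hypothetical name and the cookie name/value are placeholders supplied by the caller.

```javascript
// Hypothetical helper: set a session cookie for the target URL, load the page
// as that session, and print it to PDF.
async function pdfWithSession(page, url, cookie, path) {
  // Passing `url` lets Chrome derive the cookie's domain and path.
  await page.setCookie({ name: cookie.name, value: cookie.value, url });
  await page.goto(url, { waitUntil: "networkidle2" });
  await page.pdf({ path, format: "A4", printBackground: true });
}
```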


Awesome! So stoked on this. After some Phantom headaches we're really excited to transition to Chrome Headless.


We're mainly using these things for screenshots of existing corporate intranet tools that are then surfaced to Slack. Headless Chrome is finally giving us screenshots that are authentically what you would actually see on the screen. PhantomJS came close but still has issues.


Your unicorn would be lovely though :D


Can you help with one of the existing API's that does the same thing instead of inventing a new one?


Actually, the release of Puppeteer is a really exciting development. I've been waiting for some time for something like this to happen. We've seen what happened to PhantomJS (almost 2k open issues and the main maintainer stepping down without a successor), NightmareJS (lots of issues unanswered for months; the project is probably not a strategic part of Segment) and so on. In theory it is great for an individual or an established startup to drive a web browser automation project. But in reality, the scope of web browser automation simply gets out of hand very quickly. There are just too many edge cases to support in a fast-changing domain.

Being driven by a large commercial entity actually gives it a chance of working out. With the browser automation tool and the browser dev team being one team, there can be synergies not possible otherwise. When I spoke to the CasperJS creator some time ago, I could understand why there would be burnout. Looking at the popular Chromeless project, launched less than a month ago, there are already 150+ new issues with 100+ still open, and they already have enough in the pipeline for a few releases ahead. It can be a nightmare to manage.

There are just too many needs from a large user base for such projects. I'm speaking from the context of test automation and general browser automation.


Yep. Pat Meenan's herculean / ultramarathon support of WebPageTest is a remarkable exception. (Speaking of WPT, after just a cursory glance at this thread on my phone prior to thumbing this comment, it surprisingly hasn't been mentioned yet? shrug)


I just recently got to know WebPageTest. It even has scripting abilities! I'm just surprised the project didn't enter the mainstream (in the sense that an average test automation guy like me would know about it).



