
So first there was Selenium's JSON Wire protocol, then came the W3C WebDriver spec and now we're back to browser-specific implementations? As someone who's tried/is trying to automate Firefox/Chrome/Safari/IE in a consistent fashion, my only question is: WHY?



Based on a quick read of the API, my interpretation is that this is not targeting people who are trying to automate every browser, but those who need to automate any browser.

In that context, it's dead-simple to use, and someone with very little experience should be able to get a working prototype in under 5 minutes.

For my use case, it's closer to "wget/curl with JS processing" than "automating a user's browsing experience". I don't particularly care which browser is doing the emulation, with the ease-of-use of the API making the biggest difference.

It seems very similar to PhantomJS, but to be honest, it's more attractive from an ongoing-support standpoint simply because it's an official Chrome project.
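
To make that concrete, the whole "curl with JS processing" flow is roughly this (a sketch based on the README; example.com is just a placeholder):

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com');  // placeholder URL
      const html = await page.content();       // rendered DOM, after JS has run
      console.log(html);
      await browser.close();
    })();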


Exactly, this looks perfect for taking a screenshot of a page[1], or converting a page to a PDF[2] in just a few lines of code.

If you have an existing web service, this appears suitable for actual production usage to deliver features like PDF invoices and receipts, on-demand exports to multiple file formats (PNG/SVG/PDF) etc., which has quite different requirements compared to an automated testing framework.

[1] https://github.com/GoogleChrome/puppeteer/blob/master/exampl...

[2] https://github.com/GoogleChrome/puppeteer/blob/master/exampl...
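
For reference, this is roughly what those examples boil down to (paraphrased from memory rather than copied verbatim; see [1] and [2] for the real thing):

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com');             // placeholder URL
      await page.screenshot({ path: 'screenshot.png' });  // [1] capture a screenshot
      await page.pdf({ path: 'page.pdf', format: 'A4' }); // [2] print the page to PDF
      await browser.close();
    })();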


+1 to this

The primary use-case for headless-chrome is to support stuff like scraping/crawling JavaScript-dependent sites and services, and emulating user workflows to retrieve data or trigger side effects that couldn't otherwise be achieved with something more low-level (curl, manual HTTP requests w/ Node's HTTP/S API etc).

headless-chrome would be used to implement functionality in a server-side microservice, rather than for automated testing of UI/UX; there are already more appropriate projects for that.


If you just need wget/curl with js, you can actually do that with the chrome CLI now. Just run your chrome binary with the arguments --headless --disable-gpu --dump-dom <url>


Chrome DevTools Protocol (CDP) is a more advanced API, and I've been leading an effort to get multiple browsers to look at CDP: https://remotedebug.org
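
For anyone who hasn't touched CDP directly, a minimal sketch using the chrome-remote-interface package (assumes Chrome is already running with --remote-debugging-port=9222):

    const CDP = require('chrome-remote-interface');

    (async () => {
      const client = await CDP();  // connects to localhost:9222 by default
      const { Page, Runtime } = client;
      await Page.enable();
      await Page.navigate({ url: 'https://example.com' });  // placeholder URL
      await Page.loadEventFired();
      const { result } = await Runtime.evaluate({ expression: 'document.title' });
      console.log(result.value);
      await client.close();
    })();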


Get them to actually version their protocol first, maybe?

As someone with a bunch of tooling around the dev-tools API, it's a huge pain in the ass to not be able to tell what functions the remote browser supports. There's a version number in the protocol description, but it's literally never been incremented as far as I've seen.

https://github.com/ChromeDevTools/devtools-protocol/issues/6


File a bug for this.

It should be possible to get the running chrome version and to do feature detection over this protocol.

If it isn't, it's a bug
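
For what it's worth, a rough sketch of what version/feature detection could look like today against the debugger's HTTP endpoints (assumes Chrome was started with --remote-debugging-port=9222; /json/version is long-standing, and I believe newer builds also serve the full protocol descriptor at /json/protocol):

    const http = require('http');

    function getJSON(path) {
      return new Promise((resolve, reject) => {
        http.get({ host: 'localhost', port: 9222, path }, (res) => {
          let body = '';
          res.on('data', (chunk) => { body += chunk; });
          res.on('end', () => resolve(JSON.parse(body)));
        }).on('error', reject);
      });
    }

    (async () => {
      const version = await getJSON('/json/version');    // browser + protocol version
      console.log(version['Browser'], version['Protocol-Version']);

      const protocol = await getJSON('/json/protocol');  // full protocol descriptor
      const hasTracing = protocol.domains.some((d) => d.domain === 'Tracing');
      console.log('Tracing domain supported:', hasTracing);
    })();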


Did.... did you notice the link to the bug report in my comment?


This looks great, never seen this before. Thanks for your work.


Because WebDriver just doesn't cut it.

For an example of an impossible task, try to retrieve request headers using Selenium. Browser automation gets more and more complicated the more cases you're trying to cover, and in my impression WebDriver is definitely not enough. Who knows, perhaps some new version of WebDriver that I haven't heard of will catch up once the functionality gets properly defined.
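
For contrast, a sketch of what that looks like over the DevTools protocol through Puppeteer (assuming a recent Puppeteer API; the headers reported are the ones the protocol exposes, which may not match the wire bytes exactly):

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      // Log the request headers for every outgoing request
      page.on('request', (request) => {
        console.log(request.url(), request.headers());
      });
      await page.goto('https://example.com');  // placeholder URL
      await browser.close();
    })();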


Yes, I wholeheartedly agree, it was a stupid decision by the Selenium devs not to make request headers etc. accessible. But why throw all the standardisation efforts overboard?


I think this is more "Selenium is actively antagonistic to its major use-case" than trying to throw everything away. There have been multiple attempts to convince the Selenium people to revisit their decision w.r.t. headers, and they're completely unwilling.

Given that the Selenium leadership is apparently uninterested in improvements, and given its many limitations, trying to improve things there is more effort than it's worth.


I didn't follow that development. Can you share why Selenium maintainers chose not to implement headers? Is it that they want to restrict the tool to simulate what a normal user can do with a browser and not hacks such as overriding headers? Thanks in advance!


"Something something something normal users can't do that something something".

Basically, they're still stuck on the idea that they're ONLY for emulating user-available input (as if the dev-tools didn't exist).

In reality, there is tremendous interest in more complex programmatic interfaces, but apparently they're unwilling to see that, and are instead only interested in preserving their implementation's "user-only" ideological purity.


> For an example of an impossible task, try to retrieve request headers using Selenium.

Selenium can't do that itself, but it can drive a browser that uses a proxy, and you can retrieve everything about the request from that. It's a lot more challenging if you're testing things that use SSL, but it's not impossible with a decent proxy app (e.g. Charles).
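
With the JS bindings that looks roughly like this (a sketch; localhost:8888 stands in for wherever Charles/mitmproxy is listening, and for HTTPS you'd also need to trust the proxy's CA certificate):

    const { Builder } = require('selenium-webdriver');
    const proxy = require('selenium-webdriver/proxy');

    (async () => {
      // Route all browser traffic through the intercepting proxy,
      // then read headers/bodies from the proxy's logs or API.
      const driver = await new Builder()
        .forBrowser('chrome')
        .setProxy(proxy.manual({ http: 'localhost:8888', https: 'localhost:8888' }))
        .build();
      await driver.get('https://example.com');  // placeholder URL
      await driver.quit();
    })();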


(I don't have the full history, but from my experience...) When I first started using Selenium, it used browser-specific drivers that communicated directly with bespoke extensions/plugins/add-ons/what-have-yous. Then came Selenium Server which allowed for testing on remote machines. Then Grid allowed for simultaneous testing of many remote machines.

Due to the nature of browser plugins, they could only access information that the browser made available. Also, the devs have consistently been of the opinion that they only care to simulate an end-user experience. (End users only care about the webpage's presentation, not which headers were present per transaction.) Although, I suspect that early restrictions influenced that viewpoint.

It wasn't until much later that browsers started implementing their own automation channels; Chrome's Automation Extension and Debugging Protocol, Firefox's Marionette, etc. At the same time, browsers started putting additional security measures around plugins making it even more difficult to have consistent features across Selenium's drivers.

Which is why WebDriver became an open specification instead of various driver implementations. I believe Microsoft was the first to implement their own driver, InternetExplorerDriver, for IE7+. Then came ChromeDriver (powered by the Chrome Automation Extension), GeckoDriver (a translation layer to Firefox's Marionette), and SafariDriver, which is now baked into Safari 10.

Marionette is possibly the most interesting, but I lack experience with it. TMK it allows for automating the entire browser: both the webpage and the 'chrome' interface. Whereas Selenium, at best, could only simulate actions - like the back button - through JavaScript. But even with Marionette's feature richness, you still don't get access to request and response information.

I think for most developers ("devs", "qa", "scrapers", etc.) there's very little appeal in moving away from Selenium, because it would require maintaining multiple test suites. It gives consistent results and just works. If you want lower-level information, it's fairly simple to either 1) use a CLI client (curl, wget, etc.), 2) use a library (libcurl, Requests for Python, Net::HTTP for Ruby, etc.), or 3) set up a proxy server. I do all of the above, and each has its own downsides: clients and libs don't do any rendering themselves, and proxies tend to rewrite transactions (e.g. stripping compression or adding/removing/altering headers like 'Content-Encoding: gzip').


Puppeteer automates headless / visible Chrome today. However, it is built on the DevTools Protocol, so I believe it might become browser-agnostic in time. There are various (some successful, some failed) adapters for bridging the DevTools Protocol to IE, Edge, Firefox, Safari, etc. So I really do think supporting multiple browsers is not an impossibility.

But your point about how much code and how many automation assets have already been written resonated with me. Exploring new tools usually happens when a new project starts, rather than by entirely recoding an existing one. For existing codebases, not much will change unless contributors from the community write something to translate those suites to the Puppeteer API?


I do a lot of web scraping that requires a full browser for some websites. This sounds perfect for me. I often use Selenium, but it adds a lot of complexity and is quite buggy when I just want to run Chrome (or any one browser, but not all browsers).


Have you tried TestCafe? I have had a good experience with TestCafe on a recent project. The development team is very responsive about bug fixes and stability.


Hadn't heard of it before, I have an upcoming project that's going to need some JS scraping, so I'll give this a shot as well as Puppeteer. Thanks!


I think this can be OK. It's pretty obvious that the WebDriver protocol was insufficient. Let browsers "take it home" for remodelling and see if a new standardization effort happens down the line.



