Remote-Browser – A browser automation framework based on the Web Extensions API (github.com)
66 points by foob 3 months ago | 11 comments

"The central idea behind Remote Browser is that there's no need to reinvent the wheel when modern browsers already ship with an extremely powerful cross-browser compatible API that's suitable for automation tasks."

That's already what the WebDriver API accomplishes, and it's a W3C standard: https://www.w3.org/TR/webdriver/

People still call it 'Selenium' for whatever reason, but chromedriver/geckodriver/edgedriver are all implementations of the protocol for each existing browser (Safari's might be built in?).

Granted the web extensions API would likely allow for some additional powerful options, but at the cost of some missing features.

Selenium is just a tool built on top of the WebDriver API. One of its main disadvantages is needing to run a complicated proxy program (like geckodriver, ChromeDriver, etc.) built individually for each browser in order to drive your instance. As a result, users sometimes suffer from hard-to-debug edge cases and other pain points.

These proxies also make interacting with JavaScript on the page a bit painful. For example, injecting JavaScript into the browser with Selenium can be quite an ordeal [1], so you're somewhat limited in what you can do by what Selenium's developers decided to focus on. The extra proxy also complicates deployments by adding another moving part to the overall equation.

In contrast, the Web Extension API is now part of all major browsers, and makes interacting with different page contexts effortless. To give a sense of the project, we wrote an interactive tour of Remote Browser [2] which runs browser instances on our backend.

[1]: https://intoli.com/blog/javascript-injection/

[2]: https://intoli.com/tour/1

You're writing that users suffer from edge cases from the individual webdriver implementations. My experience as someone working with gecko and chromedriver on a daily basis is that the number of edge cases stemming from browser behavior (such as moving to, clicking, and focusing on elements) is a much more frequent pain than differences in the webdriver implementation.

The Web Extension API doesn't seem too suitable for browser automation as it doesn't have support for simulating key presses or mouse movements (other than HTMLElement's click method).

Yeah, you can simulate events, but that can be a lot of work (e.g. typing an 'a' key might require you to simulate all of keydown, keypress, and keyup and set various non-standardized properties on them). And that won't even work in a standard text input, as isTrusted is set to false on events you generate. Simulating something like a Tab key press will require you to write code that tries to replicate your browser's logic for determining what the next element should be.
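
To give a sense of the Tab case, here is a rough Python model of sequential focus navigation. It assumes a heavily simplified rule: elements with a positive tabindex come first in ascending order, then elements with tabindex 0 in document order, and negative values are skipped. The `Element` class and the `tab_order` and `next_focus` functions are hypothetical names for illustration only. A real browser also has to account for visibility, disabled state, shadow DOM, iframes, and more, which is exactly the part that is tedious to replicate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a focusable DOM element."""
    name: str
    tabindex: int = 0  # 0 = natural document order, negative = skipped by Tab


def tab_order(elements: List[Element]) -> List[Element]:
    """Return elements in simplified sequential-focus order.

    `elements` is assumed to already be in document order.
    """
    positive = [e for e in elements if e.tabindex > 0]
    natural = [e for e in elements if e.tabindex == 0]
    # sorted() is stable, so equal tabindex values keep document order.
    return sorted(positive, key=lambda e: e.tabindex) + natural


def next_focus(elements: List[Element], current: Optional[Element]) -> Optional[Element]:
    """Element that a Tab press would move to from `current`.

    Simplifications: wraps around at the end, and falls back to the first
    element when `current` is None or is not part of the tab order.
    """
    order = tab_order(elements)
    if not order:
        return None
    if current is None or current not in order:
        return order[0]
    return order[(order.index(current) + 1) % len(order)]


# Made-up page, listed in document order.
page = [
    Element("search", 0),
    Element("skip-link", -1),
    Element("email", 2),
    Element("name", 1),
    Element("submit", 0),
]
```

With this made-up page, `tab_order(page)` visits name, email, search, submit, and after moving focus you would still have to dispatch the matching blur and focus events yourself.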

Why would this be preferable over something like Selenium?

The Remote Browser framework is designed to be an API where interactions occur at a much lower level than they do in a library like Selenium or Puppeteer. In many ways, it's more similar to the Chrome DevTools Protocol than it is to Puppeteer. It's more a tool for building libraries with user-friendly interfaces than it is a user-friendly interface in and of itself. Like you mention, simulating user interactions with the DOM is relatively easy. The idea here isn't that you would implement that sort of thing yourself, it's that higher level libraries will be built which incorporate this sort of functionality for you. We have several projects in the works here at Intoli that fall into this category, and we're looking forward to rolling them out in the near future.

But we can just create/use a JS library that simulates typing. It would work for an app that needs it, or in your automation environment, because it's just using the DOM API. Is Selenium doing something special beyond triggering those events?

The point is you can't accurately simulate typing without something like Selenium. Neither the standard DOM API nor the additional features provided by the Web Extension API allow you to do so.

You can simulate some parts of typing by mimicking browser behavior on a case-by-case basis, but there are places where this will be strictly impossible. For example, if you are working with anything that checks the isTrusted event bit, you're out of luck, as there is no mechanism for you to set that to true.

Selenium on the other hand is actually triggering events as if a person physically triggered them. So, for instance, the isTrusted bit will be set to true when you use the send_keys method in Selenium.

Interesting. Do you know how Selenium does that?

Looks like it's a feature of the WebDriver API: https://www.w3.org/TR/webdriver/#element-interaction
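
For a concrete picture, here is a minimal sketch of the HTTP request a client makes for the Element Send Keys command described in that section. It assumes a driver such as geckodriver or ChromeDriver is listening at a local address. The base URL, session ID, and element ID are placeholders, and `build_send_keys_request` is a hypothetical helper. The request is only built here, not sent.

```python
import json
from urllib.request import Request


def build_send_keys_request(
    base_url: str, session_id: str, element_id: str, text: str
) -> Request:
    """Build (but do not send) a WebDriver "Element Send Keys" request."""
    # The spec defines this command as:
    #   POST /session/{session id}/element/{element id}/value
    url = f"{base_url}/session/{session_id}/element/{element_id}/value"
    # The text to type is carried in the JSON body's "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


# Placeholder values: a locally running driver and made-up IDs.
request = build_send_keys_request(
    "http://localhost:4444", "session-123", "element-456", "hello"
)
```

The driver, not page JavaScript, turns that request into keyboard input, which fits the point above that events triggered this way are treated as if a person had typed them.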

Cool idea - the Web Extensions API is very powerful.

How would you go about installing the extension in a CI/CD setup? Can it be installed on headless Chrome?

Thanks! You can use the extension in a CI setup by first installing the remote-browser extension, and then using it in your tests. You can check out remote-browser's own tests for an example of integrating the project with CircleCI [1].

[1] - https://github.com/intoli/remote-browser/blob/master/.circle...
