How to make Selenium tests reliable, scalable, and maintainable (lucidchart.com)
115 points by mjswensen on July 21, 2015 | 67 comments



Selenium tests are inherently slow, unreliable and flappy. They have been the bane of developers for every employer I've had. Do yourself a favor and write React and test your components without a browser driver in good ol' JS with the occasional JSDom shim. It removes almost the entire need for Selenium, which should be reserved for only the faintest of smoke tests. And please, if you have to use Selenium, use headless Firefox, because PhantomJS is very bad software.


I had a Rails consultancy (Makandra) recently work on a JS-heavy application that I happen to own, and they got Selenium singing on it, which had been beyond my capabilities for years. One of their tricks, whose implementation you can inspect in their public utilities library [+], is basically using a vendored Firefox per project and VNCing into that Firefox to drive things around. It is thus off-screen and out of the way when you're using it, but apparently is more true-to-reality than headless.

The test suite they wrote has ~600 tests, and while they're slower than I'd like (2-3 minutes), they've been bulletproof since we got my dev environment configured properly. It includes some fairly complicated interactions, most relevantly around our calendar interface.

[+] https://github.com/makandra/geordi


I had been using Firefox driver in Xvfb but wasn't happy with the performance/stability. So I built a Selenium driver out of Java only (using JavaFX's embedded WebKit) and used a headless JRE windowing toolkit (Monocle). My project is still a pre-release but the headless capability, Java-only system requirement, and its ajax handling might make it useful to some people currently: https://github.com/MachinePublishers/jBrowserDriver


This looks very interesting!

1. How does it compare with PhantomJS?

2. What's the current WebKit version?

3. How often does the JavaFX WebKit update?


1. Not quite sure. I've only used PhantomJS via Selenium Ghost Driver. From that usage they're similar. The main difference is that my driver uses only Java so under the hood the JRE is launching WebKit through JNI and everything runs in the same JRE process.

2. Current WebKit version depends on the JRE used. Oracle Java 1.8.0_45 has WebKit version 537.44.

3. Java maintainers will update WebKit periodically, including within a major version. E.g., here they update WebKit for the 1.8.0_60 JRE: http://openjdk.java.net/jeps/239 ... Other than that I'm not sure.


Neat, but Affero public license? Ick.


To clarify--this was something I was interested in helping with, until I saw the license.


Is there any flavor of GPL license you would prefer more? I don't plan on BSD or Apache license in the foreseeable future.


Honestly, if it was useful, I would probably use this for my side project, which is commercial but in no way competes with what you're doing (load testing). I might make changes or improvements, and I would generally contribute those back. Affero doesn't "mix well", so it is pretty much a non-starter for me.

I've found this is true for a lot of projects and it seems like restrictive licenses prevent projects from going mainstream.


I had trouble with Selenium's failure rate too, and ended up writing a test engine in JavaScript. It handles async JS function calls with a JS callback on completion, which gets rid of all the sleep-wait-retry type logic in Selenium.

It works very well. I can run the 100+ test cases in IE/FF/Chrome/Safari and iOS/Android browsers without changing one line of JS/test code. It even runs fine with the desktop wired to a cellphone browser on a cell connection.

It also exercises all the app's backend DB logic. The time/pass/fail info is submitted back to the test backend DB.


Are you planning to open source it?

Can you give an example of how it works? Say, navigate to a page, fill in a form, click submit, and verify that some text is present after submitting the form.


Does it really? Does it test for whether a button you thought was present isn't actually clickable?

If you're going to write tests, I think it makes an insane amount of sense to emulate real world conditions as much as feasibly possible (making judgement calls on things that don't matter like speed of the mouse).


Most Selenium tests don't test that a button is actually clickable though, they find things through the DOM and if the button is offscreen or hidden they won't realise it.


In my experience, an exception is thrown in my Selenium test if a button is hidden or not clickable and I try to click it.
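
For example, with the Python bindings something like the following fails loudly rather than silently "clicking" a hidden element; this is just a sketch, and the URL and selector are made up:

    from selenium import webdriver
    from selenium.common.exceptions import ElementNotVisibleException

    driver = webdriver.Firefox()
    driver.get("http://localhost:8000/app")  # placeholder URL
    button = driver.find_element_by_css_selector("#save-button")  # placeholder selector

    assert button.is_displayed(), "button is in the DOM but not visible"
    try:
        button.click()
    except ElementNotVisibleException:
        # Most drivers raise this if the element is hidden at click time.
        raise AssertionError("save button was not clickable")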


This has been our experience as well. We invested a lot of time and money in making sure Selenium tests run reliably for our clients. Despite this, the best reliability we managed to achieve was 90% with tests that run for 40 minutes, which is obviously not acceptable.

We have compiled a few tips we learned along the way in our blog post - http://novoit.eu/blog/05-5-tips-when-writing-Selenium-browse...


>the best reliability we managed to achieve was 90% with tests that run for 40 minutes, which is obviously not acceptable.

What was actually going wrong during that 10%?

I get something closer to 100% reliability, so I'm feeling a little perplexed by all of this.

Do you make heavy use of sleeps?


Mostly these would be cases where the browser would, seemingly at random, end up in an unpredictable state and all subsequent test scenarios would fail because of it. (The page is white, or a completely unrelated website gets opened. We have seen lots of weird situations so far.)

This might be exacerbated by the fact that we use the remote Browserstack Selenium hosting service so that the tests can be executed automatically as a part of our deployment process.


White page and randomly ending up on an unrelated website both sound like bugs.


>tests that run for 40 minutes

This is pretty good actually. It sucks if you're relying on Selenium testing for verifying your code as you're writing it, but before and after deploys to staging and production? This isn't bad at all.


40 minutes from clicking a button to deploy is actually abysmal, especially when you need to worry about things like rollbacks, or deploying at the end of the day, or releasing quick hotfixes to users. In modern build processes, even 10 minutes seems too long.


For testing Mithril.js, I wrote a mock window object, which allows you to do things like simulate requestAnimationFrame, clicks, JSON-P calls and browser quirks from non-browser environments (e.g. from a Node.js script). So to test, you simply swap `window` with the mock and you can drive your fake browser however you wish.

http://lhorie.github.io/mithril/mithril.deps.html

You can cover a lot of ground with that approach and make an extremely fast test suite that is suitable for a save-refresh-test workflow and then you can put trickier tests in a secondary test suite that you only run once in a while (e.g. before a commit)


Can you expand on this?

The extent of the testing I'm currently interested in is "load a page, does the JS on that page run without error"? It won't execute from a CLI and everyone I talked to pointed me at Selenium.


Testing for a page load without JS error is a fine use case, and is an example of what I meant by the "faintest of smoke tests." It's a test that has very little chance to flap, fail, or force you to write hacky commands around Selenium's unreliable API.
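
For what it's worth, here is roughly how that smoke test might look from a CLI script using Selenium's Python bindings; this sketch uses Chrome's browser log to surface console errors, and the capability setup and URL are assumptions about your environment:

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    caps = DesiredCapabilities.CHROME.copy()
    caps["loggingPrefs"] = {"browser": "SEVERE"}  # collect only JS errors

    driver = webdriver.Chrome(desired_capabilities=caps)
    try:
        driver.get("http://localhost:8000/page-under-test")  # placeholder URL
        errors = [e for e in driver.get_log("browser") if e["level"] == "SEVERE"]
        assert not errors, "JS errors on page load: %r" % errors
    finally:
        driver.quit()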


I pretty much agree with your thoughts but what makes you say PhantomJS is bad software?


I currently manage a rather large test suite (around 700 different tests) using Selenium, which is all written in Ruby and Rspec (although I've also used Cucumber), and uses the gems Capybara (an abstraction layer for querying and manipulating the web browser via the Selenium driver) and SitePrism (for managing page objects and organizing re-usable sections).

The entire suite runs in around 10 minutes on CircleCI, using 8 parallel threads (each running an instance of the Firefox Selenium driver), and it is rock solid stable.

It took us a while to get to this point, though.

The hard part is handling timing due to Javascript race conditions on the front-end. I had to write my own helper methods like "wait_for_ajax" that I sprinkle in various page object methods to wait for any jQuery AJAX requests to complete. I also use a "wait_until_true" method that can evaluate a block of code over and over until a time limit has been reached before throwing an exception. Once you figure out ways to solve those types of issues, testing things with Selenium becomes a lot more stable and easy.
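
Those helpers are Ruby/Capybara in our suite, but the idea is only a few lines in any binding. A rough Python sketch of the same thing (the names and timings mirror my description, not the actual implementation):

    import time
    from selenium.common.exceptions import TimeoutException

    def wait_for_ajax(driver, timeout=10):
        # Poll until jQuery reports no AJAX requests in flight.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if driver.execute_script("return jQuery.active == 0"):
                return
            time.sleep(0.1)
        raise TimeoutException("AJAX requests still pending after %ss" % timeout)

    def wait_until_true(block, timeout=10, interval=0.1):
        # Re-evaluate `block` until it returns truthy or the time limit is hit.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if block():
                return True
            time.sleep(interval)
        raise TimeoutException("Condition not met within %ss" % timeout)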

I have also used the exact same techniques (page objects, custom waiter methods for race conditions, etc) to test mobile apps on iOS and Android with Selenium.

It can be a challenge, but once you have a system down and you know what you are doing, it's not so bad.


The most annoying thing I found with Selenium was that it wouldn't wait for the browser to respond to click events and rerender.

The approach in the blog post (and I think elsewhere ... not sure) is to poll the DOM with a timeout.

Is there a better solution to be had with something like `executeScript`? You could run `requestAnimationFrame`, and then poll for an indicator that the click, etc. handler has indeed finished. That way if it fails, you know about it pretty soon, without the need for long timeouts. This is all just a guess though.
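
To make the guess concrete, here is one way it might look with the Python bindings; the `window.__renderDone` flag is purely illustrative and assumes the page's re-render happens within a couple of animation frames:

    from selenium.webdriver.support.ui import WebDriverWait

    def click_and_wait_for_render(driver, element, timeout=5):
        element.click()
        # Two nested requestAnimationFrame callbacks fire once the frame
        # following the click handler's re-render has been produced.
        driver.execute_script("""
            window.__renderDone = false;
            requestAnimationFrame(function () {
                requestAnimationFrame(function () { window.__renderDone = true; });
            });
        """)
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return window.__renderDone === true"))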


>Is there a better solution

Yes. And it's pretty simple:

    WebDriver driver = new FirefoxDriver();
    driver.get("http://somedomain/url_that_delays_loading");
    WebElement myDynamicElement = (new WebDriverWait(driver, 10))
        .until(ExpectedConditions.presenceOfElementLocated(By.id("myDynamicElement")));
From: http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp


I'm not sure this satisfies your parent poster's requirement of: "if it fails, you know about it pretty soon, without the need for long timeouts."


Well, you need to have a timeout.

You can make the timeout shorter when running the test on a dev environment, though, so you get quicker feedback about errors.


Ruby's Capybara encapsulates Selenium and waits until elements appear on the page (the default timeout is 2 seconds). So you can write simple sequential code like

    click_link('bar')
    expect(page).to have_content('baz')
and it will work even if the baz element is injected into the page by an Ajax request to the server triggered by clicking on bar. I've been using it for many years but I didn't check how they implement it. Maybe a callback from a MutationObserver? https://developer.mozilla.org/en-US/docs/Web/API/MutationObs...

Documentation at https://github.com/jnicklas/capybara#asynchronous-javascript...


According to that documentation you linked, it just polls until `default_max_wait_time` (which defaults to 2 seconds).


I have had some good results using the F# canopy library (http://lefthandedgoat.github.io/canopy/) for working with Selenium. It handles (almost) all the waits for you, so you don't have to scatter a bunch of sleeps in your tests, and it is pretty easy to work with.


There are some utility methods like FluentWait, but ultimately they're just convenience wrappers for polling the DOM and waiting.
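
For instance, in the Python bindings `WebDriverWait` plays the same role as Java's FluentWait: a configurable polling interval plus a list of exceptions to swallow while polling. A minimal sketch, assuming a `driver` already exists and an element with id "myDynamicElement" eventually appears:

    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.support.ui import WebDriverWait

    # Poll every 250ms for up to 10s, ignoring "not found yet" errors.
    wait = WebDriverWait(driver, timeout=10, poll_frequency=0.25,
                         ignored_exceptions=(NoSuchElementException,))
    element = wait.until(lambda d: d.find_element_by_id("myDynamicElement"))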


Nice rundown, wish I had read this a year ago!

> One developer designed a way to take a screenshot of our main drawing canvas and store it in Amazon’s S3 service. This was then integrated with a screenshot comparison tool to do image comparison tests.

I would also take a look at Applitools https://applitools.com/ — they have Selenium webdriver-compatible libraries that do this screenshot taking/upload and offer a nice interface for comparing screenshot differences (and for adding ignore areas). Way fewer false failures than typical pdiff/imagemagick comparisons.


If using Selenium's Python bindings, you can take a screenshot from Selenium and convert it to OpenCV format like this:

    import base64
    import cv2
    import numpy

    frame = cv2.imdecode(
        numpy.asarray(
            bytearray(base64.decodestring(driver.get_screenshot_as_base64())),
            dtype=numpy.uint8),
        cv2.CV_LOAD_IMAGE_UNCHANGED)
(where `driver` is your WebDriver object, e.g. `webdriver.Chrome()`).

Then to match that frame against a previously-captured "template" image, you can use stb-tester's[1] "match" function[2] which allows you to specify things like the region to ignore and tweak the matching sensitivity.

[1] http://stb-tester.com [2] http://stb-tester.com/stb-tester-one/rev2015.1/python-api#st...


The fact that everyone in the blogosphere (and at my own company) is writing non-app-specific layers on top of Selenium suggests that there is scope for a higher-level framework on top of Selenium, or that the Selenium API is too thin a layer over WebDriver.

Does anyone know of such a project?


I did the exact opposite. I ripped out some robot framework tests and replaced the code with python using selenium webdriver. Works great.


Can I ask why you decided to do this? Were the tests just flaky while in Robot Framework world?


I absolutely hated the robot framework. The DSL was just horrible to use. It had weird, unnecessary syntax quirks and gave you the minimal amount of information if something failed (wouldn't tell you which line number it failed on, for instance).

The tests were also flaky as hell but that was more to do with poor environment management. That, admittedly, was also easier to fix in python.


You might find Site Prism interesting: https://github.com/natritmeyer/site_prism (there are alternatives such as http://watirwebdriver.com/page-objects/ and https://github.com/sensiolabs/BehatPageObjectExtension, but I have no experience with them).

It provides a "page object model" implementation on top of Capybara, so you can define a model for each page you want to test, which stores the page's relative URL, and has references to all the elements on the page you care about, and methods for all the interactions you want to do with that page.

So for example, you might have a "LoginPage" model, which contains the following:

  class LoginPage < SitePrism::Page
    set_url "/login"

    element :username_input, '.username-input'
    element :password_input, '.password-input'
    element :submit, '.submit-button'

    def login(username, password)
      load # Load the page URL in the Selenium instance
      username_input.set(username) # Fill in username
      password_input.set(password) # Fill in password
      submit.click # Click submit
    end
  end
Then whenever you want to login from one of your steps, you can just do:

  login_page = LoginPage.new
  login_page.login('whoever', 'what3v3r')
I think it's a nice abstraction as it allows more experienced test automation developers to build the page model while less experienced ones can write steps just calling the methods. You still have to pay a lot of attention to things like appropriate use of "wait for element to appear" rather than "sleep", and ensuring tests use isolated data, to get it working reliably, but we've got it working pretty well at my current place.

I should write up how we have it set up at some point as we have our own app-specific framework on top of SitePrism which provides some useful abstractions to make it quicker to develop tests.


I'm just getting into Play Framework development, and it ships with FluentLenium, which seems to add a more friendly API and some convenience functions. Nothing too fancy, but just looking at the pure-Selenium code examples people have posted here shows how dramatic the effect can be.

The one downside is that the developers only seem to tag official releases once in a blue moon; despite the GitHub repo being well updated, the last push to Maven was more than half a year ago, so it depends on a rather old version of Selenium.


I just write my own layers on top of Selenium (with Python).

This one is a rough test-automation layer, mostly used for filling in forms etc. during development: http://kopy.io/LMBKt (an old one, but to hand). It's handy to be able to open a page, log in, and fill in a form in a few seconds when doing it by hand would take minutes.

I find that approach works because the abstraction is only one level removed and I can just throw in methods that relate to that project.
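
As an illustration of what "one level removed" means, a minimal sketch of such a layer might look like this (the URL, selectors, and credentials are placeholders, not taken from the linked paste):

    from selenium import webdriver

    class DevSite(object):
        def __init__(self, base_url="http://localhost:3000"):  # placeholder URL
            self.driver = webdriver.Firefox()
            self.base_url = base_url

        def login(self, username, password):
            d = self.driver
            d.get(self.base_url + "/login")
            d.find_element_by_name("username").send_keys(username)
            d.find_element_by_name("password").send_keys(password)
            d.find_element_by_css_selector("button[type=submit]").click()

        def fill_signup_form(self, **fields):
            # One method per form keeps the abstraction project-specific and shallow.
            d = self.driver
            d.get(self.base_url + "/signup")
            for name, value in fields.items():
                d.find_element_by_name(name).send_keys(value)

    site = DevSite()
    site.login("dev", "devpass")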


Yes!! http://heliumhq.com (commercial; I'm one of the people behind it)


Have you looked into BDD tools like Behat/Behave/Cucumber ?


Here's the presentation the post is based on: https://www.youtube.com/watch?v=5K6bwikZulI


The PageObjects tip is a really good one. Previously, using Selenium, you'd end up with a complete maintainability nightmare.

I used Geb on a recent project, and I actually felt that the tests I built demonstrated a passable level of engineering discipline. However, Geb was really hard to learn (partly because the error messages were confusing or missing), and you're still on top of Selenium, so you still get wacky exceptions and edge cases.


Also, switching from the Java to the Ruby ecosystem is one way to improve your Selenium tests. For a start, use the watir-webdriver and page-object gems.


Improve them how though? Speed? Reliability? If it's just a nicer API, that's all well and good, but until the key problems I face with Selenium are solved (slow and non-deterministic tests) then a nicer API to it is just rearranging deck-chairs on the Titanic.


It seems that you are trying to use Selenium 2, or WebDriver, to run your unit tests. Selenium is for browser tests, and by its nature it cannot run in milliseconds; its execution time is in seconds, even when using the PhantomJS WebDriver. It is an integration-testing approach because it combines the execution of several JavaScript modules that run in a real browser. Selenium has its purposes, but fast test execution is not one of them.


> It seems that you try to use selenium 2, or webdriver, in order to run your unit tests.

Nope. Integration tests. But integration tests that start a Firefox instance from scratch and have to be rerun multiple times to pass due to non-determinism are slow.


Could you please provide one example of non-determinism? I would like to understand what exactly YOU mean by that term.


Using Capybara alone, one gets most of the stuff they had to implement (page, with, retries, ...), but I'll look into those gems you suggest.

Maybe the Scala ecosystem is still immature on the side of integration testing. They could implement them in Ruby if they are familiar with the language. I don't feel OK about using two languages but at least it could enforce strict separation between integration testing and the application.


Some very good information in this article. It is true that Selenium has its quirks; retrying a failed test can sometimes result in a passing test.

Disclaimer: I work for https://testingbot.com, where we offer our customers automatic retries when a test fails. Writing a Selenium test does take time, but once you run it in parallel across hundreds of browser and OS combinations, it's worth it.


>retrying a failed test can sometimes result in a passing test.

This is usually a sign of either a buggy test or buggy code.


I wonder if there are stories about running Selenium tests in production. Something along the lines of semantic monitoring (http://www.thoughtworks.com/radar/techniques/semantic-monito...)


Great to see a HN post on testing, they seem few and far between to me!


BrowserMob, that was a sweet service (based on selenium). Does anyone know what happened to those guys after they sold? I've always wanted to learn more about their story.


I don't know about the entire team, but Patrick and Ivan were at Neustar for a while. They're both at NewRelic right now.


I do find Selenium a bit overly complicated, so thanks for the post.


There is a nice presentation at the bottom with some code examples, too.


tldr; have developers help maintain automation tests


Summarizing away a technical article makes for a not very useful summary.


Using it right now for my latest project, and it is a nightmare. I have 1100 tests that have to run per night. I'm using PhantomJS. It is such a mess!!!


  > getWithRetry takes a function with a return value
  > 
  >   def numberOfChildren(implicit user: LucidUser): Int = {
  >    getWithRetry() {
  >      user.driver.getCssElement(visibleCss).children.size
  >    }
  >   }
  > 
  > predicateWithRetry takes function that returns a boolean and will retry on any false values
  > 
  >   def onPage(implicit user: LucidUser): Boolean = {
  >    predicateWithRetry() {
  >      user.driver.getCurrentUrl.contains(pageUrl)
  >    }
  >   }
At first I didn't get the difference between `getWithRetry` and `predicateWithRetry`, but then I noticed that the former throws an exception whereas the latter returns false. I infer that `getWithRetry` will handle exceptions thrown by the retried function.

In stb-tester[1] (a UI tool/framework targeted more at consumer electronics devices where the only access you have to the system-under-test is an HDMI output) after a few years we've settled on a `wait_until` function, which waits until the retried function returns a "truthy" value. `wait_until` returns whatever the retried function returns:

  def miniguide_is_up():
      return match("miniguide.png")

  press(Key.INFO)
  assert wait_until(miniguide_is_up)
  # or:
  if wait_until(miniguide_is_up): ...
(This is Python code.)

Since we use `assert` instead of throwing exceptions in our retried function, `wait_until` seems to fill both the roles of `getWithRetry` and `predicateWithRetry`. I suppose that you've chosen to go with 2 separate functions because so many of the APIs provided by Selenium throw exceptions instead of returning true/false.

  > doWithRetry takes a function with no return type
  >
  >   def clickFillColorWell(implicit user: LucidUser) {
  >    doWithRetry() {
  >      user.clickElementByCss("#fill-colorwell-color-well-wrapper")
  >    }
Unlike Selenium, when testing the UI of an external device we have no way of noticing whether an action failed, other than by checking the device's video output. For example we have `press` to send an infrared signal ("press a button on the remote control"), but that will never throw unless you've forgotten to plug in your infrared emitter. I haven't come up with a really natural way of specifying the retry of actions. We have `press_until_match`, but that's not very general. The best I have come up with is `do_until`, which takes two functions: The action to do, and the predicate to say whether the action succeeded.

  do_until(
      lambda: press(Key.INFO),
      miniguide_is_up)
It's not ideal, given the limitations around Python's lambdas (anonymous functions). Using Python's normal looping constructs is also not ideal:

  # Could get into an infinite loop if the system-under-test fails
  while not miniguide_is_up():
      press(Key.INFO)

  # This is very verbose, and it uses an obscure Python feature: `for...else`[2]
  for _ in range(10):
      press(Key.INFO)
      if miniguide_is_up():
          break
  else:
      assert False, "Miniguide didn't appear after pressing INFO 10 times"
Thanks for the article, I enjoyed it and it has reminded me to write up more of my experiences with UI testing. I take it that the article's sample code is Scala? I like its syntax for anonymous functions.

[1] http://stb-tester.com [2] https://docs.python.org/2/reference/compound_stmts.html#the-...


Thanks for the comment. We actually originally had a waitUntil function that was basically used for all three of the cases I mentioned above. In some sections of the code, it was just there to eat errors, other sections get some text, and yet others it was wrapped in an assert and needed to return a boolean. This led to chronic misuse around the code (I found 4-5 tests that simply forgot to wrap it in an assert effectively rendering the test completely worthless). The main benefit we got from splitting the methods out was making it clear to developers what it did. Catching all the exceptions thrown by Selenium instead of returning booleans was just an added benefit.

And you are correct, we are using Scala. There are some really cool things about the language: case classes, pattern matching, first-class functions, and traits, just to name a few.


> This led to chronic misuse around the code (I found 4-5 tests that simply forgot to wrap it in an assert effectively rendering the test completely worthless).

Yes, I've been bitten by that too -- it's too easy to forget the "assert". This morning it occurred to me that I could write a pylint (static analysis) checker to catch that, so I've done just that: https://github.com/stb-tester/stb-tester/commit/5e5bdbb


I'm working for a startup that addresses this by means of a simple wrapper API: http://heliumhq.com. Human-readable tests with no more HTML IDs, CSS selectors, XPaths or other implementation details.



