Consistent Selenium Testing in Python (chrxs.net)
61 points by cdubzzz on Sept 2, 2017 | 25 comments

FWIW the most consistent (as in non-flaky) selenium tests I've worked with used hidden DOM elements to synchronize the selenium driver with the JS code:


Basically every JS ajax call (or JS promise, though we didn't use any of those at the time) would update a hidden DOM element's inner text, e.g. the `#pendingAjaxRequests` inner text would go to 1, or 2, then back to 0 when the AJAX response came back (ideally you do this instrumentation in a single place in your app).

Then on the selenium side, after every button click/other action, we'd wait until those counters went back down to zero.

This post-action waiting (waiting for the app to "settle" after invoking actions) vs. pre-action waiting (...well, is the element I want to click on available yet? Hm...no...) was very reliable.
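A rough sketch of the post-action wait described above, in Python. It assumes the app keeps a hidden `#pendingAjaxRequests` element up to date; `wait_until_settled` and `click_and_settle` are made-up helper names, and `driver` is anything with Selenium's `execute_script` method. The counter is read through JavaScript because Selenium's `element.text` only returns visible text, so it would come back empty for a hidden element.

```python
import time


def wait_until_settled(driver, timeout=10.0, poll=0.05):
    """Block until the page's pending-request counter reads 0."""
    script = (
        "var el = document.querySelector('#pendingAjaxRequests');"
        "return el ? el.textContent : null;"
    )
    deadline = time.monotonic() + timeout
    while True:
        text = driver.execute_script(script)
        if text is not None and text.strip() == "0":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page did not settle; counter was {text!r}")
        time.sleep(poll)


def click_and_settle(driver, element, timeout=10.0):
    """Post-action waiting: act first, then wait for the app to go quiet."""
    element.click()
    wait_until_settled(driver, timeout=timeout)
```

Every test step then goes through `click_and_settle` (or a similar wrapper) instead of calling `click()` directly, so the waiting lives in one place.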

What do you think of this approach instead:


When I did this type of workaround (I have not needed to write one of these for a while) I found it was a problem for control flow that I needed to know if the action actually started and finished, not just whether it was currently in progress.

I used an .ajax-processing class that I added at the start of the request, and removed when the request came back with a callback.

While I can see the advantage of pendingAjaxRequests instead (you could have more than one going at a given time, and my .ajax-processing breaks down there), it still seems like you could have a problem as the browser JS is not running in the same process as your server or your test executor.

Your next line of code might run before the browser increments the inner text, or starts the XHR. You could pass control flow on thinking that the event has already completed, when in reality it hasn't even started yet. The 2013 article I linked seems to have a more robust model for tracking requests, it's just two variables, but it would seem to handle this problem with perfect reliability.
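The linked article isn't reproduced in this thread, so the following is only a guess at what a two-variable scheme could look like, not the article's code. It assumes the app maintains two counters, here called `window.ajaxStarted` and `window.ajaxCompleted` (hypothetical names), and that the action is known to fire at least one request.

```python
import time

# Hypothetical counters the application is assumed to maintain.
COUNTERS_SCRIPT = "return [window.ajaxStarted || 0, window.ajaxCompleted || 0];"


def read_counters(driver):
    started, completed = driver.execute_script(COUNTERS_SCRIPT)
    return int(started), int(completed)


def act_and_wait(driver, action, timeout=10.0, poll=0.05):
    """Run action(), then wait for a request started after it to finish."""
    baseline, _ = read_counters(driver)
    action()
    deadline = time.monotonic() + timeout
    while True:
        started, completed = read_counters(driver)
        # started > baseline: the action really did fire a request.
        # completed == started: nothing is still in flight.
        if started > baseline and completed == started:
            return started - baseline
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"started={started}, completed={completed}, baseline={baseline}"
            )
        time.sleep(poll)
```

Because the check compares against a baseline taken before the action, "not started yet" and "already finished" are no longer indistinguishable, which is the race described above. The cost is that an action that never fires a request will time out.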

The approach you linked to looks good to me. I hadn't seen that before; thanks for sharing it!

> Your next line of code might run before the browser increments the inner text, or starts the XHR.

Yeah, that is true in theory, but we never had a problem with that--my mental model was that if the Selenium process issued a click, and it got sent to the browser, any "onclick" behavior would fire within that immediate event loop (and for our app that's where we issued any AJAX calls, immediately within that event loop) and tick up the hidden DOM count before the webdriver code that's sitting within the browser sends the response back to the Selenium client.

I've not examined the webdriver impls of the various browsers, but AFAICT my theory matches what we saw at the time (in google/firefox with native events enabled).

I'd be very interested to hear from someone authoritatively on whether this theory is actually correct.

Also, if you have a JS-side persistence layer that does AJAX calls in separate event loops, e.g. with setTimeout, then yeah I could see this being an issue and would necessitate a more deliberate approach (like your linked post, which I like).

If your client talks to server, which talks to API, and your server is running in a different process/machine than your client integration tests (in our case, with Capybara.run_server=false) you will find this happening all the time.

The next server operation isn't done until both client-server and server-API communications have finished. And in any case API server is likely to need to confer with a database server before it can return its response, so it will take longer.

There are many good reasons not to run this way, but so far we have not found any better way for our Windows developers to do tests locally than with Capybara.run_server=false set in their environments. (They are using Vagrant and a process from 3+ years ago, and our developer envs need a serious refresh for 2017 imho...) And it helps for the argument that in real life, your servers and your users will not be in the same thread.

If your stack is tightly coupled and your tests run strictly with both client and server locked against a GIL, you will in all likelihood absolutely never hit this case. Even if your server does actually hit remote services, but the server and client thread in testing are joined by a lock, your client thread will wait for the server to return and you will still probably never hit this issue.

It is not a production-facing issue. In any case, the kind of errors you can hit if you arrange your testing infrastructure in a way that exposes these problems are not problems you will ever see a user complaining about in production. In other words, if you don't have this problem, don't go looking for it, because it's a pain! It's been a pain for us, but not by any means impossible to work past.

Our Ruby developers on Windows found this happened a lot more frequently than when I ran my tests on macOS (where I did not need run_server=false).

I'm running against a local selenium webdriver that was spawned by Capybara's selenium driver natively. (In other words, it's cool when everything is housed in a single process-thread. The way that any sane person would do their testing deployment. But for my Windows users with Vagrant boxes, another way is needed of course...)

We set up our Jenkins server to run similarly, decoupling the stack from the client selenium/test integration. By applying resource constraints to make the right parts slow, we made it absolutely 100% reproducible.

I've done this twice. It's the only way I've ever gotten any sanity out of Selenium.

> I've done this twice. It's the only way I've ever gotten any sanity out of Selenium.

What are the benefits of this approach vs. explicitly waiting for the things that should appear in the page?

>There are a variety of ways of doing this, but you basically need a choke point in your application where all AJAX requests go through.

Does wrapping the XMLHttpRequest object to get this functionality still count as sane?

Yep! We technically had our own layer on top of XMLHttpRequest to use a centralized slice point instead, but AFAIK wrapping it directly would be great too.
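For what wrapping it directly might look like from a Python test, here is a minimal sketch that injects a small patch through `execute_script`. All names (`XHR_COUNTER_JS`, `install_xhr_counter`, `window.pendingAjaxRequests`, `window.__xhrCounterInstalled`) are made up for illustration. It does not cover `fetch()` or XHR objects that are reused for several sends, it misses requests that started before injection, and it has to be re-run after every page load.

```python
# JavaScript patch: count XHRs in flight by wrapping XMLHttpRequest.send.
XHR_COUNTER_JS = """
(function () {
  if (window.__xhrCounterInstalled) { return; }
  window.__xhrCounterInstalled = true;
  window.pendingAjaxRequests = 0;
  var originalSend = XMLHttpRequest.prototype.send;
  XMLHttpRequest.prototype.send = function () {
    window.pendingAjaxRequests += 1;
    // loadend fires after load, error, abort, and timeout alike.
    this.addEventListener('loadend', function () {
      window.pendingAjaxRequests -= 1;
    });
    return originalSend.apply(this, arguments);
  };
})();
"""


def install_xhr_counter(driver):
    """Inject the counter into the current page (repeat after navigation)."""
    driver.execute_script(XHR_COUNTER_JS)


def pending_requests(driver):
    """Return how many wrapped XHRs are currently in flight."""
    return int(driver.execute_script("return window.pendingAjaxRequests || 0;"))
```

Instrumenting inside the app itself, as described above, avoids the injection-timing problem; this variant is mainly useful when you can't touch the app's code.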

Interesting! Thanks for the reference. I think all the waiting I have dealt with is server (vs. animation) as well. I'll have to play around with this idea...

sounds like an interesting pattern.

it would be nice to have a browser extension that abstracts the pattern in the browser.

you should skip this post. it is full of anti-patterns.


(1) implicit wait. this will create subtle differences in behavior between drivers. it will also create long pauses when you test for negative conditions.

(2) clear. selenium has built in support for clear[1]. in addition to clear you can send the null key[2] if you want to clear the input midway through a sequence of characters.

(3) time wait. this does not make any sense to me. seems like a clever way to add time.sleep.

[1] https://www.w3.org/TR/webdriver/#element-clear

[2] https://www.w3.org/TR/webdriver/#element-send-keys

I agree that implicit wait is something of a time waster.

Regarding clear, that does appear to be exactly what the Python implementation does[0], but in my experience it just seemed to fail at random. If I remember correctly it worked most of the time with geckodriver, and then hardly ever with chromedriver. Really not sure what exactly the deal is with that...

And yes the time wait is a little pointless (:

[0] https://github.com/SeleniumHQ/selenium/blob/master/py/seleni...

clear() works for me consistently. At the same time, I had cases before when sendkeys was not sending complete string, so I had to compare it with get_attribute('value') to be sure.
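The compare-after-typing check can be sketched like this. `type_and_verify` is a made-up helper name; `clear()`, `send_keys()` and `get_attribute('value')` are the regular WebElement methods.

```python
def type_and_verify(element, text, attempts=3):
    """Type text into an input and retry if the field did not take all of it."""
    for attempt in range(1, attempts + 1):
        element.clear()
        element.send_keys(text)
        if element.get_attribute("value") == text:
            return attempt
    raise AssertionError(
        f"field held {element.get_attribute('value')!r} "
        f"instead of {text!r} after {attempts} attempts"
    )
```

The return value is the attempt that succeeded, which is handy to log if you want to know how often the retry actually kicks in.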

I've been working with selenium a lot at work recently. There are lots of little nuances to grok if you want your tests to run completely deterministically, especially on a JavaScript-heavy site.

I try to use explicit waits whenever possible. Luckily, the Java client library provides a rich set of combinators for declaring them.

You can also define your own; they are basically functions from a WebDriver object to some other thing (usually a Boolean).
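The same idea carries over to Python, where a condition is just a callable that takes the driver and returns something truthy when it's satisfied. A minimal sketch: `script_returns` is a made-up condition, and `until` is a stripped-down stand-in for `WebDriverWait(driver, timeout).until(condition)` so the example runs without a browser (the real one also ignores `NoSuchElementException` while polling, which this one does not).

```python
import time


class script_returns:
    """Condition: true once `script` evaluates to `expected` in the page."""

    def __init__(self, script, expected):
        self.script = script
        self.expected = expected

    def __call__(self, driver):
        return driver.execute_script(self.script) == self.expected


def until(driver, condition, timeout=10.0, poll=0.5):
    """Poll condition(driver) until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(poll)
```

With real Selenium you would pass the same object straight to `WebDriverWait`, e.g. `WebDriverWait(driver, 10).until(script_returns("return document.readyState", "complete"))`.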


Oh wow, Java gives you a lot! Python's are actually pretty decent as well, I just came to rely almost entirely on "presence_of_element_located" with our Javascript-heavy app.

Have you ever gotten completely deterministic tests? I'm starting to think it may not be possible.

Indeed, it is possible, but it requires some effort. Look at this project: https://github.com/Skyscanner/pages. Choosing your traits carefully will do the trick.

The best thing you can do if you are developing Selenium tests in ruby (but these lessons are probably applicable no matter what environment you're using) is learn how they changed Capybara. There was a big shift in Capybara around the time that "should" went out of the vocabulary.

What that means syntactically for developers is less important than the semantic change that came with this shift. Tons of the old cheat sheets are still based on the old syntax, and many of them if you dig deep will also give the bad advice!

So many people try browser testing and give up because in some ways it's hard to make the testing consistent. Especially if you use onClick events that load another page, it's so easy to forget that you might or might not yet have loaded that page when the next line of code runs! That browser-side JavaScript has made your integration now multi-threaded.

The hardest one to figure out is when you grab a reference to something on the current page that looks like the element you want, but it's on the previous page... it's not what you want.

Then you get around to trying to find nodes within it, or clicking on that node. You've loaded the page you want, but your reference is pointing to a node that is no longer present in the active DOM, because you're on the new page now!

If you're composing reusable steps (like in Cucumber), always find a way to make sure you've already loaded the DOM of the page you think you're on before you get any references to nodes on the page, even if you have to put #some-target-page-node on the target page so you can write:

expect(page).to have_selector('#some-target-page-node')

That will prove you've landed on the target page, and if you find it sometimes takes a long time, or if it's longer than the timeout, set 'wait' to a longer number of seconds:

expect(page).to have_selector('.some-node', wait: 90)

If your page has waits that long of course, it's most likely going to start affecting your conversion rates, so do something about that...
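For anyone on Python's Selenium bindings rather than Capybara, a rough equivalent of "prove you've landed, then grab references" might look like the sketch below. `wait_for_selector` and `find_on_page` are made-up helper names; `"css selector"` is the string behind `By.CSS_SELECTOR`. Like Capybara's `wait:`, the loop returns as soon as the node shows up, so a generous timeout only costs time when something is actually wrong.

```python
import time


def wait_for_selector(driver, css, timeout=10.0, poll=0.1):
    """Return the first node matching css, retrying until the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        matches = driver.find_elements("css selector", css)
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css!r} within {timeout}s")
        time.sleep(poll)


def find_on_page(driver, page_marker_css, target_css, timeout=10.0, poll=0.1):
    """Confirm the target page has loaded, then look the node up fresh."""
    wait_for_selector(driver, page_marker_css, timeout=timeout, poll=poll)
    # Only now take a reference, so it cannot point at the previous page.
    return wait_for_selector(driver, target_css, timeout=timeout, poll=poll)
```

Usage would be something like `find_on_page(driver, '#some-target-page-node', '.some-node')`.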

To clarify...

    set 'wait' to a longer number of seconds:
    expect(page).to have_selector('.some-node', wait: 90)
    If your page has waits that long of course

Wait is different than sleep. Wait retries until the condition is met. This makes it a little bit confusing for "negative presence" tests where you are testing for the absence of a thing, but it ensures that you are not waiting a full 90 seconds for a thing to happen, when that thing has already happened.

This is the greatest, most noticeable effect of the "shift." No more "until".

This is something like the explicit/implicit waiting described in the OP's article. Although Capybara is careful now to avoid "wait until", you can still get this behavior using expect statements as I showed above. It's much cleaner in Capybara[1] nowadays, and I'm actually surprised people work with Selenium WebDriver directly in Python by comparison.

You can also wait explicitly for the AJAX events to complete, although I never do this[2] because it seems to introduce a new source of error: the AJAX completes before we get around to observing that it had started.

This article seems to provide a bullet-proof way to do it though, upon review, because it's simply counting with JS variables in the DOM. You can even start multiple AJAX requests at once and wait for all of them to complete, or just go ahead and move on if you can see they already completed when you first check. No waiting. This is more reliable than adding and removing the ".ajax-processing" class to a hidden element, which you must be careful to observe so you know your AJAX has started, and then observe again as it is removed so you know it has completed. The method in [2] is a bit less "semaphore-ish" and looks more reliable than ways I've used.

Another nice trick is to remove another source of threaded racing: your animations. If you have a test that triggers an animation and you want to disable animations, run once you get to the page:

page.execute_script("$.fx.off = true") # or jQuery.fx.off = true

This runs on the DOM though, so it must be run on every page that loads if it is going to be effective. There are also ways[3][4] to get this script to run on every page rather than running it manually wherever needed. I haven't tried them, but I'd recommend them if you don't want to spend time tracing down the source of each delay by hand.
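On the Python side, one simple way to get the "every page" behavior is to route navigation through a small wrapper. This is only a sketch and not what [3] or [4] do; `get_without_animations` is a made-up name, and it only covers pages reached through this helper, not pages reached by clicking links.

```python
# Guarded so it is a no-op on pages that do not load jQuery.
DISABLE_JQUERY_FX = "if (window.jQuery) { window.jQuery.fx.off = true; }"


def get_without_animations(driver, url):
    """Load url, then switch jQuery animations off for that page."""
    driver.get(url)
    driver.execute_script(DISABLE_JQUERY_FX)
```

For navigation triggered by clicks, the same script would need to run again once the new page has loaded.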

I feel like there were about 5 things I had to learn to make my tests reliable, and I am still applying them daily. But now that I've had the experience of "tracking down the Heisenbug" enough times, I can avoid writing 98% of the bad tests. When the remaining 2% inevitably show up in my test suites as "failed once, then failure went away with no code changes", it simply isn't hard to track down the source of the failure anymore, as I now know really well what I'm looking for and what causes this type of failure.

I personally got a bit out of this article, but I'm not sure anyone is going to read this article and just know better how to do browser testing. I think you have to go through this experience of accidentally writing sometimes-unreliable tests, and then figuring out how to fix them. I did not learn without the help of articles like this, though.

The last great advice I found was to learn to mock API responses if your application uses external APIs, but don't over-rely on mocks. We have lots of tests that do hit our APIs and will fail if they are down. But we also have tests that depend on some difficult to replicate condition being met by the API servers. Those mock tests in some cases are more dangerously brittle than the tests that actually hit the API.

My favorite was added recently, the test that shows what happens if the API goes down in the middle of your overnight job. Hint: You want the job to stop running at the first sign that the API has gone down, and trigger alert mail or otherwise signal an error that you can find later.

With WebMock, this test was easy to write! Just send back a 500 error at a time when you know you weren't expecting it.
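WebMock is Ruby, but the same kind of test can be sketched in Python with `unittest.mock`. Everything here is hypothetical (`ApiDown`, `run_nightly_job`, the `fetch` and `alert` callables); the point is only the shape of the test: the mocked API answers once, then fails as if it had returned a 500, and the job must stop and raise an alert.

```python
from unittest import mock


class ApiDown(Exception):
    """Stand-in for whatever the API client raises on a 500 response."""


def run_nightly_job(items, fetch, alert):
    """Process items in order; stop and alert at the first API failure."""
    processed = []
    for item in items:
        try:
            processed.append(fetch(item))
        except ApiDown as exc:
            alert(f"API failed on {item!r}: {exc}")
            break
    return processed


def test_job_stops_when_api_goes_down():
    # The second call raises, as if the API went down mid-run.
    fetch = mock.Mock(side_effect=["a", ApiDown("500"), "c"])
    alert = mock.Mock()

    processed = run_nightly_job(["x", "y", "z"], fetch, alert)

    assert processed == ["a"]
    assert fetch.call_count == 2  # "z" was never attempted
    alert.assert_called_once()
```

`side_effect` with a list returns each value in turn and raises any entry that is an exception instance, which is what makes the "goes down in the middle" case a one-liner.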

You can usually use factories to replicate other less unusual conditions, but when your app depends heavily on external APIs, this trick can help greatly to simplify your tests. This particular test would have been impossible to write without mocks. I would not mock more than 50% of API tests though, because it is equally valuable to find out when an unrelated change that is deployed through your CI environment to one of your APIs, will break some part of your application unexpectedly. This is why I do not mock 100% of API-related tests.

[1]: https://www.varvet.com/blog/why-wait_until-was-removed-from-...

[2]: http://cabeca.github.io/blog/2013/06/16/waiting-for-complete...

[3]: https://makandracards.com/coffeeandcode/7503-disable-jquery-...

[4]: https://gist.github.com/keithtom/8763169

... and when you're done with this beginner stuff ... use the de facto industry standard http://robotframework.org/

I wrote a shim layer over selenium using Python's test infrastructure so I can do

    self.setInputById('fizz', 'buzz')

It also maps common operations against the structure of my legacy app; it's way better than using the lower-level API.

Tests look like pseudo code and are way more grokkable.

Your code available anywhere? (:

It's nothing that impressive, but I'll post it on Monday when I'm at work. It's really just a bunch of helper methods to reduce boilerplate.

Yeah, I would just be curious to see how you organized. I kept meaning to step back and work on abstracting things but always got side tracked with other stuff. It gets real long and messy without that.
