
Using Python and Selenium for automated visual regression testing - seleniumbase
https://github.com/seleniumbase/SeleniumBase/blob/master/examples/visual_testing/ReadMe.md
======
Afton
These examples show the system being very tightly coupled to the actual HTML.
IME this leads to very brittle tests that fail due to
restructuring/reorganizing/redesigning.

My two cents: I've never seen automated visual regression testing that wasn't
terrible to work with, and where the most common result (by a large margin) of
a test failing was someone would update the expected file/image so it would
pass with the new visuals. It's a hard problem, and one that I've personally
decided isn't worth doing for the customer facing software that I've been
involved with.

~~~
Dowwie
That's the point of testing. The test identifies a break. A person
investigates and either updates the test or fixes a problem.

~~~
Afton
I feel like you may not have read my comment very generously. I am very
familiar with the point of a test. Let me try again to explain.

There is a cost to a brittle test. UI testing suffers from this more than
other kinds because there are many 'plausible' UI arrangements, and as the
product shifts and changes, you need to distinguish

1. "The dialog moved slightly to the left"/"We refactored the HTML, but it
still looks the same" from

2. "The dialog is now underneath another element".

Suppose it takes 10 minutes to find, fix, review, push, deploy, and validate a
fix. If failures like the former RADICALLY outnumber failures like the latter,
then it is easy enough to conclude that the test is not giving you a
reasonable ROI. Perhaps the cost of releasing a latter-type failure is not
that bad, if you can fix it and get it to production quickly. That might be
cheaper overall than the ongoing maintenance cost of a test that would prevent
this failure.

Also, and this is culture and product dependent, but we're talking about this
like it's a single test, when it's usually a suite of tests (or multiple
suites). If they have a 5% failure rate, and 95% of the time it's really an
'update the test, this is the new expected', people will stop trusting the
tests and will take shortcuts. So you may find that instead of spending 100%
of the maintenance costs for the suite of brittle tests, you're spending 70%,
but only getting 20% of the benefit, because people become accustomed to the
failures, and once a test is failing, no one will notice that the failure
changed from a 'benign failure' to a 'customer-can't-use' failure. (Note:
numbers are imaginary, but not crazy.)

At one company I worked for, it was so bad that when we tried to introduce
testing/check-in rigor, multiple developers pulled me over to make me explain
"Why this dumb test is failing on my check-in attempt", and we would look at
the logs and other artifacts to uncover that "It's failing because you changed
something without updating the relevant tests". It took quite a bit of time to
re-train developers used to brittle tests to respect and maintain non-brittle
ones.

And that is why I am against automated UI testing in general. :)

------
luhego
I remember writing tests in Selenium in the past. Writing them was a horrible
experience and some tests were not deterministic. The same test could fail or
pass randomly.

~~~
ratbeard
We invested a large amount of time in them recently, and I agree they are
terrible to write. We dare not even turn on IE or any other browser; we can't
even keep the tests green in Firefox. I really wish we'd used Cypress instead,
as it's 100x easier to debug and we're not getting cross-browser benefits
anyway.

The underlying architecture is a bad design for heavy JavaScript apps, in my
opinion. The round trips from the test runner to the Selenium server to the
Selenium driver in a browser, and back the other way, are slow, and so much
can change on the page between steps in your tests. Cypress runs your test
code in the JavaScript process of your browser, so I believe there's no or
minimal round-trip lag.

We use `waitFor()` for the UI to stabilize, but that's been a hard mental
model for devs to follow, and as a result we have tons of unnecessary waits in
the tests, which slows them down. Even something like waiting for a loading
modal to disappear before trying to interact with the UI is hard, since your
code:

`waitFor('.loading-modal', false) // wait for it NOT to exist`

may run BEFORE it's even appeared, then fail in the next step when you try
clicking on a button and the modal is there now. You can't wait for it to
appear first to prevent that, as your code may run after it's already come and
gone too.
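The timing trap can be sketched with a toy simulation; the modal's 50–200ms visibility window and the command timings below are invented for illustration, not measured from any real app:

```javascript
// Simulated page after clicking "save": the loading modal is only
// visible between t=50ms and t=200ms (made-up numbers).
const modalVisibleAt = (t) => t >= 50 && t < 200;

// Step 1: waitFor('.loading-modal', false) happens to run early...
const tWait = 10;
const waitPassed = !modalVisibleAt(tWait); // true: modal hasn't appeared yet

// Step 2: ...after round-trip latency, the click command arrives.
const tClick = 120;
const clickFails = modalVisibleAt(tClick); // true: modal is up, click throws

console.log(waitPassed, clickFails); // true true
```

The wait and the click each observe the page at a different instant, and nothing guarantees the modal's lifetime falls outside that gap.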

Tons of annoyances and strange behavior: chromedriver doesn't support emoji in
inputs, ieDriver had setting a select box's value broken at one point,
setValue('123') on a field actually does something like 'focus, clear, blur,
APPEND 123', so blur logic that sets a default value of '0' on your field will
result in the final value being '0123' in your tests… just the worst.
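That last interaction can be reproduced with a tiny simulation; the `field` object and `driverSetValue` below are hypothetical stand-ins for the real widget and driver, not any actual API:

```javascript
// A field whose blur handler fills in a default of '0' when empty.
const field = {
  value: '5',
  onBlur() { if (this.value === '') this.value = '0'; },
};

// setValue as described above: clear, blur, then APPEND.
function driverSetValue(f, text) {
  f.value = '';    // clear
  f.onBlur();      // blur fires, default logic sets '0'
  f.value += text; // append the new text
}

driverSetValue(field, '123');
console.log(field.value); // '0123', not the '123' you asked for
```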

~~~
2rsf
> You can't wait for it to appear first to prevent that, as your code may run
> after its already come and gone too.

While valid, that's not typical for many sites: what's the point of a very
short-lived pop-up? And even if it is part of your page, you can skip the
"risky" part of the test and verify it otherwise (logs? side effects?) or not
at all.

~~~
ratbeard
A 'saving…' or 'loading…' popup, or any type of interaction-preventing mask,
is a common UX pattern in JavaScript-heavy apps, in my opinion.

We didn't care about testing the popup at all; it was just breaking our other
tests in the following way.

In our UI you could click a 'save' button, then a 'saving…' popup appears,
meanwhile the 'save' button goes away and an 'edit' button appears (behind the
popup), and when the response comes back it says 'Saved.' in the ui.

A test for `$('div=Saved.').toExist()` in wdio works: it does a waitFor under
the covers and polls the UI until that text appears. It doesn't care whether
there's a popup shown or not.

However, moving on to the next step in the tests, `$('button=Edit').click()`
throws an 'element is not clickable' error if the popup is visible when it
happens to run. Doing multi-command steps like 'check if the popup is there;
if not, click' doesn't work in general, as there's so much latency between
commands. As a hacky workaround, you can inject JavaScript into the page that
does both checks in the browser's JS process.
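A sketch of that workaround; the `page` and `browser` objects here are fakes standing in for a real browser and wdio's `browser.execute`, but they show the point: the popup check and the click happen in one browser-side call, with no round trip between them:

```javascript
// Fake page state, and a fake browser.execute that runs a function
// "inside the page's JS process" (here, just synchronously).
const page = { popupVisible: true, clicked: false };
const browser = { execute: (fn) => fn(page) };

// Atomic check-and-click: skip the click while the popup is up.
const tryClickEdit = () => browser.execute((p) => {
  if (p.popupVisible) return false; // not safe to click yet
  p.clicked = true;
  return true;
});

console.log(tryClickEdit()); // false: popup still visible
page.popupVisible = false;
console.log(tryClickEdit()); // true: clicked once the popup is gone
```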

We did upgrade our WebDriver library partly to get waitForClickable(), which,
based on the name at least, sounded like it would handle the above, but there
were no volunteers to update the 168 instances where waitForLoader() had
spread through the codebase :/

------
reggieband
I see these kinds of tools once in a while and they bring out my jaded side.
About 15 years ago, I recall considerable effort being spent by the Quality
Engineering team at a company I worked at to build a visual regression tool
for a video game UI I was working on. Using some Windows API magic, it could
detect buttons and click them to navigate through a UI, then take a screenshot
to do a visual diff. It was the most broken, useless thing ever, and after 2
or 3 months of development it was scrapped.

Last place I worked had a team that built a similar system using Selenium and
some image diff. It worked 90% of the time. It was even integrated into their
CI pipeline and the system would email you when it failed. You could make a
change to one area of code and get an email from the system for a completely
separate area that for whatever random reason failed. I once submitted some
code to that project and when I got the email I asked one of the project
maintainers what I should do to fix it. He told me to just ignore it. Their
process was to check the output of the tool on those failures to see if there
was any legitimate problem. When I checked the output myself a large amount of
the output was garbage (about 100 application states tested and at least 10 in
a garbled state).

My own team tried to integrate Selenium too, and we even found an external
vendor where you can ship your Selenium tests along with your app and they
will run them against a matrix of browsers of various versions. We barely got
a Chrome version running: no hope of Safari, Firefox, or IE/Edge. It was a
black hole of time just fighting to get the equivalent of hello world running
consistently as a test in all browsers.

One day someone will prove me wrong and get this kind of UI testing working
reliably. But after nearly 20 years of watching optimistic people die on this
hill, I do not support wasting more time on it.

~~~
Dowwie
People have moved on to headless Chrome

~~~
zelly
Puppeteer (JS) in particular. But since it's built on an RPC wire protocol
(the Chrome DevTools Protocol, JSON over WebSocket), you can drive the
underlying protocol from any language.

It's by far the leading player in this space and for good reason. Selenium is
unreliable and too generic.

~~~
Dowwie
There are puppeteer clones for many languages now. It's impressive how
successful the devtools protocol has been.

------
maciejgryka
Very interesting to see this! From the comments it seems that a bunch of
people were burned by Selenium-style testing in the past. There's a pretty
interesting paper/thesis [0] about this which talks about what kinds of
breakage are most common in these scenarios.

Which is why (shameless plug alert) we at Rainforest [1] are working with
crowdsourced testers - humans are still much, much better at visual diffing
and judgement than machines are. Feel free to shoot me an email if you're
interested in talking about it and figuring out how much to automate and how
much to leave to humans.

[0] Why do Record/Replay Tests of Web Applications Break?, Mouna Hammoudi,
Gregg Rothermel, Paolo Tonella

[1] [https://www.rainforestqa.com/](https://www.rainforestqa.com/)

------
forgotmypw1
I wrote a similar tool in Java, and it was effective at picking up minute and
unintended visual changes when testing with many browsers.

My approach was to capture DOM element attributes and store them in a
database, then compare snapshots.

[https://github.com/gulkily/Selenium-
Utilities/tree/master/sr...](https://github.com/gulkily/Selenium-
Utilities/tree/master/src/test/java/org/gulkily/selenium)
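The comparison step could be sketched as a plain diff over attribute maps (a hypothetical snapshot shape; the linked tool stores its snapshots in a database rather than in memory):

```javascript
// Each snapshot maps an element locator to its captured attributes.
function diffSnapshots(baseline, current) {
  const changes = [];
  for (const [selector, attrs] of Object.entries(baseline)) {
    const now = current[selector];
    if (!now) { changes.push(`${selector}: element missing`); continue; }
    for (const [name, value] of Object.entries(attrs)) {
      if (now[name] !== value) {
        changes.push(`${selector}: ${name} "${value}" -> "${now[name]}"`);
      }
    }
  }
  return changes;
}

const baseline = { '#save': { width: '80px', display: 'block' } };
const current  = { '#save': { width: '90px', display: 'block' } };
console.log(diffSnapshots(baseline, current));
// [ '#save: width "80px" -> "90px"' ]
```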

------
guest2143
I'd like to see some approval test infrastructure here as well, q.v.
[https://approvaltests.com/](https://approvaltests.com/)

Being able to have a customer accept what the output looks like, and then
listening for when the page changes, would be great for giving non-technical
people control over passing tests.

------
misiti3780
I think Facebook released a similar project a few years ago that I never got a
chance to try, but I can't seem to find it in their GitHub anymore.

Anyone remember that project?

~~~
ceocoder
Is this the one? [https://jestjs.io/](https://jestjs.io/) or
[https://github.com/facebook/jest](https://github.com/facebook/jest)

