

An Unusual Case for Cucumber - jfrisby
http://mrjoy.com/2013/09/19/an-unusual-case-for-cucumber/

======
IanCal
I kind of agree with this, but then I see the same tests I've seen elsewhere:

    
    
          Scenario: Editing the nickname of a credential does not change the version
            When the nickname is modified
            Then the edit version of the credential should not change
    

This looks like an awesome test. Simple, clear, anyone can read it. What's
wrong? One of the following:

* It's based on timing and unreliable

* It stores state and the steps can't be reused.

You simply cannot put the line "Then the edit version of the credential
should not change" elsewhere because the step involves _the previous state of
the system_. This step can _only_ come after this step:

    
    
        Given a credential for a supported vendor
    

These tests all look like they're testing really nice general properties, but
they aren't. They're almost always just checking little properties here and
there, and rarely doing just what the step says. Which could be fine, but the
language doesn't support any idea of specified dependencies. I cannot change a
test without understanding the code behind it.

What you end up with is a big codebase based on regexes, with very heavy
(unspecified) dependencies. You have to be extremely careful not to end up like
this. Tests written that carefully look more complex, but their steps are
re-usable.

I think there's a great place for a quickcheck & cucumber combination.

~~~
jfrisby
I don't quite get how any when/then could NOT be dependent upon previous
givens.

Also, despite the job-centric nature of the .feature, it wasn't timing-
dependent because we mocked Resque to keep an in-process queue that we could
dispatch in whatever order we want. So out-of-order arrival was
deterministically achievable, etc.
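
A minimal sketch of that kind of in-process queue, in plain Ruby. The `InProcessQueue` class and its methods are hypothetical names for illustration; this is not Resque's API and not the code from the article:

```ruby
# Hypothetical stand-in for a job queue: jobs are held in memory and only
# run when the test explicitly dispatches them, in any order it chooses.
class InProcessQueue
  def initialize
    @jobs = []
  end

  # Record a job (a block) instead of handing it to a real worker.
  def enqueue(name, &block)
    @jobs << [name, block]
  end

  # Names of the jobs that have not been dispatched yet.
  def pending
    @jobs.map(&:first)
  end

  # Run one pending job by name, regardless of enqueue order.
  def dispatch(name)
    index = @jobs.index { |job_name, _| job_name == name }
    raise ArgumentError, "no pending job #{name}" if index.nil?

    _, block = @jobs.delete_at(index)
    block.call
  end
end

log = []
queue = InProcessQueue.new
queue.enqueue(:create) { log << :create }
queue.enqueue(:update) { log << :update }

# Deliberately deliver the jobs out of order, with no sleeping or waiting.
queue.dispatch(:update)
queue.dispatch(:create)
```

Because the test decides when each job runs, out-of-order arrival becomes an ordinary, repeatable scenario rather than a race.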

And yes, that test is testing a simple property -- it's effectively a unit
test, with greater clarity of intent.

Not familiar with quickcheck, but I will look into it, thanks!

~~~
IanCal
> I don't quite get how any when/then could NOT be dependent upon previous
> givens.
    
    
        Given I am called Bob
        And I change my name to Steve
        Then my name is not Bob
    

None of these steps depends on the ones before it, and each can be
reused elsewhere. The check for "my name is not Bob" can be used whether or not
the previous step has been called. Your test won't pass, which is fine, but it
will run and do exactly what it says. The step does what it says _even_ if
none of the others are used. It also means I can write a new test using bits
from others without worrying about what's in the source code.

A test that looks like this:

    
    
        Given I am a person
        And I change my name
        Then my name is different
    

Cannot be re-used. The "my name is different" can only be used if you know the
ruby code underneath. "my name is different" may not do what you expect unless
it comes after "I change my name".
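
The difference can be sketched in plain Ruby, without Cucumber itself. Every name here (`Person`, `ImplicitSteps`, the helper methods) is made up for illustration and only mimics what step definitions might do:

```ruby
# Hypothetical application object; in a real suite this is the system under test.
class Person
  attr_accessor :name

  def initialize(name)
    @name = name
  end
end

# Explicit style: each step names the values it cares about and only reads
# or writes application state, so it can be used in any scenario.
def given_i_am_called(name)
  Person.new(name)
end

def then_my_name_is_not(person, name)
  person.name != name
end

# Implicit style: the check depends on a value remembered by an earlier step.
class ImplicitSteps
  def given_i_am_a_person
    @person = Person.new("Bob")
  end

  def and_i_change_my_name
    @previous_name = @person.name # hidden state stored in the test
    @person.name = "Steve"
  end

  def then_my_name_is_different
    @person.name != @previous_name
  end
end

bob = given_i_am_called("Bob")
bob.name = "Steve"
explicit_result = then_my_name_is_not(bob, "Bob")

# Skip the "change" step: the implicit check still reports "different",
# because @previous_name was never set and is nil.
steps = ImplicitSteps.new
steps.given_i_am_a_person
misleading_result = steps.then_my_name_is_different
```

In this sketch the implicit check passes even though the name never changed, which is the kind of surprise that only reading the Ruby underneath would reveal.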

What I meant with timing is the only way you can get around storing state is
to have the "my name is different" watch for a change in the name, which
brings in timing issues. I've seen both implementations used, and both caused
problems (the timing one for obvious reasons).

> Not familiar with quickcheck, but I will look into it, thanks!

I thoroughly recommend it. The Haskell version is probably the most advanced,
but there are similar versions for most languages (and you can write your own,
I had to for AS3). The idea is you express general properties about your
system, and then it auto-generates thousands of examples (and if you have a
nice library, automatically shrinks failing cases for you). For example, the
test above would be nicer as something like:

    
    
        X is a string
        Y is a string
        X =/= Y
    
        Given I am called X
        And I change my name to Y
        Then my name is not X
        And my name is Y
    

Or something like that. This would then generate examples with zero-length
strings, crazy unicode characters, long strings, different mixings of RTL
sections, etc. Much more likely to drive out bugs than a test for Bob and
Steve.
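
As a rough illustration, the same property can be checked with a hand-rolled generator in plain Ruby. This is only a sketch: `Person`, `random_string` and `check_rename_property` are invented names, and a real quickcheck library would also shrink failing inputs, which this does not:

```ruby
# Hypothetical system under test.
class Person
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def rename(new_name)
    @name = new_name
  end
end

# A small alphabet mixing ASCII, accented, CJK and right-to-left characters.
ALPHABET = ["a", "Z", " ", "0", "\u00e9", "\u00df", "\u65e5", "\u05e9"].freeze

# Generate a random string of 0..20 characters (so empty strings do occur).
def random_string(rng)
  Array.new(rng.rand(0..20)) { ALPHABET[rng.rand(ALPHABET.size)] }.join
end

# Returns the first counterexample [x, y], or nil if every case passed.
def check_rename_property(runs, rng)
  runs.times do
    x = random_string(rng)
    y = random_string(rng)
    next if x == y # precondition: X =/= Y

    person = Person.new(x) # Given I am called X
    person.rename(y)       # And I change my name to Y
    ok = person.name != x && person.name == y
    return [x, y] unless ok
  end
  nil
end

counterexample = check_rename_property(1_000, Random.new(42))
```

Seeding the generator keeps a failing run reproducible, which matters once the examples are no longer written by hand.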

Some more useful ones would look like this (the first is a test I've written
before, but not in this format):

    
    
        X is a number >= 1
        Y is an interface element
        Given I am on element Y
        When I press Tab X times
        And I press Shift-Tab X times
        Then I am on element Y
    

A similar version for navigating in a website and pressing "back" to get back
to where you were (to ensure you're not breaking the back button).
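
A sketch of the Tab / Shift-Tab property against a toy model. `FocusRing` is an assumed stand-in for a real interface in which focus wraps around at both ends; it is not the code from the test described above:

```ruby
# Illustrative model of a focus ring: Tab moves forward, Shift-Tab moves
# back, wrapping around at both ends.
class FocusRing
  attr_reader :index

  def initialize(size, start)
    @size = size
    @index = start
  end

  def tab
    @index = (@index + 1) % @size
  end

  def shift_tab
    @index = (@index - 1) % @size
  end
end

# Property: from any element Y, X tabs followed by X shift-tabs returns to Y.
# Returns the failing [size, y, x] triple, or nil if no case failed.
def check_tab_round_trip(runs, rng)
  runs.times do
    size = rng.rand(1..12) # number of interface elements
    y = rng.rand(size)     # Y is an interface element
    x = rng.rand(1..50)    # X is a number >= 1

    ring = FocusRing.new(size, y)
    x.times { ring.tab }
    x.times { ring.shift_tab }
    return [size, y, x] unless ring.index == y
  end
  nil
end

failure = check_tab_round_trip(500, Random.new(7))
```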

These are really simple, but powerful tests. I was most sold on the idea when
I wrote this (not in this format, but this logic):

    
    
        X is a positive integer
        Y is a positive integer
        INSTRUCTION is one of [addElement, removeElement(X), setFocus(X)]
        MENU is an interface
        When I perform Y INSTRUCTIONS
        Then MENU has one focused item or no items at all
    

This then generated thousands of menus of each valid type (vertical,
horizontal, grids, etc) and then called library functions to add or remove
elements, or move focus. Millions of tests overall. I was using this as a test
for my quickcheck implementation, and found it failed. If I set the focus,
deleted all the elements and then added a single new one it wasn't focused.

When I fixed it, a unit test failed. We had previously specified that was to
be the behaviour, but also specified that no matter what there would always be
a focused element (if there were any elements at all). The general test drew
out an inconsistency in our spec because we were forced to write general
rules.
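
The shape of that test can be sketched in plain Ruby. The `Menu` class below is a toy model, and its `buggy:` flag plants a made-up bug that only loosely resembles the one described; none of this is the original AS3 code:

```ruby
# Toy menu: a list of items plus the index of the focused one (or nil).
class Menu
  def initialize(buggy: false)
    @items = []
    @focus = nil
    @buggy = buggy
    @focus_was_set = false
  end

  def add_element
    @items << Object.new
    return unless @focus.nil?

    # Made-up bug: never auto-focus again once focus was set explicitly.
    @focus = @items.size - 1 unless @buggy && @focus_was_set
  end

  def remove_element(x)
    return if @items.empty?

    index = x % @items.size
    @items.delete_at(index)
    if @items.empty?
      @focus = nil
    elsif @focus && (index < @focus || @focus >= @items.size)
      @focus -= 1
    end
  end

  def set_focus(x)
    return if @items.empty?

    @focus = x % @items.size
    @focus_was_set = true
  end

  # Invariant: exactly one focused item, or no items at all.
  def valid_focus?
    return @focus.nil? if @items.empty?

    !@focus.nil? && @focus >= 0 && @focus < @items.size
  end
end

INSTRUCTIONS = [:add_element, :remove_element, :set_focus].freeze

# Apply random instruction sequences; return the first trace that breaks
# the invariant, or nil if none does.
def find_violation(runs, rng, buggy:)
  runs.times do
    menu = Menu.new(buggy: buggy)
    trace = []
    rng.rand(1..30).times do # Y instructions
      instruction = INSTRUCTIONS[rng.rand(INSTRUCTIONS.size)]
      x = rng.rand(1..10)
      trace << [instruction, x]
      if instruction == :add_element
        menu.add_element
      else
        menu.public_send(instruction, x)
      end
      return trace unless menu.valid_focus?
    end
  end
  nil
end

correct_trace = find_violation(2_000, Random.new(1), buggy: false)
buggy_trace = find_violation(2_000, Random.new(1), buggy: true)
```

Checking the invariant after every instruction, rather than once at the end, means the returned trace stops at the step that broke it.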

Mixing quickcheck and cucumber has been one of my "Some weekend I'll do it"
projects for a couple of years now.

~~~
jfrisby
So your example still requires that state exist, it's just far more global.
I.E. there must be some prerequisite code that sets up the thing to which "I"
refers such that it's available to any step at any time. There may be some
good patterns for that -- and that approach may be an excellent way to keep
step definitions clean and simple, but I think it's somewhat orthogonal to my
point.

I'm still not with you on the timing thing. By the time that step is reached,
either the state has changed, or it hasn't. We address the need for it to be
able to see "back in time" by storing a history of the state of the object in
a way that is accessible to the steps. I.E. For steps like this:

    
    
      Given a credential for a supported vendor
      When I change the nickname
      Then the edit version should not be changed
    

The step definitions might look like:

    
    
      Given /a credential for a supported vendor/ do
        @thing = FactoryGirl.create(:'credential/amazon', :valid)
        @thing_history = [@thing.attributes.dup]
      end
    
      When /I change the nickname/ do
        @thing.nickname += " meh"
        @thing.save!
        @thing_history << @thing.attributes.dup
      end
    
      Then /the edit version should not be changed/ do
        # Assuming an ActiveRecord model: #attributes has string keys
        @thing_history[-1]['edit_version'].should == @thing_history[0]['edit_version']
      end
    

(The use of "@thing" is a way to encourage myself to only be talking about one
object at a time...)

Perhaps that's a somewhat obtuse way of handling things -- and there are likely
ways of DRYing up that pattern, if it's worth preserving, but it's proven to
be fairly effective for me in the past.
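
One possible way to DRY up the snapshot bookkeeping, as a sketch only. `History` is a hypothetical helper, and the `Credential` struct stands in for the real model (its `attributes` hash has symbol keys here, unlike ActiveRecord's string keys):

```ruby
# Hypothetical helper: wraps any object responding to #attributes and keeps
# a snapshot of it after every recorded change.
class History
  def initialize(object)
    @object = object
    @snapshots = [snapshot]
  end

  # Yield the object for modification, then record its new state.
  def change
    yield @object
    @snapshots << snapshot
  end

  def first
    @snapshots.first
  end

  def last
    @snapshots.last
  end

  # True if the attribute differs between the first and latest snapshot.
  def changed?(attribute)
    first[attribute] != last[attribute]
  end

  private

  def snapshot
    @object.attributes.dup
  end
end

# Stand-in for the model used in the steps above.
Credential = Struct.new(:nickname, :edit_version) do
  def attributes
    to_h
  end
end

history = History.new(Credential.new("aws", 1))
history.change { |credential| credential.nickname += " meh" }
```

With a helper like this, a When step would call `change` and a Then step would ask `changed?(:edit_version)`.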

And yeah, Cucumber might make a good vehicle for that sort of testing.
Quickcheck sounds like a brilliant idea -- although a bit terrifying in the
context of a slow language like Ruby. Would love to get my hands on / build a
tool like that...

~~~
IanCal
> So your example still requires that state exist, it's just far more global.

The state exists only in your _application_ , not in the test. Your given sets
up the environment in the way you want, the "when" manipulates the environment
and the "then" checks that the environment exists in a particular setting.

> I'm still not with you on the timing thing. By the time that step is
> reached, either the state has changed, or it hasn't.

This doesn't apply to your tests. What I've seen before is a step that says

    
    
        Then X changes
    

Which is implemented as

    
    
        previous = X.state
        wait 5 seconds
        X.state != previous
    

This relies on the previous step taking long enough to actually change the
state that the change happens during the wait. All of the tests I was dealing
with were asynchronous.

Your pattern does look quite nice but there are still dependencies that aren't
specified in code. For example, assume you have a complementary step about
the edit version changing, written in the same way:

    
    
        Given a credential for a supported vendor
        When I do something that changes the edit version
        Then the edit version should be changed
        When I do something irrelevant
        Then the edit version should not be changed
    

This would fail, saying the edit version has changed. Contrived, I know, but
the problem with dependencies that aren't specified is that you start having
valid tests that fail because they simply do not do what they say. I'm
strongly in the camp of "If it shouldn't work, it shouldn't build".
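
The failure can be traced by hand with a plain array of snapshots. The hashes and version numbers below are invented, and the array mirrors how `@thing_history` is used in the step definitions earlier in the thread:

```ruby
# Snapshots of a hypothetical credential, one per Given/When step.
history = [{ edit_version: 1 }] # Given a credential for a supported vendor

history << { edit_version: 2 }  # When I do something that changes the edit version
changed = history[-1][:edit_version] != history[0][:edit_version]

history << { edit_version: 2 }  # When I do something irrelevant
# "should not be changed" compares against the very first snapshot ...
unchanged_since_start = history[-1][:edit_version] == history[0][:edit_version]
# ... while the scenario text reads as "unchanged by the last When".
unchanged_by_last_step = history[-1][:edit_version] == history[-2][:edit_version]
```

Here `unchanged_since_start` is false even though the last action left the version alone, so the step fails while reading as if it should pass.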

My core suggestion for all of this is to write your behavioural tests as a
test script. What does the user do to get to that point, how do they interact,
and what's the result? The state exists either in your application (which is
fine) or is explicit in the writing of the tests (Given I log in as person X
rather than Given I log in).

Cucumber is a great tool, but it leaves a lot of things implicit, which to be
fair could probably be solved as a library. I'd have much less of a problem
with all of this if something warned me when I wrote an invalid test. I've
spent far too much time battling with cucumber tests which lied about what
they were actually testing.
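
A very rough sketch of what such a warning could look like, assuming each step declared what it needs and what it sets up. The `STEPS` registry and the `:requires` / `:provides` keys are invented for illustration; Cucumber has no such feature as far as this thread establishes:

```ruby
# Hypothetical registry in which every step declares the state it needs
# (:requires) and the state it sets up (:provides).
STEPS = {
  "a credential for a supported vendor" =>
    { requires: [], provides: [:credential] },
  "I change the nickname" =>
    { requires: [:credential], provides: [:snapshot] },
  "the edit version should not be changed" =>
    { requires: [:credential, :snapshot], provides: [] }
}.freeze

# Walk a scenario in order and report each step whose requirements are unmet.
def unmet_dependencies(scenario)
  available = []
  problems = []
  scenario.each do |step|
    spec = STEPS.fetch(step)
    missing = spec[:requires] - available
    problems << [step, missing] unless missing.empty?
    available |= spec[:provides]
  end
  problems
end

valid = unmet_dependencies([
  "a credential for a supported vendor",
  "I change the nickname",
  "the edit version should not be changed"
])

invalid = unmet_dependencies([
  "a credential for a supported vendor",
  "the edit version should not be changed"
])
```

A check like this could run before any step executes, so a scenario that cannot mean what it says is rejected up front.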

~~~
jfrisby
> The state exists only in your application, not in the test. Your given
> sets up the environment in the way you want, the "when" manipulates the
> environment and the "then" checks that the environment exists in a
> particular setting.

Err, are you asserting that that is what I _am_ doing, or what I _ought_ to be
doing?

And yes, time-dependent code is evil. I should probably add commentary to my
style-guide to explicitly call that out, but thankfully we never ran into that
despite testing of distributed-job-queue functionality, by virtue of having a
queue so simple stubbing its main loop to work in-process in a deterministic
way was trivial.

Your example should be in violation of the style guide for precisely the
reason you state, among several others (blurring of concerns, etc). If my
style guide isn't clear on that point -- that a When should NEVER follow a
Then -- I need to clarify that. :)

I recall a Gherkin-based testing framework that handled the actual step
definitions MUCH differently and much more cleanly but I never got around to
fully switching us over and don't recall the name... :-S

~~~
IanCal
> Err, are you asserting that that is what I am doing, or what I ought to be
> doing?

Ought to be. Your Given and When manipulate the environment _and_ store state
in the test. Your then doesn't check the environment at all, it checks its
own internal state. I think in your example this is less of an issue than the
cases I've worked with before (usually testing remote running apps).

> And yes, time-dependent code is evil. I should probably add commentary to my
> style-guide to explicitly call that out, but thankfully we never ran into
> that despite testing of distributed-job-queue functionality, by virtue of
> having a queue so simple stubbing its main loop to work in-process in a
> deterministic way was trivial.

Yes, as I say I wasn't claiming that was what you were doing, but it's the
only other way I've seen the same kind of tests written. Before the first
reply, I didn't know which it would be. I'm very glad it wasn't :)

> If my style guide isn't clear on that point -- that a When should NEVER
> follow a Then -- I need to clarify that. :)

My example was contrived, but I think you can probably see the point I'm
making. What the test does is not clear without understanding the ruby
underneath, and by storing state you can have tests which don't do what you
expect because they aren't just querying the state of your application.

I'm aware I've been quite ranty about this, it's mostly because I've had to deal with
_bad_ tests written in this style. _Good_ tests written in this style seem
fine generally. I think this is more of an issue with cucumber than the tests
themselves.

> I recall a Gherkin-based testing framework that handled the actual step
> definitions MUCH differently and much more cleanly but I never got around to
> fully switching us over and don't recall the name... :-S

I might have a look around. I've probably spent more time writing about this
than it would have taken to write something better (or at least something
with dependency docs, maybe quickcheck style).

