A Case Against Cucumber (8thlight.com)
113 points by gigasquid on Sept 19, 2013 | 110 comments

I've been working in Rails since 1.2, and the vast majority of developers I've met have been of the opinion that cucumber sucks. There's a really vocal minority on the internet though, so you'd get the impression that it's pretty popular.

That said, almost every rails project I've worked on has had the cucumber gem included at some point, just not actively used.

IMHO the vocal Rails community puts too much emphasis on testing. I write tests for my code, but not as many as some would like. The RSpec book in particular is a gigantic load of crap. And I say this as a fan of comprehensive tests! The ridiculous lengths that book goes through in testing that codebreaker game are so detached from reality it makes my head hurt.

Yes, cucumber is terrible. But I mean terrible in the "it not only doesn't solve the problem, it actually makes it worse" sense, not the "it's poorly written code" sense. I bet it's a well-written (and tested!) piece of software, but I haven't actually looked.

Cucumber tries to solve the problem of turning customer requirements into 'real code'. In exchange for that worthwhile benefit, it asks you to implement the most terrible, reg-ex based spaghetti code imaginable.

The problem is that it doesn't solve the original problem AT ALL. And then you are left with terrible reg-ex driven spaghetti code. Like the Jamie Zawinski saying, "now you have two problems".

The lesson here is that software development processes have to pass the 'human nature' test.

The software industry has largely abandoned waterfall development because it just doesn't work well in practice. It doesn't work because people don't know perfectly what they want before they build it. Agile processes usually are much more efficient because they are more closely aligned to how humans solve problems in the real world.

Cucumber suffers from the same issue of being disconnected from reality. In theory, you can find a customer who can give you perfectly written use cases and you can cut-and-paste those into your cukes. In practice, that never, ever works. So let's all stop wasting our time pretending it was a good idea now that it has been shown to not work.

> Cucumber tries to solve the problem of turning customer requirements into 'real code'

No, it tries to solve the problem of narrowing the gap between objective, easily reusable tests and customer-understandable requirements. Insofar as it involves code, "real" or otherwise, that's a means to solving the problem, not the fundamental problem it's trying to solve.

> In theory, you can find a customer who can give you perfectly written use cases and you can cut-and-paste those into your cukes.

The theory that "customers" on their own do this is flawed (and seems to be part of the we-don't-need-no-stinking-analysts school of software development theory); that's not a problem with cucumber or tools, it's a problem with not having system/business analysts (which term is preferred depends on the environment, but they are the same thing) who work with customers to elicit requirements that the customers own and can validate but which the analysts help them to shape into the needed structure.

This entirely depends on how you write your step code. If you treat it like production code and aggressively refactor it, so you have minimal reuse of steps, lots of one or two-line steps, calling clean well factored plain Ruby code that actually does the work, you'll be a lot better off.

This does take time to create, but in my experience having acceptance tests written in a form which is readable to anyone is very useful. I've even gone so far as to create a gem to display the features as the 'help' section of a website:


And now you have three codebases to refactor and maintain.

What's wrong with `rspec -f doc`?

exactly. saves so much time. perfectly legible for most.

You weren't the first: https://www.relishapp.com/

Yeah, Relish is similar but not quite the same: it's a separate website rather than an integrated one.

In theory, this is a valuable thing, a bridge between the divergent worlds of developers and managers. In practice, however, I’ve never seen Cucumber used this way. Non-technical people don’t read code, no matter how easy it is to read.

I don't know Cucumber at all but I find it fascinating how this software seems to join a long list of hopeful but discarded "human readable coding" systems - both Cobol and SQL were touted back in the day as ways non-programmers could write code and consensus seems to be that the human-readable part just made things more difficult on balance.

There are many, many people with limited programming skills who routinely do business analysis with SQL, and a whole suite of tools from e.g. Oracle to cater to exactly that audience. So while it's not really "human readable", in a sense it has achieved its goal of letting not-so-technical people do things that used to be reserved to computer scientists.

SQL is nowhere near to being discarded; quite the contrary, for ex. Hive bringing it into the Hadoop ecosystem.

Totally agree about the RSpec book, what a nightmare going through the examples, seems to me like they're trying to convince you that RSpec is good for you, but...I'm already reading the book, so just cut the crap and teach me how to write good tests based on realistic scenarios.

cucumber sucks

Perhaps, but I think more importantly it's just a time sink. You end up writing nearly all of the same test code, only reusing bits becomes more difficult thanks to the abstraction layer.

I believe this is because people don't properly consider who cucumber is for (business analyst), it was developed with a specific use case in mind. Aside from that, writing steps is ugly.

I've only suggested cucumber in extreme cases, and even then with caution, and not before mentioning Spinach as an alternative. https://github.com/codegram/spinach

Spinach operates with plain old ruby objects. It fixes the two weaknesses of cucumber: step maintainability & step reusability.

That said, I seldom use either, because I can easily write the same thing under one roof with RSpec.

In defense of the RSpec book, it is trying to teach a testing methodology. It should err on the side of over-testing.

I have not read the book, but I respectfully disagree teaching a testing methodology should err towards over-testing. Your testing process should intelligently focus on the areas that most benefit from it, such as business domain logic, avoiding the pitfall of trying to cover e.g. your entire UI codebase.

Over testing can be as bad as not testing at all.

Well, I agree. But they're demonstrating testing. There's going to be a plethora of tests.

Do you have one resource (book/guide/blogpost) that you think gives a good explanation of how people should be thinking about testing?

No, but I should write one. Broadly speaking, I think test coverage in the real world, as practiced by effective programmers, is variable. One must use judgement to decide the appropriate level of test coverage. There are so many factors in making that judgement however that it's hard to distill.

The primary heuristic I look for is "Do I have enough tests that I can be reasonably sure these tests will fail if I break the code". That is the fundamental question! This is different from the RSpec book's approach, which is more dogmatic and highly structured (you need to test every function, every entry point, pretty much every single thing). The thing is, the RSpec book approach does work! It works better in fact! It also is a colossal waste of time. I'd write tests like that if I was writing a trading platform for the NYSE, but almost no one does that sort of work.

Since I'm in the midst of writing a book (shameless plug http://exploringelasticsearch.com ) I don't have time to write that post unfortunately.
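
One way to make the "will these tests fail if I break the code" heuristic concrete is to break the code on purpose and see whether the suite notices. A minimal plain-Ruby sketch, not anything from the RSpec book; `shipping_cost`, its broken twin and the `suite_passes?` helper are all made up for illustration:

```ruby
# Hypothetical code under test: free shipping at 50 or more, flat fee below.
def shipping_cost(order_total)
  order_total >= 50 ? 0 : 5
end

# A deliberately broken variant: the boundary comparison is wrong.
def broken_shipping_cost(order_total)
  order_total > 50 ? 0 : 5
end

# Two tiny "suites": each entry is [input, expected output].
WEAK_CASES   = [[10, 5], [100, 0]].freeze
STRONG_CASES = [[10, 5], [100, 0], [50, 0]].freeze

# True when the named implementation satisfies every case in the suite.
def suite_passes?(cases, implementation)
  cases.all? { |input, expected| method(implementation).call(input) == expected }
end

# Both suites pass against the correct code.
puts suite_passes?(WEAK_CASES, :shipping_cost)
puts suite_passes?(STRONG_CASES, :shipping_cost)

# Only the suite that covers the boundary notices the breakage.
puts suite_passes?(WEAK_CASES, :broken_shipping_cost)
puts suite_passes?(STRONG_CASES, :broken_shipping_cost)
```

The weak suite has the same number of happy-path checks but would let the bug through, which is the judgement call being described: one well-placed test can matter more than many redundant ones.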

Your post makes me feel like you're missing the point of BDD.

The point of BDD isn't to "stop breaking the code". I think "coders" value "code" too much.

The point of BDD is to build understanding of the problem you're solving by enumerating behavioral ambiguities/edge cases up front, before you've invested time and emotion into a particular code approach.

You're using testable _behavior_ to drive the development of your code. More simply - test first, code second. Having a regression suite is a cool side effect.

Of course, this is just how it's "supposed to work." In real life YMMV. Whether or not you think there's much value in that methodology is another story - but if you are going to dismiss something, you have to dismiss it based on a relevant metric.

Yes, you have just regurgitated the rhetoric. But there are a lot of problems for which the enumeration you describe is simply the wrong way to build understanding.

See http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s... for an interesting example.

I agree! I like the idea of BDD. I disagree with the rspec book in degree, it goes overboard, too many tests for not enough return.

I think Ryan Bates "How I test" screencast is a great example. It's specific to rails but I've used this approach on web applications in a few different frameworks. http://railscasts.com/episodes/275-how-i-test?view=asciicast

Basically, test the heck out of your domain model layer, and then do full stack acceptance tests above that to simulate the user using the app. If it's a web app use something like Capybara, if it's a terminal app or some other interface, simulate that appropriately with some other kind of integration test.

Perfect Software and Other Illusions About Testing by Jerry Weinberg might be a nice start. Or anything by Weinberg


Depends what level of testing, but the Cucumber Book gives great examples as to how to write Acceptance Tests for customers:


The Rspec Book. Even though it advocates Cucumber (I kind of ignored that part), it really explains how to think about tests. How to approach unit tests, integration tests, and testing behavior not implementation. I don't write Ruby code or Ruby Specs, and I found the code and the book completely applicable to PHP and JavaScript testing.


I could be wrong, but the op was asking for a book recommendation beyond the Rspec book, because the root message of this thread says: "The Rspec book is a load of crap."

this is true in general. most internet noise comes from a vocal minority - being part of a vocal minority usually correlates to lack of skill and knowledge and not the other way around :/

you are right. Rails be are very very cautious.

My impression (as a static typing bigot) is that Cucumber style testing is like the crap shoot of dynamic testing (what should I type? What are valid commands? What params do they accept?) made an order of magnitude worse by using regular expressions.

...who thought this was a good idea? (Sorry, being harsh.)

That said, I think the idea is conceptually fine, and, as usual, I think could actually be fairly pleasant with the right (static) tooling applied, e.g. ThoughtWorks's Twist:


Which is a GUI that provides all the niceties of "what commands are allowed", "what params do they take", "oh, refactor this command name from abc to xyz", etc.

I think PM/testing types would love this sort of setup. Granted, it would still take dev investment to get the fixtures/commands/etc. setup. But if you have a huge line-of-business app that will drive a business for the next 5-10 years, I think that's a good investment.

Disclaimer: I've never actually used Twist because it's a commercial product. Yes, I know I suck.

Twist last time I used it appeared to be much more inflexible than cukes. You still had to delve into Java to make it do anything useful so it still has the problems that cukes have.

This is just a GUI to write the TW equivalent of cukes. Personally I prefer vim.

Cucumber has been a great success for our team. We've trained the BA's how to write cucumber scripts, which in turn has done a lot to train the BA's in how to think about the stories and features that they are requesting. The dev will sit down with this feature definition and the BA's, and the feature is groomed into a viable and feasible feature that is testable. We strap these to a web automation (we use specflow to watin), and get a fairly significant return on investment.

Cucumber is like any tool; if used correctly, it's great, and if it's used as a de facto tool, then you're going to struggle with it.

For the most part, in my experience Cucumber isn't worth the effort (one place called it "Encumber"). If you have product managers or BAs who can read code, then it's easier to capture the requirements with more developer-friendly testing tools. And if the product managers or BAs do not read code, Cucumber doesn't usually help much.

Having said that, I did work a gig where the tester would sit with the subject matter experts, and as they would talk, he would capture what they were saying as Cucumber tests, and he would then echo the tests back to them to see if they were right. Afterwards, he would then add whatever other tests he needed (checking weird edge cases and such), and code up any additional fixtures he needed. Then it was just a matter of me getting all the tests to pass. It was a really nice way to work - I knew unambiguously when I was done with something. (It helped that the tester sat right next to me.)

I'm a huge fan of the gherkin format, it lets me sketch out a feature and then work through it methodically, testing as I go. But cucumber has a critical flaw, it forces you to think about code while you're writing. Because step definitions are global you can't just write any old thing, you might fire off a step you wrote previously. This is only a side issue though, compared to the biggest problem, step reuse.

    Given a person "fred" exists
    And a person "ethel" exists
    And a fatherhood exists with parent: person "fred", child: person "ethel"
Did someone mention cucumber steps are written in english? What happened to:

    Given Ethel has a father called Fred
I moved over to Spinach[1] about 18 months ago and I've been much happier. Steps are specific to a feature so I can go to town with shakespearean english. Reuse isn't an issue either, it just happens in the step definitions, not in the gherkin files. Using more natural english tends to lead to fewer larger steps anyway, so it feels a lot less like call-by-regex.

[1] https://github.com/codegram/spinach
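
To show why global regex steps make you "think about code while you're writing", here is a toy sketch in plain Ruby. It is not Cucumber's or Spinach's actual implementation or API; `GlobalSteps`, `FeatureSteps` and `FamilyTreeSteps` are invented names that only mimic the general shape of each approach:

```ruby
# Toy model of a global, regex-matched step registry.
class GlobalSteps
  def initialize
    @steps = [] # [pattern, block] pairs shared by every feature
  end

  def define(pattern, &block)
    @steps << [pattern, block]
  end

  # Runs the first step whose pattern matches, passing captures to its block.
  def run(text)
    pattern, block = @steps.find { |pat, _| pat.match?(text) }
    raise ArgumentError, "undefined step: #{text}" unless pattern

    block.call(*pattern.match(text).captures)
  end
end

registry = GlobalSteps.new
# Written long ago for an unrelated feature ("Fred has a car").
registry.define(/^(\w+) has a (\w+)/) { |owner, thing| "gave #{owner} a #{thing}" }

# A new, natural-sounding sentence silently fires the old step.
puts registry.run("Ethel has a father called Fred")

# Toy model of feature-scoped steps: exact strings, one class per feature.
class FeatureSteps
  def self.steps
    @steps ||= {} # stored per subclass, so features cannot collide
  end

  def self.step(text, &block)
    steps[text] = block
  end

  def run(text)
    block = self.class.steps.fetch(text) { raise ArgumentError, "undefined step: #{text}" }
    instance_exec(&block)
  end
end

class FamilyTreeSteps < FeatureSteps
  step "Ethel has a father called Fred" do
    "linked Fred as father of Ethel"
  end
end

puts FamilyTreeSteps.new.run("Ethel has a father called Fred")
```

In the scoped version the sentence can be as free-form as you like, because nothing outside `FamilyTreeSteps` can match it.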

Shakespearean English did you say? Something like this?

  Borne o'er the cruel firmament become Ethel
  That Fred hath usurp'st from her the silence
  When lo!  Upon them should harrow a fatherhood
  Of parent, Fred besieged by the burden
  And his daughter the indifferent imposition of child
Sounds great! I'd much rather use that than Cucumber.

TL;DR stop bitching about an acceptance testing tool when you're not using it for acceptance testing.

The problem isn't with Cucumber. The problem is your cukes suck. This is probably also a problem in your Rspec/Minitest/whatever tests. If you're using cucumber the way 90% of the cucumber tests I've ever seen (indeed, a great many i've written myself) are written, then you're writing integration tests, and probably crappy ones at that.

  Given a user
  And I go to some really cool page
  And I click on some button
is NOT a domain test. Domains, for the most part, don't have 'users' (unless maybe your domain is drug addiction). Or 'pages' (unless you're in publishing). And they're definitely not about 'buttons' (unless you're a tailor.) [Yeah, you can probably find some domains where those are actual concepts. They're probably not all concepts in the same domain.]

Write actual domain tests for your acceptance tests. Do this regardless of your framework - there's absolutely nothing preventing you from doing it in rspec or anything else. Fuck step reuse (gherkin alternatives like spinach are great, specifically because the steps are isolated to the test.)

If you don't want to write acceptance tests, and you feel comfortable with that, no biggie. Lots of people don't. But don't create a straw man out of Cucumber just because you don't understand layers of testing.

This goes for Cucumber fanboys as well. Don't push cucumber into layers it doesn't belong. It's a domain testing tool. Not an integration test. Definitely not a unit test. Don't push it on a team that doesn't want it. You can write the same kinds of tests in rspec. Disclaimer: I like cucumber. I am not using it in my current job because it wasn't a good fit for the company. I will probably use it again one day. I still write AT's.


  Given I hate cucumber
  And I post a screed against it
  Then the haters will rejoice
is pretty much the same as this:

  describe "hating on cucumber" do
    it "produces a response" do
      given_i_hate_cucumber
    end
    def given_i_hate_cucumber; end
  end

Arguments based on "X doesn't suck, 90% of people are just doing it wrong" aren't convincing to me. If nearly everyone is using a tool incorrectly, the problem is still with the tool. Maybe the tool doesn't suck, but obviously the documentation and education around it does.

> Arguments based on "X doesn't suck, 90% of people are just doing it wrong" aren't convincing to me.

The argument isn't so much that Cucumber doesn't suck and people are using it wrong; it's that Cucumber does suck for the things people are complaining about, but it isn't designed or promoted for those uses.

It's "yes, hammers suck for connecting things with machine screws, but that's not what they're for", not "hammers don't really suck for connecting things with machine screws, it's just most people are holding the hammer wrong when they try to do that".

I kind of addressed that issue. If most people have the idea that your hammer is actually a screwdriver, it isn't their fault for misusing it. Someone put that idea there, and you as the creator of this not-a-screwdriver are responsible for the failure of that message.

> If most people have the idea that your hammer is actually a screwdriver, it isn't their fault for misusing it.

It's not my hammer (or that of most of the people, I would imagine, posting in a similar vein), and I'm not interested in fault.

> Someone put that idea there

Yes, people have put out bad ideas of what Cucumber is for -- and the people complaining about it not being good for things it isn't designed for are among those people.

That's why some of us are countering those incorrect ideas, so they don't keep spreading.

> you as the creator of this not-a-screwdriver are responsible for the failure of that message.

Wrong ideas that only some subset of the people exposed to a product get about its use are not solely the responsibility of the creator of the product. Obviously, if you have an interest in selling the product, you are the person that stands to gain or lose based on those wrong ideas and you have a particular interest in correcting it and a responsibility to your business to take any efficient steps to combat that misimpression, but then again, if you are using tools in your business, you likewise have a particular interest in identifying and correcting your own mistaken ideas about tools and a responsibility to your business in taking any efficient steps to correct misimpressions you may have.

I'd agree with that. The education around proper utility of cucumber is definitely a problem.

Proper utility: Do not use.

I've seen people try and fail to use Cucumber effectively.

Now, Cucumber is a really impressive implementation, but it's a fundamentally flawed idea. You cannot pivot your requirements from business talk "As a blah I want X" into functional tests without a lot of shear.

You may as well write your test descriptions in Farsi or Russian for a team that only speaks English. They make no sense.

Functional requirements should set the groundwork, but they shouldn't serve as the template for construction. You convert these requirements into a format and language understood by developers. If you can preserve some kind of mapping between the original requirement and the myriad of things that had to be implemented to make that feature work, you've done something amazing.

Sadly, Cucumber doesn't let you do that.

Your premise is fundamentally flawed if you're using "functional tests" as a technical term (cucumber is not a functional testing tool) and just wrong if you're using it as a descriptor. Just because you've seen people try and fail to use cucumber effectively doesn't mean it can't be done. I would go so far as to say if you can't do it, you should reexamine both how you're writing your tests, and how you're writing your application.

That basically explains PHP.

yeah, I think the code examples are contrived, but I don't think that's enough to just jump to "your cukes suck," guess I shouldn't be surprised to find a knee-jerk reaction like that on HN.

The damn example you posted at the end of an RSpec test that has a bunch of defined methods IS EXACTLY WHAT THE AUTHOR IS SAYING!

"Cucumber is just a way to wrap RSpec tests with a non-technical syntax. Any supposed benefits of it go to waste because code is only interesting to those who are working in it. Quit writing cukes unless you can honestly say that there is someone reading them who would not understand pure Ruby."

Who exactly are you arguing against?

The author's argument is "You can do this with another tool so tool X is a bad idea." I disagree. The language behind testing is important. When you test code, the language is code; when you test domain, the language is natural language. My advice to teams who can't use cucumber for doing domain testing is to approximate the same thing in rspec, even though it's not the best/optimal tool for it. Gherkin (not just cucumber) is NOT just about wrapping rspec tests with a non-technical syntax.

Also, as contrived as his examples are, he still managed to illustrate exactly the problem inherent in most of the cukes I've ever seen: they're about clicking buttons and navigating a browser. That's not domain. (To be fair, the cucumber team didn't do themselves any favors there by adding the web_steps.rb to their installation process way back when - it's long gone now but the legacy lives on.)

> This goes for Cucumber fanboys as well. Don't push cucumber into layers it doesn't belong. Its a domain testing tool. Not an integration test.

It's far from the worst integration testing tool available, and I'm not really sure that there is a big difference in the requirements for an integration testing tool vs. a domain/acceptance tool; certainly, the ideal usage patterns into which the tools are embedded for those uses differ (including who should be writing tests: if you are trying to involve customers in test design, but you're trying to do integration-style tests, that is likely to fail hard independently of tooling).

It may not be the worst, but i don't think it's the best either. There's a world of difference in thought and language between an integration test and an acceptance test - IT's are about code. AT's are about concepts. I have a rule on my teams that integration tests are allowed to know about some internals and use them. Acceptance test steps are not allowed to know about anything that isn't an accepted part of the domain vocabulary and the user interface of the SUT.

There's an advantage to this as well, if you practice something that approximates DDD: you can use the same test to hit your domain library as your user interface. The step implementations are different, obviously, but the gherkin test is the same.
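
A rough sketch of that "same test, different step implementations" idea in plain Ruby. Everything here is hypothetical: `Account`, `FakeTerminalUI`, the step names and `run_scenario` are invented for illustration, and the scenario is plain data rather than a real Gherkin file:

```ruby
# Hypothetical domain library.
class Account
  attr_reader :balance

  def initialize
    @balance = 0
  end

  def deposit(amount)
    @balance += amount
  end
end

# The scenario uses only domain vocabulary; it is shared by both runs.
SCENARIO = [
  [:given_an_empty_account],
  [:when_i_deposit, 50],
  [:then_the_balance_is, 50]
].freeze

# Step implementations that hit the domain library directly.
class DomainSteps
  def given_an_empty_account
    @account = Account.new
  end

  def when_i_deposit(amount)
    @account.deposit(amount)
  end

  def then_the_balance_is(expected)
    raise "expected #{expected}, got #{@account.balance}" unless @account.balance == expected
  end
end

# Stand-in for a user interface: accepts text commands, returns text.
class FakeTerminalUI
  def initialize
    @account = Account.new
  end

  def submit(command)
    verb, arg = command.split
    case verb
    when "deposit"
      @account.deposit(Integer(arg))
      "ok"
    when "balance"
      "Balance: #{@account.balance}"
    else
      "unknown command"
    end
  end
end

# Step implementations that only go through the user interface.
class UISteps
  def given_an_empty_account
    @ui = FakeTerminalUI.new
  end

  def when_i_deposit(amount)
    @ui.submit("deposit #{amount}")
  end

  def then_the_balance_is(expected)
    shown = @ui.submit("balance")
    raise "expected #{expected}, saw #{shown}" unless shown == "Balance: #{expected}"
  end
end

# Runs each step of the scenario against the given implementation.
def run_scenario(steps, scenario)
  scenario.each { |name, *args| steps.public_send(name, *args) }
  :passed
end

puts run_scenario(DomainSteps.new, SCENARIO)
puts run_scenario(UISteps.new, SCENARIO)
```

This only works because the scenario never mentions buttons, pages or commands; the moment it does, the domain-level run has nothing to bind those steps to.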

Also, I definitely agree with those who have stated elsewhere that customer collaboration is a red herring. I've guided customer discussions with cukes before (e.g. talking about edge cases and aspects they haven't thought about), but never has customers writing cukes been a good experience.

> It may not be the worst, but i don't think its the best either. There's a world of difference in thought and language between an integration test and an acceptance test—IT's are about code. AT's are about concepts.

I agree that there is a big difference (I'd describe it differently because I don't think of ITs as being about code, I think of UT as being about code, ITs as being about architecture, and ATs as being about domain concepts, but that's, arguably, quibbling); I just think it's a difference that often manifests more in terms of the organization of the who-does-what-and-how outside of the tool but which, between ITs and ATs, doesn't necessarily make a huge difference in desirable tool features. (Particularly, I think that non-code language for tests can have value for ITs, though the style of the language and its audience is different than for ATs.)

Could you point to more information about domain testing, or even what the domain concept means in this context? Is it about what I've heard described by "SMEs" as "business rules"?

I've been using Cucumber for almost 3 years now, but if I were to start a new project today I'd pick Capybara with MiniTest or RSpec.

When I first started using Cucumber I thought it was cool that "anyone could read the test and understand it"

But that one advantage doesn't really matter that much when everybody in your project knows Ruby and doesn't need the natural language side of it.

> But that one advantage doesn't really matter that much when everybody in your project knows Ruby

Well, yeah, if people who aren't Ruby coders aren't reading your tests, Cucumber is overboard: the motivating use case for Cucumber is acceptance tests where the main part of the test (the part that isn't implemented in Ruby code) is owned by and validated by the customer/user, who presumably isn't generally, except by coincidence, a Ruby coder.

It might also be useful for integration tests that need to be validated by people familiar with the overall system design/architecture, but not necessarily the language that any particular components are implemented in, for the same reason.

There is some value in using it when everyone knows Ruby: it forces you to get your head out of the code, using language that your customers understand rather than naturally defaulting to codespeak.

I use Cucumber on my own projects when it's just me, for precisely this reason.

I think you're perhaps missing the point in one or two places.

Cucumber helps techies (devs, testers, etc.) and non-techies (product owners, scrum masters etc.) work collaboratively. I don't think the authors of Cucumber ever said that it enables non-technical people to 'read and understand the underlying code'.

Ultimately though, I agree with your conclusion but for slightly different reasons. I use both RSpec and Cucumber daily; they are both awesome and provide a similar end result. I enjoy writing Cucumber features but in certain circumstances it doesn't scale for large/complex applications (in my experience). I think that this isn't a problem for many people though, unless they're doing something wrong 'under the hood'.

I couldn't agree more. The pipe dream is that you have BAs or other non-technical team members write the Gherkin specs and that saves the developers time because "the tests practically write themselves!" (Assuming you're maintaining your giant library of awkward cucumber regex matchers...). In reality it takes much longer to sit down with someone and teach them how to write in Gherkin, all for the goal of having a human readable spec that doubles as test steps. It's much better if they write the specs however they want, in normal English, then you can translate those into normal rspec or test unit tests.

It's a laudable goal, but the abstraction is so leaky in reality it just ends up creating more work for everyone.

Cucumber, gherkins, capybara, handlebars, sinatra... do these goofy names bother anyone else? I'm writing code, not hosting a children's show.

The fact that you could list them all off proves that they are memorable.

I think it more proves that HN users drone on about them.

I honestly love the kooky names you find with most open source stuff. I personally hate it when I get mired too deeply with an enterprise-y company where everything's name is an unmemorable three-letter acronym for some bull shit name that does not mean anything even when it's written out.

Cucumber doesn't mean anything. Sinatra doesn't mean anything. It's no different of a situation, except my clients would look at me like I grew a third ear if I made any mention of them.

Cucumber _does_ mean something: it's about outside-in testing. Cucumbers have a skin on the outside and juicy stuff on the inside.

You may argue that that is silly, but there is some sort of reason.

(None for Sinatra, though. "He's so classy he deserves a web framework written after him.")

lib_ruby_behavioral_test_runner, lib_ruby_behavioral_test_parser, bracketed_template_language_3, ruby_application_server_16

They sure make it easy to search for related questions.

No, you are alone.

It completely depends how you use it. There's so much misuse of it out there I can understand why people hate it so much. But done right it's an extremely valuable collaboration tool.

I'd recommend the Cucumber Book if you want to read about how to use it well:


I've also blogged a lot about how to use Cucumber well here:


Yeah. Just about everyone I've run into using Gherkin-based testing seems to think it's for step reuse, or for covering code, or for integration testing.

Those are different kinds of tests. Go write them in a unit or a spec or whatever you feel like. Code coverage - covering your ass during refactors - and integration - covering your ass before deployments - are just for you and can skip the regexp.

Gherkin is for collaborating with my customer and making them tell me what the fuck they actually want. 90% of the value happens before any of the steps are ever implemented. A few user interactions will be covered, likely not even all of the possible ones. Little step reuse as focus is on readability.

It's a really specific problem, and so far I haven't seen a superior option aside from Gherkin.

While there is initial overhead in hooking in the tests, the power comes when you have modularized test steps that you can reuse for multiple tests.

At my last company we had QA write all the cucumber tests and someone would hook in the new statements, which is the overhead you suggest. Now if you write 3 different gherkin statements that do the same thing, then that is not optimal.

Like many problems, I don't think the tool is at fault. It does what it claims. Provides human readable syntax for test cases and lets you hook that in however you want.

I think it's a valid point to think that extra layer is unnecessary, but I wouldn't go as far as discounting it and saying that no company has derived value from it.

Like all code, the messes usually stem from how the code is implemented, not the language itself.

Cucumber was most developers' first exposure to capybara (IIRC, capybara was developed by the cucumber folks).

Since capybara is so awesome, it gave a lot of people a nice impression of cucumber.

But unless you actually have a customer in the loop writing cucumber tests, just use capybara directly.

Not a fan of cucumber. I started it, but the extra layer of abstraction was too much complexity for too little benefit. Instead, I think you can make almost as easy to read and understand acceptance tests with plain old rspec.

I just TDD'ly implemented a "user signs up for Direct Deposit" feature, and the acceptance tests look like this:

  describe 'Direct Deposit' do
    before do
      @workflow = DirectDepositWorkflow.new(create(:user))
    end
    describe 'User sets up direct deposit with correct info.', :js do
      it 'user sees confirmation page - individual' do
        @workflow.sign_in_and_visit_direct_deposit_form
        @workflow.submit_correct_form
        expect(@workflow).to have_confirmation_message
      end
    end
  end

Then you can make the workflow spec helper class:

  class DirectDepositWorkflow < Struct.new(:user)
    include Capybara::DSL
    include UtilitiesHelper
    include Rails.application.routes.url_helpers

    def sign_in_and_visit_direct_deposit_form
      # ...
    end

    def submit_correct_form
      fill_in 'name', with: 'John'
      click_button 'Submit'
    end

    def has_confirmation_message?
      page.has_css? '.success_alert', text: I18n.t(...)
    end
  end

To me, this is the perfect blend of easy to read, high level specs (serving the role of cucumber/gherkin), with easy to understand rspec syntax and ordinary method definitions.

Interesting technique. I'm still a fan of Gherkin in concept (though I don't use it right now), but I do the same kind of testing in RSpec. My variation isn't quite as refined as your workflow though; I'm just defining methods in the describe block. It has (what I perceive as) an advantage in that the scenarios don't hit any instance variables, so they're a bit more readable, but I like the way you're doing that.

Cucumber is part of our workflow and, although I don't have a control for this comparison, I think it makes us more efficient. Our product owners write in Gherkin (non-technical != illiterate btw) and can essentially throw the feature files over the wall and any dev can pick it up and get to work. Now, there is conversation around every feature but sometimes only at estimation time.

Our POs also use git (non-technical != vcs illiterate) so we have a history of who wrote/modified what spec when.

Also, we have an additional simple layer of abstraction between feature steps and the actual tests. Most of our step definitions are three line methods that make calls to a ui driver that knows the details of our ui; or to a "given" driver that knows the details of our database/models. This ensures that there is only one place where anyone will implement, say, a user logging in.

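  Given(/^I am a logged in user$/) do
    # ... single call into the "given" driver ...
  end

while elsewhere there might be

  Given(/^I have signed in$/) do
    # ... the same call into the "given" driver ...
  end

If the log in process changes, we just edit the given driver.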
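A rough sketch of that driver idea in plain Ruby, in case it helps picture it. The `GivenDriver` class, its `logged_in_user` method, and the `given` helper shown in the comments are all made-up names for illustration, not anything from the actual codebase described here, and the hash stands in for real models:

```ruby
# Hypothetical "given" driver: the single place that knows how a login happens.
class GivenDriver
  attr_reader :current_user

  def initialize(users)
    @users = users        # stand-in for the database/models
    @current_user = nil
  end

  # Every login-related step delegates here, so a change to the
  # login process is made in this one method only.
  def logged_in_user(name = 'alice')
    @users[name] ||= { name: name }
    @current_user = @users[name]
  end
end

# In the step files, both phrasings would then be one-liners
# (`given` is an assumed helper that returns the driver):
#
#   Given(/^I am a logged in user$/) { given.logged_in_user }
#   Given(/^I have signed in$/)      { given.logged_in_user }

given = GivenDriver.new({})
given.logged_in_user
```

The point is only that the step definitions stay trivial and the knowledge lives in one ordinary class.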

We generally only write golden path features. Edge cases are handled with lower level testing.

Our feature files stick to specifying business value rather than ui implementation. For example, we don't do this:

  Given I visit the homepage
  When I enter "children's bikes" in the search field
  And I hit the submit button
  Then I am on the results page
  And I see a fieldset with a legend "Children's Bikes"
  And I see a table containing "Huffy Sprite"

We would write it like:

  Given I search for a specific product category
  Then I see the search term I used
  And the results of the search

When a dev needs to work on an app they haven't touched in 6 months, it's nice for them to read feature files like this to get up to speed on what the app is actually supposed to do. They can then dig into the ui and given drivers to see how it does it.

There is additional infrastructure and PO training involved, but overall, I'd say cucumber improves clarity and communication on our team.

"When I write a Cucumber feature, I have to write the Gherkin that describes the acceptance criteria, and the Ruby code that implements the step definitions."

As someone who recently started using cucumber, this was definitely a frustration at first. But after a couple weeks of writing step definitions, and focusing on making them reusable (using some of the tips here: http://coryschires.com/ten-tips-for-writing-better-cucumber-... ), I ended up with a pretty good bank of general steps that I could cobble together into new features without much modification. The result was something much more readable and reusable than if I wasn't using Gherkin on top of the step definitions.

Be very careful of over reuse of step definitions, and of Transforms. You'll end up with code that's very difficult to understand as your step codebase grows.

I now tend towards minimal step re-use, lots of one or two-line steps calling into nice clean plain ruby classes which actually do the work of my steps.

I've blogged a lot about how to use cucumber well here: http://chrismdp.com/tag/cucumber

Your case, while very valid, doesn't address the original author's concerns related to implementing the tests. In that case I feel as if it's equivalent to using GOTO in my tests. But worse, my labels are regex.

We use cucumber extensively, our cucumber suite is not really that large, and with each new feature it becomes harder to maintain. There has to be some middle ground out there.

I don't find implementing the tests very difficult at all - but I am comfortable with regular expressions, and ruby.

It's no more burden to write step definitions than it is to write RSpec directly. I use SOLID OOP to write my tests, most of the logic lives in regular old methods, and my steps look like:

  Then 'the current subscription payment date should be tomorrow' do
    verify_next_payment_date(Date.today + 1)
  end

Since I start with the Gherkin file, I type fewer than 10 keystrokes to make the step.

The underlying assumption of this article is that it's easier to read code than it is to read english.

Well, even for a developer of 15 years, even the most obtuse English can be more easily absorbed than the cleanest code.

The speed of parsing and understanding code is directly related to the amount of time spent writing it and how recently it was read.

English can be read and understood at the same speed every time.

There is also a non-linear relationship between your feature file size and the underlying code they represent, especially if they represent integration testing. A single feature could test code from 10 different files.

For a single developer in a codebase they know and understand well and have recently worked on, Cucumber can act as friction on development.

For a single developer who hasn't worked on a codebase before, or who is returning to a codebase after some time, Cuke steps will definitely improve their ability to understand what the code is doing and, by providing a regression test suite, improve the quality and probably the speed of their output.

Code is a language of instruction, not communication. Documentation can help, but it is easily forgotten and fails at communicating integration.

Assuming you're focused on writing reusable steps and building up a solid test architecture I don't see cucumber costing you time. You also really shouldn't need to go to code that often to debug a test failure. Write better test frameworks, write tests for your tests, and self report issues in your test framework when they come up. E.g. unable to create test fixture -> this->skipped("reason");
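To make the "self report" idea concrete, here is a minimal sketch in plain Ruby. Everything in it (`FixtureUnavailable`, `with_fixture`, `run_test`) is a hypothetical stand-in and not part of Cucumber, PHPUnit, or any real runner; it only shows a framework problem being reported as a skip rather than as a test failure:

```ruby
# Hypothetical error type: the framework could not set up its own fixture.
class FixtureUnavailable < StandardError; end

# Build a fixture, converting any setup failure into FixtureUnavailable.
def with_fixture(name)
  fixture = yield
  raise FixtureUnavailable, "unable to create test fixture: #{name}" if fixture.nil?
  fixture
rescue FixtureUnavailable
  raise
rescue StandardError => e
  raise FixtureUnavailable, "unable to create test fixture: #{name} (#{e.message})"
end

# Minimal stand-in runner: :skipped for framework problems, :failed for real ones.
def run_test
  yield
  :passed
rescue FixtureUnavailable => e
  warn "SKIPPED: #{e.message}"
  :skipped
rescue StandardError => e
  warn "FAILED: #{e.message}"
  :failed
end

# The fixture cannot be built here, so the test is skipped, not failed.
result = run_test do
  user = with_fixture('user') { raise IOError, 'database is down' }
  raise 'unexpected name' unless user[:name] == 'alice'
end
```

With that split, a red test means the system under test is broken, and a skipped one means the test infrastructure needs attention.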

In my experience the ability to just plug and play steps and quickly add additional coverage for edge cases is a huge bonus for test automation.

But again you have to actually engineer things and write reusable steps and intelligent enough test frameworks that you don't need to often dig through the test code.

I think this issue is somewhat similar to how when writing unit tests you'll want to build up a library of reusable custom asserts targeting your domain so you can quickly exercise your sut and validate things. It's definitely an upfront cost to build things up like that but in the long run it pays off very nicely when you avoid taking on too much test debt.

I am biased though since I'm working on a phpunit cucumber/jbehave like port. https://news.ycombinator.com/item?id=6410900

So what exactly does writing tests for your tests look like...? Keeping this in a Cucumber realm, of course.

Probably a trickier task for cucumber than most frameworks. At the end of the day though I expect that you'll want to have some classes that manage fixtures, perform setup functions, interact with the sut etc and the cucumber regex matches just call into those classes to manipulate the sut and validate it.

So tests here are just cucumber or rspec tests that verify the underlying frameworks that cucumber drives. So for example if you have partial object equivalency methods that verify that only specified fields on two objects match you would write tests to verify they are correctly matching or failing to match two objects as expected before you use that in your cucumber tests.
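As a small illustration of that last example, a partial-equivalency helper and the checks you might run against it could look roughly like this. The `partially_equal?` helper and the `Order` struct are invented for the sketch, and plain `raise` stands in for whatever assertion library you actually use:

```ruby
# Hypothetical helper: true when the two objects agree on the listed fields only.
def partially_equal?(expected, actual, fields)
  fields.all? { |field| expected.public_send(field) == actual.public_send(field) }
end

Order = Struct.new(:id, :total, :created_at)

a = Order.new(1, 9.99, '2013-09-19')
b = Order.new(2, 9.99, '2013-09-20')

# Tests for the test helper itself, run before any scenario relies on it.
raise 'should match when only total is compared' unless partially_equal?(a, b, [:total])
raise 'should not match once id is included' if partially_equal?(a, b, [:id, :total])
```

Once the helper is known to both match and fail to match correctly, a step that uses it can be trusted a bit more.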

Disclaimer: I'm not for or against the use of cucumber.

The second claim against cucumber / gherkin is not its fault, it's the toolset's fault. Using grep to find your step definition is not very effective.

If you use a tool like Rubymine, you can jump directly to the applicable step definition through one key combination.

You should read http://catb.org/esr/writings/unix-koans/gui-programmer.html

In practice, you'll find grep is in the toolkit of far more Ruby programmers than the IDE Rubymine. And with good reason!

The real problem is step definitions with regex in them. You don't need to do this. Steak was written to avoid this problem: https://github.com/cavalle/steak

But the underlying issue described in this post is still valid - Cucumber is an unnecessary layer because it adds no value that you don't get with Capybara and your favourite testing framework.

Thank you for the heads-up about Ruby devs liking *nix tools and not IDEs.

The fact of the matter is grep is a poor tool to try to find your step definitions. There are tools out there that are much better for such things, vim with http://www.vim.org/scripts/script.php?script_id=2973, or an ide or <your favorite tool>.

You could make the claim that any abstraction is unnecessary if you see no value in it. It doesn't mean others will not find value in it.

Personally, I find it quite easy to understand and convey to non-devs the stories told through gherkin. That is part of the value prop for me.

If no one is reading them but the devs, I'd rather use something other than gherkin.

Cucumber also prints the step source file and line number with a command-line argument.

> Non-technical people don’t read code, no matter how easy it is to read.

There is something deep and surprising about this fact.

I love cucumber. When you join the development of a project it's a great place to start and learn what the project is about.

The whole 'customer writes features' thing never worked for me either, but just having it as automated specifications for my projects makes it very valuable for me.

Customer writes features may not work. But writing features and asking customers to confirm that the description matches their expectations works a lot better.

I've used Cucumber for years, and I've never had a customer write a feature. Customers reading features after I've written them and checking them works much better in my experience.

Couldn't you do the same with RSpec or straight Capybara code?

I no longer see the added value of Cucumber when you work in a project that has no non-technical team members writing tests.

If I go to my stakeholders and customers with Spec or Unit code, and ask them: "Does this test represent your idea of how this feature should work" I'll get no useful answers.

I can read it to them, or we could collaborate on a user story that doesn't get directly mapped to tests, but then we have room for the interpretation/translation process to create the wrong assumptions in the test code.

There are plenty of tests I'd never show the customer because they involve implementation details, and those aren't going to be written in Gherkin because I don't hate myself enough to juggle that many regexps.

The value is when you have non-technical members reading tests.

The gherkin language is nice to describe features. I've worked in an environment where business analysts did exactly that. As for making them executable scenarios... this never panned out for the reasons described in the article. It doesn't add any value.

I drank the Cucumber Kool-Aid and had an app with basically 100% Cucumber coverage. If I wanted to change something, I would first write a cuke for it, write specs as necessary, and only as necessary write actual code.

Of course I got into the same spot that almost everyone gets into doing that: too many "and then I click..." scenarios. Great coverage, but then the requirements change substantially and the test base is just a massive snarl of interdependent assumptions. At one point I had effectively written, in cucumber, a natural language interface to my app.

But then at some point I needed to move and I needed to move fast and I just let the whole infrastructure rot. Now I'm doing almost everything on the client side anyway, so it's all a little moot.

All of that being said, I'm still a Cucumber fan, but I approach it differently. Here's what I think:

You should have cukes for the 5-10 things that if they don't work make your product completely pointless.

On Gmail, there would be cukes like:

    When I receive the email from Fred about dinner
    And I open it
    Then I should see the message
    When I reply
    Then Fred should get my reply

And those cukes basically don't need to have wildcards:

    When /^I reply$/ do
      click_on "reply_button"
      fill_in "message", :with => "I'll be there!"
      click_on "Send"
    end

* Use cucumber as a way to force you to keep track of the handful of absolutely critical integrated experiences

* Don't use fancy grammars... Think of steps as human-readable function names.

* Don't do anything fancy in the steps. They should just be a list of actions that users would have to do.
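To show what "steps as human-readable function names" amounts to, here is a toy sketch in plain Ruby. The registry below (`STEPS`, `step`, `run_scenario`) is a stand-in written for this example and is not Cucumber's API; each step is looked up by its exact text, with no wildcards or captures:

```ruby
# Stand-in step registry (not Cucumber itself): exact step text maps to one block.
STEPS = {}

def step(text, &action)
  STEPS[text] = action
end

# Strip the Gherkin keyword and call the step registered under the exact text.
def run_scenario(lines, log)
  lines.each do |line|
    text = line.sub(/\A(Given|When|Then|And) /, '')
    STEPS.fetch(text).call(log) # raises KeyError if a step has no definition
  end
  log
end

# Each step is just a short list of the actions a user would take.
step('I reply')                  { |log| log << :clicked_reply << :sent }
step('Fred should get my reply') { |log| log << :checked_freds_inbox }

log = run_scenario(['When I reply', 'Then Fred should get my reply'], [])
```

Seen this way, a step name is a function name that happens to contain spaces, which is about all the grammar a handful of critical scenarios needs.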

And yeah, there's no reason you couldn't do all of this in Steak. I think Cucumber is just nice in that it encourages you to write it out in human terms.

What's the major benefit of having those 5 or 10 things be cucumber/gherkin, instead of straight rspec? (or straight minitest!)

What are the costs of having to understand and maintain code written using a separate little technology stack (cucumber) for just 5 or 10 scenarios?

Okay, you sort of answer that: " I think Cucumber is just nice in that it encourages you to write it out in human terms."

Okay. If it's worth the cost to you to do this. You could also, of course, just write things out in human terms in comments above the 5 or 10 tests written in rspec or whatever.

Towards the end, he says "...but I do enjoy the Gherkin syntax. Not for testing, but for gathering feature requirements..." and this is where cucumber or fit or FitNesse is useful - you gather requirements, then because these requirements can be run as tests you start running these requirements as tests and now you have a living document of your product and product owners can read the tests (which are the requirements). If you look at such tools as mere test runners, then they do add an extra layer of complexity, but these tools were not meant to be just test runners... so you are using it for the wrong thing... :)

We started using Cucumber 6 months ago and I hate it. My biggest issue is that it's unreliable. 90% of the failing tests work if you do the exact same steps manually.

It seems like it always times out waiting for something on the page to load.

This is probably a driver issue. If you're not using it, try Poltergeist[1], a headless WebKit driver. Because it's WebKit, it behaves basically the same way as if you were testing it yourself in Chrome.

[1] https://github.com/jonleighton/poltergeist

Damnit, I was hoping for an article about the health drawbacks of cucumber (I'm boring that way), and it's about another Framework.

It'd be awesome if people started tagging these as [technical] or [computers] or [vegetable-aficionados].

Rant out.

"I don’t prefer to use Cucumber for any of my testing, but I do enjoy the Gherkin syntax. Not for testing, but for gathering feature requirements. It provides a very clear and concise way of explaining a feature, without confusion. But that is where the line is drawn."

That is also where I draw the line. The whole argument that other non-technical folks can read Gherkin may be true, but I can't imagine them enjoying it.

It's like reading computer-speak. Why can't features just be described in English...?

Yes. And he's not the first one to say this. There was a post saying the exact same thing about a year ago, maybe even on HN?

But, yes, just yes.

(I'd go even further and say "why rspec when you can test::unit/minitest? especially for Rails since the minitest path is officially supported by rails." Again, the extra layer of stuff is just more stuff to understand and troubleshoot when your tests aren't working as you'd like -- for unclear benefit.)

The thread seems to be overrun by a bunch of whiny crybabies.

No one forces people to use them, yet they seem to have a great time crying about it without trying to defend their position.

Cucumber and RSpec are among a set of testing tools which has really changed the testing landscape. It would be really wonderful if people spent half this time and energy on trying to create a better tool or joining the open source groups of Cucumber and RSpec and trying to make it better.

> No one forces people to use them

Well, other than, in many cases, employers.

I'm a .net dev so I've been using SpecFlow (which uses Gherkin) and frankly, I love it. I write a happy path and a sad path for my scenarios, make sure my step definitions are well parameterized and my QA can go off and create as many new scenarios to test all the edge cases he wants. It works wonderfully.

As a product/program manager... I can tell you that writing RSpec tests is just as easy as writing Cucumber tests, and with less overhead.

Really, I'm not a Ruby guy, so I thought it's some vegetable story... cause I love cucumbers (& tomatoes)... Ah...

Figured the bottom voted comment would be about the virtues of fresh cucumber.
