Stop Using Mocks

ericb · on March 23, 2011

I'm a Rails dev, I do mostly test driven development, but my current bit of heresy is that there are some tests with low or negative ROI that are not worth writing.

Here are some examples: I almost never test at the view level, and barely at the controller level, and instead skip to the integration level. View, and even controller-level tests are brittle. If I'm going to spend time on brittle tests, I'd rather it be at the highest level. My thinking is, if your app can run a marathon, it can probably also walk. I'd also rather my test be married to output than to implementation details, as that gives more ways the code can change and still not break the test.

I feel like some folks view testing like a religion, or code coverage percent as a "score" in the game of coding.

qaexl · on March 23, 2011

I did the same thing until Rails 3 came out -- skipping controller and view tests completely. Now I do Rack-level tests, at least for the non-HTML endpoints. If they are using standard REST conventions, they are even easier to test. I send Rack HTTP request, I get back Rack HTTP response. I test the JSON/XML coming out.

Considering that "web apps" in the future will really mean "mobile apps", I think in the future Rails will mostly drive endpoints and barely serve dynamically-generated HTML, most of the views being constructed out of Javascript/HTML on the browser end. Testing to see that data gets applied to the right DOM element will be easier on the browser side (or even with a fake Node.js browser) if the mocking is done there. I suspect that isolating the views properly from the rest of the app will make the whole stack easier to test and maintain. (It is also easier to find a decent Javascript developer than it is to find a decent Rails developer).

tmcneal · on March 23, 2011

Interesting perspective, but I'm surprised that you're lumping in "controller level" tests (i.e. unit tests) with tests against the view.

While I'm definitely on board that tests against the UI are brittle, I feel like unit testing individual methods and classes ensure your code is modular and functions as designed.

Integration tests are great, especially since they more reflect how a user interacts with the code, but I think you can get a lot of value from unit tests without getting too religous about it.

trumbo · on March 23, 2011

Aren't controller level tests called functional tests? I'm still new at this, but I thought unit tests are to the model as functional tests are to the controller. And integration tests mimic user behavior.

ericb · on March 23, 2011

They are, but the naming convention is a rails-specific quirk.

Usually functional testing is defined as checking functionality vs. requirements, and it is done with something like selenium, or Quick Test Pro when it is automated. And usually, since requirements don't deal with controller functionality but are most often written about the system as a whole, functional testing would be what Rails calls "integration" tests, or Rails/RSpec calls "request" tests.

When the controller tests were named "functional tests" I think integration tests in Rails were not implemented yet, and controller tests were the highest up the stack you could test, so the name made more sense then.

russellperry · on March 23, 2011

"I almost never test at the view level, and barely at the controller level"

You sound like every Rails dev I know.

It depends on the controller action in question. If it's a boilerplate Rails controller action, then the ROI might not be that high, but if the controller is doing something more specific (assuming it's not doing something that ought to be pushed to the model) then unit testing the action is important, sometimes very important.

Code coverage is pretty much a meaningless metric because it can't say whether the tests are actually good and useful. But it can help point experienced devs into areas of risk that may or may not warrant more attention.

ericb · on March 23, 2011

Well, part of that goes to how I code. The controller is rarely doing anything important, more than boilerplate. I push that into the model as much as possible.

qaexl · on March 23, 2011

If you're going with Fat Models and Thin Controllers, then the controller shouldn't be doing anything more than something standardized. You can probably reduce that code down to using inherited_resources (though with Rails 3, it's just as easy to roll your own). I don't know about HTML endpoints, but I test all my JSON and XML endpoints by sending fake Rack responses. They tend to be highly orthogonal and easily metaprogrammed.

latch · on March 23, 2011

code coverage numbers are ridiculously stupid.

I know you aren't attacking me. I realize that for this simple method, the code coverage is, well, very good (perhaps to a point of being bad). This is an contrived example to showcase the implication of using mocks versus stubs though....or more abstractly to show the difference, and the different way to test, behaviors vs implementation logic. Whether these always have to be tested, I agree is worthwhile to talk about.

I don't wanna become the "Programming, Motherfucker" example of what's wrong with software development :)

Tomek_ · on March 23, 2011

There is only one good code coverage number and it's "100". If you aim for 100% code coverage then it really helps make your code better as you must test all the parts, not only the easy ones - that forces you to write code that's easier to test, and "code easier to test" equals "better code" (more sane interfaces/API, decoupled classes, etc. etc.).

kscaldef · on March 23, 2011

I've seen "100% coverage" specs that managed to run every line of code while only testing trivial, obviously correct, aspects and failing to test any of the actually important behavior. This sort of stuff gives TDD a bad name, but is, in my experience, sadly common.

Also, at least in the Ruby community, people seem to only measure line coverage. Without branch and conditional coverage reports, it's possible to run 100% of the lines but still miss a lot of code paths.

alain94040 · on March 23, 2011

I'll tell you what we do in hardware to get test coverage that is worth way more than 100%: we do fault injection. I have never heard of anyone trying to do the same in software, but it would be interesting.

Here's how it works:

for each statement in your code, break it, then run your entire test suite. If none of your tests have uncovered a problem, mark the code as uncovered. Repeat for each code statement.

It's crazily expensive to compute, but it's so much more accurate than to say that a line of code was hit or not. It measures whether the code had a functional impact on the overall behavior.

aarongough · on March 23, 2011

That's generally referred to as 'mutation testing'. There are a number of systems that enable this in dynamic languages, the one I am most familiar with is Heckle: https://github.com/ryansobol/heckle

Mutation testing is great, it highlights all the areas where your tests aren't written well, and teaches you to write better tests. I'm not sure that using it all the time is the best use of resources in most cases though...

famousactress · on March 23, 2011

I disagree. Maybe flamebait, but I think 100% is usually either a silly goal, or a good start. 100% for a complex project is basically impossible to achieve (and assumes every area of the code is as important as the next), and for a lot of code 100% is nothing. Consider a function that divides to arguments and returns the result. I can pass in div(2,2). There, done. 100% coverage.

[edit] I forgot to mention what value I do think it has. On teams that I'm on I always say I frankly don't care what the number is, as long as it's going up and not down.

Tomek_ · on March 23, 2011

Of course 100% code coverage won't make your code better in every possible aspect, nor it guarantee that tests are perfectly written, but at least a) the tough parts of the code won't be skipped during the test b) you will have cleaner, more usable (testable) API.

famousactress · on March 23, 2011

I still think it's a bit of a wonked metric. The goal out to be to increase your coverage of possible test cases, not LOC. Since you can't cover every possible test case on any non-trivial project, they must have priority. Priority is probably something like how likely the case is to occur, balanced against the consequences of a failure.

Test coverage is a simple indicator, and simple indicators are handy.. but chasing 100% too blindly will cause you to start seeing every line of code as equal, and they're not.

[edit] - oh, I meant to mention that both of your points (a & b) make sense. I can see how chasing coverage could help make sure you test parts of your code you've been avoiding because they're hard to test. I just wonder if it's the best way to accomplish that goal.

timtadh · on March 23, 2011

However, you shouldn't fall into the fallacy that if you have 100% "code coverage" that you code is really fully tested.

cageface · on March 23, 2011

Amen. Bugs tend to crop up at the interfaces of subsystems, so you're usually better off spending your testing effort at a feature level.

jjrumi · on March 23, 2011

I think tests should not go for code coverage, but for "Cyclomatic Complexity" coverage.

limmeau · on March 23, 2011

What precisely is cyclomatic complexity coverage?

awj · on March 23, 2011

I have no idea what the GP meant, but here's my take:

Set a minimal threshold of cyclomatic complexity. Below this threshold, function are assumed "so simple there are obviously no defects". Above this threshold you need tests, roughly prioritized in order of descending complexity.

It's not a bad idea, but that assumption of simplicity is a doozy. Also, figuring out cyclomatic complexity in Ruby (especially Rails apps) is ... hard. Especially if you want the computer to do it for you.

jjrumi · on March 23, 2011

For instance, you can have a test result of 100% line coverage but if you have composed conditionals and you only check the first part of them... you are NOT checking all the paths of your code.

limmeau · on March 23, 2011

There are a couple of well-established advanced criteria like decision coverage, condition coverage, MC/DC to choose from if statement coverage isn't enough.

PaulHoule · on March 23, 2011

Like most TDD examples, I see the code and the first thing I do is want to punch the wall.

Here we see an example of modularity for the sake of modularity, not business requirements.

In all systems I've built, if encryption is pluggable, it is selectable on the level of an individual user. The reason for this is that, in general, you can't transform one kind of encrypted password into another kind of encrypted password, not unless you have the plaintext form.

Practically, your encryption is pluggable on paper only because you can't decide you want to use a new algorithm for people who register after a certain date... Examples like this add to the perception that TDD is rife with "Cargo Cultism".

latch · on March 23, 2011

Better than wanting to punch me! I'm curious what language you program most often in? DI, and this artificial modularity for the sole purpose of testability, is a common pattern and limitation of static languages. In fact, most .NET and Java developers don't even realize that DI is a form of IoC. Most think the two are synonyms.

If you move this example to a dynamic language, suddenly artificial complexity and the verbosity of the code shrink to something of (relative) pure beauty.

As a ruby developer, I've made this argument many times before. As a C# developer, while I every line of code I write, I feel the same frustration that you do, I don't see much that can be done.

PaulHoule · on March 23, 2011

I program quite a bit in C#, which I like, as well as in dynamic language such as PHP, Perl, Python, etc.

I avoid the use of DI in C#; it's not a "limitation of a static language" but it's a choice that people make. I find that wonderfully concise and maintainable code can be written in C#. Some of the benefits of DI can be had through careful use of the static scope in desktop applications and of the request scope in ASP.NET applications. There are all sorts of tricks you can do to improve testability in a system of this sort.

Testability is a good thing, in and of itself, but you've got to weigh the cost of bulking up your code. In a case like this it may make sense to make the unit of testability larger.

I see everything in a program, including lines of codes, methods, classes, files, tests to be a liability. Each artifact is like a puppy, you not only have to create it but you're committed to maintain it in the future. I've used automated unit tests to create systems that I could never have built otherwise, but there comes a point where the benefit of tests exceeds the cost and when the artifact count increases dramatically (as when you create mock objects) that it's time to consider the economics.

dansingerman · on March 23, 2011

I think the lesson here is that (often) good test code is much harder to write than good runtime code. I have plenty of runtime code I would post publicly and be happy to defend. That's much less true of my test code.

sparky · on March 23, 2011

> The reason for this is that, in general, you can't > transform one kind of encrypted password into another kind > of encrypted password, not unless you have the plaintext > form.

You have the plaintext form every time your users log in. If you want to change the hashing/encryption scheme, keep a bit for each user denoting whether or not they've been switched to the new scheme. If the bit is clear, hash their password the old way; if it matches, hash it the new way, store that, and set the bit. This is how people usually suggest transitioning from crappy hashing schemes to something more sane, rather than requiring people to sign up for new accounts or actually change their (plaintext) password.

Granted, this requires some logic external to the encryption/hashing module, but the encryption/hashing bit should be relatively pluggable.

PaulHoule · on March 23, 2011

I've used this trick to gradually change the encryption used in web user authentication systems and even built it into a framework.

It still involves encryption being pluggable at the level of an individual user rather than at the level of a system as a whole.

nvarsj · on March 23, 2011

A better article, in my opinion, is Martin Fowler's "Mocks Aren't Stubs": http://martinfowler.com/articles/mocksArentStubs.html.

However, I disagree with some of the basic premises. For example, Fowler says that using mocks ties the test to the implementation. I don't see how stubs are much different - you test the state of the stub, and this requires knowledge of how the implementation will use the stub. Any internal interaction testing will be tied to the implementation by its very nature.

I suppose the argument is that mocks can be even more coupled to the implementation by using expectation matchers. I find this is a bit disingenuous though - any decent mocking framework will make these matchers optional. But when you really need to test that specific interaction, it can be very handy.

So perhaps "mocks aren't stubs", but I would say a stub is a mock - just a dumber one.

brown9-2 · on March 23, 2011

I agree with you on this, and I feel like I must be missing out on some feature of "stubs" or working with a different definition of what a "stub" is - testing with a stub implementation of the collaborator, which you have to write yourself anyway, seems to tie your test to not just the implementation of the class-under-test but also to the stub you've written.

In the OP's article, I don't see the difference between what the author is calling "mock" or "stub", only that in the latter the expected value of the arguments passed to the collaborator's method are not asserted. This still seems like a mock to me.

Groxx · on March 23, 2011

I see what they're trying to say, and I like the increased decoupling with FakeItEasy (I'll probably switch to it), but every single test in there makes a call that depends on parameter type and order.

For instance. Every test includes a line like this:

  var user = new UserRepository(store, encryption)...

Later, there are lines like this:

  A.CallTo(() => encryption.CheckPassword("Ghanima", "Duncan"))).MustHaveHappened();

How is that not identical? The faking code is more flexible, as you can fake any kind of call without regard to the parameters or types, but that's just as tightly bound as the other examples, and they'll all fail if you re-order the parameters.

latch · on March 24, 2011

The test that you picked is specifically there to test that call. Did you not see:

  Any.CallTo(store).WithReturnType<User>().Returns(user);

Also, two of the tests have seen been updated to truly leverage FakeItEasy's stubbing capabilities:

  [Test]
  public void ReturnsNullIfThePasswordsDontMatch()
  {
    var store = A.Fake<IDataStore>();
    var encryption = A.Fake<IEncryption>();

    A.CallTo(() => encryption.CheckPassword(null, null)).WithAnyArguments().Returns(false);
    var user = new UserRepository(store, encryption).FindByCredentials(null, null);
   
    Assert.IsNull(user)     
  }

There's no setup/expectation on the store - it'll just returned a canned (default new user) response.

angryjim · on March 24, 2011

I agree with this point. You are using a more flexible stubbing framework and THAT is what the point of this post should be about. The rest is just semantics.

My questions with TDD always come with the refactoring.

jjrumi · on March 23, 2011

In my case, most of the time mocks are used to avoid Database interactions: They are dangerous for writing and slow for reading.

The best thing I've run into lately have been "mocking" the DB creating a copy (temporary tables with only the needed data) of the tables that the code interact with.

This way you create a "mock" of the data you're going to work with and code the test getting a more reliable test in my opinion.

stuhacking · on March 23, 2011

Testing a DB connection sounds more like integration than unit testing.

Mocks are used to fake things that add significant overhead but no value. If you have a suite of 1000 unit tests, you really don't want half of them making actual database requests.

Simply making 'safe copies' of the database data does not solve the problem that Mocks are intended to solve.

MartinCron · on March 23, 2011

In my experience, it's all about layers.

I do both, I have a set of integration tests create a "safe copy" of the database and populate it with seed data, so I can trust that my data-layer stuff works correctly, while my pure unit tests interact with a super-fast fake database implementation.

I actually wrote the in-memory fake db before I wired up a real database. It made the early iterations much faster without having to make changes to multiple layers. I could have not bothered with writing tests against a "safe copy" of the database, but I sleep better knowing that the assumptions about how the persistence types work in my unit tests are validated by my integration/data tests.

jjrumi · on March 23, 2011

It's not a "safe" copy, it's a set of known data in memory that fakes the DB connection and reacts as it will with a real DB connection.

And yes, I have 500+ UTs.

stuhacking · on March 23, 2011

I'm sorry, I misunderstood the situation you described.

angelbob · on March 23, 2011

His example code isn't really about mock versus stub. It's about only depending on the parts (parameters passed, mostly) that you actually use, not creating extra bits that mock other objects.

I agree that your test code should only specify the bits you care about. I find the mock versus stub distinction weird and artificial.

Maybe .NET people use those terms differently? I'm a Rails guy.

latch · on March 24, 2011

No, you are right. Its about being as unspecific as the test requires. I updated a couple examples to lean more heavily on more traditional stubs; still generated by the framework, but expectation or set-response, its all canned.

ericmoritz · on March 23, 2011

The biggest issue that ever comes up for me when using mocks is getting the assumptions wrong. For instance perhaps I mock my interactions with an external web service and they change the JSON structure on me. My tests continue to pass but in production, bad things happen.

jdminhbg · on March 23, 2011

I've been using ephemeral_response[1] in Ruby, there may be a similar library in whatever language you're working with. The idea is that it makes one real call to the service and caches it for a specified amount of time as a fixture. That way you don't constantly make calls against e.g. the Twitter search api, but when they change how it works in 6 months, your test will fail.

[1]: https://github.com/sandro/ephemeral_response

jjrumi · on March 23, 2011

I don't know if ephemeral_response solves this, but the first thing that comes into my mind is caching an error (webservice doesn't respond in time or whatever). This happened to me some times.

Caching the error is a big problem when you have a system that determines if your code is stable or not and blocks deployments in production to prevent errors.

masklinn · on March 23, 2011

It does not bother me that my unit tests can't tell me the power just went down or my graphic card is going up in flames. That's not their job at all.

So that's not an issue at all, that's not relevant. The best thing tests can do (and what they should do) is ensure your code behaves "correctly" in the face of invalid or corrupted data, which an unexpected JSON structure would certainly be.

jister · on March 23, 2011

Looking at your code, I don't think you still need the UserRepository class. From the 2 sample methods you provided, notice that "Store" and "Encryption" can be standalone so leaves the UserRepository class only as a wrapper which complicates your unit tests.

Of course I might be wrong since I haven't seen the rest of the code in UserRepository.

Also, you can also design your FindByCredentials method as:

public bool FindByCredentials(string userName, string password, out User user) {}

latch · on March 23, 2011

The entire example is contrived to talk about testing. Not much about UserRepository is ideal.

I think out and ref are pretty horrible. I understand that the pattern is convenient for the BLC *.TryParse methods, but it should really stick there. Your signature tells me that you'd likely return false rather than throw an exception. I'd agree that returning null isn't the best, but I'd favor returning a NullObject that true/false with a out parameter.

MartinCron · on March 23, 2011

I think out and ref are pretty horrible.

I agree wholeheartedly, though I'm not sure why. I often find myself creating small "data bunch" types with names like ValidationResult that consist of a bool & a set of messages to use as responses to the question "is this input OK, and if not, why not?"

aneth · on March 23, 2011

The idea that every test should cover only one code change is absurd and a recipe for writing way more code than necessary. More code is more brittle and more difficult to change agile-y. Write enough tests to ensure things work, not to ensure every line of code is where you put it. That's folly and a waste of resources.