
Why a mock doesn’t work - UkiahSmith
https://nedbatchelder.com//blog/201908/why_your_mock_doesnt_work.html
======
stephen
Agreed with the article; mocks are very tied to implementation details. I
almost always prefer state-based testing:

[https://martinfowler.com/articles/mocksArentStubs.html](https://martinfowler.com/articles/mocksArentStubs.html)

An assertion that "observable state goes from A to B" is much closer to a
business/functional requirement that will remain true regardless of
refactorings.

Refactoring in codebases with state-based tests is a pleasure; in codebases
with mock-based tests it's tedious, constantly updating tests when no semantic
behavior was supposed to change.

Also, mocking via module hacks like in the article (and in the JS world) is
scary; modules are basically global variables so it's a very coarse grained
slice point. Dependency injection is almost always better.
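To make the contrast concrete, here is a minimal sketch (the function names are hypothetical): the module-coupled version can only be tested by patching a global, while the injected version accepts a plain stand-in with no patching machinery at all.

```python
import os


# Module-coupled: a test must monkey-patch the global os.listdir.
def count_txt_files_global(path):
    return sum(1 for name in os.listdir(path) if name.endswith(".txt"))


# Injected: the listing function is a parameter, so a test can pass a
# plain stand-in instead of patching anything.
def count_txt_files_injected(path, listdir=os.listdir):
    return sum(1 for name in listdir(path) if name.endswith(".txt"))


# A state-based test of the injected version needs no mock library:
def fake_listdir(path):
    return ["a.txt", "b.txt", "notes.md"]


assert count_txt_files_injected("/irrelevant", listdir=fake_listdir) == 2
```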

~~~
hinkley
Dependency injection leads to elaborate, brittle fixtures that become more
elaborate and arcane over time.

And the sunk cost fallacy has people working for hours (and I’ve witnessed
pairs spend two days, that’s over three man days!) trying to maintain them.

I use mocks to arrange state A without going through all the business rules
involved in creating state A. But you have to expose the states to do it,
which can make for pretty descriptive code but is outside of some people’s
experience and so they resist it.

Earlier tests have already verified all routes to State A, the next batch of
tests now takes it as a given. This controls test explosion by utilizing
transitivity. A->B B->C implies A->C. Tons of unit tests for A->B and B->C and
then you only need a couple of functional tests (say, one negative and one
positive) of A->C. You're just checking the plumbing isn't broken.

Otherwise, you get Cartesian products. You end up with elaborate, (often
custom), mocks that couple all of the tests together in hard to maintain ways.
You end up with tests that accidentally test the mocks/fixtures instead of the
code.

Some of my current coworkers do this too. I don’t know where this pattern
comes from. It’s often easier to replace their two page fixtures with two or
three lines of mocks per test. It's often almost the same amount of code, but
so what? Each test can be read. Each test can be run individually, and
behavioral changes to the code affect mostly the tests you would expect. Those
tests can be fixed, rewritten, or just removed as they disagree with the new
requirements.

The absolute worst is fixtures with asynchronous code. Those break constantly,
and often invisibly. What’s the point of a fire alarm if the damned thing
doesn’t work? It’s almost worse than nothing at all.

~~~
PopeDotNinja
> The absolute worst is fixtures with asynchronous code

I concur that setting up a test for async code is a complete pain in the arse.
It's extra fun when the caller doesn't get a reply, so there's no response
that you can assert as valid or invalid. It's extra, extra fun when the
tests in your test suite need to run in parallel without stepping on each
other.

Here's an example. Say you've got PipelineService123 that receives a message
with a chunk of data, transforms that data, and sends it elsewhere. Try
thinking about dynamically setting an instance of PipelineService123, wiring
up an input message sender, giving that sender the dynamic address of the
instance of PipelineService123, wiring up an out message receiver, making sure
the instance of PipelineService123 knows how to reach the receiver, all to
test whether or not everything is wired up correctly & that the data is being
sent, transformed, and received properly, running this test in parallel with
all of the other test cases, and keeping it reasonably easy to reason about.
Good luck with that! I'm not sure I even stayed on top of my own attempt to
describe it :P
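For what it's worth, that wiring can at least be kept test-local. Here's a minimal sketch in Python, with asyncio queues standing in for the real transport (the service and its upper-casing transform are hypothetical stand-ins for PipelineService123):

```python
import asyncio


async def pipeline_service(in_queue, out_queue):
    # Receive a chunk, transform it, send it onward.
    while True:
        chunk = await in_queue.get()
        if chunk is None:  # sentinel: shut down
            break
        await out_queue.put(chunk.upper())


async def run_test():
    # Each test builds its own queues, so parallel tests can't step on
    # each other.
    in_q, out_q = asyncio.Queue(), asyncio.Queue()
    service = asyncio.ensure_future(pipeline_service(in_q, out_q))
    await in_q.put("hello")  # the "input message sender"
    result = await asyncio.wait_for(out_q.get(), timeout=1)  # the receiver
    await in_q.put(None)  # shut the service down
    await service
    return result


assert asyncio.run(run_test()) == "HELLO"
```

Because every queue pair is local to one test, there is no shared address registry for parallel tests to trip over.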

~~~
littlestymaar
Just curious: which language do you work with?

In my previous job I had to work on a big asynchronous codebase in JavaScript
and I had zero issues with asynchronicity at all, so maybe it's an
ecosystem/language-dependent issue?

~~~
PopeDotNinja
Elixir. Are you running your tests in parallel in the same process?

~~~
littlestymaar
It depends on what you mean by parallel: Node.js is single-threaded, so every
CPU-bound task runs sequentially. But all the asynchronous tests are run
concurrently by the test runner (Mocha[1] in my case).

[1] [https://mochajs.org/](https://mochajs.org/)

------
pojzon
I really don't like that the topic title is a general statement while the
article underneath talks strictly about the Python landscape as an example,
without discussing the idea itself.

Mocking by itself works fine. It's a good idea that works if used correctly.
Misusing it leads to issues - duh.

I'm familiar with the narrative that anything in our field that is too hard
(to get right on the first try) must be bad - but I don't agree with it at
all. We are at the point where craftsmanship should be a good metric to
distinguish mediocre developers from experts.

~~~
nedbat
What title would you prefer? "Your Python mock might not work, but it could
still be a good idea if you do it right, and here I will explain how"? :)

I didn't mean to imply that mocking is bad. Is that what you took from it? Why
would I explain how to get mocks to work if I thought people shouldn't use
mocks?

~~~
_ix
I’ve recently learned that some actually do consider mocking harmful.

Where I work right now, there’s a really outdated and unfashionable fight over
the benefit of unit testing in general. Existing engineers don’t see value in
test driven development, exhaustive testing, unit testing, and mocks/spies are
thrown in... and we're a Python shop. I'm utterly confused. I, too, grew
concerned after reading: will I be hearing this cited/twisted as further
evidence against investing in our dreadful testing situation?

~~~
mikekchar
I'm a dyed-in-the-wool TDDer, but I actually think that mocking unnecessarily
is usually harmful. I use it as a technique of last resort.

I should define my terms before I explain, because many people use the term
"mock" to mean things it didn't originally mean. Test objects that are used in
place of domain level objects were traditionally known as "fakes". A fake that
represented a fixed known value was called a "stub". A fake that included an
assertion that a function was called (or that collected data on function
calling) was called a "mock". It's a bit confusing for me that many people use
the term "mock" to mean "fake". I found it weird that the original article
pointed to an article on faking and then used the term "mock" without
referring to the original meaning of that word (which makes me wonder what
they mean when they say "fake").

Anyway, usually you want a fake when you don't have access to some part of the
system to test it directly. Sometimes that's because it's a completely
different service. You can fake out that service so that you can see if the
code that interacts with that service is working, without having to actually
set up the service.

A stub is useful in situations where you need to know that your code is
working with specific values of data inputs. So you might have an object that
you pass to a function and you want to know what happens if one of the
properties on the object is null. It might be hard to set that up, so you can
stub it out.

A mock on the other hand, basically tests if a function is called. A good
example where you might legitimately need a mock is where you pass an object
to a function and you are expecting that a callback on that object will be
called. It's really hard to test that without a mock.
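Those three terms can be sketched with Python's unittest.mock (the functions here are hypothetical): a stub stands in with a fixed known value, while a mock additionally verifies that a call happened.

```python
from unittest import mock


def greet(user):
    return "Hello, " + user.name


def notify(user, sender):
    sender.send(user.name, "welcome!")


# Stub: a fixed known value, used to exercise greet() with chosen input.
stub_user = mock.Mock()
stub_user.name = "Ada"
assert greet(stub_user) == "Hello, Ada"

# Mock: also asserts that the collaborator's callback was actually called.
mock_sender = mock.Mock()
notify(stub_user, mock_sender)
mock_sender.send.assert_called_once_with("Ada", "welcome!")
```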

Where mocks can be dangerous is when you completely mock out any interfaces
and stub the return values. You pass a fake object as a collaborator to your
function and you test that your function works. The problem is that your fake
object may not necessarily represent a real object in the system.

If you ever want to refactor the code, your tests will no longer tell you that
a property is missing, or that a function is missing because all of your test
code is using fake objects with mocked and stubbed methods. Ideally a unit
test that uses an interface should fail when you change that interface. This
allows you to simply change an interface somewhere and have your tests tell
you exactly what you need to do to make that change work.
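One partial mitigation for this drift in Python is autospec: an autospec'd fake copies the real object's signatures, so at least signature changes still fail the test. A sketch, using a hypothetical Mailer class:

```python
from unittest import mock


class Mailer:
    def send(self, to, subject):
        raise NotImplementedError("talks to the network in production")


# create_autospec builds a fake with the same shape as the real class.
FakeMailer = mock.create_autospec(Mailer)
mailer = FakeMailer()

mailer.send("a@example.com", "hi")  # matches the real signature: accepted

try:
    mailer.send("a@example.com", "hi", "an extra argument")
    raise AssertionError("expected a TypeError")
except TypeError:
    pass  # the real send() couldn't accept this call, so neither can the fake
```

This catches renamed or re-signed methods, though not deeper behavioral changes.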

Where you end up getting a lot of conflict WRT testing strategy is that some
people believe very strongly that unit tests should test things in isolation.
Secondly people believe that unit testing should be a black box testing
strategy. So you should test through your public interfaces only and any
collaborator that adheres to the interface contract should work as expected.

In this style of testing, you are often encouraged to mock anything and
everything at the interface boundary. This has many advantages. First, it
means that your test objects can be very simple, so writing tests is very
quick -- even if the code in the system is complex (because you aren't using
any of that code). Second, because you are testing the public interface only
and there is minimal setup, your tests become documentation of the interface
contracts. Third, because it is black box, if you change the implementation of
your "unit", you don't have to change your tests.

Despite these benefits, I'm not a big fan of this style. I like white box
testing using real collaborators. My goal is not to define interfaces and nail
them up -- quite the opposite. I want to be able to change interfaces fluidly.
I value ease of refactoring over just about anything else. Second, I want to
use real collaborators almost _because_ it is painful. If your collaborator is
awkward and brittle to set up in tests, it is also awkward and brittle to set
up in production code. My goal is to remove that pain and to simplify the code.
Again, my highest value is my ability to refactor the code. I want the code to
become easier to work with over time, not harder and more complex. Finally, I
want the code to break at a "white box" level, not a "black box" level when I
change behaviour. Ideally, I want my tests to say, "On the third line of that
function, we're going to have a problem because that function is different
now". I _don't_ want to be aware of problems at a larger scope: "Somewhere in
function A that calls function B which calls function C and D there is
something wrong because it does something weird".

In the end I write small functions that are tested directly with real
collaborators. I avoid private functions because it hides my implementation
details. I test at a low level so that I avoid test complexity from excess
branching. I get incredible specificity from failing tests, which end up
essentially giving me a TODO list for what I need to do when refactoring code.

Hope that gives you some idea of at least why one person avoids mocking --
although, you _do_ need it sometimes. And to be fair, sometimes I'll do a
London School, outside in, mock the world implementation if I'm not sure what
I'm building. However, I throw away all my mocks and re-TDD once I know what
I'm building.

~~~
chriswarbo
I agree with basically all of this.

> some people believe very strongly that unit tests should test things in
> isolation. Secondly people believe that unit testing should be a black box
> testing strategy. So you should test through your public interfaces only and
> any collaborators that adheres to the interface contract should work as
> expected.

I think much of the confusion and talking-past-each-other comes from ambiguous
language. I actually agree with all the things in the above quote (isolation,
black-box, public-only, relying only on specified interfaces). Where I've
differed from co-workers is that I consider the appropriate "unit" to be a
feature/piece-of-functionality (e.g. "logging in"), whereas they consider the
appropriate "unit" to be a piece of code (e.g. a method or class).

I had a bit of a rant about this at
[http://chriswarbo.net/blog/2017-11-10-unit_testing_terminolo...](http://chriswarbo.net/blog/2017-11-10-unit_testing_terminology.html)

~~~
mikekchar
I think Michael Feathers explained it the best. He likened unit testing to
clamping a piece of woodwork while you are working on it. The bits you are
working on need to be in motion because you are working on them. The bits you
are not working on need to be clamped in place -- you don't want those
things moving while you are working on some other bit. A "unit" is anything
you might want to be clamped in place. It _can_ be a function. It _can_ be an
object. It _can_ be a subsystem. You want to unit test at different levels of
abstraction so that you can "clamp" those levels of abstraction down.

One of the things I've found people get confused with is that they see unit
testing and integration testing as orthogonal. They think a unit test should
exercise a small piece of code in isolation and an integration test should
test examples of real collaborators. Frequently they mock out all their unit
tests and write a few integration tests. Then their unit tests become brittle
and annoying and so they delete them, leaving only a few integration tests.
This leads a lot of people with the impression that only integration tests are
useful. If we can back up and redefine "unit test", then the problem
disappears.

I read your "rant". Don't even get me started on BDD :-) Originally people had
problems understanding the purpose of TDD because the word "test" had them
confused. They would think, "I need to write tests to ensure that this is
working". They didn't think about it in terms of clamping the behaviour so
that it doesn't change when you are working on another part of the system. For
that reason, a lot of people discussed changing the word "test" to something
else that truly embodied what TDD was all about. Many people hit on the word
"behavior" -- you want to document the current behaviour of your "units" (at
different levels of abstraction). Somehow _this_ got totally confused with
automated acceptance testing! Now we have things like cucumber (which I don't
actually hate, but it accomplishes a _completely different goal than TDD_!)

What really frustrates me is when I talk to people about this stuff and they
think I'm a complete lunatic :-)

------
llarsson
For a person who does not code very large programs in Python, this looks
scary, and hardly like "only one way to do it" where the most elegant
solution gives you what you want. How I import things (or how the libraries I
depend on do!) affects how I can write my tests? Really?

My own Python scripts are typically single-screen in length. And one-off
stuff, where they either work or don't, basically.

Is this stuff really what developers of large-scale Python programs have to
take into account? Or is this blog post misinformed, because there is an
obviously better and standard way?

~~~
zzzeek
no, this blog post is absolutely on point and correct. it can be extremely
inconvenient and complicated to get around using mocks in many situations
where you want to test things, so we use mocks when appropriate. and then you
definitely want to mock at the most specific level possible.

while mocking in a way that is specific to how modules are imported is
technically "fragile", in that it is deeply dependent on the structure of the code
that's being tested, this is not an issue in practice, because the mocks are
present in our test suites that run for every code change. If a code change
moves around module assignments, our test will fail, and we know that we have
to adjust the test to accommodate for the change. With appropriate continuous
integration and code review practices, a broken mock-oriented test can't be
inadvertently pushed into a repository.

This does mean that when using mocks, you need to assert that the mocks were
called in the way you expected, for those cases where the code might silently
stop using the mock in a way that wouldn't otherwise be detectable. The
"os.listdir()" example in this blog is a pretty common case, using mocks to
test code that works with filesystems, where you don't need or want to get
involved with actually creating filesystems which may be a complex and
expensive process, especially if the test suite runs concurrent processes. If
you mock the behavior of "os.listdir" to return a series of results, after the
test code has been exercised, you usually want to assert using mock.mock_calls
that the test code did in fact call the functions expected, unless it's clear
in some other way that the test code definitely used that information.
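A minimal sketch of that pattern (the newest_log function under test is hypothetical, defined inline here rather than in a separate module): patch the dependency, then verify via mock_calls that the code really consulted it.

```python
import os
from unittest import mock


def newest_log(path):
    # Code under test: relies on os.listdir via "import os".
    names = [n for n in os.listdir(path) if n.endswith(".log")]
    return max(names) if names else None


with mock.patch("os.listdir", return_value=["a.log", "b.log", "c.txt"]) as ld:
    assert newest_log("/var/log") == "b.log"
    # Assert the mocked function was called as expected, so the test fails
    # loudly if the code silently stops using it.
    assert ld.mock_calls == [mock.call("/var/log")]
```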

An example of this kind of code that I just helped someone with can be seen
here:
[https://github.com/sqlalchemy/alembic/commit/02a1bf3454acb7b...](https://github.com/sqlalchemy/alembic/commit/02a1bf3454acb7b02942e246c19326630a8f9175#diff-c1de9c971ea0a629f46ecaf7dccad37dR817)
The Alembic test suite has a lot of test cases that go through all the trouble
of building up real directory structures to test things, but that's a lot more
work than just using a mock, so I use mocks where I can get away with the
simpler approach.

~~~
EdSchouten
> The "os.listdir()" example in this blog is a pretty common case, using mocks
> to test code that works with filesystems, where you don't need or want to
> get involved with actually creating filesystems which may be a complex and
> expensive process, especially if the test suite runs concurrent processes.

Alternative: have an abstract base class that describes a file system API.
Have one implementation that builds on top of OS primitives and have another
one that is a mock (potentially auto-generated through some mocking
framework). That way there is no need to monkey-patch standard library
functions at runtime.

I did that within a project of mine, written in Go:

[https://github.com/buildbarn/bb-storage/blob/master/pkg/file...](https://github.com/buildbarn/bb-storage/blob/master/pkg/filesystem/directory.go)

[https://github.com/buildbarn/bb-storage/blob/master/pkg/file...](https://github.com/buildbarn/bb-storage/blob/master/pkg/filesystem/local_directory.go)

Pretty slick that I can now use
[https://github.com/golang/mock](https://github.com/golang/mock) to
automatically generate a mock of that.
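The same shape in Python might look like this sketch (class names are hypothetical): an abstract directory interface, a real implementation over OS primitives, and a hand-written fake for tests, with no monkey-patching anywhere.

```python
import abc
import os


class Directory(abc.ABC):
    @abc.abstractmethod
    def list_names(self):
        """Return the names of the entries in this directory."""


class LocalDirectory(Directory):
    # The real implementation, built on OS primitives.
    def __init__(self, path):
        self.path = path

    def list_names(self):
        return os.listdir(self.path)


class FakeDirectory(Directory):
    # The test double: same interface, canned contents.
    def __init__(self, names):
        self.names = list(names)

    def list_names(self):
        return self.names


def count_entries(directory):
    # Code under test accepts any Directory, real or fake.
    return len(directory.list_names())


assert count_entries(FakeDirectory(["a", "b"])) == 2
```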

~~~
zzzeek
Sure but then I have to write my real library code using an abstraction,
making my code more difficult to read and maintain; a dependency injection
system is then necessary in order to have the correct concrete implementations
set up at runtime.

In this sense, mocks are solving the problem of having code that is full of
dependency-injected AbstractFooBarFileSystemWithExtraPickles style of code,
which is considered to be pretty un-Pythonic. I spent many years with Java and
Spring so I can attest to both sides of this equation.

Those of us using Python are using it because it is an interpreted, dynamic
scripting language. If I'm coding in something more rigid like C or Go, then
I'd expect to have a more complex architecture in order to achieve things that
are fairly simple in a scripting language.

~~~
EdSchouten
> Sure but then I have to write my real library code using an abstraction,
> making my code more difficult to read and maintain;

It depends, right? If you suddenly wanted to let your existing set of classes
read their inputs not from disk, but from some other kind of storage (e.g.,
files embedded in a Zip file), you'd only need to write one extra class and
you're good to go. That would be a lot harder if your code called os.*
directly.

> a dependency injection system is then necessary in order to have the correct
> concrete implementations set up at runtime.

If by dependency injection system you mean invoking one extra constructor in,
say, main() and passing the object along as a handle, sure.

~~~
zzzeek
well again, I grew up on GOF programming and once I grokked how mocks in
Python worked, I was very glad to embrace their approach, which has allowed me
to write much simpler code that is more thoroughly tested; I of course still
use abstractions to a great degree, but I no longer have to build out an
abstraction system when I just want to make sure some fairly straightforward
code is fully tested. I no longer have to build out everything as an
abstraction when such a system is generally YAGNI, I can use the Python
standard library directly.

This allows me to do less work, write and maintain less code, and have better
test coverage. It allows my code to be more fully tested even when it has not
yet been abstracted, if that's what's in store for it. Patching local imports
within the scope of two lines of code is a non-issue thanks to Python context
managers. I have much more complicated examples of code that was already
plenty complicated and mocks allowed me to get it tested quickly and
effectively, instead of having to break it out into even more complexity.

Basically mocks have been all productivity and no downside for me whatsoever,
using the Python standard library mock which is extremely well designed.

------
michaelmcmillan
If you find yourself fighting with mocks, ask yourself: is there a deeper
design problem with my code? I often find that things I can't test easily have
crappy design.

~~~
theptip
Couldn't agree more. I've done a few "coding dojo" sessions where my team and
I start from scratch and write a new set of mock-based tests for a piece of
existing code, and when it starts to get gnarly, it's always been because of
an inconsistent interface, or confusing API of the code under test.

Bob Martin talks about this a lot; your UTs should be thought of as first-
class clients of your objects' APIs. If something is hard to test, it's
probably hard to use, or abstracted at the wrong level.

------
raymondh
I think Ned strikes the right balance, showing risks associated with mock
objects without condemning them outright. There is no doubt that people
sometimes go overboard with mocking and there is no doubt that there are
situations where it is really helpful.

------
alanfranz
The author doesn't have a problem with mocks; he has a problem with monkey
patching. You can use a mock along with dependency injection and never run
into the problems the author describes.

------
gitgud
Mocking is hard, _really_ hard: in order to mock something you need to
imitate its functionality and interface. This means the mock is inherently
tightly coupled to the implementation, which is now another dependency in your
system.

After trying to work with mock databases and file systems, I've personally
found that there's no substitute for the real thing. There's much less
maintenance and greater reliability in spinning up a test environment with the
exact implementation that will be used in the production environment.

There are cases where mocks are the only practical solution, (embedded
systems, distributed systems) but mocking is surely the last resort...

------
raymondh
Python core developer Lisa Roach has a nice how-to video on exactly this
subject:
[https://www.youtube.com/watch?v=ww1UsGZV8fQ](https://www.youtube.com/watch?v=ww1UsGZV8fQ)

------
jabwork
I've wrestled with this problem several times. My conclusion was that mocks
are just fine, but this is a wart in that there isn't "one way to do it".

For the most part I can get by with one rule: always mock __the module__.

    with mock.patch('os.listdir'):

will always work, even if it doesn't accomplish what you want.

    with mock.patch('mymodule.os.listdir'):

will fail if that module does not explicitly _import os_ and instead does
something like _from os import listdir_ (perhaps because a later dev did not
realize importing os directly was actually a requirement for the test and
changed the code).

The rule is not perfect though. In the above case, the error will actually be

    ImportError: No module named os

This can be fixed with e.g.,

    assert hasattr(mymodule, 'os'), "os module is not explicitly imported"

as a preamble to your test but ... it is not perfect by any means.

EDIT: formatting

~~~
nedbat
I don't understand what you mean by "will always work, even if it doesn't
accomplish what you want." Mocking os.listdir will be useless if your product
code's imports don't match it. How is this "one rule" to use?

~~~
jabwork
The os module ships with Python, so mocking os.listdir will always succeed,
even if the code being tested does not use os.listdir.

By mocking mymodule.os.listdir you add a requirement that mymodule actually
import the os module and take advantage of mock.patch failing loudly if it
does not.
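Both behaviors can be demonstrated in a few lines. The sketch below builds a throwaway mymodule (hypothetical, constructed in-memory here) that does `from os import listdir`, then tries each patch target in turn:

```python
import sys
import types
from unittest import mock

# Build a throwaway "mymodule" that does `from os import listdir`,
# as in the scenario above.
mymodule = types.ModuleType("mymodule")
exec(
    "from os import listdir\n"
    "def count(path):\n"
    "    return len(listdir(path))\n",
    mymodule.__dict__,
)
sys.modules["mymodule"] = mymodule

# Patching the stdlib name always succeeds, but mymodule keeps its own
# reference to the original function, so the patch has no effect there.
with mock.patch("os.listdir", return_value=["x"]):
    pass  # mymodule.count still calls the real listdir

# Patching the name in the consuming module is what actually works:
with mock.patch("mymodule.listdir", return_value=["x"]):
    assert mymodule.count("/anything") == 1

# And patching "mymodule.os.listdir" fails loudly, because mymodule
# never did `import os`:
try:
    with mock.patch("mymodule.os.listdir"):
        pass
except ImportError:
    pass  # e.g. "No module named 'mymodule.os'"
```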

------
askvictor
I've always been dubious of using mock for tests which involve external API
calls; at best they require you to reimplement the API according to the
documentation (of which there may be none, or which the API doesn't follow
exactly for edge cases, i.e. the things you're meant to be testing). At worst
you're implementing a very small subset of the behaviour of the API and not
testing how your code responds to the rest of it. But I haven't come across
other solutions (not that I have a heap of experience here, just contributions
to a couple of OSS projects).

~~~
alexanderdmitri
Mocking inputs from APIs is actually great if someone besides the original
developer picks up the codebase, whether at the system level or down to
isolated function stubs, because it documents the inputs the original
developer(s) were expecting and can elucidate the source under test.

Integration tests fill the hole you're pointing out. You can even have
integration tests designed to validate the mock inputs in many cases.

------
jchook
“...there are other approaches to solving the problems of isolating your
product code from problematic dependencies.”

What are some other approaches?

I have followed the redux-saga pattern with success but how else should we
accomplish the same goal?

~~~
j88439h84
See the links at the top of the post. Both are excellent.

------
wojciech_bulaty
Ned's implicit definition of a mock is narrower than the generally accepted
one. He actually described a stub created by monkey-patching. A mock allows
for call verifications as well.

There are 3 main categories of techniques for managing dependent components
used these days:

1\. In-process class/method/function mocks or stubs
([http://xunitpatterns.com/Mocks,%20Fakes,%20Stubs%20and%20Dum...](http://xunitpatterns.com/Mocks,%20Fakes,%20Stubs%20and%20Dummies.html)
and
[https://martinfowler.com/articles/mocksArentStubs.html](https://martinfowler.com/articles/mocksArentStubs.html))

1a. By monkey patching (which is what Ned has demonstrated in his article very
well)

1b. By dependency injection

2\. Over-the-wire API mocks or stubs
([https://en.wikipedia.org/wiki/Comparison_of_API_simulation_t...](https://en.wikipedia.org/wiki/Comparison_of_API_simulation_tools))

3\. Virtual services/simulators
([https://en.wikipedia.org/wiki/Comparison_of_API_simulation_t...](https://en.wikipedia.org/wiki/Comparison_of_API_simulation_tools))

It's worth keeping in mind that all of them are part of a wider group of test
doubles: [https://www.infoq.com/articles/stubbing-mocking-service-virt...](https://www.infoq.com/articles/stubbing-mocking-service-virtualization-differences/)

Other options available for decoupling from test dependencies:

1\. In-memory database [https://en.wikipedia.org/wiki/In-memory_database](https://en.wikipedia.org/wiki/In-memory_database)

2\. Test container
[https://www.testcontainers.org/](https://www.testcontainers.org/)

3\. Legacy in a box [https://www.thoughtworks.com/radar/techniques/legacy-in-a-bo...](https://www.thoughtworks.com/radar/techniques/legacy-in-a-box)

------
mpweiher
Why I don't mock:

[https://blog.metaobject.com/2014/05/why-i-don-mock.html](https://blog.metaobject.com/2014/05/why-i-don-mock.html)

~~~
frenchman99
You're talking about mocking database calls though. In my line of work
(insurance brokerage), we use lots of insurance APIs and they are sometimes
very slow (20+ seconds per call) or completely down at random hours. There is
simply no way around mocking those API calls if you want a fast and reliable
testsuite.

~~~
redis_mlc
> or completely down at random hours

Also describes "dinosaur payment company" API sandboxes (stage).

So we end up testing against the production API with a staff member's credit
card - well, if we want to deploy any time soon.

Or I guess you could mock and cross your fingers that they haven't changed the
API recently without telling you. Payment APIs are the most solid, but that's
a low bar considering the state of third-party APIs in the real world.

If it wasn't hard for enterprises to build and manage APIs, then Google Apigee
and Mulesoft wouldn't be worth billions.

------
pramttl
Beautiful explanation of something that tripped me up in my early days of
using mock/patch. Summary: (1) Variables in Python are names that refer to values.
(2) (For that reason) Mock an object where it is used, not where it is
defined.
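Point (1) in miniature (a sketch; the manual rebinding below is essentially what mock.patch does, plus its automatic restore):

```python
import os

original = os.listdir
alias = os.listdir  # what "from os import listdir" effectively does

# Rebind the name in the os module, as mock.patch("os.listdir") would:
os.listdir = lambda path: ["patched"]

assert alias is original  # the alias still refers to the real function
assert os.listdir("/anywhere") == ["patched"]  # the module's name was rebound

os.listdir = original  # restore, as mock.patch does on exit
```

A patch on `os.listdir` rebinds only the name in `os`; any module that imported the function under its own name keeps pointing at the original, which is why you mock where the name is used.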

------
baxtr
Even though it's not related, I wonder if this applies to UI as well. I often
want to see production-near prototypes.

------
franzwong
Is this Python-specific? Usually I replace the whole module with a mock
instead of its individual functions.

------
blondin
the python mock module is one of those modules i would like to see rewritten
from scratch. you won't get it right from first principles. you always need to
go to the documentation. that's a sign that something is not right imo.

------
c-
"Why a mock doesn't work" *in python because of reasons

fixed the title.

