Very, very often (this is one of the most common sources of errors in large systems), two parts of a program each do the "right" thing within their own assumptions, but when you combine them you get erroneous results.
Unit tests will not help you find them: they cover "is this gear/part working?", but not the machine as a whole, e.g. whether the engine is actually running well...
In my experience, higher-level system tests have a much higher ROI and real-world value than simple unit tests.
Unit tests just test made-up, "hypothetical", simplistic scenarios, and are usually written by the same dev who wrote the functionality, so they will not expose the implementation's blind spots, while higher-level system/integration tests are more likely to exercise real-world scenarios.
And I haven't found that integration (system) tests have a higher ROI – because they're usually much more expensive to write and maintain. And they can be more brittle than your actual code.
If your code depends on an external service, should your integration test actually make a (test) request to the real live service endpoints? Maybe! But the more 'real' the test is, the less of your own test it covers relative to all of the other external dependencies.
I'm not sure there's much disagreement between you, me, and Kent. I can imagine always maintaining some non-zero number of integration tests for suitably large code-bases.
But I think I get what he's getting at in terms of – ideally – wanting to minimize those kinds of tests by transforming previously unreliable parts of the system into something we can use "as a given" (like integer arithmetic).
The brittlest tests I've ever seen have been mock-heavy unit tests. Some of them amount to little more than file-change detectors. Those tests can fail massively even when everything still actually works. They cry wolf so often the team ends up trained to unthinkingly change them so they "go green."
> But I think I get what he's getting at in terms of – ideally – wanting to minimize those kinds of tests by transforming previously unreliable parts of the system into something we can use "as a given" (like integer arithmetic).
A noble goal, but in practice not very different from saying we should aim for formally verified software. After all, that's what we have for the computer-arithmetic example: http://www.cl.cam.ac.uk/~jrh13/slides/jnao-02jun06/slides.pd... (2006):
> Intel uses formal verification quite extensively, e.g. ... verification of...floating-point unit[s]...
Not to knock mocking in general, but the need to replace multiple things at once to test one thing is just a big design smell, and I agree: their usefulness tends to be very limited in terms of logic exercised.
Call me a fuddy-duddy, but I'm generally working with dependency injection when doing OOP, so stubbing out app layers for test contexts feels kinda natural, and I prefer to make my test mocks the old-fashioned way: manually, using inheritance. It keeps the pain high if I'm doing something wrong, keeps the tests focused, and means I'm hammering the object's design from the POV of an inheritor/reuser from the get-go.
I've noticed that the additions necessary to make a manual mock tend to be a good predictor of future changes required in the code-base, while framework based mocks tend to spend a lot of time re-declaring things we already know.
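Here's a minimal sketch (in Python, with invented names, not from any real codebase) of the hand-rolled, inheritance-based mock described above: the dependency is injected, and the test double is an ordinary subclass written by hand rather than generated by a mocking framework.

```python
class RateService:
    """Production interface: fetches a currency conversion rate."""
    def rate(self, from_ccy: str, to_ccy: str) -> float:
        raise NotImplementedError

class FixedRateService(RateService):
    """Hand-written test double: returns a canned rate."""
    def __init__(self, fixed: float):
        self.fixed = fixed

    def rate(self, from_ccy: str, to_ccy: str) -> float:
        return self.fixed

def convert(amount: float, from_ccy: str, to_ccy: str, svc: RateService) -> float:
    # The dependency is injected, so tests can pass the hand-written stub.
    return amount * svc.rate(from_ccy, to_ccy)

# "unit test": exercises convert() against the manual mock
assert convert(10.0, "EUR", "USD", FixedRateService(1.5)) == 15.0
```

Writing `FixedRateService` by hand forces you to design `RateService` so it is actually subclassable, which is exactly the inheritor/reuser pressure the comment describes.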
That's a terrible idea. You've now coupled your tests to someone else's code.
Test the endpoints separately. Wrap the endpoints to allow you convenient dependency injection. Test the wrapper. This wrapper is ideally a very thin layer of boilerplate, that you won't ever touch very much, and shouldn't take much time to write or test.
But this is the key: use the data specifying the expected outputs of your third-party API as input parameters for the test of the wrapper, and the expected outputs from the wrapper as parameters for your integration tests. Keep each set of data in one place, so that changing it for one test changes it for the others.
By using shared test datasets in this way, you prevent modules from getting out of sync with each other, and can still replace/update parts in separation.
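A rough sketch of that shared-data-contract idea (all names and payload shapes here are hypothetical, purely for illustration): one module holds the canned third-party response and the expected wrapper output; the wrapper test asserts raw-in/wrapped-out, and the integration test consumes the very same wrapped data as its input, so updating the contract in one place updates both suites.

```python
# --- contract: the single source of shared test data ---
RAW_API_RESPONSE = {"status": "ok", "payload": {"user_id": 42, "name": "Ada"}}
WRAPPED_OUTPUT = {"id": 42, "name": "Ada"}

# --- wrapper: thin boilerplate around the external endpoint ---
def unwrap(response: dict) -> dict:
    p = response["payload"]
    return {"id": p["user_id"], "name": p["name"]}

# --- wrapper test: raw contract data in, wrapped contract data out ---
assert unwrap(RAW_API_RESPONSE) == WRAPPED_OUTPUT

# --- our own logic, integration-tested against the wrapped contract data ---
def greet(user: dict) -> str:
    return f"Hello, {user['name']} (#{user['id']})"

assert greet(WRAPPED_OUTPUT) == "Hello, Ada (#42)"
```

If the third party changes its response shape, only `RAW_API_RESPONSE` and `unwrap` change; the integration side keeps speaking the `WRAPPED_OUTPUT` contract.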
Your code is coupled to someone else's code. Which yeah, can lead to a lot of problems, but code that doesn't talk to someone else's code is usually not very useful.
The thing is, the fact that your code relies on someone else's, and that their code can change out from under you, is exactly why you need separation in your tests and modularity in your code. Using the system I am describing, you are still working with 'real world' data - you are just asserting it in advance (since the real-world inputs are correlated with the inputs of your integration suite via the edge tests) and ring-fencing your endpoints.
I've actually been part of a software team driven wild by unknown changes to infrastructure endpoints, causing failure of the integration test suite. And then by the brittleness of an integration suite too tightly coupled to inputs.
I've also worked with test suites that failed when run after business hours, but only during Atlantic DST, for the same reasons. :|
Now imagine that your tests hadn't been so tightly coupled, and there were unknown changes to infrastructure endpoints. Suddenly your code starts failing, and you can't tell why, because your test suite is all green.
It's a terrible idea not to. If your code isn't coupled to someone else's code your code isn't doing anything useful.
Moreover, you should test against the code your code is coupled to as realistically as possible, which ideally means testing against the real thing.
Paradoxically, you want your tests separate from the external inputs, because complete control over the inputs is required to systematically and programmatically hit every edge case with every pass of your test suite.
You also need the modularity in your tests and code, so you can fix and replace 3rd party endpoints without being afraid of polluting your code's inputs, or having to rewrite your integration suite. This is where the data contracts come in: they make sure that each module of the system and its partners are speaking the same language, even after they have been replaced or updated to handle new 3rd party behaviours.
A browser bug (for example) may not be our fault, but it is our problem. Therefore, we need to run our tests in real browsers to make sure it actually works, and keep running those tests on new browser releases to find any new bugs. Similarly, mobile apps need to be tested with a variety of different phones.
If you upgrade a dependency and it changes behavior, you're still responsible for fixing or working around it, hopefully before users find out. Lots of interesting bugs happen in the gaps between modules, especially when they're maintained by different teams.
A good reason to test code in isolation is that it makes diagnostics easier. However, if you only do that you're blind to a lot of other bugs.
That said, I think you miss my point - I'm not saying don't test in a real-world environment: I'm saying decouple your program logic from its inputs, and enforce shared test data between top-level edge and the integration test suites to ensure end-to-end integrity (and do the same for every other part of your code that interacts with another part of your code, fwiw).
> Test the endpoints separately. Wrap the endpoints to allow you convenient dependency injection. Test the wrapper.
So, test in the browser, or on the app. Just test the endpoints separately from your integration test, and use the explicitly stated output of the integration test as the input parameters for your browser tests. I'd go so far as to argue that browsers and hardware - where you have the greatest concerns about the stability of your 3rd party APIs - are actually the circumstances where separating edge testing from integration makes the most sense.
Perhaps a better way to explain my process is to think of my set of edge and integration tests as the equivalent of your 'integration' test suite, with the addition of specific contracts (the shared data) to ensure that interaction between modules is also controlled for. You are still testing end-to-end and you are still testing with real-world data, since that's what the edge tests will be returning. You are just explicitly stating what you expect at each boundary of the code.
Unless you're testing different behavior when you're testing the endpoints, what you're doing is essentially writing duplicated test code.
That means two sets of tests (add person browser test / add person API test) that will nearly always break on the same bugs. That means two sets of tests to maintain when you change the code.
It's not that you shouldn't sometimes drill down to test code at a lower level, it's that when you do, you should be drilling down to a reusable abstraction that is independent of the higher level code.
Except the endpoints are a wrapper. They aren't doing anything but acting as a pass through for injection. So, you are talking about a few minutes of boilerplate that will only change when the API changes, as a small price to pay for the ability to inject your endpoint dependencies (or their stubs) without the mess and complexity of proxying.
If it's so objectionable, handle injection some other way, without the wrapper: it's not fundamental to the idea of using data contracts to ensure correct communication between modules, while keeping the edge and core integration suites modularized.
Proxying is less of a mess and less complex than DI. It requires zero changes to your code and can be done with off the shelf tools, it will continue to work if you rewrite your endpoint in a different language and you easily can do stuff like mimic high load or flaky networks to see what happens to your system.
>If it's so objectionable, handle injection some other way
Why do DI at all if you're just doing it for testing purposes?
This is an orthogonal question to whether code should be tested realistically or not, however.
You want to test your code against real external inputs for realism, as well as against mocks that are as realistic as possible, to isolate the external inputs and remove a potential source of test instability.
Ding ding: it's not either/or, it's both/and across a spectrum from dumb to smart to dumb again.
If you use a google rest API in something serious you want to be able to model the error conditions of that API, you also need to verify the current status and content of that API, you also need to verify your apps behaviour separately for both cases.
Loose coupling and DI are important to strike the right balance, but fundamentally you're either testing your solution or you're not.
Yes, that was the point I was making.
>Loose coupling and DI are important to strike the right balance
DI is useful sometimes (e.g. when you have a set of modules you want to hotswap with one another), but it's often just an overused crutch to deal with the inability of unit tests to couple to real things - like an actual REST API endpoint over a loopback interface.
The whole idea that unit tests drive "good code" by making you DI all the things is a pile of shit. Unit tests just make it painful to not do DI because unit tests are themselves a form of tight coupling.
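For what it's worth, coupling a test to "an actual REST API endpoint over a loopback interface" is cheap to sketch in Python with only the standard library: start a tiny but real HTTP server on 127.0.0.1 and point the code under test at it. No DI and no mock library involved; the `/ping` endpoint and its JSON shape are invented for this example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # a real HTTP response over a real socket
        body = json.dumps({"pong": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

def ping(base_url: str) -> bool:
    # code under test: talks to an actual endpoint, not a mock object
    with urllib.request.urlopen(base_url + "/ping") as resp:
        return json.load(resp)["pong"]

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    assert ping(f"http://127.0.0.1:{server.server_port}") is True
finally:
    server.shutdown()
```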
This leads to few mocks, yet I know the expected behavior of the external services.
A strangely underappreciated facet of being a more 'real test' is that you catch more 'real' bugs.
>should your integration test actually make a (test) request to the real live service endpoints?
Ideally you should be able to run your integration test in a mode that mocks it and a mode that calls the real live service so that you can see problems in your code and (and avoid brittleness) and problems with the service, and distinguish them.
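A minimal sketch of that two-mode idea (the `TEST_MODE`/`LIVE_URL` variables and the `/status` contract are assumptions made up for illustration): the same test body runs against either a stub or the live service, selected by an environment variable. A failure in mock mode points at your code; a failure only in live mode points at the service.

```python
import os

def fetch_status_live(url: str) -> str:
    # live mode: actually call the service
    import json
    import urllib.request
    with urllib.request.urlopen(url + "/status") as resp:
        return json.load(resp)["status"]

def fetch_status_stub(url: str) -> str:
    # mock mode: canned answer mirroring the live contract
    return "ok"

USE_LIVE = os.environ.get("TEST_MODE") == "live"
fetch_status = fetch_status_live if USE_LIVE else fetch_status_stub

def test_service_is_up():
    # identical assertion in both modes
    assert fetch_status(os.environ.get("LIVE_URL", "")) == "ok"

test_service_is_up()
```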
I think the worst code bases to modify are the ones with heavy use of mocking and dependency injection. You still get no feedback about the behavior when run against the real system and you have to maintain a potentially complex mock.
Complex mocks shouldn't exist. Mocks of complex interfaces should almost entirely be satisfiable by your domain objects, if those are exposed to your testing 'data layer'. So if complex mocks are required to test, and I've seen many codebases where that's true, then you've generally got a major design issue that is unaddressed in the main codebase.
Personally I use mocks very sparingly, coupled with a few strategic stubs (i.e. hard-coded "databases" of domain objects), and almost always code them manually by hand. That way it's very clear if I'm doing something dumb, the system's DI is leveraged, and the mock is highlighting design issues in a reuse/inheritance context.
That's a smell for sure. I've got a codebase with lots of 'mocks' and a good amount of dependency injection, but I don't feel like it's hard to modify. In fact, the mocks and dependency injection make testing possible. But then all of the mocks are really dumb, e.g. implementing an interface in the 'dumbest' way possible.
The most complex mock behavior I have is return a value supplied as a parameter to this object's constructor. The mock then is really just a way to explicitly encode assumptions about (possible) behavior of whatever it is being mocked or injected.
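The "dumbest possible mock" pattern described above can be sketched like this (names such as `Clock` and `FixedClock` are illustrative): the double's entire behavior is to return the value handed to its constructor, so the test reads as an explicit, visible assumption about the dependency.

```python
class Clock:
    def now_hour(self) -> int:
        raise NotImplementedError  # real impl would consult the system clock

class FixedClock(Clock):
    """Dumbest possible mock: returns a constructor-supplied value."""
    def __init__(self, hour: int):
        self._hour = hour  # the one assumption this mock encodes

    def now_hour(self) -> int:
        return self._hour

def greeting(clock: Clock) -> str:
    return "Good morning" if clock.now_hour() < 12 else "Good afternoon"

assert greeting(FixedClock(9)) == "Good morning"
assert greeting(FixedClock(15)) == "Good afternoon"
```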
But the more 'real' the test is, the more of your own assumptions about how your dependencies work get tested. Assumptions are where most bugs are born and hide.
PS: Sorry for the multiple replies. HN is being glitchy and my previous comment kept getting truncated.
Very well put—I like this.
Integration tests are a godsend when refactoring something that used to work in the first place. I'm currently switching 3rd party API providers in a system developed by somebody else a long, long time ago - all the gritty corner cases need to continue working. The documentation is way outdated, especially the design docs. But, thankfully, the system has really good integration tests; so I'm super comfortable replacing modules one at a time, knowing that functionality is preserved.
Integration tests have their role, and, while they shouldn't be there to validate your building blocks, they should be there to validate that the customer is getting the functionality they're paying for.
Like C++ itself, it allows implementation freedom.
The result is that you get to actually test any assumptions you make.
Approximately all of the bugs I encounter in real life would have been prevented by integration tests but not unit tests. Unfortunately my organization only believes in unit testing, so 95% of our test code mostly just exercises its own mocks.
Unit tests are great for those little ... 'units' that you can feed specific test cases.
I've found that the code that needs lots of mocks to be tested is exactly that which is best covered by integration tests.
How much time and effort goes into testing and otherwise validating the mocks?
A unit test verifies that your Q-tip is made of cotton, and that the cotton is soft and small enough to fit inside an ear to a depth that will fetch wax. Another unit test might confirm that a swabbed substance is actually ear wax, by reporting the qualities of a verified sample of earwax, and maybe also a sample of candle wax as a true negative.
An integration test verifies whether you are allowed to swab a specific person's ear, that they still HAVE at least one ear (but check for both, YES. EVERY SINGLE TIME.), that the ear is healthy enough to tolerate a cotton swab, that the person will hold still, and wait for you to finish swabbing (and emergency procedures for what happens if they violently react to suddenly getting stung by a bee in the middle of being swabbed), and that the ear in fact HAS wax to swab, before swabbing.
One preliminary unit test tells you that you're not holding a knife. The integration tests do almost everything else. You need to authenticate (ask first, maybe this alone tells you they have ears... OR NOT), connect to the network resource (reach your arm out with Q-tip in hand, and approach the ear), start the transaction (apply pressure to someone else's ear), complete the transaction (extract a sample of ear wax), and check the response for your request (inspect the earwax specimen). One last unit test to make sure you got back wax, and not blood.
Sounds disgusting, right? It is. And you can keep a rubber ear in the cabinet, as a stub test target, sure. We all understand the textbook definition of the noun (earwax, integers, money).
Real ears still need to be cleaned.
Current professional wisdom is that you shouldn't even bother cleaning your ears because wax buildup is healthier and the body's own wax removal process is good enough.
It really isn't, though. If I go more than a few days without cleaning my ears they get really crusty and it gets harder to hear.
Ecommerce providers often follow a process of cycling an immaterial test product through a real credit card for pennies, with a petty cash credit card, to exercise the purchase/refund cycle, end-to-end, depending on the activities of the project. A runaway batch process with an infinite loop of purchases is bad, infinite refunds is even worse.
Sometimes these tests are needed to ensure that the service account exists on the third party system, is recognized, and has limited permissions for a fixed, restricted set of API calls. The tests might be run only during a release deployment, against live production, but more frequently against a third party sandbox host while coding. This is a very common pattern.
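A sketch of that penny-transaction smoke test, against a hypothetical sandbox client (`SandboxPayments` and its methods are invented for illustration; real providers have their own SDKs). The safety properties the comment warns about are encoded directly: the loop is strictly bounded, and every charge is paired with its refund.

```python
class SandboxPayments:
    """Stand-in for a payment provider's sandbox API client."""
    def __init__(self):
        self.charges = {}
        self._next_id = 0

    def charge(self, cents: int) -> str:
        self._next_id += 1
        charge_id = f"ch_{self._next_id}"
        self.charges[charge_id] = cents
        return charge_id

    def refund(self, charge_id: str) -> int:
        return self.charges.pop(charge_id)

def purchase_refund_smoke_test(client, amount_cents=1, rounds=3):
    # bounded loop: no runaway purchases, and every charge is refunded
    for _ in range(rounds):
        charge_id = client.charge(amount_cents)
        refunded = client.refund(charge_id)
        assert refunded == amount_cents
    assert client.charges == {}  # nothing left un-refunded

purchase_refund_smoke_test(SandboxPayments())
```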
That sounds familiar. I'm sure that was being done at the last place I worked, an e-commerce 'agency'.
This is actually a strength of a QA team. They can build test plans and execute them on a different technology than the one your dev team built the system with and on. This is also why acceptance testing is so important. You want to get as close as possible to capturing "in the wild" users using your system.
Instead, I often find people that try to make "units" of code not "Money" or "Transaction" but crap like "WellsFargoTransaction" or "EuroToUSDollarTransaction" complete with separate classes for each of the things that those imply.
And the amusing dig against mutable state is... well, unnecessary? It didn't even really do anything other than signal to a certain crowd that he is on their side.
Integration tests may be a symptom of poor design. I don't see it, though. Integration tests should exist to test the assumptions that your unit tests rely on. Because, by necessity, your unit tests will be making assumptions about the rest of the systems they interact with. To claim that some "pure" design can get you away from needing to test those assumptions is interesting debate starting rhetoric, but a dangerous goal to put in people's minds.
It is funny, because I do think this is why we are constantly reimplementing things in the new languages, instead of finding ways to connect old programs.
That is, people are so worried about bugs in hooking grep up to their system, that they instead reimplement all the bugs building grep into their system. (Where grep is an easy to type example from my phone. :) )
GIVEN a function that divides one number by another, fully unit tested, and a second fully unit tested function whose output it consumes,
WHEN I integrate the two functions together,
THEN I should get back a number.
This is why you need integration tests. Interaction of fully tested units does not guarantee that they can be integrated together in arbitrary ways without the possibility of bugs being introduced from the integration.
The above example integration test would reveal a DivideByZero error. This is a super-simplistic integration of two simple functions; now imagine a massive enterprise system and how many bugs could be due to the integration of its parts alone, even if each part individually is one hundred percent correct and bug-free.
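The GIVEN/WHEN/THEN example above can be made concrete with a sketch like this: both functions pass their unit tests in isolation, yet composing them raises `ZeroDivisionError` for an input neither unit suite exercised.

```python
def count_matches(items, predicate):
    """Unit-tested: returns how many items satisfy the predicate."""
    return sum(1 for x in items if predicate(x))

def average(total, count):
    """Unit-tested (for count > 0): returns total / count."""
    return total / count

# the unit tests pass individually
assert count_matches([1, 2, 3], lambda x: x > 1) == 2
assert average(10, 2) == 5

# integration: average of matching items -- fine until nothing matches
def average_of_matches(items, predicate):
    matching = [x for x in items if predicate(x)]
    return average(sum(matching), count_matches(items, predicate))

assert average_of_matches([2, 4], lambda x: x > 1) == 3
try:
    average_of_matches([0], lambda x: x > 1)  # zero items match
    raise AssertionError("expected ZeroDivisionError")
except ZeroDivisionError:
    pass  # the bug lives in the integration, not in either unit
```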
Unit tests are for standalone modules. Integration tests are for apps. It's definitely good to factor stuff out into modules where you can, but ultimately you'll still want to check that everything works when it's all hooked up together.
I guess the alternate approach would be writing "mocks" for the submodules, so the higher-level app can be treated as a module itself, and unit tested rather than integration tested. I've found that to be difficult and error-prone compared to integration testing (or normal unit testing for that matter) -- as you say, unexpected interactions between modules can easily fall through the cracks.
BTW, nobody is saying not to use integration tests. But unit tests are faster, more stable, and easier to maintain and execute.
It's the test of the user of the function that is incomplete.
> I take this as a challenge. I’m happy to write integration tests. I insist on it. But a part of me registers the need for integration tests as a failure or limitation of my design skills. I’ll put that frontier in the back of my mind and a month or a year or a decade later bing I’ll figure out how to raise the level of abstraction, put more of the system in that happy state where I have complete confidence in it, and I’ll have new tools for thinking. Yippee!
He is trying to figure out which design abstractions are yet to be invented. It is a challenge not a criticism.
Wherever it is, it sounds wonderful.
Funnily enough, it only ever bites me when I want to read something like this. Or use some site or service that requires auth via Facebook.
The idea of logging-in to Facebook now, having not done so at all in ... maybe a year? – I just expect to be overwhelmed with the volume and underwhelmed with the significance of whatever it is my 'friends' are posting that it's just easier to never login again.
Rather, I think the essay accidentally supports something else entirely: developers easily get distracted testing the wrong things and thinking about testing as being easier, more valuable, or less fallible than it is.
All automated tests (unit, integration, functional ui) are regression tests. They are not there to find problems when you write code; they are there to surface problems when you change code.
The distinction between the types of tests helps you find at what level the problem lies.
One of the best skills of the troubleshooter is to break down interactions into layers, and then troubleshoot each layer.
Organizing tests in a similar fashion aids in troubleshooting.
I wish I had a way to get this across to people. Even Glenford Myers, who I otherwise have great respect for, wrote some forty years ago that a test that doesn’t find any bugs is a waste of time. Every formal test I’ve written is a “waste of time” because if there’s a bug, I already found it while writing the test. The actual test is there simply to make sure the bug doesn’t come back.
If we can’t get this simple idea right, we can’t have a productive discussion on unit tests vs $OTHER_TESTS, etc.
I think we are all agreed that the way to deal with complex problems is to break them down into smaller, independently-solvable problems. An inevitable consequence of this is that there will be at least one level of abstraction that is concerned with the interaction of little-problem solvers, and things can go wrong even if all the components are working as you think they should. Complex systems generally have emergent properties (that's what we make them for), and some of those emergent properties might be bugs.
In fact, the more you practice divide-and-conquer, the more your time will be spent on assembling units and dealing with the issues of their interaction, rather than in making them.
If some of the assemblages have broad applicability, we make them units - for example, we can use matrices to simplify a broad range of mathematical processes that would otherwise have to be done as a mass of bespoke scalar operations - but the integration testing must be done (at least once) before they can be used as units.
A system is never going to be abstract generalities all the way up, however. At some point, towards the top of your abstraction hierarchy, you will be solving a unique problem - using matrices in a control system for a specific vehicle, for example - and that is going to need integration testing.
Insofar as integration testing is a consequence of divide-and-conquer, we might say that the inability to create unit tests would be a symptom of poor design, as it would indicate that the system is not composed of self-contained elements having well-defined interfaces.
When people start arguing about "unit" vs "integration" tests, I think about Smalltalk, the home of the original xUnit testing framework: In Smalltalk, everything is an object. The constant integer value zero is an instance of a class. And your application may define new methods on that class. So in Smalltalk, you literally can't do anything without using some number of classes, many of which may have methods defined by your application. Also, in assembly language, the only abstractions handed to you are bytes, words, and various CPU status flags. The "units" you're working with are extremely low level.
So I conclude that the dividing line between "unit" and "integration" is largely arbitrary, and subject to reinterpretation, depending on the tools and libraries we happen to be using, and possibly how familiar and comfortable we are with them.
Never thought I’d be disagreeing with Kent Beck about anything in his bailiwick, but there you go.
For instance, in my company we have developed an internal tool that allows external translators to directly translate resource strings in our database and VCS. Parts of this system are covered by unit tests (like the parts recognizing what the language of a file is), but the interop with the VCS is not something that makes sense to isolate. Therefore the only tests for those parts are the integration tests.
At one extreme, some people think it means that you have your entire system running and then you have some headless bot simulate clicks and evaluate the system's behavior based on the UI...
From my point of view, any test that traverses through the logic of more than one kind of object is an integration test. Primitive types don't count.
I actually find integration tests much more useful than unit tests. It's unusual for me to come across a unit of code that is so complex on its own that it needs to be tested in isolation. I think that simple units of code are actually a sign of good design.
The simpler your unit of code is, the less useful your unit test becomes. By that definition, I reach the opposite conclusion as the author's.
The problem with unit tests is that they assume that most of the complexity lies in individual functions.
From my experience, most of the complexity and unpredictability of software lies in the wiring logic between components.
Most difficult software bugs are not the result of a function not working correctly; most of the time, they are simply the result of a function not being used correctly, e.g. someone makes an incorrect assumption.
Yes, with a lot of work and time, we can come up with very good abstractions so that all code is easy to understand and follow and unit tests are all we need. But we don't have that time, nor often the skill needed. What we need is a running product, today, for our customers.
Integration tests are a crutch. Some of us need crutches. And I don't feel like I should be ashamed to need one sometimes.
Not really. He explicitly says 'I’m happy to write integration tests. I insist on it'; in other words Kent Beck is admitting he doesn't always know how to perfect the underlying layers' design so as to push 'integration tests' towards being unit tests. But if he's right about the nature of the distinction, ie. that apparent 'integration tests' are really a symptom of underlying design limitations, it's surely a good thing to be aware of?
There are many areas where I do know my own code is problematic, but for lack of time or competence I can't improve it now. I don't believe I would be better off not knowing there was a problem.
There's no shame for me in using integration tests. They just hint at an alternate universe where the design is different and they either disappear entirely or become unit tests. So today isn't the day that happens. Okay. "Perfect" is a verb.
My 'integration tests' almost always cross the boundaries of two or more (theoretically) well-defined APIs. If they're within those boundaries, well, they're not really integrating anything, so I would just consider them a unit test.
Do you have any examples of this?
I have yet to see one production-ready, real-world code example from such design gurus of the past: Kent Beck, Bjarne Stroustrup, Scott Meyers.
I think that in 2017 you shouldn't be entitled to your opinion about design if you have nothing to back up your cases. Sadly, this also covers me with this throwaway account, but bear with me for a while.
When they started their journey the world was different and chains of thought that led them to their current points of view are long obsolete.
We are living in a different world with different values. My own experience with integration tests has led me to believe that I cannot afford to skip them. There are far more moving parts nowadays, and it's better to learn that something introduced breaking changes from tests than from angry customers: something along a long chain of dependencies that you did not even consciously know you were depending on.
Now, some guy who is known only for some books he wrote in times of yore comes in and tells me I am doing it wrong, solely on the basis of his non-existent experience and unfounded authority? We already know what comes of unsolicited advice like that. We wasted decades on OOP modeling, UML, and tons of other things that never came off.
There are now widespread ideas that he either conceived or popularized, but even these mutated to the point of unrecognizability.
How about we move on?
If you really meant no offense, you need to work harder at giving others the benefit of the doubt and watching out for ignorance masquerading as knowbetterness. We all need to work on these things, of course, but comments that go below minimum levels are not welcome on HN.
I too initially felt a bit weird about the insinuation that "Integration tests are a symptom of poor design", but it's worth pointing out that that's not really the title of his FB note: it's '"Unit" Tests?'. * In fact, going back and reading it a 2nd time, I can't really find anywhere where he's saying you're "doing it wrong" if you're writing integration tests - in fact he says he writes them himself. I think he's just relating a realization he had, that a test should perhaps only be considered an "integration test" if there are systems it depends on that we're not sure are bulletproof yet, i.e. we wonder how well they "integrate". For example, even our "Unit" tests rely on relatively complex systems, e.g. the interpreter of the scripting language you're using. This fits with my (limited) understanding of software testing.
It did seem like he was kind of presuming that pure functional implementations (e.g. of the Transactions objects in his example) would be so much better and more airtight that they probably wouldn't even need testing, but I'm not sure he was really implying that; perhaps he was just using that example to convey that once you really feel like you know the behavior of a subsystem, you take it for granted, and the tests right above it no longer have to "worry" about "integrating" it.
* To be fair, I guess it is KentBeck himself that posted this with that name, so that is a bit "considered harmful"-ish. Not sure what else you'd call it though. Anyways, I enjoyed his reflection, and didn't find it that preachy myself.
The point at which integration tests become important is not really a question of bullet-proof-ness. It's a question of complexity. When the underlying system has sufficiently many moving parts/degrees of freedom, at some point, you need to see what it does when your system interacts with it in a certain way.
Generally this is not necessary for integers. In most domains, you're not going to come upon any boundary conditions in the way they are transformed or compared. Transactions are also a pretty simple concept with little in the way of moving parts. If your system doesn't rely on their behavior near boundary conditions, maybe they are transparent enough that you don't have to test your system's interaction with the transaction concept. But there are a lot of underlying systems that are not like that.
Your point about up-to-date examples is well taken. Finding good examples is the hardest part of technical writing for me. As I work on Facebook and Instagram I will keep my eyes open for clear examples of the same principles, because the principles really are the same regardless of shifts in technical fashion. You'll have the opportunity to learn that in the years to come.
I guess the ultimate advice should be given by people like Linus Torvalds who run large projects.
Scott Meyers is a C++ guy. I will bet that nowadays most of C++ programmers are in game development, HFT or other computationally expensive performance oriented areas, and there is this data-oriented design movement which goes against almost everything he ever said. There are entire constructs of language he is proficient in that are never used in real world.
> I’m happy to write integration tests. I insist on it. But a part of me registers the need for integration tests as a failure or limitation of my design skills.
He also almost exclusively talks and writes about C++; he's not telling you anything about integration tests.
You just seem ignorant by including his name on that list.