Hacker News
Integration tests are a symptom of poor design (facebook.com)
71 points by KentBeck 10 months ago | 103 comments

I can't agree with this. In his example (Transactions), he assumes that there is perfect knowledge of the underlying data and perfect knowledge of the assumptions made by the teams in the corner cases of each module, which is not the case in practice.

Very often (it's one of the most common sources of errors in programming large systems), two parts of a program are each doing the "right" thing within their own assumptions, but when combined they produce erroneous results.

Unit tests will not help you find them, as they cover the "is this gear/part working", but not the whole machine itself, e.g. is the engine actually running well...

In my experience, higher level system tests have a much higher ROI and real-world value than simple unit tests.

Unit tests just test for made up "hypothetical" simplistic scenarios, and usually are written by the same dev. that did the functionality, and they will not expose blind spots on the implementation, while higher level system/integration tests are more likely to test on real world scenarios.

I don't think you're really disagreeing with Kent.

And I haven't found that integration (system) tests have a higher ROI – because they're usually much more expensive to write and maintain. And they can be more brittle than your actual code.

If your code depends on an external service, should your integration test actually make a (test) request to the real live service endpoints? Maybe! But the more 'real' the test is, the less of your own test it covers relative to all of the other external dependencies.

I'm not sure there's much disagreement among you, me, and Kent. I can imagine always maintaining some non-zero number of integration tests for suitably large code-bases.

But I think I get what he's getting at in terms of – ideally – wanting to minimize those kinds of tests by transforming previously unreliable parts of the system into something we can use "as a given" (like integer arithmetic).

> And they can be more brittle than your actual code.

The brittlest tests I've ever seen have been mock-heavy unit tests. Some of them amount to little more than file-change detectors. Those tests can fail massively even when everything still actually works. They cry wolf so often the team ends up trained to unthinkingly change them so they "go green."

> But I think I get what he's getting at in terms of – ideally – wanting to minimize those kinds of tests by transforming previously unreliable parts of the system into something we can use "as a given" (like integer arithmetic).

A noble goal, but practically not a lot different from saying we should aim for formally verified software. After all, that's what we have for computer arithmetic, for example: http://www.cl.cam.ac.uk/~jrh13/slides/jnao-02jun06/slides.pd... (2006):

> Intel uses formal verification quite extensively, e.g. ... verification of...floating-point unit[s]...

My experience with unit tests is that they force a developer to think through their logic. So the tests themselves, once passing, don’t help that much (though a little), but the act of writing them did have high ROI. The coverage metrics also help management understand what is baked.

> The brittlest tests I've ever seen have been mock-heavy unit tests

Not to knock mocking in general, but the need to replace multiple things at once to test one thing is just a big design smell, and I agree: their usefulness tends to be very limited in terms of logic exercised.

Call me a fuddy-dud, but I'm generally working with dependency injection when doing OOP so stubbing out app layers for test contexts feels kinda natural, and I prefer to make my test mocks the old fashioned way: manually using inheritance. It keeps the pain high if I'm doing something wrong, keeps the tests focused, and means I'm hammering the object's design from the POV of an inheritor/reuser from the get-go.

I've noticed that the additions necessary to make a manual mock tend to be a good predictor of future changes required in the code-base, while framework based mocks tend to spend a lot of time re-declaring things we already know.
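
For readers who haven't seen the hand-rolled style, here is a minimal Python sketch of a manual mock built by inheritance. Every class and method name in it (MailGateway, RecordingMailGateway, SignupService) is invented for illustration and is not from the comment above.

```python
class MailGateway:
    """Hypothetical production collaborator that would talk to a mail server."""

    def send(self, to, body):
        raise NotImplementedError("would contact a real mail server")


class RecordingMailGateway(MailGateway):
    """Hand-written test double: subclass and override only what the test needs."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        # Record the call instead of sending anything.
        self.sent.append((to, body))


class SignupService:
    def __init__(self, mail_gateway):
        # The collaborator is injected, so a test can pass in a subclass.
        self.mail_gateway = mail_gateway

    def register(self, email):
        self.mail_gateway.send(email, "Welcome!")
        return email.lower()


gateway = RecordingMailGateway()
service = SignupService(gateway)
assert service.register("Ann@Example.com") == "ann@example.com"
assert gateway.sent == [("Ann@Example.com", "Welcome!")]
```

If the subclass needs a lot of overrides to be usable, that is the "pain" signal the comment describes.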

> If your code depends on an external service, should your integration test actually make a (test) request to the real live service endpoints?

That's a terrible idea. You've now coupled your tests to someone else's code.

Test the endpoints separately. Wrap the endpoints to allow you convenient dependency injection. Test the wrapper. This wrapper is ideally a very thin layer of boilerplate, that you won't ever touch very much, and shouldn't take much time to write or test.

But this is the key, however: use the data specifying the expected outputs of your third party API as input parameters to the test of the wrapper, and expected outputs from the wrapper as parameters for your integration tests. Keep each set of data in one place, so that changing it for one test, changes it for the others.

By using shared test datasets in this way, you prevent modules from getting out of synch with each other, and can still replace/update parts in separation.
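
A rough Python sketch of how such shared test data might look, assuming a made-up payment API. The payload shape, wrap_response and format_receipt are all hypothetical; the only point is that each expectation is written down once and reused by both tests.

```python
# Shared "data contract": one place that states what the third-party API
# returns and what our wrapper turns that into. All names are hypothetical.
RAW_API_RESPONSE = {"amount_cents": "1250", "currency": "usd"}
WRAPPED_RESULT = {"amount": 12.5, "currency": "USD"}


def wrap_response(raw):
    """Thin wrapper around the third-party payload (the injectable edge)."""
    return {
        "amount": int(raw["amount_cents"]) / 100,
        "currency": raw["currency"].upper(),
    }


def format_receipt(fetch_payment):
    """Core logic; the payment source is injected rather than imported."""
    payment = fetch_payment()
    return f"{payment['amount']:.2f} {payment['currency']}"


def test_wrapper():
    # Edge test: the expected API output is the wrapper's input.
    assert wrap_response(RAW_API_RESPONSE) == WRAPPED_RESULT


def test_receipt():
    # Integration test: the wrapper's expected output is this test's input.
    assert format_receipt(lambda: WRAPPED_RESULT) == "12.50 USD"


test_wrapper()
test_receipt()
```

Changing WRAPPED_RESULT forces both tests to be revisited together, which is what keeps the two sides from drifting apart.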

> You've now coupled your tests to someone else's code.

Your code is coupled to someone else's code. Which yeah, can lead to a lot of problems, but code that doesn't talk to someone else's code is usually not very useful.

> You've now coupled your tests to someone else's code.

And so your tests actually demonstrate working code!

Right, but what I'm saying is, it's even worse than that. Not only are your tests coupled to someone else's code, but your code itself is so coupled.

The trivial realities of the actual world may chill my heart to the bone, but even I can accept this sad inevitability ;)

The thing is, the fact that your code relies on someone else's, and that their code can change out from under you, is exactly why you need separation in your tests and modularity in your code. Using the system I am describing, you are still working with 'real world' data - you are just asserting it in advance (since the real world inputs are correlated with the inputs of your integration suite via the edge tests) and ring-fencing your endpoints.

I've actually been part of a software team driven wild by unknown changes to infrastructure endpoints, causing failure of the integration test suite. And then by the brittleness of an integration suite too tightly coupled to inputs.

I've also worked with test suites that failed when run after business hours, but only during Atlantic DST, for the same reasons. :|

> I've actually been part of a software team driven wild by unknown changes to infrastructure endpoints, causing failure of the integration test suite.

Now imagine that your tests hadn't been so tightly coupled, and there were unknown changes to infrastructure endpoints. Suddenly your code starts failing, and you can't tell why, because your test suite is all green.

>That's a terrible idea. You've now coupled your tests to someone else's code.

It's a terrible idea not to. If your code isn't coupled to someone else's code your code isn't doing anything useful.

Moreover, you should test against the code your code is coupled to as realistically as possible, which ideally means testing against the real thing.

>It's a terrible idea not to. If your code isn't coupled to someone else's code your code isn't doing anything useful.

Paradoxically, you want your tests separate from the external inputs, because complete control over the inputs is required to systematically and programmatically hit every edge case with every pass of your test suite.

You also need the modularity in your tests and code, so you can fix and replace 3rd party endpoints without being afraid of polluting your code's inputs, or having to rewrite your integration suite. This is where the data contracts come in: they make sure that each module of the system and its partners are speaking the same language, even after they have been replaced or updated to handle new 3rd party behaviours.

You're missing a fundamental difference in philosophy. Why should we only care about bugs in our code? They aren't the only bugs our users care about.

A browser bug (for example) may not be our fault, but it is our problem. Therefore, we need to run our tests in real browsers to make sure it actually works, and keep running those tests on new browser releases to find any new bugs. Similarly, mobile apps need to be tested with a variety of different phones.

If you upgrade a dependency and it changes behavior, you're still responsible for fixing or working around it, hopefully before users find out. Lots of interesting bugs happen in the gaps between modules, especially when they're maintained by different teams.

A good reason to test code in isolation is that it makes diagnostics easier. However, if you only do that you're blind to a lot of other bugs.

I believe our disagreement is fundamentally one of semantics. However, I'll concur that I have not used this method for front end or mobile or embedded code yet, and there may be problems that I haven't anticipated.

That said, I think you miss my point - I'm not saying don't test in a real-world environment: I'm saying decouple your program logic from its inputs, and enforce shared test data between top-level edge and the integration test suites to ensure end-to-end integrity (and do the same for every other part of your code that interacts with another part of your code, fwiw).

> Test the endpoints separately. Wrap the endpoints to allow you convenient dependency injection. Test the wrapper.

So, test in the browser, or on the app. Just test the endpoints separately from your integration test, and use the explicitly stated output of the integration test as the input parameters for your browser tests. I'd go so far as to argue that browsers and hardware - where you have the greatest concerns about the stability of your 3rd party APIs - are actually the circumstances where separating edge testing from integration makes the most sense.

Perhaps a better way to explain my process is to think of my set of edge and integration tests as the equivalent of your 'integration' test suite, with the addition of specific contracts (the shared data) to ensure that interaction between modules is also controlled for. You are still testing end-to-end and you are still testing with real-world data, since that's what the edge tests will be returning. You are just explicitly stating what you expect at each boundary of the code.

>So, test in the browser, or on the app. Just test the endpoints separate from your integration test

Unless you're testing different behavior when you're testing the endpoints, what you're doing is essentially writing duplicated test code.

That means two sets of tests (add person browser test / add person API test) that will nearly always break on the same bugs. That means two sets of tests to maintain when you change the code.

It's not that you shouldn't sometimes drill down to test code at a lower level, it's that when you do, you should be drilling down to a reusable abstraction that is independent of the higher level code.

> Unless you're testing different behavior when you're testing the endpoints, what you're doing is essentially writing duplicated test code.

Except the endpoints are a wrapper. They aren't doing anything but acting as a pass through for injection. So, you are talking about a few minutes of boilerplate that will only change when the API changes, as a small price to pay for the ability to inject your endpoint dependencies (or their stubs) without the mess and complexity of proxying.

If it's so objectionable, handle injection some other way, without the wrapper: it's not fundamental to the idea of using data contracts to ensure correct communication between modules, while keeping the edge and core integration suites modularized.

>without the mess and complexity of proxying

Proxying is less of a mess and less complex than DI. It requires zero changes to your code and can be done with off the shelf tools; it will continue to work if you rewrite your endpoint in a different language, and you can easily do stuff like mimic high load or flaky networks to see what happens to your system.

>If it's so objectionable, handle injection some other way

Why do DI at all if you're just doing it for testing purposes?

What you are arguing for is that code should be loosely coupled. I generally agree with this (although, like DRY, its validity is context sensitive and one can take this approach too far).

This is an orthogonal question to whether code should be tested realistically or not, however.

You want to test your code against real external inputs for additional realism as well as mocks that are as realistic as possible to isolate the external inputs and remove a potential source of test instability.

> ...You want to test your code against real external inputs for additional realism as well as mocks that are as realistic as possible...

Ding ding: it's not either/or, it's both/and across a spectrum from dumb to smart to dumb again.

If you use a Google REST API in something serious, you want to be able to model the error conditions of that API, you also need to verify the current status and content of that API, and you need to verify your app's behaviour separately for both cases.

Loose coupling and DI are important to strike the right balance, but fundamentally you're either testing your solution or you're not.

>Ding ding: it's not either/or, it's both

Yes, that was the point I was making.

>Loose coupling and DI are important to strike the right balance

DI is useful sometimes (e.g. when you have a set of modules you want to hotswap with one another), but it's often just an overused crutch to deal with the inability of unit tests to couple to real things - like actual REST API endpoints over a loopback interface.

The whole idea that unit tests drive "good code" by making you DI all the things is a pile of shit. Unit tests just make it painful to not do DI because unit tests are themselves a form of tight coupling.

I usually have this concept of integration tests covering the interaction of modules you are in control of. 3rd party services are tested using something I call "edge tests".

This leads to few mocks, yet I know the expected behavior of the external services.

>But the more 'real' the test is, the less of your own test it covers relative to all of the other external dependencies.

A strangely underappreciated facet of being a more 'real test' is that you catch more 'real' bugs.

>should your integration test actually make a (test) request to the real live service endpoints?

Ideally you should be able to run your integration test in a mode that mocks it and a mode that calls the real live service, so that you can see problems in your code (and avoid brittleness) and problems with the service, and distinguish them.
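
One way this two-mode idea could be wired up, as a small Python sketch. The USE_LIVE_SERVICE switch, the rate-service classes and convert are all assumptions made for the example; the live client is left as a placeholder rather than a real network call.

```python
import os


class FakeRateService:
    """Stand-in used by default; returns a fixed, known rate."""

    def get_rate(self, base, quote):
        return 2.0


class LiveRateService:
    """Placeholder for a client that would call the real service."""

    def get_rate(self, base, quote):
        raise NotImplementedError("real network call goes here")


def make_rate_service(environ=os.environ):
    # USE_LIVE_SERVICE is a made-up switch; any config mechanism would do.
    if environ.get("USE_LIVE_SERVICE") == "1":
        return LiveRateService()
    return FakeRateService()


def convert(amount, service):
    return amount * service.get_rate("EUR", "USD")


def check_convert(service):
    # The same check runs in either mode. It asserts a property rather than
    # an exact number, because a real rate would change over time.
    result = convert(10, service)
    assert result > 0
    return result


check_convert(make_rate_service({}))  # mocked mode
```

A failure only in live mode then points at the service (or at your assumptions about it), while a failure in both modes points at your own code.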

"And I haven't found that integration (system) tests have a higher ROI – because they're usually much more expensive to write and maintain."

I think the worst code bases to modify are the ones with heavy use of mocking and dependency injection. You still get no feedback about the behavior when run against the real system and you have to maintain a potentially complex mock.

I think the issue you're describing, a massive variance in test and system behaviour, is symptomatic of a poor delineation of the unit/integration test dichotomy. It's almost a skill within a skill, but the moment a mock starts taking on some life or complexity is a moment you should be backing out and refactoring.

Complex mocks shouldn't exist. Mocks of complex interfaces should almost entirely be sustainable by your domain objects if they're exposed to your testing 'datalayer'. So if they're required to test, and I've seen many codebases where that's true, then you've generally got a major design issue that is unaddressed in the main codebase.

Personally I use mocks very sparingly coupled with a few strategic stubs (ie hard coded "databases" of domain objects), and almost always code them manually by hand. That way it's very clear if I'm doing something dumb, the system's DI is leveraged, and the mock is highlighting design issues in a re-use/inheritance context.

> complex mock

That's a smell for sure. I've got a codebase with lots of 'mocks' and a good amount of dependency injection, but I don't feel like it's hard to modify. In fact, the mocks and dependency injection make testing possible. But then all of the mocks are really dumb, e.g. implementing an interface in the 'dumbest' way possible.

The most complex mock behavior I have is return a value supplied as a parameter to this object's constructor. The mock then is really just a way to explicitly encode assumptions about (possible) behavior of whatever it is being mocked or injected.
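
As an illustration of that kind of deliberately dumb mock, a tiny Python sketch. FixedPriceFeed and order_total are hypothetical names, not taken from the codebase being described.

```python
class FixedPriceFeed:
    """A 'dumb' mock: it just returns whatever it was constructed with."""

    def __init__(self, price):
        self._price = price

    def latest_price(self, symbol):
        return self._price


def order_total(quantity, feed, symbol="XYZ"):
    # Code under test; it only assumes the feed has latest_price().
    return quantity * feed.latest_price(symbol)


# Each test states its assumption about the feed explicitly.
assert order_total(3, FixedPriceFeed(10)) == 30
assert order_total(3, FixedPriceFeed(0)) == 0
```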

> But the more 'real' the test is, the less of your own test it covers relative to all of the other external dependencies.

But the more 'real' the test is, the more of your own assumptions about how your dependencies work get tested. Assumptions are where most bugs are born and hide.

PS: Sorry for the multiple replies. HN is being glitchy and my previous comment kept getting truncated.

> But the more 'real' the test is, the more of your own assumptions about how your dependencies work get tested. Assumptions are where most bugs are born and hide.

Very well put—I like this.

IMHO integration tests are not there to validate the functionality that you're building - Kent is right here: if you rely on integration tests to find bugs, either your modules are broken, or your design is bad (e.g. cross-module code paths that break encapsulation, etc).

Integration tests are a godsend when refactoring something that used to work in the first place. I'm currently switching 3rd party API providers in a system developed by somebody else a long, long time ago - all gritty corner cases need to continue working. The documentation is way outdated, especially the design docs. But, thankfully, the system has really good integration tests; so I'm super comfortable replacing modules one at a time, and knowing that functionality is preserved.

Integration tests have their role, and, while they shouldn't be there to validate your building blocks, they should be there to validate that the customer is getting the functionality they're paying for.

In real life, you cannot depend on encapsulation or especially documentation. Interesting behaviour is left undefined or not documented. (This includes things that are hairy to test, such as thread safety and reentrant behaviour.)

Like C++ itself, it allows implementation freedom. The result is that you get to actually test any assumptions you make.

The “benefit” of unit tests telling you exactly where the problem lies is not compelling. All tests should always be passing on master, therefore the problem lies in “git diff master.” If you can’t tell where the problem is that’s breaking an integration test, your changeset is too large or there are too many layers of indirection.

Approximately all of the bugs I encounter in real life would have been prevented by integration tests but not unit tests. Unfortunately my organization only believes in unit testing, so 95% of our test code mostly just exercises its own mocks.

> so 95% of our test code mostly just exercises its own mocks


Unit tests are great for those little ... 'units' that you can feed specific test cases.

I've found that the code that needs lots of mocks to be tested is exactly that which is best covered by integration tests.

>95% of our test code mostly just exercises its own mocks.

How much time and effort goes into testing and otherwise validating the mocks?

Networked resources have listeners. Listeners have ears. Ears have wax.

A unit test verifies that your Q-tip is made of cotton, and that the cotton is soft and small enough to fit inside an ear to a depth that will fetch wax. Another unit test might confirm that a swabbed substance is actually ear wax, by reporting the qualities of a verified sample of earwax, and maybe also a sample of candle wax as a true negative.

An integration test verifies whether you are allowed to swab a specific person's ear, that they still HAVE at least one ear (but check for both, YES. EVERY SINGLE TIME.), that the ear is healthy enough to tolerate a cotton swab, that the person will hold still, and wait for you to finish swabbing (and emergency procedures for what happens if they violently react to suddenly getting stung by a bee in the middle of being swabbed), and that the ear in fact HAS wax to swab, before swabbing.

One preliminary unit test tells you that you're not holding a knife. The integration tests do almost everything else. You need to authenticate (ask first, maybe this alone tells you they have ears... OR NOT), connect to the network resource (reach your arm out with Q-tip in hand, and approach the ear), start the transaction (apply pressure to someone else's ear), complete the transaction (extract a sample of ear wax), and check the response for your request (inspect the earwax specimen). One last unit test to make sure you got back wax, and not blood.

Sounds disgusting, right? It is. And you can keep a rubber ear in the cabinet, as a stub test target, sure. We all understand the textbook definition of the noun (earwax, integers, money).

Real ears still need to be cleaned.

That's an interesting choice of analogy given that boxes of Q-Tips warn you to never try cleaning your ears with a Q-Tip and so do most professionals: http://time.com/4290668/q-tip-ear-wax-removal/

Current professional wisdom is that you shouldn't even bother cleaning your ears because wax buildup is healthier and the body's own wax removal process is good enough.

Come on, people. Work with me, work with me.

Yeah, but it's so satisfying to see what you've produced.

> the body's own wax removal process is good enough

It really isn't, though. If I go more than a few days without cleaning my ears they get really crusty and it gets harder to hear.

Fascinatingly, you are probably suffering thus due to very specific genetics, not bad hygiene, personal variation in earwax, or an inherent quality of earwax: https://en.wikipedia.org/wiki/Earwax#Physiology

I always assumed that warning was just J&J covering their ass.

This was a good metaphor but I don't understand the last line. Our tests can never 'really' clean 'real ears' can they? [Actually, that's kind of an interesting idea. Send a real order thru your inventory, order management, etc. systems ...]

Payment processors frequently conduct "no-capture" test transactions for authorizations of possibly $1, before tokenizing CC info. Arguably, this follows the pattern of an integration test.

Ecommerce providers often follow a process of cycling an immaterial test product through a real credit card for pennies, with a petty cash credit card, to exercise the purchase/refund cycle, end-to-end, depending on the activities of the project. A runaway batch process with an infinite loop of purchases is bad, infinite refunds is even worse.

Sometimes these tests are needed to ensure that the service account exists on the third party system, is recognized, and has limited permissions for a fixed, restricted set of API calls. The tests might be run only during a release deployment, against live production, but more frequently against a third party sandbox host while coding. This is a very common pattern.

> Ecommerce providers often follow a process of cycling an immaterial test product through a real credit card for pennies, with a petty cash credit card, to exercise the purchase/refund cycle, end-to-end, depending on the activities of the project.

That sounds familiar. I'm sure that was being done at the last place I worked, an e-commerce 'agency'.

If you aren't sending real orders through your systems.. you are waiting for your customers to find and report all problems. Right?

This is actually a strength of a QA team. They can build test plans and execute them on a different technology than your Dev team built the system with and on. This is also why acceptance testing is so important. You want to get as close as possible to capturing "in the wild" users using your system.

I have tests that do this :-) Well, not for an order, but with a sort-of-monolith sort-of-microservice system with email, dashboards, reporting, a JS SPA and a lambda/flask/zappa based API that's decoupled from the admin app, we want one live, round trip of all the data flowing and actions along the way... we can point it at staging, production, and I think dev (via docker-compose, but that part wasn't done last I worked on this). It relies on some special hard-coded/pre-setup accounts and stuff, but they're real, just marked as test related in their names so everyone knows not to touch them.

I think the point was to notice the continuum between completely fake stubs that only have enough "real" behavior to facilitate unit tests and a full-fledged production scenario. Integration tests are not the same as production, but they are much closer than unit tests. In integration tests as much is real as possible vs. unit tests where as little is real as possible.

And if all your tests are unit tests, you can't swap to using a sonic ear cleaner without losing your test coverage.

I'd have to register my vote on this not clearing things up. All it did was talk about relatively simple sections of code with well understood boundaries.

Instead, I often find people that try to make "units" of code not "Money" or "Transaction" but crap like "WellsFargoTransaction" or "EuroToUSDollarTransaction" complete with separate classes for each of the things that those imply.

And the amusing dig against mutable state is... well, unnecessary? It didn't even really do anything other than signal to a certain crowd that he is on their side.

Integration tests may be a symptom of poor design. I don't see it, though. Integration tests should exist to test the assumptions that your unit tests rely on. Because, by necessity, your unit tests will be making assumptions about the rest of the systems they interact with. To claim that some "pure" design can get you away from needing to test those assumptions is interesting debate starting rhetoric, but a dangerous goal to put in people's minds.

About the only such pure design would then be fully formally verified and specified, also across module boundaries. Good luck with that.

Completely agreed.

It is funny, because I do think this is why we are constantly reimplementing things in the new languages, instead of finding ways to connect old programs.

That is, people are so worried about bugs in hooking grep up to their system, that they instead reimplement all the bugs building grep into their system. (Where grep is an easy to type example from my phone. :) )

GIVEN a function that returns one or zero, which is fully unit tested.

GIVEN a function that divides one number by another, which is fully unit tested.

WHEN I integrate both functions together.

THEN I should get back a number.

This is why you need integration tests. Interaction of fully tested units does not guarantee that they can be integrated together in arbitrary ways without the possibility of bugs being introduced from the integration.

The above example integ test would reveal a DivideByZero error. This is a super simplistic integration of two simple functions, now imagine a massive enterprise system and how many bugs could be due to the integration of its parts alone, even if individually each part is one hundred percent correct and bug free.
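
The scenario translates almost directly into Python. This is only a sketch with invented function names (one_or_zero, divide, integrated), but it shows both units passing their own checks while the combination fails.

```python
def one_or_zero(flag):
    """Returns 1 or 0; trivially correct on its own."""
    return 1 if flag else 0


def divide(numerator, denominator):
    """Plain division; Python raises ZeroDivisionError for a zero denominator."""
    return numerator / denominator


def integrated(flag):
    # The bug lives here, in the combination, not in either unit.
    return divide(10, one_or_zero(flag))


# Unit-level checks pass.
assert one_or_zero(True) == 1 and one_or_zero(False) == 0
assert divide(10, 2) == 5

# The integration check exposes the failure.
assert integrated(True) == 10
try:
    integrated(False)
except ZeroDivisionError:
    print("integration bug: division by zero")
```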

That's what I was about to say too! But you said it better. :)

Unit tests are for standalone modules. Integration tests are for apps. It's definitely good to factor stuff out into modules where you can, but ultimately you'll still want to check that everything works when it's all hooked up together.

I guess the alternate approach would be writing "mocks" for the submodules, so the higher-level app can be treated as a module itself, and unit tested rather than integration tested. I've found that to be difficult and error-prone compared to integration testing (or normal unit testing for that matter) -- as you say, unexpected interactions between modules can easily fall through the cracks.

Of course, but the point of the original article is that if the given functions work properly in isolation, it won't feel like writing an integration test. It's just an ordinary test.

You're missing a unit test case for your division function -- the case of 0. That test will reveal the function's domain is non-zero numbers. And you know this because you got the fast feedback from your unit test. No integration test using 0 necessary unless you want to test the failure scenario.

BTW, nobody is saying not to use integration tests. But unit tests are faster, more stable, and easier to maintain and execute.

Hmm? The division function actually works as defined: by throwing a floating point exception, returning one of the infinities, or some other well-defined behaviour.

It's the test of the user of the function that is incomplete.

I can feel him searching for the difference, and think he is so close but I have a slightly different take on it. He says over time you become confident enough in something that you don’t bother mocking it. Others have said they don’t bother mocking primitives. I would say you don’t bother mocking values. Objects with mutation, methods that cause side effects, these things need to be mocked, pure values never do. The system clock has been around a long time, and I trust it, but it has mutation, so I will mock it. If I make a new point class that is just an immutable wrapper for two immutable integers I won’t mock it. Pure functions never need to be mocked. Immutable values never need to be mocked. These don’t need time to become trusted, they are created trustworthy.
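
A short Python sketch of that distinction, with an injected clock and an immutable value. Point, is_business_hours and fixed_clock are names made up for the example, and the date used is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Point:
    """Immutable value: safe to use directly in tests, no mock needed."""
    x: int
    y: int


def is_business_hours(now_fn):
    # The clock is injected because its answer changes between calls.
    return 9 <= now_fn().hour < 17


def fixed_clock(hour):
    # Test double for the clock; returns the same instant every time.
    return lambda: datetime(2024, 1, 2, hour, 0, tzinfo=timezone.utc)


assert is_business_hours(fixed_clock(10))
assert not is_business_hours(fixed_clock(20))
assert Point(1, 2) == Point(1, 2)
```

The clock gets a stand-in because it has state that changes underneath you; the Point is used as-is because nothing about it can change.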

Kent is not arguing against integration tests.

> I take this as a challenge. I’m happy to write integration tests. I insist on it. But a part of me registers the need for integration tests as a failure or limitation of my design skills. I’ll put that frontier in the back of my mind and a month or a year or a decade later bing I’ll figure out how to raise the level of abstraction, put more of the system in that happy state where I have complete confidence in it, and I’ll have new tools for thinking. Yippee!

He is trying to figure out which design abstractions are yet to be invented. It is a challenge not a criticism.

It would be a lot easier to evaluate his criticism, sorry, challenge, if he had actually provided an example of a design abstraction that did what he claimed, i.e. turned an integration test into a unit test.

A link (via Internet Archive) in case, like me, you can't access Facebook wherever you are:


>like me, you can't access Facebook wherever you are

Wherever it is, it sounds wonderful.

It's just a state of mind man!

Funnily enough, it only ever bites me when I want to read something like this. Or use some site or service that requires auth via Facebook.

The idea of logging-in to Facebook now, having not done so at all in ... maybe a year? – I just expect to be overwhelmed with the volume and underwhelmed with the significance of whatever it is my 'friends' are posting that it's just easier to never login again.

I uninstalled the Facebook apps from mobile devices, put it in the blacklist for StayFocusd and just recently enabled U2F authentication. All of that makes it just sufficiently difficult to actually get into Facebook that I only look at it once a day. Curiously Facebook seems to regard difficulty logging in as a serious show stopping bug because while testing U2F I made some failed login attempts. They emailed me several times over the next 24h asking if I needed help logging in. Mo logins mo money..

No non-trivial design is ever perfect, and tests are meant to test where the empirical reality (implementation) diverges from the design (expectation). It isn't clear at all how these two things are related in the way suggested, much less to the extent that they can be considered unified under an (as yet undiscovered) abstraction.

Rather, I think the essay accidentally supports something else entirely: developers easily get distracted testing the wrong things and thinking about testing as being easier, more valuable, or less fallible than it is.

Eh... if you have a REST API, for instance, and I'm an ardent user of your API, I'd sure feel a hell of a lot better if you were testing at the REST layer. I'm sure your mocks for your internal helper classes are really swell, but, being a dev, that really doesn't make me believe the thing is actually working.


That's a great tweet. Another one of my favorites illustrating the same point:


Integration tests are meant to exercise your external dependencies, i.e. make sure that the external API you depend on still replies that 2+2=4 as it did before. Unit tests prove that your own code is behaving consistently, and integration tests prove that your external dependencies are still behaving the same too ... The point is to assure that nothing unexpected changed; it's not a direct measurement of the quality of the design.

Why is testing so confusing?

All automated tests (unit, integration, functional ui) are regression tests. They are not there to find problems when you write code; they are there to surface problems when you change code.

The distinction between the types of test helps you find the level at which the problem lies.

One of the best skills of the troubleshooter is to break down interactions into layers, and then troubleshoot each layer.

Organizing tests in a similar fashion aids in troubleshooting.

> All automated tests (unit, integration, functional ui) are regression tests

I wish I had a way to get this across to people. Even Glenford Myers, who I otherwise have great respect for, wrote some forty years ago that a test that doesn’t find any bugs is a waste of time. Every formal test I’ve written is a “waste of time” because if there’s a bug, I already found it while writing the test. The actual test is there simply to make sure the bug doesn’t come back.
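A small sketch of what that looks like in practice, with a hypothetical `parse_version` helper. Assume the bug (comparing versions as strings, so "1.10" sorted before "1.9") was already found and fixed while writing the test; the test stays only to catch its return.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Hypothetical helper: turn "1.10" into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def test_two_digit_minor_version_sorts_after_one_digit() -> None:
    # Regression guard: as plain strings, "1.10" < "1.9".
    assert "1.10" < "1.9"
    # As parsed tuples the order is the intended one.
    assert parse_version("1.10") > parse_version("1.9")


test_two_digit_minor_version_sorts_after_one_digit()
```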

If we can’t get this simple idea right, we can’t have a productive discussion on unit tests vs $OTHER_TESTS, etc.

I don't see the need for integration testing as being an indication of poor design, I think it is just what happens to testing as you put together your well-understood abstractions to solve larger and more complex problems.

I think we are all agreed that the way to deal with complex problems is to break them down into smaller, independently-solvable problems. An inevitable consequence of this is that there will be at least one level of abstraction that is concerned with the interaction of little-problem solvers, and things can go wrong even if all the components are working as you think they should. Complex systems generally have emergent properties (that's what we make them for), and some of those emergent properties might be bugs.

In fact, the more you practice divide-and-conquer, the more your time will be spent on assembling units and dealing with the issues of their interaction, rather than in making them.

If some of the assemblages have broad applicability, we make them units - for example, we can use matrices to simplify a broad range of mathematical processes that would otherwise have to be done as a mass of bespoke scalar operations - but the integration testing must be done (at least once) before they can be used as units.

A system is never going to be abstract generalities all the way up, however. At some point, towards the top of your abstraction hierarchy, you will be solving a unique problem - using matrices in a control system for a specific vehicle, for example - and that is going to need integration testing.

Insofar as integration testing is a consequence of divide-and-conquer, we might say that the inability to create unit tests would be a symptom of poor design, as it would indicate that the system is not composed of self-contained elements having well-defined interfaces.

I liked this bit of perspective from the comments:

When people start arguing about "unit" vs "integration" tests, I think about Smalltalk, the home of the original xUnit testing framework: In Smalltalk, everything is an object. The constant integer value zero is an instance of a class. And your application may define new methods on that class. So in Smalltalk, you literally can't do anything without using some number of classes, many of which may have methods defined by your application. Also, in assembly language, the only abstractions handed to you are bytes, words, and various CPU status flags. The "units" you're working with are extremely low level.

So I conclude that the dividing line between "unit" and "integration" is largely arbitrary, and subject to reinterpretation, depending on the tools and libraries we happen to be using, and possibly how familiar and comfortable we are with them.

I think the problem I have with this is the idea that the design is static. In practice, if you’ve got five layers, their exact behaviour can be subject to subtle change. Integration tests (not necessarily whole system tests) are invaluable for finding where changed assumptions actually break required behaviour. This, in turn, can inform a better design.

Never thought I’d be disagreeing with Kent Beck about anything in his bailiwick, but there you go.

How is this a disagreement?

If you don’t categorise it as one I guess we’re good. :)

Unit tests are great if you need to drill down and verify a very specific piece of logic. Integration tests on the other hand are very useful if you need to test the overall system, especially if testing smaller parts is not possible or does not make sense.

For instance, in my company we have developed an internal tool that allows external translators to directly translate resource strings in our database and VCS. Parts of this system are covered by unit tests (like the parts recognizing what the language of a file is), but the interop with the VCS is not something that makes sense to isolate. Therefore the only tests for those parts are the integration tests.

Making Transaction mutable usually means its internal state is not 100% controlled by its implementation. The consumer of a Transaction object might be able to manipulate its state in an unexpected way (e.g. calling mutation methods in an unexpected order) and make it invalid. So the unit test suite you write for Transaction will never cover all cases. You need to test Transaction in the context where it's used as well, e.g. the Account example. That means when you're writing tests for Account, you are actually testing Transaction at the same time. That's probably why it feels more like an integration test than a unit test.
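A rough sketch of the difference, using made-up classes rather than the ones from the post. `MutableTransaction` lets a caller reach an invalid state through call ordering, while the frozen `Transaction` is validated once at construction and cannot be changed afterwards.

```python
from dataclasses import dataclass


class MutableTransaction:
    """Hypothetical mutable design: validity depends on how callers use it."""

    def __init__(self) -> None:
        self.amount = 0
        self.committed = False

    def set_amount(self, amount: int) -> None:
        self.amount = amount

    def commit(self) -> None:
        self.committed = True


@dataclass(frozen=True)
class Transaction:
    """Hypothetical immutable design: checked once, then fixed."""
    amount: int

    def __post_init__(self) -> None:
        if self.amount <= 0:
            raise ValueError("amount must be positive")


# A caller commits first and sets the amount afterwards; nothing stops it.
tx = MutableTransaction()
tx.commit()
tx.set_amount(-50)
assert tx.committed and tx.amount == -50  # committed, yet invalid

# The immutable version cannot be constructed in that state at all.
try:
    Transaction(-50)
    rejected = False
except ValueError:
    rejected = True
assert rejected
```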

Different people tend to have completely different ideas about what an integration test is.

At one extreme, some people think it means that you have your entire system running and then you have some headless bot simulate clicks and evaluate the system's behavior based on the UI...

From my point of view, any test that traverses through the logic of more than one kind of object is an integration test. Primitive types don't count.

I actually find integration tests much more useful than unit tests. It's unusual for me to come across a unit of code that is so complex on its own that it needs to be tested in isolation. I think that simple units of code are actually a sign of good design. Isn't that the whole idea behind keeping methods short and classes highly cohesive?

The simpler your unit of code is, the less useful your unit test becomes. By that definition, I reach the opposite conclusion from the author's.

The problem with unit tests is that they assume that most of the complexity lies in individual functions.

From my experience, most of the complexity and unpredictability of software lies in the wiring logic between components.

Most difficult software bugs are not the result of a function not working correctly - Most of the time, it is simply the result of a function not being used correctly... E.g. Someone make an incorrect assumpti
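A minimal sketch of that kind of wiring bug, with hypothetical functions: each one is correct under its own assumption and passes its own unit test, but one returns a percentage while the other expects a fraction, so only a test across both catches the problem.

```python
def discount_rate(customer_tier: str) -> float:
    """Returns a percentage, e.g. 15.0 for 15%."""
    return {"gold": 15.0, "silver": 5.0}.get(customer_tier, 0.0)


def apply_discount(price_cents: int, rate: float) -> int:
    """Expects a fraction, e.g. 0.15 for 15%."""
    return round(price_cents * (1 - rate))


def checkout(price_cents: int, customer_tier: str) -> int:
    # The wiring: a percentage is passed where a fraction is expected.
    return apply_discount(price_cents, discount_rate(customer_tier))


# Each unit passes its own test.
assert discount_rate("gold") == 15.0
assert apply_discount(10_000, 0.15) == 8_500

# The combination is badly wrong: 10000 * (1 - 15.0) = -140000.
total = checkout(10_000, "gold")
assert total == -140_000
```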

The presumption here is that you are a team of Kent Beck's. Mine isn't.

Yes, with a lot of work and time, we can come up with very good abstractions so that all code is easy to understand and follow and unit tests are all we need. But we don't have that time, nor often the skill needed. What we need is a running product, today, for our customers.

Integration tests are a crutch. Some of us need crutches. And I don't feel like I should be ashamed to need one sometimes.

> The presumption here is that you are a team of Kent Beck's

Not really. He explicitly says 'I’m happy to write integration tests. I insist on it'; in other words Kent Beck is admitting he doesn't always know how to perfect the underlying layers' design so as to push 'integration tests' towards being unit tests. But if he's right about the nature of the distinction, i.e. that apparent 'integration tests' are really a symptom of underlying design limitations, it's surely a good thing to be aware of?

There are many areas where I do know my own code is problematic, but for lack of time or competence I can't improve it now. I don't believe I would be better off not knowing there was a problem.

Hang on a minute. I'm both Superprogrammer and a washed up old has-been? This is getting confusing.

There's no shame for me in using integration tests. They just hint at an alternate universe where the design is different and they either disappear entirely or become unit tests. So today isn't the day that happens. Okay. "Perfect" is a verb.

I'm struggling to understand how the transition from integration test to unit test might manifest in terms of a real world refactoring.

My 'integration tests' almost always cross the boundaries of two or more (theoretically) well-defined APIs. If they're within those boundaries, well, they're not really integrating anything, so I would just consider them a unit test.

Do you have any examples of this?

Is there really such a hard boundary between integration tests, unit tests and other types of tests? If you do something that crunches only numbers and returns numbers you can probably live purely with simple unit tests. But once you get more complex systems you slowly creep into integration tests territory. I feel like it's more of a continuum.

The title here is a non-sequitur from the article.

Does this imply that unit tests are a symptom of poor coding?

Disclaimer: I mean no offense.

I have yet to see one production-ready real-world code example from such design gurus of the past: Kent Beck, Bjarne Stroustrup, Scott Meyers.

I think that in 2017 you shouldn't be entitled to your opinion about design if you have nothing to back up your case. Sadly, this also covers me with this throw-away account, but bear with me for a while.

When they started their journey the world was different and chains of thought that led them to their current points of view are long obsolete.

We are living in a different world with different values. My own experience with integration tests led me to believe that I cannot afford to skip them. There are far more moving parts nowadays and it's better to learn that something introduced breaking changes from tests than from angry customers: something along a long chain of dependencies that you did not even consciously know you were depending on.

Now, some guy who is known only for some books he wrote in times of yore comes in and tells me I am doing it wrong, only on the basis of his non-existent experience and unfounded authority? We already know what comes out of unsolicited advice like that. We wasted decades on OOP modeling, UML and tons of other things that never came off.

There are now widespread ideas that he either conceived or popularized, but even these mutated to the point of unrecognizability.

How about we move on?

Given the context, this comment is a personal attack. That's not ok here and I've banned the account.

If you really meant no offense, you need to work harder at giving others the benefit of the doubt and watching out for ignorance masquerading as knowbetterness. We all need to work on these things, of course, but comments that go below minimum levels are not welcome on HN.


No offense taken, but let me add this:

I too initially felt a bit weird about the insinuation that "Integration tests are a symptom of poor design", but it's worth pointing out that that's not really the title of his FB note: it's '"Unit" Tests?'. * In fact, going back and reading it a 2nd time, I can't really find anywhere where he's saying you're "doing it wrong" if you're writing integration tests - in fact he says he writes them himself. I think he's just relating a realization he had, that a test should perhaps only be considered an "integration test" if there are systems it depends on that we're not sure are bulletproof yet, i.e. we wonder how well they "integrate". For example, even our "Unit" tests rely on relatively complex systems, e.g. the interpreter of the scripting language you're using. This fits with my (limited) understanding of software testing.

It did seem like he was kind of presuming that pure functional implementations (e.g. of the Transactions objects in his example) would be so much better and more airtight that they probably wouldn't even need testing, but I'm not sure he was really implying that; perhaps he was just using that example to convey that once you really feel like you know the behavior of a subsystem, you take it for granted, and the tests right above it no longer have to "worry" about "integrating" it.

* To be fair, I guess it is KentBeck himself that posted this with that name, so that is a bit "considered harmful"-ish. Not sure what else you'd call it though. Anyways, I enjoyed his reflection, and didn't find it that preachy myself.

I think that caveat that you cite is operative. Although I take it in a slightly different direction than the author.

The point at which integration tests become important is not really a question of bullet-proof-ness. It's a question of complexity. When the underlying system has sufficiently many moving parts/degrees of freedom, at some point, you need to see what it does when your system interacts with it in a certain way.

Generally this is not necessary for integers. In most domains, you're not going to come upon any boundary conditions in the way they are transformed or compared. Transactions are also a pretty simple concept with little in the way of moving parts. If your system doesn't rely on their behavior near boundary conditions, maybe they are transparent enough that you don't have to test your system's interaction with the transaction concept. But there are a lot of underlying systems that are not like that.

I think that's right, that other than bullet-proof-ness, the boundary conditions and interfaces between nontrivial subsystems matter a lot and need to be integration-tested. Glad you pointed that out.

I feel like some of these comments are just reading the title and immediately posting their disagreement. I'm with you, he says he writes integration tests himself. There's nothing controversial about this post, it's actually pretty interesting.

After reading the article, I thought the person posting to HN had either misunderstood it or misappropriated it for his own agenda - but then I saw that it was apparently posted by the author himself.

There's a challenge finding the right headline when posting to HN. My last few posts had very literal headlines and went nowhere. I amped this headline up a bit while making sure it was still honest and, looky here, it got more attention. Now I have to decide how I feel about the difference.

FWIW, it doesn't bother me. Perhaps phrasing amped-up headlines as questions would forestall at least the more moderate wing of the clickbait squad?

Thank you for letting me know how I appear to you. I don't agree that my point of view is obsolete or that I am not entitled to an opinion about design. Judging what I write based on my age led to you, as other commenters have pointed out, missing the point of my post.

Your point about up-to-date examples is well taken. Finding good examples is the hardest part of technical writing for me. As I work on Facebook and Instagram I will keep my eyes open for clear examples of the same principles, because the principles really are the same regardless of shifts in technical fashion. You'll have the opportunity to learn that in the years to come.

I am on the same page as you. In my personal experience, the biggest problems occur in the interactions between modules and not so much within a particular module. Without integration tests there is just no way to catch these things, no matter how good an architect you are.

A long time ago we had Scott Meyers for a week long workshop. He definitely knew his stuff and showed a lot of real world experience. But with a lot of "gurus" I have my doubts. A lot of advice from Scrum gurus falls apart when it meets reality.

I guess the ultimate advice should be given by people like Linus Torvalds who run large projects.

I will never question that he knows the language and its obscurities, but the real world in his case is even more different.

Scott Meyers is a C++ guy. I will bet that nowadays most C++ programmers are in game development, HFT or other computationally expensive, performance-oriented areas, and there is this data-oriented design movement which goes against almost everything he ever said. There are entire constructs of the language he is proficient in that are never used in the real world.

Back then I was at a C++ shop. He definitely knew how to deal with practical problems and wasn't just a language geek. This was more than 10 years ago so a lot of ideas from then are not fashionable anymore.

Data-oriented programming is just one optimisation technique, not a revolutionary design movement.

Did you really read the piece? The reason I ask is that what he wrote and what you wrote are essentially the exact same:

> I’m happy to write integration tests. I insist on it. But a part of me registers the need for integration tests as a failure or limitation of my design skills.

(That's how I feel about integration tests, too — which I always use.)

Stroustrup invented C++ and he's actively developing and designing it.

He also almost exclusively talks and writes about C++, he's not telling you anything about integration tests.

You just seem ignorant by including his name on that list.

Yeah, I doubt he writes any tests at all - unit or integration.
