Write tests. Not too many. Mostly integration (kentcdodds.com)
362 points by wslh 11 months ago | 331 comments



I'd offer a slightly different take:

- Structure your code so it is mostly leaves.

- Unit test the leaves.

- Integration test the rest if needed.

I like this approach in part because making lots of leaves also adds to the "literate"-ness of the code. With lots of opportunities to name your primitives, the code is much closer to being self documenting.
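To make the "mostly leaves" idea concrete, here's a minimal sketch (Python, with invented names - the pattern itself is language-agnostic): the leaves are pure functions with descriptive names, and the non-leaf part is a thin orchestrator.

```python
# Leaves: pure functions with no dependencies -- trivially unit-testable,
# and naming them documents the code.
def parse_amount(raw: str) -> int:
    """Parse a money string like '12.50' into cents."""
    dollars, _, cents = raw.partition(".")
    return int(dollars) * 100 + int(cents.ljust(2, "0"))

def apply_discount(cents: int, percent: int) -> int:
    """Return the discounted price in cents, rounded down."""
    return cents * (100 - percent) // 100

# The non-leaf part is a thin orchestrator; integration-test it if needed.
def checkout_total(raw_prices, discount_percent):
    return sum(apply_discount(parse_amount(p), discount_percent)
               for p in raw_prices)
```

Unit tests hammer `parse_amount` and `apply_discount` exhaustively; `checkout_total` barely needs any.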

Depending on the project and its requirements, I also think "lazy" testing has value. Any time you are looking at a block of code, suspicious that it's the source of a bug, write a test for it. If you're in an environment where bugs aren't costly, where attribution goes through few layers of code, and bugs are easily visible when they occur, this can save a lot of time.


I have adopted the same philosophy. A few resources on this, part of the so-called London school of TDD:

- https://github.com/testdouble/contributing-tests/wiki/London... (and the rest of the Wiki)

- http://blog.testdouble.com/posts/2015-09-10-how-i-use-test-d...

- Most of the screencasts and articles at https://www.destroyallsoftware.com/screencasts (especially this brilliant talk https://www.destroyallsoftware.com/talks/boundaries)

- Integration Tests Are A Scam: https://www.youtube.com/watch?v=VDfX44fZoMc

All of these basically go the opposite way of the article's philosophy:

Not too many integration tests, mostly unit tests. Clearly define a contract between the boundaries of the code, and stub/mock on the contract. You'll be left with mostly pure functions at the leaves, which you'll unit test.


Thanks for the links, they make sense - I've always had trouble with blind "you should unit test" advice, and the video in particular explains the reasoning very well :)


I’ve been practicing TDD for 6 years and this is exactly what I ended up doing. It’s a fantastic way to program.

My leaves are either pure functions (FP languages) or value objects that init themselves based on other value objects (OOP languages). These value objects have no methods, no computed properties, etc. Just inert data.

No mocks and no “header” interfaces needed.
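For illustration, the "inert value objects" shape might look like this in Python (names are made up): deriving one value object from others is a pure function, so there's nothing to mock.

```python
from dataclasses import dataclass

# Inert value objects: no methods, no computed properties, just data.
@dataclass(frozen=True)
class Price:
    cents: int

@dataclass(frozen=True)
class LineItem:
    name: str
    price: Price
    quantity: int

@dataclass(frozen=True)
class OrderTotal:
    cents: int

def order_total(items):
    """Build one value object from others -- a pure, mock-free test target."""
    return OrderTotal(sum(i.price.cents * i.quantity for i in items))
```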

On top of that I sprinkle a bunch of UI tests to verify it’s all properly wired up.

Works great!


- Structure your code so it is mostly leaves.

- Unit test the leaves.

- Integration test the rest if needed.

Exactly. You expressed my thoughts very succinctly. Though I feel the post tries to say the same just in a lot more words.


I didn't get that from the post at all; I thought the post advocates mostly for integration tests, and I didn't see anything about refactoring code to make unit testing easier.


This is my exact mentality as well! In fact, I like it so much that I apply it to system design as well. Structure the pieces of code into a directed acyclic graph for great success. A tree structure that terminates in leaves is a DAG.

https://en.m.wikipedia.org/wiki/Directed_acyclic_graph


> Integration test the rest if needed.

Is there any situation where there is integration, but no need to test it?

You seem to be suggesting that if the leaves are thoroughly tested, nothing can go wrong in their integration, but at the same time, I cannot imagine someone believing that.


Exactly - most bugs I see are in integration: mismatches in data models or state. But I also work on business-y applications, which tend to be more integration than local business logic.


Integration tests are always needed in some form, because you need to make sure the leaves are actually called. Since unit tests are executed in a vacuum, the functions might work but never be called at all, or might fail because of weird bugs that only appear when testing the whole tree.


Can anyone give me an example, or explain a little more about, "Structure your code so it is mostly leaves"?


Leaves in this context would be classes that have no dependencies.

If you need to create an object, can you pass the name of the class in? Or can the object be created elsewhere and passed in fresh? If you're making a call to a remote service (even your local DB) are you being passed a proxy object?

All of these references can then be provided as a test double or test spy, so long as they are strict about the interface they provide/expect, and you can exhaustively cover whatever internal edge cases you need with unit tests.

Don't _forget_ the integration tests, but my personal opinion is that it usually suffices to have one "success" and one "error" integration test to cover the whole stack, and then rely on unit tests to be more exhaustive about handling the possible error cases.
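As a sketch of that (Python, hypothetical names): the code under test is handed the remote-service proxy instead of constructing it, and the test doubles honour the same strict interface.

```python
class PaymentGateway:
    """Real implementation would call a remote API."""
    def charge(self, cents):
        raise NotImplementedError("network call in production")

def process_order(gateway, cents):
    """Business logic under test; knows nothing about HTTP."""
    if cents <= 0:
        return "rejected"
    try:
        gateway.charge(cents)
        return "charged"
    except ConnectionError:
        return "retry-later"

# Test doubles honouring the same interface:
class FakeGateway:
    def __init__(self):
        self.charged = []          # spy: records calls for assertions
    def charge(self, cents):
        self.charged.append(cents)

class DownGateway:
    def charge(self, cents):
        raise ConnectionError      # simulate the error path
```

Unit tests can cover every edge case against the doubles; one "success" and one "error" integration test then exercise the real gateway end to end.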


This is very interesting. I'm not 100% sure I understand. Any example of this or resources on this style?


So many people in this thread are talking about different domains and are not able to see that they need different rules.

Creating a library that is going to be used to launch a multi-billion-dollar rocket to Mars is not the same as developing a mostly graphical mobile app where requirements change daily as you A/B test your way into better business value.

The article has really good points, and the reasons why they work. Apply them wisely. Make the right decision for your project. Don't be dogmatic.


I've programmed in many domains. There are only three places where I don't use heavy testing:

1. Very, very graphical programming - like SVG charting with animations. If it were static generation it would be easy, but throwing time into the mix makes the tests really hard plus if things go wrong in the future people can literally see it going wrong and complain, so I don't think it is worth the trouble.

2. Data analysis meant for static reporting. You know, those 2000 line SQL queries that barf out data that you pop into excel to munge through before typing up a 20 pager for upper management.

3. Small personal tools, like a CLI script that spits out equivalent yearly interest rates or what have you.

Everything else I test. Libraries, backend web apps, machine learning shit, compiled, whatever. It is too easy for codebases to turn into a hellscape without tests. You get too afraid to change things.

Should it change the API to the codebase? I usually don't think so, but occasionally I'll, say, make something an instance variable that I'd normally keep as a locally scoped variable. So I model test what I can and I integration test the rest. I think that he's right that integration tests cover a lot of functionality and I think he's right that mocks aren't usually great, but I think he's wrong about how much testing we should be doing. Most things should be tested most of the time.

It's cheaper. Why?

Because it costs money to hire support people and it costs money to validate their tickets and it costs money to fix the bugs and the bugs are harder to fix when the data is already in the system and the data is wrong.

It's also more profitable to write more tests. Why?

Because bugs mean lost customers and even when you keep the customer the feedback that you get from them is which problems you need to fix, not which functionality could be made better.


The fight against the TDD cargo cult is vicious, barely started, and far from over.


Please don't do this here. If you have a substantive point to make, make it thoughtfully; if you don't, please don't comment until you do.

https://news.ycombinator.com/newsguidelines.html


Good lord. Why integration tests?

    I think the biggest thing you can do to write more integration tests is to just stop mocking so much stuff. 
Okay. The biggest problem I see with people trying to write unit tests is that they don’t want to change how they write code. They just want tests for it. It’s like watching an OO person try their hardest to write OO code in a functional language.

So they try to write E2E tests which work for about 5 or 6 quarters and then fall apart like a cheap bookcase. If you can find a new job before then, you never have to learn to write good tests!

I agree with the author that the trick is to stop using mocks all the time, but you don’t have to write integration tests to get rid of mocks. You have to write better code.

Usually if I have a unit test with more than one mock it's because I'm too invested in the current shape of the code and I need to cleave the function in two, or change the structure of the question asked (e.g., remake two methods into two other methods).

Almost always when I accept that the code is wrong, I end up with clearer code and easier tests.

Unit tests run faster, are written faster, and not only can they be fixed faster, they can be deleted and rewritten if the requirements change. The most painful thing to watch by far is someone spending hours trying to recycle an old test because they spent 3 hours on it last time and they’ll be damned if they’re going to just delete it now.


> The biggest problem I see with people trying to write unit tests is that they don’t want to change how they write code. They just want tests for it. It’s like watching an OO person try their hardest to write OO code in a functional language.

The biggest problem I see with people advocating for tests and employing TDD is that they do change how they write code to accommodate tests. This leads to inclusion of lots of unnecessary abstraction and boilerplate patterns that make code less readable and more bug-prone. OO world has spawned numerous non-solutions to turn your code inside-out so that it's easier to mock things, at the expense of code quality itself.

That said, if you go for functional style in OOP, i.e. shoving as much as you can into static helper functions and most of the rest into dumb private stateless functions, you suddenly gain both a clean architecture and lots of test points to use in unit tests. So you can have testable code, but you have to chill out with the OOP thing a bit.
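A rough Python sketch of that shape (invented names): the stateful class keeps only the side effect, and the extracted helpers become free test points.

```python
# Stateful shell stays thin...
class ReportService:
    def __init__(self, db):
        self.db = db
    def monthly_report(self, month):
        rows = self.db.fetch_rows(month)       # the only side effect
        return format_report(summarize(rows))  # everything else is pure

# ...while the extracted "static helpers" are plain test points.
def summarize(rows):
    return {"count": len(rows), "total": sum(r["amount"] for r in rows)}

def format_report(summary):
    return f"{summary['count']} rows, total {summary['total']}"
```

The helpers need no mocks at all; only `ReportService` itself needs a fake `db`.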


> as much as you can into static helper functions and most of the rest into dumb private stateless functions

In our work we use C# and it is very hard, even next to impossible, to make a static class pass a code review - unless it's for extension methods (which I hate... why not be explicit about the first parameter and stop acting as a part of the class </rant>). They just tell us to use IoC and move to the next point. I honestly don't know why. Our IoC library can treat a dependency as static or singleton, but those are also discouraged. Once I had a static class named GraphRequestHelpers* and the reviewer got really negative, FSM knows why. She told me that we need IoC to make everything testable and that "Helper" in the name is a code smell. Sounds like cargo-culting to me, but I have only 6 years of experience, so who am I to know.

* Now we have RequestExtensions and everything is apparently perfect.


There is some cargo culting there but it's mostly correct.

Helper is a code smell as it's a sign of "we don't know what the responsibility of this is or what to call it so we'll just chuck a load of shit in this file and call it a helper". The methods in should belong to something and live on that class, not in an external class.

RequestExtensions is more shit than the original solution. Extension methods are even worse! Shoot the reviewer.


This is a matter of taste not fact. In functional languages the style is compositional with static functions everywhere. It works well. The keeping data and methods together thing is one approach. Sometimes it's great. Sometimes unnecessary.

For example would you argue against string formatting helpers? Or would they need to be written to an interface and added to myriad DI bucket lists?


It's not that simple and it's not a fact. I'm an advanced user of functional languages as well and have written an entire Scheme implementation before. I only semi-agree. That's a slightly disingenuous representation of functional languages, which have more than a few pitfalls. They certainly aren't a silver bullet, and they really do not scale to the same height and complexity of the problem domain as the OO languages do, due to the nature of the abstraction you describe. Nothing is particularly explicit. I'd rather take the compromises of OO over the maintenance problems of a functional language.

String formats are data so they would be stored as constants so that they are interned. They can be stored in a const class which is a static class with no methods i.e.:

   sealed class StringFormats {
       public const string DateFormatX = @"...";
   }
Also string formats for example tend to be owned by the respective objects so you can add overloads to the object to provide certain arbitrary representations. If the translation between an object and the string representation is complex, then you're really serializing it so that should be an abstracted concern.


To be clear I'm not saying functional is always better. I'm saying there are other possibilities and dogma is bad.

The Haskell community has its dogmas too. And its fair share of "let's do this simpler" blog posts.

As an aside Haskell has many equivalents of dependency injection and sugars to help and you can "inject all the things" there too.

My point is to think of the problem you are trying to solve, rather than ticking off the SOLID / Martin Fowler etc. tick boxes.

Understanding OO patterns, SOLID etc is a good thing but being prepared to "break the rules" is good IMO too.


Breaking the rules is good when appropriate. Problem is those rules are pretty amazingly good. I went through a weird phase of denial and ended up back where I started before I applied the aforementioned rules.

Every exception I've made has shot me in the foot.


> Shoot the reviewer

Duly noted! Although I'll try talking to her first, I'm sure there's more behind the decision :)

One of the methods that was inside takes a request, extracts the body and returns the parsed graph from the body. It's used by many controllers from many projects. I don't know where to put such a thing, hence the request extension.


Always ask for the reason before slating it :)

Usually that's a single responsibility class:

   interface IGraphParser {
       Graph Parse(Request request);
   }
Inject that into the caller via the container; then you can mock it in the thing that calls it and just return a canned Graph object, which you can't do with a simple extension method (which is why it sucks).
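The same shape sketched in Python (hypothetical names), with the parser stubbed in a test of the caller:

```python
class GraphController:
    def __init__(self, parser):        # injected, like the IGraphParser above
        self.parser = parser
    def handle(self, request):
        graph = self.parser.parse(request)
        return {"nodes": len(graph)}

class StubParser:
    """Test double honouring the parser interface."""
    def __init__(self, graph):
        self.graph = graph
    def parse(self, request):
        return self.graph              # canned graph, no real parsing
```

The controller's logic is now tested in isolation; only the real parser needs its own tests.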


Extension methods are useful for only one reason: they trigger code completion for browsing what this object can do. Static methods suffer from FP code completion problems (you can’t complete easily in the first arg of a function/procedure).


I think I am not mistaken in saying extension methods, like lambda functions, were invented primarily for the use case of Linq. Even if they weren't, that's how Linq is implemented, so extension methods serve more than that "one purpose" if you don't insist on writing C# in the style of C# 2.0.


They came out at the same time; I'm sure there was some influence between them (Mads Torgersen would know better). However, all the functionality added could have been done with static methods, just with more verbose syntax. LINQ query syntax could have been special-cased. Anyway, I like what they came up with; it's very versatile.


It's not clear that would have been much less work.


Why hate extension methods? Do you really want to write Enumerable.ToList(Enumerable.Select(Enumerable.Where(someList, e => e.someBool), e => new { a = e.x, b = e.y })) and so on?


That would suck. On the other hand, extension methods make people create huge chains, the continuations of which come from who knows where.

Best solution would have been a pipe operator if you ask me.


Could someone expand IoC for me please?



Do you practise TDD? If you did a lot of this would make more sense to you. TDD is actually quite fun when you get the hang of it (less mental burden as you push all the 'intent' onto the computer).


I don't see why TDD requires ruling out static methods and insisting on hiding everything behind an interface. Static methods are straightforward to test, certainly more than a class with multiple dependencies which need to be mocked. Usually the complaint is about coupling when calling static methods but these can be wrapped in a delegate if required.
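For example, the delegate-wrapping trick might look like this in Python, using a default parameter as the "delegate" (illustrative names only):

```python
import time

def make_receipt(amount, now=time.time):
    """The 'static' time.time call is wrapped in a delegate parameter,
    so tests can substitute it without a mocking framework."""
    return {"amount": amount, "timestamp": int(now())}
```

Production code calls `make_receipt(100)` and gets the real clock; a test passes `now=lambda: 1234.0` and gets a deterministic result.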


Simply because you can't mock the static dependency, therefore that method is now dependent on the static class and you don't have any control over it. This is problematic - what if at some point later another developer adds a database call into the static method to do some logging? Now your testing will dirty whatever database you're using, as well as run 10x slower - and yet the test will still pass and everyone will be none the wiser as to what happened.

If you start using a custom delegate solution, then your code is not consistent with everything else that uses DI, making it harder to understand. I can understand interfaces are annoying when navigating code, but the IDE still helps with that even if it is a few more button clicks, and the pros outweigh the cons.


> that method is now dependent on the static class and you don't have any control over it.

I don't see how you have any less control over it than any other code you wrote. If you don't want it to write log statements, then don't do that. Most static methods are small and pure so don't need to write log statements anyway.

> Now your testing will dirty whatever database you're using, as well as run 10x slower.

I've never used a logging framework that didn't allow you to configure where log statements were written, or give you control over the logging threshold for individual classes. However if your method is writing logs then presumably there is a reason, which is just as useful in the tests. If you mock it out then you're testing against different code to the one you will actually run against.

> If you start using a custom delegate solution, then your code is not consistent with everything else that uses DI.

Passing functions as arguments directly is 'DI', just without the need to configure that through an external container. Reducing the amount of interfaces (often with a single implementation) and external configuration makes navigating the code easier.


I think you missed my point; it's not about the logging framework, it's about the fact that you don't control an external dependency during testing. Unit tests are meant to be reproducible, meaning they are done under controlled conditions.

> Most static methods are small and pure

This is very assuming, tests are a way of being specific about your intent.


> its about the fact you don't control an external dependency during testing

If your code is structured using small static functions, you don't have any dependencies in the first place, just arguments you are passed and transform. You will probably create interfaces for external services you depend on, but you can avoid needing to mock them if you express the transform directly.

> This is very assuming

I'm not assuming anything, since I wrote the static method and I also decided to call it, presumably for the result it calculates. Your argument appears to be that static methods could contain bad code but that applies to all code you depend on.


You mean that the tests will depend on the thing being tested? What a crime!

> what if at some point later another developer adds a database call into the static method to do some logging?

Then you have a developer that does not grasp the idea of functions, and how they can help you improve your code. That's a call for education, not for changing your tests.


The point is that tests are omnipresent, people aren't. I've worked at places where all sorts of dumb code has got through because there is no automation in place to stop it, and everyone else is too busy to do code reviews.


In Java, you can use PowerMock to mock or spy anything, even private static final things. I consider it a smell (though excessive mocking even without powermock is its own smell), but it's immensely valuable to get code you can't change (or fear changing because of its complexity and lack of tests, or simply don't have time to change because the refactoring would take a whole sprint) to have some tests.

You don't need interfaces for everything in order to do DI. Interfaces should be used only for having multiple implementations or to break dependency loops.

Other than that I'm in agreement, static methods generally aren't a good idea. They can all too easily balloon into big chunks of imperative code with inner dependencies (static or not) at 10 indentation levels deep. Non static methods can too, but not as easily, and you have more options for fixes/workarounds in those cases anyway. The only place they really make sense is as part of a set of pure primitive data transforms, and ought to be small.


In C# we have Moq that can mock normal classes, though it requires adding 'virtual' to every method you want to override which is a code smell too. In Java everything being virtual by default I guess it doesn't matter. We like to always keep an interface around as it gets the developer used to working that way and keeps the code consistent. Visual Studio provides a quick shortcut to auto gen the interface too.


> Now your testing will dirty whatever database you're using, as well as run 10x slower

It sounds like the problem is a few layers higher. Why is there a live database in your unit testing environment? Why are working credentials configured? If they're unit tests, not integration tests, all db operations should be DI'd / mocked / whatever. Any call that isn't should fail, not take longer. Db interaction is for the integration tests.


That's exactly my point, mock your external dependencies. Static calls don't allow you to do that.


In your language of choice:

    static_function(db, other, arguments) { ... }
    test { static_function(fake_db, 1, 2) }
You can even omit the db in the standard case if your language allows default keyword arguments. In almost every language, a method is just a fancy static call that takes extra arguments implicitly. (Closures are poor man's objects, objects are poor man's closures...)


So how exactly do you test that your SQL query does the right thing? That you're using the Twitter API correctly?


Testing a database, or an external web service, is an integration test. They can be as simple as:

    void TestCreateUser() {
        var repo = new UsersRepository();
        var mockUser = new User("John", "Smith");
        repo.AddUser(mockUser); // db call
        var addedUser = repo.GetUsers().Single(); // db call
        Assert.StructureIsEqual(mockUser, addedUser);
    }
For the Twitter web service, you might test that you successfully get a response, as you don't have control of what exactly comes back.


How is static code different from other noninjected code, like stuff in a method? Taken to the logical conclusion, we'll have thousands of classes full of max 2 operations per method.


How many static classes are your methods using? And what is the problem with injecting this stuff at the top of the class instead? If you plan to write tests, you have to control your dependencies, and DI is the simplest way to do that.


Because these tests become too detached from reality if you inject everything. A silly example to make the point:

interface IAdder { double Add(double a, double b); }

Test:

var mockAdder = ... // Mock 1+1=2 etc.


um... yes... this is actually what "pure" OO involves. The only reason we don't do this is because it's a nightmare to manage.


“What if ...” doesn’t pass YAGNI.


Problem is, the moment you start introducing delegates and crap like that, you're inventing a mechanism to work around your resistance to giving up static methods rather than actually solving any problems.

There is no functional difference between a class with static methods and a class without, of which one instance is available to other classes.

Other than the fact that it isolates state, allows mocking and substitution and testing.


I disagree that delegates and higher-order functions are 'crap' or in any way more complicated than introducing interfaces that are injected through a centralised container. You could just as easily turn that argument around and say mocking and an overuse of interfaces come from your resistance to using small static methods. In C# Linq is almost entirely based on static methods and delegates and it is not harder to test as a result.

Static methods usually don't rely on any hidden state at all. The example originally given was for a graph operation which could just take the input graph as an argument and return the result. When your code is composed of small independent functions you don't need mocking and substitution at all. In my experience most uses of mocks come from functions that do too much in the first place.
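For instance, the graph operation mentioned above can be a free function that takes the graph as input and returns the result (a sketch with invented names) - no doubles required:

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start.
    graph: dict mapping node -> list of neighbours. Pure, so no mocks needed."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen
```

Tests just build small input graphs and assert on the output.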


TDD is completely orthogonal as to whether you write functional code that doesn't have a default receiver for routines, or OO code.


Yeah there is some cargo cult aversion towards statics.

Static methods with no side effects are wonderful, but static state is really bad, and static methods which perform IO are horrible because they cannot be mocked in a unit test.

But some people miss this distinction and just say static methods are bad for testing.


C# is the new Java... facepalm


Rifle is the new pistol? You can shoot yourself with both?


facepalm of enlightenment?


The abstraction is consistent though, and familiarity is a good thing when navigating a codebase which has N amount of other devs pushing to it every day.

I practise TDD for peace of mind - if I add new functionality to existing code I can be 99.9% sure I haven't made any regressions. When a client's system goes down on a friday, I can 99.9% guarantee it wasn't my code that is at fault. If I have to work at the weekend to update a production server, I'm 99.9% sure it'll go smoothly as my tests say it will.


Exactly this. Confidence is everything.

I can actually write entire features with appropriate test coverage from the ground up and they work first time and have close to zero defects in production.

It's amazing when you spend 5-6 days writing code that does nothing and at the last moment, everything slots together with a few integration tests and wham, feature done. Not talking trivial stuff here either; big integrations across several different providers/abstractions, bits of UI, the lot.

You see a lot of people arguing against this but I'm going to be honest, they churn out a lot of stuff that doesn't actually work.


> You see a lot of people arguing against this but I'm going to be honest, they churn out a lot of stuff that doesn't actually work.

My anecdata cancels out your anecdata. The TDD practitioners that I've met have, without exception, written code that worked fine for only the one case that they've tested. Example: They'd test a method for sending a message with the string "hello". Turns out the method didn't URL-encode the message before POST-ing it, and sending anything with a space was broken. They were confident and pushed the change.

Not saying you're wrong, just that TDD doesn't seem to work for everybody, and can even be a distraction.


That will be because you worked with dumbasses.

If you only test the expected outcome you are a dumbass.


How do you deal with http interactions in test? Do you mock it or maybe save it to disk to replay later?


That's an integration test really. The clients all have an abstraction around the http endpoints so nothing touches integration in unit tests. The advantage of this is you deal with transfer objects only in the code, no HTTP which would violate separation of concerns.

I use HttpMock myself in test cases which fires up an http server for personal projects. We use Wiremock commercially.


Here is how I handle it:

1) Write a test that runs the service and saves output to a file.

2) Mock out the call to just return the data from the file and validate results.

3) If you need variations on this data, just modify the file/data (often as part of the test).

I usually leave number 1 in the code but disabled since it often relies on remote data that may not be stable. Having the test run more than once is not very beneficial but being able to run it later and see what exactly has changed is great.
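A rough Python sketch of that record/replay workflow (all names invented):

```python
import json
from pathlib import Path

# 1) A (normally disabled) test runs the real service and saves the output.
def record(client, path):
    Path(path).write_text(json.dumps(client.get_json("/rates")))

# 2) Regular tests replay the saved data instead of hitting the network.
def replay(path):
    return json.loads(Path(path).read_text())

class ReplayClient:
    """Stands in for the real HTTP client; serves the recorded response."""
    def __init__(self, path):
        self._data = replay(path)
    def get_json(self, url):
        return self._data
```

Variations on the scenario are then just edits to the recorded file.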


I also practice TDD, but with a different 'T' - Type Driven Design. I find code much easier to reason about with types, and safer (you can't compile your code if it doesn't pass the type check). Just model your data as ADTs and pattern match accordingly.

Of course, types alone can't represent every error case out there (especially the ones related to numbers or strings), so I still write unit tests for those cases. But the number of unit tests needed is much lower.


That would be ATDD - Abstract Types Driven Design.

It completely fails on concrete types. That's the reason haskellers create a DSL for everything.


In this case, what's the difference if you write the test before or after, though? You would still be covered. I don't lean in either direction in this argument, just curious to understand.


If you write the tests afterwards, you find where all the tight coupling and crap abstractions you accidentally did are afterwards.

That leads to you either not writing tests or having to refactor bits of it. It's easier to do this much earlier on.


The difference is night and day - writing tests first means you write 'testable code' from the beginning. Following the red, green, refactor mantra means that for every change to your code, you already have a failed test waiting to pass. The result is your test cases make a lot more sense and are of a superior quality.

To liken it to something you may be familiar with - when commenting your code, do you think it's better to add comments in as you write the code? Or add in the comments at a later date after the code is all written? I'm sure you immediately know which approach results in better quality commenting, and it's the same with TDD.


> To liken it to something you may be familiar with - when commenting your code, do you think it's better to add comments in as you write the code? Or add in the comments at a later date after the code is all written? I'm sure you immediately know which approach results in better quality commenting, and it's the same with TDD.

Not to take the analogy too far, but usually when writing a chunk of code I can keep its behaviour in my head for a good amount of time, and I find it's best to add comments at the "let's clean this up for production" phase, when you can take a step back and see what needs commenting. If you comment as you go, you'll have to update your comments as the code changes and sometimes throw comments out, which is a waste of time.

Likewise with tests, I'm not saying write them far into the future, but I think having to strictly stick to red/green/refactor is going to waste time. What's wrong with writing a small chunk of code then several tests when you're mostly happy with it? Or writing several tests at once then the code?


People just don't write comments or tests after, that's the problem. If you do then that's fine, but after trying both routes I actually find TDD to feel like less work - not having to wait on large build times and manually navigating the UI actually makes for a more fun experience. Instant feedback being the fun part. Additionally writing tests 'after' always feels like work to me and I end up hating it, especially when I didn't write it in a testable way to begin with.


> People just don't write comments or tests after, that's the problem.

Doesn't that get caught in code review anyway though? I find being forced to write tests first can be clunky and inefficient. Also, I've worked with people who insist on the "write the minimum thing that makes the test pass" mantra which I find really unnatural like you're programming with blinkers on. TDD takes the fun out of coding for me sometimes.

Generally I'd rather sketch out a chunk of the code to understand the problem space better, figure out the best abstractions, clean it up then write tests that target the parts that are most likely to have bugs or bugs that would have the biggest impact.

I find when you're writing tests first, you're being forced to write code without understanding the problem space yet and you don't have enough code yet to see the better abstractions. When you want to refactor, you've now got to refactor your tests as well which creates extra work which discourages you from refactoring. When the behaviour of the current chunk of code you're working on can still be kept in your head, I find the tests aren't helping all that much anyway so writing tests first can get in the way.


What you describe is the typical mindset against TDD, it's difficult to explain the benefits, and really you just have to experience them for yourself. Changing your mindset is difficult, I know, why change what works right? My only tip is to keep an open mind about it, as TDD benefits are often not apparent to begin with, they only come after a couple of days work or weeks or months later or even years later.

You find that you need to do less mental work, as your tests make the required abstractions apparent for you. 'the minimum thing that makes the test pass' ends up being the complete solution, with full test coverage. Any refactoring done is safe from regressions, because of your comprehensive test suite. And when other colleagues inevitably break your code, you already have a test lying in wait to catch them in the act.


> Any refactoring done is safe from regressions, because of your comprehensive test suite.

As much as I like the idea of TDD, I have a problem with this part. When some refactoring is needed, or the approach changes, it seems like you have two choices. One is to write the new version from scratch using TDD. This wastes extra time. The other is to refactor which breaks all the guarantees you got before. Since both the code and the tests are changing, you may lose the old coverage and gain extra functionality/bugs.

And unfortunately in my experience, the first version of the code rarely survives until the deployment.


I'm not sure what approach you've described here, but it isn't TDD. In the case of adding new features to existing code, as you are continually running tests you will know straight away which you have broken. At this point you would fix them so you get all green again before continuing. In this way you incrementally modify the codebase. Remember unit tests are quite simple 'Arrange, Act, Assert' code pieces, so refactoring them is not a time sink.
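For the peanut gallery, here's a minimal sketch of that 'Arrange, Act, Assert' shape (the function under test is made up for illustration):

```python
def apply_discount(price, percent):
    """Hypothetical unit under test: reduce price by a percentage."""
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    # Arrange: set up the inputs
    price, percent = 100.0, 15
    # Act: call the unit under test
    result = apply_discount(price, percent)
    # Assert: check one observable outcome, named in the test's title
    assert result == 85.0

test_apply_discount()
```

Each test stays a few lines long, so rewriting a handful of them during a refactor really is cheap.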


refactoring != adding new features.

Also some refactorings are easier with tests, some are harder.

The kind @viraptor mentions is the kind that spans more than one component. For example when you decide that a certain piece of logic was in the wrong place.

The kind of refactoring that becomes easier is when you don't need to change the (public) API of a component.

Take for example the bowling kata. If you want to support spares and strikes and you need extra bookkeeping, that's the easy kind of refactor where your tests will help you.

But if so far you have written your tests to support a single player and now you want to support two players who play frame by frame... Now you can throw away all the tests that affect more than the very first frame. (yes in the case of the bowling kata, you can design with multiple players in mind, but that's a lot harder in the real world when those requirements are not known yet)


> What you describe is the typical mindset against TDD, it's difficult to explain the benefits, and really you just have to experience them for yourself. Changing your mindset is difficult, I know, why change what works right? My only tip is to keep an open mind about it, as TDD benefits are often not apparent to begin with, they only come after a couple of days work or weeks or months later or even years later.

I've been forced to follow TDD for several years and also been given the same kind of comments to downplay any reasoned arguments against it which I find frustrating to be honest. I don't see why the benefits wouldn't be immediately apparent.

> You find that you need to do less mental work, as your tests make the required abstractions apparent for you. 'the minimum thing that makes the test pass' ends up being the complete solution, with full test coverage. Any refactoring done is safe from regressions, because of your comprehensive test suite. And when other colleagues inevitably break your code, you already have a test lying in wait to catch them in the act.

You can do all of the above by writing tests at the end and checking code coverage as well.


"Any refactoring done is safe from regressions, because of your comprehensive test suite. "

With the right tests this works great. I have also seen the opposite, where a test suite was extensive and tested the last details of the code. Then figuring out what the tests were doing took more time than the actual refactoring. As often, moderation is the key to success.


Unit tests should follow a simple 'Arrange, Act, Assert' structure and test one single thing, described in its title. I agree anything too complicated starts to defeat the point, especially when we are mainly after a quick feedback loop.


How do you use TDD to implement an SDK that performs real-time manipulation of audio, when you don't really know how to describe the correct result?

Only once I understand the problem, the SDK and the expected result could I start writing some tests.

TDD seems to be for business applications where you're using tried and tested technology to solve new problems.


> it's difficult to explain the benefits, and really you just have to experience them for yourself. Changing your mindset is difficult, I know, why change what works right? My only tip is to keep an open mind about it

Scientologists make the same argument


Maybe writing code for exploration and production should be considered separate activities? The problem with these coding ideologies is that they assume there is only one type of programming, which is BS, the same as assuming a prototype is the same as a working product.


What's exploratory programming though? Unless you're writing something that's very similar to something you've written before and understand it well, most programming involves a lot of exploration.


Well, UX prototypes for one. In research, most projects never go into production, those that do do so without researcher code. Heck, even in a product team, if you are taking lots of technology risks in a project, you are going to want to work those out before production (and it isn’t uncommon to can the project because they can’t be worked out).


To be fair, TDD is quite good for exploratory programming, as it makes you think about the intent of your API up front.


Not really. It makes you commit to an API upfront, this is the exact opposite of what exploratory programming should be (noncommittal, keep everything open).


No with TDD you don't need to go in with a structure in mind, the structures arise as you write more tests and get a proper understanding of what components you'll require. Red, green, refactor - each refactor brings you closer to the final design.


That's the mantra often quoted but it always makes me think of the famous Sudoku example from Ron Jeffries. Basically as a mantra it falls down if you don't understand the problem domain. It's popular because it works for the sort of simple plumbing that makes up a lot of programming work. This problem is particularly true for anything creative you're trying to express as the requirements are often extremely fuzzy and require a lot of iteration.

If you don't know how to solve a problem, you actually need to do some research and possibly try a bunch of different approaches. Encumbering yourself with specific, production-focused methodologies hurts. If you're doing something genuinely new this can be months of effort.

After the fact you should go back and rewrite the solution in a TDD manner if you think it benefits your specific context.


> After the fact you should go back and rewrite the solution in a TDD manner if you think it benefits your specific context.

Why? Why not add tests to your existing code?


Because it was never meant for production at all.


That really isn’t exploratory programming. The end result should be code that you throw away en masse (it should in no case reach production). Otherwise, production practices will seep in, you’ll become attached to your code and the design it represents, hindering progress on the real design.

When I was a UX prototyper, none of my code ever made it into production.


>I find when you're writing tests first, you're being forced to write code without understanding the problem space yet and you don't have enough code yet to see the better abstractions.

That's why it's better to start with the highest level tests first and then move down an abstraction level once you have a clearer understanding of what abstractions you will need.


Can you do that with TDD though? Why not just sketch the code out first before you start writing tests?

I find TDD proponents don't take into account that writing tests can actually be really time consuming and challenging, and when you've got a lot of code that is tests, refactoring your tests becomes very tedious.


You can do that with TDD (it's called outside-in), although I agree that it is time consuming and challenging, especially without the right tools.


>do you think it's better to add comments in as you write the code? Or add in the comments at a later date after the code is all written?

Define "all written". If we are talking about a new function - obviously you write your comment for it once the function is ready to be commented on. And obviously you won't be commenting every line you put there, right?

Now, if we are talking about a whole new feature, which can consist of many functions and whatever - yeah, you usually comment your code in the process of writing the feature, rather than doing it at a later time, which will never come.


Comments / tests are a bad analogy. HN will over index on this and go down a rabbit hole.


I also find when following red, green, refactor that you end up producing more targeted unit tests that are more expressive of the code you are testing.

Trying to write unit tests afterwards lands me with something that appears as more of an afterthought or add on. It doesn't have to be this way I suppose, but it is more prone to.

This might be because I am more used to the red, green, refactor method though.


Is GUI code that 0.1%?

Because I am always keen to understand how to TDD GUI code and I don't mean the data model behind the pixels.


Visual tests are more general, and are more akin to putting up barriers on either side of a bowling lane so the bowling ball stays within its lane (with room to move about still). For example when using Angular, you write 'Page Objects' that have methods such as .getTitle(), .clickListItem(3) and so on, and can then write assertions to make sure the UI changes as expected by inspecting properties [1].

I usually find I build a general page object first ('this text is somewhere on the page'), then write the UI, then make the test more specific if I can after (but it's an art, as too specific and you risk creating too many false negatives when you make UI changes).

(Also as you are interacting with the UI, these would be known as integration tests.)

[1] https://semaphoreci.com/community/tutorials/using-page-objec...
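A page object is just a thin wrapper that hides selectors behind intent-revealing methods. A rough sketch against a stubbed driver (the real thing would wrap Selenium or Protractor; all names here are invented):

```python
class FakeDriver:
    """Stand-in for a real WebDriver, holding a 'rendered page' as a dict."""
    def __init__(self, elements):
        self.elements = elements

    def find(self, selector):
        return self.elements[selector]

class TodoPage:
    """Page object: tests talk to these methods, never to raw selectors."""
    def __init__(self, driver):
        self.driver = driver

    def get_title(self):
        return self.driver.find("h1")

    def get_list_item(self, n):
        return self.driver.find("li")[n]

driver = FakeDriver({"h1": "My Todos", "li": ["buy milk", "write tests"]})
page = TodoPage(driver)
assert page.get_title() == "My Todos"
assert page.get_list_item(1) == "write tests"
```

If the markup changes, only the page object's selectors need updating; every test that speaks in terms of .get_title() stays intact.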


I don't think you can unit test GUIs, since by their nature all tests end up being integration tests. It's easy if you assign non-CSS identifiers (i.e. use a data-* attribute for identification instead of id or class, since you want to keep those free to change for stylesheet refactors) and just hard code the assumptions into the tests, like "when x is clicked y should be visible", or "when I enter 'foo' into the text field, the preview label should contain 'foo'". Ideally your assumptions about GUI functionality shouldn't change much throughout the lifetime of the project, and if you use static identifiers your tests should hold up during extensive refactoring.


That is my point of view, since my focus are mostly native GUIs.

Which is why it is my main question on TDD talks, that always use some library or CLI app as example.


To a certain degree you can unit test GUIs with tools like Ranorex or Selenium. The question is how much setup you need to get the GUI on the screen with the right data.


That isn't unit testing though, it's integration or e2e testing.


You can, but usually with lots of effort and cannot test UX and design requirements anyway, which is why I tend to make this question about full TDD based processes.


We use Ranorex for this: https://www.ranorex.com/


Interesting, thanks for the link.


I think so. I don't know how to unit test GUIs either.


I saw an enjoyable talk recently about snapshot testing. I don't know too much about testing generally but it seems like it could be relevant: https://facebook.github.io/jest/docs/en/snapshot-testing.htm... is the general idea but it doesn't have to be confined to jest/react

Edit: Slides from the talk I saw: http://slides.com/bahmutov/snapshot-testing


Past my edit window, but I want to add this - I feel that I introduced some confusion by missing one magic word in one special place. The first sentence of the last paragraph should be:

That said, if you go for functional style in OOP, i.e. shoving as much as you can into stateless static helper functions and most of the rest into dumb private stateless functions, (...)

Of course I do not mean you should abandon objects where there is a strong connection between a set of data items and operations that work on them, or where polymorphism is the right abstraction. But from my experience, quite a lot of code is made of transformations applied to simple data, and when you write that kind of code in a functional style (whether as static methods grouped in helper classes, or private methods within an implementation of your class), both quality and testability rise in lockstep. And my point is that quite a lot of code can be written this way even in an OOP project.
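To make that concrete with a made-up example: the transformation logic lives in a pure, stateless helper, and the stateful object becomes a thin shell around it, so the interesting code is trivially unit-testable without any mocking:

```python
def normalize_email(raw):
    """Pure helper: all the logic, no state, easy to unit test in isolation."""
    return raw.strip().lower()

class UserRegistry:
    """Thin stateful shell that delegates the real work to the pure function."""
    def __init__(self):
        self.users = set()

    def register(self, email):
        self.users.add(normalize_email(email))

# The pure function can be tested exhaustively with plain asserts...
assert normalize_email("  Alice@Example.COM ") == "alice@example.com"

# ...while the stateful shell needs only a couple of shallow checks.
registry = UserRegistry()
registry.register("  Alice@Example.COM ")
assert "alice@example.com" in registry.users
```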


>That said, if you go for functional style in OOP, i.e. shoving as much as you can into static helper functions and most of the rest into dumb private stateless functions, you suddenly gain both a clean architecture and lots of test points to use in unit tests. So you can have testable code, but you have to chill out with the OOP thing a bit.

Wow, this is exactly the opposite of how one achieves testability in OOP! For more details I recommend Misko Hevery's excellent article "Static Methods are Death to Testability" [1]. Also, I'd argue that "functional style in OOP" is an oxymoron - you're either OO or something else (functional, imperative...)

[1] http://misko.hevery.com/2008/12/15/static-methods-are-death-...


I'm not sure if this article is clever satire.

> The basic issue with static methods is they are procedural code.

So is any object-oriented code. OOP is a subparadigm of procedural programming.

> Unit-testing needs seams, seams is where we prevent the execution of normal code path and is how we achieve isolation of the class under test. seams work through polymorphism, we override/implement class/interface and than wire the class under test differently in order to take control of the execution flow. With static methods there is nothing to override.

Why did it not occur to him that the function boundary is the "seam" he's trying to find?

I mean, `method(a, b)` is equivalent (as in: equally expressive, and usually implemented in the same way) to `a.method(b)`. Therefore, any problems with one case apply equally to the other. If his problem is that `method(a, b)` may call other, non-mockable functions, then that criticism applies equally to `a.method(b)`.

(As I'm writing this, it occurs to me that the author may be suffering from the "OOP = Java" delusion.)
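Python makes the equivalence literal: a method is just a function whose first argument is the receiver, so the "seam" is the same function boundary either way (toy names, obviously):

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self, other):
        return f"{self.name} greets {other}"

g = Greeter("Alice")

# The two call forms dispatch to the very same code:
assert g.greet("Bob") == Greeter.greet(g, "Bob")

# And the function boundary itself is a seam: swap in another
# callable with the same shape to take control of the execution flow.
def fake_greet(self, other):
    return "stubbed"

assert fake_greet(g, "Bob") == "stubbed"
```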


The OOP = Java trap is all too common, but the converse is also a trap: just because you've written OOP code in a different environment doesn't mean that pattern will work in Java.

Go with what the ecosystem supports, and you'll find your tooling helps you a lot more than if you fight against it by trying to force non-idiomatic structures. Your colleagues will appreciate it, too.


He probably meant no-side-effect static functions. I find myself using these a lot. For common CRUD web apps, you have Spring doing most of the stuff for you and you simply need to write stateless methods. However, for less common requirements, you might need to use classes and OOP patterns to implement complex logic.


I did. I even wrote the magic word a moment later, but forgot it there, and realized only past the edit window.


It seems the encouraged method is IoC these days, and that's just dreadful. IoC/dependency resolution everywhere makes it insanely hard to reason about code without running circles through the codebase.

For me, IoC seems invented almost entirely to make up for how difficult testing can be in particular languages. Which, sure, making up for shortcomings is good, but the necessity to use IoC for it feels bad.


This is how I felt about IoC as a junior dev 15 years ago, before I'd actually used it myself.

Nowadays the benefits are clear to me: more modular, testable code, and also lifecycle management.

Give it a try, you may well change your views.


I've used it a fair deal. I've found I prefer languages that don't require IoC to make code testable.

I agree that it's one of the sanest options when it's required, I just think that language design should incorporate testing ergonomics from the start.


> unnecessary abstraction and boilerplate patterns

That means you didn't actually change the code. It means you added unnecessary abstractions around your code in order not to change it.

Unit tests guide you towards simplicity. In my experience, the only times they haven't done that is when I have made some assumptions about what the code should be and not allowed the tests to drive me towards that simplicity.

http://blog.metaobject.com/2014/05/why-i-don-mock.html


    That said, if you go for functional style in OOP, i.e. 
    shoving as much as you can into static helper functions and
    most of the rest into dumb private stateless functions,

We had this at a company I worked at a while back - dozens of modules with nothing but static functions that all took a first argument of the same type. If only there was some kind of METHOD for declaring a whole bunch of functions that operated on the same data...


Until you get into polymorphism etc. this is just a style thing.

method(a,b) is equivalent to a.method(b) and exactly as much typing. You do save manually typing the extra part of the definition but 'eh'. A few languages treat these interchangeably.


> people ... don’t want to change how they write code. They just want tests for it

Have you considered the possibility that those people are right? That's a reasonable conclusion to make if you are seeing lots of otherwise smart people that share an opinion that disagrees with yours.

There are lots of valid reasons to change the style in which you write code. In my mind, fitting somebody's fad testing scheme is not one of them.

Here's a second opinion from a guy who also likes tests, but doesn't think it's a good idea to structure your whole codebase just to accommodate them:

http://david.heinemeierhansson.com/2014/test-induced-design-...


I strongly agree with that, too. My current, experience-born belief is that if the only reason for introducing some architectural pattern is to accommodate testing better, the change is wrong and will likely hurt the code quality. Yes, you need to concede a little bit to allow for test points, but turning your code inside-out to have it go through three layers of indirection so that the middle one can be mocked easily? That's just stupid.


yea, you should be changing code style to increase modularity in a way that is conceptually coherent in terms of what is easy to hold in your head. Increased testability should fall out of that because you can think through "What invariant should hold true about X under conditions/inputs Y1...Y4?"


Code you can hold in your head usually has a smaller surface area. Fewer moving parts equals easier testing.


    Have you considered the possibility that those people are right?
Every time I’m looking at a code review with awful tests. I started out in statically typed languages and I can’t shake the feeling that we need to tool our way out of the testing conundrum.

Anything that is this hard to get right shouldn’t be the equal responsibility of every team member. For every other problem of this magnitude we have people who specialize and everyone else just has to be aware of the issues and consult when in doubt.

So it’s a struggle for me to try to get people to adhere to the strategy we’ve accepted without believing it’s the be-all and end-all of software robustness. Because I’m not convinced. Nothing I’ve ever mastered in software has taken me half as long as testing, and that just ain’t right.

That said, I still like the structure about 80% of my tested code has. It usually does exactly what it says and nothing else. Local reasoning is a big deal to me.


The main problem is "obsession" as pointed out by that blog post you linked.

Obsession with "one size fits all" or the "silver bullet". I believe the authors of the Agile Manifesto wrote a disclaimer about this.

If it doesn't make sense to make unit tests to MVC controllers, then don't.

In my experience, management looking at code coverage not being 100% is one (albeit bad) reason this "unit test everything" happened. I tried shouting this out, but the team lead wasn't willing to learn from a junior and adjust Sonar's configuration.


Usually splitting out into interfaces is done in static languages as that's the best type-safe way to do things. It's not a fad, it's been like that since the beginning.


There is absolutely no reason (except to fit into a particular pattern of testing) to turn everything into an interface[1]. That has nothing to do with type safety.

[1] Obviously some things do make sense to put behind interfaces, but I find that most Java developers go interface crazy and the code ends up being an unreadable mess.


In some situations unit tests with lots of mocks will bring a negative value. Imagine a situation where you want to refactor a big piece of code with many dependencies but you don't want to change its public interface.

If you mock everything, when you refactor, the tests will break because the dependency structure will change, and the mocks are no longer relevant to the new implementation. You have to rewrite the tests. You did twice the testing work and, more importantly, you get absolutely no protection against regressions, because the tests for the two versions are not the same.

If you build integration tests, they can remain the same. Less work and actual protection for your refactor.
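One way to see the difference with hypothetical code: the first test pins the internal call structure with a mock and breaks on any refactor, while the second only exercises the public contract and survives:

```python
from unittest.mock import MagicMock

class PriceService:
    def __init__(self, repo):
        self.repo = repo

    def total(self, ids):
        # Internal structure: one repo call per id (a later refactor
        # might batch these into a single call).
        return sum(self.repo.price_of(i) for i in ids)

# Brittle: asserts *how* the work is done, so a batch-fetch refactor breaks it.
repo = MagicMock()
repo.price_of.side_effect = [10, 20]
assert PriceService(repo).total(["a", "b"]) == 30
assert repo.price_of.call_count == 2  # couples the test to the implementation

# Sturdier: a fake honouring the contract; only the observable result is asserted.
class FakeRepo:
    def price_of(self, i):
        return {"a": 10, "b": 20}[i]

assert PriceService(FakeRepo()).total(["a", "b"]) == 30
```

The fake-based test keeps passing whether `total` loops, batches, or caches, which is exactly the property you want while refactoring behind a stable public interface.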


Testing internals forces future programmers of the codebase to maintain those invariants. All code is a liability.


Not in my experience. Convincing people to delete tests that only assert one invariant when the business changes its mind is easy. It’s the ones that have residual value after removing one invariant that trap people into spinning their wheels.


I agree with this.

Although, if you offshore development here in the third world, where the internet gets slower every day, running an integration test that queries to Amazon RDB can take forever.

I hope this issue gets a spotlight, and that it's noted that integration tests in third-world countries are very, very slow. This high cost should be included in the estimates.

To give you an idea, here it takes AT LEAST 5 seconds to load a page from the Amazon console. Lol, even the software companies owned by the ISPs/telcos here complain that their access to AWS is super slow. They said bad routing is the main issue and for some reason the ISP isn't doing anything about it.


> Running an integration test that queries to Amazon RDB can take forever.

Why wouldn't you run your test on aws if it needs to integrate with rdb anyway?

Or are your code changes so massive that "git push" is slow?


Not massive. But yes, git push takes a few seconds, which is bearable. Once pushed, I need to SSH into the Jenkins server so I can run only my brand new integration test. Running the test there is super fast, but everything else, including typing one character in PuTTY, is slowed down.

All this while you are expected to fix 10 tickets for the whole day plus anything that goes wrong in production.


I agree with everything but your conclusion, but I have an aversion to mocks that isn’t shared by everyone.

If the code changed due to a big behavioral shift then your integration and E2E tests aren’t safe. It’s more than twice the work at the higher layers because people get trapped by the Sunk Cost Fallacy. They try and try to save the old tests before they finally rewrite.

That is the observation that convinced me to stick to unit tests. People aren’t emotionally attached to individual unit tests.


Very good point.


This sort of discussion often gets confused because people have different ideas about what integration tests are and therefore talk past each other.

I generally avoid the term altogether and recommend testing stable API's (which are often public) and avoiding testing internal API's that are more likely to change. This assumes you have a stable API, but that's true of most libraries.


I think we are discussing what it means to test a stable API or an internal API, not just testing them in general. We're talking about making architecture decisions on your code in the interest of testability. Regardless of the visibility of your API, you will still need to unit test the logic, will you not? Do you test your controllers and then the response from your service layer? Is all your logic in your actions?


Exactly - Integration tests !== E2E tests


Martin Fowler solves this by introducing:

SubcutaneousTest

https://martinfowler.com/bliki/SubcutaneousTest.html


Catchy.


>Okay. The biggest problem I see with people trying to write unit tests is that they don’t want to change how they write code. They just want tests for it. It’s like watching an OO person try their hardest to write OO code in a functional language.

I've seen what happens when a developer tries to abstract away a database in a database driven app so it can be "better unit tested". It's a goddamn mess.

If your app relies heavily on using a database, your app naturally integrates with a database, and it makes no sense to test without it. You are intentionally avoiding testing in a way that would pick up bugs.

>Unit tests run faster, are written faster

Unit tests test less realistically. That means they don't catch bugs integration tests do.

They also often take longer to write and are more tightly coupled.

How coding this way came to be seen as a best practice is beyond me. Tight coupling and premature optimization is usually seen as bad practice in other areas.


> If your app relies heavily on using a database, your app naturally integrates with a database then it makes no sense to test without it. You are intentionally avoiding testing in a way that will pick up bugs.

Also, with Docker it's now actually feasible to automatically test against a real database at a reasonable speed. A Postgres container spins up in a couple of seconds, a SQL Server one in a little over four.


That has nothing to do with docker, really. I run postgres standalone on my laptop and it starts in < 1 second.


I guess they meant so your tests can start with a blank or reproducible state.

But you can of course achieve the same by running a script before your tests start. There are also some frameworks for doing this sort of thing too, such as Fixie for .NET


I've done "write a script to reset the database" before, although not for Pgsql. The effort and potential snags involved make it nowhere near as trivial as docker rm && docker run.

There are also other scenarios that become really simple with disposable DB instances. Want to test a remote data transfer feature? Just spin up two databases.


Drop-create scripts! One of my first epiphanies in the testing world.


Create only scripts can also be great in the right context.

Namely a context like travis-ci where you get a new clean environment each time.

Personally, I've developed a few libraries/web UIs that rely on external software like a DNS server or an LDAP directory.

And for those, I've a quick and dirty shell script that deploys bind/OpenLDAP.

It's far easier, faster and more accurate than mocking.

For example, what comes to mind is the testing I do for all the SSL/TLS modes I support (SSL, StartTLS, certificate checks disabled or not, etc.) in my LDAP web application.

Travis also has available services in their build envs (stuff like redis, mongodb, mysql, cassandra...).


I do this with rsync to restore a snapshot of the database data folder.


Sure, but I presume that doesn't include installation time.


In another thread I talk about splitting deciding from doing and I find that strikes a very easy balance for database heavy code. Unit tests for the logic and just concede the data transport to higher level tests. Preferably with a local database full of test fixtures.


I work in a company where quite a few of the developers simply are incapable of writing anything but integration-tests.

The reason? They don’t “believe” in unit-tests. They don’t think unit-testing “works in the real world”.

They absolutely fail to accept that they need to write their code differently for automated testing to work well.

How do you change such a mindset?


For some reason, unit tests vs. integration tests reminds me of this image: https://pbs.twimg.com/media/CZX0O-tWQAAeaLi.jpg The unit tests pass, but why bother with integration tests?

I wrote a component at work. Sure, there are some unit tests (the SIP parser working? Check.). But for that component, unit tests only go so far, as I need to query another program that actually implements the business logic (it goes SIP -> my program -> custom protocol [1] -> business logic program and back again). To mock the business logic program (I need to make sure to return the proper information per the SIP request) is to reimplement the business logic program, so when testing my component, we also use the business logic program. The business logic also requires two more programs to run. At this point, there is no difference between a unit test and an integration test, as I'm running five programs ... no, ... six, since I also have to mock a cell phone (basically, respond to a custom protocol and make a web request), to test the "unit" that is my program.

Oh, and to make it even nicer, this is legacy C and C++ (C with classes) code.

[1] Legacy code. It works. At the rate of production deployments we (our team) gets, it would be around two years [2] to remove the custom protocol. So it stays.

[2] Have I mentioned the very scary SLAs?


Assuming you are a developer, start writing some.

Next bug you find that needs a unit/functional test (e.g. it is caused by a simple error in transformation in one function), write the test first as a table of inputs vs outputs, find it fails, fix the function, and leave the test in. Gradually, the code base will contain unit tests which are useful, people will see they are useful, and other people might start using them too where appropriate.
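A table of inputs vs outputs keeps such a regression test cheap to extend: one new row per reported bug. A hypothetical sketch (the function and the bug are invented for illustration):

```python
def slugify(title):
    """Function that had the bug: imagine it used to mishandle extra whitespace."""
    return "-".join(title.lower().split())

# Table-driven test: each row documents one expected transformation.
CASES = [
    ("Hello World", "hello-world"),
    ("  leading space", "leading-space"),     # the row added with the bug fix
    ("Multiple   spaces", "multiple-spaces"),
]

for raw, expected in CASES:
    assert slugify(raw) == expected, (raw, expected)
```

Crucially, the new row should fail before the fix is applied, which is how the test proves the fix actually addresses the reported bug.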

You are unlikely to persuade them without actually doing what you say is beneficial and exposing others to its benefits.
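
The table-of-inputs-vs-outputs approach might look like the following sketch. The function `normalize_phone`, its behavior, and the bug it fixes are all invented for illustration; the point is the shape of the test:

```python
# Hypothetical regression test written as a table of inputs vs. expected
# outputs, added alongside a bug fix.

def normalize_phone(raw: str) -> str:
    """Strip formatting characters and a leading country code."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    # The (invented) bug fix: previously a leading "1" was stripped
    # unconditionally, which mangled shorter numbers starting with 1.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

CASES = [
    ("(555) 123-4567", "5551234567"),
    ("1-555-123-4567", "5551234567"),  # leading country code stripped
    ("155-1234", "1551234"),           # short number: leading 1 kept
]

for raw, expected in CASES:
    assert normalize_phone(raw) == expected, (raw, expected)
```

Writing the table first, watching a row fail, then fixing the function keeps the test honest and leaves a durable record of the edge cases.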


I agree. Tests for bug fixes are extremely valuable. Of such tests, unit tests are often very feasible.

A test accompanying a bug fix holds value in many ways.

Firstly, it demonstrates to those reviewing the change that the fix is suitable.

Secondly, the presence of a test encourages reviewers to consider what a test does and doesn't cover, sometimes resulting in comments regarding improvements that had not otherwise been considered.

Thirdly, and of most importance in the long term, a test for a bug fix serves to document oddities previously discovered that were for a time not known about.


I’m confident you know this, but just for the peanut gallery:

Tests that go along with bug fixes are some of the highest value tests, but they must be previously failing tests.

I can’t tell you how many times I’ve reviewed fixes with “tests” that are basically affirming the consequent; they assert something that was already true and it turns out they’re not actually fixing the reported bug.


It depends on what you unit test and why.

If it's for 100% test coverage: forget about it.

If you test private methods: you're doing something wrong.

What people usually see from the "unit test evangelists" are codebases in which you have tests for every method in the code. Then you do some refactoring and you have to rewrite tons of tests. And as those tests were just made to get 100% coverage, you end up with logic bugs, because most unit tests have been written to go through the code, not to check limits and edge cases. When you stumble upon this kind of test harness you can only conclude it has only cons (more to write upfront, less willingness to refactor) and no pros (the code is still brittle). Then your integration tests feel like your real harness: you can change anything in your code and they'll tell you what has been broken when used.

Now if you consider your unit tests as a kind of integration test for the API your classes present, then you get the benefits of unit tests. But this means testing only public methods. And mutation-testing resilience is a better metric than test coverage.

Also: those tests do not replace a real documentation which can be a lot faster to read and understand than code.


People test private methods because edge cases occur in those private methods, and the tests for those edge cases do not belong in the unit test for the consumer of the private unit. If the consumer simply loops over a list of objects which it receives from the private unit, the consumer does not need to know that particular integer arguments are special cases in the private unit; that would be a leaky abstraction. However, it still makes sense to verify that you have correctly handled each special case via a unit test.

As for the difficulty of refactoring: if you refactor the private unit, you ensure that its tests continue to pass, since its consumers depend on that behavior; you ignore the failing tests of the consumers so long as the subordinate unit's tests are failing. If you eliminate the private unit, you eliminate its tests. Modifying the behavior of the private unit may be equivalent to eliminating the unit or refactoring it. The number of tests you will have to modify is equal to the number of units whose behavior you modified: the branch count of the private unit, or that plus its consumers. If each consumer were instead responsible for testing all of the edge cases of the private unit, you would have to change its branch count multiplied by the number of consumers worth of tests.

The distinction between private and public is wholly synthetic. It is a binary layering mechanism that does not map well onto most architectures which have many layers. From the perspective of many full architectures, everything down in the data layer is private: no customer will have direct access to the data layer. Yet you will still test the data layer.

The internals of a library are not special simply because the layering is thinner and binary.


My general theory is that if a private method is complex enough to need separate testing, it's usually complex enough to pull out into its own class and test as a separate public interface. That's 'interface' as in 'what a class exposes to its callers', not necessarily using an actual Java interface or making it part of the public API of the library.

A side-benefit is that the tests for the original class can be a lot simpler too, as I can just mock the responses from what used to be the class internals. Another benefit for libraries, is that it allows consumers to swap out your implementation. I've lost track of the times I've wanted to change something deep inside a library but it's implemented in a private method so I can't even override it.

This does lead to more, smaller, class files. But unless taken to extremes I've not found it to make things less comprehensible, and it definitely makes things more composable.
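
The extract-and-mock pattern described above might look like this sketch. All names here (`DiscountPolicy`, `PriceCalculator`) are invented:

```python
# Pull a formerly private method into its own small class with a public
# interface, test it directly, and stub it in the original class's tests.

class DiscountPolicy:
    """Formerly a private method; now independently testable."""
    def discount_for(self, quantity: int) -> float:
        return 0.1 if quantity >= 10 else 0.0

class PriceCalculator:
    def __init__(self, policy=None):
        # Default keeps callers unchanged; tests (or consumers) can inject.
        self.policy = policy or DiscountPolicy()

    def total(self, unit_price: float, quantity: int) -> float:
        discount = self.policy.discount_for(quantity)
        return unit_price * quantity * (1 - discount)

# The extracted class is tested through its public interface...
assert DiscountPolicy().discount_for(10) == 0.1

# ...and the calculator's test stubs the policy instead of re-testing
# its edge cases.
class FlatHalfOff:
    def discount_for(self, quantity):
        return 0.5

assert PriceCalculator(FlatHalfOff()).total(100.0, 2) == 100.0
```

The injectable `policy` is also what lets library consumers swap the implementation, as mentioned above.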


> If you test private methods: you're doing something wrong.

Maybe a silly question, but why?

If I refactor a class to pull a common piece of functionality into a private method, why would I not want a test for that?

One of the principal benefits of tests I see is allowing me to change the implementation without worrying about the behaviour, and I'm not sure why that wouldn't apply to private methods?


One reason why is because you should be testing the public behavior of a function/class not the details. The reason for this is because the public interface is what other parts of the codebase will come to rely on. Refactoring generally shouldn’t change the public interface as it will break other pieces of code within your codebase, or other codebases if it’s a library, and other systems if it’s a network api. So, if you test the public interface, generally refactors won’t break the tests.

Testing private functions also seems to be a smell that the overall setup of testing the class or function is too difficult. This can be because the class has too many branches in it, the argument list is too large, or too many other systems must be in place for it to function correctly. This, to me, indicates a public interface that is hard to use and will pass much of these issues on to the caller.

Lastly, if you are testing private functions to gain coverage then arguably the behavior in the private method isn’t actually useful to the public interface. The reason I say this is that testing the behavior of the class should end up touching all branch conditions inside the class or the public interface isn’t fully tested. By only testing the public interface it then also becomes easier to locate dead/unreachable code.

Hope that answers the why.


I would argue you absolutely need to be testing the internal details. That is the entire point of measuring branch coverage and performing mutation testing. Unit tests are not black box tests. They need to know that for special values, the unit has to follow a different code path but still produce sensible output. Reading the documentation of a function is not sufficient to determine what edge cases the unit has, but testing those edge cases is often critical to verifying that the unit adheres to its specified behavior under all conditions.

As for the smell, sometimes things are irreducibly complex. Some things in this world do require tedious book keeping. All the refactoring in the world cannot change the degrees of freedom of some problems.

Tests on consumers should not test branches of subordinate units. If you did this then the number of tests would explode exponentially with the number of branch conditions to handle all the corner cases. If a private unit produces a list of objects, but has special cases for some values of its argument, test those branches to verify it always produces the correct list. Then just make sure each caller does the correct thing with the list of objects. That is the purpose of separation of concerns: the consumer does not need to know that some values were special.


> the number of tests would explode exponentially with the number of branch conditions to handle all the corner cases

Then wouldn't you want to write something that was able to iterate through those edge-case interactions and ensure they are correct?


I'm trying to imagine what on earth your private methods can be doing that wouldn't be affected by the public interface.

There should be no situation where the same exact call in the public interface could take multiple different paths in the private method. The only thing I can think of that could make that happen would be some dependency, which should be mocked at the top level to control these cases.


Some people call this functional testing.


Private methods are the internals of your classes. They may change a lot for performance or maintainability; one method may become 3 or 4.

But people who use your class don't care. They input something into your public methods and expect something in return. The details of what happens inside should not matter. Adding tests there only helps to slow you down and makes the dev team resist needed changes. And every test you add increases the chances of it being useless or wrong.


Ah, I think I see. If I break the functionality by changing the implementation of a private class, that should be reflected in the public API unit tests.


That's how I see it (in Java at least): unit tests are for guaranteeing that your class's API does what it says it does.

In Python I am more loosey-goosey about my unit tests; there they are more for helping me write/think about tricky code.


If your private method is wrong, then your public methods will also be wrong. If your public methods are right, then it doesn't really matter what your private methods do..


I read it as using introspection/reflection to test what is essentially implementation details which are very likely to change.

This is how you write brittle tests which fail easily and cause high maintenance costs and reduced confidence in the unit-tests as a safety net.

Definitely an anti-pattern.


> If it's for 100% test coverage: forget about it.

Im a realist and don’t see any point or value in chasing that goal for a 20+ year old company code-base.

But I expect new modules to be fundamentally testable.

> if you test private methods: you're doing something wrong.

Agreed.

> What people usually see from the "unit test evangelists" are codebase for which you have tests for every method in the code.

Is that really so? It’s easy to be opposed to extremists of any form.

I only “evangelize” that business-logic should be tested and thus needs a code structure to isolate the business-logic from its dependencies (databases, services, factories, etc).

I find that perfectly reasonable.


> Then you do some refactoring and you have to rewrite tons of tests.

One of the core principles of TDD is that you write tests to facilitate refactoring---to make it _easy_ to refactor and have confidence in the system after doing so. I've been practicing TDD for ~8y and this is rarely a problem. TDD encourages good abstractions and architecture that lends itself well to composition and separation of concerns.

But if you change an implementation, of course a test is going to fail---you broke the code. It works exactly as designed. What you want to do is change the test first to reflect the new, desired implementation. Any tests that fail that you didn't expect to fail may represent a bug in your implementation.

Of course, I haven't seen the code, so I can't comment on it, and I won't try to do so.


"What you want to do is change the test first to reflect the new, desired implementation". Not sure if you meant this but this is exactly what is wrong with most unit tests that I have come across. They test the implementation and not the interface.

That's why I agree that the focus should mainly be on integration tests. Or at least functional tests. Ideally what you want is to have a system where all the state and logic is in a model (that model can include an external db). The gui should be as much as possible a function of the model i.e. model-view. Then you write the majority of your tests as integration tests against the model and include as many scenarios as you can think of. These tests should reflect the requirements/interface for the system and not the implementation. You should write some gui tests but these should be much less. They just need to verify that the ui reflects the model accurately. You shouldn't be testing scenarios as part of the gui tests.

I have come across too many code bases where the unit tests test that the code is what it is, rather than the code does what it should. Where 'what it should' == 'requirements/interface' == 'something close to integration tests'
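
The model-view split above can be sketched in miniature. All names here (`CartModel`, `render`) are invented; the point is that scenario tests target the model, and the view gets only a thin check:

```python
# Keep state and logic in a model; make the view a function of the model.

class CartModel:
    def __init__(self):
        self.items = {}

    def add(self, name, price):
        self.items[name] = price

    def total(self):
        return sum(self.items.values())

def render(model: CartModel) -> str:
    """View as a pure function of the model; needs only thin tests."""
    return f"{len(model.items)} items, total {model.total():.2f}"

# Scenario test against the model: reflects requirements, not implementation.
cart = CartModel()
cart.add("book", 12.50)
cart.add("pen", 2.50)
assert cart.total() == 15.0

# One thin view test: the UI reflects the model accurately.
assert render(cart) == "2 items, total 15.00"
```

Because the scenarios never mention the view, a UI redesign only touches the one thin `render` test.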


I doubt that you can. There was a study a while back, and I apologize in advance because I do not have a link, that showed projects written with unit tests took significantly longer to reach the market, but with significantly fewer bugs. However, overall time spent on the code was less. So the conclusion was that unit tests are a commitment to the long-term goal of minimizing developer time, and the tradeoff is that it takes longer for the first version to be done.

That is, as far as I know, the only tangible evidence that unit tests are good, unless you need to get something out the door quickly (which sadly is most of it).

I'd argue that is not the main benefit of unit testing, however. The main benefit is the way code ends up structured, and especially how dependencies become explicit, e.g. injected as constructor arguments.


That parallels my experiences. I got tired really early on with projects that ground to a halt because of brittleness and a lot of my focus is on building skill and confidence so that version 3 is no harder to ship than version 2 was. Every team I’ve left on good terms was more effective when I left than when I got there. The ones that fought me the whole way frustrate me and it shows.


Man, I really want to read that study!


Was this the microsoft study? I think it was a breakdown of their delivery of Vista?


> How do you change such a mindset?

IME you can't. If they even recognize that what they are doing is not unit testing, then you're doing well.

> They absolutely fail to accept that they need to write their code differently for automated testing to work well.

I've been thinking lately that having to code differently might be a fault of the tooling that's built up over the years, and that they might be right. I've been getting back into C lately and had a look at the mock/stub options there, which were very complicated and not very compelling compared to what I've been used to in the .NET world. In the end I found the pre-processor was the best option (for my project):

  #ifdef test
  #define some_method mock_some_method
  #endif

The advantage has been that the code is written exactly (more or less) as it would have been if there were no tests. There are no interfaces to add, functions to virtualize or un-static and no dependencies to add, this all translates to no performance hit in the production code and the project being simpler all around.


Trying to "change" others' mindset is good recipe for frustration.


The keyword is "believe".

Such behavior is not science. In science, you don't believe, you understand; it's not faith. You check evidence, you use reason.

A mindset that relies on belief rather than reason may be a product of decades of advertising. It can also be a side-effect of participating in belief movements.

How do you change such a mindset? For me, it was reading lots of philosophy and atheist vs theist debates. For Socrates, he died for it.


Perhaps a well written, easy to follow guide on how to structure different types of code to simplify testing would be helpful. Know of any?


Working Effectively with Legacy Code is a good read: it presents what kind of code you want to attain and methods to get there from a crappy code base.

The definition of legacy code for the author (which I like) is: untested code. So the book is more about getting code in a testable state than random refactoring to get to Clean Code level.


That book is indeed good, at least on a personal level.

It has helped me refine what I consider good code and good effort wrt to testing.


Object Oriented Software Guided By Tests is good.


Sounds like Tableau in Seattle.


> Good lord. Why integration tests?

Because they can find bugs and errors which unit tests cannot.


Funnily enough for the reasons carefully explained in the article. i.e. cost vs benefit, fragility etc.

Nobody disputes that if they came for free then full unit test coverage would be a good thing. The area open to reasonable debate is whether they give the best bang per buck in terms of testing (as opposed to the role of tests in TDD - which is a different kettle of fish: http://www.drdobbs.com/tdd-is-about-design-not-testing/22921... )


This is true. Areas where I have full unit test coverage tend not to have any bugs. What a waste of time to have written all these tests!


Yeah, assuming you think it's not a bug when the backend API that you're mocking is accidentally removed or changes its response slightly.


You seem to be implying that I prefer all unit tests and no integration or system tests. Far from it.

A system test catches the backend API issue. A unit test can demonstrate that my component degrades gracefully if the backend API is not available (because I used a mock to provoke a timeout response).
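
A minimal sketch of that graceful-degradation unit test, with invented names (`Client`, `TimeoutBackend`) and a stub standing in for the backend API:

```python
# Use a stub to provoke a timeout and check the component degrades
# gracefully instead of propagating the failure.

class Timeout(Exception):
    pass

class Client:
    def __init__(self, backend):
        self.backend = backend

    def fetch_greeting(self, name: str) -> str:
        try:
            return self.backend.greet(name)
        except Timeout:
            # Graceful degradation: fall back to a canned response.
            return f"Hello, {name} (offline)"

class TimeoutBackend:
    """Stub that simulates the backend being unavailable."""
    def greet(self, name):
        raise Timeout()

assert Client(TimeoutBackend()).fetch_greeting("Ada") == "Hello, Ada (offline)"
```

A real backend can't be made to time out on demand, which is exactly why this case belongs to the unit tests rather than the system tests.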


There are plenty of systems out there that can't degrade particularly gracefully.

That said, the big issue with unit tests is they don't test the glue which is where a lot of issues happen. In languages with strong type systems, this is less of an issue.

Unit tests are great when you actually have complex logic with relevant corner cases, but when you're building web apps, 95%+ of your code is just boilerplate data munging.


The problem is that it's almost always possible to achieve a similar or identical number of bugs with lower test coverage, and every test you write has a maintenance cost.

In my experience, the vast majority of test failures (after I've made a code change) end up being issues with the tests themselves, and not with the code change. If you're testing obviously-correct code, that's just more that can spuriously break later and require time and effort to fix and maintain.


I assume you are sarcastic, but cannot figure out what your actual point is. Are you disputing integration tests can find some types of errors which unit testing will not uncover?


I read your comment "Because they can find bugs and errors which unit tests cannot." to suggest that unit tests cannot find bugs.

You probably meant "Because they can find bugs and errors that unit tests cannot." to remove the ambiguity.

tl;dr, I am illiterate; but a little change to the original comment could make it easier.


Except mine had plenty of bugs nobody ever saw because I spotted them while writing the tests...


Unit tests and integration tests are tools, and some tools are better at some tasks than others. The idea that a single tool is the only one you need is preposterous.

If you are writing a library, write unit tests; if your app mostly binds two libraries together, unit tests are meaningless, so write integration tests.


Functional, integration, and unit are all different types of tests. He's saying write more integration tests, not more functional tests.

For algorithms, I love how I can refactor the implementation and still have 100% confidence in the result if it tests sample input against expected output.


> Why integration tests?

Because they test that you are actually using available network ports, have the correct database model in mind, didn't mistake the version of your libraries, got the deployment script right, and aren't just restarting everything in an infinite loop?

Or maybe because E2E tests actually test stuff your company cares about, instead of some made-up rules that you've got from nowhere?

Really, if you have units, you should unit-test them. But your E2E tests should be the ones you really care about, and they should certainly not break randomly with unrelated development. If yours are lasting for 5 quarters, you may be doing something wrong.


There’s a pyramid for a reason. It only takes a couple of tests to make sure that your plumbing connects all the way through. You inspect all the bits when they are going in but in the end you still check that things end up where they are supposed to.

I’ve been doing automated testing for a while. It’s hard to learn, and there aren’t many people to emulate. Well, there are people to emulate, but the winning strategies are counterintuitive, so your gut fights you the entire time. It took me 8 years to feel confident in tests, and my own test code routinely makes me sad because I find antipatterns and I should know better. Also, other people copy all of my mistakes :/

I’ve seen a number independent groups two or more years into their testing adventure and the failure modes are not that different. Everyone pretty much makes the same mistakes I do, and it’s frustrating watching everyone go through the pain before they accept that something has to change and it’s probably them.

The best strat I know of for testing is to use inductive reasoning and sampling to verify. If you don’t like the plumbing analogy, then this is the logic version of the same thing. If A -> B and B -> C then A -> C. Only a couple of your tests should verify A -> C; the bulk should check every kind of A [edit] and every kind of B.

If you want to do things like this without making your code not ‘say’ anything (a huge pet peeve of mine, so I can empathize with your concerns) then there are a couple of things to do there. One is an old trick from Bertrand Meyer: split code that makes decisions from code that acts upon them. Done well, this split leaves the code more legible, not less.

Most of the boundary conditions are in the decisions. And since that code is side-effect free, you can test the hell out of it with no mocks. Getting every permutation is straightforward, and you can count your tests and your conditional branches to figure out if you are done.

Once your code looks like this, adding and removing new rules to the system later is a snap. Even much later.
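
The decision/action split can be sketched like this. The retry example and all names (`should_retry`, `send_with_retries`) are invented for illustration:

```python
# Split code that makes decisions from code that acts on them.

def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Pure decision: all the boundary conditions live here, mock-free."""
    return 500 <= status < 600 and attempt < max_attempts

# Every branch of the decision can be enumerated cheaply:
assert should_retry(503, 1) is True
assert should_retry(503, 3) is False   # attempt limit reached
assert should_retry(404, 1) is False   # client errors never retried

def send_with_retries(send, request, max_attempts: int = 3):
    """Thin action layer: acts on the decision, little logic of its own."""
    attempt = 0
    while True:
        attempt += 1
        status = send(request)
        if not should_retry(status, attempt, max_attempts):
            return status

# One cheap check that the plumbing connects:
responses = iter([503, 200])
assert send_with_retries(lambda _: next(responses), "req") == 200
```

Adding a new rule later means adding a row to the decision's test list, not rewiring the action layer.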


> There’s a pyramid for a reason.

Sorry, but I am still unconvinced people got that reason correctly.

Let's say you have that A -> B; B -> C pipeline. How many tests you should have on each step (and on the origin) depends completely on how much freedom that step grants you. It is not something one can generalize about.

For example, if you are writing an enterprise CRUD application, almost your entire freedom resides on the data mapping. That means that your tests should be equally divided between data validation and data storage/retrieval. And the second can only be done at the integration or E2E levels.

If you are writing a multi-client stateful server (like a threaded web server), the freedom concentrated in launching and reloading it is so large that you can't even reasonably test for it. You'd better design your software around proving this part is correct and save testing for less problematic stuff.

My biggest issue with the unit-test pushing isn't even that it forces a bad structure onto the code (which it does), or that it pushes for fragile and valueless code (which it also does). It is that it's wrong at the larger level, oversimplifying things and letting people off the hook without thinking for themselves.


Because there isn't proper way to write unit tests for GUIs for example.

They can only test parts of its behavior, not everything, and are too brittle to any simple UI/UX change.


Why not just mock the UI drawing library? (I find this a very interesting question.)


Because you would be reimplementing 100% of the UI features and still could not prove that it meets the UI/UX design specs.


I think a lot of that is just the poverty of UI APIs and especially the imperative drawing paradigm. There's no reason in principle why we can't programmatically verify that the basics of the UI spec are fulfilled. If the whole UI layer is just impossible to verify then if we're at all serious about correctness then we should (hyperbolically) stop making UIs until we figure it out.


I can’t help but think this is because nobody writes testable GUI frameworks. You can’t build a castle on a swamp, unless you’re Monty Python.


You've missed the point. Sure, code can always be written better to facilitate testing, but ultimately, each component of the code still has to correctly call/be-called by other components. No class exists in a vacuum. Suppose you have class-A which interacts with class-B. I've seen people put a ton of effort into unit-testing A and B in isolation, and writing very elaborate mocks/fakes/stubs for A and B. Only to end up with bugs anyway because they made a mistake in their mock/fake assumptions. Instead, an integration test that allows A and B to interact directly, and tests their resulting behavior, would avoid all this wasted effort and bugs that come from mocking.

You suggest that instead of writing integration tests, this problem can be avoided by "writing better code". But how exactly would you rewrite the code to avoid the above problem? Declare that A and B should not interact at all, and move all their interactions into class-C? Now you've just given a new name to the same problem: "How do we adequately test class-C?" And once again, the correct answer is to ease up on the mocks and just write some integration tests.
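
A toy version of the A/B point, with invented classes, showing the kind of bug a hand-written fake can hide:

```python
# Let the real objects interact instead of testing A against an
# elaborate fake of B.

class B:
    def lookup(self, key):
        # Real behavior that a hand-written fake might get wrong:
        # missing keys return None rather than raising.
        return {"x": 1}.get(key)

class A:
    def __init__(self, b):
        self.b = b

    def describe(self, key):
        value = self.b.lookup(key)
        return "missing" if value is None else f"{key}={value}"

# Integration-style test: real A talking to real B.
a = A(B())
assert a.describe("x") == "x=1"
assert a.describe("y") == "missing"   # a fake that raised KeyError would hide this
```

If A's tests had used a fake B that raised on missing keys, both suites would pass while the combined system misbehaved; letting the real objects interact surfaces the mismatch.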


No true Scotsman.

You might be speaking about those cases where people write large god classes, pervasively side-effectful code, zero API design, and a general lack of pure abstractions - then their tests would equally be bad. Tests are code, so one's ability to design programs would reflect on their tests and vice versa.

But a reasonable programmer who cares enough about what they do can still end up with a brittle test suite because of other factors.

Unit tests written in a dynamic language is a major drag on refactoring. It is not so much that we test the internals of the system or get tied up with the shape of code internal to an object. Even if you follow Sandi Metz's wonderful guidelines around testing: i) do not test internals, ii) test incoming queries by asserting on their return value, iii) test incoming commands by asserting on their side effects, iv) mock outgoing commands, and v) stub outgoing queries, you end up with a brittle test suite that is hard to refactor thanks to connascence.

Whenever you refactor your names, or shuffle the hierarchy of your domain elements, you are now left with a thankless chore of hunting and pecking your unit tests and making them reflect the new reality of your code. Here integration tests help you know that your system still works end to end, and unit tests simply remain that one thing that refuses to budge until you pay it its respects.

Unit testing complex views is still a hard problem. There are no well-defined stable "units" to speak of in an ever changing HTML UI. We have snapshot tests, we try to extract simple components on whom we can assert presence/absence of data, and we have integration tests that blindly railroads over everything and make sure the damn thing worked.

But in a different context unit testing is the one true answer. If your system is statically typed (say in Haskell or OCaml), and your functions are pure and compositional, you don't so much worry about mocking and stubbing. You can make simple assertions on pure functions and as the granularity of your functions increase, they end up covering more parts of your system and get closer to an integration test. Static types form another sort of guarantee, the most basic one being that it takes a class of bugs away in form of undefined data types, the system going into invalid states, and of course the clerical mistake of named connascence. We often abuse unit tests in dynamic languages to cover these scenarios, leading to huge test suites with very brittle tests.

I think it is important to call out that the value of unit tests are still contextual - "it depends" like everything in the world, and despite our best efforts, they can become hard to refactor. There is a case to be made for writing integration tests because they deliver more business value at a cheaper price than a pervasive set of unit tests when dealing with a highly effectful dynamic system. This lets us also think about other forms of testing like generative testing and snapshot testing that come to the same problem from different angles.


This is a great comment, not sure why it's at the bottom of the thread. Gets to the core values underlying the main religious beliefs about testing.


Yup, agree. Depending if you are working with a good static type system or with a dynamic language the value of unit tests can vary.

When working with dynamic languages I always end up writing a bunch of unit tests and a bunch of integration tests. I've experimented some with type hinting and static analysis in some dynamic languages but it's not the same as having the compiler make guarantees.


>Unit tests run faster, are written faster, and not only can they be fixed faster, they can be deleted and rewritten if the requirements change.

Honest question: Are most of your unit test failures due to bugs or due to refactoring (e.g. changing APIs)?

Most people I know who do unit testing have mostly the latter (easily over 80% of the time). At that point one feels like they are merely babysitting unit tests.

If unit tests have such high false positives, how useful are they?

Note I'm not saying that it's not possible to write unit tests that are relatively immune from refactors. But it is more challenging and, in my experience, quite rare to find a project that writes them this way.


I’ll be honest, things that don’t cost me mentally or emotionally don’t even register. I probably delete more unit tests than I know and it simply doesn’t ‘count’ because they’re one action and one assertion and they’re wrong so poof.

What I know is that when the dev team can’t adapt to shifting business requirements, it’s a world of pain for everyone. I try to test a lot of business logic as low in the test tree as I can, and when they change Always to Sometimes or Never to Twice, I roll with it. Because I know that Never and Always mean ‘ask me again the next time we court a big payday.’

What I remark on are the tests that get deleted when the business says they’re wrong. And those happen but the toll is easy with unit tests. You just ask and answer a different question and it’s fine.


I think many devs build applications only testing manually for a long time, and taking shortcuts (hacks) when something appears wrong. When they later want to write some unit tests, there are no proper units and that ball-of-mud-y code is hard to test. Of course, integration tests are still feasible because they're agnostic of the internal mess.

I've run into this a couple times and noticed at some point that writing unit tests while developing (not necessarily TDD) helps a lot in clarifying boundaries early on, and generally improves code quality.


We should probably have two completely different versions of this discussion for typesafe and type-risky languages, since typechecking is effectively a form of testing at both unit and integration level.


I suspect I’ll be shifting from Node to Rust or one of its contemporaries at some point in the near future. I’ve given dynamic languages a very fair chance, kept an open mind and adopted its strategies instead of writing Pascal in any language, but it has failed to impress me.

I want a statically typed language with reasonable affordances for FP, for the 20% of the code that is dead ugly when forced into object structure.


Is your decision mainly based on type safety? We converted all of our Node.js to TypeScript primarily for easier refactoring, but it still hasn't fully satisfied our desire for change. We are thinking of switching to Rust as well. We couldn't get past the Go generics argument, and would also prefer something leaning towards the functional side. Any other languages you are considering?


Why does everyone rethink a working strategy? Write lots of unit tests that are fast. Write a good amount of integration tests that are relatively fast. Write fewer system integration tests that are slower. The testing pyramid works. He even talks about it in this post, and then ignores the point of it.

You write lots of unit tests because you can run them inline pre-commit or in a component build. If your integration tests are numerous and interesting enough, that won't work; they are better suited to gating feature-branch merges. System integration tests (the actual name for "end to end") take longer and usually gate progressively more stable branches upstream, or nightlies, depending on where you are.


Because for certain kinds of projects the strategy stops working.

I work as QE on a fairly large, ~7-year-old project. A microservice architecture has been attempted. We always merge to master, which means that more or less everything is a feature-branch merge. We have too many repositories to count.

And what we learned is, that most of the components we have are just too thin to allow for useful unit-test coverage. Almost everything is [Gui]--request->[Middleware]--request-->[Proxy]--request-->[Backend]->[Database].

In reality, [Middleware] and [Backend] probably should have been a single component, but the devs wanted to do microservices and be scalable without really understanding the bounded contexts of their services.

All of this leads us to a place, where unit-tests don't tell us much.

On the other hand, we managed to spawn [Middleware]->[Backend]->[Database], and we can run a useful integration tests-suite in ~2 minutes.

So, on one hand, if we had designed this better, the good old pyramid might be a working strategy. On the other hand, if I can get actual services running in a minute and test them end to end, I don't think I will bother with true unit tests on my next projects. I.e., why mock the database if I can spawn it in seconds :-)


So, if I understand correctly, Middleware and Backend should have been a single component since it's one bounded context, and splitting them gives one of them feature envy? Is there some benefit to keeping them separate, or is the cost of change too high at this point? If it's not about features but more about the API, have you tried the consumer-driven contract testing approach?


The reason was that you can run more instances of the backend behind a single middleware, which should have helped with scalability.

If we had the resources to do the refactoring, we would probably end up with two-three different backends for various contexts, and without the middle-man between the gui and the backends.

On the other hand, the cost of change is probably too high, and most probably this version of our product will be kept on minimum-resource life support.

We are looking into consumer-driven contract testing for the new set of services we are working on.


The units tell you tons; they just don't tell you the whole story. Trust me, try complex distributed-systems testing in an environment where the underlying services themselves are bug-prone and spewing stack traces all over because of poor fencing, bounds checking, et al.

You may think that way now regarding mocking the database, but where you will find yourself down the line is trying to devise a functional system integration test case for a slightly esoteric condition (deadlocks, timeouts). It's nice to have the scaffolding of a unit testing framework with robust mocks for those situations.

Edit: also, you should consider a gitflow workflow (dev -> integration -> master/stable) and make feature branch off of dev to insulate your master.


>Why does everyone rethink a working strategy. Write lots of unit tests that are fast.

* Because I want to avoid writing more code than necessary.

* Because I want to avoid writing tightly coupled code.

* Because I'd rather have a test that takes 2x longer and catches 5% more bugs. Premature optimization and all that.


I would recommend not optimizing for less code. Optimize for reading less code.

Unit tests actually tend to favor highly uncoupled code while integration seem to favor more coupling with e2e favoring the most coupling. I believe this is because the higher the level of testing the fewer public interfaces are thought about at lower levels.

As for percentages about speed and coverage, that seems like a bad trade off of 5% gain for 100% slow down. Especially because test time compounds.


>I would recomnend not optimizing for less code.

That is a terrible recommendation. Unless writing less code comes at the expense of readability or coupling you should always aim to write less code instead of more.

>Unit tests actually tend to favor highly uncoupled code while integration seem to favor more coupling with e2e favoring the most coupling.

It's the exact opposite. End to end tests do not even necessarily couple to a language, let alone specific modules. They can be used to refactor virtually the entire code base without rewriting any test code.

That isn't to say that you should only use E2E tests. IMHO wherever there is a naturally loose coupling and a clean, relatively unchanging interface - that is a good place to cover with integration tests.

The worst thing to surround with tests is a module whose API you know you will be changing (which will break the test when you do).

>As for percentages about speed and coverage, that seems like a bad trade off of 5% gain for 100% slow down. Especially because test time compounds.

No, it's an excellent trade off. CPU time is dirt cheap and bugs are very expensive.

Moreover, you can run regression test suites while you eat, sleep and visit the water cooler so the absolute time does not really matter provided it catches bugs before release.


Lots of unit testing can cause tightly coupled code, if the units are too small and/or against internal APIs: The tests are tightly coupled to a piece of code which should have been able to change freely.


I don't know what to say... you have to write good code and good unit tests. I think talking how to do that is a bit outside of the scope here, but mocks for external apis & sensible function complexity metrics are good things.


The point is that blind 100%-coverage cargo-cultism is not a working strategy.


Does anyone even believe in 100% coverage?


I never said anything about it, personally.

It's nice to shoot for if you're greenfield. Line coverage != path coverage and blind adherence to line coverage metrics isn't going to guarantee anything.


> The testing pyramid works.

The testing pyramid is built around a lot of assumptions which are often not true.

For example I run our ~10,000 integration tests in under two minutes on our large enterprise codebase. In recent years it has become possible to have fast integration tests.

I've worked on other apps that take 5+ minutes just to start up and integration tests can take hours.

Applying the same testing strategy to both does not make sense.


The pyramid isn't a law, it's just a heuristic that says to have more low level tests that are faster than high level tests that are slower. The unit/integration/system integration division tends to be correct, but isn't always. It just reminds us that there is time/compute scarcity, and to maximize those resources for optimal roi. And yes, a mobile app != a PaaS platform != system software and adapt the principles sensibly to the situation.

Seriously, though, I salute you on those integration test numbers. I assume containers are involved?


I go the complete opposite way.

I've tried various testing strategies over ~15 different companies in all sorts of environments, and unit tests are the only thing that really works (IF you can convince the team to do them... and that's a big IF).

The article starts with a point I agree with: the lower in the pyramid, the cheaper the tests but the lower the confidence level they bring. That's true.

Where I disagree is on how big the differences in confidence and cost are.

I can bang out 500 unit tests faster than I can write just a few E2E tests in most large apps. They require almost no trial and error, no real engineering (I feel strongly that abstraction in unit tests is bad), and all around are so easy to write that I don't mind tossing out 150 of them when I make a significant refactor.

E2E tests are amazingly brittle and require a careful understanding of the whole system. They're impossibly expensive to write. They're the only thing that tells you that stuff works though. So you want at least a few of these.

Integration tests are just flat-out awkward: you need an understanding of a significant portion of code you did not write or touch; they often require complex fixtures (because your test will go through several code paths and might depend on a lot of arguments); they're slower (because a lot of code runs); and while you don't throw them away when changing implementation details (unless they involve side effects), you still throw them away when refactoring or changing public interfaces. I've worked with a lot of people who were very vocal about these being so much better, then in the same breath complained that they spent all day writing integration tests.

There's an exception here which is acceptance tests for libraries, especially when doing a full rewrite: the tests that tell you public interfaces used outside of the current context work (as opposed to public interface of objects used in the implementation). Eg: if I was to test lodash or react, that's how I'd do it.

Unit tests to me are about a lot more than "is this change breaking my code?". If that's all you care about, you're missing a big part of the point.

If you have 3 units, A, B and C. A calls B which calls C. If you have a test for A in the context of B, a test for B in the context of C, and a test for C, and they all pass, you know that A + B + C will work. But when writing the tests, you only had to care about itty-bitty tiny pieces of code, which made things super cheap.

Then you get other huge benefits: the quality of the entire codebase is higher (a side effect of it having testable interfaces all across), the reasoning behind each piece of code is explicit (no one wrote a function that works without being sure why, or the test would be very hard to write), and you automatically have a document representing "intentions".

And yes, if you change a module, even if it's not exposed to your customers, the public interface of that module has tests and the tests will break. But they usually take nothing but a few minutes (often a few seconds) to write. They're cheap enough to be disposable.

And once you have 80%-ish unit test coverage, you actually have a very high confidence level. I've gone through major refactorings of multi-million-line apps with almost no bugs on pure unit tests. You'd think the 20% of untested code would be a source of bugs, but statistically, that's just not how it happens.

In terms of person-hours to ROI, pure unit tests just straight up win.

The reason software engineers fight so hard against them is that they're brain-dead and repetitive to write, and engineers can't resist overengineering. "This is such a simple test for such a simple piece of code, why should I test it?!" That's the point. All unit tests should be like this.


The second group I worked with that was earnestly interested in mature testing developed the 5/8ths rule.

To move a test one level down the pyramid, it takes about 5x as many tests. But the tests run 8 times as fast. So moving a test down takes more than 35% off the run time, and it fails the build minutes sooner. If you drop it down two levels it's 60% off the run time.

Interesting enough on its own, but maintaining those tests after one requirements change, plus the cost of rewriting them in the first place, is less work than the cost of maintaining the original tests. We didn't come up with a number for this but the difference was measured in man-days and missed deadlines about once a month, and we were convinced by the evidence.

I also agree with both your 'brain-dead' comment and your 80% estimate. The big payoffs come between 75% and 85%; above 85% you start getting artifacts, and that 'data' distracts more than it helps.


Yup. I think one big issue is that an E2E or integration test is useful on its own, while a single unit test is almost totally worthless. You don't have confidence in anything until at least 50% coverage (and at 80% you have almost perfect confidence).

So when people get started, especially on an old codebase, they feel it's pointless and doesn't pay off. Can't blame them, I suppose.

Good that you bring up build time. I forgot to mention that. We have repos with thousands of tests where the whole suite runs in <1 minute and gives us very high confidence (actually the only other tests we run on that repo are visual regression tests for CSS, and even E2E tests don't catch those issues...). During that time I'm watching other teams waiting 20 minutes on their integration test suite. Nope nope nope.


One of the most transformative things I've come across for how to structure and test code has been Gary Bernhardt's talk on Boundaries [0]. I've watched it at least ten times. He also has an entire series on testing where he goes deeper into these ideas.

In this video, he talks about a concept called functional core, imperative shell. The functional core is the code containing your core logic, which can be easily unit tested because it just receives plain values from the outside world. The imperative shell is the outside world that talks to disks, databases, APIs, UIs, etc. and builds those values for the core. I'll stop there; Gary's video will do it 100x better than I can here :)

[0] https://www.destroyallsoftware.com/talks/boundaries


I agree with the part that you should write tests, but I definitely disagree with the part that most of your tests should be integration tests.

As you pointed out the testing pyramid suggests that you should write more unit tests. Why? Because if you have ever tried TDD you know that unit tests make you write good (or at least acceptable) code. The reason for this is that testing bad code is hard. By writing mostly integration tests you lose one of the advantages of unit testing and you sidestep the bad code checking part.

The other reason is that unit tests are easy to write. If you have interfaces for your units of code, then mocking is also easy. I recommend stubbing, though; I think having to use mocks is a code smell.

Also, the .gif with the man in pieces is a straw man. Having to write at least one integration test to check that the man has not fallen apart is not a valid reason to write mostly integration tests! You can't test your codebase reliably with them, and they are also very costly to write, run, and maintain!

The testing pyramid exists for a reason! It is the product of countless hours of research, testing, and head scratching! You should introspect your own methods instead, and you might arrive at the conclusion that the codebase you are working on is bad and hard to unit test, and that's why you've chosen to write mostly integration tests.


Sounds good in theory. In practice there is one problem with having integration tests only: tests are binary, they pass or they fail. A unit test covers just a small piece of functionality, so when it fails, it's quite easy to find the problem. When an integration test fails, we can spend hours debugging the whole stack of layers trying to find the real problem.

I had this situation once. Every failing integration test ended with hours spent on writing unit tests for all the places used by the test.


From my experience, an integration-test failure that requires significant effort to investigate can only be covered with unit tests after one knows where the problem comes from. One cannot realistically write a bunch of unit tests and expect them to cover the problem unless one already knows about it.


It's called shotgun unit testing.


For me, one of the biggest issues with integration tests is the code coverage numbers mean nearly nothing. I've seen an "integration only" tester proudly display his single test with 90% coverage. I asked him to run it again and it was 2% because a condition changed.

So this means that for all the branches your code can take, an integration test follows one specific path through them for that test. All other branches, through the entire call stack, are unverified.


Is there solid evidence to back up some of the assertions that have been made about testing?

It feels like an area where lots of people have opinions, and there is not much in the way of facts.


There are very serious books about software quality with actual data, but it's much easier to tell each other anecdotal experiences on the internet, in a weird mix of bragging and strawman arguments. That's how our field is stagnating.


Microsoft put out a study of teams that had done TDD or at least extensive unit testing. I don't recall the numbers, but development time took longer and there were a lot fewer bugs. Which is what I would have expected.


I think the answer is that it heavily depends on what you are doing. If you are creating a library that operates on a protocol, unit tests are necessary and extremely important.

If you are writing an ERP where a lot of your code NEEDS to operate WITH the database, you are better off with integration tests, because mocking away the database would lead to a lot of bugs, especially if your database is extremely important (and not just a dumb datastore).

Edit: having any tests is always better than having none.


The puffing-billy [1] library is awesome, and has changed the way I write integration tests. I also use VCR [2], and now my entire application (both backend and front-end) is wrapped with a proxy that records and replays every request. I can run all my tests once using test Stripe API keys, a test Recaptcha response, or any other external services that I want to test. I don't have to mock anything, which is nice. Then everything is recorded, and I can run all my integration tests offline.

I've also really enjoyed using stripe-ruby-mock [3] when testing specific webhooks, jobs, and controller actions. I don't always aim for 100% test coverage, but I try to write a LOT of tests for any code that deals with billing and subscriptions.

Ooh, I've also been enjoying rswag [4]. It's quite a cool idea: you write RSpec tests for your API endpoints, and the tests also serve as a Swagger definition for your API. So when your tests pass, you can use the output to generate documentation or API clients for any language.

[1] https://github.com/oesmith/puffing-billy

[2] https://github.com/vcr/vcr

[3] https://github.com/rebelidealist/stripe-ruby-mock

[4] https://github.com/domaindrivendev/rswag


I think the testing pyramid reflects a false correlation: it asserts that tests higher up the pyramid are both more expensive to write/maintain and longer to run.

In reality the execution time of a test says nothing about how hard the test is to write. Sometimes a very fast to execute unit test can be much harder to write/maintain than a longer running test that avoids mocking an api and perhaps utilizes abstractions in the test definition that are already written to support the program’s features.

I think test suite execution speed is the real metric to focus on for most projects — to get the most value, test suites should accelerate the time to useful feedback. Write tests in the simplest way that provides useful feedback into the behavior of the system and runs quickly enough that you can receive that feedback with low latency during development.

I quite like tools like jest and wallabyjs that use code coverage data to figure out which tests to rerun as code changes — means you can have a test suite that includes slow(ish) to execute tests but still get feedback quickly in reasonable time as you make changes to the code.


> to get the most value, test suites should accelerate the time to useful feedback

Well, they should also optimise the usefulness of the feedback they provide. Typically, tests higher up the pyramid are also more brittle (e.g. end-to-end tests might fire up an entire browser and Selenium), and thus are more likely to fail when in actuality, nothing is wrong. That's an additional reason for limiting the number of those tests.


Brittle tests don't seem useful in general, though, do they?

I'm not sure it's necessarily true that brittleness must correlate with height in the pyramid or with execution time; in my experience brittleness correlates with Selenium more than with pyramid height (that's a statement about Selenium more than about any particular level of the testing pyramid).

It's possible to write very useful, non-brittle tests using something like headless Chrome...


No they're not.

But yes, Selenium is brittle. That said, Google engineers actually did some investigation into this, and although I think their methods were probably a bit heavyweight, they did conclude that it's mostly RAM use that leads to brittleness.

[1] https://testing.googleblog.com/2017/04/where-do-our-flaky-te...


Interesting thanks for the link!

I'm curious how many tests in that chart fell into the small size range, since that would show whether the size-flakiness correlation holds for tests that use tools associated with higher-than-average flakiness...

I'd also like more clarity around the mechanism for measuring flakiness. The definition they use is that a test is flaky if it shows both failing and passing runs with the "same code". Does "same code" refer to a freeze of only the codebase under test, or also of the tools in the testing environment?

I wonder what the test suites for tools like selenium/WebDriver look like ... do they track a concept of “meta-flakiness” to try and observe changes to test flakiness results caused by changes to the test tooling ...?


Yeah, good questions, the post leaves some to be desired. And meta-flakiness tooling actually sounds like it could be really useful!
