Write tests. Not too many. Mostly integration (kentcdodds.com)
362 points by wslh on Oct 27, 2017 | 331 comments



I'd offer a slightly different take:

- Structure your code so it is mostly leaves.

- Unit test the leaves.

- Integration test the rest if needed.

I like this approach in part because making lots of leaves also adds to the "literate"-ness of the code. With lots of opportunities to name your primitives, the code is much closer to being self documenting.

Depending on the project and its requirements, I also think "lazy" testing has value. Any time you are looking at a block of code, suspicious that it's the source of a bug, write a test for it. If you're in an environment where bugs aren't costly, where attribution goes through few layers of code, and bugs are easily visible when they occur, this can save a lot of time.


I have adopted the same philosophy. A few resources on this, part of the so-called London school of TDD:

- https://github.com/testdouble/contributing-tests/wiki/London... (and the rest of the Wiki)

- http://blog.testdouble.com/posts/2015-09-10-how-i-use-test-d...

- Most of the screencasts and articles at https://www.destroyallsoftware.com/screencasts (especially this brilliant talk https://www.destroyallsoftware.com/talks/boundaries)

- Integration Tests Are A Scam: https://www.youtube.com/watch?v=VDfX44fZoMc

All of these basically go the opposite way of the article's philosophy:

Not too many integration tests, mostly unit tests. Clearly define a contract between the boundaries of the code, and stub/mock on the contract. You'll be left with mostly pure functions at the leaves, which you'll unit test.


Thanks for the links, they make sense - I've always had trouble with blind "you should unit test" advice, but especially the video explains the reasoning very well :)


I’ve been practicing TDD for 6 years and this is exactly what I ended up doing. It’s a fantastic way to program.

My leaves are either pure functions (FP languages) or value objects that init themselves based on other value objects (OOP languages). These value objects have no methods, no computed properties, etc. Just inert data.

No mocks and no “header” interfaces needed.

On top of that I sprinkle a bunch of UI tests to verify it’s all properly wired up.

Works great!


> Structure your code so it is mostly leaves. Unit test the leaves. Integration test the rest if needed.

Exactly. You expressed my thoughts very succinctly. Though I feel the post tries to say the same just in a lot more words.


I didn't get that from the post at all, I thought the post advocates mostly for integration tests and I didn't see anything about refactoring code to make unit testing easier.


This is my exact mentality as well! In fact, I like it so much that I apply it to system design as well. Structure the pieces of code into a directed acyclic graph for great success. A tree structure that terminates in leaves is a DAG.

https://en.m.wikipedia.org/wiki/Directed_acyclic_graph


> Integration test the rest if needed.

Is there any situation where there is integration, but no need to test it?

You seem to be suggesting that if the leaves are thoroughly tested, nothing can go wrong in their integration, but at the same time, I cannot imagine someone believing that.


Exactly, most bugs I see are in integration: mismatches in data models or state. But I also work on business-y applications, which tend to be more integration than local business logic.


Integration tests are always needed in some form, because you need to make sure the leaves are actually called. Since unit tests are executed in a vacuum, the functions might work but never get called at all, or might fail because of weird bugs that only appear when testing the whole tree.


Can anyone give me an example or explain a little more about "Structure your code so it is mostly leaves."?


Leaves in this context would be classes that have no dependencies.

If you need to create an object, can you pass the name of the class in? Or can the object be created elsewhere and passed in fresh? If you're making a call to a remote service (even your local DB) are you being passed a proxy object?

All of these references can then be provided as a test double or test spy, so long as they are strict about the interface they provide/expect, and you can exhaustively cover whatever internal edge cases you need with unit tests.

Don't _forget_ the integration tests, but my personal opinion is that it usually suffices to have one "success" and one "error" integration test to cover the whole stack, and then rely on unit tests to be more exhaustive about handling the possible error cases.
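To make that concrete, here's a rough C# sketch of the kind of structure being described (all names are made up for illustration): the leaf is pure logic you can unit test exhaustively, the boundary is a narrow interface that gets passed in, and a hand-rolled fake stands in for the real thing in tests.

    // Leaf: pure logic, no dependencies - exhaustively unit tested.
    public static class PriceCalculator
    {
        public static decimal ApplyDiscount(decimal price, decimal discountPercent) =>
            price - (price * discountPercent / 100m);
    }

    // Boundary: the remote service (or DB) is hidden behind a narrow interface.
    public interface IPriceStore
    {
        decimal GetBasePrice(string sku);
    }

    public class QuoteService
    {
        private readonly IPriceStore _store;
        public QuoteService(IPriceStore store) => _store = store;

        public decimal Quote(string sku, decimal discountPercent) =>
            PriceCalculator.ApplyDiscount(_store.GetBasePrice(sku), discountPercent);
    }

    // In a test, a hand-rolled double takes the place of the real store.
    public class FakePriceStore : IPriceStore
    {
        public decimal GetBasePrice(string sku) => 100m;
    }

A couple of integration tests then only need to prove that QuoteService and the real store are wired together correctly; the edge cases all live in PriceCalculator's unit tests.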


This is very interesting. I'm not 100% sure I understand. Any example of this or resources on this style?


So many people in this thread are talking about different domains and are not able to see that they need different rules.

Creating a library that is going to be used to launch a multi-billion-dollar rocket to Mars is not the same as developing a mostly graphical mobile app where requirements change daily as you A/B test your way to better business value.

The article has really good points and explains the reasons why they work. Apply them wisely. Make the right decision for your project. Don't be dogmatic.


I've programmed in many domains. There are only three places where I don't use heavy testing:

1. Very, very graphical programming - like SVG charting with animations. If it were static generation it would be easy, but throwing time into the mix makes the tests really hard. Plus, if things go wrong in the future, people can literally see it going wrong and complain, so I don't think it is worth the trouble.

2. Data analysis meant for static reporting. You know, those 2000 line SQL queries that barf out data that you pop into excel to munge through before typing up a 20 pager for upper management.

3. Small personal tools, like a CLI script that spits out equivalent yearly interest rates or what have you.

Everything else I test. Libraries, backend web apps, machine learning shit, compiled, whatever. It is too easy for codebases to turn into a hellscape without tests. You get too afraid to change things.

Should it change the API to the codebase? I usually don't think so, but occasionally I'll, say, make something an instance variable that I'd normally keep as a locally scoped variable. So I model test what I can and I integration test the rest. I think that he's right that integration tests cover a lot of functionality and I think he's right that mocks aren't usually great, but I think he's wrong about how much testing we should be doing. Most things should be tested most of the time.

It's cheaper. Why?

Because it costs money to hire support people and it costs money to validate their tickets and it costs money to fix the bugs and the bugs are harder to fix when the data is already in the system and the data is wrong.

It's also more profitable to write more tests. Why?

Because bugs mean lost customers and even when you keep the customer the feedback that you get from them is which problems you need to fix, not which functionality could be made better.


The fight against the TDD cargo cult is vicious, barely started, and far from over.


Please don't. If you have a substantive point to make, make it thoughtfully; if you don't, please don't comment until you do.

https://news.ycombinator.com/newsguidelines.html


Good lord. Why integration tests?

    I think the biggest thing you can do to write more integration tests is to just stop mocking so much stuff. 
Okay. The biggest problem I see with people trying to write unit tests is that they don’t want to change how they write code. They just want tests for it. It’s like watching an OO person try their hardest to write OO code in a functional language.

So they try to write E2E tests which work for about 5 or 6 quarters and then fall apart like a cheap bookcase. If you can find a new job before then, you never have to learn to write good tests!

I agree with the author that the trick is to stop using mocks all the time, but you don’t have to write integration tests to get rid of mocks. You have to write better code.

Usually if I have a unit test with more than one mock it's because I'm too invested in the current shape of the code and I need to cleave the function in two, or change the structure of the question asked (e.g., remake two methods into two other methods).

Almost always when I accept that the code is wrong, I end up with clearer code and easier tests.

Unit tests run faster, are written faster, and not only can they be fixed faster, they can be deleted and rewritten if the requirements change. The most painful thing to watch by far is someone spending hours trying to recycle an old test because they spent 3 hours on it last time and they’ll be damned if they’re going to just delete it now.


> The biggest problem I see with people trying to write unit tests is that they don’t want to change how they write code. They just want tests for it. It’s like watching an OO person try their hardest to write OO code in a functional language.

The biggest problem I see with people advocating for tests and employing TDD is that they do change how they write code to accommodate tests. This leads to inclusion of lots of unnecessary abstraction and boilerplate patterns that make code less readable and more bug-prone. OO world has spawned numerous non-solutions to turn your code inside-out so that it's easier to mock things, at the expense of code quality itself.

That said, if you go for functional style in OOP, i.e. shoving as much as you can into static helper functions and most of the rest into dumb private stateless functions, you suddenly gain both a clean architecture and lots of test points to use in unit tests. So you can have testable code, but you have to chill out with the OOP thing a bit.
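A minimal sketch of what that looks like in C# (names and the xUnit test framework are assumptions here): the stateless helper is a natural test point that needs no mocks at all.

    using System.Collections.Generic;
    using System.Linq;
    using Xunit;

    public static class OrderMath
    {
        // Pure, stateless helper: plain inputs and outputs, nothing to mock.
        public static decimal Total(IEnumerable<(decimal Price, int Quantity)> lines) =>
            lines.Sum(l => l.Price * l.Quantity);
    }

    public class OrderMathTests
    {
        [Fact]
        public void Total_sums_price_times_quantity()
        {
            var lines = new[] { (10m, 2), (5m, 1) };
            Assert.Equal(25m, OrderMath.Total(lines));
        }
    }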


> as much as you can into static helper functions and most of the rest into dumb private stateless functions

In our work we use C# and it is very hard, even next to impossible, to make a static class pass a code review - given that it's not for extension methods (which I hate... why not be explicit about the first parameter and stop acting as a part of the class </rant>). They just tell us to use IoC and move on to the next point. I honestly don't know why. Our IoC library can treat a dependency as static or singleton, but those are also discouraged. Once I had a static class named GraphRequestHelpers* and the reviewer got really negative, FSM knows why. She told me that we need IoC to make everything testable and that "Helper" in the name is a code smell. Sounds like cargo-culting to me, but I have only 6 years of experience, so who am I to know.

* Now we have RequestExtensions and everything is apparently perfect.


There is some cargo culting there but it's mostly correct.

Helper is a code smell, as it's a sign of "we don't know what the responsibility of this is or what to call it, so we'll just chuck a load of shit in this file and call it a helper". The methods in it should belong to something and live on that class, not in an external class.

RequestExtensions is more shit than the original solution. Extension methods are even worse! Shoot the reviewer.


This is a matter of taste not fact. In functional languages the style is compositional with static functions everywhere. It works well. The keeping data and methods together thing is one approach. Sometimes it's great. Sometimes unnecessary.

For example would you argue against string formatting helpers? Or would they need to be written to an interface and added to myriad DI bucket lists?


It's not that simple and it's not a fact. I'm an advanced user of functional languages as well and have written an entire Scheme implementation before. I only semi-agree. That's a slightly disingenuous representation of functional languages, which have more than a few pitfalls. They certainly aren't the silver bullet, and they really do not scale to the same height and complexity of the problem domain as the OO languages do, due to the nature of the abstraction you describe. Nothing is particularly explicit. I'd rather take the compromises of OO over the maintenance problems of a functional language.

String formats are data so they would be stored as constants so that they are interned. They can be stored in a const class which is a static class with no methods i.e.:

   sealed class StringFormats {
       public const string DateFormatX = @"...";
   }
Also string formats for example tend to be owned by the respective objects so you can add overloads to the object to provide certain arbitrary representations. If the translation between an object and the string representation is complex, then you're really serializing it so that should be an abstracted concern.


To be clear I'm not saying functional is always better. I'm saying there are other possibilities and dogma is bad.

The Haskell community has its dogmas too. And its fair share of "let's do this simpler" blog posts.

As an aside Haskell has many equivalents of dependency injection and sugars to help and you can "inject all the things" there too.

My point is to think of the problem you are trying to solve, rather than ticking off the SOLID / Martin Fowler etc. tick boxes.

Understanding OO patterns, SOLID etc is a good thing but being prepared to "break the rules" is good IMO too.


Breaking the rules is good when appropriate. Problem is those rules are pretty amazingly good. I went through a weird phase of denial and ended up back where I started before I applied the aforementioned rules.

Every time I have made an exception, I have shot myself in the foot.


> Shoot the reviewer

Duly noted! Although I'll try talking to her first, I'm sure there's more behind the decision :)

One of the methods that was inside takes a request, extracts the body and returns the parsed graph from the body. It's used by many controllers from many projects. I don't know where to put such a thing, hence the request extension.


Always ask for the reason before slating it :)

Usually that's a single responsibility class:

   interface IGraphParser {
       Graph Parse(Request request);
   }
Inject that into the caller via the container then you can mock the thing that calls it and just return a static Graph object, which you can't do with a simple extension method (which is why it sucks).
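For what it's worth, with a mocking library this might look roughly like the following (Moq and xUnit are assumptions here; Graph, Request and IGraphParser are the types from the snippet above, and GraphController/Handle are made-up stand-ins for "the caller"):

    using Moq;
    using Xunit;

    public class GraphControllerTests
    {
        [Fact]
        public void Returns_the_parsed_graph()
        {
            var expected = new Graph();
            var parser = new Mock<IGraphParser>();
            parser.Setup(p => p.Parse(It.IsAny<Request>())).Returns(expected);

            // The controller only sees IGraphParser, so the canned Graph comes back
            // without any real parsing happening.
            var controller = new GraphController(parser.Object);

            Assert.Same(expected, controller.Handle(new Request()));
        }
    }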


Extension methods are useful for only one reason: they trigger code completion for browsing what this object can do. Static methods suffer from FP code completion problems (you can’t complete easily in the first arg of a function/procedure).


I think I am not mistaken in saying extension methods, like lambda functions, were invented primarily for the use case of Linq. Even if they weren't, that's how Linq is implemented, so extension methods serve more than that "one purpose" if you don't insist on writing C# in the style of C# 2.0.


They came out at the same time; I’m sure there was some influence between them (Mads Torgersen would know better). However, all the functionality added could have been done with static methods, just with more verbose syntax. LINQ query syntax could have been special-cased. Anyways, I like what they came up with, it’s very versatile.


It's not clear that would have been much less work


Why hate extension methods? Do you really want to write Enumerable.ToList(Enumerable.Select(Enumerable.Where(someList, e => e.someBool), e => new { a = e.x, b = e.y })) and so on?


That would suck; on the other hand, extension methods make people create huge chains, the continuations of which come from who knows where.

Best solution would have been a pipe operator if you ask me.


Could someone expand IoC for me please?



Do you practise TDD? If you did a lot of this would make more sense to you. TDD is actually quite fun when you get the hang of it (less mental burden as you push all the 'intent' onto the computer).


I don't see why TDD requires ruling out static methods and insisting on hiding everything behind an interface. Static methods are straightforward to test, certainly more than a class with multiple dependencies which need to be mocked. Usually the complaint is about coupling when calling static methods but these can be wrapped in a delegate if required.


Simply because you can't mock the static dependency, therefore that method is now dependent on the static class and you don't have any control over it. This is problematic - what if at some point later another developer adds a database call into the static method to do some logging? Now your testing will dirty whatever database you're using, as well as run 10x slower - and yet the test will still pass and everyone will be none the wiser as to what happened.

If you start using a custom delegate solution, then your code is not consistent with everything else that uses DI, making it harder to understand. I can understand interfaces are annoying when navigating code, but the IDE still helps with that even if it is a few more button clicks, and the pros outweigh the cons.


> that method is now dependent on the static class and you don't have any control over it.

I don't see how you have any less control over it than any other code you wrote. If you don't want it to write log statements, then don't do that. Most static methods are small and pure so don't need to write log statements anyway.

> Now your testing will dirty whatever database you're using, as well as run 10x slower.

I've never used a logging framework that didn't allow you to configure where log statements were written, or give you control over the logging threshold for individual classes. However if your method is writing logs then presumably there is a reason, which is just as useful in the tests. If you mock it out then you're testing against different code to the one you will actually run against.

> If you start using a custom delegate solution, then your code is not consistent with everything else that uses DI.

Passing functions as arguments directly is 'DI', just without the need to configure that through an external container. Reducing the number of interfaces (often with a single implementation) and external configuration makes navigating the code easier.
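A small sketch of that style in C# (the names are hypothetical): the dependency is just a delegate parameter, and the test passes a lambda instead of configuring a container.

    using System;

    public static class InvoiceService
    {
        // The "dependency" is a plain function parameter; no interface or container needed.
        public static string BuildReminder(int invoiceId, Func<int, DateTime> getDueDate) =>
            $"Invoice {invoiceId} is due on {getDueDate(invoiceId):yyyy-MM-dd}.";
    }

    // In a test, inject a lambda directly:
    // Assert.Equal("Invoice 42 is due on 2017-11-01.",
    //     InvoiceService.BuildReminder(42, _ => new DateTime(2017, 11, 1)));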


I think you missed my point, it's not about the logging framework, it's about the fact you don't control an external dependency during testing. Unit tests are meant to be reproducible, meaning they are done under controlled conditions.

> Most static methods are small and pure

This is very assuming, tests are a way of being specific about your intent.


> it's about the fact you don't control an external dependency during testing

If your code is structured using small static functions, you don't have any dependencies in the first place, just arguments you are passed and transform. You will probably create interfaces for external services you depend on, but you can avoid needing to mock them if you express the transform directly.

> This is very assuming

I'm not assuming anything, since I wrote the static method and I also decided to call it, presumably for the result it calculates. Your argument appears to be that static methods could contain bad code but that applies to all code you depend on.


You mean that the tests will depend on the thing being tested? What a crime!

> what if at some point later another developer adds a database call into the static method to do some logging?

Then you have a developer that does not grasp the idea of functions, and how they can help you improve your code. That's a call for education, not for changing your tests.


The point is that tests are omnipresent, people aren't. I've worked at places where all sorts of dumb code has got through because there is no automation in place to stop it, and everyone else is too busy to do code reviews.


In Java, you can use PowerMock to mock or spy anything, even private static final things. I consider it a smell (though excessive mocking even without powermock is its own smell), but it's immensely valuable to get code you can't change (or fear changing because of its complexity and lack of tests, or simply don't have time to change because the refactoring would take a whole sprint) to have some tests.

You don't need interfaces for everything in order to do DI. Interfaces should be used only for having multiple implementations or to break dependency loops.

Other than that I'm in agreement, static methods generally aren't a good idea. They can all too easily balloon into big chunks of imperative code with inner dependencies (static or not) at 10 indentation levels deep. Non static methods can too, but not as easily, and you have more options for fixes/workarounds in those cases anyway. The only place they really make sense is as part of a set of pure primitive data transforms, and ought to be small.


In C# we have Moq that can mock normal classes, though it requires adding 'virtual' to every method you want to override which is a code smell too. In Java everything being virtual by default I guess it doesn't matter. We like to always keep an interface around as it gets the developer used to working that way and keeps the code consistent. Visual Studio provides a quick shortcut to auto gen the interface too.


> Now your testing will dirty whatever database you're using, as well as run 10x slower

It sounds like the problem is a few layers higher. Why is there a live database in your unit testing environment? Why are working credentials configured? If they're unit tests, not integration tests, all db operations should be DId / mocked / whatever. Any call that isn't should fail, not take longer time. Db interaction is for the integration tests.


That's exactly my point, mock your external dependencies. Static calls don't allow you to do that.


In your language of choice:

    static_function(db, other, arguments) { ... }
    test { static_function(fake_db, 1, 2) }
You can even omit the db in the standard case if your language allows default keyword arguments. In almost every language, a method is just a fancy static call that takes extra arguments implicitly. (Closures are poor man's objects, objects are poor man's closures...)
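Rendered in C# (purely illustrative; IDb and the rest are made up), the same idea might look like:

    public interface IDb
    {
        int CountRows(string table);
    }

    public static class Reports
    {
        // The database is an explicit argument, so a test can hand in a fake.
        public static string Summary(IDb db, string table, int threshold) =>
            db.CountRows(table) > threshold ? "busy" : "quiet";
    }

    public class FakeDb : IDb
    {
        public int CountRows(string table) => 3;
    }

    // Reports.Summary(new FakeDb(), "users", threshold: 1) returns "busy".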


So how exactly do you test that your SQL query does the right thing? That you're using the Twitter API correctly?


Testing a database, or an external web service, is an integration test. They can be as simple as:

    void TestCreateUser() {
        var repo = new UsersRepository();
        var mockUser = new User("John", "Smith");
        repo.AddUser(mockUser); // db call
        var addedUser = repo.GetUsers().Single(); // db call
        Assert.StructureIsEqual(mockUser, addedUser);
    }
For the Twitter web service, you might test that you successfully get a response, as you don't have control of what exactly comes back.


How is static code different from other non-injected code, like stuff in a method? Taken to its logical conclusion, we'll have thousands of classes full of at most 2 operations per method.


How many static classes are your methods using? And what is the problem with injecting this stuff at the top of the class instead? If you plan to write tests, you have to control your dependencies, and DI is the simplest way to do that.


Because these tests become too detached from reality if you inject everything. A silly example to make the point:

interface IAdder { double Add(double a, double b); }

Test:

var mockAdder = ... // Mock 1+1=2 etc.


um... yes... this is actually what "pure" OO involves. The only reason we don't do this is because it's a nightmare to manage.


“What if ...” doesn’t pass YAGNI.


Problem is that the moment you start introducing delegates and crap like that, you're inventing a mechanism to work around your reluctance to give up static methods rather than actually solving any problems.

There is no functional difference between a class with static methods and a class without, of which one instance is available to other classes.

Other than the fact that it isolates state, allows mocking and substitution and testing.


I disagree that delegates and higher-order functions are 'crap' or in any way more complicated than introducing interfaces that are injected through a centralised container. You could just as easily turn that argument around and say mocking and an overuse of interfaces come from your resistance to using small static methods. In C#, Linq is almost entirely based on static methods and delegates and it is not harder to test as a result.

Static methods usually don't rely on any hidden state at all. The example originally given was for a graph operation which could just take the input graph as an argument and return the result. When your code is composed of small independent functions you don't need mocking and substitution at all. In my experience most uses of mocks come from functions that do too much in the first place.


TDD is completely orthogonal as to whether you write functional code that doesn't have a default receiver for routines, or OO code.


Yeah there is some cargo cult aversion towards statics.

Static methods with no side effects are wonderful, but static state is really bad, and static methods which perform IO are horrible because they cannot be mocked in a unit test.

But some people miss this distinction and just say static methods are bad for testing.


C# is the new Java... facepalm


Rifle is the new pistol? You can shoot yourself with both?


facepalm of enlightenment?


The abstraction is consistent though, and familiarity is a good thing when navigating a codebase which has N amount of other devs pushing to it every day.

I practise TDD for peace of mind - if I add new functionality to existing code I can be 99.9% sure I haven't made any regressions. When a client's system goes down on a friday, I can 99.9% guarantee it wasn't my code that is at fault. If I have to work at the weekend to update a production server, I'm 99.9% sure it'll go smoothly as my tests say it will.


Exactly this. Confidence is everything.

I can actually write entire features with appropriate test coverage from the ground up and they work first time and have close to zero defects in production.

It's amazing when you spend 5-6 days writing code that does nothing and at the last moment, everything slots together with a few integration tests and wham, feature done. Not talking trivial stuff here either; big integrations across several different providers/abstractions, bits of UI, the lot.

You see a lot of people arguing against this but I'm going to be honest, they churn out a lot of stuff that doesn't actually work.


> You see a lot of people arguing against this but I'm going to be honest, they churn out a lot of stuff that doesn't actually work.

My anecdata cancels out your anecdata. The TDD practitioners that I've met have, without exception, written code that worked fine for only the one case that they've tested. Example: They'd test a method for sending a message with the string "hello". Turns out the method didn't URL-encode the message before POST-ing it, and sending anything with a space was broken. They were confident and pushed the change.

Not saying you're wrong, just that TDD doesn't seem to work for everybody, and can even be a distraction.


That will be because you worked with dumbasses.

If you only test the expected outcome you are a dumbass.


How do you deal with http interactions in test? Do you mock it or maybe save it to disk to replay later?


That's an integration test really. The clients all have an abstraction around the http endpoints so nothing touches integration in unit tests. The advantage of this is you deal with transfer objects only in the code, no HTTP which would violate separation of concerns.

I use HttpMock myself in test cases which fires up an http server for personal projects. We use Wiremock commercially.


Here is how I handle it

1) Write a test that runs the service and saves output to a file.

2) Mock out the call to just return the data from the file and validate results.

3) If you need variations on this data, just modify the file/data (often as part of the test).

I usually leave number 1 in the code but disabled since it often relies on remote data that may not be stable. Having the test run more than once is not very beneficial but being able to run it later and see what exactly has changed is great.
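A rough sketch of that record/replay flow in C# (System.Text.Json is an assumption, and IForecastClient/Forecast are made-up names; Json.NET or any serializer would work the same way):

    using System.IO;
    using System.Text.Json;

    public class ForecastReplayTests
    {
        // 1) Recording "test": hits the real service once and saves the payload.
        //    Usually kept in the code but disabled, since the remote data may change.
        public void RecordRealResponse(IForecastClient realClient)
        {
            var forecast = realClient.GetForecast("London");
            File.WriteAllText("forecast-london.json", JsonSerializer.Serialize(forecast));
        }

        // 2) Replay test: loads the saved file instead of calling the service,
        //    then validates results against it.
        public void ValidatesAgainstRecordedData()
        {
            var forecast = JsonSerializer.Deserialize<Forecast>(
                File.ReadAllText("forecast-london.json"));
            // 3) Variations: tweak the deserialized object here before asserting,
            //    e.g. set a field to an edge-case value and run the code under test.
        }
    }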


In this case, what's the difference if you write the test before or after, though? You would still be covered. I don't lean in either direction in this argument, just curious to understand.


If you write the tests afterwards, you find out afterwards where all the tight coupling and crap abstractions you accidentally created are.

That leads to you either not writing tests or having to refactor bits of it. It's easier to do this much earlier on.


The difference is night and day - writing tests first means you write 'testable code' from the beginning. Following the red, green, refactor mantra means that for every change to your code, you already have a failed test waiting to pass. The result is your test cases make a lot more sense and are of a superior quality.

To liken it to something you may be familiar with - when commenting your code, do you think it's better to add comments in as you write the code? Or add in the comments at a later date after the code is all written? I'm sure you immediately know which approach results in better quality commenting, and it's the same with TDD.


> To liken it to something you may be familiar with - when commenting your code, do you think it's better to add comments in as you write the code? Or add in the comments at a later date after the code is all written? I'm sure you immediately know which approach results in better quality commenting, and it's the same with TDD.

Not to take the analogy too far, but usually when writing a chunk of code I can keep its behaviour in my head for a good amount of time and find it's best to add comments at the "let's clean this up for production" phase, when you can take a step back and see what needs commenting. If you comment as you go, you'll have to update your comments as the code changes and sometimes throw comments out, which is a waste of time.

Likewise with tests, I'm not saying write them far into the future, but I think having to strictly stick to red/green/refactor is going to waste time. What's wrong with writing a small chunk of code then several tests when you're mostly happy with it? Or writing several tests at once then the code?


People just don't write comments or tests after, that's the problem. If you do then that's fine, but after trying both routes I actually find TDD to feel like less work - not having to wait on large build times and manually navigating the UI actually makes for a more fun experience. Instant feedback being the fun part. Additionally writing tests 'after' always feels like work to me and I end up hating it, especially when I didn't write it in a testable way to begin with.


> People just don't write comments or tests after, that's the problem.

Doesn't that get caught in code review anyway though? I find being forced to write tests first can be clunky and inefficient. Also, I've worked with people who insist on the "write the minimum thing that makes the test pass" mantra which I find really unnatural like you're programming with blinkers on. TDD takes the fun out of coding for me sometimes.

Generally I'd rather sketch out a chunk of the code to understand the problem space better, figure out the best abstractions, clean it up then write tests that target the parts that are most likely to have bugs or bugs that would have the biggest impact.

I find when you're writing tests first, you're being forced to write code without understanding the problem space yet and you don't have enough code yet to see the better abstractions. When you want to refactor, you've now got to refactor your tests as well which creates extra work which discourages you from refactoring. When the behaviour of the current chunk of code you're working on can still be kept in your head, I find the tests aren't helping all that much anyway so writing tests first can get in the way.


What you describe is the typical mindset against TDD, it's difficult to explain the benefits, and really you just have to experience them for yourself. Changing your mindset is difficult, I know, why change what works right? My only tip is to keep an open mind about it, as TDD benefits are often not apparent to begin with, they only come after a couple of days work or weeks or months later or even years later.

You find that you need to do less mental work, as your tests make the required abstractions apparent for you. 'the minimum thing that makes the test pass' ends up being the complete solution, with full test coverage. Any refactoring done is safe from regressions, because of your comprehensive test suite. And when other colleagues inevitably break your code, you already have a test lying in wait to catch them in the act.


> Any refactoring done is safe from regressions, because of your comprehensive test suite.

As much as I like the idea of TDD, I have a problem with this part. When some refactoring is needed, or the approach changes, it seems like you have two choices. One is to write the new version from scratch using TDD. This wastes extra time. The other is to refactor which breaks all the guarantees you got before. Since both the code and the tests are changing, you may lose the old coverage and gain extra functionality/bugs.

And unfortunately in my experience, the first version of the code rarely survives until the deployment.


I'm not sure what approach you've described here, but it isn't TDD. In the case of adding new features to existing code, as you are continually running tests you will know straight away which you have broken. At this point you would fix them so you get all green again before continuing. In this way you incrementally modify the codebase. Remember unit tests are quite simple 'Arrange, Act, Assert' code pieces, so refactoring them is not a time sink.


refactoring != adding new features.

Also some refactorings are easier with tests, some are harder.

The kind @viraptor mentions is the kind that spans more than one component. For example when you decide that a certain piece of logic was in the wrong place.

The kind of refactoring that becomes easier is when you don't need to change the (public) API of a component.

Take for example the bowling kata. If you want to support spares and strikes and you need extra bookkeeping, that's the easy kind of refactor where your tests will help you.

But if so far you have written your tests to support a single player and now you want to support two players who play frame by frame... Now you can throw away all the tests that affect more than the very first frame. (yes in the case of the bowling kata, you can design with multiple players in mind, but that's a lot harder in the real world when those requirements are not known yet)


> What you describe is the typical mindset against TDD, it's difficult to explain the benefits, and really you just have to experience them for yourself. Changing your mindset is difficult, I know, why change what works right? My only tip is to keep an open mind about it, as TDD benefits are often not apparent to begin with, they only come after a couple of days work or weeks or months later or even years later.

I've been forced to follow TDD for several years and also been given the same kind of comments to downplay any reasoned arguments against it which I find frustrating to be honest. I don't see why the benefits wouldn't be immediately apparent.

> You find that you need to do less mental work, as your tests make the required abstractions apparent for you. 'the minimum thing that makes the test pass' ends up being the complete solution, with full test coverage. Any refactoring done is safe from regressions, because of your comprehensive test suite. And when other colleagues inevitably break your code, you already have a test lying in wait to catch them in the act.

You can do all of the above by writing tests at the end and checking code coverage as well.


"Any refactoring done is safe from regressions, because of your comprehensive test suite. "

With the right tests this works great. I have also seen the opposite, where a test suite was extensive and tested the last details of the code. Then figuring out what the tests were doing took more time than the actual refactoring. As often, moderation is the key to success.


Unit tests should follow a simple 'Arrange, Act, Assert' structure and test one single thing, described in its title. I agree anything too complicated starts to defeat the point, especially when we are mainly after a quick feedback loop.
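For instance, a minimal sketch (xUnit assumed; Basket is a hypothetical class under test):

    using Xunit;

    public class BasketTests
    {
        [Fact]
        public void Adding_two_items_totals_their_prices()
        {
            // Arrange
            var basket = new Basket();
            basket.Add(10m);
            basket.Add(5m);

            // Act
            var total = basket.Total();

            // Assert
            Assert.Equal(15m, total);
        }
    }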


How do you use TDD to implement an SDK that performs real-time manipulation of audio without really knowing how to describe the correct result?

Only when I understand the problem, the SDK and the expected result could I start writing some tests.

TDD seems to be for business applications where you're using tried and tested technology to implement new problems.


> it's difficult to explain the benefits, and really you just have to experience them for yourself. Changing your mindset is difficult, I know, why change what works right? My only tip is to keep an open mind about it

Scientologists make the same argument


Maybe writing code for exploration and production should be considered separate activities? The problem with these coding ideologies is that they assume there is only one type of programming, which is BS, the same as assuming a prototype is the same as a working product.


What's exploratory programming though? Unless you're writing something that's very similar to something you've written before and understand it well, most programming involves a lot of exploration.


Well, UX prototypes for one. In research, most projects never go into production, those that do do so without researcher code. Heck, even in a product team, if you are taking lots of technology risks in a project, you are going to want to work those out before production (and it isn’t uncommon to can the project because they can’t be worked out).


To be fair, TDD is quite good for exploratory programming, as it makes you think about the intent of your API up front.


Not really. It makes you commit to an API upfront, this is the exact opposite of what exploratory programming should be (noncommittal, keep everything open).


No with TDD you don't need to go in with a structure in mind, the structures arise as you write more tests and get a proper understanding of what components you'll require. Red, green, refactor - each refactor brings you closer to the final design.


That's the mantra often quoted but it always makes me think of the famous Sudoku example from Ron Jeffries. Basically as a mantra it falls down if you don't understand the problem domain. It's popular because it works for the sort of simple plumbing that makes up a lot of programming work. This problem is particularly true for anything creative you're trying to express as the requirements are often extremely fuzzy and require a lot of iteration.

If you don't know how to solve a problem you actually need to do some research and possibly try a bunch of different approaches. Over encumbering yourself with specific production focused methodologies hurts. If you're doing something genuinely new this can be months of effort.

After the fact you should go back and rewrite the solution in a TDD manner if you think it benefits your specific context.


> After the fact you should go back and rewrite the solution in a TDD manner if you think it benefits your specific context.

Why? Why not add tests to your existing code?


Because it was never meant for production at all.


That really isn’t exploratory programming. The end result should be code that you throw away en masse (it should in no case reach production). Otherwise, production practices will seep in, you’ll become attached to your code and the design it represents, hindering progress on the real design.

When I was a UX prototyper, none of my code ever made it into production.


>I find when you're writing tests first, you're being forced to write code without understanding the problem space yet and you don't have enough code yet to see the better abstractions.

That's why it's better to start with the highest level tests first and then move down an abstraction level once you have a clearer understanding of what abstractions you will need.


Can you do that with TDD though? Why not just sketch the code out first before you start writing tests?

I find TDD proponents don't take into account that writing tests can actually be really time consuming and challenging, and when you've got a lot of code that is tests, refactoring your tests becomes very tedious.


You can do that with TDD (it's called outside-in), although I agree that it is time consuming and challenging, especially without the right tools.


>do you think it's better to add comments in as you write the code? Or add in the comments at a later date after the code is all written?

Define "all written". If we are talking about a new function - obviously you write you comment for it after the function ready to be commented on. And obviously you won't be commenting every string you put there, right?

Now, if we are talking about a whole new feature, which can consist of many functions and whatever - yeah, you usually comment your code in the process of writing the feature, rather than doing it at a later time, which will never come.


Comments / tests are a bad analogy. HN will over index on this and go down a rabbit hole.


I also find when following red, green, refactor that you end up producing more targeted unit tests that are more expressive of the code you are testing.

Trying to write unit tests afterwards lands me with something that appears as more of an afterthought or add on. It doesn't have to be this way I suppose, but it is more prone to.

This might be because I am more used to the red, green, refactor method though.


I also practice TDD, but with a different 'T' - Type Driven Design. I find it much easier to reason about code with types, and safer (you can't compile your code if it doesn't pass the type check). Just model your data as ADTs and pattern match accordingly.

Of course, types alone can't represent every error case out there (especially the ones related to numbers or strings), so I still write unit tests for those cases. But the number of unit tests needed is much lower.


That would be ATDD - Abstract Types Driven Design.

It completely fails on concrete types. That's the reason haskellers create a DSL for everything.


Is GUI code that 0.1%?

Because I am always keen to understand how to TDD GUI code and I don't mean the data model behind the pixels.


Visual tests are more general, and are more akin to putting up barriers on either side of a bowling lane so the bowling ball stays within its lane (with room to move about still). For example when using Angular, you write 'Page Objects' that have methods such as .getTitle(), .clickListItem(3) and so on, and can then write assertions to make sure the UI changes as expected by inspecting properties [1].

I usually find I build a general page object first ('this text is somewhere on the page'), then write the UI, then make the test more specific if I can after (but it's an art, as too specific and you risk creating too many false negatives when you make UI changes).

(Also as you are interacting with the UI, these would be known as integration tests.)

[1] https://semaphoreci.com/community/tutorials/using-page-objec...
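Roughly what a page object looks like, sketched here in C# with Selenium rather than the Angular/Protractor flavour described above (selectors and the page shape are made up):

    using OpenQA.Selenium;

    public class TodoListPage
    {
        private readonly IWebDriver _driver;
        public TodoListPage(IWebDriver driver) => _driver = driver;

        public string GetTitle() =>
            _driver.FindElement(By.CssSelector("[data-test='page-title']")).Text;

        public void ClickListItem(int index) =>
            _driver.FindElements(By.CssSelector("[data-test='todo-item']"))[index].Click();
    }

    // Tests drive the page through these methods and assert on what they return,
    // instead of scattering raw selectors across every test.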


I don't think you can unit test GUIs, since by their nature all tests end up being integration tests. It's easy if you assign non-CSS identifiers (i.e. use a data-* attribute for identification instead of id or class, since you want to keep those free to change for stylesheet refactors) and just hard code the assumptions into the tests, like "when x is clicked y should be visible", or "when I enter 'foo' into the text field, the preview label should contain 'foo'". Ideally your assumptions about GUI functionality shouldn't change much throughout the lifetime of the project, and if you use static identifiers your tests should hold up during extensive refactoring.


That is my point of view, since my focus is mostly native GUIs.

Which is why it is my main question at TDD talks, which always use some library or CLI app as the example.


To a certain degree you can unit test GUIs with tools like Ranorex or Selenium. The question is how much setup you need to get the GUI on the screen with the right data.


That isn't unit testing though, it's integration or e2e testing.


You can, but usually with lots of effort, and you cannot test UX and design requirements anyway, which is why I tend to ask this question about full TDD-based processes.


We use Ranorex for this: https://www.ranorex.com/


Interesting, thanks for the link.


I think so. I don't know how to unit test GUIs either.


I saw an enjoyable talk recently about snapshot testing. I don't know too much about testing generally but it seems like it could be relevant: https://facebook.github.io/jest/docs/en/snapshot-testing.htm... is the general idea but it doesn't have to be confined to jest/react

Edit: Slides from the talk I saw: http://slides.com/bahmutov/snapshot-testing


Past my edit window, but I want to add this - I feel that I introduced some confusion by missing one magic word in one special place. The first sentence of the last paragraph should be:

That said, if you go for functional style in OOP, i.e. shoving as much as you can into stateless static helper functions and most of the rest into dumb private stateless functions, (...)

Of course I do not mean you should abandon objects where there is a strong connection between a set of data items and the operations that work on them, or where polymorphism is the right abstraction. But from my experience, quite a lot of code is made of transformations applied to simple data, and when you write that kind of code in a functional style (whether as static methods grouped in helper classes, or private methods within an implementation of your class), both quality and testability rise in lockstep. And my point is that quite a lot of code can be written this way even in an OOP project.


>That said, if you go for functional style in OOP, i.e. shoving as much as you can into static helper functions and most of the rest into dumb private stateless functions, you suddenly gain both a clean architecture and lots of test points to use in unit tests. So you can have testable code, but you have to chill out with the OOP thing a bit.

Wow, this is exactly the opposite of how one can achieve testability in OOP! For more details I recommend Misko Hevery's excellent article "Static Methods are Death to Testability" [1]. Also, I'd argue that "functional style in OOP" is an oxymoron - you're either OO or something else (functional, imperative...).

[1] http://misko.hevery.com/2008/12/15/static-methods-are-death-...


I'm not sure if this article is clever satire.

> The basic issue with static methods is they are procedural code.

So is any object-oriented code. OOP is a subparadigm of procedural programming.

> Unit-testing needs seams, seams is where we prevent the execution of normal code path and is how we achieve isolation of the class under test. seams work through polymorphism, we override/implement class/interface and than wire the class under test differently in order to take control of the execution flow. With static methods there is nothing to override.

Why did it not occur to him that the function boundary is the "seam" he's trying to find?

I mean, `method(a, b)` is equivalent (as in: equally expressive, and usually implemented in the same way) to `a.method(b)`. Therefore, any problems with one case equally apply to the other case. If his problem is that `method(a, b)` may call other, non-mockable functions, then that criticism equally applies to `a.method(b)`.

(As I'm writing this, it occurs to me that the author may be suffering from the "OOP = Java" delusion.)


The OOP = Java trap is all too common, but the converse is also a trap: just because you've written OOP code in a different environment doesn't mean that pattern will work in Java.

Go with what the ecosystem supports, and you'll find your tooling helps you a lot more than if you fight against it by trying to force non-idiomatic structures. Your colleagues will appreciate it, too.


He probably meant no-side-effect static functions. I find myself using these a lot. For common CRUD web apps, you have Spring doing most of the stuff for you and you simply need to write stateless methods. However, for less common requirements, you might need to use classes and OOP patterns to implement complex logic.


I did. I even wrote the magic word a moment later, but forgot it there, and realized only past the edit window.


It seems the encouraged method is IoC these days, and that's just dreadful. IoC/Dependency resolution all over make it insanely hard to reason about code without running circles through the codebase.

For me, IoC seems invented almost entirely to make up for how difficult testing can be in particular languages. Which, sure, making up for shortcomings is good, but the necessity to use IoC for it feels bad.


This is how I felt about IoC as a junior dev 15 years ago, before I'd actually used it myself.

Nowadays the benefits are clear to me: more modular, testable code, and also lifecycle management.

Give it a try, you may well change your views.


I've used it a fair deal. I've found I prefer languages that don't require IoC to make code testable.

I agree that it's one of the sanest options when it's required, I just think that language design should incorporate testing ergonomics from the start.


> unnecessary abstraction and boilerplate patterns

That means you didn't actually change the code. It means you added unnecessary abstractions around your code in order not to change it.

Unit tests guide you towards simplicity. In my experience, the only times they haven't done that is when I have made some assumptions about what the code should be and not allowed the tests to drive me towards that simplicity.

http://blog.metaobject.com/2014/05/why-i-don-mock.html


    That said, if you go for functional style in OOP, i.e. 
    shoving as much as you can into static helper functions and
    most of the rest into dumb private stateless functions,

We had this at a company I worked at a while back - dozens of modules with nothing but static functions that all took a first argument of the same type. If only there was some kind of METHOD for declaring a whole bunch of functions that operated on the same data...


Until you get into polymorphism etc. this is just a style thing.

method(a,b) is equivalent to a.method(b) and exactly as much typing. You do save manually typing the extra part of the definition but 'eh'. A few languages treat these interchangeably.


people ... don’t want to change how they write code. They just want tests for it

Have you considered the possibility that those people are right? That's a reasonable conclusion to make if you are seeing lots of otherwise smart people that share an opinion that disagrees with yours.

There are lots of valid reasons to change the style in which you write code. In my mind, fitting somebody's fad testing scheme is not one of them.

Here's a second opinion from a guy who also likes tests, but doesn't think it's a good idea to structure your whole codebase just to accommodate them:

http://david.heinemeierhansson.com/2014/test-induced-design-...


I strongly agree with that, too. My current, experience-born belief is that if the only reason for introducing some architectural pattern is to accommodate testing better, the change is wrong and will likely hurt the code quality. Yes, you need to concede a little bit to allow for test points, but turning your code inside-out to have it go through three layers of indirection so that the middle one can be mocked easily? That's just stupid.


yea, you should be changing code style to increase modularity in a way that is conceptually coherent in terms of what is easy to hold in your head. Increased testability should fall out of that because you can think through "What invariant should hold true about X under conditions/inputs Y1...Y4?"


Code you can hold in your head usually has a smaller surface area. Fewer moving parts equals easier testing.


    Have you considered the possibility that those people are right?
Every time I’m looking at a code review with awful tests. I started out in statically typed languages and I can’t shake the feeling that we need to tool our way out of the testing conundrum.

Anything that is this hard to get right shouldn’t be the equal responsibility of every team member. For every other problem of this magnitude we have people who specialize and everyone else just has to be aware of the issues and consult when in doubt.

So it’s a struggle for me to try to get people to adhere to the strategy we’ve accepted without believing it’s the end-all be-all of software robustness. Because I’m not convinced. Nothing I’ve ever mastered in software has taken me half as long as testing, and that just ain’t right.

That said, I still like the structure about 80% of my tested code has. It usually does exactly what it says and nothing else. Local reasoning is a big deal to me.


The main problem is "obsession" as pointed out by that blog post you linked.

Obsession of "one size fits all" or "silver bullet". I believe the authors of agile manifesto wrote this disclaimer.

If it doesn't make sense to write unit tests for MVC controllers, then don't.

In my experience, management looking at code coverage not being 100% is one reason (although a bad one) that this "unit test everything" happened. I tried shouting this out, but the team lead didn't have the ability to learn from a junior and use Sonar's configuration.


Usually splitting out into interfaces is done in static languages as that's the best type-safe way to do things. It's not a fad, it's been like that since the beginning.


There is absolutely no reason (except to fit into a particular pattern of testing) to turn everything into an interface[1]. That has nothing to do with type safety.

[1] Obviously some things do make sense to put behind interfaces, but I find that most Java developers go interface crazy and the code ends up being an unreadable mess.


In some situations unit tests with lots of mocks will bring a negative value. Imagine a situation where you want to refactor a big piece of code with many dependencies but you don't want to change its public interface.

If you mock everything, when you refactor, the tests will break because the dependency structure will change, and the mocks are no longer relevant to the new implementation. You have to rewrite the tests. You did twice the testing work and, more importantly, you get absolutely no protection against regressions because the tests for the 2 versions are not the same.

If you build integration tests, they can remain the same. Less work and actual protection for your refactor.


Testing internals forces future programmers of the codebase to maintain those invariants. All code is a liability.


Not in my experience. Convincing people to delete tests that only assert one invariant when the business changes its mind is easy. It’s the ones that have residual value after removing one invariant that trap people into spinning their wheels.


I agree with this.

Although, if you offshore development here in the third world, where the internet gets slower every day, running an integration test that queries Amazon RDB can take forever.

I hope this issue gets a spotlight and it gets noted that integration tests in third-world countries are very, very slow. And this high cost should be included in the estimates.

To give you an idea, here it takes AT LEAST 5 seconds to load a page from the Amazon console. Lol, even the software companies owned by the ISPs/telcos here complain that their access to AWS is super slow. They said that bad routing is the main issue and for some reason the ISP isn't doing anything about it.


> Running an integration test that queries Amazon RDB can take forever.

Why wouldn't you run your test on aws if it needs to integrate with rdb anyway?

Or are your code changes so massive that "git push" is slow?


Not massive. But yes, git push takes a few seconds, which is bearable. Once pushed, I need to SSH into the Jenkins server so I can run only my brand new integration test. Running the test there is super fast, but everything else, including typing one character in PuTTY, is slowed down.

All this while you are expected to fix 10 tickets for the whole day plus anything that goes wrong in production.


I agree with everything but your conclusion, but I have an aversion to mocks that isn’t shared by everyone.

If the code changed due to a big behavioral shift then your integration and E2E tests aren’t safe. It’s more than twice the work at the higher layers because people get trapped by the Sunk Cost Fallacy. They try and try to save the old tests before they finally rewrite.

That is the observation that convinced me to stick to unit tests. People aren’t emotionally attached to individual unit tests.


Very good point.


This sort of discussion often gets confused because people have different ideas about what integration tests are and therefore talk past each other.

I generally avoid the term altogether and recommend testing stable API's (which are often public) and avoiding testing internal API's that are more likely to change. This assumes you have a stable API, but that's true of most libraries.


I think we are discussing what it means to test a stable API or an internal API. Not just testing them in general. We're talking about making architecture decisions on your code in the interest of test-ability. Regardless of the visibility of your API you will still need to unit test the logic will you not? Do you test your controllers and then the response from your service layer? Is all your logic in your actions?


Exactly - Integration tests !== E2E tests


Martin Fowler solves this by introducing:

SubcutaneousTest

https://martinfowler.com/bliki/SubcutaneousTest.html


Catchy.


>Okay. The biggest problem I see with people trying to write unit tests is that they don’t want to change how they write code. They just want tests for it. It’s like watching an OO person try their hardest to write OO code in a functional language.

I've seen what happens when a developer tries to abstract away a database in a database driven app so it can be "better unit tested". It's a goddamn mess.

If your app relies heavily on a database, it naturally integrates with that database, and it makes no sense to test without it. You are intentionally avoiding testing in a way that will pick up bugs.

>Unit tests run faster, are written faster

Unit tests test less realistically. That means they don't catch bugs integration tests do.

They also often take longer to write and are more tightly coupled.

How coding this way came to be seen as a best practice is beyond me. Tight coupling and premature optimization is usually seen as bad practice in other areas.


> If your app relies heavily on using a database, your app naturally integrates with a database then it makes no sense to test without it. You are intentionally avoiding testing in a way that will pick up bugs.

Also, with Docker it's now actually feasible to automatically test against a real database at a reasonable speed. A Postgres container spins up in a couple of seconds, a SQL Server one in a little over four.


That has nothing to do with docker, really. I run postgres standalone on my laptop and it starts in < 1 second.


I guess they meant so your tests can start with a blank or reproducible state.

But you can of course achieve the same by running a script before your tests start. There are also some frameworks for doing this sort of thing too, such as Fixie for .NET


I've done "write a script to reset the database" before, although not for Pgsql. The effort and potential snags involved make it nowhere near as trivial as docker rm && docker run.

There are also other scenarios that become really simple with disposable DB instances. Want to test a remote data transfer feature? Just spin up two databases.


Drop-create scripts! One of my first epiphanies in the testing world.


Create only scripts can also be great in the right context.

Namely a context like travis-ci where you get a new clean environment each time.

Personally, I've developed a few libraries/web UIs that rely on external software like a DNS server or an LDAP directory.

And for those, I've a quick and dirty shell script that deploys bind/OpenLDAP.

It's far easier, faster and more accurate than mocking.

For example, what comes to my mind is the testing I do for all the SSL/TLS modes I support (SSL, StartTLS, certificate checking disabled or not, etc.) in my LDAP web application.

Travis also has available services in their build envs (stuff like redis, mongodb, mysql, cassandra...).


I do this with rsync to restore a snapshot of the database data folder.


Sure, but I presume that doesn't include installation time.


In another thread I talk about splitting deciding from doing and I find that strikes a very easy balance for database heavy code. Unit tests for the logic and just concede the data transport to higher level tests. Preferably with a local database full of test fixtures.
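
A tiny sketch of that split (Python, invented names; psycopg2-style placeholders assumed): the pure "deciding" part gets exhaustive unit tests, and the thin "doing" part is left to higher-level tests against a local database with fixtures.

    # Deciding: pure, no database, no mocks -- unit test every branch here.
    def overdue_ids(invoices, today):
        return [inv["id"] for inv in invoices
                if not inv["paid"] and inv["due"] < today]

    # Doing: a one-liner data transport, covered by an integration test.
    def mark_overdue(conn, invoices, today):
        with conn.cursor() as cur:
            cur.executemany("UPDATE invoices SET status = 'overdue' WHERE id = %s",
                            [(i,) for i in overdue_ids(invoices, today)])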


I work in a company where quite a few of the developers simply are incapable of writing anything but integration-tests.

The reason? They don’t “believe” in unit-tests. They don’t think unit-testing “works in the real world”.

They absolutely fail to accept that they need to write their code differently for automated testing to work well.

How do you change such a mindset?


For some reason, unit tests vs. integration tests reminds me of this image: https://pbs.twimg.com/media/CZX0O-tWQAAeaLi.jpg The unit tests passed, but why bother with integration tests?

I wrote a component at work. Sure, there are some unit tests (is the SIP parser working? Check). But for that component, unit tests only go so far, as I need to query another program that actually implements the business logic (it goes SIP -> my program -> custom protocol [1] -> business logic program and back again). To mock the business logic program (I need to make sure to return the proper information per the SIP request) is to reimplement the business logic program, so when testing my component, we also use the business logic unit. The business logic also requires two more programs to run. At this point, there is no difference between a unit test and an integration test as I'm running five programs ... no, ... six, since I also have to mock a cell phone (basically, respond to a custom protocol and make a web request) to test the "unit" that is my program.

Oh, and to make it even nicer, this is legacy C and C++ (C with classes) code.

[1] Legacy code. It works. At the rate of production deployments we (our team) gets, it would be around two years [2] to remove the custom protocol. So it stays.

[2] Have I mentioned the very scary SLAs?


Assuming you are a developer, start writing some.

Next bug you find that needs a unit/functional test (e.g. it is caused by a simple error in transformation in one function), write the test first as a table of inputs vs outputs, find it fails, fix the function, and leave the test in. Gradually, the code base will contain unit tests which are useful, people will see they are useful, and other people might start using them too where appropriate.

You are unlikely to persuade them without actually doing what you say is beneficial and exposing others to its benefits.
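
To make the "table of inputs vs outputs" idea concrete, here's a small pytest sketch (the function and values are invented; the empty-string row stands in for the failing case from the bug report):

    import re
    import pytest

    def normalize_phone(raw):                     # illustrative function that had the bug
        digits = re.sub(r"\D", "", raw)
        if not digits:
            return None                           # the branch the bug report exposed
        return "+" + digits

    @pytest.mark.parametrize("raw, expected", [
        ("+1 (555) 010-9999", "+15550109999"),
        ("555.010.9999",      "+5550109999"),
        ("",                  None),              # previously crashed
    ])
    def test_normalize_phone(raw, expected):
        assert normalize_phone(raw) == expected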


I agree. Tests for bug fixes are extremely valuable. Of such tests, unit tests are often very feasible.

A test accompanying a bug fix holds value in many ways.

Firstly, it demonstrates to those reviewing the change that the fix is suitable.

Secondly, the presence of a test encourages reviewers to consider what a test does and doesn't cover, sometimes resulting in comments regarding improvements that had not otherwise been considered.

Thirdly, and of most importance in the long term, a test for a bug fix serves to document oddities previously discovered that were for a time not known about.


I’m confident you know this, but just for the peanut gallery:

Tests that go along with bug fixes are some of the highest value tests, but they must be previously failing tests.

I can’t tell you how many times I’ve reviewed fixes with “tests” that are basically affirming the consequent; they assert something that was already true and it turns out they’re not actually fixing the reported bug.


It depends on what you unit test and why.

If it's for 100% test coverage: forget about it.

If you test private methods: you're doing something wrong.

What people usually see from the "unit test evangelists" are codebases with tests for every method in the code. Then you do some refactoring and you have to rewrite tons of tests. And as those tests were just made to get 100% coverage, you end up with logic bugs, because most unit tests have been written to go through the code, not to check limits and edge cases. When you stumble upon this kind of test harness you see only cons (more to write upfront, less willingness to refactor) and no pros (the code is still brittle). Then your integration tests feel like your real harness: you can change anything in your code and they'll tell you what has been broken when used.

Now if you consider your unit tests as a kind of integration test for the API your classes present, then you get the benefits of unit tests. But this means testing only public methods. And mutation testing resilience is a better metric than test coverage.

Also: those tests do not replace a real documentation which can be a lot faster to read and understand than code.


People test private methods because edge cases occur in those private methods and the tests for those edge cases do not belong in the unit test for the consumer of the private unit. If the consumer simply loops over a list of objects which it receives from the private unit, the consumer does not need to know that particular integer arguments are special cases in the private unit; that would be a leaky abstraction. However, it still makes sense to verify that you have correctly handled each special case via a unit test.

As for the difficulty of refactoring, if you refactor the private unit, you ensure that its tests continue to pass, since its consumers depend on that behavior: you ignore the failing tests of the consumers so long as the subordinate unit's tests are failing. If you eliminate the private unit, you eliminate its tests. Modifying the behavior of the private unit may be equivalent to eliminating the unit or refactoring it. The number of tests you will have to modify is equal to the number of units you modified the behavior of: the branch count of the private unit, or that plus its consumers. If each consumer of the private unit were responsible for testing all of its edge cases, then you would instead have to change its branch count multiplied by the number of consumers worth of tests.

The distinction between private and public is wholly synthetic. It is a binary layering mechanism that does not map well onto most architectures which have many layers. From the perspective of many full architectures, everything down in the data layer is private: no customer will have direct access to the data layer. Yet you will still test the data layer.

The internals of a library are not special simply because the layering is thinner and binary.


My general theory is that if a private method is complex enough to need separate testing, it's usually complex enough to pull out into its own class and test as a separate public interface. That's 'interface' as in 'what a class exposes to its callers', not necessarily using an actual Java interface or making it part of the public API of the library.

A side-benefit is that the tests for the original class can be a lot simpler too, as I can just mock the responses from what used to be the class internals. Another benefit for libraries, is that it allows consumers to swap out your implementation. I've lost track of the times I've wanted to change something deep inside a library but it's implemented in a private method so I can't even override it.

This does lead to more, smaller, class files. But unless taken to extremes I've not found it to make things less comprehensible, and it definitely makes things more composable.
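
Roughly what that extraction looks like (a Python sketch with invented names):

    # Before: Invoice had a private _tax_rate() nobody could test or override.
    # After: the logic lives in its own small object with a public interface.
    class TaxRules:
        def rate_for(self, region):
            return {"EU": 0.20, "US": 0.07}.get(region, 0.0)

    class Invoice:
        def __init__(self, region, tax_rules=None):
            self.region = region
            self.tax_rules = tax_rules or TaxRules()   # callers can swap in their own rules

        def total(self, net):
            return net * (1 + self.tax_rules.rate_for(self.region))

    # TaxRules gets its own direct tests; Invoice tests just stub rate_for with a canned value.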


> If you test private methods: you're doing something wrong.

Maybe a silly question, but why?

If I refactor a class to pull a common piece of functionality into a private method, why would I not want a test for that?

One of the principal benefits of tests I see is allowing me to change the implementation without worrying about the behaviour, and I'm not sure why that wouldn't apply to private methods?


One reason why is because you should be testing the public behavior of a function/class not the details. The reason for this is because the public interface is what other parts of the codebase will come to rely on. Refactoring generally shouldn’t change the public interface as it will break other pieces of code within your codebase, or other codebases if it’s a library, and other systems if it’s a network api. So, if you test the public interface, generally refactors won’t break the tests.

Testing private functions also seems to be a smell that the overall setup of testing the class or function is too difficult. This can be because the class has too many branches in it, the argument list is too large, or too many other systems must be in place for it to function correctly. This, to me, indicates a public interface that is hard to use and will pass much of these issues on to the caller.

Lastly, if you are testing private functions to gain coverage then arguably the behavior in the private method isn’t actually useful to the public interface. The reason I say this is that testing the behavior of the class should end up touching all branch conditions inside the class or the public interface isn’t fully tested. By only testing the public interface it then also becomes easier to locate dead/unreachable code.

Hope that answers the why.


I would argue you absolutely need to be testing the internal details. That is the entire point of measuring branch coverage and performing mutation testing. Unit tests are not black box tests. They need to know that for special values, the unit has to follow a different code path but still produce sensible output. Reading the documentation of a function is not sufficient to determine what edge cases the unit has, but testing those edge cases is often critical to verifying that the unit adheres to its specified behavior under all conditions.

As for the smell, sometimes things are irreducibly complex. Some things in this world do require tedious book keeping. All the refactoring in the world cannot change the degrees of freedom of some problems.

Tests on consumers should not test branches of subordinate units. If you did this then the number of tests would explode exponentially with the number of branch conditions to handle all the corner cases. If a private unit produces a list of objects, but has special cases for some values of its argument, test those branches to verify it always produces the correct list. Then just make sure each caller does the correct thing with the list of objects. That is the purpose of separation of concerns: the consumer does not need to know that some values were special.


> the number of tests would explode exponentially with the number of branch conditions to handle all the corner cases

Then wouldn't you want to write something that was able to iterate through those edge-case interactions and ensure they are correct?


I'm trying to imagine what on earth your private methods can be doing that wouldn't be affected by the public interface.

There should be no situation where the exact same call to the public interface could take multiple different paths in the private method. The only thing I can think of that could make that happen would be some dependency, which should be mocked at the top level to control these cases.


Some people call this functional testing.


Private methods are the internals of your classes. They may change a lot for performance or maintainability; one method may become 3 or 4.

But people who use your class don't care. They put something into your public methods and expect something in return. The details of what happens inside should not matter. Adding tests there only helps slow you down and makes the dev team resist needed changes. And when you add tests you increase the chances of making them useless or wrong.


Ah, I think I see. If I break the functionality by changing the implementation of a private method, that should be reflected in the public API unit tests.


That's how I see it (in Java at least): unit tests are for guaranteeing that your class's API does what it says it does.

In Python I am more loosey-goosey about my unit tests, and unit tests there are more for helping me write/think about tricky code.


If your private method is wrong, then your public methods will also be wrong. If your public methods are right, then it doesn't really matter what your private methods do..


I read it as using introspection/reflection to test what is essentially implementation details which are very likely to change.

This is how you write brittle tests which fail easily and cause high maintenance costs and reduced confidence in the unit-tests as a safety net.

Definitely an anti-pattern.


> If it's for 100% test coverage: forget about it.

I'm a realist and don't see any point or value in chasing that goal for a 20+ year old company code-base.

But I expect new modules to be fundamentally testable.

> if you test private methods: you're doing something wrong.

Agreed.

> What people usually see from the "unit test evangelists" are codebase for which you have tests for every method in the code.

Is that really so? It’s easy to be opposed to extremists of any form.

I only “evangelize” that business-logic should be tested and thus needs a code structure to isolate the business-logic from its dependencies (databases, services, factories, etc).

I find that perfectly reasonable.


> Then you do some refactoring and you have to rewrite tons of tests.

One of the core principles of TDD is that you write tests to facilitate refactoring---to make it _easy_ to refactor and have confidence in the system after doing so. I've been practicing TDD for ~8y and this is rarely a problem. TDD encourages good abstractions and architecture that lends itself well to composition and separation of concerns.

But if you change an implementation, of course a test is going to fail---you broke the code. It works exactly as designed. What you want to do is change the test first to reflect the new, desired implementation. Any tests that fail that you didn't expect to fail may represent a bug in your implementation.

Of course, I haven't seen the code, so I can't comment on it, and I won't try to do so.


"What you want to do is change the test first to reflect the new, desired implementation". Not sure if you meant this but this is exactly what is wrong with most unit tests that I have come across. They test the implementation and not the interface.

That's why I agree that the focus should mainly be on integration tests. Or at least functional tests. Ideally what you want is to have a system where all the state and logic is in a model (that model can include an external db). The gui should be as much as possible a function of the model i.e. model-view. Then you write the majority of your tests as integration tests against the model and include as many scenarios as you can think of. These tests should reflect the requirements/interface for the system and not the implementation. You should write some gui tests but these should be much less. They just need to verify that the ui reflects the model accurately. You shouldn't be testing scenarios as part of the gui tests.

I have come across too many code bases where the unit tests test that the code is what it is, rather than the code does what it should. Where 'what it should' == 'requirements/interface' == 'something close to integration tests'


I doubt that you can. There was a study a while back, and I apologize in advance because I do not have a link, that showed projects written with unit tests took significantly longer to reach the market, but with significantly fewer bugs. However, overall time spent on the code was less. So the conclusion was that unit tests are a commitment to the long-term goal of minimizing developer time, and the tradeoff is that it takes longer for the first version to be done.

That is, as far as I know, the only tangible evidence that unit tests are good unless you need to get something out the door quickly (which sadly is most of it).

I'd argue that is not the main benefit of unit testing, however. That is the way code is structured, and especially how dependencies are explicit, e.g. injected as constructor arguments.


That parallels my experiences. I got tired really early on with projects that ground to a halt because of brittleness and a lot of my focus is on building skill and confidence so that version 3 is no harder to ship than version 2 was. Every team I’ve left on good terms was more effective when I left than when I got there. The ones that fought me the whole way frustrate me and it shows.


Man, I really want to read that study!


Was this the microsoft study? I think it was a breakdown of their delivery of Vista?


> How do you change such a mindset?

IME you can't. If they even recognize that what they are doing is not unit testing then you're doing well.

> They absolutely fail to accept that they need to write their code differently for automated testing to work well.

I've been thinking lately that having to code differently might be a fault of the tooling that's built up over the years and that they might be right. I've been getting back into c lately and had a look at the mock/stub options there which were very complicated and not very compelling compared to what I've been used to in the .net world. In the end I found the pre-processor was the best (for my project) option:

    #ifdef test
    #define some_method mock_some_method
    #endif

The advantage has been that the code is written exactly (more or less) as it would have been if there were no tests. There are no interfaces to add, functions to virtualize or un-static, and no dependencies to add; this all translates to no performance hit in the production code and the project being simpler all around.
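
(The closest equivalent in a dynamic language is monkeypatching; a sketch with pytest and an invented billing module, again without adding any interfaces to the production code:)

    # billing.py -- hypothetical production code, written with no test seams
    def charge_card(amount):
        raise RuntimeError("talks to the real card network")

    def place_order(total):
        charge_card(total)
        return "ok"

    # test_billing.py
    import billing

    def test_order_flow_without_hitting_the_card_network(monkeypatch):
        charges = []
        monkeypatch.setattr(billing, "charge_card", charges.append)   # swapped only at test time
        assert billing.place_order(42) == "ok"
        assert charges == [42]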


Trying to "change" others' mindset is good recipe for frustration.


The keyword is "believe".

Such behavior is not science. In science, you don't believe, you understand; it's not faith. You check evidence, you use reason.

A mindset that relies on belief rather than reason. Maybe it's because of decades of advertising. It can be a side effect of participating in belief movements.

How do you change such a mindset? For me, it was reading lots of philosophy and atheist vs theist debates. For Socrates, he died for it.


Perhaps a well written, easy to follow guide on how to structure different types of code to simplify testing would be helpful. Know of any?


Working Effectively with Legacy Code is a good read: it presents what kind of code you want to attain and methods to get there from a crappy code base.

The definition of legacy code for the author (which I like) is: untested code. So the book is more about getting code in a testable state than random refactoring to get to Clean Code level.


That book is indeed good, at least on a personal level.

It has helped me refine what I consider good code and good effort with regard to testing.


Growing Object-Oriented Software, Guided by Tests is good.


Sounds like Tableau in Seattle.


> Good lord. Why integration tests?

Because they can find bugs and errors which unit tests cannot.


Funnily enough, for the reasons carefully explained in the article, i.e. cost vs. benefit, fragility, etc.

Nobody disputes that if they came for free then full unit test coverage would be a good thing. The area open to reasonable debate is whether they give the best bang per buck in terms of testing (as opposed to the role of tests in TDD - which is a different kettle of fish: http://www.drdobbs.com/tdd-is-about-design-not-testing/22921... )


This is true. Areas where I have full unit test coverage tend not to have any bugs. What a waste of time to have written all these tests!


Yeah, assuming you think it's not a bug when the backend API that you're mocking is accidentally removed or changes its response slightly.


You seem to be implying that I prefer all unit tests and no integration or system tests. Far from it.

A system test catches the backend API issue. A unit test can demonstrate that my component degrades gracefully if the backend API is not available (because I used a mock to provoke a timeout response).
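
Something like this, as a self-contained Python sketch (the Dashboard class is a stand-in for the real component):

    from unittest import mock

    class Dashboard:                                      # stand-in for the component under test
        def __init__(self, backend):
            self.backend = backend

        def render(self):
            try:
                return f"visits: {self.backend.fetch_stats()['visits']}"
            except TimeoutError:
                return "stats temporarily unavailable"    # degrade instead of crashing

    def test_degrades_gracefully_on_backend_timeout():
        backend = mock.Mock()
        backend.fetch_stats.side_effect = TimeoutError    # provoke the timeout path
        assert "unavailable" in Dashboard(backend).render()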


There are plenty of systems out there that can't degrade particularly gracefully.

That said, the big issue with unit tests is they don't test the glue which is where a lot of issues happen. In languages with strong type systems, this is less of an issue.

Unit tests are great when you actually have complex logic with relevant corner cases, but when you're building web apps, 95%+ of your code is just boilerplate data munging


The problem is that it's almost always possible to achieve a similar or identical number of bugs with lower test coverage, and every test you write has a maintenance cost.

In my experience, the vast majority of test failures (after I've made a code change) end up being issues with the tests themselves, and not with the code change. If you're testing obviously-correct code, that's just more that can spuriously break later and require time and effort to fix and maintain.


I assume you are sarcastic, but cannot figure out what your actual point is. Are you disputing integration tests can find some types of errors which unit testing will not uncover?


I read your comment "Because they can find bugs and errors which unit tests cannot." to suggest that unit tests cannot find bugs.

You probably meant "Because they can find bugs and errors that unit tests cannot." to remove the ambiguity.

tl;dr, I am illiterate; but a little change to the original comment could make it easier.


Except mine had plenty of bugs nobody ever saw because I spotted them while writing the tests...


Unit tests and integration tests are tools, and some tools are better at some tasks than others. The idea that a single tool is the only one you need is preposterous.

If you are writing a library, write unit tests; if your app mostly binds two libraries together, unit tests are meaningless, so write integration tests.


Functional, integration, and unit are all different types of tests. He's saying write more integration tests, not more functional tests.

For algorithms, I love how I can refactor the implementation and still have 100% confidence in the result if it tests sample input against expected output.


> Why integration tests?

Because they test that you are actually using available network ports, have the correct database model in mind, didn't mistake the version of your libraries, got the deployment script right, and aren't just restarting everything in an infinite loop?

Or maybe because E2E tests actually test stuff your company cares about, instead of some made-up rules that you've got from nowhere?

Really, if you have units, you should unit-test them. But your E2E tests should be the ones you really care about, and they should certainly not break randomly with unrelated development. If yours are lasting for 5 quarters, you may be doing something wrong.


There’s a pyramid for a reason. It only takes a couple of tests to make sure that your plumbing connects all the way through. You inspect all the bits when they are going in but in the end you still check that things end up where they are supposed to.

I’ve been doing automated testing for a while. It’s hard to learn, and there aren’t many people to emulate. Well, there are people to emulate, but the winning strategies are counterintuitive, so your gut fights you the entire time. It took me 8 years to feel confident in tests, and my own test code routinely makes me sad because I find antipatterns and I should know better. Also other people copy all of my mistakes :/

I’ve seen a number of independent groups two or more years into their testing adventure, and the failure modes are not that different. Everyone pretty much makes the same mistakes I do, and it’s frustrating watching everyone go through the pain before they accept that something has to change and it’s probably them.

The best strat I know of for testing is to use inductive reasoning and sampling to verify. If you don’t like the plumbing analogy then this is the Logic version of the same thing. If A -> B and B -> C then A -> C. Only a couple of your tests should verify A -> C and the bulk should check every kind of A [edit] and every kind of B.

If you want to do things like this without making your code not ‘say’ anything (a huge pet peeve of mine, so I can empathize with your concerns) then there are a couple of things to do there. One is an old trick from Bertrand Meyer: split code that makes decisions from code that acts upon them. Done well, this split leaves the code more legible, not less.

Most of the boundary conditions are in the decisions. And since this code is side-effect free, you can test the hell out of it with no mocks. Getting every permutation is straightforward and you can count your tests and your conditional branches to figure out if you are done.

Once your code looks like this, adding and removing new rules to the system later is a snap. Even much later.
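
A toy version of that inductive argument in Python (the functions are made up): the bulk of the tests cover every kind of A and every kind of B, and only a couple confirm the A -> C plumbing.

    import pytest

    def parse_amount(text):                    # A -> B
        return float(text.replace(",", ""))

    def apply_discount(amount, code):          # B -> C
        return amount * (0.5 if code == "VIP" else 1.0)

    @pytest.mark.parametrize("text, expected", [("1,200", 1200.0), ("99.5", 99.5), ("0", 0.0)])
    def test_every_kind_of_a(text, expected):
        assert parse_amount(text) == expected

    @pytest.mark.parametrize("amount, code, expected", [(100.0, "VIP", 50.0), (100.0, "", 100.0)])
    def test_every_kind_of_b(amount, code, expected):
        assert apply_discount(amount, code) == expected

    def test_a_to_c_plumbing():                # just enough to prove the pieces connect
        assert apply_discount(parse_amount("1,000"), "VIP") == 500.0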


> There’s a pyramid for a reason.

Sorry, but I am still unconvinced people got that reason correctly.

Let's say you have that A -> B; B -> C pipeline. How many tests you should have on each step (and on the origin) depends completely on how much freedom that step grants you. It is not something one can say generalities about.

For example, if you are writing an enterprise CRUD application, almost your entire freedom resides on the data mapping. That means that your tests should be equally divided between data validation and data storage/retrieval. And the second can only be done at the integration or E2E levels.

If you are writing a multi-client stateful server (like a threaded web server), the freedom concentrated in launching and reloading it is so large that you can't even reasonably test for it. You'd better design your software around proving this part is correct and leave testing for less problematic stuff.

My biggest issue with the unit test pushing isn't even that it forces a bad structure onto the code (which it does), or that it pushes for fragile and valueless code (which it also does). It is that it's wrong at the larger level, oversimplifying things and letting people off the hook without thinking for themselves.


Because there isn't a proper way to write unit tests for GUIs, for example.

They can only test parts of the behavior, not everything, and are too brittle in the face of any simple UI/UX change.


Why not just mock the UI drawing library? (I find this a very interesting question.)


Because you would be reimplementing 100% of the UI features and still couldn't prove whether it meets the UI/UX design specs.


I think a lot of that is just the poverty of UI APIs and especially the imperative drawing paradigm. There's no reason in principle why we can't programmatically verify that the basics of the UI spec are fulfilled. If the whole UI layer is just impossible to verify, then, if we're at all serious about correctness, we should (hyperbolically) stop making UIs until we figure it out.


I can’t help but think this is because nobody writes testable GUI frameworks. You can’t build a castle on a swamp, unless you’re Monty Python.


You've missed the point. Sure, code can always be written better to facilitate testing, but ultimately, each component of the code still has to correctly call/be-called by other components. No class exists in a vacuum. Suppose you have class-A which interacts with class-B. I've seen people put a ton of effort into unit-testing A and B in isolation, and writing very elaborate mocks/fakes/stubs for A and B. Only to end up with bugs anyway because they made a mistake in their mock/fake assumptions. Instead, an integration test that allows A and B to interact directly, and tests their resulting behavior, would avoid all this wasted effort and bugs that come from mocking.

You suggest that instead of writing integration tests, this problem can be avoided by "writing better code". But how exactly would you rewrite the code to avoid the above problem? Declare that A and B should not interact at all, and move all their interactions into class-C? Now you've just given a new name to the same problem: "How do we adequately test class-C?" And once again, the correct answer is to ease up on the mocks and just write some integration tests.


No true Scotsman.

You might be speaking about those cases where people write large god classes, pervasively side-effectful code, zero API design, and a general lack of pure abstractions - then their tests would equally be bad. Tests are code, so one's ability to design programs would reflect on their tests and vice versa.

But a reasonable programmer who cares enough about what they do can still end up with a brittle test suite because of other factors.

Unit tests written in a dynamic language are a major drag on refactoring. It is not so much that we test the internals of the system or get tied up with the shape of code internal to an object. Even if you follow Sandi Metz's wonderful guidelines around testing: i) do not test internals, ii) test incoming queries by asserting on their return value, iii) test incoming commands by asserting on their side effects, iv) mock outgoing commands, and v) stub outgoing queries, you end up with a brittle test suite that is hard to refactor thanks to connascence.
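
(For reference, those guidelines translate to roughly this in Python; a sketch with an invented Cart and payment gateway:)

    from unittest import mock

    class Cart:
        def __init__(self, gateway):
            self.items, self.gateway = [], gateway

        def total(self):                       # incoming query
            return sum(self.items)

        def add(self, price):                  # incoming command
            self.items.append(price)

        def checkout(self):                    # sends an outgoing command
            self.gateway.charge(self.total())

    def test_incoming_query_by_return_value():
        cart = Cart(gateway=mock.Mock())
        cart.add(3); cart.add(4)
        assert cart.total() == 7

    def test_outgoing_command_with_a_mock():
        gateway = mock.Mock()
        cart = Cart(gateway); cart.add(5); cart.checkout()
        gateway.charge.assert_called_once_with(5)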

Whenever you refactor your names, or shuffle the hierarchy of your domain elements, you are now left with a thankless chore of hunting and pecking your unit tests and making them reflect the new reality of your code. Here integration tests help you know that your system still works end to end, and unit tests simply remain that one thing that refuses to budge until you pay it its respects.

Unit testing complex views is still a hard problem. There are no well-defined stable "units" to speak of in an ever-changing HTML UI. We have snapshot tests, we try to extract simple components on which we can assert presence/absence of data, and we have integration tests that blindly railroad over everything and make sure the damn thing worked.

But in a different context unit testing is the one true answer. If your system is statically typed (say in Haskell or OCaml), and your functions are pure and compositional, you don't so much worry about mocking and stubbing. You can make simple assertions on pure functions and as the granularity of your functions increase, they end up covering more parts of your system and get closer to an integration test. Static types form another sort of guarantee, the most basic one being that it takes a class of bugs away in form of undefined data types, the system going into invalid states, and of course the clerical mistake of named connascence. We often abuse unit tests in dynamic languages to cover these scenarios, leading to huge test suites with very brittle tests.

I think it is important to call out that the value of unit tests are still contextual - "it depends" like everything in the world, and despite our best efforts, they can become hard to refactor. There is a case to be made for writing integration tests because they deliver more business value at a cheaper price than a pervasive set of unit tests when dealing with a highly effectful dynamic system. This lets us also think about other forms of testing like generative testing and snapshot testing that come to the same problem from different angles.


This is a great comment, not sure why it's at the bottom of the thread. Gets to the core values underlying the main religious beliefs about testing.


Yup, agree. Depending if you are working with a good static type system or with a dynamic language the value of unit tests can vary.

When working with dynamic languages I always end up writing a bunch of unit tests and a bunch of integration tests. I've experimented some with type hinting and static analysis in some dynamic languages but it's not the same as having the compiler make guarantees.


>Unit tests run faster, are written faster, and not only can they be fixed faster, they can be deleted and rewritten if the requirements change.

Honest question: Are most of your unit test failures due to bugs or due to refactoring (e.g. changing APIs)?

Most people I know who do unit testing have mostly the latter (easily over 80% of the time). At that point one feels like they are merely babysitting unit tests.

If unit tests have such high false positives, how useful are they?

Note I'm not saying that it's not possible to write unit tests that are relatively immune from refactors. But it is more challenging and, in my experience, quite rare to find a project that writes them this way.


I’ll be honest, things that don’t cost me mentally or emotionally don’t even register. I probably delete more unit tests than I know and it simply doesn’t ‘count’ because they’re one action and one assertion and they’re wrong so poof.

What I know is that when the dev team can’t adapt to shifting business requirements, it’s a world of pain for everyone. I try to test a lot of business logic as low in the test tree as I can, and when they change Always to Sometimes or Never to Twice, I roll with it. Because I know that Never and Always mean ‘ask me again the next time we court a big payday.’

What I remark on are the tests that get deleted when the business says they’re wrong. And those happen but the toll is easy with unit tests. You just ask and answer a different question and it’s fine.


I think many devs build applications only testing manually for a long time, and taking shortcuts (hacks) when something appears wrong. When they later want to write some unit tests, there are no proper units and that ball-of-mud-y code is hard to test. Of course, integration tests are still feasible because they're agnostic of the internal mess.

I've run into this a couple times and noticed at some point that writing unit tests while developing (not necessarily TDD) helps a lot in clarifying boundaries early on, and generally improves code quality.


We should probably have two completely different versions of this discussion for typesafe and type-risky languages, since typechecking is effectively a form of testing at both unit and integration level.


I suspect I’ll be shifting from Node to Rust or one of its contemporaries at some point in the near future. I’ve given dynamic languages a very fair chance, kept an open mind and adopted its strategies instead of writing Pascal in any language, but it has failed to impress me.

I want a statically typed language with reasonable affordances for FP, for the 20% of the code that is dead ugly when forced into object structure.


Is your decision mainly based on type safety? We converted all of our Node.js to TypeScript primarily for easier refactoring, but it still hasn't fully satisfied our desire for change. We are thinking of switching to Rust as well. We couldn't get past the Go generics argument, and would also prefer something leaning towards the functional side. Any other languages you are considering?


Why does everyone rethink a working strategy? Write lots of unit tests that are fast. Write a good amount of integration tests that are relatively fast. Write fewer system integration tests that are slower. The testing pyramid works. He even talks about it in this post, and then ignores the point of it.

You write lots of unit tests because you can run them inline pre-commit or in a component build. If your integration tests are numerous and interesting enough, that won't work. They are better suited to gating feature branch merges. System integration tests (the actual name for "end to end") take longer and usually gate progressively more stable branches upstream, or nightlies depending on where you are.


Because for certain kind of project the strategy stops working.

I work as QE on a fairly large, ~7-year-old project. Microservice architecture has been attempted. We always merge to master, which means that more or less everything is a feature-branch merge. We have too many repositories to count.

And what we learned is that most of the components we have are just too thin to allow for useful unit-test coverage. Almost everything is [GUI] --request--> [Middleware] --request--> [Proxy] --request--> [Backend] --> [Database].

In reality, [Middleware] and [Backend] probably should have been a single component, but devs wanted to do microservices, and be scalable, but they didn't really understand the bounded contexts of their services.

All of this leads us to a place, where unit-tests don't tell us much.

On the other hand, we managed to spawn [Middleware] --> [Backend] --> [Database], and we can run a useful integration test suite in ~2 minutes.

So, on one hand, if we had designed this better, the good old pyramid might be a working strategy. On the other hand, if I can get actual services running in a minute and test them end-to-end, I don't think I will bother with true unit tests on my next projects. I.e., why mock the database if I can spawn it in seconds :-)


So, if I understand it correctly, Middleware and Backend should have been a single component since it's one bounded context, and splitting it gives one of them feature envy? Is there some benefit to keeping these separate, or is the cost of change too high at this point? If it's not about features, but more about the API, have you tried the consumer-driven contract testing approach?


The reason was, you can have more instances of backend for a single middleware and that should have helped with scalability.

If we had the resources to do the refactoring, we would probably end up with two-three different backends for various contexts, and without the middle-man between the gui and the backends.

On the other hand, the cost of change is probably too high, and most probably this version of our product will be kept on minimum-resource life support.

We are looking at doing consumer-driven testing for the new set of services we are working on.


The units tell you tons. They just don't tell you the whole story. Trust me, try complex distributed systems testing in an environment where the underlying services themselves are bug-prone and issuing stack traces all over because of poor fencing/bounds checking/et al.

You may think that way now regarding mocking the database, but where you will find yourself down the line is trying to devise a functional system integration test case for a slightly esoteric condition (deadlocks, timeouts). It's nice to have the scaffolding of a unit testing framework with robust mocks for those situations.

Edit: also, you should consider a gitflow workflow (dev -> integration -> master/stable) and make feature branch off of dev to insulate your master.


>Why does everyone rethink a working strategy. Write lots of unit tests that are fast.

* Because I want to avoid writing more code than necessary.

* Because I want to avoid writing tightly coupled code.

* Because I'd rather have a test that takes 2x longer and catches 5% more bugs. Premature optimization and all that.


I would recommend not optimizing for less code. Optimize for reading less code.

Unit tests actually tend to favor highly uncoupled code, while integration tests seem to favor more coupling, with E2E favoring the most coupling. I believe this is because the higher the level of testing, the fewer public interfaces are thought about at lower levels.

As for percentages about speed and coverage, that seems like a bad trade off of 5% gain for 100% slow down. Especially because test time compounds.


>I would recomnend not optimizing for less code.

That is a terrible recommendation. Unless writing less code comes at the expense of readability or coupling you should always aim to write less code instead of more.

>Unit tests actually tend to favor highly uncoupled code while integration seem to favor more coupling with e2e favoring the most coupling.

It's the exact opposite. End to end tests do not even necessarily couple to a language, let alone specific modules. They can be used to refactor virtually the entire code base without rewriting any test code.

That isn't to say that you should only use E2E tests. IMHO wherever there is a naturally loose coupling and a clean, relatively unchanging interface - that is a good place to cover with integration tests.

The worst thing to surround with tests is a module whose API you know you will be changing (which will break the test when you do).

>As for percentages about speed and coverage, that seems like a bad trade off of 5% gain for 100% slow down. Especially because test time compounds.

No, it's an excellent trade off. CPU time is dirt cheap and bugs are very expensive.

Moreover, you can run regression test suites while you eat, sleep and visit the water cooler so the absolute time does not really matter provided it catches bugs before release.


Lots of unit testing can cause tightly coupled code, if the units are too small and/or against internal APIs: The tests are tightly coupled to a piece of code which should have been able to change freely.


I don't know what to say... you have to write good code and good unit tests. I think talking how to do that is a bit outside of the scope here, but mocks for external apis & sensible function complexity metrics are good things.


the point is that blind 100% coverage cargo-cultism is not a working strategy.


Does anyone even believe in 100% coverage?


I never said anything about it, personally.

It's nice to shoot for if you're greenfield. Line coverage != path coverage and blind adherence to line coverage metrics isn't going to guarantee anything.


> The testing pyramid works.

The testing pyramid is built around a lot of assumptions which are often not true.

For example I run our ~10,000 integration tests in under two minutes on our large enterprise codebase. In recent years it has become possible to have fast integration tests.

I've worked on other apps that take 5+ minutes just to start up and integration tests can take hours.

Applying the same testing strategy to both does not make sense.


The pyramid isn't a law, it's just a heuristic that says to have more low level tests that are faster than high level tests that are slower. The unit/integration/system integration division tends to be correct, but isn't always. It just reminds us that there is time/compute scarcity, and to maximize those resources for optimal roi. And yes, a mobile app != a PaaS platform != system software and adapt the principles sensibly to the situation.

Seriously, though, I salute you on those integration test numbers. I assume containers are involved?


I go the complete opposite way.

I've tried various testing strategies over 15~ different companies in all sorts of environments, and unit tests are the only thing that really work (IF you can convince the team to do it...and that's a big IF).

The article starts with a point I agree with: the lower in the pyramid, the cheaper the tests but the lower the confidence level they bring. That's true.

Where I disagree is how much the difference on confidence and cost are.

I can bang out 500 unit tests faster than I can do just a few E2E tests in most large apps. They require almost no trial and error, no real engineering (I feel strongly that abstraction in unit tests is bad), and all around are so easy to write that I don't mind if I have to toss out 150 of them when I make a significant refactor.

E2E tests are amazingly brittle and require a careful understanding of the whole system. They're impossibly expensive to write. They're the only thing that tells you that stuff works though. So you want at least a few of these.

Integration tests are just flat out awkward: you need understanding of a significant portion of code you did not write or touch, they often require complex fixtures (because your test will go through several code paths and might depend on a lot of arguments), they're slower (because a lot of code runs), and while you don't throw them away when changing implementation details (unless they involve side effects), you still throw them away when refactoring or changing public interfaces. I've worked with a lot of people who were very vocal about these being so much better, then in the same breath complained that they spent all day writing integration tests.

There's an exception here which is acceptance tests for libraries, especially when doing a full rewrite: the tests that tell you public interfaces used outside of the current context work (as opposed to public interface of objects used in the implementation). Eg: if I was to test lodash or react, that's how I'd do it.

Unit tests to me are about a lot more than "is this change breaking my code". And if that's all you care about, you're missing a big part of the point.

Say you have 3 units, A, B and C, where A calls B, which calls C. If you have a test for A in the context of B, a test for B in the context of C, and a test for C, and they all pass, you know that A + B + C will work. But when writing the tests, you only had to care about itty bitty tiny pieces of code, which made things super cheap.

Then you get other huge benefits: the quality of the entire code base is higher (a side effect of it having testable interfaces all across), the reasoning behind each piece of code is explicit (no one wrote a function that works but they're not sure why, or else the test would be very hard to write), and you automatically have a document representing "intentions".

And yes, if you change a module, even if it's not exposed to your customers, the public interface of that module has tests and the tests will break. But they usually take nothing but a few minutes (often a few seconds) to write. They're cheap enough to be disposable.

And once you have 80%-ish unit test coverage, you actually have a very high confidence level. I've gone through major refactorings of multi-million-line apps with almost no bugs on pure unit tests. You'd think the 20% of untested code would be a source of bugs, but statistically, that's just not how it happens.

In term of person-hour to ROI, pure unit tests just straight up win out.

The reason software engineers fight so hard against them is that they're brain-dead and repetitive to write, and engineers can't resist overengineering. "This is such a simple test for such a simple piece of code, why should I test it?!" That's the point. All unit tests should be like this.


The second group I worked with that was earnestly interested in mature testing developed the 5/8ths rule.

To move a test one level down the pyramid, it takes about 5x as many tests. But the tests run 8 times as fast. So moving a test down takes more than 35% off the run time, and it fails the build minutes sooner. If you drop it down two levels it's 60% off the run time.

Interesting enough on its own, but maintaining those tests after one requirements change, plus the cost of rewriting them in the first place, is less work than the cost of maintaining the original tests. We didn't come up with a number for this but the difference was measured in man-days and missed deadlines about once a month, and we were convinced by the evidence.

I also agree with both your 'braindead' comment and your 80% estimate. The big payoffs come between 75% and 85%, and above 85% you start getting artifacts. That 'data' distracts more than it helps.


Yup. I think one big issue is that an E2E or an integration test is useful on its own, while a single unit test is almost totally worthless. You don't have confidence in anything until at least 50% coverage (and at 80% you have almost perfect confidence).

So when people get started, especially on an old code base, they feel it's pointless and doesn't pay off. Can't blame them, I suppose.

Good that you bring up build time. I forgot to mention that. We have repos with thousands of tests where the whole suite runs in <1 minute and gives us very high confidence (actually the only other tests we run on that repo are visual regression tests for CSS, and even E2E tests don't catch those issues...). During that time I'm watching other teams waiting 20 minutes on their integration test suite. Nope nope nope.


One of the most transformative things I've come across for how to structure and test code has been Gary Bernhardt's talk on Boundaries [0]. I've watched it at least ten times. He also has an entire series on testing where he goes deeper into these ideas.

In this video, he talks of a concept called functional core, imperative shell. The functional core is the code that contains your core logic and can be easily unit tested because it just receives plain values from the outside world. The imperative shell is the outside world that talks to disks, databases, APIs, UIs, etc. and builds these values to be used in the core. I'll stop there—Gary's video will do it 100x better than I can here :)
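
The shape of it, as a hedged Python sketch (names invented): the core is plain values in, plain values out; the shell does the I/O and stays too thin to need much testing.

    # Functional core: pure decisions on plain values, trivially unit-testable.
    def new_items(fetched_ids, seen_ids):
        seen = set(seen_ids)
        return [i for i in fetched_ids if i not in seen]

    # Imperative shell: all the I/O, no logic worth unit testing.
    def poll(feed, store):
        fetched = feed.fetch_ids()             # network
        fresh = new_items(fetched, store.load_seen())
        store.save_seen(fresh)                 # disk/database
        return fresh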

[0] https://www.destroyallsoftware.com/talks/boundaries


I agree with the part that you should write tests, but I definitely disagree with the part that most of your tests should be integration tests.

As you pointed out the testing pyramid suggests that you should write more unit tests. Why? Because if you have ever tried TDD you know that unit tests make you write good (or at least acceptable) code. The reason for this is that testing bad code is hard. By writing mostly integration tests you lose one of the advantages of unit testing and you sidestep the bad code checking part.

The other reason is that unit tests are easy to write. If you have interfaces for your units of code then mocking is also easy. I recommend stubbing though, I think that if you have to use mocks it is a code smell.

Also the .gif with the man in pieces is a straw man. Just because you have to write at least 1 integration test to check whether the man has not fallen apart is not a valid reason to write mostly integration tests! You can't test your codebase reliably with them and they are also very costly to write, run and maintain!

The testing pyramid exists for a reason! It is a product of countless hours of research, testing and head scratching! You should introspect your own methods instead, and you might arrive at the conclusion that the codebase you are working on is bad and hard to unit test, and that's why you've chosen to write mostly integration tests.


Sounds good in theory. In practice there is one problem with having integration tests only. The tests are generally simple: they pass or they fail. A unit test tests just a small piece of functionality, so when it fails, it's quite easy to find the problem. When an integration test fails, we can spend hours debugging the whole stack of layers trying to find the real problem.

I had this situation once. Every failing integration test ended with hours spent on writing unit tests for all the places used by the test.


From my experience, an integration test failure that requires significant effort to investigate can only be covered with unit tests after one knows where the problem comes from. One cannot realistically write a bunch of unit tests and expect them to cover the problem unless one already knows about the problem.


It's called shotgun unit testing.


For me, one of the biggest issues with integration tests is the code coverage numbers mean nearly nothing. I've seen an "integration only" tester proudly display his single test with 90% coverage. I asked him to run it again and it was 2% because a condition changed.

So this means that for all the branches your code can take, an integration test is taking one specific one at each point all the way through for that test. All other branches, through the entire call stack, are unverified.


Is there solid evidence to back up some of the assertions that have been made about testing?

It feels like an area where lots of people have opinions, and there are not much in the way of facts.


There are very serious books about software quality with actual data, but it's much easier to tell each other anecdotal experiences on the internet - in a weird mix of bragging and strawman arguments. That's how our field is stagnating.


Microsoft put out a study of teams that had done TDD or at least extensive unit testing. I don't recall the numbers, but development time took longer and there were a lot fewer bugs. Which is what I would have expected.


I think the answer is that it heavily depends on what you are doing. If you are creating a library that operates on a protocol, unit tests are necessary / extremely important.

If you are writing an ERP where a lot of your code NEEDS to operate WITH the database, you are better off with integration tests, because mocking away the database would lead to so many bugs, especially if your database is extremely important (and not just a dumb datastore).

Edit: having any tests is always better than having none.


The puffing-billy [1] library is awesome, and has changed the way I write integration tests. I also use VCR [2], and now my entire application (both backend and front-end) is wrapped with a proxy that records and replays every request. I can run all my tests once using test Stripe API keys, a test Recaptcha response, or any other external services that I want to test. I don't have to mock anything, which is nice. Then everything is recorded, and I can run all my integration tests offline.
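
(Aside for anyone doing the same thing in Python: the vcrpy port works on the same record-and-replay principle. A rough sketch, using httpbin as a stand-in for a real external service:)

    import requests
    import vcr

    # The first run hits the real service and records the HTTP exchange to a
    # cassette file; subsequent runs replay it, so the test works offline.
    @vcr.use_cassette("fixtures/example_get.yaml")
    def test_example_get():
        resp = requests.get("https://httpbin.org/get")
        assert resp.status_code == 200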

I've also really enjoyed using stripe-ruby-mock [3] when testing specific webhooks, jobs, and controller actions. I don't always aim for 100% test coverage, but I try to write a LOT of tests for any code that deals with billing and subscriptions.

Ooh, I've also been enjoying rswag [4]. It's quite a cool idea - You write rspec tests for your API endpoints, and the tests also serve as a Swagger definition for your API. So when your tests pass, you can use the output to generate documentation or API clients for any language.

[1] https://github.com/oesmith/puffing-billy

[2] https://github.com/vcr/vcr

[3] https://github.com/rebelidealist/stripe-ruby-mock

[4] https://github.com/domaindrivendev/rswag


I think the testing pyramid reflects a false correlation — it seems to assert that higher up the pyramid tests are more expensive to write/maintain and longer to run.

In reality the execution time of a test says nothing about how hard the test is to write. Sometimes a very fast to execute unit test can be much harder to write/maintain than a longer running test that avoids mocking an api and perhaps utilizes abstractions in the test definition that are already written to support the program’s features.

I think test suite execution speed is the real metric to focus on for most projects — to get the most value, test suites should accelerate the time to useful feedback. Write tests in the simplest way that provides useful feedback into the behavior of the system and runs quickly enough that you can receive that feedback with low latency during development.

I quite like tools like jest and wallabyjs that use code coverage data to figure out which tests to rerun as code changes — it means you can have a test suite that includes slow(ish) tests but still get feedback in reasonable time as you make changes to the code.


> to get the most value, test suites should accelerate the time to useful feedback

Well, they should also optimise the usefulness of the feedback they provide. Typically, tests higher up the pyramid are also more brittle (e.g. end-to-end tests might fire up an entire browser and Selenium), and thus are more likely to fail when in actuality, nothing is wrong. That's an additional reason for limiting the number of those tests.


Brittle tests don't seem useful in general though, do they?

I'm not sure it's necessarily true that brittleness must correlate with height in the pyramid or execution time -- in my experience brittleness correlates with Selenium more than it does with pyramid height (that's a statement about Selenium more than it is a statement about any particular category of the testing pyramid).

It's possible to write very useful non-brittle tests using something like headless Chrome ...


No they're not.

But yes, Selenium is brittle. That said, Google engineers actually did some investigation into this [1], and although I think their methods were probably a bit heavyweight, they did conclude that it's mostly RAM use that leads to brittleness.

[1] https://testing.googleblog.com/2017/04/where-do-our-flaky-te...


Interesting thanks for the link!

I’m curious how many tests were in the small size range for that chart, since that is what provides the evidence that the size-flakiness correlation holds in tests that use tools associated with higher-than-average flakiness...

I’m also feeling like I want more clarity around the mechanism for measuring flakiness — the definition they use is that a test is flaky if it shows both failing and successful runs with the “same code” — does “same code” refer to a freeze of only the codebase under test, or is it also a statement about changes to the tools in the testing environment ...?

I wonder what the test suites for tools like selenium/WebDriver look like ... do they track a concept of “meta-flakiness” to try and observe changes to test flakiness results caused by changes to the test tooling ...?


Yeah, good questions, the post leaves some to be desired. And meta-flakiness tooling actually sounds like it could be really useful!


Integration testing is especially important when talking to a database. People seem to like mocking the data, but that completely misses the subtleties of how databases actually work, including aspects such as concurrency, transaction isolation levels, locking and such.

There should be a few well crafted tests that modify the database from several parallel threads, and afterwards verify that no invariants have been broken. This is pretty much the only opportunity to catch a race condition in a controlled environment.
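
A sketch of what such a test can look like (SQLite here only to keep the example self-contained; in practice you'd run it against the same engine and isolation level you use in production):

    import sqlite3
    import threading

    DB = "invariant_test.db"

    def setup():
        con = sqlite3.connect(DB)
        con.execute("DROP TABLE IF EXISTS accounts")
        con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
        con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 500)])
        con.commit()
        con.close()

    def transfer(amount):
        # Each thread uses its own connection and an explicit write transaction.
        con = sqlite3.connect(DB, timeout=30, isolation_level=None)
        con.execute("BEGIN IMMEDIATE")
        con.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
        con.execute("UPDATE accounts SET balance = balance + ? WHERE id = 2", (amount,))
        con.execute("COMMIT")
        con.close()

    def test_concurrent_transfers_preserve_total():
        setup()
        threads = [threading.Thread(target=transfer, args=(1,)) for _ in range(50)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        total = sqlite3.connect(DB).execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
        assert total == 1000  # the invariant: money is moved, never created or destroyed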


This is the part I don't get when people preach the gospel of ultra-isolated test suites where nothing talks to an external system. The most nontrivial parts of your code, the most likely to break, are those that talk to external systems.

Building a mock of that system sophisticated enough to capture even a significant fraction of the "real" failure modes is just as error-prone and daunting a task, if not more so.

Totally isolated tests make it impossible to test the most failure-prone parts of your code in anything approaching a satisfactory manner. Maybe I'm just misunderstanding people's advice but it drives me crazy when I see that.


A lot of the ultra-isolated test advocates seem to come from the kind of development world where you can exercise most of the value without needing to interact with external dependencies. With aggressive enough mocking, you can deceive yourself into thinking that any application is in that domain.

I know I have done it, and the application worked PERFECTLY until I hooked it up to a real database.


As always it depends. There are projects where integration is the hard part and others where the business logic is impossible to get right without unit tests.

Unit tests also have the added benefit that they tell you exactly the reason they failed (because there can only be one) whereas integration tests can fail for multiple reasons.


Yeah, but usually an integration test tells you what line it failed on, which gets you fairly close to knowing the exact reason it failed.


Just reading the comments, I agree with the common sentiment that you should have more unit tests than integration tests, but I have come around to the way of thinking that if I only have time to write a few tests then I would rather write E2E tests. This way, at the very least your entire stack is being exercised, and you have a way of ensuring that the happy path is passing consistently, which is the most important flow for an application (even if I'm personally more interested in keeping other flows sane). While I prefer unit tests due to their simplicity, speed and the speed at which they can aid debugging, these days I will only implement them after I have added a few E2E tests.


Behavior driven development (BDD) and test driven development (TDD) enable much faster coding when done well in my experience, and that includes full coverage fast unit tests, functional tests, benchmark tests, and integration tests.

Unit tests are worth their weight in gold for quickly finding issues, both in our team's code and especially in cases of subtle changes among language releases, or unanticipated input changes, or dependency changes that are supposed to work but don't.

IMHO unit tests lead to better functional approaches, better long range maintainability, better security, and much better handling of corner cases.


I think the main selling point for me on TDD was breaking the boring build loop where I would have to wait for the project to compile, the webpage to load, click on all the buttons to get to the bit I'm testing, and then finally see if my code worked. Then repeat as many times as it fails.

With TDD testing functionality is always just one button click away - I actually have fun again at work.


For large code bases in dynamic languages I think the advantage grows larger over time. Even a modest amount of test coverage will allow you to build new features without breaking existing features.

It's a lot faster to catch bugs when the tests run in your local development environment or in the CI pipeline than to wait until Q/A (hopefully) catches them in end-to-end integration testing and sends it back to you for rework.

The other advantage I've noticed is that the tests can serve as an additional form of documentation. New team members can look through the tests to see how code is supposed to work from the examples there. For some web API projects I've even been able to generate the documentation from the test cases.


> enable much faster coding

Faster than what?


I disagree with this article.

Software projects can be extremely large and have wildly different requirements. Software that needs to operate at high scale and with high reliability will have different requirements from, say, a web application with low traffic.

I think making rules of thumb like the title of this article defeats the purpose of one of the essential tasks of being an engineer - making good tradeoffs between different approaches to solving problems. It's not hard to see that some projects will require more unit testing, some may require more integration testing and others may require more of both.


The problem with this kind of advice is that projects are all very different - there is no single way to test them all.


This is the only true statement that can be made about this whole integration-/ unit testing debate but it will never be popular because it doesn't seem to be simple enough.

Most people like black and white guidelines. 'Unit testing is good, integration testing is bad', something along those lines. Simple to remember, simple to apply. Unfortunately it doesn't match reality.

The truth is, every testing strategy is a tradeoff. Integration tests are great at catching regressions and verifying business requirements, but they can take a lot of time to construct, tend to be slow and don't give you many clues about where to find the error when they fail.

Unit tests are fast, easy to build and when they are well written they will point you right to the point of your code that is wrong. However, the more specific they are, the more they tend to test your implementation choices instead of real business requirements.

But even the above is not always right. When you are building a library, you might be able to create unit tests that test your api and then you have the best of both worlds, your unit tests do actually verify your business requirements. On the other hand, it might be that you are building an application which doesn't depend on a lot of data and which can be orchestrated pretty well in a test set up, and therefore integration tests are actually easy to build; in that case you'd like to focus more on integration tests.

Tl;Dr; there is no silver bullet, in the end the only right thing to do is let your test strategy depend on the characteristics of the project.


Absolutely - this is the key. There is a huge difference in the software for a game engine vs a website vs an MIS report vs a trading system vs an autopilot. You can't blanket statement which testing technique is better for all software.


Why are most people using huge hero-graphics (495kb?!) for short blog postings nowadays?

On topic: testing gets worse if the codebase that needs to be tested is garbage. In my experience, developers who are learning testing don't need to learn how testing works - they need to learn basic development rules: components, loose coupling, dependency injection, ...


Integration tests are very important and in many cases mandatory. I'd argue, though, that unit tests with good coverage are more important. The latter help you make sure your application produces the correct output; nothing is worse than having a piece of software you think is working while it is silently introducing errors. The former (integration tests) make sure your application starts and probably works in some scenarios.

With docker and containers, integration testing is easier than ever.

I've written a very simple integration testing tool (a quick hack, to be honest) that executes commands, checks their exit codes, matches their output against regexps, etc., and produces a nice HTML report.

https://github.com/landoop/coyote
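
The core of the idea, stripped of coyote's own configuration format, fits in a few lines; roughly:

    import re
    import subprocess

    def check(cmd, expect_exit=0, stdout_re=None):
        """Run a command, verify its exit code and optionally its output."""
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        assert result.returncode == expect_exit, result.stderr
        if stdout_re is not None:
            assert re.search(stdout_re, result.stdout), result.stdout

    check("echo hello world", stdout_re=r"hello\s+world")
    check("false", expect_exit=1)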

Originally written for testing some configuration tools and scripts, we quickly found a use for it for our Kafka connector collection (Stream Reactor). Our connector integration tests are at github as well:

https://github.com/Landoop/kafka-connectors-tests

Obviously unit tests make sure that the connectors work as expected. Integration tests catch simpler errors that can be showstoppers: for example, an internally renamed configuration option that didn't trickle down to the configuration parser, class shadowing issues between the connectors, unexpected errors in logs that the developers should check out, etc. As the tests are written by people who don't do development, they also expose problems in the documentation. In some cases they give us an easy way to quickly run a connector locally and catch some issues manually, like excessive CPU usage or way too much logging (e.g. a function with a log statement inside a loop that ran hundreds or thousands of times per second).

It gives us confidence in what we ship with every release.


I am currently in a project where we do not use TDD, even remotely, but we do have almost 100% coverage, since we need it for certification purposes.

These tests are fairly easy to write, actually, with a test framework that helps mock everything automatically.

Then we have some internal tools written that make these tests worth a lot less: we produce a lot of the test data and then just record the outputs as "correct". We have no real idea whether they are correct, but at least the tests as they are now work as regression tests. This is what happens when you realize you need unit tests for tens of thousands of lines of code in a few weeks' time, instead of writing the tests when you write the code.
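
That record-the-output approach is essentially a golden-file (snapshot) test; a minimal sketch of the pattern (not our actual tooling):

    import json
    import pathlib

    def golden_check(name, actual, record=False):
        """Compare a result against a previously recorded 'golden' output.

        The recording run can't prove the output is correct; it only pins
        down the current behaviour so that later changes are flagged."""
        path = pathlib.Path("golden") / f"{name}.json"
        if record or not path.exists():
            path.parent.mkdir(exist_ok=True)
            path.write_text(json.dumps(actual, indent=2, sort_keys=True))
            return
        assert json.loads(path.read_text()) == actual

    def build_report(values):          # stand-in for the real code under test
        return {"count": len(values), "total": sum(values)}

    def test_report_is_stable():
        golden_check("summary_report", build_report([1, 2, 3]))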

There is of course testing at higher levels for this industrial system, multiple levels even, and I have to say that the unit tests are fairly useless in comparison.


Why did I just have to scroll three pages to get past a pineapple? Nobody wins. It takes extra effort to add. There is no ad revenue. The reader gets annoyed. WHY?!?!?!


On a hobby project recently I've only written backend integration tests. I found these to be extremely useful, though, since they touch, directly or indirectly, all the critical parts of the software, so if something's broken it will eventually be caught there. Also, since they're relatively high level, there's rarely a need to change them whenever I change something in the backend. All in all it's definitely a time saver, both in terms of catching bugs and maintaining the tests.

I think there's still a value in detailed unit tests but mostly for library code, when you want to test each function properly with various inputs.


I agree very much with this. I'd add one thing: adding tests and testing your code are not the same thing. You should write tests, mostly integration tests, against the public, well-defined boundaries of your class, your component, your service. Mock only I/O if needed, but not other classes. But also test the rest, just without adding automated tests for it: run the code, try out the private functions, make sure they work.

And also go read Testivus: https://www.artima.com/weblogs/viewpost.jsp?thread=204677


I trust my own sense of when unit-tests and integration tests are appropriate.

If I'm working on a bit of code that has actual internal logic, then I'm happy to write unit tests so I can iterate on the test with scenarios while I code.

OTOH, if I'm writing a glorified CRUD pass-through (api -> model -> db-entity -> do a thing -> return result), there is nothing to unit test. Writing a few integration tests that call the API gives me much more confidence that everything is hooked up correctly than mocking a bunch of shit and asserting that a method was called that was obviously called.
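
A sketch of the kind of test I mean, using (for example) Flask's built-in test client and hypothetical endpoints; the test exercises routing, serialization and the handler instead of asserting that a method got called:

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    ITEMS = {}  # in-memory store to keep the sketch self-contained; in reality this is the DB layer

    @app.route("/items", methods=["POST"])
    def create_item():
        data = request.get_json()
        ITEMS[data["id"]] = data
        return jsonify(data), 201

    @app.route("/items/<item_id>", methods=["GET"])
    def get_item(item_id):
        return jsonify(ITEMS[item_id])

    def test_create_then_fetch():
        client = app.test_client()
        assert client.post("/items", json={"id": "1", "name": "widget"}).status_code == 201
        assert client.get("/items/1").get_json()["name"] == "widget"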


The one thing that I didn't see mentioned is the fact that you can test very specific branches of code with higher-level testing, but the same cannot be done in reverse. In fact, higher-level tests, especially tests driven with real-world data, routinely execute branches of code in ways that are unforeseen by the developer and unlikely to have been tested at a lower level. I suspect that fuzz testing will some day remove much of the need for lower-level testing.


I started writing a response to this ... but it got a bit too long and became my first ever Medium post: https://medium.com/@danrspen/write-tests-a-sensible-amount-o...


I don't practice TDD (see Norvig, Jeffers, and Sudoku) but I do like unit tests. I also like integration, e2e, property, mutation, and fuzz tests. I like proving specs with TLA+. Sometimes type theory proofs are useful. There's a lot of ways to improve quality. Some of those I've only been able to toy with personally, though, because the dirty secret of software engineering is that for most things we don't need very high levels, we just need most customers to not be mad. So that leads to articles like this, where people claim their experience showed the most bang for buck with a certain approach, despite not always even trying other approaches...

> One thing that it doesn’t show though is that as you move up the pyramid, the confidence quotient of each form of testing increases. You get more bang for your buck. So while E2E tests may be slower and more expensive than unit tests, they bring you much more confidence that your application is working as intended.

In my experience this is wrong. Working with an inverse pyramid, you'd think confidence would be high, bugs few. It's the exact opposite.

> just stop mocking so much stuff

Hear, hear. But this gets into subtle arguments over "is this really a unit test if it's using RealFoo even though we're trying to test Bar, such that if RealFoo breaks this test will also break but not because of Bar?" I'm not too strict on my definition of unit test; my best attempt is something like "relatively small, isolated from the broader module/library/application, runs fast, easy to find root failure when test fails, avoids testing the implementation rather than behavior (often hard), and asserts something." It leaves open the possibility for technical 'integrations' but there's a pretty big space of possibilities between a minor integration to avoid mostly useless mocking and suddenly requiring the whole application server to have started up before you can do anything.


Reading all the comments reminds me why I really don't like working in large enterprise teams. People argue a lot about semantics ("what is a unit test?") and the only right way to do things but there is almost nothing practical to learn from.

I have found that in some projects unit tests are really easy to write and helpful but in others you spend more time on writing mocks and dependency injection than writing stable code.

I don't even know what I want to say exactly other than that we should focus more on practical solutions to real problems and less on debating semantics.


I feel like there's definitely an art to mocking. Rather than mocking out the module being called (A calls B, A_test mocks B), you want to mock out its dependencies (A calls B calls C, A_test mocks C). But if you swap out said dependency, now you have to go update other modules' integration tests to mock out the new service.
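
A small sketch of that shape with hypothetical names: A and B run for real in A's test; only the outermost dependency C (the thing doing I/O) is replaced.

    from unittest.mock import Mock

    # B: real code that talks to C (an HTTP client, DB driver, ...).
    def fetch_price(client, sku):
        return client.get(f"/prices/{sku}")["amount"]

    # A: real code that uses B.
    def order_total(client, skus):
        return sum(fetch_price(client, sku) for sku in skus)

    def test_order_total():
        client = Mock()                       # mock C only
        client.get.return_value = {"amount": 5}
        # A and B both execute for real; the test still runs fast and offline.
        assert order_total(client, ["a", "b"]) == 10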


I think this is poor advice because it doesn't give context.

I think unit tests are good for testing logic, and integration tests are good for testing functionality. Lots of complex logic? Then write lots of unit tests. Got a service that just wraps a database? Then you're going to want to write a lot of integration tests.


Does anyone have any good resources on writing front-end integration tests? I've had nothing but terrible experiences with selenium. And I recently tried Nightmare (which is Electron-based), but it wasn't much better.


I too have had selenium nightmares but have landed someplace pretty good. Here are some quick tips. YMMV.

Don’t think “integration” — think “full stack”. These will find configuration and connectivity bugs more than business logic bugs. They can’t be your only tests.

They need to run as part of your CD/CI pipeline automatically, otherwise, they won’t get run and will decay from disuse.

Headless browsers (HTMLUnit and PhantomJS) are easier to work with than “real” browsers. Haven’t used Chrome headless yet.

Front end bugs are often fiddly and visual. Screenshots + human review can be a cost effective supplement to manual testing, but can never replace it.

Good error logging and reporting is also key. If your front end tests break something, having the backend tell you what broke will save you time.

I tend to keep these “full stack” tests to happy path scenarios, as they are slower to write and to run than lower level integration tests.

Good luck.


I have done a little with NightmareJS and really liked it. A bit of syntactic sugar over Mocha with the option to drop down when required. Very succinct, I thought, and easy to start with. I, too, prefer to avoid Selenium, though.


Unit tests force you to think about good design. Integration tests don't. If you only use integration tests, you'll end up with a big ball of mud.

Certainly I wouldn't want to touch a codebase by the author of this post.


This is exactly the wrong way round. Only by thinking about interfaces can you avoid a big ball of mud, and interfaces are what integration testing tests.


Clean architecture: https://smile.amazon.com/Clean-Architecture-Craftsmans-Softw...

Anything that touches the real world should be as small as possible.

I've been writing code using tests for about 4+ years and I now can't think of writing code any other way.

I would be scared of refactoring. Also I'm testing the code anyway - why not write it down so it gets done every time?

Run integration tests too for sanity - they're testing the integration of things.

To me: coding without tests is like going caving without a flash light. You don't really know what your code does until you run tests. Your confidence rises when more code is covered.

No it isn't perfect - but not writing tests is not better.


The problem with always relying on testing is that you lose your ability to create without the test crutch. Everything is red/green and your brain can get lazy. Balance is best.


What is science without tests? I say coding is similar.


> You should very rarely have to change tests when you refactor code.

I'm going to say this is poor, if not outright dangerous, advice. I'd argue the opposite: every time the code changes, it would be great if a test somewhere broke.

Hell, if the tests originally were for a gigantic class with a ton of mocks but the code changed so there was lower coupling and the dependencies were injectable or separated in different modules, the tests should be changed.

> even a strongly typed language should have tests

Indeed, yet I've seen so many people in the strong typing "camp" mentioning "the compiler catches some bugs right away!" as an advantage, which always makes me think "... and?"

> It doesn’t matter if your button component calls the onClick handler if that handler doesn't make the right request with the right data!

... But there should be another unit test for that handler, checking that it's using the right request with the right data.

In general, I agree with some points and disagree on others.


> every time the code changes, it would be great if a test somewhere broke

So, it doesn't matter if your code is correct or not?

> "the compiler catches some bugs right away!" as an advantage, which always makes me think "... and?"

And that leaves more time to design things right and test the stuff that actually matters.


> So, it doesn't matter if your code is correct or not?

I'm going to need you to walk me so I can see how that part you quoted implies that the code being correct doesn't matter. I honestly don't see the correlation.

> And that leaves more time to design things right and test the stuff that actually matters.

I'm not gonna get into a religious flamewar; if you prefer static typing, all the power to you and I'm not interested in convincing you otherwise.

However, I fail to see how having to write "int", "str", etc before or after a variable name impacts the design process in any way.

As for testing:

- A test in Go:

    func TestAvg(t *testing.T) {
    	for _, tt := range []struct {
    		Nos    []int
    		Result int
    	}{
    		{Nos: []int{2, 4}, Result: 3},
    		{Nos: []int{1, 2, 5}, Result: 2},
    		{Nos: []int{1}, Result: 1},
    		{Nos: []int{}, Result: 0},
    		{Nos: []int{2, -2}, Result: 0},
    	} {
    		if avg := Average(tt.Nos...); avg != tt.Result {
    			t.Fatalf("expected average of %v to be %d, got %d\n", tt.Nos, tt.Result, avg)
    		}
    	}
    }
- The exact same test in python:

    def test_average():
        for param in [
            {'nos': (2, 4), 'res': 3},
            {'nos': (1, 2, 5), 'res': 2},
            {'nos': (1,), 'res': 1},
            {'nos': (), 'res': 0},
            {'nos': (2, -2), 'res': 0},
        ]:
            assert average(*param['nos']) == param['res']
In a real situation, tests like this one need to be written and the fact that in Go we're specifying types doesn't change a thing, so I'm not convinced that not writing the types means that in a dynamic language I have to test stuff that doesn't matter. For that matter, I'm having a hard time imagining a scenario where the type wouldn't be tested anyway so back to my "... and?"

But, again, I'm also not trying to convince you otherwise. Different approaches for different folks.


> I'm going to need you to walk me so I can see how that part you quoted implies that the code being correct doesn't matter.

Well, that quote was your entire sentence. It wasn't out of context. So:

> every time the code changes, it would be great if a test somewhere broke

There isn't anything anywhere about the code being incorrect. Thus it is irrelevant.

About the types, their entire point is that they save you from writing the tests. What the compiler proves, you do not test. If your compiler won't prove anything, you'll have to write all the tests.

As an example, you forgot to test if `'nos': ("", None)` yields the correct error.


> As an example, you forgot to test if `'nos': ("", None)` yields the correct error.

It isn't in the Go code either.

> There isn't anything anywhere about the code being incorrect. Thus it is irrelevant.

Why, exactly? I'm not following your inference there; again, walk me through that one.

Let me be more explicit: what exactly did you think I meant with that line? Again, I don't see how it implies that "it doesn't matter whether the code is correct".

> About the types, their entire point is that they save you from writing the tests.

Ok, this is getting circular so I'll just ask you to give me an example of a test that would be absolutely required in a dynamic language but not in a static one.

Mind you, that isn't going to convince me one way or another in the "static vs dynamic" flamewar since I subscribe to the idea that more tests is better. I'm asking mostly out of curiosity.


The original tweet is a takeoff on the Michael Pollan maxim right? "Eat food. Not too much. Mostly plants"


This is my version: Don't stop writing tests. Not until 100% coverage. Mostly unit.

Especially if you are developing a library. An untested code branch is a ticking time bomb, in my book.

An application for the end users indeed does benefit from integration tests, a lot. The problem is running them efficiently. If they take an hour to run, nobody will care to analyse them.


Analyse? Shouldn't tests just pass?


After a point they just fix them to make them pass. Integration tests are really easy to cheat when the people writing them and using them are both really demotivated.



Testing.

My current theory for why there is so much confusion about testing is that developers are not often taught the difference between specification (theorems) and implementation (proofs), and why you want to separate the two.

It seems like business and investors want us to write code, more of it, and faster. Value, value, value!

So my question is: what is valuable?

Do you value your customers' data? Do you value their time? Their safety? Your brand and reputation? If you answered yes to any of these (and the other questions I may have forgotten or elided), then you should be encouraging your developers to write specifications.

One such specification, and a weak one that developers can write and maintain on their own without involving stakeholders, is the unit test. It's a weak form of specification for the library/module/component you would like to have because it specifies properties and behaviors by example. The spec gives an example of use and the expected outcomes. A good test is a verbose Hoare triple: given some context, when this method is called, then this result is expected.

Your implementation of that specification is what you're after and it's the reason why you should write the tests first. Writing tests first has little to do with productivity or design. You should write them first because your implementation should prove the theorems in your specification. Theory first. Proof after.

Sometimes you have to revise your theories after attempting the proof.. but that's a story for another day.

But we can write better specifications! Unit tests are weak because each test only demonstrates a single expectation; it doesn't quantify over the space of possible inputs. If you want to write a better specification for your parser or transformation, try property-based testing. Use a library like QuickCheck. You give it a theorem that quantifies over the input space of your function under test and it will find out for you whether your proof holds (limited only by how many examples you want to try... say 10000). It's not a proof that your implementation is correct, but it's a much stronger guarantee than a suite of unit tests and it doesn't cost you that much more to use it.
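
In Python the analogue is Hypothesis; a property for an average function like the one being example-tested elsewhere in this thread might look like this (sketch; the real signature may differ):

    from hypothesis import given, strategies as st

    def average(nums):                       # implementation under test
        return sum(nums) // len(nums) if nums else 0

    # A theorem quantified over the input space rather than a handful of
    # hand-picked examples: the average of a non-empty list always lies
    # between its minimum and its maximum.
    @given(st.lists(st.integers(), min_size=1))
    def test_average_is_bounded(nums):
        assert min(nums) <= average(nums) <= max(nums)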

Integration tests, though. I'm not sure I agree with the advice. You should definitely write them, but they can become a time sink and cost quite a bit to run if you need to test a non-trivial system. Where they fall short is at the level of quantification, and this is where the real errors lie in software systems with many components. Integration tests, like unit tests, are proof by example. You write a specification for how a given configuration of components should interact, you supply a context, run the test and see if your assertions hold. It's wise to be aware that they won't catch most safety errors and will never catch liveness errors.

Safety errors having to do with correctness of expected values over the lifetime of a computation.

Liveness having to do with maintaining invariants over the lifetime of a computation.

So integration tests are useful, do write them, but I wouldn't advise spending most of your time on them if you're dealing with more than 2 or 3 components.

Once you have messaging and co-ordination you're going to want a stronger specification and that would probably look more like a theorem written in a language that can be verified by something called a model checker. Something like TLA+ is making good progress breaking into industry.

... this is getting long. To summarize: developers should be taught and given time to write specifications. Most errors in software arise from poor, incorrect, or missing specifications. Weak specifications are better than none. Think of tests as specifications, write them first, and prove your software meets those specifications. Then you can change your implementation as you refine your specification and write better, faster, more reliable software.


> Liveness having to do with maintaining invariants over the lifetime of a computation.

Nit: liveness is about reaching 'good' states over the lifetime of your computation. If your invariant is violated, even if it's a multistate invariant, it's still a safety error. Liveness would be something like "x is eventually true", or "the program always terminates."

It's not just specifications we need. We also need better tests. "Better" here doesn't mean "integration" or "acceptance", it means things like "fuzzing through contracts" or "comparing snapshots" or "rules-based state machines". Testing is vast and we're not very good at it.


> I’ve heard managers and teams mandating 100% code coverage for applications. That’s a really bad idea. The problem is that you get diminishing returns on our tests as the coverage increases much beyond 70%...

I call bullshit.

I work on V8, on JITs and WebAssembly. 70% coverage for these code bases would be absurdly low. We would never ship code that is that poorly tested, and you shouldn't either.

> You may also find yourself testing implementation details just so you can make sure you get that one line of code that’s hard to reproduce in a test environment. You really want to avoid testing implementation details because it doesn’t give you very much confidence that your application is working and it slows you down when refactoring. You should very rarely have to change tests when you refactor code.

What in the. serious. fuck. Of course tests test implementation details. Because _implementation details_ are where the goddamn bugs are.

> ... Maintaining tests like this actually really slow you and your team down.

That's the whole _point_. It slows you down in the short term but it keeps you from experiencing a full-on system meltdown when everything seems to be breaking at once.

Please don't follow the advice of this. It's total crap.

If you've never worked on a system that has survived more than 3 years, sure, go right ahead, run against the wall. But when you work on a system that survives 5, 10 (V8), or 20 years (HotSpot JVM), then you really, really want to have good tests.


> What in the. serious. fuck. Of course tests test implementation details. Because _implementation details_ are where the goddamn bugs are.

Testing is about testing inputs and outputs, not implementation details. Bad testing checks whether you called this function or accessed this data. A good test checks that for a given input you get the expected output.
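
For a concrete (if contrived) illustration of the difference, with hypothetical names:

    from unittest.mock import MagicMock

    def apply_discount(order, pricing):
        return order["total"] - pricing.discount_for(order["customer"])

    # Implementation-detail test: only verifies that a call was made.
    # It passes even though the computed result is never checked.
    def test_calls_pricing_service():
        pricing = MagicMock()
        apply_discount({"total": 100, "customer": "acme"}, pricing)
        pricing.discount_for.assert_called_once_with("acme")

    # Behavioural test: for this input, expect this output.
    def test_discount_is_subtracted():
        pricing = MagicMock()
        pricing.discount_for.return_value = 20
        assert apply_discount({"total": 100, "customer": "acme"}, pricing) == 80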

> Please don't follow the advice of this. It's total crap.

The general tone of your comment is non-constructive, lacks accuracy and appeals more to your status as "working in V8" than to real reasoning. It is a shame that it is so high in this thread. :(

You may have some good points, but just by reading your comment I can't get them.


If you have Google money and work on compilers, surely tests are more important than for say, a B2C app with lots of UI.


So many developers are so myopic that they can’t imagine a scenario different enough from their lived experience that people would want to do things differently.

When people say “in the real world” they really mean, “in my insular world”


I totally agree with you on this one! The tone is absolutely immature and frankly disgusting. I would never hire a person with this kind of attitude, frankly because it is the ultimate creativity-killer in a team.

This type of attitude creates a culture where everyone is "too scared to learn" by being too scared to comment on things, or even to ask a simple question.

So sad to see this...


Sorry for the harsh tone--probably an overreaction--but I think creativity is not the primary variable to optimize for in software development. I recognize that this opinion is a product of where I've gotten stuck in the software stack.

However the OP's advice was basically "don't write tests because they slow you down." They even said that 100% code coverage was "a really bad idea". The OP's attitude was flippant, dangerous, and IMO, stupid. Don't follow this advice.


We're all human and overreact, so no worries. I agree creativity may not always be primary, but it definitely needs space to co-exist with the more logical perspectives. After all, at some point in time it took creativity to come up with solutions like TDD in the first place.

Having said that, to get back on topic: I agree with you regarding the code coverage. However, the quality of tests should be raised, as I would rather be in a position where people are aware that stuff can break and are on their toes, than have a false sense of confidence with 100% code coverage from poorly written tests (for example, tests so complex they need testing of their own).

And sometimes it is more important to get an MVP out there and collect user feedback than to write the actual tests.

My point: it's a dynamic world, and it doesn't hurt to listen to each other more!


The author of the post agrees with you that library code is different:

> I should mention that almost all of my open source projects have 100% code coverage. This is because most of my open source projects are smaller libraries and tools that are reusable in many different situations (a breakage could lead to a serious problem in a lot of consuming projects) and they’re relatively easy to get 100% code coverage on anyway.


The thing is, everything that survives becomes a library, as someone will always find a way to try to reuse it.


Let them write the comprehensive test suite, then. Your reasoning is a classic context for YAGNI.


> I work on V8, on JITs and WebAssembly. 70% coverage for these code bases would be absurdly low. We would never ship code that is that poorly tested, and you shouldn't either.

Tell that to a gamedev company.


As a counter, in this thread [1] a Rare employee says they follow TDD and it's revolutionised their development process.

[1] https://news.ycombinator.com/item?id=14802333


A CRUD app being written for the nth time does not require the same coverage as something as complex as the JVM.


No offence, but you work in a bubble of sorts. In enterprise we are absolutely expected to run against a wall - preferably fast.


The bubble happens to be at the bottom of everything everyone runs. If we--or kernel folks for that matter--applied the advice of the article to our development practices, our system meltdown would be your system meltdown.

Sure, you have requirements from management. So do architects and engineers for building bridges. Yet they still have a duty to build bridges that don't fall down.


OK, so don't then. You are still in a bubble, important or otherwise. You can't expect me to throw out the service manual for my old Volvo because NASA wouldn't build a space probe to those standards.


Are you suggesting that the Linux kernel is unit tested? Last I checked, I couldn't find them. And discussions online say the same (e.g., https://news.ycombinator.com/item?id=9543336 and https://news.ycombinator.com/item?id=9544306). Kernel bugs tend to show up in userspace.

Fortunately, integration-oriented projects have arisen more recently such as https://kernelci.org/ and https://github.com/os-autoinst/openQA/

Anyhow, it would make me a bit sad if the v8 team is writing unit tests tightly coupled to the implementation. I've messed around with the codebase and I didn't see tests like that - could you point to some?



The ones I glanced at appear to test output without much mocking, which isn't as tightly coupled as unit tests commonly end up.


Running against the wall in the enterprise is what management expect but in my experience it’s down to developers “holding their nerve” to use the proper process (unit tests, PRs etc).

It’s not even necessarily experience, because I see the experienced devs do this all the time. But your good developer will know that when the feature is so important that it needs “rushing” - aka straight to prod, skipping QA, etc. - then that’s exactly the time to stand firm and write those unit tests instead of throwing crap over the wall and making it next week’s defect.


Enterprise software being known for its code quality.


I find a lot of enterprises follow TDD.


Oh yeah, some are extremely vulnerable to predatory TDD evangelists. You won't find many functional programming ones though.


"Eat food. Not too much. Mostly plants."

-Michael Pollan


I knew the title sounded familiar. Thanks for reminding me of where it came from!


So much for "deep" & "profound".


What? Does it take away from it somehow that it's a twist on someone else's famous phrase? I immediately recognized the reference despite hardly being a Pollan-head, so I doubt anyone's idea was to conceal the connection.


The title, by the way, is a play on Michael Pollan's famous essay "Unhappy Meals". The top line becomes the subtitle, the lesson, and eventually the title everyone googles for: "Eat food. Not too much. Mostly plants."

http://www.nytimes.com/2007/01/28/magazine/28nutritionism.t....


And keep as much logic away from templates as possible; that way you can unit test the model.

The problem is that templating frameworks are too smart.
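
For example (a sketch with hypothetical names): a flag computed in the model is a one-line unit test, while the same condition buried in a template needs the whole rendering stack to verify.

    # Hard to test: the condition lives in the template, e.g.
    #   {% if user.plan == "pro" and not user.trial_expired %} ... {% endif %}

    # Easy to test: the model owns the logic, the template just renders a flag.
    def can_access_pro_features(user):
        return user["plan"] == "pro" and not user["trial_expired"]

    def test_expired_trial_blocks_pro_features():
        assert not can_access_pro_features({"plan": "pro", "trial_expired": True})
        assert can_access_pro_features({"plan": "pro", "trial_expired": False})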



