The big TDD misunderstanding (2022) (linkedrecords.com)
125 points by WolfOliver on Nov 19, 2023 | 184 comments



I also wasn't aware that "unit" referred to an isolated test, not to the SUT. I usually distinguish tests by their relative level, since "unit" can be arbitrary and bring up endless discussions about what it actually means. So low-level tests are those that test a single method or class, and integration and E2E tests confirm the functionality at a higher level.

I disagree with the premise that "unit", or low-level, tests are not useful because they test the implementation. These are the tests that check every single branch in the code, every possible happy and sad path, use invalid inputs, etc. The reason they're so useful is because they should a) run very quickly, and b) not require any external state or setup, i.e. the traditional "unit". This does lead to a lot of work maintaining them whenever the implementation changes, but this is a necessary chore because of the value they provide. If I'm only relying on high-level integration and E2E tests, because there are far fewer of them and they are slower and more expensive to run, I might miss a low-level bug that only manifests under very specific conditions.

This is why I still think that the traditional test pyramid is the best model to follow. Every new school of thought since then is a reaction to the chore of maintaining "unit" tests. Yet I think we can all agree that projects like SQLite are much better for having very high testing standards[1]. I'm not saying that every project needs to do the same, but we can certainly follow their lead and aspire to that goal.

[1]: https://www.sqlite.org/testing.html


I've never had issues with integration tests running with real databases -- they never felt slow or incurred any significant amount of time for me.

I also don't think unit tests bring as much value as integration tests. In fact, a lot of times unit tests are IMO useless or just make your code harder to change. The more towards testing implementation the worse it gets IMO, unless I really really care that something is done in a very peculiar way, which is not very often.

My opinion will be of course biased by my past experiences, but this has worked well for me so far with both monoliths and microservices, from e-shops and real estate marketplaces to IoTs.


> I've never had issues with integration tests running with real databases -- they never felt slow or incurred any significant amount of time for me.

They might not be slow individually, but if you have thousands of them, even a runtime of a couple of seconds adds up considerably. Especially if they're not parallelized, or parallelizable. Also, since they depend on an external service, they're tedious to execute, so Docker becomes a requirement for every environment they run in, including slow CI machines. Then there is external state you need to think about: ensuring tests are isolated and don't clobber each other, expensive setup/teardown, ensuring that they clean up after they're done, etc. It's all complexity that you don't have, or shouldn't have, with low-level tests.

That's not to say that such tests shouldn't exist, of course, but that they shouldn't be the primary test type a project relies on.

> I also don't think unit tests bring as much value as integration tests. In fact, a lot of times unit tests are IMO useless or just make your code harder to change.

You're repeating the same argument as TFA, which is what I disagree with. IME I've _much_ preferred working on codebases with high coverage from low-level tests over those that mostly rely on higher-level ones. This is because with more lower-level tests there is a higher degree of confidence that a change won't inadvertently break something a higher-level test is not accounting for. Yes, this means larger refactorings also mean having to update your tests, but this is a trade-off worth making. Besides, nowadays it's becoming easier to just have an AI maintain tests for you, so this argument is quickly losing ground.

> My opinion will be of course biased by my past experiences

Sure, as all our opinions are, and this is fine. There is no golden rule that should be strictly followed here, regardless of what some authority claims. I've also worked on codebases that use your approach, and it has worked "well" to some extent, but with more code coverage I always had more confidence in my work, and quicker, more convenient test suites ensured that I would rely on this safety net more often.


unit vs integration is not low vs high level, that's a mischaracterization.


Do you set up a new database schema for each unit test? If yes, that tends to be slow if you have many tests (hundreds or thousands), and if no, then you risk getting stateful dependencies between tests and they aren’t really unit tests anymore.


Not OP, but I do this with an in-memory SQLite DB. If each test only creates the tables it needs, this is super fast, probably around 2ms per table.
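
A minimal sketch of that setup, assuming better-sqlite3 and a jest/vitest-style runner (the table and names are made up):

  import Database from "better-sqlite3";

  // Fresh in-memory DB per test: no shared state, nothing to tear down.
  function makeTestDb() {
    const db = new Database(":memory:");
    // Only create the tables this particular test needs.
    db.exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)");
    return db;
  }

  test("stores and reads back a user", () => {
    const db = makeTestDb();
    db.prepare("INSERT INTO users (name) VALUES (?)").run("alice");
    const row = db.prepare("SELECT name FROM users WHERE id = ?").get(1) as { name: string };
    expect(row.name).toEqual("alice");
  });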


Unless you use SQLite in production as well, this can give you a false sense of security, as there can be subtle behavior differences between DBMSs. Ideally you should always test with the same DBMS and same version as used in production.

The same risk exists with mocking/stubbing, but for integration tests it's important that you're testing against the actual system used in production.


Yep. I have very much been bitten by these differences. Particularly for some of the funkier SQL statements where SQLite and PG handle arrays very differently.

In our case, we decided it was a worthwhile trade-off in terms of developer feedback times on our Python microservices, because most of the SQL there is INSERT only. In our Golang services, which do 90% of the interesting SQL, we spin up PG in a Docker container.


Don't need to do that, just run each test in its own transaction, no?
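
A minimal sketch of the transaction-per-test pattern, assuming node-postgres, a jest/vitest-style runner, and a schema created once up front (e.g. by migrations); the table and connection details are made up:

  import { Client } from "pg";

  const client = new Client({ connectionString: process.env.TEST_DATABASE_URL });

  beforeAll(async () => { await client.connect(); });
  afterAll(async () => { await client.end(); });

  // Every test sees a clean slate: its writes are rolled back, never committed.
  beforeEach(async () => { await client.query("BEGIN"); });
  afterEach(async () => { await client.query("ROLLBACK"); });

  test("an inserted order is visible within the same transaction", async () => {
    await client.query("INSERT INTO orders (total) VALUES ($1)", [42]);
    const res = await client.query("SELECT count(*) FROM orders");
    expect(res.rows[0].count).toEqual("1");
  });

The usual caveat is that the code under test has to run on the same connection; anything that needs committed data or opens a second connection won't see the rows.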


How do you guarantee the database contents are in the right state for the given test? Dropping all tables and recreating them with the respective test data is the only sure way. And if you run tests in parallel, you need parallel database instances.


With fixturing.


That doesn’t explain anything. I’m saying that setting up a new schema would be the fixture.


> I've never had issues with integration tests running with real databases -- they never felt slow or incurred any significant amount of time for me.

I've come around on this. I used to mock the DB, especially when it was being used as a dumb store, and now I just recreate the DB in SQLite, and see the DB as part of the system being tested, rather than something to mock.

However, I think it's important to note that it wasn't until improved SQLite capabilities, SSDs (and sometimes Docker if I really need Postgres) all came together that this became actually practical. Previously, using an actual DB would have blown out my test runtimes by a factor of 10x.

> I also don't think unit tests bring as much value as integration tests. In fact, a lot of times unit tests are IMO useless or just make your code harder to change. The more towards testing implementation the worse it gets IMO, unless I really really care that something is done in a very peculiar way, which is not very often.

I see this in a slightly different way. My concept of a unit (ignoring the article) has expanded to be what makes sense for a given test. This may be a class or set of classes where there's a well crafted set of inputs and outputs, but where there's a tricky set of inputs and outputs (anything involving date calculation, for example) I'll often write a set of tests for just that function. I'd probably call all of these "unit tests", however.

To me, an integration test involves testing that disparate vertical parts of a SUT work together. I haven't seen many of these in the wild.


I once worked at a place that demanded we write unit tests for every new method. Something that was simply a getter or setter? New unit test. I'd argue that the code was covered by tests on other code, where it was actually used. This would result in more and more useless arguments. Eventually, I just moved on. The company is no longer in business anyway.


I'd argue if you have "getters and setters" you're already doing it wrong, because you're employing OOP. Just use a raw data structure and write functions that accept it as a parameter. Other things being equal, the code will be less complex, cleaner, and easier to test.


I follow the rule that if you have code that was written, it should be tested.

> Something that was simply a getter or setter? New unit test.

Bring in a framework to generate those. You shouldn't be writing them, and if you are, it's because they're special.


I think it depends on what exactly the code does.

We have some custom rounding routines (to ensure consistent results). That's the kind of stuff you want to have lots and lots of unit tests for, testing all the paths, edge cases and so on.
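
A minimal sketch of that kind of table-driven unit test, assuming a jest/vitest-style runner; roundHalfUp is a hypothetical stand-in for such a routine:

  // Hypothetical half-up rounding helper; shifts the exponent via string
  // notation to dodge the classic 1.005 floating-point trap.
  function roundHalfUp(value: number, decimals: number): number {
    const shifted = Math.round(Number(`${value}e${decimals}`));
    return Number(`${shifted}e${-decimals}`);
  }

  test.each([
    [1.005, 2, 1.01],  // naive Math.round(value * 100) / 100 gets this wrong
    [2.675, 2, 2.68],  // another well-known floating-point trap
    [2.5, 0, 3],
    [-2.5, 0, -2],     // "half up" here means towards +Infinity
    [0, 2, 0],
  ])("roundHalfUp(%f, %i) = %f", (value, decimals, expected) => {
    expect(roundHalfUp(value, decimals)).toEqual(expected);
  });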

We also have a complex price calculation module, which depends on lots of tables stored in the DB as well as some fixed logic to do its job. Sure we could test all the individual pieces of code, but like Lego pieces it's how you put them together that matters so IMO integration testing is more useful.

So we do a mix. We have low-level unit testing for low-level library style code, and focus more on integration testing for higher-level modules and business logic.


I take a similar approach in .NET. I try to build these Lego pieces as traditional classes - no dependencies (except maybe a logger), just inputs and outputs. And then I have a few key "services" which tie everything together. So the service will pull some data from the database and maybe some data from an API, then pass it to these pure classes for processing.

I don't unit test the service; I integration test the API itself, which indirectly tests the service. Mock the third-party API, spin up a real DB with testcontainers.

And then I unit test the pure classes. This makes it much easier to test the logic itself, and the service doesn't really have any logic - it just calls a method to get some data, then another method to process it and then returns it. I could quite easily use mocks to test that it calls the right methods with the right parameters etc, but the integration tests test that stuff implicitly, without being a hindrance to refactoring and similar work.
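
A rough sketch of that split (in TypeScript rather than .NET, with hypothetical names): the pure class is unit tested directly, while the thin service is left to API-level integration tests against a real database.

  // Pure logic: no DB, no HTTP, trivially unit-testable.
  class PriceCalculator {
    totalWithTax(items: { price: number; qty: number }[], taxRate: number): number {
      const net = items.reduce((sum, i) => sum + i.price * i.qty, 0);
      return net * (1 + taxRate);
    }
  }

  // Thin service: fetches inputs, delegates to the pure class, returns the result.
  class QuoteService {
    constructor(
      private repo: { itemsForQuote(id: string): Promise<{ price: number; qty: number }[]> },
      private calc: PriceCalculator,
    ) {}

    async quoteTotal(id: string): Promise<number> {
      const items = await this.repo.itemsForQuote(id);
      return this.calc.totalWithTax(items, 0.2);
    }
  }

  // Unit test for the logic; QuoteService is only exercised indirectly,
  // through integration tests that hit the API with a real DB behind it.
  test("adds 20% tax to the net total", () => {
    const calc = new PriceCalculator();
    expect(calc.totalWithTax([{ price: 10, qty: 2 }], 0.2)).toBeCloseTo(24);
  });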


This is of course the correct answer - it depends on the context of your code. A single dogmatic approach to testing will not work equally well across all problem domains.

Simple stateless components hitting a defined wire protocol or file format, utilizing certain APIs, testing numerical stuff - these imply that unit testing will go far.

Stateful components, complex multi-class flows, and heavily data driven domains will often benefit from higher level integration/end to end tests.


I just wrote a sibling comment and then realized you just stated exactly the same I wanted to say, but with more concrete examples :)

That's exactly the sweet spot: complex, self-contained logic units might benefit from low-level unit testing, but for the most part, what you're interested in knowing is whether the whole thing works or not. Just IMHO, of course...


I believe the recent-ish reaction against the chore of maintaining the lowest level of unit tests exists because, with years and experience, we might be going through an industry tendency where we collectively learn that those chores are not worth it.

100% code coverage is a red herring.

If you're in essence testing things that are part of the private implementation, only through indirect second-order effects of the public surface... then I'd say you went too far.

What you want to do is to know that the system functions as it should. "I might miss a low-level bug that only manifests under very specific conditions" means to me that there's a whole-system condition that can occur and thus should be added to the higher-level tests.

Not that lower-level unit tests are not useful, but I'd say only for intricate and isolated pieces of code that are difficult to verify. Otherwise, most software is a changing entity, because we tend not to know what we actually want out of it; thus its lower-level details tend to evolve a lot over time, and we shouldn't have two implementations of it (the first being the code itself, the second a myriad of tiny tests tightly coupled to the former).


You should be very skeptical of anyone that claims they have 100% test coverage.

Only under very rare circumstances is 100% test coverage even possible, let alone done. Typically when people say coverage they mean "code line coverage", as opposed to the more useful "code path coverage". Since it's combinatorially expensive to enumerate all possible code paths, you rarely see 100% code path coverage in a production system. You might see it when testing very narrow ADTs, for example booleans or floats. But you'll almost never see it for black boxes that take more than one simply defined input, even if the work they do is cheap.


I think the point isn't about 100% coverage - that's obviously a lie, because if even one line in a million line project isn't covered, you don't have 100% coverage. I think claiming to have >50% code coverage is already suspicious. Unless you're writing life-critical code or have some amazing test automation technology I've never heard about, I don't buy it.


Agree with both. Recently, the YouTube channel ThePrimeagen was talking about this and coincidentally put up a very silly but clarifying example; luckily he also posted it to Twitter, so here it is just for fun [1]:

  function foo(num: number): number {
    const a = [1];
    let sum = 0;
    for (let i = 0; i < num; ++i) {
      sum += a[i];
    }
    return sum;
  }

  test("foo", () => {
    expect(foo(1)).toEqual(1);
  });

  100% test coverage
  100% still bugged af
[1]: https://nitter.net/ThePrimeagen/status/1639250735505235975


> 100% still bugged af

That isn't necessarily a problem. The primary purpose of your tests is to document the API. As long as someone is able to determine what the function is for, and how it is meant to be used, along with other intents and information the author needs to convey to users of the API, the goal is met.

However, I don't see how the given test actually helps document anything, so it is not a good example in the slightest. 100% coverage doesn't indicate that you did a good job, but if there are code paths not touched then you know you screwed up your documenting big time.


That's a contrived example that misses the point, and I think poisons the discussion.

Code coverage, and especially _line_ coverage, doesn't tell you anything about the quality of the tests. It's just a basic metric that tells you how much of the code is being, well, covered, by at least one test. Most projects don't even achieve 100% line coverage, and, like most of you here, I agree that reaching that is often not very productive.

But, if in addition to line coverage you also keep track of branch coverage, and use other types of tests (fuzz, contract, behavior, performance, etc.), then your chances of catching a bug during development are much higher. After all, let's not forget that catching bugs early, or ideally not even committing them, is the entire purpose of testing. All of this work takes a lot of effort and discipline, of course, which is why most projects don't bother with it. But, again, you can't argue that the ROI is not worth it when SQLite is one of the most stable programs in existence, in large part due to its very rigorous testing practices. The fact that most projects don't reach that bar doesn't mean it's not worth pursuing, or that the entire practice is wrong for some reason.

Arguing that coverage is not a meaningful metric is a reflection of laziness IMO.


I've been asking a few people about what range is good, and a lot say 90% is great, while 70% is ideal to balance maintenance cost.

The answers vary by a lot it seems


To me, unit tests' primary value is in libraries or components where you want confidence before you build on top of them.

You can sidestep them in favour of higher level tests when the only place they're being used is in one single component you control.

But once you start wanting to reuse a piece of code with confidence across components, unit tests become more and more important. The same goes as more people are involved.

Often the natural time to fill in lacking unit tests is as an alternative to ad hoc debugging.


> I also wasn't aware that "unit" referred to an isolated test

It never did. "Unit test" in programming has always had the meaning it does now: it's a test of a unit of code.

But "unit test" was originally used in electronics, and the meaning in electronics was a bit closer to what the author suggests. The author is being a bit fanciful (aka lying) by excluding this context and pretending that we all don't really understand what Kent Beck et. al. were talking about.


Yes.

<< I call them "unit tests" but they don't match the accepted definition of unit tests very well. >>

I'm not entirely certain it's fair to accuse the author of lying; ignorance derived from limited exposure to materials outside the bubble (rather than deceit) is the more likely culprit here.

(Not helped at all by the fact that much of the TDD/XP origin story is pre-Google, and requires a different set of research patterns to track down.)


this kind of reckless disregard for whether what you are saying is true or not is a kind of lie that is, if anything, even more corrosive than lies by people who know the truth; at least they have a plan in mind to achieve a goal of some benefit to someone


He links to where he got the notion from. I don’t think it’s that clear-cut.


> pretending that we all don't really understand what Kent Beck et al. were talking about.

Here's what Kent Beck has to say about testing: https://stackoverflow.com/a/153565

--- start quote ---

I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence

--- end quote ---


good thinking but irrelevant to the question at hand


> I also wasn't aware that "unit" referred to an isolated test, not to the SUT.

I'm with you. That claim is unsubstantiated. It seems to trace to the belief that the first unit tests were the xUnit family, which began with SUnit for Smalltalk. But Kent Beck made it pretty clear that SUnit "units" were classes.

https://web.archive.org/web/20150315073817/http://www.xprogr...

There were unit tests before that. SUnit took its name from common parlance, not vice versa. It was a strange naming convention, given that the unit testing framework could be used to test anything and not just units. Much like the slightly older Test Anything Protocol (TAP) could.

> [on unit tests] This does lead to a lot of work maintaining them whenever the implementation changes, but this is a necessary chore because of the value they provide.

I disagree. Unit tests can still be behavioral. Then they change whenever the behavior changes. They should still work with a mere implementation change.

> This is why I still think that the traditional test pyramid is the best model to follow.

I'll disagree a little with that, too. I think a newer test pyramid that uses contract testing to verify integrations is better. The notion of contract tests is much newer than the pyramids and, properly applied, can speed up feedback by orders of magnitude while also cutting debugging time and maintenance by orders of magnitude.

On that front, I love what Pact is doing and would like to see more competition in the area. Hottest thing in testing since Cypress/Playwright . . .

https://pact.io


Genuine question: can somebody please explain why there needs to be a distinction between "true" unit tests and tests that work on several layers at once, as long as said tests are runnable in under a minute on a consumer-grade laptop without any prior setup apart from a standard language + container env setup?

Over the years I had several discussions to that effect and I truly, genuinely don't understand. I have test cases that test a connector to, say, Minio, so I spin up a Minio container dynamically for each test case. I need to test an algorithm, so I isolate its dependencies and test it.

Shouldn't the point be that the thing is tested with the best tool available for the job that ensures robustness in the face of change rather than riding on semantics?


There doesn't need to be a strict distinction, but if you subscribe to the traditional test pyramid practice, it's worth focusing the bulk of your testing on tests that run very quickly (typically in the order of milliseconds), that don't require complex setup/teardown, and that don't rely on external state or services to run. This means that these tests are easier to read/write, you can have thousands of them, and you can run them much more frequently, which makes you more productive.

In turn, if you focus more on integration or E2E tests, which are usually more difficult and expensive to run, then you may avoid running them frequently, or rely on CI to run them for you, which slows down development.

Also, you may consider having a stronger distinction, and e.g. label integration and E2E tests, and choose to run them conditionally in CI. For example, it might not make sense to even start the required services and run integration/E2E tests if the unit test suite fails, which can save you some time and money on CI resources.
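
In practice that split can be as small as a Jest "projects" config plus a CI rule that the cheap suite gates the expensive one (a sketch; the directory layout and names are made up):

  // jest.config.ts
  import type { Config } from "jest";

  const config: Config = {
    projects: [
      { displayName: "unit", testMatch: ["<rootDir>/test/unit/**/*.test.ts"] },
      { displayName: "integration", testMatch: ["<rootDir>/test/integration/**/*.test.ts"] },
    ],
  };

  export default config;

  // CI then runs the cheap suite first and only pays for the rest if it's green:
  //   npx jest --selectProjects unit && npx jest --selectProjects integration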


Wow, that's interesting, because I never even considered a unit test to be anything other than a test of a small unit.

Is it not right there in the name?


I guess what the OP is arguing is that "unit test" doesn't mean you "test a unit" but rather that "each test is a unit" - i.e. each test executes independently from all other tests.

I just find that good testing practice tbh but it's true that there are loads of test suites out there that require tests to be run in a particular sequence. I haven't seen one of those in a while but they used to be quite common.


Yes. That is definitely the original intention of the term!

Of course, language can and does drift etc, but I haven't seen the other use anywhere else.


Having high testing standards means, practically, to me (having worked for a few SaaS companies): change code in one place, and see where it fails elsewhere. Though, I see failing tests as guidelines, as nothing is 100% tested. If you don't see them as guidelines but as absolutes, then you'll get that back in bugs via Zendesk.


It makes sense to write a test for a class when the class/method does complex calculations. Today this is less the case than it was when the test pyramid was introduced.


> I also wasn't aware that "unit" referred to an isolated test, not to the SUT.

What the hell are they teaching people nowadays?

But then I do a quick google search and see this as the top result

https://stackoverflow.com/questions/652292/what-is-unit-test...

> Unit testing simply verifies that individual units of code (mostly functions) work as expected.

Well no fucking wonder.

This is like the whole JRPG thing where the younger generation misunderstood but in their hubris claim it's the older generation that coined the term that doesn't understand.

It's the blind leading the blind and that's probably the most apt description of our industry right now.


This resonates. I learned the hard way that you want your main tests to integrate all layers of your system: if the system is an HTTP API, the principal tests should be about using that API. All other tests are secondary and optional: they can be used if they seem useful during implementation or maintenance, but should never be relied upon to test correctness. Sometimes you have to compromise, because testing the full stack is too expensive, but that's the only reason to compromise.

This is largely because if you try to test parts of your system separately, you have to perfectly simulate how they integrate with other parts, otherwise you'll get the worst case scenario: false test passes. That's too hard to do in practice.

I suspect that heavy formalization of the parts' interfaces would go a long way here, but I've not yet seen that done.


> if the system is an HTTP API, the principal tests should be about using that API

So many times yes!

Funnily enough, it's also the quickest way to get high code coverage numbers, which are still used as a metric everywhere.


> the quickest way to get high code coverage numbers

Apocryphal tale of test coverage + Goodhart's law:

Team is responsible for a software component that's two-thirds gnarly driver code that's impossible to unit test, and one third trivial stuff that's easy to unit test. Team has 33% unit test coverage.

Test coverage becomes an organisational KPI. Teams must hit at least 80% statement coverage. Oh no!

Team re-architects their component & adds several abstraction layers that wrap the gnarly driver code, that don't serve any functional purpose. Abstraction layers involve many lines of code, but are elegantly designed so they are easy to test. Now the codebase is 20% gnarly driver code that's impossible to unit test, 10% trivial stuff that's easy to unit test, and 70% layers of unnecessary nonsense that's easy to unit test. 80% statement coverage target achieved! Raises and promos for everyone!


I find it hard to imagine anyone stupid and scrupulous enough to go that route when the honorable and beneficial option is to a) push back on arbitrary metrics and b) strive to extract maximal value out of testing practices in earnest.


> when the honorable and beneficial option is to a) push back on arbitrary metrics

Depends on the company. Sometimes it's just mandated from above and you have no power


Of course it's mandated from above, but you always have power. There's no such thing as a powerless employee. That's the very basis of a free society. "No power" is another word for "slavery."

Or maybe your point is that it's a matter of degree? Sometimes employees have already stretched their leverage thin across other, higher-priority stipulations such as free coffee and parking privileges.


> but you always have power.

Ah yes. I can tell my manager who may tell their manager who will just say "it's company policy".

If I'm not on a visa I can probably quit and go to a different company, but that's often all the power that you have.


The big TDD misunderstanding is that most people consider TDD a testing practice. The article doesn’t talk about TDD, it gives the reader some tips on how to write tests. That’s not TDD.


I'm fully aware of the idea that TDD is a "design practice" but I find it to be completely wrongheaded.

The principle that tests that couple to low level code give you feedback about tightly coupled code is true but it does that because low level/unit tests couple too tightly to your code - I.e. because they too are bad code!

Have you ever refactored working code into working code and had a slew of tests fail anyway? That's the child of test driven design.

High-level/integration TDD doesn't give "feedback" on your design, it just tells you if your code matches the spec. This is actually more useful. It then lets you refactor bad code with a safety harness and gives failures that actually mean failure, not "changed code".

I keep wishing for the idea of test driven design to die. Writing tests which break on working code is an inordinately uneconomic way to detect design issues compared to developing an eye for it and fixing it under a test harness with no opinion on your design.

So, yes, this - high-level test driven development - is TDD, and moreover it's got a better cost/benefit trade-off than test driven design.


I think many people realise this, thus the spike and stabilise pattern. But yes, integration and functional tests are both higher value in and of themselves, and lower risk in terms of rework, so ought to be a priority. For pieces of logic with many edge cases and iterations, mix in some targeted property-based testing and you’re usually in a good place.


Part of test-driven design is using the tests to drive out a sensible and easy to use interface for the system under test, and to make it testable from the get-go (not too much non-determinism, threading issues, whatever it is). It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation! But the best and quickest way to get to having high quality _behaviour_ tests is to start by using "implementation tests" to make sure you have an easily testable system, and then go from there.


>It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation!

Building tests only to throw them away is the design equivalent of burning stacks of $10 notes to stay warm.

As a process it works. It's just 2x easier to write behavioral tests first and thrash out a good design later under its harness.

It mystifies me that doubling the SLOC of your code by adding low level tests only to trash them later became seen as a best practice. It's so incredibly wasteful.


> As a process it works. It's just 2x easier to write behavioral tests first and thrash out a good design later under its harness.

I think this “2x easier” only applies to developers who deeply understand how to design software. A very poorly designed implementation can still pass the high level tests, while also being hard to reason about (typically poor data structures) and debug, having excessive requirements for test setup and tear down due to lots of assumed state, and be hard to change, and might have no modularity at all, meaning that the tests cover tens of thousands of lines (but only the happy path, really).

Code like this can still be valuable of course, since it satisfies the requirements and produces business value, however I’d say that it runs a high risk of being marked for a complete rewrite, likely by someone who also doesn’t really know how to design software. (Organizations that don’t know what well designed software looks like tend not to hire people who are good at it.)


"Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.

I've seen plenty of horrible unit test driven developed code with a mess of unnecessary mocks.

So no, this isn't about skill.

"Test driven design" doesnt provide effective safety rails to prevent bad design from happening. It just causes more pain to those who use it as such. Experience is what is supposed to tell you how to react to that pain.

In the hands of junior developers test driven design is more like test driven self flagellation in that respect: an exercise in unnecessary shame and humiliation.

Moreover since it prevents those tests with a clusterfuck of mocks from operating as a reliable safety harness (because they fail when implementation code changes, not in the presence of bugs), it actively inhibits iterative exploration towards good design.

These tests have the effect of locking in bad design because keeping tightly coupled low level tests green and refactoring is twice as much work as just refactoring without this type of test.


> I've seen plenty of horrible unit test driven developed code with a mess of unnecessary mocks.

Mocks are an anti-pattern. They are a tool that either by design or unfortunate happenstance allows and encourages poor separation of concerns, thereby eliminating the single largest benefit of TDD: clean designs.


You asserted:

> … TDD is a "design practice" but I find it to be completely wrongheaded.

> The principle that tests that couple to low level code give you feedback about tightly coupled code is true but it does that because low level/unit tests couple too tightly to your code - I.e. because they too are bad code!

But now you’re asserting:

> "Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.

Which feels like it contradicts your earlier assertion that TDD produces low-level unit tests. In other words, for there to be a “unit test” there must be a boundary around the “unit”, and if the code created by following TDD doesn’t even have module-sized units, then is that really TDD anymore?

Edit: Or are you asserting that TDD doesn’t provide any direction at all about what kind of testing to do? If so, then what does it direct us to do?


>"Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.

>Which feels like it contradicts your earlier assertion that TDD produces low-level unit tests.

No, it doesn't contradict that at all. Test driven design, whether done optimally or suboptimally, produces low-level unit tests.

Whether the "feedback" from those tests is taken into account determines whether you get bad design or not.

Either way I do not consider it a good practice. The person I was replying to was suggesting that it was a practice more suited to people with a lack of experience. I don't think that is true.

>Or are you asserting that TDD doesn’t provide any direction at all about what kind of testing to do?

I'm saying that test driven design provides weak direction about design and it is not uncommon for test driven design to still produce bad designs because that weak direction is not followed by people with less experience.

Thus I don't think it's a practice whose effectiveness is moderated by experience level. It's just a bad idea either way.


Thanks for clarifying.

I think this nails it:

> Whether the "feedback" from those tests is taken into account determines whether you get bad design or not.

Which to me was kind of the whole point of TDD in the first place; to let the ease and/or difficulty of testing become feedback that informs the design overall, leading to code that requires less set up to test, fewer dependencies to mock, etc.

I also agree that a lot of devs ignore that feedback, and that just telling someone to "do TDD" without first making sure that they know they need to strive for little to no test setup and few or no mocks, etc., is pointless advice.

Overall I get the sense that a sizable number of programmers accept a mentality of “I’m told programming is hard, this feels hard so I must be doing it right”. It’s a mentality of helplessness, of lack of agency, as if there is nothing more they can do to make things easier. Thus they churn out overly complex, difficult code.


>Which to me was kind of the whole point of TDD in the first place; to let the ease and/or difficulty of testing become feedback that informs the design overall

Yes and that is precisely what I was arguing against throughout this thread.

For me, (integration) test driven development is about creating:

* A signal to let me know if my feature is working and easy access to debugging information if it is not.

* A body of high quality tests.

It is 0% about design, except insofar as the tests give me a safety harness for refactoring or experimenting with design changes.


Don't agree, though I think it's more subtle than "throw away the tests" - more "evolve them to a larger scope".

I find this particularly with web services, especially when the services are some form of stateless calculators. I'll usually start with tests that focus on the function at the native programming language level. Those help me get the function(s) working correctly. The code and tests co-evolve.

Once I get the logic working, I'll add on the HTTP handling. There's no domain logic in there, but there is still logic (e.g. mapping from JSON to native types, authentication, ...). Things can go wrong there too. At this point I'll migrate the original tests to use the web service. Doing so means I get more reassurance for each test run: not only that the domain logic works, but that the translation in & out works correctly too.
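
As a sketch of that migration (assuming an Express app, supertest, and a jest/vitest-style runner; the endpoint and function names are hypothetical):

  import request from "supertest";
  import { app } from "./app";            // hypothetical Express app
  import { computeQuote } from "./quote"; // hypothetical domain function

  // Early test, against the native function while the logic is still evolving.
  test("applies a 10% discount above 100", () => {
    expect(computeQuote({ amount: 150 }).discount).toEqual(15);
  });

  // The same check migrated to the web layer: it now also covers JSON mapping,
  // routing and auth wiring, so the original version can be retired.
  test("applies a 10% discount above 100 (via HTTP)", async () => {
    const res = await request(app)
      .post("/quotes")
      .send({ amount: 150 })
      .expect(200);
    expect(res.body.discount).toEqual(15);
  });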

At that point there's no point leaving the original tests in place. They're just covering a subset of the E2E tests so provide no extra assurance.

I'm therefore with TFA in leaning towards E2E testing because I get more bang for the buck. There are still places where I'll keep native language tests, for example if there's particularly gnarly logic that I want extra reassurance on, or E2E testing is too slow. But they tend to be the exception, not the rule.


> At that point there's no point leaving the original tests in place. They're just covering a subset of the E2E tests so provide no extra assurance.

They give you feedback when something fails, by better localising where it failed. I agree that E2E tests provide better assurance, but tests are not only there to provide assurance, they are also there to assist you in development.


Starting low level and evolving to a larger scope is still unnecessary work.

It's still cheaper starting off building a playwright/calls-a-rest-api test against your web app than building a low level unit test and "evolving" it into a playwright test.

I agree that low-level unit tests are faster and more appropriate if you are surrounding complex logic with a simple and stable API (e.g. testing a parser), but it's better to work your way down to that level when it makes sense, not start there and work your way up.


That’s not my experience. In the early stages, it’s often not clear what the interface or logic should be - even at the external behaviour level. Hence the reason tests and code evolve together. Doing that at native code level means I can focus on one thing: the domain logic. I use FastAPI plus pytest for most of these projects. The net cost of migrating a domain-only test to use the web API is small. Doing that once the underlying api has stabilised is less effort than starting with a web test.


I don't think I've ever worked on any project where they hadn't yet decided whether they wanted a command line app or a website or an Android app before I started. That part is usually fixed in stone.

Sometimes lower level requirements are decided before higher level requirements.

I find that this often causes pretty bad requirements churn - when you actually get the customer to think about the UI or get them to look at one then inevitably the domain model gets adjusted in response. This is the essence of why BDD/example driven specification works.


What exactly is it wasting? Is your screen going to run out of ink? Even in the physical construction world, people often build as much or more scaffolding as the thing they're actually building, and that takes time and effort to put up and take down, but it's worthwhile.

Sure, maybe you can do everything you would do via TDD in your head instead. But it's likely to be slower and more error-prone. You've got a computer there, you might as well use it; "thinking aloud" by writing out your possible API designs and playing with them in code tends to be quicker and more effective.


>What exactly is it wasting?

Time. Writing and maintaining low level unit tests takes time. That time is an investment. That investment does not pay off.

Doing test driven development with high level integration tests also takes time. That investment pays dividends though. Those tests provide safety.

>Sure, maybe you can do everything you would do via TDD in your head instead. But it's likely to be slower and more error-prone.

It's actually much quicker and safer if you can change designs under the hood and you don't have to change any of the tests, because they validate all the behavior.

Quicker and safer = you can do more iterations on the design in the available time = a better design in the end.

The refactoring step of red, green, refactor is where the design magic happens. If the refactoring turns tests red again, that inhibits refactoring.


> It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation!

Is it? I don't think I've ever seen that mentioned.


Put simply, doing TDD properly leads to sensible separation of concerns.


I think there can be some value to using TDD in some situations but as soon as people get dogmatic about it, the value is lost.

The economic arguments are hard to make. Sure, writing the code initially might cost $X and writing tests might cost $1.5X but how can we conclude that the net present value (NPV) of writing the tests is necessarily negative - this plainly depends on the context.


I don't even like TDD much, but I think that this missed the point:

> Have you ever refactored working code into working code and had a slew of tests fail anyway?

Yes - and that is intended. The "refactor of working code into working code" often changes some assumptions that were made during implementation.

Those tests are not there to give "feedback on your design", they are there to ensure that the implementation does what you thought it should do when you wrote your code. Yes, that means that when you refactor your code, quite a few tests will have to be changed to match the new code.

But the amount of times I had this happen and it highlighted issues on the refactor is definitely not negligible. The cost of not having these tests (which would translate into bugs) would certainly have surpassed the costs of keeping those tests around.


If we’re talking “what you thought it should do” and not “how you thought it should do it” this is all fine. If requirements change tests should change. I think the objection is more to changing implementation details and having to rewrite twice as much code, when your functional tests (which test things that actually make you money) never changed.


If your functional tests fail because you made an “unrelated” code change, then you’ve done something wrong.


Maybe, but I think the point is that it's probably very easy to get into this situation, and not many people talk about it or point out how to avoid it.


I’m still not following what the issue is. If you refactor some code, the behaviour of the code changes, and a test of the expected behaviour fails, then you have one of two problems:

1. You had a bug you didn’t know about and your test was invalid (in which case the test is useless! Fix the issue then you fix the test…)

or

2. You had no bug and you just introduced a new one, in which case the test has done its job and alerted you to the problem so you can fix your mistake.

What is the exact problem?

Now if this is an issue with changing the behaviour of the system, that’s not a refactor. In that case, your tests are testing old behaviour, and yes, they are going to have to be changed.


The point is that you're not changing the interface to the system, but you're changing implementation details that don't affect the interface semantics. TDD does lead you to a sort of coupling to implementation details, which results in breaking a lot of unit tests if you change those implementation details. What this yields is hesitancy to undertake positive refactorings, because you have to either update all of those tests or just delete them altogether; so were those tests really useful to begin with? The point is that it's apparently wasted work and possibly an active impediment to positive change, and I haven't seen much discussion around avoiding this outcome, or what to do about it.


There has been discussion about this more than a decade ago by people like Dan North and Liz Keogh. I think it’s widely accepted that strict TDD can reduce agility when projects face a lot of uncertainty and flux (both at the requirements and implementation levels). I will maintain that functional and integration tests are more effective than low-level unit tests in most cases, because they’re more likely to test things customers care about directly, and are less volatile than implementation-level specifics. But there’s no free lunch, all we’re ever trying to do is get value for our investment of time and reduce what risks we can. Sometimes you’ll work on projects where you build low level capabilities that are very valuable, and the actual requirements vary wildly as stakeholders navigate uncertainty. In those cases you’re glad to have solid foundations even if everything above is quite wobbly. Time, change and uncertainty are part of your domain and you have to reason about them the same as everything else.


> I will maintain that functional and integration tests are more effective than low-level unit tests in most cases

Right, that's pretty much the only advice I've seen that makes sense. The only possible issue is that these tests may have a broader state space so you may not be able to exhaustively test all cases.


Absolutely right. If you’re lucky, those are areas where you can capture the complexity in some sort of policy or calculator class and use property based testing to cover as much as possible - that’s a level of unit testing I’m definitely on board with. Sometimes it’s enough to just trust that your functional tests react appropriately to different _types_ of output from those classes (mocked) without having to drive every possible case (as you might have seen done in tabular test cases). For example I have an app that tests various ways of fetching and visualising data, and one output is via k-means clustering. I test that the right number of clusters gets displayed but I would never test the correctness of the actual clustering at that level. Treat complexity the same way you treat external dependencies, as something to be contained carefully.


Why does testing behavior matter? I don’t care if my tests exhaustively test each if branch of the code to make sure that they call the correct function when entering that if branch. That’s inane.

I care about whether the code is correct. A more concrete example: say I'm testing a float-to-string function. I don't care how it converts the floating point binary value 1.23 into the string representation of "1.23". All I care about is the fact that it correctly turns that binary value into the correct string. I also care about the edge cases. Does 0.1E-20 correctly use scientific notation? What about rounding behavior? Is this converter intended to represent binary numbers with perfect precision, or is precision loss ok?
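
In test form, that means asserting only on the observable output (a sketch assuming a jest/vitest-style runner; formatFloat and its exact contract for notation and rounding are hypothetical):

  import { formatFloat } from "./format-float"; // hypothetical converter under test

  const cases: [number, string][] = [
    [1.23, "1.23"],
    [0.1e-20, "1e-21"],  // assumed: tiny values switch to scientific notation
    [2.675, "2.68"],     // assumed: round-half-up to two decimals
    [-0, "0"],           // assumed: negative zero is normalised
  ];

  test.each(cases)("formatFloat(%f) = %s", (input, expected) => {
    expect(formatFloat(input)).toEqual(expected);
  });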

If your tests simply check that you call the log function and the power function x times, your tests are crap. And this is what I believe the parent commenter was talking about. All too often, tests are written to fulfill arbitrary code coverage requirements or to obsequiously adhere to a paradigm like TDD. These are bad tests, because they’ll break when you refactor code.

One last example, I recently wrote a code syntax highlighter. I had dozens of test cases that essentially tested the system end to end and made sure if I parsed a code block, I ended up with a tree of styles that looked a certain way. I recently had to refactor it to accommodate some new rules, and it was painless and easy. I could try stuff out, run my tests, and very quickly validate that my changes did not break prior correct behavior. This is probably the best value of testing that I’ve ever received so far in my coding career.


"Have you ever reconsidered your path up the cliff face and had to reposition a slew of pitons? This means your pitons are too tightly coupled to the route!"


> Have you ever refactored working code into working code and had a slew of tests fail anyway? That's the child of test driven design.

I had this problem, when either testing too much implementation, or relying too much on implementation to write tests. If, on the other hand, I test only the required assumptions, I'd get lower line/branch coverage, but my tests wouldn't break while changing implementation.

My take on this - TDD works well when you fully control the model, and when you don't test for implementation, but the minimal required assumptions.


I don't think that's TDD's fault, that's writing a crappy test's fault.

If you keep it small and focussed, don't include setup that isn't necessary and relevant, only exercise the thing which is actually under test, only make an assertion about the thing you actually care about (e.g. there is the key 'total_amount' with the value '123' in the response, not that the entire response body is x); that's much less likely to happen.


If you’ve refactored code and a bunch of tests fail, then you’ve likely introduced a bug.


Not sure why I’m getting downvoted so badly, because by its very nature refactoring shouldn't change the functionality of the system. If you have functional unit tests that are failing, then something has changed and your refactor has changed the behaviour of the system!


It is very common for unit tests to be white-box testing, and thus to depend significantly on internal details of a class.

Say, when unit testing a list class, a test might call the add function and then assert that the length field has changed appropriately.

Then, if you change the list to calculate length on demand instead of keeping a length field, your test will now fail even though the behavior has not actually changed.

This is a somewhat silly example, but it is very common for unit tests to depend on implementation details. And note that this is not about private VS public methods/fields. The line between implementation details and public API is fuzzy and depends on the larger use of the unit within the system.
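
A minimal sketch of the two styles (assuming a jest/vitest-style runner; MyList and its members are hypothetical):

  import { MyList } from "./my-list"; // hypothetical list class

  // White-box version from the example above: asserts on an internal field,
  // so it breaks if length is later computed on demand.
  test("add bumps the internal length counter", () => {
    const list = new MyList<number>();
    list.add(42);
    expect((list as any)._length).toEqual(1); // _length: hypothetical internal field
  });

  // Black-box version: only observable behaviour, survives that refactor.
  test("add makes the element countable and retrievable", () => {
    const list = new MyList<number>();
    list.add(42);
    expect(list.count()).toEqual(1);
    expect(list.get(0)).toEqual(42);
  });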


The behavior has changed:

Checking length is now a function call and not a cached variable — a change in call signature and runtime performance.

Consumers of your list class are going to have to update their code (eg, that checks the list length) and your test successfully notified you of that breaking API change.


Then any code change is a breaking API change and the term API is meaningless. If the compiler replaces a conditional jump + a move with a conditional move, it has now changed the total length of my code and affected its performance, and now users will have to adjust their code accordingly.

The API of a piece of code is a convention, sometimes compiler enforced, typically not entirely. If that convention is broken, it's good that tests fail. If changes outside that convention break tests, then it's pure overhead to repair those tests.

As a side note, the length check is not necessarily no longer cached just because the variable is no longer visible to that test. Perhaps the custom list implementation was replaced with a wrapper around java.ArrayList, so the length field is no longer accessible.


In your example, if your code relied on the previous functionality, then surely you could get a nasty bug?


I mean I think it's fair to assume that TEST-Driven-Development has something to do with testing. That being said, Kent Beck recently (https://tidyfirst.substack.com/p/tdd-outcomes) raised a point saying TDD doesn't have to be just an X technique, which I wholeheartedly agree with.


Instead of Test-Driven Design, it should’ve been called Design-By-Testing.


Did you mean Test Driven Development, or is Test-Driven Design a whole other thing?


Well, it's exactly as much about testing as it is focused on writing and running tests.

Which means it's absolutely, entirely about them.

People can claim it's about requirements all they want. The entire thing revolves around the tests, and there's absolutely no consideration of the requirements except in the part where you map them into tests. If you try to create a requirements framework, you'll notice that there is much more to requirements than testing whether they are met.


As I remember the discourse about TDD, originally it was described as a testing practice, and later people started proposing to change the last D from "development" to "design".


Yeah it’s kind of unfortunate because they make a very good argument about defining a thing better, and in the title use a wrong definition of an adjacent term.


Maybe the term TDD in the title could be replaced with "unit testing". But unit testing is a major part of TDD.


> Now, you change a little thing in your code base, and the only thing the testing suite tells you is that you will be busy the rest of the day rewriting false positive test cases.

If there is anything that makes me cry, it’s hearing “it’s done, now I need to fix the tests”


It's something we changed when we switched our configuration management. The old config management had very, very meticulous tests of everything. This resulted in great "code" coverage, but whenever you changed a default value, at least 6 tests would fail now. Now, we'd much rather go ahead and test much more coarsely. If the config management can take 3 VMs and set up a RabbitMQ cluster that clusters and accepts messages, how wrong can it be?

And this has also bled into my development and strengthened my support of bug-driven testing. For a lot of pretty simple business logic, do a few high level e2e tests for the important behaviors. And then when it breaks, add more tests for those parts.

But note, this may be different for very fiddly parts of the code base - complex algorithms, math-heavy code and such. That's when you'd rather start with table-based testing and the like. At a past gamedev job, we had several issues with some complex cost balancing math, so I eventually set up a test that allows the game balancing team to supply CSV files with expected results. That cleared up these issues within 2 days or so.


> how wrong can it be?

Me, right before some really annoying bug starts to show up and the surface area is basically half the codebase, across multiple levels of abstraction, in various combinations.


whenever you changed a default value, at least 6 tests would fail now

Testing default values makes a lot of sense. Both non-set configuration values and non-supplied function parameters become part of your API. Your consumers will rely on those default values, and if you alter them, your consumers will see different behaviour.


If changing the implementation but not the behaviour breaks a test, I just delete the test.


Agree, this is usually a sign the team writes tests for the sake of writing tests.


Sometimes, you have to make a complex feature or fix. You can first make a prototype of the code or proof of concept that barely works. Then you can see the gap that remains to make the change production ready and the implications of your change. That involves fixing regressions in the test suite caused by your changes.


> Tip #1: Write the tests from outside in.

> Tip #2: Do not isolate code when you test it.

> Tip #3: Never change your code without having a red test.

> Tip #4: TDD says the process of writing tests first will/should drive the design of your software. I never understood this. Maybe this works for other people but it does not work for me. It is software architecture 101 — Non-functional requirements (NFR) define your architecture. NFR usually do not play a role when writing unit tests.

The one time I ever did "proper" red/green cycle TDD, it worked because I was writing a client library for an existing wire protocol, and knew in advance exactly what it needed to do and how it needed to do it.

Item 2 is right, but this also means that #1 is wrong. And knowing what order #2 requires means knowing how the code is designed (#4).


The tips are not contradictory if you follow the advice to start at a higher level.

Let's say you had to invent that wire protocol. You would write a test for a client that doesn't care which wire protocol is used.


TDD works great for this. Usually before I am sent a new piece of equipment (has to go through the approval/purchase process) I’m given the docs. I’ll write unit tests using the examples in the docs (or made up examples based on the docs). I’ll write my software controller against that. By the time I get the actual device I’m just confirming my code works.


TDD was later given the name Behavior Driven Development (before that name was usurped by the likes of Cucumber and Gherkin) in an attempt to avoid this confusion. TDD advocates that you test that the client library does what its public interface claims it does – its behavior, not how it is implemented under the hood. The wire protocol is almost irrelevant. The tests should hold true even when the wire protocol is replaced with another protocol.


That's not quite right, historically.

Behavior Driven Development began as a re-languaging of TDD: "The developers were much more receptive to TDD when I stopped talking about testing." -- Dan North.

BDD diverged from TDD fairly early, after some insights by Chris Matts.

As for TDD advocating tests of the public interface... that seems to me to have been more aspirational than factual. The tests in TDD are written by developers for developers, and as such tend to be a bit more white/clear box than pure interface testing would suggest.

In the edge cases where everything you need for testing is exposed via the "public" interface, these are equivalent, of course, but there are tradeoffs to be considered when the information you want when running isolated experiments on an implementation isn't part of the contract that you want to be supporting indefinitely.


I suspect it goes deeper than that, which is some of the confusion.

If you have multiple layers/parts, some will treat each part as an independent library to be used; implementation details of one level depend on the public interfaces of the next level.


Do you have any references that BDD was used as a term before Cucumber?


Search for the writings of Dan North, and don't forget that in British English, "Behaviour" is spelled with a "u".

Very roughly, the Cucumber framework appears in late 2008(?), whereas BDD first appears in the TDD community no later than December 2005.

Ex: here Dave Astels references a BDD talk he presented in 2005-09: https://web.archive.org/web/20051212160347/http://blog.davea...


Part of the problem is caused by all sides using the same terms but with a different meaning.

> You just don’t know if your system works as a whole, even though each line is tested.

... even though each line has been executed.

The "one test per line" idea is strongly reinforced by tools that calculate coverage and call that "tested".

A test for one specific line is rarely possible. It may be missing some required behavior that hasn't been challenged by any test, or it may be inconsistent with other parts of the code.

A good start would be to stop calling something just executed "tested".
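
A small Python illustration of the gap (the `normalize` function is made up): the first test yields 100% line coverage but asserts nothing, so it only proves the code was executed; the second one actually tests it.

  def normalize(scores: list[float]) -> list[float]:
      # Made-up function: scale scores so they sum to 1.
      total = sum(scores)
      return [s / total for s in scores]

  def test_normalize_executed():
      # Every line of normalize() runs and coverage reports 100%,
      # but nothing about the result is checked.
      normalize([1.0, 3.0])

  def test_normalize_tested():
      # An actual assertion about behaviour.
      assert normalize([1.0, 3.0]) == [0.25, 0.75]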


I like the term "exercised" for coverage of questionable assertion value. It's rather pointless in environments with a strict compiler (even some linters get there), but there are still others where, before you try to execute, you barely know more than that the brackets are balanced. That form of coverage is depressingly valuable there. Makes me wonder if there is a school of testing that deliberately tries to restrict meaningful assertions to higher level tests, so that their "exercise" part is cheaper to maintain?

About the terms thing:

Semantics drifting away over time from whatever a sequence of letters were originally used for isn't an exception, it's standard practice in human communication. The winning strategy is adjusting expectations while receiving and being aware of possible ambiguities while sending, not sweeping together a proud little hill of pedantry to die on.

The good news is that neither the article nor your little jab at the term "tested" really does that; the pedantry is really just a front to make the text a more interesting read. But it also invites the kind of shallow attack that is made very visible by discussions on the internet, but will also play out in the heads of readers elsewhere.


> Makes me wonder if there is a school of testing that deliberately tries to restrict meaningful assertions to higher level tests, so that their "exercise" part is cheaper to maintain?

That's a nice summary of the effect I observe in the bubble I work in, but I'm sure it is not deliberate; Hanlon's razor applies.

With sufficient freedom of interpretation, it is what the maturity model of choice requires.


My view on unit testing is that if there are no dependencies, there is no real reason not to write tests for all behaviours. While you may have a wonderful integration testing suite, it is still great to know that building blocks work as intended.

The problems arise with dependencies, as now you need to decide whether to mock them or use concrete implementations. The concrete implementation might be hard to set up, slow to run in a test, or both. Using a mock, on the other hand, is essentially an alternate implementation. So now your code has the real implementation plus one implementation per test (in the limit), which is plainly absurd.

My current thinking (after writing a lot of mocks) is to try to shape code so that more of it can be tested without hard to setup dependencies. When this can't be done, think hard about the right approach. Try to put yourself in the shoes of a future maintainer. For example, instead of creating a bespoke mock for just your particular test, consider creating a common test utility that mocks a commonly used dependency in accordance with common testing patterns. This is just one example. Annoyingly, a lot of creativity is required once dependencies of this nature are involved, which is why it is great to shape code to avoid them where possible.
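
As a sketch of the "common test utility" idea (all names made up, Python just for illustration): one shared fake for a clock dependency, reused by every test in the same way, instead of a fresh mock per test.

  from datetime import datetime, timedelta

  class FakeClock:
      # Shared fake for the clock dependency, living next to the other
      # test helpers and reused by every test that needs time control.
      def __init__(self, start: datetime):
          self._now = start

      def now(self) -> datetime:
          return self._now

      def advance(self, seconds: float) -> None:
          self._now += timedelta(seconds=seconds)

  class Session:
      # Toy production class that takes the clock as an explicit dependency.
      def __init__(self, clock, ttl_seconds: int):
          self._clock = clock
          self._expires_at = clock.now() + timedelta(seconds=ttl_seconds)

      def is_expired(self) -> bool:
          return self._clock.now() >= self._expires_at

  def test_session_expires_after_an_hour():
      clock = FakeClock(datetime(2023, 1, 1, 12, 0))
      session = Session(clock=clock, ttl_seconds=3600)
      clock.advance(3601)
      assert session.is_expired()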


> it is still great to know that building blocks work as intended.

The only way to know that the building blocks work as intended is through integration testing. The multiple "all unit tests passed, no integration tests" memes show that really well.


I suspect you are not finding many bugs in common built-in libraries via integration tests however. The same concept can extend to code that our own teams write.


> finding many bugs in common built-in libraries via integration tests

I'm not convinced they're finding them with unit tests, either. No shortage of public repos with great test coverage that still have pages of Pull Requests and Issues dealing with bugs. In many of those instances, you could argue that it's the act of integration testing (the team trying to use the project) that ends up catching the issues.


I think for low-level functionality, unit tests and integration tests are the same thing. As an absurd example, consider a function that adds two numbers together, producing a result.


If your whole program is one unit, there is no integration between units that warrants an integration test.


> try to shape code so that more of it can be tested without hard to setup dependencies

Yes!

In general, functional core + imperative shell goes a long way toward this goal.

With that approach there should also be minimal coupling, with complex structured types kept in the shell and simple standard types passed to the core functions.

These things make unit testing so much easier and faster (dev time and test run time).
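
A minimal sketch of that shape, with made-up names: the core is a pure function over simple values, the shell does the I/O, and only the core needs unit tests.

  from dataclasses import dataclass

  @dataclass(frozen=True)
  class Discount:
      percent: float

  def apply_discount(total_cents: int, discount: Discount) -> int:
      # Functional core: pure, takes simple values, trivially unit-testable.
      return round(total_cents * (1 - discount.percent / 100))

  def checkout(order_id: str, db, payment_gateway) -> None:
      # Imperative shell: does the I/O, delegates all decisions to the core.
      order = db.load_order(order_id)
      amount = apply_discount(order.total_cents, order.discount)
      payment_gateway.charge(order.customer_id, amount)

  def test_discount_rounds_to_nearest_cent():
      # The core needs no mocks at all.
      assert apply_discount(1999, Discount(percent=10)) == 1799

The shell stays thin enough that a handful of integration tests cover it; the interesting logic all lives in functions like apply_discount.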


The tests in the codebase I currently work in are a mocking nightmare. It feels like somebody learned about C++ interface classes and gmock for the first time when the codebase was first being put together and went completely bananas. There are almost no classes which don't inherit from a pure interface.

The main drawbacks to this are:

- Classes which have only a single implementation inherit from an interface just so they can be mocked. We often only need polymorphism for testing, but not at runtime. This not only makes the code slower (minor concern, often) but more importantly much more difficult to follow.

- The tests rely heavily on implementation details. The typical assertion is NOT on the input/output behavior of some method. Rather, it's on asserting that various mocked dependencies got called at certain times within the method under test. This heavily couples the tests to the implementation and makes refactoring a pain.

- We have no tests which test multiple classes together that aren't at the scale of system-wide, end-to-end tests. So when we DI class Bar into class Foo, we use a mock and don't actually test that Bar and Foo work well together.

Personally, I think the code base would be in much better shape if we completely banned gmock.


In my experience a lot of engineers are stuck thinking in MVC terms and fail to write modular code. As a result most business logic is part of a request/response flow. This makes it infeasible to even attempt to write tests first, thus leaving integration or e2e tests as the only remaining options.


I'm not a TDD purist, but I've found that as long as the request/response flow is a JSON API or similar (as opposed to old-style forms and HTML rendering), writing integration tests first is quite easy, provided your test fixtures are fairly fast.
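
For example, something like this (a Flask sketch with made-up routes; in the TDD flow the tests are written first and stay red until the route exists, and the in-process test client keeps the fixture fast):

  from flask import Flask, jsonify, request
  import pytest

  def create_app() -> Flask:
      app = Flask(__name__)

      @app.post("/orders")
      def create_order():
          payload = request.get_json()
          if not payload.get("items"):
              return jsonify(error="order must contain items"), 422
          return jsonify(id=1, items=payload["items"]), 201

      return app

  @pytest.fixture()
  def client():
      # Fast fixture: in-process test client, no real server or network.
      return create_app().test_client()

  def test_creating_an_order_returns_its_id(client):
      resp = client.post("/orders", json={"items": ["sku-1"]})
      assert resp.status_code == 201
      assert resp.get_json()["id"] == 1

  def test_empty_order_is_rejected(client):
      resp = client.post("/orders", json={"items": []})
      assert resp.status_code == 422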


With this approach do you stop at testing the JSON API or do you still make it to testing the rendered HTML actually shown to the user?

I've always actually liked the simplicity of testing HTML APIs in a frontend project. For me, tests get a lot simpler when I can verify the final HTML directly from the response and don't need to parse the JSON and run it through client-side rendering logic first.


While we had a few end-to-end browser tests, they tended to be slow, so the pattern we ended up with was to generate output from the API to use as mock data in the UI tests.

I think it usually makes sense to test the business logic at the JSON API level (except for particularly complicated parts, where unit testing of that complex logic may make sense) and then use snapshot tests for the display logic.


Where should the business logic rather be? My tests are typically calling APIs to test the business logic. Trying to improve myself here.


There aren't any hard rules here, but you can try to build your business logic as if it were a library, where your HTTP API is merely an interface to it.
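
Sketching that idea with made-up names: the pricing rules live in a plain module with no framework imports, the HTTP route only translates request/response, and the tests call the library directly.

  # pricing.py -- the "library"; no web framework imports in here.
  def quote_price(base_cents: int, quantity: int, vip: bool) -> int:
      if quantity <= 0:
          raise ValueError("quantity must be positive")
      total = base_cents * quantity
      return int(total * 0.9) if vip else total

  # The HTTP route then only parses the request, calls quote_price and
  # serialises the result; the business rules are tested on the library:
  import pytest

  def test_vip_customers_get_ten_percent_off():
      assert quote_price(base_cents=1000, quantity=3, vip=True) == 2700

  def test_zero_quantity_is_rejected():
      with pytest.raises(ValueError):
          quote_price(base_cents=1000, quantity=0, vip=False)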


> Never change your code without having a red test

I'll never understand why people insist on this. If you want to write your tests first, that is fine. No one is going to stop you. But why must you insist everyone does it this way?


For me it is LARPing: people pretending that the code they write is so complicated and so important that they need 100% test coverage and a nuclear reactor will melt down somewhere if they don't do their job at the highest level.

I write testable code without TDD and if something breaks we write test to cover that case so it doesn’t happen again.


How do you refactor code if you have poor test coverage? Also, for me the most important benefit is the instant feedback I get when I write the unit tests before the implementation. I can move faster with more confidence.


Probably by working in an environment where that confidence is conveniently provided by static type analysis. If the parts are sufficiently reshaped to fit together again, you just know that it will work. And chances are that feedback is an order of magnitude or two more instantaneous.


A lot of software is business processes, i.e. money.

You don't want money and ledgers going out of sync. That can be serious.


The delivery of a text depends on the literary goal of the author.

When you have created or found a new way to approach development and you want to spread that idea, you want to persuade.

Alternatively you may want to inform, for example when you teach or journal, in which case you write down the facts without embellishing them.

Finally, you may want to do a critical review, in which case you explain the method in relation to other methods and existing practices: pros and cons, etc.

The hope is that the reader will understand these goals and adopt the ideas into their own beliefs as they see fit, in relation to how they were presented.


The charitable interpretation is that this is just the definition of TDD, and the author isn't actually trying to push it on anyone. I completely agree with you though - pushing TDD as "the best" is just obnoxious until someone comes in with some more solid evidence to back it up.


I think you are hitting the key point to make. It boils down to "dogmas are bad".

All these programming methodologies seem to be at risk of developing a cargo-cult following, and since programmers are way more dogmatic than most people, it also gets way worse than in other fields.

I don't like TDD and I think it is silly. But fine, whatever floats your boat. The problems begin when someone tries to push the single right way onto others, and it seems to mainly be a problem in corporate settings where most people would rather get paid than do what they think is the right thing -- and some people push agile, TDD or whatever.


TDD helps you think about the API as a user, which naturally leads to a user-friendly design, and in turn also makes the implementation testable from the start. Often when testing is done after the implementation, the code is tightly coupled, relies on external state, and requires refactoring just to make it testable.

That said, this workflow is tedious during the exploratory phase of the project. If you're unsure yet about which entities should exist, and how they will be used and interact with each other, TDD can be a hindrance. Once the design is mostly settled, this workflow can be helpful, but insisting that it must be used always is also annoying. Everything in software development is a trade-off, and dogmatism should be avoided.


It is the safest way to make sure your test will actually fail in case your code does not work. I have often had the situation where I wrote a test which is always green, even if the code is broken.

Often I do not have the clarity to write the test first, so I just write the code and then the test. But I comment out the code or introduce a bug on purpose, to make sure that when I run the test it actually works and detects the bug.
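
A concrete Python example of the trap (names made up): the first test is always green because it asserts on the function object instead of calling it; deliberately running the second one against the still-broken code shows red first, which is the whole point.

  import pytest

  def parse_amount(text: str) -> int:
      # Deliberately "broken": negative amounts are accepted.
      return int(text)

  def test_rejects_negatives_always_green():
      # Test bug: asserting on the function object (always truthy)
      # instead of calling it, so this passes no matter what the code does.
      assert parse_amount

  def test_rejects_negatives_seen_red():
      # Run against the broken code, this fails first (red), proving the
      # test can actually detect the bug before the fix goes in.
      with pytest.raises(ValueError):
          parse_amount("-5")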


If you do TDD, you write your tests first. If you don’t follow TDD, you don’t have to. The author insists on this because they assume a TDD approach.


I suppose that is a fair reading. If TDD is the goal then the "never" instruction makes sense.


In the context of the article, I read this as "modify the test first for an existing unit test and a bit of production code under test". I don't think this is equivalent to "always write the test first".


It helps you think about the problem better.

A particular test has assertions that must be implemented correctly in order for them to pass.

By starting in red you gradually make your way towards green, usually paving the way for your thinking and potentially unlocking other tests.

I don't believe you must do this every time though.


> I don't believe you must do this every time though.

So I think we agree. If you find it beneficial, please carry on doing it. For anyone who doesn't find it beneficial, they can write tests in whatever way they want.


I think that unit tests are super valuable because when used properly, they serve as micro-specifications for each component involved.

These would be super hard to backfill later, because usually only the developer who implements them knows everything about the units (services, methods, classes etc.) in question.

With a strongly typed language, a suite of fast unit tests can already be in feature parity with a much slower integration test, because even if mocked out, they essentially test the whole call chain.

They can offer even more, because unit tests are supposed to test edge cases, all error cases, wrong/malformed/null inputs etc. By using integration tests only, as the call chain increases on the inside, it would take an exponentially higher amount of integration tests to cover all cases. (E.g. if a call chain contains 3 services, with 3 outcomes each, theoretically it could take up to 27 integration test cases to cover them all.)

Also, ballooning unit test sizes or resorting to unit testing private methods gives the developer feedback that the service is probably not "single responsibility" enough, providing an incentive to split and refactor it. This leads to a more maintainable service architecture, which integration tests don't help with.

(Of course, let's not forget that this kind of unit testing is probably only reasonable on the backend. On the frontend, component tests from a functional/user perspective probably bring better results - hence the popularity of frameworks like Storybook and Testing Library. I consider these as integration rather than unit tests.)


Was 'unit' originally intended to be a test you could run in isolation? I don't think so. I'm not an expert in testing history, but this Dec 2000 Software Quality Assurance guide from the Nuclear Regulatory Commission defines Unit Testing as:

> Unit Testing - It is defined as testing of a unit of software such as a subroutine that can be compiled or assembled. The unit is relatively small; e.g., on the order of 100 lines. A separate driver is designed and implemented in order to test the unit in the range of its applicability.

NUREG-1737 https://www.nrc.gov/docs/ML0101/ML010170081.pdf

Going back further, this 1993 nuclear guidance has similar language:

> A unit of software is an element of the software design that can be compiled or assembled and is relatively small (e.g., 100 lines of high-order language code). Require that each software unit be separately tested.

NUREG/BR-0167 https://www.nrc.gov/docs/ML0127/ML012750471.pdf


See also: ANSI/IEEE 1008-1987


When I first learnt about unit tests / TDD, I was confused because everyone assumes you are doing OOP. What am I supposed to do with my C code? I can just test a function, right? Or do I have to forcefully turn my program into some OO-style architecture?

But then I realized it does not matter; there is only one important thing about unit tests: that they exist. All the rest is implementation detail.

Mocking or not, isolated "unit" or full workflow, it does not matter. All I care about is that I can press a button (or type "make test" or whatever) and my tests run and I know if I broke something.

Sure, your tests need to be maintainable, you should not need to rewrite them when you make internal changes, and so on. You'll learn as you go. Just write them and make them easy to run.


For C code you can use link-time substitution and a mock generator like CMock (http://www.throwtheswitch.org/cmock).

Link-time substitution means that you swap out certain objects with others when you build your test binaries.

For example, let's say your production software binary consists of a main function and objects A, B and C. For a unit test you could use a different main (the test), object B and a mock for object C - leaving out A.


Read https://www.manning.com/books/unit-testing: it's the best book on the subject and presents the matter with good evidence.

"Tip #4: TDD says the process of writing tests first will/should drive the design of your software. "

Yes, and if that does not happen during TDD I would argue you are not doing TDD. Sure, you always have some sort of boundaries, but design up front is a poor choice when you try to iterate towards the best possible solution.


This article is internally inconsistent. It leads with considering "unit" to be "the whole system" being bad, and then tip #1 is to test from the outside in, at whole system granularity. On the other hand, it does point out that "design for test" is a nonsense, so that meets my priors.

By far the worst part of TDD was the proposed resolution to the tension with encapsulation. The parts one wants to unit test are the small, isolated parts, aka "the implementation", which are also the parts one generally wants an abstraction boundary over. Two schools of thought on that:

- one is to test through the API, which means a lot of tests trying to thread the needle to hit parts of the implementation. The tests will be robust to changes in the implementation, but the grey box coverage approach won't be, and you'll have a lot of tests

- two is to change the API to expose the internals, market that as "good testable design" and then test the new API, much of which is only used from test code in the immediate future. Talk about how one doesn't test the implementation and don't mention the moving of goal posts

Related to that is enthusiasm for putting test code somewhere separate to production code so it gets hit by the usual language isolation constraints that come from cross-module boundaries.

Both of those are insane nonsense. Don't mess up your API to make testing it easier, the API was literally the point of what you're building. Write the tests in the same module as the implementation and most of the API challenge evaporates. E.g. in C++, write the tests in an anonymous namespace in the source file. Have more tests that go through the interface from outside if you like, but don't only have those, as you need way more to establish whether the implementation is still working. Much like having end to end tests helps but having only end to end tests is not helpful.

I like test driven development. It's pretty hard to persuade colleagues to do it so multi-developer stuff is all end to end tested. Everything I write for myself has unit tests that look a lot like the cases I checked in the repl while thinking about the problem. It's an automated recheck-prior-reasoning system, wouldn't want to be without that.


> It leads with considering "unit" to be "the whole system" being bad

I do not understand this statement. Could you point out which part of the article you mean?


> when people started considering the system under test as the “unit”, it significantly affected the quality of test suites (in a bad way)


It should be "when people started considering parts of the system under test as the “units”"


Units are parts of the system. Perhaps you want to say something about granularity?


The claim in that phrase of the article is that a "unit test" was originally supposed to mean "a test that runs unitarily, by itself". Conversely, the more common interpretation of the term is "a test for a single unit of code".

So, according to that phrase, units are not part of the system. The units are supposed to be the tests.

Note that I don't agree with this semantics game. But that is the intended meaning of that phrase.


For some classic wisdom about writing tests, see "The Art of Software Testing" by Glenford Myers. It's $149+++ on Amazon, but only $5 on eBay:

https://www.ebay.com/sch/i.html?_from=R40&_trksid=p3671980.m...

This was originally published before TDD was a thing, but is highly applicable.


Which companies or large projects use TDD at the moment? There's always such intense discussion about what it is and its benefits, yet I don't see anyone actually doing TDD.


I've been in several multi-million line codebases that all were built with TDD. It's possible.

The default way of organizing code with DI makes unit tests extremely expensive to write and maintain. Mocks should be banned if you want to add unit tests or practice TDD. Instead the tested code should be pure. Pure code is easy to test, even if it's calling a dozen helper functions.


Dozen-ish person team, 3 year project so far, billion dollar revenue company (not a software company), >500k LOC, TDD since the beginning. Have been doing TDD for 18 years or so. Still getting better at it.


I think it's both attention-getting and distracting to start with a definition of unit testing that hardly anybody uses. Now I'm not interested in the article because I have to see what your sources are and whether you're gaslighting me.

The reason people use the term unit test to mean the size of the system under test is because that's what it has generally meant. Before OO, it would mean a module. Now it means a class. The original approach would be to have smaller, testable functions that made up the functionality of the module and to test them individually. Decoupling was done so that you didn't need to mock the database or the filesystem, just the logic that you're writing.

Some people disagree with unit testing and focus on functional testing. For example, the programming style developed by Harlan Mills at IBM was to specify the units very carefully using formal methods and write to the specification. Then, black-box testing was used to gain confidence in the system as a whole.

I feel that a refactor shouldn't break unit tests, at least not if the tools are smart enough. If you rename a method or class, its uses should have been renamed in the unit tests. If you push a method down or up in a hierarchy, a failing test tells you that the test is assuming the wrong class. But most cases of failing tests should be places where you made a mistake.

However, I agree that functional tests are the hurdle you should have crossed before shipping code. Use unit testing to get 100ms results as you work, functional tests to verify that everything is working correctly. Write them so that you could confidently push to production whenever they're green.


The article highlights this claim:

"Now, you change a little thing in your code base, and the only thing the testing suite tells you is that you will be busy the rest of the day rewriting false positive test cases."

Whenever this is the case, it would seem at least one of the following is true:

1) There are many ways the 'little change' could break the system.

2) Many of the existing tests are testing for accidental properties which are not relevant to the correct functioning of the system.

If only the second proposition describes the situation, then, in my experience, it is usually a consequence of tests written to help get the implementation correct being retained in the test suite. That is not necessarily a bad thing: with slight modification, they might save time in writing tests that are useful in getting the new implementation correct.

I should make it clear that I don't think this observation invalidates any of the points the author is making; in fact, I think it supports them.


TDD can be valuable but sometimes a hindrance. I often find myself with an incomplete idea of what I want and, thus, no clear API to start testing. Writing a quick prototype -- sometimes on godbolt or replit -- and then writing the tests and production code actually yields me better productivity.

I usually test all of the public API of something and only that. Exported functions, classes, constants and whatever else should be tested and properly documented. If writing tests for the public surface is not enough, most likely the underlying code is poorly written, probably lacking proper abstractions to expose the state associated with a given behaviour (e.g. a class that does too much).


I think this is only true up to a point. Ultimately the API of a unit of code is not fully defined by the public vs. private language features; it is defined by the conventions for its use. If a field/method is public but documented not to be used by anything except, say, some tests, then it shouldn't be considered part of the actual API.

Even in languages which have a very powerful type system, there are assumptions that have to be left to documentation (e.g. that the Monad laws are respected by types which match the Monad typeclasses in Haskell). Testing parts which are documented to not be relevant is often actively harmful, since it causes problems with later changes.


I think it also heavily depends on the language you are working with. For instance, unit tests are much more important in a duck-typed language than in a strongly typed language, since the compiler is less capable of catching a number of issues.


Huh, I found this more interesting than I thought I would. I hadn't heard before that the "unit" in "unit test" just meant "can run independently". I once failed an interview partly because of only writing "feature tests" and not "unit tests" in the project I showed. But actually those tests didn't depend on each other, so... looks like they really were unit tests!

Anyway, I'm still not totally sure about TDD itself - the "don't write any code without a red test" part. I get the idea, but it doesn't feel very productive when I try it. Of course maybe I'm just bad at it, but I also haven't seen any compelling arguments for it other than it makes the tests stronger (against what? someone undoing my commits?). I think even Uncle Bob's underlying backing argument was that TDD is more "professional", leading me to believe it's just a song-and-dance secret handshake that helps you get into a certain kind of company. OR, it's a technique to try and combat against lazy devs, to try and make it impossible for them to write bad tests. And maybe it is actually good but only for some kinds of projects... I wish we had a way to actually research this stuff rather than endlessly share opinions and anecdotes.


If I have the tooling all set up (e.g. playwright, database fixtures, mitmproxy) and the integration test closely resembles the requirement then I'm about as productive doing TDD as not doing TDD except I get tests as a side effect.

If I do snapshot test driven development (e.g. actual REST API responses are written into the "expected" portion of the test by the test) then I'm sometimes a little bit more productive.

There's a definite benefit to fixing the requirement rather than letting it evaporate into the ether.

Uncle Bob style unit test driven development, on the other hand, is something more akin to a ritual from a cult. Unit test driven development on integration code (e.g. code that handles APIs, databases, UIs) is singularly useless. It only really works well on algorithmic or logical code - parsers, pricing engines, etc. - where the requirement can be well represented.


BitD (Back in the Day), “unit tests” were independent tests that we wrote, that tested the system. It applied to pretty much any tests, including what we now call “test harnesses.” There weren’t really any “rules,” defining what a “unit test” was.

The introduction of TDD (before it, actually, as testing frameworks probably had a lot of influence), formalized what a “unit test” is.

In general, I prefer using test harnesses, over suites of unit tests[0], but I still use both.

[0] https://littlegreenviper.com/miscellany/testing-harness-vs-u...


That's a new term for me, thanks for pointing it out.


> arguments for it other than it makes the tests stronger

It's supposed to lead to a better design. It's easy to write some code that maybe works and that you can't actually test (lots of interdependencies, weird state, etc.), or that you only think you're testing correctly. But making the test first forces you to write something that 1. you can test (by definition), 2. is decoupled to the level where you can check mainly for the behaviour you're interested in, and 3. you won't bypass accidentally. It's not even someone undoing your commits, but some value in the call chain changing in a way that accidentally makes the feature not run at all.

I've seen it many times in practice and will bet that any large project where the tests were written after the code, has some tests that don't actually do anything. They were already passing before the thing they're supposedly testing was implemented.


> not totally sure about TDD itself - the "don't write any code without a red test" part

I'm not into TDD, but I'm absolutely into "never skip the red phase".

After fixing a bug or writing a test, I always revert and repeat the test. Same when testing manually (except for the absolutely trivial). You wouldn't believe how often the test then just passes. It's the hygienic thing to do. It's so easy to fool yourself.

About half of the time I realize my test (or my dev setup) was wrong. The other times I learn something important, either that I didn't fully understand the original problem, or my bugfix.


"Never trust a test you didn't see fail"


> OR, it's a technique to try and combat against lazy devs,

I think that many of these practices are a result of programmers starting to code before having understood the problem in detail and before thinking through what they want to accomplish. Many times programmers feel an itch in their fingers to just start coding. TDD is an improvement for some because it forces them to think about edge cases and how to test their work results before starting to code their implementation. And bonus: they can do so while coding tests.


One way I think about it is that unit tests help me maintain invariants that the compiler for the current language can't enforce for me.

That saves me from having to test every combination of inputs.
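
For example, a round-trip invariant (made-up encode/decode pair): the types only guarantee str-in/dict-out, while the test guarantees the data survives the trip; property-based tools like Hypothesis can then generate the inputs for you.

  import json

  def encode(payload: dict) -> str:
      return json.dumps(payload, sort_keys=True)

  def decode(text: str) -> dict:
      return json.loads(text)

  def test_encode_decode_round_trip():
      payload = {"id": 7, "tags": ["a", "b"], "active": True}
      # Types only guarantee str in / dict out; the invariant is that
      # the data survives the round trip.
      assert decode(encode(payload)) == payload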


Michael Feathers in "working with legacy code" tackled the unit test definition, and in the end defined a unit test as a test that runs fast.

Also a very confusing topic for me early on as I tried to grasp how to actually do it. So I read Kent Beck's book on TDD and was even more confused because he did not write tests in isolation there, and I was told to write unit tests like that. But then it hit me: most people are just cargo cultists and repeat what somebody told them.


One of the best things in Bazel is that it's very straightforward when it comes to tests - https://bazel.build/reference/test-encyclopedia#role-test-ru... - you need to classify your test as short, moderate, long, etc., and also how much memory it takes. In addition, if your test requires isolation, you declare that too.

So potentially one can combine multiple independent code bases that share the same testing framework (say gtest) into a single executable, and execute them in a distributed fashion using sharding on multiple machines (e.g. depending on what changes were detected in the presubmit/preflight/whatever-you-call-it phase).

And because the tests were marked as such and such, it'll use that knowledge to group them better, and if they no longer qualify, an error is printed out along with how to fix it (if it's worth fixing).


> Write the tests from outside in. With this, I mean you should write your tests from a realistic user perspective. To have the best quality assurance and refactor resistance, you would write e2e or integration tests.

Yah, yah. But good luck trying to figure out what went wrong when the only failing test you have is an e2e or integration test.


My e2e tests automatically dump a multitude of debugging information and throw open a console from which I can quickly fire up code debuggers, logs, screenshots, network traces and browser traces in a few seconds.

Building test tooling is hard I'll grant you. It requires engineering chops and being up to date on the latest tooling. But not luck.


I think this article misunderstands the motivation for unit tests being isolated and ends up muddying the definition unnecessarily. The unit of unit testing means we want to assign blame to a single thing when a single test fails. That’s why order dependencies have to be eliminated, but that’s an implementation detail.


<< Originally, the term “unit” in “unit test” referred not to the system under test but to the test itself. >>

Retroactive continuity - how does it work?

For today's lucky 10,000: "unit test", as a label, was in wide use prior to the arrival of the Extreme Programming movement in the late 1990s. And the "unit" in question was the test subject.

But, as far as I can tell, _Smalltalk_ lacked a testing culture (Beck, 1994), so perhaps the testing community's definitions weren't well represented in Smalltalk spaces.

"The past was alterable. The past never had been altered."

(Not particularly fair to single out this one author - this origin myth has been common during the last 10 years or so.)


At my job we wrote tests in Smalltalk, and that was before Kent's work. It wasn't the later tests-first discipline. If I recall correctly, writing tests later was fairly common then in Smalltalk shops. SUnit also had the virtue of standardizing the terminology and testing frameworks.


I am not going to say that some of these testing religions don't have a place, but mostly they miss the point. By focusing on TDD or "code coverage", the essential questions are missed. Instead of focusing on methodology, I recommend asking yourself simple questions, starting with:

1. How do I meet my quality goals with this project (or module of a project)?

This is the root question and it will lead to other questions:

2. What design and testing practice is most likely to lead to this outcome?

3. What is the payoff for this module from a given type of testing?

4. How can I be confident this project/module will continue to work after refactoring?

etc. etc.

I have used TDD-style unit testing for certain types of modules that were very algorithm-centric. I have also used only integration testing for other modules that were I/O-centric without much logic. I personally think choosing a testing strategy as the "one right way" and then trying to come up with different rules to justify it is exactly the inverse of how one should be thinking about it (top-down vs bottom-up design, of sorts).


I took a break from big tech and joined a startup. It was infuriating how people were actively opposing my TDD approach. It was redeeming when I was shipping one of my projects (a service for integration with a 3rd party): product managers and others were expecting we would need weeks to test and fix the bugs, but instead the other party just said "it's perfect, no comments".

All because I was "wasting my time" on writing "useless tests" that helped me identify and discuss edge cases early in the process. Also, I could continue working on parts even while waiting for a response from product managers or while DevOps were still struggling to provision the resources.



