It appears as if this essay is providing great advice. But is it?
What is the reproducible, objective process or behaviour this essay is advocating? If I give this essay to five teams and then look at what they are doing to follow its advice, will they all be doing the same thing? What results does this essay suggest we will obtain from following its advice? How can we be sure it will be better than the results we would obtain from mindlessly chaining ourselves to the red/green light?
For better or for worse, the red/green light protocol is specific. As such, it can be criticized. A specific protocol has edge cases. You can argue slippery slopes. It's a stationary target. Whereas, vague and nonspecific generalities are more difficult to criticize. There are no details, so how does one identify edge cases?
It's great to hear about this one specific anecdote. Beyond that, what am I supposed to walk away with besides the knowledge that slavishly following a process can be bad?
Today I was working with a new library. I wrote a 30 line sample program just to make sure I understood the library semantics, and that it would at least do what I thought in a situation similar to the actual code base.
I called over another developer to sanity check my thinking, and he spent 10 minutes going on about how I should have just written tests for it in our codebase instead.
He would not take any of these as valid discussion points:
* I spent nearly 7 minutes on my sample code; writing tests, including the appropriate mocks for our codebase, would have taken 70.
* I didn't even know we would be moving forward with that library, why start integrating it into our infrastructure?
* I'll write tests of our code using the library, when we figure out what the thing we're building actually does.
* I would have to write that same sample program before I can even write tests, to know how to use the library.
I do know now not to include that guy in my thinking process ever again. If tests aren't perfect, any time I might gain by including someone is lost to dealing with his "TDD is my holy saviour" rants about how I should learn.
Worst part: this guy takes days to write "tested" code that still doesn't actually do what it is supposed to.
It's ok to be untested for discovery. Even in TDD.
In pure TDD you are supposed to throw it away afterwards and redo it cleanly (with learnings) test-first.
It doesn't seem to matter much that the end result will be the same whether you write up some tests for that code afterwards, or throw the code out, write tests, then write the same code again. If the tests didn't come first, it's not TDD, and now we can't follow some holy procedure.
Pure madness and everything that is wrong with TDD is summed up in that simple statement: "In TDD you would throw it away and redo it".
But if you are working on critical parts or products you know will be in use for a longer time it is usually the right approach.
To defend why you should throw spike results away in TDD:
- Usually during a spike you learn how to use an api/framework. You are focusing on very low level solutions.
- You made it work the first time, but if you did it again you wouldn't do it the same way. Left to yourself, you usually don't redo it.
- Usually the second time the solution is clean and properly structured.
TDD is very focused around proper software design. And for the areas where you actually need that it is a great tool.
To the GP, yes, the less you write tests, the more your code base rolls up into a giant ball of hair and wax, and you can't investigate the behavior of any part of it without mocking out the rest of it. That is why we say testable code is usable code.
You do not want to wind up with important code that is not tested. I spent hours today chasing down bugs in code that I wrote over the last month that I would have never known were there if I had not been writing tests. I value tests.
But on the other hand there is a valid role for experimenting and trying things out. In that mode you're not writing production ready code - you're throwing stuff together. When you write code in that mode, following a testing mantra is worse than useless. I say worse than useless because the mental energy spent testing gets in the way of experimenting and refactoring.
The right way to reconcile those two modes is to write your experimental code, think about it, figure out how you want it written, then once you know how to write it, go back and write it "for real", complete with tests. In fact the carefully tested code that I was writing today is based in large part on ideas that I figured out in a throw away prototype written some time ago, in a different language, that had very little in the way of tests.
If all you got out of the article was that slavishly following a process (like TDD) can be bad, I'll consider that a win for my writing. Because my general feel of the community is that testing is some sacrosanct practice that must be adhered to without question. And I question that.
I write tests. I'm not anti-test. I just imagine I write far fewer tests than most. In my opinion we should constantly question whether a certain practice (in this case, writing a specific test) is worth it. I think that's a healthier approach to development than blind red/green light.
My biggest problem with your statements comes down to the idea that you claim TDD can be bad (particularly, it would seem, for startups), and yet you do not give an alternative with equal or greater benefits and without the perceived overhead cost. You say we should constantly question whether certain practices are worth it instead of blindly following red/green lights. Fair. But how will questioning practices tell you that your code is not broken when a junior developer makes a change "somewhere"? How will questioning practices let you refactor better when you pivot quickly?
I have worked in many startups and I know time is money and bankroll is king. But I believe that TDD allows you to keep those financial concerns in the black longer. If not TDD, what is the alternative? "Not writing code with TDD" is certainly an alternative, but is it a good one?
If you aren't market tested and/or market validated, I would suggest that getting something out the door is probably more important than test coverage. Because odds are those tests you wrote will be worthless when you pivot. There's a cost to writing tests, and the benefit from those tests is increased when they are reused. If you use a test once, then pivot and have to scrap it, you would have been better off hacking your way through the first release and getting to that first pivot faster.
Once you start to gain market traction and you actually know what the market wants you to be then your world changes and your approach to writing code should as well.
And as for junior devs, that's a tough one and a reason I avoid hiring junior talent. Not avoidable for all managers, but I didn't write my post to be the new bible. Quite the opposite: I think we're a bit too worried about making sure we're doing what everyone else says we should be doing. Rails is omakase and all...
I'm just tired of hearing everyone talk about TDD as if it were a given. Protecting against SQL-injection? That's a given. Being a TDD-based web startup? Not so much.
The irony is that if testing is over-hyped, it's not over-hyped in the places where it would be most needed. Too many big-budget, long-term developments don't use testing at all. Because there are no tests, people are scared to make changes that might break something, so they never refactor, and the code becomes a big mess, which obviously makes it even scarier to change.
I felt like a hungry man reading an article about how food is over-hyped :)
Is your community full of laptop researching prototypers, or million request per second distributed system operators?
Right, but in this case, he's writing about testing. One would argue that's relevant now because the foolishness around how to test, and how often, has reached ridiculously dogmatic levels.
> What is the reproducible, objective process or behaviour this essay is advocating?
Why does it have to have one? Many articles that make it to HN call out software practices that are just ridiculous and help spark a discussion about them. Why can't this be one of those? Why can't it be the article that makes a young developer who's slowly turning into a TDD zombie ("must ... test ... everything ... all ... the ... time") pause for a second and go "hmmmm"?
Really, the only cure is judicious/experienced application of development practice. Is unit testing always appropriate? No! Is it sometimes appropriate? Yes! But that is not an easy message for people to understand.
I've seen unit tests on prototypes before because that was standard practice, only to have the entire prototype scrapped at the end...because that was also standard practice (never ship prototypes :) ). Confusing.
This is one of those issues you are supposed to lie about.
Tests are like crime in politics -- no politician is going to say "I am soft on crime, I think we should reduce sentences." So crazier and crazier laws are created: mandatory minimum sentences for possessing small amounts of drugs, all kinds of craziness. People know it is crazy, but nobody can speak up and remain standing.
Same thing with tests. Nobody can say "fuck it, stop writing tests, the customer will never run this code, or this is too fucking obvious, test stuff that matters". That is considered irresponsible. Everyone is supposed to worry about not having enough tests.
Now there is a subtle difference when it comes to shipping on a schedule: in many companies tests are ignored -- but silently. Everyone loves talking about tests, but try telling your boss you spent 2 weeks adding tests to your code. They might not like it. Try doubling the time you promise for a feature by saying you need to write tests for it. Or, if you are given free rein, try saying "I won't do all the new awesome features, I'll write more tests" -- it will probably be approved, but in the end everyone will praise the guy who chose to work on and deliver features, even though they might be completely buggy and unusable. So that is the other side, if you will.
I don't know about everyone else, but I definitely worry about having too many tests. For every test I write, I have to weigh how useful the test is versus how likely it is that the thing it is testing will change. If its usefulness is overshadowed by the likelihood that it will be a burden later on, it does not get written.
That, IMO, is the best argument for TDD. By having a process that forces tests first, you never get into the situation where you are cutting critical tests for the schedule's sake.
For me, I ask the question, "Is this code adding more value than other code I could be writing?" If the answer is no, I do something else. Of course, this is completely subjective, but my years of experience count for something.
On the other hand, with a process like TDD you also never get into the situation where you can cut non-critical tests.
Does each test tell you something valuable?
Is the cost of implementing and maintaining that test less than whatever value the test offers? That is, is the test cost-effective?
Could you have found the same information more effectively some other way? That is, what is the opportunity cost of writing and maintaining that test?
git rm <code that customer will never run>
The customer will never run the code, but that doesn't mean they don't buy your product because of the code.
This is for a small niche industry where everyone knows everyone; a bad word or hit can mean the end of the company.
Really? I mean, tests might be overhyped, but so far it looks like the author draws incredibly general conclusions from some isolated incident. Why did the tests become irrelevant? Is their scoring algorithm now doing something completely different? Do they now have different use cases? Were they hard-coding scores when the actual thing that matters was, say, the ordering of the things they score?
> Only test what needs to be tested
well, thanks for the helpful advice :) Care to share what, in your opinion, needs to be tested?
I used the phrase "only test what needs to be tested" intentionally. How should I know what you need to test? But if you accept that you should only test what needs to be tested, then there is an implication that some (I'd venture to say most) of what you write doesn't need to be tested. And that's a liberating concept. You aren't obligated to test everything, but you should test what really matters.
What needs to be tested is probably directly related to the size and stability of the company (assuming size and stability have a positive correlation). I would venture to surmise that young start-ups have almost no need to test anything. That comment isn't meant to be inflammatory, but I look at testing too early like optimizing too early. There's no reason to shard a database until you absolutely have to, and I don't think you need to test something until you know it's mission critical.
For example, Stripe offers a service that were components of it to fail would jeopardize their entire business. Their code base is obligated, by its nature, to have more stringent tests. But the startup in the garage next to yours who is still trying to determine MVP and will probably end up pivoting five times in the next two weeks? Save yourself a lot of grief and don't worry about the tests. Once you know what you are going to be, once you know what you can't afford to lose, well, wrap those portions up with some solid tests.
In -my- anecdotal experience, this is the excuse developers use while they write bug-ridden code without tests. Maybe they save a little time at the early stage of the development cycle, but the real reason seems to be avoiding work they don't find to be as fun as hacking out code.
After that code is hacked out, we then waste a bunch of cycles across multiple teams while we find and fix the most obvious of their bugs.
After the release goes out, we spend even more time finding previously undiscovered bugs and fixing them -- which hits actual user satisfaction.
All because some lazy sod thought he was too smart to need tests, and that he could be trusted to decide when his code didn't require them.
Fuck that. I've been doing this too long and seen this happen too many times. If you're on my team, you write the damn tests.
- They don't see the benefit because they are forced to write lots of tests that do not, and never would have, provided value (stupid getter and setter tests being the canonical example).
- Writing tests is hard or slow, either due to things not being tooled out or because the code has not been architected to be testable.
- Writing tests causes lots of false positives, fails to catch real bugs, or ends up being extremely brittle. The classic example here is religious mocking of the database in the name of performance, which fails to catch most bugs since it's, you know, not actually hitting a database.
- The suite is slow or causes deployments to slow down such that adding tests adds a "tax" to everything else people do.
I've had enough experiences with all of the above phenomena to have become completely skeptical of extensive testing, since unless you are a real pro you will inevitably fall short on at least one of these things.
You use tests to weed those people out. That's fine, if it works for you.
I'm anal about knowing everything that's happening in my code. I compare git commits to ticket numbers to keep abreast of how things are changing over time. A bug comes through on Goalee and I know which handful of files it will be in. Honestly, I'm quite uptight about this stuff. I catch most shortcuts personally before they go out, and have the appropriate conversation with the responsible party.
Don't make stupidly broad statements like "I've never seen untested code that wasn't crap". Well, unless by "untested" you mean "code that was never even run or QA'd a little". I'm pretty sure you mean "code that didn't go through TDD" though, in which case: you are very, very wrong.
I'd argue Linux has significantly more than 'no' tests: http://stackoverflow.com/questions/3177338/how-is-linux-kern... -- it's not TDD, and it's not centralized, but then again, neither is Linux development. And while you might convince me that pre-XP kernels went without unit testing, I'd point out we hardly call them bastions of stability and security. Try certifying a 360 or ModernUI title under the Microsoft banner and see if they'll let you get away without writing a single test. I'd wager anyone who successfully accomplishes this has spent far more effort arguing why it should be allowed than writing some tests would take.
And, given the sheer girth of their security processes (patching, QA certification, tracking) and toolset (through virtualization, fuzzing, unit testing, CI build servers), it would take much more effort to convince me that they do absolutely no unit testing whatsoever on their kernel in this day and age. Far more than you could reasonably give: I suspect such convincing would require breaking several NDAs and committing several felonies.
I have. The only reason they work at all is the enormous number of man-years spent banging on the poorly tested code after it's written. The number of regressions that pop up in core subsystems of the kernel is staggering. VM bugs still happen in commercially shipping systems like OS X because of small changes to extremely fragile and untested code.
Linux isn't a paragon of code quality, and neither are most network stacks. If anything, you're proving my point -- code written with that methodology has required enormous amounts of work to stabilize and to prevent/fix regressions going forward.
Case in point: a few years ago, I found a trivial signed/unsigned comparison bug in FreeBSD device node cloning that was triggered by adding/removing devices in a certain order; this triggered a kernel panic and took two days to track down. The most trivial of unit tests would have caught this bug immediately, but instead it sat there languishing until someone wrote some network interface cloning code (me) that happened to trigger this bug faaar down below the section of the kernel I was working in.
This kind of thing happens all the time in the projects you list, and it's incredibly costly compared to using testing to avoid shipping bugs in the first place.
That's the statement I was replying to - and yes, code with tests is going to be better. But to call all code without tests crap is directly saying that the Linux code is crap, that the BSD code is crap, tcp/ip code is crap, etc. I disagree completely - the code is awesome and has built billion dollar industries and has made most peoples lives better. Could it be improved with more tests? Sure. Everything can be improved. Calling it 'crap' however is insane.
I, personally, would be more than happy to have produced the 'crap' BSD code that has propelled Apple into one of the most valuable companies today.
It shipped. That doesn't prove your point. Code that is expensive to produce and ships is better than no code at all. That doesn't mean that this is the best way to produce code.
What exactly do you think is awesome about the code?
> I, personally, would be more than happy to have produced the 'crap' BSD code that has propelled Apple into one of the most valuable companies today.
I did produce some of that code.
Nothing that you're saying justifies why producing worse code, less efficiently, is better than producing good code, more efficiently. Your position assumes a false dichotomy where the choice is between shipping or not shipping.
The truth of the matter is that the legacy approach to software development used in those industries has more to do with the historical ignorance of better practices and the resulting modern cultural and technical constraints. In the era that most of that code was originally architected, it was also normal to write C programs with all globals, non-reentrant, and non-thread-safe. Are you going to claim that this is also the best way to write code, just because it produced some mainstays of our industry?
Well, you say it's a false dichotomy, but if TDD really does reliably produce better code and more efficiently than a non-TDD approach, how come hardly any of the most successful software projects seem to be built using TDD? It's been around a long time now, with no shortage of vocal advocates. If it's as good as they claim, why aren't there large numbers of successful TDD projects to cite as examples every time this discussion comes up?
What exactly makes you think there aren't?
The fact that every time I have ever had this discussion, the person taking your position comes back with a question like that instead of just listing counterexamples.
Can you name a few major projects, preferably on the same kind of scale as the examples that have been mentioned in other posts in this discussion, that were built using TDD? Not just projects that use some form of automated testing, or projects that use unit tests, or projects that use a test-first approach to unit tests, or anything else related to but not implying TDD, but projects where they really follow the TDD process?
The development processes for major open source projects tend to be public by their nature, and there are also plenty of people who have written about their experiences working for major software companies on well-known closed source projects, so telling a few TDD success stories shouldn't be a difficult challenge at all.
Operating systems? No. Sorry. Operating systems, as a rule, predate modern practices by some 1-4 decades. Making a kernel like Linux, Darwin, or FreeBSD testable would be a task somewhere between "massive headache" and "really massive headache". I've done some in-kernel unit testing, but only on our own driver subsystems that could be abstracted from the difficult-to-test underlying monolithic code base.
Outside of kernels/operating systems, just Google. Unit/automated testing is prevalent in just about all modern OSS software ecosystems.
A few examples, off the top of my head.
Apache APR: http://svn.apache.org/viewvc/apr/apr/trunk/test/README?revis...
Go: https://code.google.com/p/go/source/browse/src/pkg/math/all_... (see all *_test.go files)
However, I would also say that if you've never seen code untested in that way that wasn't crap then you're not looking hard enough. Some of the most impressively bug-free projects ever written used very different development processes with no unit testing at all. Space shuttle software, which is obviously about as mission critical as you can get, went in a very different direction. Donald Knuth, the man who wrote TeX among other things, isn't much of a fan either.
For example, static code analysis is sometimes superior to testing, because it can prove the absence of some wide classes of bugs.
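One concrete flavour of this in Python is static type checking. In this sketch (the function and prices are hypothetical), a checker like mypy would reject arithmetic on a possibly-`None` value, forcing the guard below at review time rather than leaving the bug to be discovered at runtime:

```python
from typing import Optional

def unit_price(quantity: int, discount: Optional[float]) -> float:
    """Price with an optional fractional discount; None means no discount."""
    # Without this guard, `1 - discount` would be arithmetic on
    # Optional[float], which mypy rejects statically -- no test case
    # needs to stumble onto the None path to find the bug.
    rate = 0.0 if discount is None else discount
    return 100.0 * quantity * (1 - rate)
```

The proof holds for every possible caller, which is exactly what a finite set of test cases cannot give you.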
And in code reviews we often find subtle bugs that would be extremely hard to write tests for (e.g. concurrency related bugs).
You can also decrease bug rates by writing clean, understandable code. Which is often related to hiring a few great programmers instead of a bunch of cheap ones.
The trouble is that you have to do this for a while to get to the point where you can both feel and recognize pain, or the lack thereof.
2) Test enough during development to support later regression tests, and to make sure that the design is testable. This can usually be achieved with less than 20% coverage. But if you write production code that's so screwy it can't be regression tested, then you've got big problems.
3) Test any parts that scare you or confuse you or make you nervous. Use "test until you're more bored than scared" here.
I am not the OP, but I think unit tests are more useful for code that has a lot of edge cases (such as string parsing) and for code which causes the most bugs (as seen in black box testing or integration tests.)
Also, some code is just easier to develop if you create a harness that runs it directly instead of having to work your way to the point in the program where it would execute it. If you do this, you might as well turn it into a test.
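For instance (a hypothetical parser, not from the thread), the throwaway harness you use to poke at a function directly is often one `assert` away from being a regression test:

```python
def parse_version(s):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in s.strip().split("."))

# The ad-hoc harness written to exercise the code directly...
if __name__ == "__main__":
    print(parse_version("1.10.2"))

# ...keeps almost the same shape once promoted to a test:
def test_parse_version():
    assert parse_version("1.10.2") == (1, 10, 2)
    assert parse_version(" 2.0 \n") == (2, 0)
```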
* A test should accept any correct behaviour from the tested code. Anything which is not in the requirements, should not be enforced by the test.
* A test should not use the same logic as the code to find the "right" answer.
* A function whose semantics are likely to be changed in the next refactor should not be invoked from a test.
* Whenever a test fails, make a note of whether you got it to pass by changing the test or by changing the code. If it's the test more than 2/3 of the time, it's a bad test.
* If you can't write a good test, don't write a test at all. See if you can write a good test for the code that calls this code instead.
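As a sketch of the first two bullets (the function under test is made up): the brittle test enforces an ordering the requirements never mention, so it breaks on any refactor that returns the same correct result differently, while the second test accepts any correct behaviour.

```python
def top_scores(scores, n):
    """Return the n highest scores; no ordering is promised."""
    return sorted(scores, reverse=True)[:n]

# Over-specified: enforces a particular order, which is not a requirement.
def brittle_test():
    assert top_scores([3, 1, 2], 2) == [3, 2]

# Requirement-level: passes for any correct implementation.
def robust_test():
    result = top_scores([3, 1, 2], 2)
    assert sorted(result) == [2, 3]
    assert len(result) == 2
```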
Where I've found a ton of value has been in writing almost artificial-intelligence-driven integration tests. Write a bot that uses your service in some way, as fast as possible. Run fifty of these bots simultaneously and see what happens. Then have some way to validate state at various points (either by tallying the bots' actions, or by sanity checks). Bugs will come falling out of the sky. Then, in the future, when you get a bug, the challenge becomes updating the integration-test bots' behaviour so that they can (preferably quickly) reproduce the bug.
I mean, I think that this is dependent on the domain of your software, but I think it's a good strategy for many areas.
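A toy sketch of the idea, using threads against a made-up in-memory service (all names hypothetical): each bot hammers the service with random operations, and a tally of the bots' actions is checked against the final state.

```python
import random
import threading

class Account:
    """Stand-in for the service under test."""
    def __init__(self):
        self.balance = 0
        self.log = []  # every applied amount, kept for the final tally
        self._lock = threading.Lock()

    def apply(self, amount):
        with self._lock:
            self.balance += amount
            self.log.append(amount)

def bot(account, ops, rng):
    # One bot: fire random operations at the service as fast as possible.
    for _ in range(ops):
        account.apply(rng.randint(-10, 10))

def run_swarm(n_bots=50, ops=1000):
    account = Account()
    threads = [threading.Thread(target=bot, args=(account, ops, random.Random(i)))
               for i in range(n_bots)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Sanity checks: final state must match the tally of the bots' actions.
    assert account.balance == sum(account.log)
    assert len(account.log) == n_bots * ops
    return account
```

A real version would point the bots at the actual service API and validate against its persistent store, but the shape is the same: random load plus independent state validation.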
Yes, it's easy to write bad tests. But that does not reduce the value of good tests!
result = get_foo(bar=12)
This lets you verify that this particular branch (when bar=12) is executed, and that your results are as you expect. If you change some of the underlying calculations, things can break (as you get different answers), but then you at least have a test that lets you ensure that changing the answers is what you want to do. Sometimes, you want to change the way you calculate something and get the same answer, after all.
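Spelled out with a made-up `get_foo` (the real one isn't shown in the thread), the test pins down the answer on that branch, so a change to the underlying calculation has to be confirmed deliberately:

```python
def get_foo(bar):
    """Hypothetical function with a branch on bar."""
    if bar >= 10:
        return {"unit_price": 97.12, "bulk": True}
    return {"unit_price": 100.00, "bulk": False}

def test_get_foo_bulk_branch():
    # Exercises the bar=12 branch and freezes its current answer.
    result = get_foo(bar=12)
    assert result["bulk"] is True
    assert result["unit_price"] == 97.12
```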
So what do you do? Typically one writes a mocked up service that returns the expected results, pass it in somehow, and write your tests like you did above. So if you have a lot of services, you end up doing what I said - testing lines of code. Here you're testing that FooService gets called. If it also called BarService, you'd be testing that BarService gets called. And so on.
But then later, we decide that FooService is no good - we want, nay demand - a FooService2, with a new API. However, get_foo is to behave the same. So what now? Only one choice - update all of the tests to have a mocked up FooService2.
As the code base grows, I find this becomes annoying to maintain (although to some extent a necessary evil, of course).
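A sketch of that maintenance burden with `unittest.mock` (the names are illustrative): the test encodes FooService's interface, so swapping in FooService2 breaks every mock even though `get_foo`'s observable behaviour is unchanged.

```python
from unittest import mock

def get_foo(service, bar):
    """Hypothetical: delegates the lookup to an injected FooService."""
    return service.lookup(bar)["unit_price"]

def test_get_foo_with_mocked_service():
    foo_service = mock.Mock()
    foo_service.lookup.return_value = {"unit_price": 97.12}
    assert get_foo(foo_service, bar=12) == 97.12
    # This line welds the test to FooService's interface: if FooService2
    # renames lookup() to fetch(), every test that builds this mock must
    # change, even though get_foo still behaves identically.
    foo_service.lookup.assert_called_once_with(12)
```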
The alternative I'm saying is that instead of spending a ton of energy unit testing get_foo, hammer on it and write good state validators. Your unit test shows that get_foo for bar = 12 returns a unit_price of 97.12. Great, but I don't think that's as interesting as building a fake database (so we know what the expected returns are), and then making 50,000 requests per second with random data for 12 hours, and then through runtime assertions and as a post-process validating that the service actually did what it was supposed to.
In the example you gave (of a mocked up service), what would happen if you didn't mock the service? That would mean you couldn't call the service in your tests, right? (Except the tests that tested the service integration explicitly.) How could you change your design to make your code testable under that constraint?
The answer depends on your situation, but one way I've seen this play out in the past is to make more value objects and create a domain layer that operates on value objects rather than the service. The service is responsible for creating and persisting the value objects, but the domain layer handles the real work of the application.
As a result, testing the important part of the application can be done completely independently of the service. No mocks required. Now I can upgrade to FooService2 without changing any tests except for FooServiceConnection's. My value objects and domain layer remain unchanged.
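A minimal sketch of that shape (all names invented): the domain function works on plain value objects, so its tests touch no service and no mock; only the thin adapter knows the service's format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineItem:
    quantity: int
    unit_price: float

def order_total(items):
    """Domain logic: testable with plain objects, no mocks required."""
    return sum(i.quantity * i.unit_price for i in items)

class FooServiceConnection:
    """The only code that knows FooService's wire format."""
    def __init__(self, service):
        self._service = service

    def line_items(self, order_id):
        rows = self._service.rows(order_id)
        return [LineItem(r["qty"], r["price"]) for r in rows]
```

Upgrading to FooService2 then means changing only `FooServiceConnection` and its tests; `order_total` and its tests stay untouched.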
If you're bored while programming, look for the duplication hiding in your design. In the case you're describing, it's hiding in the oft-repeated dependencies on FooService.
A 12 hour stress test is going to catch different things than a unit test. A simple suite of regression tests can be automatically run by a continuous integration tool to flag if the build is broken and alert the dev/team so things can be fixed quickly.
In this example it seems you are thinking of writing tests as a separate step from writing the code, which is part of why it seems like a chore. Make small changes, update the tests until everything passes & new code is exercised, commit, repeat. (Or if you are better disciplined than I, update your tests first and then write code until tests pass, repeat) Monster refactors, while sometimes necessary, are best avoided when possible.
Suppose there's a race condition in some library that you're using. No unit test in the world is going to catch that. Now, unit tests certainly have their place - but my point is that from what I have seen, unit tests catch the boring bugs, while integration tests catch the interesting ones(by interesting here I mean obscure or subtle - deadlocks, invalid state, etc), while at the same time inferring the boring bugs (ie. add(5, 3) returns 7 instead of 8), so that hour for hour, especially with limited resources (ie. a small startup), integration testing has the potential to give you a lot more value.
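A toy illustration of that split (the counter is made up): the single-threaded unit check passes with or without the lock, while the concurrent check is the one that would notice lost updates if the lock were removed.

```python
import threading

class Counter:
    """Toy shared resource. Delete the lock and the unit check below still
    passes, while the concurrent check starts failing intermittently."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1  # read-modify-write: not atomic without the lock

def unit_check():
    c = Counter()
    c.increment()
    return c.value == 1  # the "boring" bug class

def stress_check(n_threads=20, n_incr=5000):
    c = Counter()
    def worker():
        for _ in range(n_incr):
            c.increment()
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c.value == n_threads * n_incr  # the "interesting" bug class
```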
However I would still add that simple suite of regression tests (in my CRUD app these are almost entirely integration tests), often speed up development by more than the time it takes to write the tests in the first place. So to say a startup doesn't have time for them seems shortsighted.
Recently I had a code base where we decided that, due to a delightful new feature our customers were going to be quite pleased with, we would need to switch out our old queuing system for a new one. In doing so, well more than half of our huge test suite turned red. This told us two important things: 1) that the queuing system touched a lot of areas of code we needed to think about, and 2) where in the code the queuing system had touch points.
Ultimately we were able to put in the new queuing system, fix the areas that were broken by the change, and have the confidence at the end of the process that we had not broken any of the areas of the code that were previously under test. (This does not mean that our code was bug free of course, only that the areas under test were still working in the prescribed way, but that is a discussion for a different article.)
I believe that this would have taken a team of people weeks to do previously. I was confident that the change was ready after only 3 days with 2 developers. I would not trade my tests. There is a cost associated with everything, but I believe tests are the least costly way to get highly confident software built.
What happens is usually that you have 14,000,000 compile errors. Well, congratulations! That's the difficult part over. Now it's time to relax! Start at error 1, and fix it. Repeat. Every now and again, build. Once it builds again, it will usually work; if not, it's always something quite simple. If you have enough asserts in your code, then you can be doubly confident that there's nothing wrong with the new code.
I've had good success with this approach, for all sorts of changes, large and small. I've given up being surprised when the result just builds and runs.
I have no real idea how you would do this reliably in a large program written in something like Python. Sure, you'd fix up your tests easily enough... but then what? Don't you have a program to maintain as well? :)
Jonathan Blow wrote about this years ago: http://lerp.org/notes_4/index.html
> There's often one corner case that works just a little bit differently.
You might just as well miss that one corner case in your test suite. This is the problem with tests - you can never be sure.
I'm not saying it can't go wrong when you're finished - it just usually doesn't. And it just doesn't take much effort to fix when it does.
Anyway, while it was a - slight :) - exaggeration for effect, even in well-factored code small changes can have wide-ranging repercussions.
You're right people pay for features, but lagging a little at the beginning to establish good TDD culture pays off in spades later on. Shipping product is something you have to do continuously, and you arguably create more value as time goes on, so ensuring you can continue to ship product in a timely manner is a great thing for organizations.
Being able to test that the whole system works as intended gives a better return on investment, in my experience, than testing small bits in isolation. The errors are, often as not, in the glue between the small bits.
The general workflow I've grown fond of is to write mostly higher-level tests that test the system as a whole, and then only write fiddly unit tests when there actually is a bug to fix. Those unit tests then stick around, but I don't feel bad about blowing them away without replacing them if the thing being tested changes significantly enough to make them useless.
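That bug-driven workflow might look like the following in Python's stdlib unittest (the slugify function and its former whitespace bug are invented for illustration) - the fiddly unit test exists only because the bug did:

```python
import re
import unittest

def slugify(title):
    # The fix: collapse runs of whitespace in one pass, so that
    # "Hello  World" becomes "hello-world" rather than "hello--world".
    return re.sub(r"\s+", "-", title.strip().lower())

class SlugifyRegressionTest(unittest.TestCase):
    # Pinned to a real bug; if slugify changes significantly enough to
    # make this useless, it can be thrown away without guilt.
    def test_consecutive_spaces_collapse(self):
        self.assertEqual(slugify("Hello  World"), "hello-world")
```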
Originally I was going to title this "Tests are overrated" but that both seemed like linkbait and distorts my actual opinion.
I've been on projects where they tested to make sure that certain values were unique in the database and I couldn't help but think they: didn't understand the framework; didn't understand what tests are meant to do; didn't understand database key constraints; or all of the above.
Tests have their place. But they are a means, not an end. And I see a lot of people confusing them for the end.
But, again, I don't dislike tests. I just dislike what I perceive to be a current overemphasis.
Yes, it is time-consuming and invisible to your customers, but I imagine so is setting up a frame for a house instead of just stacking bricks. The structure, flexibility and peace of mind you get from a comprehensive test suite pays off when you have a 50-brick tall structure to put a roof on.
The current shop I'm at is maintaining a huge code base, parts of which go back 20 years. Because test coverage is so low, there is a real reluctance to refactor.
The first thing I did when I started here was to clean up the code, renaming misnamed variables to get it in line with the coding standard and adding auto pointers here and there to head off memory leaks. By gosh, I nearly got fired.
If you are going to fearlessly edit your codebase, you need to know that regressions are going to be caught. You need automated testing.
What I don't understand about comments like this is that a whole section of your code, both runtime / deliverable code and test code had become worthless. But, you only seem to view the discarded test code as wasted effort. Either the tests have value or they don't. And, if you write tests, and then discard the code they test, you'll likely also discard the tests. But, that doesn't change whether or not the tests had value, nor whether the new tests that you'll write for the new code have value.
Not so. The code demonstrated that the first algorithm wasn't good enough and provided the experience needed to write the second one. The tests (hopefully) made the first algorithm's code maintainable, but it turns out there was no need to maintain it.
The same thing with greenfield development. I've seen steaming piles of shit with huge test suites. Absolutely zero insight into the problem. No craftsmanship at all, nothing interesting about the application. But a set of tests.
It's like the suite is proof enough that there was a job well done. I fear that a lot of development is devolving into nothing more than superstition and hype, backed up by agencies that like to bill a lot and amateurs who need a justification for their timelines and ineptitude.
I wrote a similar piece here: http://blog.circleci.com/testing-is-bullshit/
Unit tests are, more than any other factor, a design tool. Like any other design tool (UML, specifications, etc.), when the design needs to change, you throw them out. If it takes longer to design a system with unit tests than without them, one of two things is true: 1) you should not write unit tests, or 2) you should learn how to write unit tests.
I think it just goes to the way human beings handle original ideas, first they fight them, then they embrace them, then they take them to ridiculous extremes as they try to substitute rules for common sense in applying them.
You can see it in politics, religion and almost any really popular area of human endeavor.
Testing falls into the same category. I have had interviewers look me in the eye and, in all seriousness, declare that developers who don't write tests for their code should be fired, or that their test suites cannot drop below x% code coverage. Dogma is a horrible thing to afflict software teams with - whether it is pair programming or mandatory code reviews, if there are no exceptions to the rule or situations where you don't have to apply it, it's probably a bullshit rule IMO.
Me, I like to ship shit, and I like to look at tests as a way to help me ship shit faster, because the less time I spend fixing regressions the more time I can spend actually getting more code/features out that door.
So my only metric for writing tests is this ... "is this going to help (me or a team member) not break something months from now, when I change code somewhere else".
I honestly don't care about anything else.
This is particularly evidenced by the fact that Haskell--certainly a "strongly typed functional language"--also has some of the best testing facilities of any language I've seen. QuickCheck simply can't be beat, and you can cover other parts of the code with more standard tools like HUnit.
Now, there is some code--only very polymorphic code--where the type system is strong enough to give a proof of correctness. For that sort of code, which you're only likely to encounter if writing a very generic library, you can get away without testing. But that is not even the majority of all Haskell code! And even there you have to be careful of infinite loops (e.g. bottom).
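For readers outside Haskell, the QuickCheck idea can be sketched in plain Python without any library: generate many random inputs and assert a general property, instead of hand-picking cases. (Real property-based tools also shrink failing inputs down to minimal counterexamples, which this sketch omits.)

```python
import random

def my_reverse(xs):
    return xs[::-1]

def check_property(prop, gen, trials=200):
    """Assert that prop holds for many randomly generated inputs."""
    for _ in range(trials):
        xs = gen()
        assert prop(xs), f"property failed for {xs!r}"

def random_list():
    return [random.randint(-100, 100) for _ in range(random.randint(0, 20))]

# Property: reversing a list twice gives back the original list.
check_property(lambda xs: my_reverse(my_reverse(xs)) == xs, random_list)
```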
Comments like this make functional programmers sound much more arrogant and clueless than they really are.
There are two kinds of test units: workflow and functional.
1 - Workflow test units are a waste of time, because no single test unit stays valid when there is a change. In other words, whenever we added/removed steps in the workflow, 99% of the time we had to change the test unit to fit the new workflow, which breaks the "write once, test all the time" concept. In my experience, having proactive developers who test the areas around the workflow they changed is much faster and more reliable.
2 - Functional test units are great. They test one function that takes certain parameters and is expected to produce a certain output, e.g. a function that calculates dollar amounts or does any kind of mathematical operation.
However, these functions tend to stay unchanged during the lifetime of a project. Therefore, the test units are rarely run.
From my experience, workflow changes/bugs represent 80% of the problems we face in enterprise software. Functional changes/bugs are rare and can be detected quickly.
This is why I agree with the author's premise that unit testing is overhyped.
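A functional test unit in the sense above might look like this sketch (the function is invented for illustration): a pure calculation with fixed inputs and a fixed expected output, and no workflow state anywhere.

```python
from decimal import Decimal, ROUND_HALF_UP

def apply_discount(amount, percent):
    """Pure function: dollar amount after a percentage discount, rounded to cents."""
    discounted = Decimal(amount) * (Decimal(100) - Decimal(percent)) / Decimal(100)
    return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Fixed input, fixed expected output - nothing to re-fit when the workflow changes.
assert apply_discount("19.99", "10") == Decimal("17.99")
```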
However, if a test exceeded this cost/benefit metric - if it wasn't really helping me get the feature written - out the window it went.
It helped when I went to refactor/fix fairly major chunks of the backend, as all those tests from back when I did the initial development were still there. It wasn't really "test first" because I didn't know what to test for until the basics of the API endpoint were in place.
This was Python if it matters (default unittest2). I do mostly Clojure when I have the choice lately.
It's not even really a matter of "does it need testing?", although you should be asking that question and building up the coverage for the critical bits.
For me it was a question of, "is this going to save me time/sanity?"
I advocated tests to the other engineers at my startup only when they were experiencing regressions/repeat bugs. I left them alone about the lack of test discipline otherwise.
My Clojure code tends to "just work" (after I'm done experiencing insanity and disconnection from reality in my REPL) to the degree that I mostly write tests when I'm making a library or touching somebody else's Git repo.
This is all fitting though. I use Clojure instead of Haskell precisely because I'm impatient, lazy, etc. Would kill for a "quickcheck" for Clojure though.
This whole debate has a whiff of excluded middle (we have 3 UND ONLY 3 TESTS <---> TDD ALL THE TIME!), not to speak of people not mentioning how tests can simply...save time sometimes.
The problem with a heuristic weight, though, is that it's a heuristic, judged against other heuristics by taste, not proof.
The obvious testing approach, ensuring that the score for each test case retains the same order as you tweak the algorithm - is overtesting. You don't care about this total order; you more likely care about ordering of classes of things, rather than ordering within those classes; or simply that 'likely' cases follow an order. Hence, you hit far too many test failures.
I'd agree that it's possible to overtest in general, but it's so easy to overtest heuristics that it needs to be called out as a special case, and it sounds like the problem here.
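One concrete way to test a heuristic without freezing its total order, sketched with an invented scoring function: assert only that items in a clearly better class outrank items in a clearly worse class, and leave the ordering within each class unspecified.

```python
def relevance_score(doc, query):
    # Hypothetical heuristic: title hits count much more than body hits.
    title_hits = doc["title"].lower().count(query)
    body_hits = doc["body"].lower().count(query)
    return 10 * title_hits + body_hits

query = "cat"
title_match = {"title": "Cat care", "body": "grooming tips"}
body_match = {"title": "Pets", "body": "a cat sat on the mat"}
no_match = {"title": "Dogs", "body": "bark bark"}

# Test the ordering of classes, not exact scores or order within a class,
# so tweaking the weights doesn't spuriously break the suite.
assert relevance_score(title_match, query) > relevance_score(body_match, query)
assert relevance_score(body_match, query) > relevance_score(no_match, query)
```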
Old, but quite relevant: http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s...
Some sane amount of testing is good. However, I'm not very convinced writing tests first is a good idea. I saw a few programmers practicing this, and when writing code they often concentrated too much on just passing the damn test cases instead of really solving the problem in a generic way. Their code looked like a set of if-then cases patched one over another. Therefore, if they missed an important case in their testing suite, they had a 99% chance of a bug. Once I ended up writing a separate, more generic implementation to validate the test suite. And it turned out there were a few missed/wrong cases in the test suite.
The kind of large pivot that the author refers to is only possible when you don't have established customers and you have a minimal product. You may as well call it prototyping. And prototyping with or without tests is indeed more a matter of taste than effectiveness.
The text above is just like the blog post: unfounded and purely based on personal opinions.
Having a strong test culture in a company of any size is important for a number of reasons, and all of them are founded by both books and articles, but also by AMAZING products.
An engineer who has a talent for writing readable and well-structured tests also tends to have a special skill for creating robust systems and reverse-engineering existing ones - which makes him exactly the person to be hired by a startup with NO or very little test culture, whose code is buggy, error-prone, coupled, and hard to maintain, so that he can apply all this knowledge to clean up crap written by people who don't like tests.
Tests are double-checking your work. ("Yup. That condition I said would be true still is!") The tighter and more lightweight we can get that feedback loop, the better our output and the more confident in our work we'll be.
I think he actually means a specification change. Tests would help you refactor in case you want to change the algorithm against the same specification.
But yeah, it's a flawed metric, so it shouldn't be put on a pedestal.
I'm under the impression that the tests are written first. If you are writing new code, then first there's a (failing) test, then there is the new code. If you are altering existing code, then pivot or not if something breaks you need to see if the reason is broken code or an inappropriate test, and you fix it before moving on.
In any event this should be a gradual process. How common is it that large slabs of code gets changed while ignoring failing tests, or large slabs of new code gets added while skipping the tests?
Is it that TDD is, in actual practice, unsustainable, or is the problem with not adhering to TDD? Or something else altogether?
I could use some opinions actually. The current workplace arguments about what we should or shouldn't test go like this:
"We write tests to establish specifications, then we write code to meet those specifications. We should have tests covering every aspect of the system as a way to assert that things are meeting the designed specs."
"But our model objects have a lot of tedious detail. Why should we write tests about what the name field can and can't accept? It's a max length of 32, that's it. Asserting that the constraint fails is beyond tedious and a waste of our time. Most of the code we'd be testing is auto-generated. Are we running tests on our framework or on our code?"
"Some of this does seem like a waste of time, but what about when someone changes something as part of the development cycle? E.g., someone needs to remove validation from the name field to test other parts of the system's behavior with an overly large name. If we had a complete testing system in place, it would alert us that the model isn't meeting the spec. We've already had instances where temporary code to bypass authorization was accidentally pushed to the develop branch."
On the other hand, I don't know how big an issue bug regressions will be; we've heard of other developers suffering for lack of a solid testing base to detect problems. Without a full battery of tests asserting everything we know about the spec, there's no way we will catch many regressions.
HN what do you think?
Well, since you asked!
The most valuable tests in any system are functional tests. Unfortunately, most tests are unit tests, which give little value.
Here's something that I encountered recently in the wild. Imagine a situation like this: a calculation process with three distinct black boxes of code.
Block A - Pull data from 3rd party source
Block B - Distribute data to individual nodes (multi-threaded environment)
Block C - Individual nodes processing data
Each one of these blocks is around fifty thousand lines of code, each one of these blocks has hundreds if not thousands of unit tests. Every line of code is tested by a unit test. Yet there is a very real and dangerous bug in the system.
Why? Because the object passed from B-C has a transient variable and when it is serialized and deserialized that value is lost and reverted to the default value. Now most places leave this value as the default, so it's only one out of a thousand customers that have this problem, but when they do it's a huge problem.
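The round-trip failure described above can be sketched in a few lines of Python (class and field names invented): each block's unit-level checks pass, but a functional test that actually serializes and then deserializes the object catches the lost field.

```python
import json

class Job:
    # priority is treated as transient: it is not written out on serialize.
    def __init__(self, payload, priority=0):
        self.payload = payload
        self.priority = priority

    def serialize(self):
        return json.dumps({"payload": self.payload})  # priority omitted

    @classmethod
    def deserialize(cls, data):
        return cls(**json.loads(data))  # priority reverts to the default

job = Job("report", priority=5)
assert job.priority == 5                     # unit-level check on block B: passes

restored = Job.deserialize(job.serialize())  # what happens between B and C
assert restored.priority == 0                # the functional round-trip exposes the bug
```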
Functional testing is the process of saying "I put meat/spices in one side and sausage comes out the other." It doesn't try to determine the RPM of the grinder, the number of bolts, the rotation of the gears, or the type of metal used to build the sausage maker. It is simply the process of determining that meat becomes sausage. Functional tests have high value and are absolutely worth writing.
In most CRUD applications, unit tests generally wind up testing that the universe still exists as expected (true is still equal to true, zero is not one, and a null value is not the same as a string - oh happy days, the system works!).
That's not always the case. If you are working on something more advanced - say, something that should measure the chance a hurricane might knock down a given 3D model - you'll wind up having a huge stack of unit tests, and these will be very valuable. More often than not you'll know they are valuable, as they'll be proving the science behind the application and not simply that the application 'does something'.
That bug with the transient variable - you could only really catch that if you had e2e or integration tests covering a majority of the code in your application, right? Even then, only if you were to persist the data, read it back out, and then run more tests on it.
I accept testing isn't a silver bullet, but ouch.
As a general strategy, it sounds like it's better to run your higher-level tests as a gauntlet (feeding the result of test one into test two) than with tightly controlled inputs (using explicit data for each test).
Other things being equal, testing is good and automated testing is more valuable than manual testing, for the same reasons that having any systematic, repeatable process is generally more efficient and reliable than doing the same things manually and hoping to avoid human error. Many of the following notes are just reasons why other things aren’t always equal and sometimes you might prefer other priorities.
Automated test suites get diminishing returns beyond a certain point. Even having a few tests to make sure things aren’t completely broken when you make a change can be a great time-saver. On the other hand, writing lots of detailed tests for lots of edge cases takes a lot of time and only helps if you break those specific cases. For most projects, a middle ground will be better than an extreme. But remember that as a practical reality, an absurdly high proportion of bugs are introduced in those quick one-line changes in simple functions that couldn’t possibly go wrong, so sometimes testing even simple things in key parts of the code can be worthwhile.
Automated test suites have an opportunity cost. Time you spend writing and maintaining tests is time you’re not spending performing code reviews, or formally proving that a key algorithm does what you think it does, or conducting usability tests, or taking another pass over a requirements spec to make sure it’s self-consistent and really does describe what you want to build. These things can all help to develop better software, too.
Automated test suites do not have to be automated unit test suites. For example, higher-level functional or integration tests can be very useful, and for some projects may offer better value than trying to maintain a comprehensive unit test suite.
Unit tests tend to work best with pure code that has no side effects. As soon as you have any kind of external dependency, and you start talking about faking a database or file access or network stack or whatever other form of I/O, unit testing tends to become messy and much more expensive, and often you’re not even testing the same set-up that will run for real any more.
A corollary to the above is that separating code that deals with external interactions from code that deals with any serious internal logic is often a good idea. Different testing strategies might be best for the different parts of the system. (IME, this kind of separation of concerns is also helpful for many other reasons when you’re designing software, but those are off-topic here.)
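That separation might be sketched like this (names invented): the parsing logic is pure and trivially unit-testable with no fakes or mocks, while the I/O lives in a thin wrapper that barely needs testing at all.

```python
def parse_config(text):
    """Pure: turn 'key = value' lines into a dict. Cheap to unit-test."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs

def load_config(path):
    """Thin I/O wrapper: all the logic worth testing lives in parse_config."""
    with open(path) as f:
        return parse_config(f.read())

assert parse_config("# db settings\nhost = db1\nport=5432\n") == {"host": "db1", "port": "5432"}
```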
Modifying your software design just to support unit tests can have serious consequences and can harm other valuable testing/quality activities. For example, mechanics that you introduce just to support unit testing might make language-level tools for encapsulation and modular design less effective, or might split up related code into different places so code reviews are more difficult or time-consuming.
Automated testing is there to make sure your code is working properly. It is not a substitute for having robust specifications, writing useful documentation, thinking about the design of your system, or, most importantly of all, understanding the problem you’re trying to solve and how you’re trying to solve it.
No-one really only writes code that is necessary to pass their tests. Even if they religiously adhere to writing tests first, at some point they generalise the underlying code because that’s what makes it useful, and the test suite didn’t drive that generalisation or verify that it was correct. In TDD terms, the tests only drive the red/green part, not the refactor part.
For similar reasons, just because someone has a test suite, that does not mean they can safely refactor at will without thinking. This may be the most dangerous illusion in all of TDD advocacy, but unfortunately it seems to be a widespread belief.
A lot of “evidence” cited in support of various test strategies and wider development processes is fundamentally flawed. Read critically, and be sceptical of conclusions that over-generalise.
And finally, what works for someone else’s project might not be a good choice for yours, and something that didn’t fit for someone else might still be useful for you. If you’re experimenting, it can be very informative just to keep even basic records of roughly how much time you’re really spending on different activities and what is really happening with interesting things like speed of adding new features or how many bugs are getting reported in released code. You’re allowed to change your mind and try a different approach if whatever you’re doing right now isn’t paying off.
A one-person engineering team. This is some 20-something working on a start-up in his parents' garage or the 40-something code hobbyist out there that likes writing fun little apps every now and then. I would argue that there is virtually no need for testing unless you're either bad at writing code or inexperienced. I've written dozens of applications (in various languages) and once you get into the swing of things, you don't need TDD to actually "test." Debugging/edge cases/etc. just becomes a natural process of writing code.
A two-person engineering team. This is a secondary stage in every start-up -- sometimes, start-ups even are founded by two engineers. Here is where TDD starts being important. I may be an expert coder, but if my partner isn't that great (or if our code doesn't work well together), not testing can be a huge nightmare. But the impact is still relatively small. Bugs are easy to spot as both engineers are (or at least should be) incredibly familiar with the entire codebase.
Three-person to 10-person engineering team. Here is where things get really tricky and TDD becomes integral to meeting deadlines and actually, you know, putting out products. You've got Jim working on some financial stuff while Bob is optimizing some JPA code. At the same time, Susan is implementing a couple of new features. Having a test suite ensures that behavior will remain predictable given non-centralized iterations. Without TDD, large teams would be fixing bugs more than they would be writing code (most teams probably do that anyway -- not that it's a good thing).
10+ people; open source projects, etc. When you have more than 10 people working on a project, I think that TDD is simply a necessity. There are simply too many moving parts. Without a cohesive test suite, having this many people mess around with code will undoubtedly break stuff.
Ironically, I think that small teams put too much emphasis on TDD, whereas teams in larger companies (AT&T and IBM, for example) put too little.
I find that when testing is tough it is often that the underlying design is deficient. And tests shine a light on code and design smells. Discarding tests could be valuable if the tests have outlived their value. It might also be the case that excessive test maintenance is telling you something about your production code.
If you are planning on some sort of reuse (API, library, protocol, etc) a test spec is a very good idea if only to hash out practical use of the interface and identify caveats in the interface spec. Once released, the test spec is great for answering the question "did I accidentally break something that used to work?" thus vastly speeding up the bug-fix process for underlying components or widely used components.
If you are writing an end-user app, or a service that will be used non-programmatically (such as a GUI or web app), then a test spec is not required and probably a waste of time as things will likely change too quickly.
If you really do need to just get something out the door that quick, then you should just be prepared to accept the consequences of no tests, or put some in later for maintenance sake.
In safety-critical software, however, I hope tests have been written for every single change and everything has been checked over and over again. Tests become a lot more valuable, and simply having to rewrite them may make you think of problems you have introduced in the code.
Any assumptions about the output of your code that you make but don't record in a checkable form are something that could potentially be silently broken by a change in the code. And then you're wasting your time debugging rather than solving the problem you're working on.
As a side note, all this recent hand wringing over testing reminds me so much of the time when the new breed of version control software was coming out, it was the same kind of accusations about things being "overhyped" and "you can just email a tar file!"
It's easy to rant about unit tests. Don't fall into that trap. Be aware that it's a hype, and correct for that fact - and once you do, use your best judgement. In the end, it's you, the engineer, who's shipping the product - not the countless ever-arguing denizens of the internet.
I think the 80/20 rule applies here. Only 20% of your code does 80% of the work. Focus on writing tests for those 20% then your time is much better spent.
Adding strict types (TypeScript!) to your code base gives you way more maintainability than testing. Do it first; then, if you still need tests, write them.