Superior Testing: Stop Stopping (arturdryomov.online)
57 points by ming13 16 days ago | 52 comments

Regardless of all the arguments for and against tests, it's important to remember your purpose as an IT professional: Your job is to solve the customer's problem. That's it.

HOW you solve their problems is entirely up to you, and some solutions will work better than others.

The only thing about customers guaranteed to be consistent is that they will request changes. Some will be good changes. Some will be bad changes. Some will be unavoidable changes. Your job here is to guide them through the best possible change process, with the ultimate goal of solving their problems.

Testing is a useful tool for (1) making sure the system matches the customer's expectations, and (2) making sure new changes don't break old things.

How you implement tests is entirely up to you, but once again, some solutions work better than others depending on the situation. Manual testing only scales up to a point, beyond which its cost is greater than the cost of implementing and maintaining automated tests. You need to become good at estimation here, but for 90% of projects, automated testing is the most efficient long-term strategy.

WHERE you test is also important. Generally, it's best to test components at the edge of their interfaces. When you input X, Y should come out, ideally every time. This is where immutable data and idempotent APIs are VERY helpful for maintaining a reliable codebase. The more you need to cut into a component's innards to test it, the more you should be asking yourself why.

Testing is very much about architecture. Make clear boundaries between components and outside interfaces, use immutable data and idempotent APIs, and tests become a lot easier to write, and more resilient to change.
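A minimal sketch of what "test at the edge of the interface" means in practice — `count_words` is a hypothetical component, and only its inputs and outputs are asserted, never its internals:

```python
def count_words(text: str) -> dict:
    """Hypothetical component: map each word to its occurrence count."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Test only at the interface: input X in, output Y out.
assert count_words("a b a") == {"a": 2, "b": 1}

# No hidden state, so the same input always yields the same output --
# this is what makes the test reliable over time.
assert count_words("a b a") == count_words("a b a")
```

Because the function takes immutable input and returns a fresh value, the test never needs to reach into its innards or set up any surrounding state.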

> but for 90% of projects, automated testing is the most efficient long term strategy

I don't think this is a good way to think about it.

I see plenty of projects that don't go anywhere, and I also see plenty of projects that do go somewhere, but where large parts change so little and are so easy to test manually that there's little point in much automation. Other parts are complex and full of contradictory business rules so formal tests are necessary.

Of course, YMMV. If you're inside Google, things are probably looking very different. But most people aren't.

I've inherited one or two projects like the first one you describe before, and, when they don't have tests, it's awful.

The problem is, regardless of how stable the code was, or how easy it was for the original author to test it manually, if the code hasn't been worked on for ages, then the original author is either gone or can't remember all the details of how it was supposed to work anymore. Which leaves you in a dangerous situation, because there's no good way of knowing which behaviors are by design and which ones are incidental.


I swear, if it's not "we don't have time to test" it's "we don't have time to document". Ok, but you've hired all these firefighters instead.

Also, if you've inherited code, or didn't work on it for a couple of weeks, chances are you won't know/remember how to do the manual tests, and will waste time trying to rederive them from first principles. This also applies to any other kind of "ad-hoc" ops, and multiple times more so if the activity involves multiple runtimes - e.g. issuing a shell command. I've learned this the hard way, and these days I note down the precise steps/commands for any manual op that's not very clearly a one-time thing / throwaway. Your future self and whoever inherits your code will both thank you.

Adding to your point, it's really easy to get familiar with a codebase, and to be confident that there are no regressions when you make changes, if there are comprehensive tests.

Similarly, tests can serve as a device for asserting contributions are acceptable pre/post-merge as well as evaluating that your dependencies work as expected.

There's a case for ugly entangled tests too. If you, say, isolate your tests from the filesystem (mocks, etc), your tests can become fast, hermetic, etc. But they lose all sensitivity to differences between filesystems, and encode any assumptions about how filesystems behave.

I like to have tests that exercise the real stuff. The tests are flakier but have better fidelity to the actual use.
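To illustrate the trade-off between isolated and "real stuff" tests — a hedged sketch, where `save_report` is a hypothetical unit under test:

```python
import os
import tempfile
from unittest import mock

def save_report(path: str, data: str) -> None:
    # Hypothetical component under test: writes data to a file.
    with open(path, "w") as f:
        f.write(data)

# Isolated test: fast and hermetic, but it only encodes our assumptions
# about how open()/write() behave -- no real filesystem is touched.
with mock.patch("builtins.open", mock.mock_open()) as mocked:
    save_report("report.txt", "hello")
    mocked.assert_called_once_with("report.txt", "w")

# "Real stuff" test: slower, but sensitive to the actual filesystem
# (permissions, encodings, path semantics) the code will run against.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "report.txt")
    save_report(target, "hello")
    assert open(target).read() == "hello"
```

The first test would still pass on a filesystem where writes silently fail; only the second one notices.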

I generally agree, but we should acknowledge that "the real stuff" can be, in its way, somewhat fake.

Portable code can be run on whatever filesystem the user chooses. We can only test with what we have. So this boils down to another way to say "works on my machine" and that may or may not be relevant depending on how well the OS and hardware we test on matches the customer's setup.

It's good to be able to run your tests against whatever filesystem you choose, though.

Concrete example: Docker for Linux, Docker for Mac and Docker for Windows all implement both networking and mounted volumes in slightly different ways.

This has recently eaten up a nonzero amount of my time in maintaining our integration and acceptance test suites.

I'm not sure where I'm going with this, other than to observe that real-ish isn't necessarily a substitute for real. I suppose the "developer empathy" story is that maybe the more empathetic approach in the long run would have been to have developers work on the target platform rather than letting them pick whatever one they used to use at their old gig.

Tests are first and foremost about design. If it's hard to test, there might be too much coupling. Some code, by its very nature, is hard to decouple. Just like some code, by its very nature, has to mutate state. But having a signal (the pain of writing a concise test) is invaluable in proactively improving code quality.

I think there's a direct relationship between _good_ tests and _good_ codebases. I have to say _good_ tests, because I've seen people attack non-cohesive code not by making it more cohesive, but by writing non-cohesive tests. Any test is NOT better than no tests at all. Slow tests are awful. Flaky tests are awful. And tests that don't really test properly (or miss important edge cases) can give a false sense of safety. I've heard people say that if tests are so hard to get right, maybe they aren't the right tool. I've since come to the conclusion that no one said programming properly had to be easy.

What I've found is that, if you test for a long time, the value of tests as a design tool diminishes, because you gradually write code that's easier to test (and easier to test is, for all intents and purposes, always easier to read, reason about and maintain). But you also get better at testing and spend less time struggling to write tests (because the code is cleaner), so you gain less, but it costs you less.

And then you still gain the other benefits. Protection against regression, correctness, documentation (especially of edge-cases). More efficient onboarding.

Also, there's this common fallacy of speed, cost, quality: pick two. The reality is a lot more shades of gray, but if you were to generalize it, I'd go to the other extreme and say that you can't have low cost WITHOUT high quality. Cost and quality aren't opposing forces of each other; both are functions of the skill of your team.

Once you have something nearing a steady state design, you need to have tests. I can understand if something is in a very early stage it might not have tests yet.

At my current work we have loads of automated tests. Every time something is changed, there are pipelines that test the following:

- Simple unit tests: Start with state 0, perform an action, is state 1 what you expect? There are literally hundreds of these, and they have indeed found issues. Many edits that would otherwise have gone in have had to be rewritten to cater for corner cases found by the tests.

- Valgrind suite tests: Since we're dealing with C++, there are some interesting errors that can happen. Changing the order of some lines can look innocuous, but sometimes it causes an iterator invalidation. This kind of thing is hard to spot, but we have automated tests that will stop you merging code that does this. Things that are UB in C++ can be especially tricky.

- Integration tests: several pieces that are in themselves passing unit tests might not work together. Luckily you can define a CI script that launches both and makes them talk to each other.

As you can imagine, it takes a lot of work to write the tests. One issue with a lot of teams is there simply isn't enough time. You have to show progress, which means showing things that seem to function externally. But you're always paying for that by having to fix things that you find as you go. Not having tests is a form of tech debt. It costs more and more as your code base grows.

I am not convinced this is always true. The first 25 years of my career were spent making games. Never had tests. We had testers but no tests. I then worked at a big company with tests. My first experience with them. Working on something important, I loved them, but they easily cut my velocity by 70% compared to what I used to get when working on games. The games were AAA and shipped to millions of customers on CDs and cartridges, so no bug fixes allowed after the fact. Being a big company and a big project, there were 10-30 engineers who maintained testing infrastructure, continuous integration, etc. That's larger than the entire engineering team on most games I worked on.

I'm not arguing against tests and having had the experience of testing I'd use them when appropriate I'm just not sure the exact cutoff. If I was working on a game engine for multiple teams I'd be writing tests for sure. If I was working on a small 5-20 person team with custom tech I'm not sure I'd start adding tests where I didn't before. Maybe if there was IAP or multi-player online or some other server component and metrics. Or maybe if it was easier to get started and maintained.

There's a bunch of big differences between games and much other software. Often games are not maintained after shipping. Of course that's less true today than it was in the past, with longer-term online games, but it's still true that many games are pretty much done the moment they ship. Another big difference is that the teams are 70-90% non-engineers producing tons and tons of data.

I mostly bring this up because software engineers often talk past each other, since not all software engineering is the same.


Well, I mainly work in a team where developers ship separate products for internal usage. So it's not really working in a team. The products are generally quite technical - it's not your usual CRUD shop.

I do write tests, but only post-factum, when I know for sure a component will be reused in other components. I write them in cases where I know I will forget about certain edge cases after some time, so changing the code would most certainly introduce bugs.

When you have components that are reused over the course of several years (and obviously being optimised if the requirement is there), the chance of regressions is severely increased.

So I see that "writing tests for everything" is more of a political stance, rather than entirely practical.

I think the key distinction there is: you'd ship the games and never change them again. What tests do is let you continually evolve the software over a long period of time. You need tests to give you confidence that changing system A doesn't negatively impact system B, allowing the software to keep evolving at a much faster pace once you have a good test suite in place.

Anytime you are writing a product where the first version you ship will be the last, manual testing is going to be cheaper/faster than automated testing. Only when you are making changes over several years do automated tests really pay off (in a few cases this can be some core libraries of a product that is otherwise ship-and-forget).

I like unit tests to test the expected cases (positive or negative, as long as they're expected), to prove to myself that they conform to the spec. Then, if bugs are found (i.e. unexpected cases are discovered), I like to add tests for those, but I don't need to try to think of them all up front. This way the unit tests will prevent regressions (both on spec conformance and bugs) as the code changes over time.
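That workflow might look like this sketch — `parse_price` and its bug history are hypothetical, but the shape (spec tests written up front, a regression test added only after a real bug surfaced) is the point:

```python
import unittest

def parse_price(s: str) -> int:
    """Hypothetical unit under test: "$1.50" -> 150 cents."""
    s = s.strip().lstrip("$")
    dollars, _, cents = s.partition(".")
    return int(dollars) * 100 + int(cents or 0)

class TestParsePrice(unittest.TestCase):
    # Expected cases, written up front against the spec.
    def test_whole_dollars(self):
        self.assertEqual(parse_price("$2"), 200)

    def test_dollars_and_cents(self):
        self.assertEqual(parse_price("$1.50"), 150)

    # Regression test, added only after a real bug was found in the
    # wild: input with leading whitespace used to crash the parser.
    def test_regression_leading_whitespace(self):
        self.assertEqual(parse_price(" $1.50"), 150)
```

The regression test then documents the bug forever: anyone re-breaking whitespace handling finds out immediately, without having to rediscover the case.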

Integration tests are necessary for the exact reason you mention: just because things works in isolation, doesn't mean they will work together.

Ideally, for catching unknown bugs, I would also have a property-based generative test suite that is run over night against lots of random scenarios. This isn't always possible (in fact, so far, I've sadly only worked on one project where we did this, but I'd like this to become more common).
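A hand-rolled sketch of the idea, assuming a trivial `my_sort` as the unit under test — a real library such as Hypothesis adds much better input generation and shrinking of failing cases, but the core loop is just this:

```python
import random
from collections import Counter

def my_sort(xs):
    # Hypothetical unit under test; stands in for real logic.
    return sorted(xs)

# Minimal generative test: throw lots of random inputs at the code
# and check properties that must hold for *any* input.
rng = random.Random(42)  # fixed seed keeps the nightly run reproducible
for _ in range(1000):
    xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
    ys = my_sort(xs)
    assert all(a <= b for a, b in zip(ys, ys[1:]))  # output is ordered
    assert Counter(ys) == Counter(xs)               # permutation of input
```

Unlike example-based tests, this explores scenarios nobody thought to write down, which is exactly where the unknown bugs live.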

> Ideally, for catching unknown bugs

I forgot to mention this part. We have data recorded from external sources, which we run through our code. Millions and millions of state changes, along with their summary states. This also helps to find cases we didn't think of.

That sounds great when you have the data available! It sounds like your software is thoroughly tested. If only where I currently work had that...

Once you have something that keeps changing, you need tests. If it's steady, there is no rush. Before something is written, tests add nothing (nor remove anything), and if it's changing completely (as things usually do at the beginning), you'll still need the tests, but won't be able to get them, sorry.

About test types: unit tests are worthless unless proven otherwise. Automated analysis is just great, both at static (like types) and run time (like Valgrind); do it as much as you can. Integration tests are the actual tests worth their name; everything else is just development tooling. For them, see the first paragraph.

I heard the "customers don't pay for tests" line recently, from someone boasting about how they migrated hundreds of thousands of lines of PHP from 5.x to 7.x.

I didn't have an off-the-cuff reply to this, but thinking about it afterwards the issue became clear to me: customers don't pay for code either! Customers pay for solutions to their problems.

I think the key phrase is "how do we know?": maybe those hundreds of thousands of lines of PHP were needed to solve the customers' problems, but how do we know? After all of that work, does it actually do what's needed? When the requirements inevitably change, how do we know when we've finished our patches?

We can never know for sure, but there are ways to gain confidence in what we've done. Automated testing is a really low-cost way to gain quite a lot of confidence, which is also relatively stable over time. In contrast, manual tests are either very expensive or woefully shallow; and re-running them in the future takes just as much effort each time. Static analysis, code read-throughs, formal verification, etc. can give us more confidence than tests, but at a much higher cost. Simplifying the codebase can also help (code is a liability, not an asset!), but again that can be expensive.

We should get the most bang for our buck: usually that means adding more tests. Occasionally, if that's not enough, we might sit down and prove something, or manually step though a print out, etc. but usually we could get more out of the same time by writing tests.

> customers don't pay for code either! Customers pay for solutions to their problems.

And that's why you don't test code. You test that your application complies with your customer's requirements.

I'm happy to see things changing about tests, with people realizing fine-grained unit tests are often a hindrance and you should prioritize end-to-end testing. Test the interface of what you're selling, not the inner workings.

I agree with what you say. That's why I tend to use the phrase "automated testing", rather than something more specific like "unit testing". The tests I write would mostly be called property tests and integration tests, although I think the terminology around testing is too loaded and ambiguous ( http://chriswarbo.net/blog/2017-11-10-unit_testing_terminolo... ).

In any case, whilst I think it's good and healthy to debate the different forms of testing, their merits and tradeoffs, etc. I don't think it's appropriate when the alternative is not testing at all (e.g. "customers don't pay for tests").

I've worked at three organisations and managed to introduce automated testing to two of them. Even then, the test suites were "my responsibility", since (a) nobody else was running them and (b) I had to hand-hold people whenever I spotted a breakage (which they inevitably blamed on the test being wrong).

The heuristics I've come to follow are:

- Having automated tests is better than not having automated tests

- It's easier to improve a bad test suite than it is to get a test suite added/accepted

- There usually aren't enough tests

There are exceptions to these rules, but I don't believe them without evidence. For example, "the test suite is slow" won't convince me that there are too many tests; demonstrating systematic redundancy and fragility in the test suite might convince me (e.g. exposing privates in order to test them).

Only if people are on board with the idea of automated testing, will I bother to get more opinionated about the specifics.

> you should prioritize end to end testing. Test the interface of what you're selling, not the inner workings.

E2E tests can be slow and extremely fragile though. They have a place and purpose, but they should not be the only form. Why would you skip entirely over "inner workings" tests that could catch bugs sooner?

Are you aware of the Test Pyramid, and why it exists?

> Are you aware of the Test Pyramid


> and why it exists?

Because the xUnit consultant crowd worked hard at it: it's easier to write unit tests (which fossilize your code), so there's a lot of tooling around them. And the reliance on external services with no sandbox or ready-to-use mocks means E2E is harder to implement. But being harder just means you have to do your job.

When I buy your software I don't care about how you implemented some pattern. What I care is that when I click on this gizmo in that situation it does what it should in some time-frame using some resources.

> It is the goal of every competent software developer to create designs that tolerate change [...] Code without tests is bad code. It doesn’t matter how well written it is; it doesn’t matter how pretty or object-oriented or well-encapsulated it is. With tests, we can change the behavior of our code quickly and verifiably. Without them, we really don’t know if our code is getting better or worse.

I agree that testing is a valuable tool for making good software, but I think the idea that all code categorically requires testing to be considered good is overzealous.

I think the most ardent TDD proponents underestimate the costs of testing. Tests are also code which can have bugs and has to be maintained, and having a well-developed test-suite can act as a type of inertia which makes it harder for an organization to make necessary architectural changes.

Don't get me wrong - I think testing is important, but it is just one tool which has to be balanced against a number of other factors.

I have two anecdotes about testing. First one was where I decided to refactor a function which had something like a 10-way cascading branch each of which had a compound test. Normally I wouldn't bother but, on this occasion, I wrote something like 60 unit tests in preparation for the exercise and one of those tests exposed a misplaced closing parenthesis in my solution.

The second case was where an in-house quotes engine was to be migrated to a SOAP service and some calculations needed to be ported. We didn't have access to the source so we created a small set of the most complex scenarios we could come up with and used those to generate calculator requests. I think we had 26 or 27 test cases and they each required non-trivial setup before the calculator could be invoked. Those cases exercised code which took the developer about 3 months to refine into a working solution.

So what does this reveal? I don't know. On the one hand, we had just under 60 unit tests which picked up 1 bug whilst, on the other, we had less than half that number of end to end tests which were sufficient to build a major piece of business functionality.

My gut feeling is that end to end testing is a better long term investment and unit testing is perfect for refactoring but inefficient for anything else.

Many of us had to deal at some point in our careers with 'legacy ball of mud with no tests' codebases. Furthermore, longer lived products require a certain degree of inertia; it is hard for users to adapt to what appear to be arbitrary changes on a regular basis. Reasonable to understand why so many people are touchy around the subject of testing.

That being said, there is a certain degree of intellectual rigidity when it comes to TDD. The point of a good test suite is to capture the behavior of the product from a user perspective, and not to duplicate implementation details, or test the same behavior at three different levels of the architecture.

Ideally, each path in the code should be exercised by one [and no more] test written from the perspective of the product, but not necessarily using the user-visible APIs. To exemplify test restraint, suppose one is building an IDE. Write one test that the IDE frontend surfaces errors from the compiler to the user, and write a test suite for the compiler to check it produces sensible errors, using code coverage tools to make sure all possible errors are accounted for. But don't test each and every compiler error through the IDE frontend APIs.

The introduction made me hope for real advice on how to "stop stopping", but it only repeated the old arguments on why we want tests. Why not deal with some of the real problems that actually prevent people from writing automated tests?

Real-world example 1: The system being developed talks to external system X. We don't want tests to litter the production database, especially since it's about accounting and we would get into legal trouble for that. However, there is no possibility to open a test account on the production system, and no budget for a license for a test installation of X. The main trouble with X is that its public API (web service) changes from version to version and there is no useful documentation about it. How would one write integration tests for that?

Real-world example 2: How would you write tests for a system that has its requirements unspecified, even on a very coarse level, after the deadline where it is forced into production by management?

I'm pretty sure that both examples happen in other places than the ones I've seen them, too.

> However, there is no possibility to open a test account on the production system, and no budget for a license for a test installation of X. The main trouble with X is that its public API (web service) changes from version to version and there is no useful documentation about it. How would one write integration tests for that?

How would you manually test in a situation like that? If you can't interact with the production API because it would corrupt data, and there is no dev API, would you just be guessing that everything works before you deploy your code?

> How would you manually test in a situation like that?

IMHO, you can't. We did in fact just hope it wouldn't break in production.

But then, the article says just do it, so maybe there was a better solution we overlooked.

Nothing prevents people from writing automated tests except their own shortsightedness.

You're going to spend extra time on the problems you identified. You can either plan for it up front with tests, or be unprofessional and firefight later.

1. Create a facade abstraction (it could be a microservice, API or even command line app) over the expensive to test thing. Make the abstraction really dumb and easy to test manually in isolation from the rest of the app. Integration test only against the abstraction, not the real thing and do a minimal level of end to end manual testing against the real thing.

2. Don't. If you don't have fixed requirements your tests will have negative ROI.
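Point 1 could be sketched like this — all the names (`AccountingClient`, `post_invoice`, the transport shape) are hypothetical stand-ins for the expensive-to-test system X:

```python
class AccountingClient:
    """Dumb facade over the real web service. Kept so trivial that a
    quick manual check against the real thing is enough to trust it."""

    def __init__(self, transport):
        self._transport = transport  # a real HTTP client in production

    def post_invoice(self, invoice_id, amount_cents):
        return self._transport.send(
            "POST", f"/invoices/{invoice_id}", {"amount": amount_cents}
        )

class FakeTransport:
    """Stub used in integration tests instead of the real service."""

    def __init__(self):
        self.calls = []

    def send(self, method, path, body):
        self.calls.append((method, path, body))
        return {"status": "ok"}

# The rest of the app is integration-tested against the facade only;
# the real service is exercised by a minimal manual end-to-end pass.
fake = FakeTransport()
client = AccountingClient(fake)
assert client.post_invoice("42", 1500) == {"status": "ok"}
assert fake.calls == [("POST", "/invoices/42", {"amount": 1500})]
```

When system X's API changes between versions, only the tiny facade (and its one manual check) has to change, not the whole test suite.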

1. Unfortunately, the main issue was a changing interface of the third-party system. The manual testing against the real thing was the problem. I fully agree with automating everything where an abstraction is sufficient.

2. This is good to know. I might have worked with vague requirements for too long, but knowing this may help identify the parts where testing is indeed possible.

I don't think a blog post can possibly cover all the cases. A book will get you closer. For a great book on exactly that sort of subject matter, I couldn't agree more with the author of TFA: Working Effectively with Legacy Code is a book that everyone should read at least once.

It's been a while since I personally last read it, but, as I recall, it has an entire chapter devoted to each of your examples.

If it's an accounting service, make a new account/table named "Software test". Book-keeping was invented to spot errors. If you have a non-zero balance in your "Software test" account, you probably have a bug in the software.

Unfortunately, this wasn't possible for two reasons. The first was that we would need not just one account, but several. Though it might have been possible to limit this for the integration test, and test all the stuff that deals with multiple accounts by mocking the accounting system.

The second problem is that even a fake account doesn't belong on production, not in accounting. They are very strict about such things.

So I write some code and a test, and of course the first time I run the code, the test is going to fail.

So now the balance in our "Software test" account is nonzero whereas it should be zero.

But audit requires us to record all bookings, so how do we explain these bogus bookings (both from the erroneous code that made the mistake in the first place, and from the manual adjustment later that fixes the mistake for the next test run)?

You can have many types of tests. First you have assertions and unit tests with mocks/injections that run at build time together with the type checker and other low-level tests. Then you have the integration tests that make sure your code works together with third parties and other APIs. Those tests might seem unnecessary when you already have unit tests, but it's very nice to know if some third party has made breaking changes and your software has stopped working because of it.

> What is better — having a test or not having a test? The answer is obvious — any test is better than no tests at all.


For most people here, this is probably true - because you (and hopefully those you work with) know how to write tests well.

Examples of badly written tests I've encountered that wasted everyone's time:

* Unexpected environment - such as, Django doesn't turn off the cache while tests are running

* Tautological tests - where the test just repeats what's in the code

* Peeking too far into the implementation - restricts refactors and can create lots of false negatives or false positives depending on what's being asserted

* Mocking out too much - tests that pass when they really shouldn't

* False assumptions/not thinking it through - why did this test start failing on New Year's Day?

* Flaky integration tests
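The New Year's Day failure is usually a hidden clock. A hedged sketch of the usual fix — `days_left_in_year` is a hypothetical unit, and the trick is injecting the date instead of reading it inside the function:

```python
import datetime

def days_left_in_year(today=None):
    # Hypothetical unit under test. Accepting "today" as a parameter
    # (instead of calling date.today() internally) is what makes the
    # behaviour testable on any calendar date, including Jan 1st.
    today = today or datetime.date.today()
    end = datetime.date(today.year, 12, 31)
    return (end - today).days

# Deterministic on every day of the year -- no hidden clock, so the
# suite behaves the same on New Year's Day as on any other day.
assert days_left_in_year(datetime.date(2018, 12, 31)) == 0
assert days_left_in_year(datetime.date(2019, 1, 1)) == 364
```

Tests that call the zero-argument form implicitly pin their assertions to whatever day CI happens to run on, which is exactly the false assumption that surfaces once a year.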

In our case, about half of these could be updated once the issue was apparent, but the rest either scrapped or completely rewritten.

And then there's a number of issues with test coverage giving you a false sense of security, with things like reaching 100% for a given piece of code, but only thinking about the happy path.

I partially agree w/ you. One can always delete the test and then it's like you have no tests at all.

> And then there's a number of issues with test coverage giving you a false sense of security, with things like reaching 100% for a given piece of code, but only thinking about the happy path.

Yup, and this is why one writes tests against well-defined interfaces/boundaries.

I like the concept of error budgets. Start off by knowing what kind of quality and resiliency a system requires and design your test strategy around that. Means talking to the client about that.

I'm not going to invest a load of time in various types of automated test for an internal site with a form over a database that 2 users use for low priority work. The idea of 80%-100% code coverage for basic work like that seems like waste to me.

But for the critical path of the eCommerce shopping experience I'm going to write all kinds of automated tests at multiple layers of the stack, right up to chaos/stress testing it, so that we know when Black Friday comes we can handle it.

I don't like dogma, and TDD seems too dogmatic for me. I am very pro-testing, having been a QA, a developer, and an ops engineer. I want the freedom to exercise my own expert judgement. The problem with dogma is that it makes thinking take a back seat. Suddenly we have 80% code coverage enforced on a page that loads a grid from a table, going through a three-layered monstrosity of code.

> I like the concept of error budgets. Start off by knowing what kind of quality and resiliency a system requires and design your test strategy around that. Means talking to the client about that.

If you have a client knowledgeable enough, then great. But most people who don't have an engineering background think that the correct number of bugs is 'zero'. It's really hard for them to get their head round something being - to some degree - buggy, and still being acceptable.

I did a load of work in defence (in the UK) and found it a very frustrating experience because the way it works they weren't even "paying for a solution to their problems".

You have to underbid the initial proof of concept phase to win it, and then hack together something that vaguely meets the requirements in the limited budget.

Then you've got the next 10 contracts over several years to develop that into the actual product - meet all the requirements. The problem is that nothing in this process encourages you to write good, clean code. If the code was such a mess that it took 3 times as long as it should to add new features, that meant that we could charge 3 times as many hours.

> Customers ask providers for a quality product

It's not even the "quality" thing. It's that tests help at delivering the product.

Of course if they tell me, "I need this tomorrow" because of some emergency there is hardly enough time to code for the happy path and manually check that it looks OK. No tests. I make sure they understand it will be full of bugs and we'll fix them later on. This happens rarely but it happens.

How do people here test their UIs? As an extreme example, the link here contains hundreds of lines of declarative code; a single typo (unclosed bracket, misspelled CSS property...) could break the site. So how is it tested?

Probably it's not tested in any automated way, because manual testing (open the site and look) is much easier. Maybe lots of software is like that, to various degrees.

I like the Flutter widget test approach (it's not unique, but it's well executed there): https://flutter.io/docs/testing#widget-testing

In Flutter parlance, a whole view (page/screen) is a widget, and that's the granularity I was testing at - it's good for testing the high-level patterns that the screen is meant to adhere to, e.g. the Send button appears when there is some text to send.

You can also have golden images for rendering of each screen. They change a lot, but that at least forces developers to look at the changes each time and decide if they're on purpose.

You can implement a combination of end-to-end browser driven tests as well as applying visual regression analysis etc. I wrote something which would actually parse two sites and return the differences in the DOM/CSSOM many years ago, but it worked nicely :)

I always thought that websites are tested with Selenium and similar (https://www.seleniumhq.org/).

For other kinds of GUIs, you can also emulate mouse + keyboard interaction with tools like Sikuli (http://www.sikulix.com/). This is also scriptable, e.g. in Python, but actually quite painful in my experience: when your GUI needs significant time for loading and rendering, the Sikuli program might fail to find a button that has not appeared yet. So you need to play with timeouts and trial-and-error.

E2E testing using a headless browser controlled by a framework like puppeteer. Scaling up with many different OS/browser combinations gets difficult and expensive fast though, which is why services like browserstack exist. Testing CSS is sometimes done by diffing screenshots but I personally don't think it's worth the effort unless your app has hundreds of pages.

Well, I hope he continues with the idea to "kick-off a series of articles about testing".

Any suggested readings about testing, HN?
