In this case, right at step 1. Capturing the output and making sense of it, while simultaneously allowing for normal variation (e.g. due to timing), is one heck of a big job and requires a lot of mocking. It's not a job that can be done in hours; probably weeks, if you want something that doesn't get thrown away the next time you have time to spend on testing.
I'd need to mock hardware I don't have, possibly mock some kernel interfaces, and set up a physical test endpoint that can observe things on the wire (as in raw Ethernet packets, but also application-level protocols...). And really, capturing timing issues and such is kinda important, because things like flow control, multipath routing & failover are critical features of the software.
Of course, some of the things you want to test are inherently timing-related. Like, a certain signal must be asserted if a particular message isn't received within a specified (possibly configurable) time window after some event.
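For what it's worth, that particular kind of requirement can sometimes be pinned down without real hardware by injecting a fake clock. A rough sketch (the Watchdog here is a made-up stand-in, not the real software):

    import unittest

    class FakeClock:
        # Controllable clock so the test doesn't depend on real time.
        def __init__(self):
            self.now = 0.0

        def advance(self, seconds):
            self.now += seconds

    class Watchdog:
        # Made-up stand-in: raises an alarm if no message arrives
        # within `timeout` seconds of being armed.
        def __init__(self, clock, timeout):
            self.clock = clock
            self.timeout = timeout
            self.armed_at = None
            self.alarm = False

        def arm(self):
            self.armed_at = self.clock.now

        def on_message(self):
            self.armed_at = None  # message arrived in time; disarm

        def tick(self):
            late = (self.armed_at is not None
                    and self.clock.now - self.armed_at >= self.timeout)
            if late:
                self.alarm = True

    class TimeoutTest(unittest.TestCase):
        def test_alarm_when_message_is_late(self):
            clock = FakeClock()
            dog = Watchdog(clock, timeout=5.0)
            dog.arm()
            clock.advance(5.1)  # no message inside the window
            dog.tick()
            self.assertTrue(dog.alarm)

        def test_no_alarm_when_message_arrives_in_time(self):
            clock = FakeClock()
            dog = Watchdog(clock, timeout=5.0)
            dog.arm()
            clock.advance(4.9)
            dog.on_message()
            dog.tick()
            self.assertFalse(dog.alarm)

    if __name__ == "__main__":
        unittest.main()

It doesn't touch the hard parts (hardware, kernel, the wire), but it at least makes the timing rule itself deterministic.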
The core of the software is a heavily asynchronous & multithreaded mess..
Getting all the scaffolding in place and then automating everything is going to require lots and lots of time. And reverse engineering.
I tried playing with bash scripts and Docker-based virtual deployments, but it completely misses the hardware side of things and really doesn't get me far.
That is, you can hire a new person, they can do all this work, and it doesn't much affect the speed at which the rest of the team delivers. They don't need to know extensive amounts about the underlying core code - just the way it interfaces with the outside world.
It's also a pretty effective way of onboarding somebody onto the project.
That isn't the case for any kind of unit testing approach, where you have to be able to work with, understand, and refactor the core code. That requires core dev team members and will take their attention away from customer-facing tasks.
Even then, it is often hard to justify this work/find the time. It depends on how long you think you'll be working with the codebase really.
Edit: the above assumes you have something like CD/fast release cycle. If not, then it's not really viable unfortunately.
Though that depends on what you mean by legacy. If nobody’s touched it in 5+ years, it might very well remain unchanged until the system is completely replaced with something else. It’s well worth adding tests if something is or will be in active development.
Slowly adding tests is fine if your goal is to increase the total code under test, but it's not going to give you much confidence that you didn't just break something when making updates to the system.
Ugly advice: fall back to manual "unit tests" with a debugger, temporary printf or whatever you have at hand.
Does this eat a lot of time? Yes. Is it grungy work? Yes. Why do it then? Basically to surface the real cost of poor testability.
Snapshot testing is basically what used to be referred to as "master-knows testing" (don't remember if this is the exact term), where only the one who initially created the test knows if it's correct, because the intent is not exposed at all. A properly written unit test, instead, exposes what you want that unit to do, and focuses on only that part, so you can change things in the unit without breaking those specific parts. Snapshot testing would ruin this.
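To make the contrast concrete, a rough sketch (format_invoice is a made-up example, not from any real codebase):

    # Made-up formatter, used only to illustrate the contrast.
    def format_invoice(net, vat_rate):
        total = net * (1 + vat_rate)
        return f"Invoice\nNet: {net:.2f}\nVAT: {vat_rate:.0%}\nTotal: {total:.2f}"

    # Snapshot-style: the blob is "correct" only because its author
    # once eyeballed it. The intent is invisible to the next reader.
    def test_invoice_matches_snapshot():
        expected = "Invoice\nNet: 100.00\nVAT: 20%\nTotal: 120.00"
        assert format_invoice(100, 0.20) == expected

    # Intent-revealing: the name and assertion state the one requirement
    # being protected, so unrelated formatting changes don't break it.
    def test_invoice_total_includes_vat():
        assert "Total: 120.00" in format_invoice(100, 0.20)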
> You want to add tests, but you don’t have time to!
This also seems like a weird thing. The idea behind testing is to save you time. If you're taking longer because you're adding tests, you're doing something wrong! The whole idea is to make the feedback on whether something is working faster, not slower.
For example if working with a large legacy codebase, it’s important to maintain previous functionality and only change it if you can rationalize why. People and other systems might be relying on something that’s a bug, or a side effect that’s not obvious.
If you don’t do this, then you can either choose to have no tests, or have to tear down the whole system to understand it.
Visualizing and eyeballing snapshots in behavioral tests is a highly effective way of catching bugs and defining behavior (edit code -> run test -> verify the output is correct; if it is, lock it down).
I find them to be vastly more effective and cheaper to build than unit tests, even on greenfield codebases.
If a new person joins the codebase and sees that a snapshot is now different, how do they know what's correct or not? What I've seen in the wild is 1) talk with the person who authored the test, or 2) just say yes and move on.
If you're doing a pure refactor of your CSS, you shouldn't see a change. Unless your CSS rules are order-dependent, and now you've caught that.
This is true when you're writing the code initially, but the fact is some people don't write tests, and then you might inherit a huge pile of untested code. Adding tests to it would take a lot of time, and sometimes you need to start making changes fast.
When there are no tests, it's very likely you won't be able to add tests without changing the code into something that you can test. This is where snapshot testing shines. Snapshot tests are the only type of test you can add without needing to modify any of the code. You can add them easily, and they give you some confidence that you're not breaking things.
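For example, something as dumb as this can be bolted on without touching the code under test (legacy_module, generate_report and the paths are placeholders for whatever your untested entry point is):

    import json
    import pathlib

    from legacy_module import generate_report  # placeholder entry point

    SNAPSHOT = pathlib.Path("snapshots/report.json")

    def test_report_matches_snapshot():
        output = generate_report(customer_id=42)
        serialized = json.dumps(output, indent=2, sort_keys=True, default=str)
        if not SNAPSHOT.exists():
            # First run: record current behaviour as the baseline.
            SNAPSHOT.parent.mkdir(exist_ok=True)
            SNAPSHOT.write_text(serialized)
        # Later runs: fail if behaviour drifts from the baseline.
        assert serialized == SNAPSHOT.read_text()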
Indeed, it will most likely require refactoring the code to support the testing. But the initial idea of _why_ you want to add testing there in the first place is so you can refactor with confidence, without having to manually test all the cases. Compare "refactor to test -> add tests -> refactor behaviour" with "refactor behaviour -> manual tests" and the first one will most likely be faster.
The system handled around 500K requests per minute. The inputs/outputs of the system looked like this:
1. Receive HTTP request
2. Make a bunch of HTTP requests
- Call external web services for business logic
- Call reporting services for analytics
3. Render result
On a single production box of the legacy system, we instrumented the HTTP clients to dump the requests/replies to disk for 15 minutes.
Using the snapshots, we wrote some scripts to allow us to replay the data through the new system. It was a totally ad hoc manual process that took about a week. It found a TON of bugs and greatly increased our confidence in the new system. After we were done we threw it away.
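Not the actual scripts, but the general shape was roughly this (heavily simplified sketch; the real capture format and clients were messier):

    import json

    # Record: wrap the legacy system's HTTP client and dump each
    # request/response pair to disk.
    def recording_send(real_send, request, log_path="captures.jsonl"):
        response = real_send(request)
        with open(log_path, "a") as log:
            log.write(json.dumps({"request": request,
                                  "response": response}) + "\n")
        return response

    # Replay: re-issue captured requests against the new system and
    # collect any diverging responses for inspection.
    def replay(send_to_new_system, log_path="captures.jsonl"):
        mismatches = []
        with open(log_path) as log:
            for line in log:
                pair = json.loads(line)
                actual = send_to_new_system(pair["request"])
                if actual != pair["response"]:
                    mismatches.append((pair["request"],
                                       pair["response"], actual))
        return mismatches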
I am convinced that standard unit testing alone would not have found many of these bugs, and they would have surfaced in production.
So, if you're on a short deadline, and maybe you didn't write the code in the first place so you're not familiar with the internals, snapshot testing is the quickest way to get confidence that you aren't causing further regressions.
These aren't meant to be used forever more. Just to stop things from sliding backwards.
After all, the title of the site is understandlegacycode.com. It's all about getting to a point where you can refactor and do things properly when you do get time.
If that doesn't make sense, what would you suggest to do?
It's not lazy, it's just a trade-off space.
Now you need to change it.
Any future bug discovered is probably going to come back to you, given you are now the last person to touch it and the perceived expert on this legacy monstrosity. Having proof of how the code worked before you touched it not only prevents regressions while you're working on it, but also prevents having to do this sleuthing on the fly with someone wagging a finger in your face. Not only that, but depending on dependency drift you may not be able to easily go back and prove the old functionality after the fact, so having this in a CI log will be valuable for your sanity and reputation.
Most likely the output will change and you'll be left with a random blob of data to figure out what went wrong instead of clear specifications.
You don't have an external spec to test against in this case, the existing implementation details are the spec, and so the goal is to preserve them.
I'd say the progression:
* Find one input
* Get coverage
* Mutation test to ensure you have assertion density
is probably a fine way to accomplish the goal.
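Concretely, step 1 might look like a single characterization test, with coverage and mutation tooling layered on afterwards (legacy_pricing, quote and the pinned value are all made up):

    # Step 1: pin the current output for one real input.
    from legacy_pricing import quote

    def test_quote_for_known_input():
        # Value captured from the current implementation, not from a spec.
        assert quote(sku="A-100", qty=3, region="EU") == 149.85

    # Step 2: measure coverage and pin more inputs until the paths you
    # care about are exercised, e.g. with pytest-cov:
    #   pytest --cov=legacy_pricing
    # Step 3: mutation-test to check assertion density, e.g. with mutmut:
    #   mutmut run
    # Surviving mutants point at behaviour the assertions don't pin down.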
Eventually you'd want to define an external specification for the code that isn't just "the service works, therefore ensure the service does what it's been doing", at which point some of these tests may be removed, or at least enhanced.
That's part of the problem. The original test was created based on implemented behaviour and not test/functional requirements, so you don't have any confidence that what you're preserving is the right behaviour. It tells you you haven't changed the output, but once you've covered your codebase with these, and make a commit that triggers 46 snapshot failures, you have zero knowledge to decide whether that change is desirable or not.
The goal here is not "Prove this software meets a spec"; it is "This software is the spec, and we must not regress". There are many more systems that act as a de facto spec than are built to an external one.
If you choose to then move to a world where you want an external spec, as I said, at that point some of these tests will be thrown away or refactored. Most systems never make it to that stage.
This is especially common when you have legacy software where the person or team that wrote the code is no longer there. It takes a monstrous amount of effort to derive a meaningful spec, and step 0 is always preserving existing behavior.
In my experience, tests like this actually aren’t hard to maintain. The test can print out a diff between the expected and actual result when there is a failure, so you can check if the new behavior is expected and easily copy-paste to update the test if desired.
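A sketch of that pattern in Python, using difflib to print the diff on failure (render_page and the golden path are placeholders):

    import difflib
    import pathlib

    from app import render_page  # placeholder function under test

    GOLDEN = pathlib.Path("golden/page.html")

    def test_page_matches_golden():
        actual = render_page(user="demo")
        expected = GOLDEN.read_text()
        if actual != expected:
            diff = "\n".join(difflib.unified_diff(
                expected.splitlines(), actual.splitlines(),
                fromfile="expected", tofile="actual", lineterm=""))
            # The diff makes it easy to judge the change and, if it's
            # intended, to copy the new output over the golden file.
            raise AssertionError("Output changed:\n" + diff)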
I will write a mocking test to ensure a business flow is followed at the test level, just because I have been burned too many times by a developer updating snapshots after making a change... and updating all the snapshots, even the ones that show they introduced a new bug.
As the author said:
> It’s fine to delete useless tests afterwards. They were useful when you had to change the code. They’re not here to replace proper tests in your codebase!
> It's meant to [...] make sure you don't break something
It helps you see if you didn't break 'something', but 1) you have no clue what that something is by reading the test, and 2) you have no information to fix it. Plus, the snapshot itself might have already captured the wrong implementation. It's a false sense of security.
If it truly is a legacy app, snapshots should rarely be updated (because you're not updating the code) and any introduced bugs will trip a snapshot. The issue is always about people blindly updating snapshots because 'I made a change and the snapshots are wrong'.
If the failed tests don't line up with what you believe you changed, you are the reason it's Legacy Code.
It's a very good technique that I've used many times in the past.
Ideally, in a healthy software project, you want to define automated tests where each test checks that an invariant is satisfied by the implementation, and links that invariant to requirements that the software is meant to implement. Then, if a test fails, you know that one of your functional requirements has been broken, in at least some scenario. If the tests and test harness are well structured, it should be very quick and easy for you to learn which requirement has been broken, in exactly what scenario. Fantastic. You can take that information and iterate, and try again.
Unfortunately, in a legacy code situation, we commonly would have no idea what the functional requirements of the software were, or what the relationship is between the functional requirements and the code. There is often no existing automated test suite. If we are under pressure to make a change to the codebase, having a dumb automated regression test suite that can check if _the behaviour of the code is the same as it was before_ is very valuable. Note that the purpose of this regression test suite is very different to the purpose of the earlier test suite that we would build in a healthy software project.
You can think about the feedback from an automated test suite as giving some information as you modify the code and move about the space of possible programs. A very high quality test suite will give you feedback that helps you steer in the right direction: towards modifications that produce programs that do not break any existing requirements.

However, in a legacy code situation, we often don't have the luxury of knowing any requirements, or knowing internal invariants that the existing code is meant to maintain. So in this case, the feedback from our automated regression test suite will not be very informative: it does not give us a signal about whether we are moving in a direction that breaks requirements. It does, however, let us know if our change (from the tiny bug fix / high-priority feature we're trying to jam in) causes the behaviour of the existing system to change in ways we did not expect. I.e. we can use it to detect if we are "moving" in the space of behaviours of the program. If a regression test breaks, this alerts us to an unplanned consequence of our change, so we can abort rolling the change out and go back to the drawing board with more analysis and debugging, to try to understand what the system is doing, and whether we actually want the behaviour to change or not.
Reminds me a lot of how GitHub's Scientist works: https://github.com/github/scientist
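Scientist itself is Ruby, but the core idea fits in a few lines of any language. A rough Python sketch of the pattern (not Scientist's actual API):

    import logging

    def experiment(name, control, candidate, *args, **kwargs):
        # Run old (control) and new (candidate) paths side by side.
        # Always return the control's result so behaviour never changes;
        # mismatches and candidate crashes are only logged for analysis.
        result = control(*args, **kwargs)
        try:
            trial = candidate(*args, **kwargs)
            if trial != result:
                logging.warning("%s mismatch: %r != %r", name, trial, result)
        except Exception:
            logging.exception("%s: candidate raised", name)
        return result

    # Usage: keep serving old_tax_code while auditing new_tax_code.
    # total = experiment("tax", old_tax_code, new_tax_code, order)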
We had an outage because of some code that got pushed to production with no tests written for it. It was a very simple bug: basically, a list was being appended to instead of replaced. Nobody noticed for a while because the right data was still 'there', which made things worse, because a lot of things had to be reworked after the bug was fixed, as they were going off the bad data.
I mentioned that any new code should have tests. And the manager replied with "We don't have time to write tests", and I had to bite my tongue to not reply, "because we're spending all our time fixing simple bugs and reworking data"
For example, a project I work on is available in Asia, where the family name is written first. So anything related to names needs 2 tests to make sure it works right. Oh, and there is legacy data, so customers might be missing first, last, or both names. So add another 3 tests onto the pile. You're looking at 5 tests right there.
Let's say the feature you're working on related to names has 3 dimensions. Well, now you potentially have to run 5*3=15 tests. If one of those fails and you bugfix, you have to re-run those tests. Have fun with that if your tests are manual.
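This is where parameterized tests earn their keep: the 5 name cases become one table, and each new dimension is another parameter rather than another manual run. A sketch with pytest (display_name is a made-up example):

    import pytest

    # Made-up function under test: builds a display name from parts.
    def display_name(first, last, family_name_first=False):
        parts = [p for p in (first, last) if p]
        if family_name_first:
            parts.reverse()
        return " ".join(parts) or "Customer"

    @pytest.mark.parametrize("first,last,family_first,expected", [
        ("Mei", "Tanaka", True, "Tanaka Mei"),    # family name first
        ("Ada", "Lovelace", False, "Ada Lovelace"),
        (None, "Tanaka", True, "Tanaka"),         # legacy data: no first name
        ("Ada", None, False, "Ada"),              # legacy data: no last name
        (None, None, False, "Customer"),          # legacy data: no name at all
    ])
    def test_display_name(first, last, family_first, expected):
        assert display_name(first, last,
                            family_name_first=family_first) == expected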
The later tests are added, the less of their value you get to leverage.
It's one thing if you're using snapshot testing like your average test, to make sure things don't break when you make changes. But if you're in a bind, is a snapshot test really better than your own eyes?
Maybe it's better to release the thing in that case and add better tests after the deadline.
I suppose the snapshot test in a pinch is not so bad if it's just changes being made to legacy code. But I'm not sure I'd do such a thing when building new features.
But, yes, in many cases it's not a good automated check.
The 1st commit in this sequence is a pure refactor, and by definition should change no behavior. The "snapshot test" described sounds perfect for this, especially in unfamiliar code. Ideally, I'd go even further, and have a compiler prove for me that my refactor produced a perfectly equivalent program from a black-box perspective. Snapshot testing is great because it gets pretty close, very cheaply, whereas general program equivalence is undecidable.
Hook up the last known good snapshot and the new output to your favorite diff UI and you'll get a pretty good idea about the actual effects of the code changes. I feel more confident signing off a new "known good" based on reviewing a full document diff than after iterating a set of meticulously hand-crafted expectations alongside the production code until convergence.
All of this is true. But what's best for the system isn't necessarily best for the programmer. If you're under pressure to show results in order to keep your job or get a good performance review, buggy untestable code is better than no code because you spent time writing tests. Remember, the goal is to keep your job and/or get that raise.
"But," you may argue, "in that case your job is terrible and you should quit."
I often see this advice in the bubble that is HN. The reality is that a lot of programmers can't change their jobs that easily because of a multitude of problems (age, difficulty interviewing, social anxiety, bad economy, disabilities, pressing financial problems, etc).
You'd be surprised how many businesses let you simply plod on with no repercussions, while if you try to test (and reduce your apparent immediate output) you can get a bad review or a PIP. In some, automated testing is not even something they are aware of. But what happens to the software, you say? Well, all software eventually ends up either working/used or not working/abandoned, and tests often have very little impact on that outcome...
This was for example the case in the "shared services" unit of a MAJOR oil & energy company. Probably the first you will think of.
Basically what your test does is ensure there's no change to the result (compare the new serialized output to the old one). Google's ToTT had a good write-up on this topic: https://testing.googleblog.com/2015/01/testing-on-toilet-cha...
For applications where state drives output (i.e., every single webapp), I find that change detection is a decent chunk of issue detection.
Change Detection + Business Flow Testing will probably deal with your stakeholder requirements far more cheaply than almost every other method of 'ensuring quality'... even the classic lie of 'developers would never make that mistake'.