Tests that run for 30 hours are an indication that nobody bothered writing unit tests. If you need to run all tests after changing X, it means X is NOT tested in isolation. Instead you have to rely on integration tests catching X's behavior.
I beg to differ. Having to run the full test suite to catch significant errors is an indication that the software design isn't (very) modular, but it has nothing to do with unit tests. Unit tests do not replace service/integration/end to end tests, they only complement them - see the "test pyramid".
I think it's important to point this out, because one of the biggest mistakes I see developers make these days is relying too much on unit tests (especially "behavior" tests using mocks) and not trying to catch problems at a higher level with higher-level tests. Then the code gets deployed and - surprise, surprise - all kinds of unforeseen errors come out.
Unit tests are useful, but they are, by definition, very limited in scope.
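To make that concrete, here's a contrived sketch (all names invented): the mock-based test passes because the mock accepts whatever we tell it, while an integration-style test against a real in-memory database catches the actual bug.

    # Contrived illustration: the mocked test proves only that a call happened;
    # the integration-style test exercises the real collaborator and fails.
    import sqlite3
    from unittest.mock import MagicMock

    def save_user(db, name):
        # Bug: the column is called "name", not "username"; a mock never notices.
        db.execute("INSERT INTO users (username) VALUES (?)", (name,))

    def test_save_user_with_mock():
        db = MagicMock()
        save_user(db, "alice")
        db.execute.assert_called_once()      # passes, tells us very little

    def test_save_user_against_real_db():
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE users (name TEXT)")
        save_user(db, "alice")               # raises sqlite3.OperationalError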
While I see some value in the red-green unit testing approach, I've found the drawbacks often eclipse the advantages, especially under real-world time constraints.
In my day-to-day programming, when I neglect writing tests, the regret is always about the ones on the integration-testing side. I'm okay with not testing individual functions, or even individual modules. But the stuff that 'integrates' various modules/concerns is almost always worth writing tests for.
In my experience it's far easier to introduce testing by focusing on unit testing complicated, stateless business logic. The setup is less complex, the feedback cycle is quick, and the value is apparent ("oh gosh, now I understand all these edge cases and can change this complicated code with more confidence"). I think it also leads to better code at the class/module/function level.
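For example (domain and numbers invented, but this is the shape I mean): a pure, stateless rule like a tiered discount needs no setup and lets you pin down the edge cases in a table.

    # Hypothetical stateless business rule: tiered discount calculation.
    import pytest

    def discount(total, is_member):
        if total < 0:
            raise ValueError("negative total")
        rate = 0.10 if total >= 100 else 0.0   # 10% off orders of 100 or more
        if is_member:
            rate += 0.05                        # members get an extra 5%
        return round(total * (1 - rate), 2)

    @pytest.mark.parametrize("total,member,expected", [
        (0,     False, 0.0),      # boundary: empty order
        (99.99, False, 99.99),    # just below the tier threshold
        (100,   False, 90.0),     # threshold itself
        (100,   True,  85.0),     # discounts stack for members
    ])
    def test_discount(total, member, expected):
        assert discount(total, member) == expected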
In my experience once a test (of any kind) saves a developer from a regression, they become far more amenable to writing more tests.
That said I think starting with integration tests might be a good area of growth for me.
Writing functional tests is easy when you have a clear spec. If you do, tests are basically the expression of that spec in code. Conversely, if they're hard to write, that means that your spec is underspecified or otherwise deficient (and then, as a developer, ideally, you go bug the product manager to fix that).
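Concretely (spec line invented for illustration): if the spec says "a password is valid if it is at least 12 characters long and contains a digit", the functional test is almost a transliteration of that sentence.

    # Invented spec: "A password is valid iff it is at least 12 characters
    # long and contains at least one digit."
    def is_valid_password(pw):
        return len(pw) >= 12 and any(c.isdigit() for c in pw)

    def test_password_spec():
        assert is_valid_password("correcthorse1batterystaple")   # long, has a digit
        assert not is_valid_password("short1")                   # too short
        assert not is_valid_password("nodigitsbutplentylong")    # no digit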
Integration tests are pretty important in huge codebases with complex interactions. Unit tests are of course useful to shorten the dev cycle, but you need to design your software to be amenable to unit testing. Bolting them onto a legacy codebase can be really hard.
Exactly. I have a product that spans multiple devices and multiple architectures: microcontrollers, SDKs and drivers running on PCs, third-party devices with firmware on them, and FPGA code. They all evolve at their own pace, and there's a big explosion of possible combinations in the field.
We ended up emulating the hardware, running all the software on the emulated hardware, and deploying integration tests to a thousand nodes on AWS for the few minutes it takes to test each combination. Tests finish quickly, and it has been a while since we shipped something with a bug in it.
But there’s a catch: we have to unit test the test infrastructure against real hardware - I believe it’d be called test validation. Thus all the individual emulators and cosimulation setups have to have equivalent physical test benches, automated so that no human interaction is needed to compare emulated output to real output. In more than a few cases, we need cycle-accurate behavior.
The test-harness unit (validation) test has to, for example, spin up a logic analyzer and a few MSO oscilloscopes, reinitialize the test bench (e.g. load the third-party firmware we test against), then get it all synchronized and run the validation scenarios.
Oh, and the firmware of the instrumentation is also a part of the setup: we found bugs in T&M equipment firmware/software that would break our tests. We regression test that stuff, too.
All in all, a full test suite, run sequentially, takes about 40,000 hours, and that’s when being very careful about orthogonalizing the tests so that there’s a good balance between integration aspects and unit aspects.
I totally dig why Oracle has to do something like this. On the other hand, our code base isn't very brittle, yet the integration aspects make it mostly impossible to reason about what could possibly break - so we either test, or the support people get overwhelmed with angry calls. Had we had brittle code on top of that, we'd have been doomed.
If you're only writing tests at the unit level, you might as well not bother. And it's always good to run all tests after any change, it's entirely too easy for the developer to have an incomplete understanding of the implications of a change. Or for another developer to misuse other functionality, deliberately or otherwise.
It could also be that the flags are so tangled together that a change to one part of the system can break many other parts that are completely unrelated. Sure, you can run a unit test for X, but what about Y? Retesting everything is all you can do when everything is so tangled you can't predict what a change could affect.
> Tests that run for 30 hours are an indication that nobody bothered writing unit tests.
Yes, they were not unit tests. There was no culture of unit tests in the Oracle Database development team. A few people called them "unit tests", but they were either speaking loosely or mistaken.
Unit tests would not have been effective because every area of the code was deeply entangled with everything else. They did have the concept of layered code (a virtual operating system layer at the bottom, a memory-management layer on top of that, a query engine on top of that, and so on), but over the years people violated the layering and wrote code that called an upper layer from a lower layer, leading to a big spaghetti mess. A change in one module could cause a completely unrelated module to fail in mysterious ways.
Almost every test was an integration test. Every test case restarted the database, connected to it, created tables, inserted test data, ran queries, and compared the observed results against the expected results. They tried to exercise every function and every branch condition in this manner with different test cases. The code coverage was remarkable, though: some areas of the code had more than 95% test coverage, while other areas had around 80%. But the work was not fun.
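For readers who haven't seen that style, each test case had roughly the following shape (sketched here against SQLite purely to show the structure; the real ones drove an actual database restart and far more elaborate comparisons).

    # Rough shape of one such end-to-end test case (illustrative only).
    import sqlite3

    def test_query_end_to_end():
        db = sqlite3.connect(":memory:")   # stand-in for "restart + connect"
        db.execute("CREATE TABLE emp (id INTEGER, salary INTEGER)")
        db.executemany("INSERT INTO emp VALUES (?, ?)",
                       [(1, 100), (2, 200), (3, 300)])
        observed = db.execute(
            "SELECT COUNT(*), SUM(salary) FROM emp WHERE salary > 100"
        ).fetchone()
        expected = (2, 500)                # precomputed expected result
        assert observed == expected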
I do. It's about state coverage: every Boolean flag doubles the possible states of that bit of code, so now you need to run everything twice to retain the same coverage.
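The arithmetic is brutal: N independent flags means 2^N states, so 8 cases at N = 3 but over a million at N = 20. A sketch of what exhaustive coverage looks like (flag names made up):

    # Exhaustively parametrizing over every combination of made-up flags.
    import itertools
    import pytest

    FLAGS = ["parallel_exec", "result_cache", "adaptive_plans"]   # N = 3 -> 8 cases

    def run_query(settings):
        # Stand-in for the code under test; real code would branch on each flag.
        return sorted(name for name, enabled in settings.items() if enabled)

    @pytest.mark.parametrize(
        "combo", list(itertools.product([False, True], repeat=len(FLAGS))))
    def test_query_under_every_flag_combination(combo):
        settings = dict(zip(FLAGS, combo))
        assert run_query(settings) == sorted(f for f in FLAGS if settings[f])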
FWIW, I know people who work on SQL processing (big data Hive/Spark, not RDBMS), and a recurrent issue is that an optimisation which benefits most people turns out to be pathologically bad for some queries for some users. Usually those with multiple tables with 8192 columns and some join which takes 4h at the best of times; now it takes 6h, so the overnight reports aren't ready in time. And the locks held in the process are now blocking some other app which really matters to the business's existence.
These are trouble because they still "work" in the pure 'same outputs as before' sense; it's just that the side effects can be so disruptive.
Writing tests for error handling can be a pain. You write your product code to be as robust as possible but it isn't always clear how to trigger the error conditions you can detect. This is especially true with integration tests.
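One thing that helps at the unit level (much less so for integration tests, as you say) is fault injection via patching, so the error branch actually runs. A minimal sketch with invented names:

    # Fault injection: patch the dependency so the error path executes.
    from unittest.mock import patch
    import urllib.request

    def fetch_config(url):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except OSError:
            return b"{}"      # fall back to an empty config on network failure

    def test_fetch_config_falls_back_on_network_error():
        with patch("urllib.request.urlopen", side_effect=OSError("boom")):
            assert fetch_config("https://example.invalid/config") == b"{}"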