Interesting. I've always been aggressive about keeping my Java and, lately, Kotlin builds fast. Anything over a few minutes in CI becomes a drain on the team. A productive team will have many pull requests open at any point and lots of commits landing on all of them. That means builds start piling up. People start hopping between tasks (or procrastinating) while builds run. Cheap laptops become a drain on developer productivity. Etc. All of this is bad. Maintain the flow and keep things as fast as you can. It's worth investing time in.
Some of the overhead is unavoidable, unfortunately. E.g. the Kotlin compiler is a bit of a slouch despite some recent improvements. Many integration tests these days involve Docker or Docker Compose. Overall that's better than a pile of fakes and imperfect substitutes, but it eats time. A lot of Kotlin and Spring projects involve code generation, which adds to your build times. Breaking builds up into modules increases build times as well, but tends to be necessary. Be mindful of all this.
A few performance tips not covered in the article that may also apply to other languages:
- Run your tests concurrently and write your tests such that you can do so. Running thousands of tests sequentially is stupid. With JUnit 5, set junit.jupiter.execution.parallel.enabled=true in junit-platform.properties (goes in your test resources). Use more threads (e.g. junit.jupiter.execution.parallel.config.dynamic.factor=4) than CPUs for this, as your tests will likely be IO bound rather than CPU bound. If you are not maxing out all your cores, throw more threads at it because you can go faster. If your tests don't pass when running in parallel, fix them. Yes, this is hard, but it will make your tests better.
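For reference, a src/test/resources/junit-platform.properties along these lines gets you there; the factor of 4 is just a starting point to tune against your own IO/CPU mix:

```properties
# enable JUnit 5 parallel execution
junit.jupiter.execution.parallel.enabled=true
# run test classes and methods concurrently by default
junit.jupiter.execution.parallel.mode.default=concurrent
junit.jupiter.execution.parallel.mode.classes.default=concurrent
# dynamic strategy: thread pool size = available cores * factor
junit.jupiter.execution.parallel.config.strategy=dynamic
junit.jupiter.execution.parallel.config.dynamic.factor=4
```

Note that enabled=true alone doesn't parallelize anything; the mode.default properties are what actually opt your tests in.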
- Don't do expensive cleanup and setup in between tests. This takes time and integration tests become more realistic if they don't operate in a vacuum (your production system is not a vacuum either). To enable this, randomize test data so that the same tests can run multiple times even if data already exists in your database. Docker will take care of cleaning up ephemeral data after your build. This also helps with running tests concurrently.
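A minimal sketch of the randomization idea; the helper names here are my own invention, not from any library. The point is that every run mints unique identifiers, so the same test can insert "the same" user repeatedly without tripping over unique constraints or leftover data:

```kotlin
import java.util.UUID

// Every call produces a fresh, unique value, so tests never collide
// with data left behind by previous runs or by parallel tests.
fun randomId(): String = UUID.randomUUID().toString()

fun randomEmail(prefix: String = "user"): String =
    "$prefix-${UUID.randomUUID().toString().take(8)}@example.com"

fun main() {
    // two "identical" test setups produce distinct, non-colliding rows
    val first = randomEmail("signup")
    val second = randomEmail("signup")
    check(first != second) { "randomized data should never collide" }
    println(first)
}
```

Assert on the data the test created (look it up by its random id/email), never on global counts like "the users table has exactly 3 rows", and parallel runs stop stepping on each other.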
- Distinguish between (proper) unit tests and scenario-driven integration tests as the two ideal forms of a test. Anything in between is going to be slow and imperfect in terms of what it does. This means you can improve a test by either making it a proper integration test (better coverage of code, functionality, and edge cases) or a proper unit test (runs in milliseconds because there is no expensive setup).
- With integration tests, add to your scenarios to make the most of your sunk cost (the time to set up the scenario). Ensure they touch as much of your system as they can. You are looking for e.g. feature interaction bugs, heisenbugs related to concurrency, weird things that only happen in the real world. A unit test is not going to catch any of these things. That's why they are called integration tests. So make it as real as you can get away with.
- Fix flaky tests. This usually means understanding why they are flaky and addressing that. If that's technical debt in your production code, that's a good thing. Flaky tests tend to be slow and waste a lot of time.
- Separate your unit and integration tests and make your builds fail fast. Compile + unit tests should be under a minute, tops. That way, if somebody messed up, you'll know within a minute of the commit being pushed to CI.
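One way to do the split with Gradle (Kotlin DSL). The `integrationTest` source set and task names are conventions I'm assuming, not anything Gradle mandates, and a real build would also wire up the matching dependency configurations:

```kotlin
// build.gradle.kts -- carve out a separate source set and task for
// integration tests so `test` stays fast and fails first.
sourceSets {
    create("integrationTest") {
        compileClasspath += sourceSets.main.get().output
        runtimeClasspath += sourceSets.main.get().output
    }
}

val integrationTest = tasks.register<Test>("integrationTest") {
    testClassesDirs = sourceSets["integrationTest"].output.classesDirs
    classpath = sourceSets["integrationTest"].runtimeClasspath
    shouldRunAfter(tasks.test) // unit tests get their chance to fail first
}

tasks.check { dependsOn(integrationTest) }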
- Eliminate sleep calls in tests. This is an anti-pattern that indicates either flaky tests or naive strategies for testing asynchronous code (usually both). It's a mistake every time and it makes your tests slow. The solution is polling, which ensures that each test only takes as much time as it strictly needs.
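A minimal hand-rolled polling helper along these lines (libraries like Awaitility do the same job with more polish) returns as soon as the condition holds, instead of sleeping a fixed, pessimistic amount:

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.concurrent.thread

// Poll a condition until it holds or the timeout expires. The test then
// takes only as long as the system actually needs, not a worst-case sleep.
fun awaitCondition(
    timeoutMillis: Long = 5_000,
    pollMillis: Long = 25,
    condition: () -> Boolean,
) {
    val deadline = System.currentTimeMillis() + timeoutMillis
    while (System.currentTimeMillis() < deadline) {
        if (condition()) return
        Thread.sleep(pollMillis) // short poll interval, not a blind sleep
    }
    throw AssertionError("condition not met within ${timeoutMillis}ms")
}

fun main() {
    // toy async work: a flag flipped by another thread after ~100ms
    val done = AtomicBoolean(false)
    thread { Thread.sleep(100); done.set(true) }

    val start = System.currentTimeMillis()
    awaitCondition { done.get() }
    val elapsed = System.currentTimeMillis() - start

    // finishes shortly after the work completes, not after a fixed 5s sleep
    check(done.get())
    println("done after ${elapsed}ms")
}
```

A Thread.sleep(5000) here would cost five seconds on every run; the poll costs roughly 100ms plus one poll interval, and the timeout only ever bites when something is genuinely broken.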
- Run with more threads than your system can handle to flush out flaky tests. Interesting failures happen when your system is under load. Things time out, get blocked, deadlocked, etc. You want to learn why this happens. Fix the tests until they pass reliably with way more threads than CPUs. Then back it down until you hit the optimum test performance. You'll have rock solid tests that run as fast as they can.
- Keep your build tools up to date and learn how to use them. Most good build tools work on performance issues all the time because it's important. I use Gradle currently and the difference between now and even two years ago is quite substantial. Even good old Maven got better over time.
- Pay for faster CI machines. Every second counts. If your laptop builds faster than CI, fix it. There's no excuse for that. I once quadrupled our CI performance simply by switching from Travis CI to AWS CodeBuild with a proper instance type. 20 minutes down to 5 minutes. Exact same build. And it removed the limits on concurrent builds as well. Massive performance boost and a rounding error on our IT cost.
Most of this advice should work for any language. Life is too short for waiting for builds to happen.