
Slow CI: real problem or easy excuse for developers? - iamnguele
https://www.codingnagger.com/2019/01/31/slow-ci/
======
peterwwillis
Things that cause slow CI:

\- Not parallelizing tests. Microsoft runs something between 100k and a
million tests per deploy, and that's only possible if you can run your tests
in a distributed fashion.

\- Not bundling tests by stage (re-doing setup/teardown, re-doing nearly
identical tests, running tests at random times). Similar to the above, you have to
group your tests into stages where they can be parallelized together
efficiently. Example: you're a network device company and you only have a
couple racks worth of gear you can test on, so you have to optimize for every
device and stage, or your poorly coordinated tests will tie up infrastructure
someone else could be using. (In the cloud with dynamic infrastructure, this
wastes money as well as time)

\- Not optimizing tests for speed. You don't have to "micro-optimize", but you
should occasionally do build and test profiling and replace the really slow
bastards. If you _never_ do build or test profiling, you _will_ end up with
slow junk.
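
A toy version of that profiling pass, assuming tests are plain callables (the two tests here are invented; pytest gives you a similar report for free with its `--durations` flag):

```python
# Sketch: time every test and list the slowest first, so the worst
# offenders are obvious. The two tests below are deliberately fake.
import time

def profile_tests(tests):
    """Run each (name, fn) pair; return (seconds, name), slowest first."""
    timings = []
    for name, fn in tests:
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start, name))
    return sorted(timings, reverse=True)

tests = [
    ("test_fast", lambda: None),
    ("test_slow", lambda: time.sleep(0.05)),  # the "slow bastard"
]
slowest = profile_tests(tests)
print("slowest:", slowest[0][1])  # slowest: test_slow
```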

\- Too many/unnecessary functional or end-to-end tests. You don't need to run
Flyway migrations on every build; you only need to when you've changed the
schema.

\- A gigantic monolithic codebase. If your code is huge and in one repo,
you're going to end up with 100k+ tests, because a one-line change _might_
affect anything in the codebase, so you better test everything every time. If
you design your apps well, you can segment your application into discrete
components that _do not depend on each other_. This way you can do unit
testing on _only the component you changed_, and then move on to
functional/e2e testing for the whole shebang.
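
One way to get that selective testing, sketched in Python (the component names and path prefixes are entirely hypothetical): map each changed file to its component, and fall back to running everything when a shared file changes.

```python
# Sketch: decide which components' unit-test suites a change touches.
# The component-to-path mapping below is hypothetical.
COMPONENTS = {
    "billing": "src/billing/",
    "auth": "src/auth/",
    "search": "src/search/",
}

def suites_to_run(changed_files):
    """Return the set of components whose unit tests must run."""
    affected = set()
    for path in changed_files:
        matches = {c for c, prefix in COMPONENTS.items()
                   if path.startswith(prefix)}
        if not matches:
            return set(COMPONENTS)  # shared/unknown file: run everything
        affected |= matches
    return affected

print(suites_to_run(["src/auth/login.py"]))  # only auth's suite runs
```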

\- Using sub-optimal tools. Changing a build or test from using one tool,
library, or service to another can give order-of-magnitude performance
increases in things like builds and tests. Lots of people have gotten dramatic
speed increases just by changing a tool. If profiling doesn't make a big dent
in speed, this often can.

~~~
kodablah
Both this and the article focus only on tests, but in other cases the build
time before even running the tests is slow. This is often the case in
lower-level languages with many dependencies.

~~~
craftyguy
> but in other cases the build time before even running tests is slow.

In my experience, this is spot on. In the CI system I help run for testing
graphics, a significant amount of the CI time is spent:

1) fetching source code from external locations for all dependencies that need
to be built for each test, and for the project itself

2) fetching source code from internal git cache to builders

3) compiling projects (ccache helps a LOT)

4) syncing built artifacts to shared storage (NFS)

5) testers sync artifacts (rsync)

6) testers run tests

7) results sync'd back to CI master

8) process results and display to developers (the processing and display of
results also takes a few minutes, since there are often over 1 million test
results to process).

Many of the steps here are repeated multiple times, since we run multiple test
suites across nearly 200 individual testers. We average around a 30-minute
turnaround time for results.

Steps 2, 4, 5, and 7 take a surprisingly long amount of time, since internal
network utilization depends on CI load (e.g. it could be running other CI jobs
at the same time). We are constantly looking for ways to improve it.

For some of the tests that run, the syncing of artifacts and results takes
longer than the tests themselves.
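
The reason ccache helps so much in step 3 is the trick sketched below in Python: key the expensive step by a hash of its input and skip it on a repeat (the "compiler" here is obviously a stand-in for the real one):

```python
# Sketch of the principle behind ccache: cache compilation output by a
# hash of the input, so unchanged sources never recompile.
import hashlib

cache = {}
compile_calls = 0

def compile_cached(source):
    """'Compile' source text, reusing cached output on identical input."""
    global compile_calls
    key = hashlib.sha256(source.encode()).hexdigest()
    if key not in cache:
        compile_calls += 1  # the expensive step happens only here
        cache[key] = f"object-code-for-{key[:8]}"
    return cache[key]

compile_cached("int main() { return 0; }")
compile_cached("int main() { return 0; }")  # cache hit: no recompile
print("compiles:", compile_calls)  # compiles: 1
```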

------
GhostVII
I feel like slow CI can definitely be a huge bottleneck in large projects,
with lots of tests. If you are making changes that affect lots of different
areas, there are likely too many tests to run locally, so some test failures
will only be caught by CI. Slow CI makes it take far longer to iterate and
solve these failures, and forces you to multitask more while waiting for CI to
finish, and context switch whenever you have a failure. I think that, if slow
CI is a big problem, it probably indicates that your codebase is too tightly
coupled, and you aren't testing the right things. But it is still a problem.

------
kaetemi
As a developer, you should be able to run the tests that are relevant to your
work manually. The CI is there to do the complete validation, and to help you.
Not to be in your way.

------
sanxiyn
All good points, but sometimes you have solved all the other problems and slow
CI does become the bottleneck. My last work project and an open source project
(Rust) happen to be in that situation.

------
ArturT
Back in 2014 I worked on a project where we had a ~15 min test suite running on
CI, and this was painful, so I started working on Knapsack, an open source
solution for CI parallelisation of Ruby tests.

Later on I developed a more advanced approach with dynamic test allocation
across parallel CI nodes, not only for Ruby, to get the CI build as fast as
possible. To give you some idea how it works, check
[https://docs.knapsackpro.com/2017/auto-balancing-7-hours-tests-between-100-parallel-jobs-on-ci-buildkite-example](https://docs.knapsackpro.com/2017/auto-balancing-7-hours-tests-between-100-parallel-jobs-on-ci-buildkite-example)

or watch video
[https://www.youtube.com/watch?v=hUEB1XDKEFY](https://www.youtube.com/watch?v=hUEB1XDKEFY)
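
A static sketch of the balancing idea in Python (the spec names and timings are invented, and this assigns everything up front, whereas a dynamic queue hands out work at runtime): give each test file, largest first, to the currently least-loaded node.

```python
# Sketch: greedy balancing of test files across parallel CI nodes,
# using previously recorded durations. Names and timings are made up.
import heapq

def balance(tests, nodes):
    """tests: list of (name, seconds); return [(load, node, files)]."""
    heap = [(0.0, i, []) for i in range(nodes)]  # (load, node, files)
    heapq.heapify(heap)
    for name, secs in sorted(tests, key=lambda t: -t[1]):  # biggest first
        load, i, files = heapq.heappop(heap)  # least-loaded node so far
        files.append(name)
        heapq.heappush(heap, (load + secs, i, files))
    return sorted(heap, key=lambda entry: entry[1])

tests = [("a_spec", 300), ("b_spec", 120), ("c_spec", 180), ("d_spec", 60)]
for load, node, files in balance(tests, 2):
    print(f"node {node}: {files} ({load:.0f}s)")
```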

------
monksy
I agree with the blog. From my experience, there are a lot of newer,
inexperienced devs who think that treating feature tests as unit tests is
valid. (It's not, it's flat out wrong, and it's dangerous.)

~~~
azemetre
Can you expand upon this? I do a lot of frontend work, and I am slowly getting
into the mindset that nearly all testing we do in react using enzyme is
worthless compared to doing e2e tests with something like cypress.

I'd rather not have people on my team waste hours writing tests that break the
moment props change, or styling changes, or a child component renders, breaking
the test because they used mount instead of shallow. IDK, I think I just hate
Enzyme mostly...

~~~
monksy
In frontend work, you're at a bit of a disadvantage due to the language and
what you're testing. If you're testing interface code (something that draws
something on the screen), the only way you can test that is with a really good
testing library that can exercise it in a virtual drawing space, or with
function/feature tests.

With the backend you should have 90% unit tests (1000s of them) all completing
in less than 3 minutes, integration tests (7% of all your tests) completing in
under say 10 minutes, and feature tests (those shouldn't be in the build but
in a post-build step that can reject a change).

~~~
avinium
I disagree - your “90%/7%/3%” seems completely arbitrary. It might be
appropriate for some projects but I think it’s wrong to adopt as a hard and
fast rule.

IMO, the correct amount of testing is the amount that lets you refactor
quickly and with confidence that you haven’t broken anything, and that tests
pathways/edge cases in complicated logic.

