
Measuring and Improving Your CI/CD Pipelines - kiyanwang
https://blog.petegoo.com/2018/11/09/optimizing-ci-cd-pipelines/
======
hinkley
> Parallelize

This was my theory early on in my journey, mostly back before CI/CD was a
thing more than a few people were doing. My enjoyment of concurrency didn't
survive contact with my peers and subordinates. I was frequently stuck
working on certain types of problems because either I had caused them or
there were few other people who could understand them, and not all of those
people were patient.

It's really easy to design a concurrent system that only 20% of the people on
your project will properly understand. You are doing deep violence to your
team if you create pervasive parts of the system that most of them can't
understand.

In a CI pipeline the first concurrency problem is representational: How do I
present the progress of 4 tasks happening in parallel when one of them is
failing?

The status and behavior of the CI pipeline need to be obvious to all. If the
CI tool has an answer for this problem (like parallel tasks or jobs), then use
it if it saves you time. Otherwise, I don't know of a good way to report on
the progress of simultaneous tasks, and it's better not to try than to try and
fail. Race conditions in your build can sometimes take hundreds of runs to
become obvious. And by then it's difficult to roll things back.
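
One workable shape for that reporting problem, as a minimal sketch in Go
(made-up task names, not any particular CI tool's mechanism): run the tasks
concurrently, funnel every outcome through a single reporter so the output
stays readable, and only fail after every task has reported, so one failure
doesn't mask the others.

    package main

    import (
        "fmt"
        "os"
        "sync"
    )

    // result carries one parallel task's outcome back to a single reporter.
    type result struct {
        name string
        err  error
    }

    func main() {
        // Made-up build tasks; each would really shell out to a tool.
        tasks := map[string]func() error{
            "lint":  func() error { return nil },
            "unit":  func() error { return nil },
            "build": func() error { return fmt.Errorf("compile failed") },
            "docs":  func() error { return nil },
        }

        results := make(chan result)
        var wg sync.WaitGroup
        for name, run := range tasks {
            wg.Add(1)
            go func(name string, run func() error) {
                defer wg.Done()
                results <- result{name, run()}
            }(name, run)
        }
        go func() { wg.Wait(); close(results) }()

        // One serialized status stream: every task reports, even after a
        // failure, so nothing is hidden behind the first red result.
        failed := false
        for r := range results {
            if r.err != nil {
                failed = true
                fmt.Printf("FAIL  %s: %v\n", r.name, r.err)
            } else {
                fmt.Printf("ok    %s\n", r.name)
            }
        }
        if failed {
            os.Exit(1)
        }
    }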

~~~
petegoo
We developed our own representation of this in Slack. It's just a tool to
optimise throughput, but you're right: you need to be able to represent it
and surface failures early. For us these are separate builds, chained
together with fan-out and fan-in.

Parallelizing within a test suite / build is a whole other thing and yes,
there be dragons.
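
In spirit, each fanned-out build posts its status back to a channel as it
finishes. A minimal sketch of that idea against a Slack incoming webhook
(the URL and message format here are placeholders, not our actual tooling):

    package notify

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // PostStatus sends a one-line stage status to a Slack incoming webhook.
    // webhookURL is a placeholder (real ones look like
    // https://hooks.slack.com/services/...) and the message text is invented.
    func PostStatus(webhookURL, stage string, ok bool) error {
        icon := ":white_check_mark:"
        if !ok {
            icon = ":x:"
        }
        payload, err := json.Marshal(map[string]string{
            "text": fmt.Sprintf("%s build stage %q", icon, stage),
        })
        if err != nil {
            return err
        }
        resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(payload))
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("slack webhook returned %s", resp.Status)
        }
        return nil
    }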

------
vi1rus
As a DevOps guy I find the biggest hurdles are Dev education and stubborn
management.

Right now 90% of the end-to-end tests could have been run during unit
testing. Instead they are run after a full code deployment. This adds an
extra hour of testing. :(
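
For illustration, here's the kind of check that often lands in an e2e suite
but can run in-process as a unit test with net/http/httptest, no deployment
needed (a minimal Go sketch; the handler and route are invented):

    package api

    import (
        "net/http"
        "net/http/httptest"
        "testing"
    )

    // healthHandler is a stand-in for a real handler that would otherwise
    // be verified by hitting a fully deployed environment.
    func healthHandler(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte(`{"status":"ok"}`))
    }

    // The same assertion an e2e test would make, but in-process: no deploy,
    // no network, and it runs in milliseconds with the other unit tests.
    func TestHealthEndpoint(t *testing.T) {
        req := httptest.NewRequest(http.MethodGet, "/health", nil)
        rec := httptest.NewRecorder()

        healthHandler(rec, req)

        if rec.Code != http.StatusOK {
            t.Fatalf("got status %d, want %d", rec.Code, http.StatusOK)
        }
    }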

~~~
hinkley
End-to-end tests seem to be a crutch.

Of the people I've observed learning testing, the ones that do e2e tests early
pick up habits that they can't seem to unlearn. And the existence of the e2e
tests seems to block prioritization of architectural changes to make unit and
functional tests more effective.

And the frameworks are never what I would call reliable. You can do work to
remove race conditions from them, but it takes tremendous discipline (if a
tool is wrong by default, that to me means it's using the wrong metaphor).

These days I try to keep people focused on unit tests until they run out of
runway.

~~~
claytoneast
Do you have any good books you'd recommend on what you feel is the proper
approach to testing? I feel that I reach for feature/e2e tests first, when
perhaps I really should be building up a solid base of unit specs before
moving on. I'm always a little unsure which specs I should have vs. which
are unnecessary.

~~~
teeray
There are some really fantastic testing resources in the Ruby community. The
talk that most influenced my approach to building unit tests (even though I
write Go these days) was one of Jim Weirich’s:
[https://youtu.be/983zk0eqYLY](https://youtu.be/983zk0eqYLY)

I’ve found that his zero-knowledge approach gives me a suite of tests that
have high signal when they fail.

Sandi Metz has also spoken extensively on testing, and I particularly like her
advice on using mocks in tests appropriately. This talk of hers on the subject
comes to mind: [https://youtu.be/URSWYvyc42M](https://youtu.be/URSWYvyc42M)
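
The rule of thumb I took away from that talk, rendered roughly in Go (the
names are invented, so treat this as a sketch, not her literal advice):
assert state for incoming messages; mock only outgoing command messages to
collaborators.

    package billing

    import "testing"

    // Mailer is the collaborator boundary: sending mail is an outgoing
    // command, so it's the one thing worth mocking.
    type Mailer interface {
        SendReceipt(orderID string)
    }

    type Checkout struct {
        mailer Mailer
        total  int
    }

    // AddItem is an incoming command: test it by asserting state.
    func (c *Checkout) AddItem(price int) { c.total += price }

    // Complete sends an outgoing command to the Mailer collaborator.
    func (c *Checkout) Complete(orderID string) { c.mailer.SendReceipt(orderID) }

    // mailerSpy records the message instead of really sending mail.
    type mailerSpy struct{ sent []string }

    func (m *mailerSpy) SendReceipt(orderID string) { m.sent = append(m.sent, orderID) }

    func TestAddItemAssertsState(t *testing.T) {
        c := &Checkout{}
        c.AddItem(500)
        if c.total != 500 { // incoming command: assert the direct effect
            t.Fatalf("total = %d, want 500", c.total)
        }
    }

    func TestCompleteExpectsOutgoingCommand(t *testing.T) {
        spy := &mailerSpy{}
        c := &Checkout{mailer: spy}
        c.Complete("ord-1")
        if len(spy.sent) != 1 || spy.sent[0] != "ord-1" { // outgoing command: expect the message
            t.Fatalf("sent = %v, want [ord-1]", spy.sent)
        }
    }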

------
MattPearce
Neat to see an article from Pete on here - I worked with him at Pushpay, he's
brilliant. The CI/CD pipeline the SREs built there was a joy to work with (and
that's quite a compliment for a CI/CD pipeline!)

~~~
kornish
Out of interest, what was the stack and what were some of the aspects that
made the pipeline so great to work with?

~~~
MattPearce
They're a Microsoft shop, so it was C#, SQL Server, RabbitMQ, etc., running
on AWS.

What I liked the most about the pipeline:

- Speed - we spent a lot of time (as the post says) optimising the process of
getting changes into production, and making it as streamlined as possible

- Safety - automated testing caught a huge percentage of the issues, meaning
we were able to fix them earlier and avoid the turnaround time of finding out
later in the process. Tests included visual diffs of many of the pages,
approval tests to check contracts and routes didn't change (see the sketch
after this list), etc.

- Transparency - while there are obviously differing opinions on ChatOps, it
was great to be able to scroll back through the Slack history of the shipping
channel and see a complete record of the pipeline for a particular deploy,
with the execution of the automated steps interwoven with the conversations
of the team members working on it. It was also great being able to see the
shipping queue at all times, so you could judge how long it would take to get
a change through and negotiate with others if you needed to jump ahead of
them.

- Focus on having everyone involved - everybody was involved in reviewing,
merging, etc. The aim with a new hire was to have them complete a change and
deploy it to production themselves within their first week. If you were the
first person on a "carriage" it was your responsibility to "drive" it: to
judge the risk factor, decide whether to allow other specific changes into
the carriage, and so on. This meant everyone spent a lot of time thinking
about how to reduce risk in their PRs (smaller PRs, more tests, always
feature flagging, etc.), which was much healthier (IMHO) than having one or
two people on the team carry all the responsibility for merging and
deploying.
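
Those approval tests are essentially golden-file tests. A rough sketch of
the idea in Go (the route table, file path, and -update flag are conventions
I'm assuming, not Pushpay's actual setup):

    package routes

    import (
        "bytes"
        "flag"
        "os"
        "testing"
    )

    // -update rewrites the approved file instead of comparing against it.
    var update = flag.Bool("update", false, "re-approve the golden file")

    // renderRouteTable stands in for dumping the real route/contract surface.
    func renderRouteTable() []byte {
        return []byte("GET /health\nGET /v1/orders\nPOST /v1/orders\n")
    }

    // Any accidental change to the routes fails the build until someone
    // deliberately re-approves it with `go test -update`.
    func TestRoutesApproved(t *testing.T) {
        got := renderRouteTable()
        golden := "testdata/routes.golden" // assumed checked-in approval file
        if *update {
            if err := os.WriteFile(golden, got, 0o644); err != nil {
                t.Fatal(err)
            }
        }
        want, err := os.ReadFile(golden)
        if err != nil {
            t.Fatal(err)
        }
        if !bytes.Equal(got, want) {
            t.Fatalf("route table changed:\ngot:\n%s\nwant:\n%s", got, want)
        }
    }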

Some of it is cultural as well - Just Culture (blameless postmortems, etc.),
being brutally honest (radical candor), and a willingness to continually
refactor processes.

