
Mistakes teams new to Chaos Engineering make - tylopoda
http://dadontherunblog.com/2018/06/05/the-five-mistakes-i-see-teams-new-to-chaos-engineering-make/
======
pnevares
For those not previously familiar with the concept:
[https://principlesofchaos.org/](https://principlesofchaos.org/)

------
throwaway5752
It's not going to take over the world.

This is a form of testing: applying boring old requirements to operations.
The whole idea of "chaos" is just a really sad tell about the state of
planning at some places.

You should be able to articulate your reliability requirements for certain
situations and then verify your stack meets them before a release. And if it's
cost prohibitive to do that in a non-prod environment, then plan for testing
in production as part of your pre-release planning.

It's silly; it's like classifying a test engineer as an "exploratory tester",
which would clearly be a mistake. That is just a type of activity an engineer
does, not an engineering role. This is just exploratory load and scalability
testing, and falls under the responsibility of a test or devops team.

~~~
sp332
_Chaos experiments are costly to implement relative to unit and integration
tests and so are a poor choice for detecting issues unit or integration tests
could. Instead, Chaos experiments should be designed to find the truly hidden
flaws which only surface in real world usage with real user traffic and
production environment configuration._

This is in case you didn't quite articulate or verify your real requirements.
The more confident you are that your test environment matches your prod
environment, the less you need this.

~~~
throwaway5752
I'm not unrealistic - functional requirements are hardly ever complete, and
they're almost always better than scalability and capacity planning.

Call me old fashioned, but I don't like messing around with production. I
think you can reasonably scale down production environments and simulate WAN
packet loss/latency. I can't imagine a platform of moderate importance not
having some sort of benchmarking and scalability testing as part of its
pre-prod pipeline's automated acceptance criteria. It feels like this work
would fall under that engineering role (again, performance test engineer or
devops/sre).
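
To make the pre-prod idea concrete, here is a minimal Python sketch of an acceptance check under simulated packet loss and added latency. Everything in it is an assumption for illustration: the function names, the retry timeout, and the simple "lost packet costs one timeout, then retry" model are hypothetical and stand in for a real network emulator.

```python
import random

def simulate_request(base_ms, added_latency_ms, loss_rate, rng,
                     retry_timeout_ms=200.0, max_retries=3):
    """Return total latency in ms for one request, or None if every attempt is lost."""
    elapsed = 0.0
    for _ in range(max_retries + 1):
        if rng.random() < loss_rate:
            # Lost packet: the client waits out its timeout, then retries.
            elapsed += retry_timeout_ms
            continue
        return elapsed + base_ms + added_latency_ms
    return None

def p99_under_fault(n, base_ms, added_latency_ms, loss_rate, seed=0):
    """Run n simulated requests; return (p99 latency of successes, failure rate)."""
    rng = random.Random(seed)  # seeded so pipeline runs are repeatable
    samples = [simulate_request(base_ms, added_latency_ms, loss_rate, rng)
               for _ in range(n)]
    failures = samples.count(None)
    ok = sorted(s for s in samples if s is not None)
    p99 = ok[int(0.99 * (len(ok) - 1))] if ok else None
    return p99, failures / n

def meets_criteria(p99, failure_rate, max_p99_ms, max_failure_rate):
    """Acceptance gate: both the latency and the failure budget must hold."""
    return p99 is not None and p99 <= max_p99_ms and failure_rate <= max_failure_rate
```

A pipeline step could call `p99_under_fault` with the loss and latency figures from the written reliability requirements and fail the build when `meets_criteria` returns `False`.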

Anyway, I think I chafe at the name more than anything. I know nonlinear
dynamics, stress testing, queuing theory, et al., and it just feels overly
glib for what should be the most serious sort of activity a company considers.

------
mentat
The most interesting chaos experiments are the ones that show you gaps in your
observability, when you know you've broken something and know you can't see
it. The teams I've been on have learned that there are different sorts of
observability that you need for different kinds of failure. Also, with complex
systems, you need to learn how to operate your observability. Designing or
testing resilience without being able to measure effectiveness is a time sink.
Know that you can see failure first.
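
A rough Python sketch of "see failure first": inject a known fault, drive some traffic, and confirm the alert notices before spending time on resilience work. The `Service` class, the `error_rate_alert` threshold, and the harness are all hypothetical stand-ins, not any particular tool's API.

```python
class Service:
    """Toy service that counts requests and errors (hypothetical stand-in)."""
    def __init__(self):
        self.healthy = True
        self.requests = 0
        self.errors = 0

    def handle(self):
        self.requests += 1
        if not self.healthy:
            self.errors += 1
            return False
        return True

def break_service(service):
    """The injected fault: every later request fails."""
    service.healthy = False

def error_rate_alert(service, threshold=0.05):
    """Fires when the observed error rate exceeds the threshold."""
    if service.requests == 0:
        return False
    return service.errors / service.requests > threshold

def fault_is_visible(service, inject_fault, alert, traffic=100):
    """Inject a known fault, send traffic, and report whether the alert saw it."""
    inject_fault(service)
    for _ in range(traffic):
        service.handle()
    return alert(service)
```

If `fault_is_visible` returns `False` for a fault you know you injected, that is the observability gap to fix before testing anything else.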

------
pure-awesome
I struggled to parse that title. A more verbose, paraphrased version:

"A list of the mistakes that teams make if they are new to the concept of
Chaos Engineering"

