
There is No Root Cause - themcgruff
http://www.kitchensoap.com/2012/02/10/each-necessary-but-only-jointly-sufficient/
======
kscaldef
The points here seem valid, but I think the author misunderstands 5 Whys
analysis a bit. From <http://en.wikipedia.org/wiki/5_Whys>

"It is interesting to note that the last answer points to a process. This is
one of the most important aspects in the 5 Why approach - the real root cause
should point toward a process that is not working well or does not exist.
Untrained facilitators will often observe that answers seem to point towards
classical answers such as not enough time, not enough investments, or not
enough manpower. These answers may sometimes be true but in most cases they
lead to answers out of our control. Therefore, instead of asking the question
why?, ask why did the process fail?"

and

"These tools allow for analysis to be branched in order to provide multiple
root causes"

~~~
vacri
I was a bit puzzled reading the article, because in the places I have worked,
'root cause analysis' isn't looking for the simplistic thing the article
describes. If it is simple, great, but more often than not the root cause is
poor interaction between things, which can require multiple changes or even
under-the-bonnet refactoring. The article's description of root cause analysis
sounds like something a first-year undergraduate would think.

~~~
cbsmith
I'm getting the impression that that HBR video has had a much wider
penetration than the 5 Whys concept as a whole. This seems to have led a lot
of people to draw the wrong conclusions about even what 5 Whys is really
about, let alone how it works.

------
AngryParsley
_During stressful times (like outages) people involved with response,
troubleshooting, and recovery also often mis-remember the events as they
happened, sometimes unconsciously neglecting critical facts and timings of
observations, assumptions, etc._

This is a great reason to use IRC or some other loggable, text-based medium.
After everything's green and you're doing the failure analysis, you can look
at the logs to see what happened and when. IRC also makes it easier to
collaborate if things are broken in the middle of the night when everyone is
at home.

~~~
mey
This is why we have central monitoring systems that aggregate system data
together. It makes trend analysis and correlation across systems much easier.
Unfortunately it's a home-grown solution; not sure what's available off the
shelf.

------
DanielBMarkham
Great post. A keeper.

We see this same type of thinking when dealing with all sorts of other
complex, multi-dimensional systems. Economics, for instance. There are a huge
number of people who think economies work in this same linear fashion. Or
managing large groups of developers.

In some systems that are becoming more complex, such as avionics systems for
large airplanes, it's getting to the point where the old root cause analysis
methodology is still being used although it's getting less and less applicable
(I'm speaking specifically of the crash over the ocean of the Air France
flight, but there are other examples).

Our minds desperately want to live in a world with clear causality. Do X and Y
will happen. When the world doesn't live up to our expectations, many times we
just get out a bigger hammer.

Looking at this problem solely from a philosophical standpoint, it looks like
there is a powerful argument in place for tiered systems to have some sort of
distributed goal-seeking self-programming (machine learning), especially when
dealing with large numbers of identically programmed/configured computers.
That way the same combination of obscure causes wouldn't have such a
disastrous multiplicative effect. Would be cool to chase that down further
sometime.

~~~
billswift
The real world still has clear causality. Just because the links of causes and
effects (and the involved feedback processes) are often too complex for most
people to follow does not mean that they are not there. You might as well say
that because most people can't do calculus, calculus isn't really useful.

~~~
lucisferre
Yeah this was not a keeper for me. "Causality is complicated..." yeah and if a
butterfly flaps its wings... blah... blah... blah...

This offers no insight as to how to improve process when failure occurs. Sure
dogmatic application of root cause analysis is foolish, but the same obvious
conclusion could be reached about dogmatically applying any type of management
principle or analytic process. Failure to think outside of the box is a
failure too.

What is annoying is that the author suggests that people should be skeptical
of root cause analysis and 5 whys, without offering anything concrete as an
alternative approach.

Every management technique is simply a practice towards further learning, but
failing to practice anything simply because you can find flaws in everything
accomplishes exactly nothing.

------
jayferd
I love this article. I've seen this firsthand, having been the "fall guy" for
an outage that nobody really understood. There were many causes for the
outage, including poor software design, but the entirety of blame and
punishment landed at the convenient "single place where a human touched
something".

------
mathattack
I agree - systems fail for many reasons at once. People like root cause
analysis because it allows everyone to point the finger at someone else.

Of course the flip side is equally awful. When folk say "that's just how it
works around here" you know you are doomed.

~~~
bdunbar
_systems fail for many reasons at once._

Disagree.

Now, I will allow that very complex systems can mask that root cause.

One might _never_ be able to find out the root 'why' for a number of reasons:
lack of time, inability to see into the black box where the failure happened.
Perhaps everyone is dead, the data you need destroyed, the widget is lost
under the ocean.

And that sometimes the root cause failure isn't a hard technical thing but
something squishy like 'we failed to budget for disk space' or 'the CIO
insisted we do it that way'.

But there is _always_ a root cause.

------
ajuc
So the root cause is not A, but A && (B || C) && D.

It is obvious to any programmer that a bug can depend on many factors
happening at once, or in the "right" order, years of bad data accumulating,
etc. That does not destroy causality, nor make analysis useless.
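
To make the expression above concrete, here is a minimal Python sketch. The function name `outage` and the four flags are hypothetical stand-ins for whatever conditions contribute to a real failure; nothing here comes from the article. It enumerates every combination and checks which conditions are individually necessary for the failure.

```python
from itertools import product


def outage(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Hypothetical failure condition from the comment: A && (B || C) && D."""
    return a and (b or c) and d


# Every combination of the four conditions that results in a failure.
failing = [combo for combo in product([False, True], repeat=4) if outage(*combo)]


def is_necessary(index: int) -> bool:
    """True if the condition at `index` holds in every failing combination."""
    return all(combo[index] for combo in failing)


# Map each condition name to whether the failure cannot happen without it.
necessary = {name: is_necessary(i) for i, name in enumerate("ABCD")}

for name, needed in necessary.items():
    print(name, "necessary" if needed else "not necessary on its own")
```

Under this toy condition, A and D are each necessary, B and C are each replaceable by the other, and no single flag is sufficient by itself. The analysis still works; it just returns a set of contributing conditions rather than one cause.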

------
cagenut
Of all the really out-of-nowhere/longer-lasting outages I've been hit by, a
clear pattern emerged: it's always the "alley-oop" or "one-two punch combo"
issues that really get you.

------
betageek
I get where he's coming from, but for most of these situations it's the
complexity that's the root cause, and that's where you end up on your 5th why.

------
nknight
I agree with the point, but disagree with how the word "trigger" is being used
here.

An identifiable "trigger" is usually present, but it's not the root cause,
it's _the last condition to be satisfied_, the final event that "triggered"
the Rube Goldberg machine that brought the system down.

Identifying _that_ trigger can be valuable, because it sometimes points to a
clear design problem, like an error in assumptions about how the system is
used.

~~~
barrkel
Sorites paradox; in complex systems - especially ones with feedback - knowing
the tipping point trigger isn't very useful.

Traffic jams can be caused by density of traffic. A slight variation in speeds
cascades through human delay in reaction, causing a longitudinal wave to
ripple backwards through the traffic. If the amplitude of the wave gets high
enough, sections of traffic will periodically reach a standstill while the
wave works its way through.

There's no meaningful trigger here. Decomposing the whole into parts won't
solve the problem. If you didn't know that traffic density causes jams,
looking into the root cause would seem mysterious, because the chaotic
behaviour that gives rise to the initial perturbation is essentially
unimportant. It's the interaction between the parts that matters, not the
"trigger".
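
One way to see the density effect is a toy simulation. The sketch below uses the Nagel-Schreckenberg cellular automaton, a standard textbook traffic model that is swapped in here purely for illustration; it is not something the article or this comment specifies. The function name `mean_speed` and all parameter values (road length, speed limit, slowdown probability, step count, seed) are assumptions.

```python
import random


def mean_speed(n_cars, road_len=200, v_max=5, p_slow=0.3, steps=500, seed=0):
    """Average car speed on a circular road (Nagel-Schreckenberg model)."""
    rng = random.Random(seed)
    # Place cars on distinct cells, listed in road order, all initially stopped.
    pos = sorted(rng.sample(range(road_len), n_cars))
    vel = [0] * n_cars
    total, measured = 0, 0
    for step in range(steps):
        new_vel = []
        for i in range(n_cars):
            # Number of empty cells between this car and the one ahead.
            ahead = pos[(i + 1) % n_cars]
            gap = (ahead - pos[i] - 1) % road_len
            v = min(vel[i] + 1, v_max)  # accelerate toward the speed limit
            v = min(v, gap)             # brake to avoid hitting the car ahead
            if v > 0 and rng.random() < p_slow:
                v -= 1                  # small random slowdown (human delay)
            new_vel.append(v)
        vel = new_vel
        # All cars move at once; v <= gap guarantees no overtaking or collision.
        pos = [(x + v) % road_len for x, v in zip(pos, vel)]
        if step >= steps // 2:          # skip the warm-up half of the run
            total += sum(vel)
            measured += n_cars
    return total / measured


if __name__ == "__main__":
    for cars in (10, 50, 100):
        print(cars, "cars:", round(mean_speed(cars), 2), "cells/step")
```

The random slowdowns are identical at every density, so comparing a sparse road with a crowded one isolates density as the variable. In this model the crowded road should settle at a much lower average speed, with no individual slowdown standing out as "the" trigger.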

And even when you've "solved" this by building more and wider roads to spread
the density, you find a different level of homeostasis; better transport
infrastructure like roads encourages people to take more journeys, live
further apart with more space, further away from work and play, leading to
more traffic again.

Sometimes, when solving a problem, looking inwards, to parts, to triggers, to
root causes, isn't the right approach; looking outwards, to the holistic
whole, running experiments and simulations, creating new theories, is better.
But this is a synthetic approach, not an analytic one driven by 5 Whys.

