
Resilience engineering: Where do I start? - azhenley
https://github.com/lorin/resilience-engineering/blob/master/intro.md
======
solipsism
Readers beware -- this particular taxonomy of robustness vs resilience is not
a pervasive or even common one. Often these terms are used completely
synonymously. And often they are used with different subtleties that
distinguish them.

For example, some distinguish between the two terms in that robustness refers
more to staying functional in the face of failures, where resilience refers
more to the capability to work around failures (neither having anything to do
in particular with whether the unknowns were unknown).

The blog post author says that this taxonomy comes straight from David Woods,
so there's no problem. Just keep in mind that most people don't use these
terms in this particular way.

~~~
vageli
> Readers beware -- this particular taxonomy of robustness vs resilience is
> not a pervasive or even common one. Often these terms are used completely
> synonymously. And often they are used with different subtleties that
> distinguish them.

> For example, some distinguish between the two terms in that robustness
> refers more to staying functional in the face of failures, where resilience
> refers more to the capability to work around failures (neither having
> anything to do in particular with whether the unknowns were unknown).

> The blog post author says that this taxonomy comes straight from David Woods,
> so there's no problem. Just keep in mind that most people don't use these
> terms in this particular way.

Can you go into more detail with specific examples between the two that
highlight the differences? "Working around failures" and "staying functional
in the face of failures" sound borderline synonymous to me, so I'm curious how
that plays out in practice.

~~~
fao_
> "Working around failures" and "staying functional in the face of failures"
> sound borderline synonymous to me

One is going around the iceberg, the other is ensuring the Titanic can sail on
with a couple of huge holes in its hull.

~~~
vageli
If the Titanic does not hit the iceberg, it does not enter a failure mode.
That doesn't sound like "working around failures" but "avoiding failure" which
seems very different.

------
rdoherty
This is a great overview. I would also recommend Dekker's book The Field Guide
to Understanding Human Error [1]. It's a bit easier to read than Drift Into
Failure, which I found to be _very_ dense.

1: https://www.amazon.com/Field-Guide-Understanding-Human-Error/dp/1472439058

------
FigmentEngine
The team I work on at AWS wrote a paper on this:
https://d1.awsstatic.com/whitepapers/architecture/AWS-Reliability-Pillar.pdf
It covers concepts such as Recovery Oriented Computing (ROC).

~~~
FigmentEngine
This is a good one as well. It shows how humans and our societal systems act
around disasters (we are happy to live on volcanoes even after we see them
blow up): The Big Ones: How Natural Disasters Have Shaped Us (and What We Can
Do About Them)
https://www.amazon.com/Big-Ones-Natural-Disasters-Shaped/dp/0385542704

------
DyslexicAtheist
I think the work by NN Taleb on Fat Tails, Black Swans and Antifragility at
least deserves a mention on this list.

Edit: also, the USCSB YouTube channel has some cool info on disaster
engineering
[https://www.youtube.com/channel/UCXIkr0SRTnZO4_QpZozvCCA](https://www.youtube.com/channel/UCXIkr0SRTnZO4_QpZozvCCA)
(see also [https://www.csb.gov/videos/](https://www.csb.gov/videos/))

------
nwhatt
I love small, concise mini-syllabi like this. Just give me the big papers and
set some context.

------
jonahhorowitz
I've been trying to implement and apply these principles at my $job. It's so
helpful to have an intro guide published with all the supplemental reading.
I'm going to send this around to all my teams.

