
Dispatch – Open-source release of Netflix's crisis management framework - m0hit
https://medium.com/@NetflixTechBlog/introducing-dispatch-da4b8a2a8072
======
iruoy
You can also read it here:
[https://outline.com/p3PBUY](https://outline.com/p3PBUY)

Medium really annoys me, because I can't even scroll without disabling my
adblocker.

~~~
athenot
I have Safari configured to open all medium.com pages directly in Reader Mode.
Takes care of all the noise/interruptions they throw up on the screen.

~~~
ksec
How do you do that?

~~~
athenot
- Open a Medium URL.

- In your toolbar, click the site settings icon (might need to edit your
toolbar if it's not there).

- Check: "When visiting this website [X] Use Reader when available"

~~~
ksec
This is brilliant! Thank you.

------
madrox
Incidents basically represent engineering culture in extremis. Seeing how
large orgs manage incidents really says a lot about culture. It's interesting
to see Netflix go so far to automate what amounts to a trivial amount of manual
labor in (hopefully) rare instances. It says a lot about how they think about
making mistakes and the developer experience working through crisis.

------
lifeisstillgood
So much of this is generalisable to just running a project or a company
(comms, collecting metadata, making smart automation decisions to save time
and avoid duplicated effort).

There is a deep business transformation lurking here. As another comment here
puts it, Netflix clearly has "just automate it all" at its heart.

------
luord
Python, VueJS and Postgres.

That right there is my favorite stack for prototyping. Though, admittedly, I
only say that because none of my prototypes have taken off (yet).

------
jf___
Fun to see `sentry.io` as one of the dependencies; it's kind of an interesting
level of recursion for an incident mgmt app.

------
doublerabbit
So, in other words, another over-complicated ticket system.

~~~
athenot
No, there's a lot more that goes into handling incidents that affect large
production systems. Getting things back up as fast as possible, coordination,
communication, getting the right action items out of it. There are tradeoff
decisions that need to be made, executives and big customers picking up the
phone.

This kind of tooling is what arises in an effort to automate and streamline
incident response. When you're operating at Netflix's scale, each minute is
precious and if a tool manages to save 45 seconds on each incident, it can be
quite valuable.

~~~
rhizome
The incident workflow about halfway down reads to me like a lingoed-up version
of "create a bug, escalate, put a Slack (etc.) URL in the bug, send the bug to
blamees/ondutys, message boss(es), finish fix and push, schedule a meeting for
the next day." Having read the rest of the article, it turns out I guessed
reasonably well. I mean, there are decades behind this very use case, and at
the end of the day it's possible to hook out to Slack from RT, too. But
they're not using RT, true.

[https://rt-wiki.bestpractical.com/wiki/WorkFlow#Modeling_Wor...](https://rt-wiki.bestpractical.com/wiki/WorkFlow#Modeling_Workflow_in_RT)

I don't have a problem with the work -- like I said, it's a persistent use
case -- it's just the way it's described here, as if it weren't, and with
puffery. And the thinness of my skin about this is not the issue!

