
A Richter scale for outages - smcgivern
http://interconnected.org/home/2015/03/12/richter_scale_for_outages
======
poulsbohemian
Corporate America often has major issues that they can't even decide
internally how to rate on the scale - and corporate communications is not about
to let that information out. Outright outages for major corps or
infrastructure are rare, but performance degradation is so constant for some
companies that stumbling along in constant fire-fighting mode is considered
normal, i.e. something like a 4.0 on this scale is _every day_. A significant
portion of my time over the last ten years has been spent consulting for large
enterprises to solve their production problems and try to ensure they don't
happen again. It's _shocking_ how broken things can be before a dime will be
spent to improve anything.

~~~
genmon
I get super concerned when I hear reports like this -- what I'd like to hear
is that the Netflix Chaos Monkey method is being adopted for critical
infrastructure to increase resilience. Instead I can totally imagine a
software fault that brings transit across a city down for a few days. I think
we had a software fault in UK air traffic control recently that knocked out
flying for about half a day -- who knows how much of this is going on.

------
United857
Why would "collapse of minor network requiring rebuild. e.g. recent Sony hack
that meant no computers, printers, or existing network infrastructure could be
re-used without manual check of each item." rate a 6.0 (affects a single
organization) whereas "Minor network freeze but can be recovered with a
reboot; broad human inconvenience without threat. e.g. regional ATM network
down for a day, cellular network down for a day for single operator." is 4.0
(affects many members of the public in a region)?

~~~
genmon
Good point. My examples should ideally be on the same scale. For 4.0, I think
it's about a single point of failure where recovery is simple once it's
rectified. For 6.0, some kind of permanent and cascading problem. It's the
difference between the server being down, and accidentally shipping an update
that bricks all clients.

------
Thrymr
> Like the Richter magnitude scale

Obsolete? The Richter scale is historically important, but in modern usage has
been replaced by moment magnitude [0].

As others have pointed out, in any case this subjective scale has more in
common with the Mercalli intensity scale [1].

[0] [http://en.wikipedia.org/wiki/Moment_magnitude_scale](http://en.wikipedia.org/wiki/Moment_magnitude_scale)

[1] [http://en.wikipedia.org/wiki/Mercalli_intensity_scale](http://en.wikipedia.org/wiki/Mercalli_intensity_scale)

------
nuclear_eclipse
> 2.0 ... Facebook down ...

For some users, this is more like a 9.0. We've literally seen people _call 911_
because they can't access Facebook during an outage.

------
genmon
Author of linked blog post here -- would love to hear what you think!

~~~
superpatosainz
It's a very interesting concept, because now in 2015 I've seen more and more
things using technology and computers where they shouldn't be used. People
seem to not understand that the gimmick feature that needs a computer just
adds a billion moving parts (one for each transistor, statement of code, etc)
that may fail.

I was in the process of buying a new TV and I couldn't find a good ol' TV,
nope, all there was were "super awesome smart voice commanded can see the
weather TVs"... so I bought one intending to just use it for cable (I still
used Netflix on my computer) and after a whole damn day of updating firmware
and waiting for it to boot (I miss the CRT days)... some capacitors in
the backlight circuit blew. Thank you very much Samsung. (Btw another clear
example of misusing technology was yesterday's post about trains using GPS for
"smart door controls")

Oh also, the scale in your post is more of a Mercalli than a Richter.

~~~
genmon
I didn't realise there was an effect-focused earthquake scale; that's
definitely more what I meant.

------
JamesBaxter
I like the idea for internal discussion of problems but I don't think we'll
see usage of a scale like this on, say, a corporate support Twitter account.

It will end up with the hacker groups of the world bragging about the 10s
they've caused major technology companies and competing for scores...

------
steakejjs
Would a possible example of a large magnitude event be the 2012 Virginia
derecho [1]?

Does anyone remember that evening as one of large outages? I wasn't able to
experience the event as an internet user, having lost power. This is the type
of event I would EXPECT to get a big magnitude.

[1][http://en.wikipedia.org/wiki/June_2012_North_American_derech...](http://en.wikipedia.org/wiki/June_2012_North_American_derecho#Virginia)

~~~
genmon
that's the kind of thing. the one that comes to mind for me is the 9-hour AT&T
long distance outage of 1990:
[http://www.mit.edu/hacker/part1.html](http://www.mit.edu/hacker/part1.html)
-- it's this kind of emergent outage that really interests me

