
Monitoring Sucks. But Monitoring as Testing Sucks A Lot More - javinpaul
http://java.dzone.com/articles/monitoring-sucks-monitoring
======
btilly
_At Devopsdays I listened to a lot of smart people saying smart things. And to
some people saying things that sounded smart, but really weren’t. It was
especially confusing when you heard both of these kinds of things from the
same person._

Experience has taught me that if someone who is clearly smart, who clearly
says smart things, says something that sounds dumb to me, it is worthwhile for
me to not just dismiss that. Instead I need to examine my preconceptions for
why I disagree, and why they came to the conclusion that they did. Until I
have satisfied myself that I understand both why they thought as they did, and
why I disagree, there is a good chance that I'm missing a valuable lesson.

This example is a case in point. Clearly it is from the point of view of a web
company. The advice offered is not for all environments - there is a world of
difference between a case where downtime means someone doesn't see a picture
for 15 minutes and one where someone dies.

Now about unit testing and monitoring, let me give an example. I know a
company (which I can't name) that releases multiple times per day, and
releases every change as an A/B test. This is important. If they release a
change that works exactly as designed but hurts conversion by 10%, _THEY
WILL KNOW_. (You need significant traffic to follow this strategy; they have
that.) There are a lot of trivial changes that could move the needle 10%
without your realizing it, and you don't want to move it in the wrong
direction.

In fact, if you look at the dollar values, a bug that causes 1% of pages to
crash, which unit testing could catch, is simply not as important as a bug
that hurts revenue, which the A/B test could catch.
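To make "they will know" concrete: a minimal sketch of the kind of
significance check such a release pipeline needs, using a two-proportion
z-test. (The company above is anonymous, so all the numbers here are
invented for illustration.)

```python
import math

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from A's?  Returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via math.erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# B converts 10% worse than A (4.5% vs 5.0%), at two traffic levels.
z_small, p_small = ab_significance(500, 10_000, 450, 10_000)
z_large, p_large = ab_significance(5_000, 100_000, 4_500, 100_000)
```

With these invented numbers, a 10% relative drop at 10,000 visitors per arm
gives p ≈ 0.10 -- not yet significant -- while the same drop at 100,000 per
arm is unambiguous. That is exactly why the strategy needs serious traffic.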

But it gets better. If you have 0 tolerance for web pages crashing and have
monitoring in place to catch it (I know people who have all crashes email key
developers), then you'll catch a lot of bugs that you would catch with unit
tests, AND you'll catch bugs that you SHOULD have caught with unit tests but
messed up on the test. Which, then, provided more value, the unit test or
monitoring?
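The "all crashes email key developers" setup is easy to sketch in Python: a
process-wide exception hook that formats every uncaught crash into an alert.
(The recipient address is hypothetical, and a print to stderr stands in for
the actual smtplib send.)

```python
import sys
import traceback

ALERT_RECIPIENTS = ["dev-team@example.com"]  # hypothetical address

def format_crash_alert(exc_type, exc_value, exc_tb):
    """Build the alert body a crash handler would mail to developers."""
    return "Unhandled %s: %s\n%s" % (
        exc_type.__name__,
        exc_value,
        "".join(traceback.format_exception(exc_type, exc_value, exc_tb)),
    )

def crash_hook(exc_type, exc_value, exc_tb):
    body = format_crash_alert(exc_type, exc_value, exc_tb)
    # In production this would hand `body` to smtplib or an alerting
    # service; printing to stderr stands in for the mail send here.
    print("ALERT to %s:\n%s" % (", ".join(ALERT_RECIPIENTS), body),
          file=sys.stderr)

# Install the hook so every uncaught exception pages the key developers.
sys.excepthook = crash_hook
```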

You want the unit test. You don't want to be catching stuff after you roll
out. And one of the automatic questions when you do catch it should be, "How
could we have caught that with an automatic test?" But having smart monitoring
is more valuable to you than the unit test.

~~~
Terretta
The A/B test or phased rollout test works if you have great telemetry
(NewRelic, etc) and sufficient traffic that results are statistically sound.

A low-traffic site working on MVP releases can synthesize both the OP article
and the comment above by "unit testing" the essential functions of the web
site from the point of view of a user's browser. Think Selenium, Watir,
PhantomJS, and the like.

<http://jquery.bassistance.de/webtesting/presentation.html>
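Selenium, Watir and PhantomJS drive a real browser; as a lighter stdlib
stand-in for the same idea -- assert that the elements an essential user flow
depends on actually render in the served HTML -- something like this works
(the element ids are hypothetical; substitute your own critical selectors):

```python
from html.parser import HTMLParser

class EssentialElementCheck(HTMLParser):
    """Scan a page for the elements an essential user flow depends on."""
    def __init__(self, required_ids):
        super().__init__()
        self.missing = set(required_ids)

    def handle_starttag(self, tag, attrs):
        # Tick off each required id as we see it render.
        self.missing.discard(dict(attrs).get("id"))

def page_passes(html, required_ids):
    """True if every required element id appears in the page."""
    checker = EssentialElementCheck(required_ids)
    checker.feed(html)
    return not checker.missing

# e.g. the signup page must render its form and its submit button
sample = '<form id="signup"><button id="submit-btn">Go</button></form>'
page_passes(sample, {"signup", "submit-btn"})  # → True
```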

------
benjaminwootton
This rant completely ignores context.

In the air traffic control system, you can never have an error and need to
take every possible precaution up front to avoid bugs.

In a lot of situations, you might be prepared to slightly increase the risk of
introducing bugs in order to move towards continuous or much more frequent
delivery.

I'd argue that in _most_ applications and businesses the latter scenario is
true - it's just that where you draw the line varies from project to
project.

Once you've accepted that, proactive monitoring is your second line of
defence.

------
Spooky23
I think the root of what this guy is ranting about is that many web people
seem to confuse rapid release iteration with rapid engineering iteration. The
folks who do this don't understand that the frameworks that they are using are
taking care of their lack of forethought -- but will only do so for a limited
time.

Design, implementation and testing are different disciplines that are
essential to reliable systems. You may not need to formally embrace all of
them at a given point in time (or product cycle), but claiming that "testing"
is a fraud or that "roll back" is mythical is just a demonstration that the
speaker doesn't have a mature engineering background.

~~~
NoahSussman
I'm pretty sure that I didn't say that testing is a fraud. Although I did say
that I think production monitoring should be set up on day one of a new
project, while unit testing can wait until the project has matured a bit. I
also said that unit testing isn't going to be nearly as helpful without having
production monitoring in place. And I said that _with_ production monitoring
in place it's possible to get by with a lot less unit testing than has been
historically prescribed -- which is a good thing since production monitoring
systems tend to be less expensive to maintain than unit tests.

I did at one point say "there's no such thing as rolling back." This is an
idea to which I'm pretty dedicated. I've seen first hand many times that using
an SCM revert command to attempt to restore "last good state" is a risky
endeavor, especially when a large changeset is involved.

------
aidenn0
Talking about not testing at all is both stupid and nonsensical, as there
will always be some informal integration testing.

Not doing _any_ integration testing would translate to the developer making a
change in the code and checking it into the VCS without running it. That does
happen from time to time, but it's obviously a bad thing, right?

If you build and run a smoke-test before checking in, that's integration
testing! So clearly everyone does at least a tiny amount of integration
testing; the question of how far you take it, and how much effort you put
into automating it, is a tradeoff. From experience, the more effort that is
put into testing, the fewer bugs will be seen by customers, though you do
run into diminishing returns.

Monitoring has the advantage of reporting only the bugs that customers
actually see.

QA I think of more as having professionals who pretend to be customers,
which, done right, can let you catch the bugs that are egregious enough to
make you look bad had actual customers seen them.

