
Monitoring-Driven Development - henrik_w
http://benjiweber.co.uk/blog/2015/03/02/monitoring-check-smells/
======
CraigJPerry
Yeah I agree. Rob Ewaschuk at Google has an excellent document called
something like "My Philosophy on Monitoring". It's fresh in my mind because
I've been using it as a kind of social proof for improvements I've been making
to the monitoring of our application.

He calls it symptom-based vs. cause-based monitoring, and I find the
terminology works really well.

He doesn't mention what you allude to - this approach works very nicely
alongside teams whose development practices see them iterate on features.
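
To make the distinction concrete, here's a toy sketch in Python (the metric
names and thresholds are mine, purely illustrative - the advice, roughly, is
to page on symptoms and keep causes around for debugging):

    # Symptom-based checks: alert on what users actually experience.
    def symptom_alerts(metrics):
        alerts = []
        if metrics["p99_latency_ms"] > 500:
            alerts.append("p99 latency over 500ms: users are waiting")
        if metrics["error_rate"] > 0.01:
            alerts.append("error rate over 1%: users see failures")
        return alerts

    # Cause-based checks: alert on internal conditions that might
    # (or might not) end up hurting users.
    def cause_alerts(metrics):
        alerts = []
        if metrics["disk_used_pct"] > 90:
            alerts.append("disk almost full")
        if metrics["queue_depth"] > 10000:
            alerts.append("work queue backing up")
        return alerts

    sample = {"p99_latency_ms": 850, "error_rate": 0.002,
              "disk_used_pct": 95, "queue_depth": 120}
    print(symptom_alerts(sample))  # page someone
    print(cause_alerts(sample))    # useful context while debugging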

~~~
henrik_w
I guess this is the document you mean:
[https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa...](https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit)

------
gatehouse
Similar to what Steve Yegge said in his unintentionally published platforms
rant:

 _- monitoring and QA are the same thing. You'd never think so until you try
doing a big SOA. But when your service says "oh yes, I'm fine", it may well be
the case that the only thing still functioning in the server is the little
component that knows how to say "I'm fine, roger roger, over and out" in a
cheery droid voice. In order to tell whether the service is actually
responding, you have to make individual calls. The problem continues
recursively until your monitoring is doing comprehensive semantics checking of
your entire range of services and data, at which point it's indistinguishable
from automated QA. So they're a continuum._

[https://plus.google.com/+RipRowan/posts/eVeouesvaVX](https://plus.google.com/+RipRowan/posts/eVeouesvaVX)
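
The practical upshot: a health check that only proves the health-check
handler is alive tells you almost nothing. A minimal sketch of shallow vs.
deep checks in Python (the dependency URLs are hypothetical):

    import urllib.request

    # Hypothetical downstream services this one depends on.
    DEPENDENCIES = [
        "http://inventory.internal/health",
        "http://pricing.internal/health",
    ]

    def shallow_health():
        # The "roger roger" check: proves only that this function runs.
        return {"status": "ok"}

    def deep_health():
        # Actually call each dependency. The same question then recurses
        # down the stack - which is why, taken to its limit, monitoring
        # converges on automated QA.
        results = {}
        for url in DEPENDENCIES:
            try:
                with urllib.request.urlopen(url, timeout=2) as resp:
                    results[url] = resp.status == 200
            except OSError:
                results[url] = False
        ok = all(results.values())
        return {"status": "ok" if ok else "degraded", "dependencies": results}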

~~~
swah
This was a great rant and I wish I could have access to all those "learnings".
Are there any books or blog posts with that knowledge already? The best
resources I know of are Martin Fowler's posts on the subject...

~~~
gatehouse
I'm not aware of anything really in-depth.

There is one paper I like about operational issues in general:
[https://www.usenix.org/legacy/event/lisa07/tech/full_papers/...](https://www.usenix.org/legacy/event/lisa07/tech/full_papers/hamilton/hamilton_html/).
It lists a lot of criteria that must be met for a system to be highly
automated.

~~~
swah
That material is great, thanks! The style reminds me of c2 wiki.

------
SEJeff
I've always followed the "metrics-driven development" school of thought. If
you make capturing "application telemetry" part of the deployment and running
of your application, you make it super easy to monitor via a more or less
full-on integration test using production traffic. API response time > 300ms
for > 5 seconds? PagerDuty!
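
A minimal sketch of that alert rule in Python (the page() hook is a
stand-in for whatever actually notifies PagerDuty):

    import time

    THRESHOLD_MS = 300   # alert if response time stays above this...
    WINDOW_SECS = 5      # ...continuously for this many seconds

    _breach_started = None

    def record_response_time(ms, page):
        # Feed in each request's measured latency; page on a sustained
        # breach. Sampled per request, so it's approximate by design.
        global _breach_started
        if ms <= THRESHOLD_MS:
            _breach_started = None  # back under threshold, reset the clock
            return
        now = time.monotonic()
        if _breach_started is None:
            _breach_started = now
        elif now - _breach_started >= WINDOW_SECS:
            page("API response time > %dms for > %ds"
                 % (THRESHOLD_MS, WINDOW_SECS))
            _breach_started = None  # avoid re-paging on every request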

------
fennecfoxen
Interesting. My tentative 39000-foot-view understanding of Booking.com is that
they run what might be thought of as an A/B-test-driven development shop: you
know your code is busted when it makes less money. It appears to be working
out for them, so there must be _something_ to it.

(Rumor also has it that they're actively hostile towards more traditional
automated testing and test-driven development, which I assume is a prime
reason they didn't want me to work there, so I can't tell you more...)

------
bbrazil
I'd tend to agree. I've found many times that when a system is hard to
monitor, it's also difficult to manage more generally, and that a
rearchitecture is in order.

