
Domain-Oriented Observability - arunc
https://martinfowler.com/articles/domain-oriented-observability.html
======
btilly
I hoped from the intro that they would be suggesting an approach that was
standard at Google a decade ago, that the rest of the world never adopted.

But no such luck.

However, if anyone is creating a distributed product today in 2019, here is an
excellent suggestion. Free of charge.

In whatever common code you have for your RPCs, add the ability for an RPC
message to be traced. If it is traced, then you will log when it is received,
when you reply to it, and also cause all RPC messages you send while handling
it to also be traced.

At your front end you can then pick an arbitrary fraction of your traffic, say
1%, and trace it.

Tracing such a small fraction of traffic means that you can instrument it in
real time, gather the logs, and create a complete view of how each web request
cascaded through your system. Then, whenever you run into a slowdown that only
affects a small fraction of your traffic, you can find a request that was both
traced and slow, and dig into those rare problems that happen here and there
and then add up.

If you don't do this, then good luck tracking down the performance problems
that happen 5% of the time, 3 RPC calls from the front.
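
A minimal sketch of the scheme described above, in Python. Everything here is
hypothetical and only for illustration: the `Rpc` type, the `handle` wrapper,
the in-memory `TRACE_LOG`, and the service names stand in for whatever your
common RPC code and log pipeline actually look like.

```python
import random
import time
import uuid
from dataclasses import dataclass
from typing import Callable, List, Optional

TRACE_LOG: List[dict] = []  # stand-in for a real log sink


@dataclass
class Rpc:
    method: str
    trace_id: Optional[str] = None  # None means "this request is not traced"


def front_end_request(method: str, sample_rate: float = 0.01,
                      rng: Callable[[], float] = random.random) -> Rpc:
    """Tag roughly sample_rate of incoming requests with a trace id."""
    trace_id = uuid.uuid4().hex if rng() < sample_rate else None
    return Rpc(method, trace_id)


def handle(service: str, rpc: Rpc,
           downstream: Optional[Callable[[Rpc], None]] = None) -> None:
    """Common RPC wrapper: log receive and reply when traced, propagate the id."""
    start = time.monotonic()
    if rpc.trace_id:
        TRACE_LOG.append({"trace": rpc.trace_id, "service": service,
                          "event": "received"})
    if downstream is not None:
        # Every RPC sent while handling a traced RPC inherits its trace id.
        downstream(Rpc(rpc.method, rpc.trace_id))
    if rpc.trace_id:
        TRACE_LOG.append({"trace": rpc.trace_id, "service": service,
                          "event": "replied",
                          "elapsed_s": time.monotonic() - start})


# Usage: force one traced request through three hypothetical services.
traced = front_end_request("checkout", rng=lambda: 0.0)
handle("frontend", traced,
       downstream=lambda r: handle("cart", r,
                                   downstream=lambda r2: handle("discount", r2)))
```

Grouping `TRACE_LOG` entries by `trace` then gives the full cascade for one
request, including which hop the time was spent in.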

~~~
Groxx
This is basically what
[https://www.jaegertracing.io/](https://www.jaegertracing.io/) /
[https://opentracing.io/](https://opentracing.io/) are intended to do. Not
sure how widespread they are, but e.g. Uber uses them:
[https://eng.uber.com/distributed-tracing/](https://eng.uber.com/distributed-tracing/)

~~~
jsiepkes
In my experience it's pretty common? Aside from projects like
[https://opencensus.io](https://opencensus.io), a lot of other technologies not
primarily about tracing, like service proxies (Envoy, linkerd, etc.), also
support various types of tracing based on message IDs.

------
mlthoughts2018
I’ve had good success using decorators and infrequently also metaclasses to
cleanly solve this type of thing in Python.
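
For illustration, a decorator along these lines might look like the sketch
below. The `observed` decorator, the module-level counters, and
`apply_discount` are all made-up names; a real setup would send these numbers
to a metrics client instead of in-memory dictionaries.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory metrics store, standing in for a real metrics client.
CALLS = defaultdict(int)
ERRORS = defaultdict(int)
DURATIONS = defaultdict(list)


def observed(name):
    """Record call count, error count, and duration under the given name."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            CALLS[name] += 1
            try:
                return func(*args, **kwargs)
            except Exception:
                ERRORS[name] += 1
                raise  # instrumentation must not swallow the error
            finally:
                DURATIONS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@observed("discount.apply")
def apply_discount(total, code):
    """Pure domain logic: no logging or metrics code in the body."""
    if code != "SAVE10":
        raise ValueError(f"unknown discount code: {code}")
    return round(total * 0.9, 2)


result = apply_discount(100.0, "SAVE10")
```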

However, I somewhat disagree with the article, in the sense that instrumentation
and reporting logic is just table stakes, the bare minimum of software in a business.
90% of business software is about creating a report for somebody somewhere,
often needing to be rapidly changed for ad hoc requests from product or
management people. Reporting code _is_ the code. It may even _be_ the business
logic in a truer sense than the application logic itself.

Another lesson along these lines applies to machine learning and modeling
systems: the mathematical algorithm is always the easiest part. The hard part
is situation-by-situation customized data ingestion (things that cannot be
solved by standardizing on platform-specific data formatting, like Hadoop or
Spark), and then also defining business metrics that capture the success of
the model at a high level (as opposed to engineering metrics like accuracy,
precision and recall, etc).

Model training and the 10% of the project spent tweaking an algorithm or
optimizing things just pales in comparison to the type of systems you need to
facilitate efficient custom data ingestion that can be arbitrarily different
on a per project basis.

------
redact207
I'm not a fan of logging errors and returning success codes in lieu of
throwing them. It leads to some very nasty support tickets and
difficult-to-debug problems.

There are also a lot of mixed concerns here around in-place logging and
analytics collection for something that is a very business-logic-focused
class.

If you want to do things like this, I'd encourage you to have a look into a
message-driven, domain-driven design structure. Its approach would be:

\- remove all logging and analytics concerns from the class

\- throw errors based on business rule violations, or alternatively publish a
failure-type message if it's a logic path that has compensating actions

\- publish all mutations of the shopping cart as immutable events

\- subscribe to events related to analytics/logic and do that work in a
handler, or do it in a higher level application service so the shopping cart
just contains shopping cart concerns

A lot of this can be distilled down to basic DRY and SRP principles.
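
The list above could be sketched roughly as follows. This is an illustration
only: `EventBus`, `ItemAdded`, and `InvalidQuantity` are hypothetical names,
and the bus is a toy in-process one rather than a real message broker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)  # frozen makes the event immutable once created
class ItemAdded:
    sku: str
    quantity: int


class InvalidQuantity(Exception):
    """Raised when a business rule about quantities is violated."""


class EventBus:
    """Toy in-process bus: handlers subscribe by event type."""

    def __init__(self) -> None:
        self._handlers: Dict[type, List[Callable]] = {}

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers.get(type(event), []):
            handler(event)


class ShoppingCart:
    """Contains only shopping cart concerns: no logging, no analytics."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self.items: Dict[str, int] = {}

    def add_item(self, sku: str, quantity: int) -> None:
        if quantity <= 0:
            # Throw on a rule violation instead of logging and returning success.
            raise InvalidQuantity(f"quantity must be positive, got {quantity}")
        self.items[sku] = self.items.get(sku, 0) + quantity
        self._bus.publish(ItemAdded(sku, quantity))


# Analytics lives in a handler that subscribes to the events it cares about.
bus = EventBus()
analytics: List[ItemAdded] = []  # stand-in for an analytics sink
bus.subscribe(ItemAdded, analytics.append)
cart = ShoppingCart(bus)
cart.add_item("apple", 2)
```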

~~~
nine_k
Returning an error code or throwing an exception results in the same problem.
In one case, the error code is ignored. In the other, the exception is
swallowed. I bet that monadic binding (as now widespread with the use of
promises) can be equally abused.

The problem is in the engineering culture, not the particular tool used.

------
revskill
I would rather use the Publisher-Subscriber pattern here. Every time I want to
"log", I'll send a "log" event. The publisher doesn't care who receives it,
because that's not its responsibility.
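
As an aside, Python's standard `logging` module already has roughly this
shape: code publishes records to a named logger, and handlers attached to that
logger act as subscribers. A small sketch, where `ListHandler`, `add_to_cart`,
and the logger name `shop.cart` are invented for the example:

```python
import logging


class ListHandler(logging.Handler):
    """A subscriber that keeps formatted messages in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# The publishing side only knows the logger, not who is listening.
log = logging.getLogger("shop.cart")
log.setLevel(logging.INFO)
log.propagate = False  # keep the example's records away from the root logger


def add_to_cart(sku):
    log.info("item added: %s", sku)


# Subscribers are attached separately and can be added or removed freely.
audit = ListHandler()
metrics = ListHandler()
log.addHandler(audit)
log.addHandler(metrics)
add_to_cart("apple")
```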

------
dm03514
The examples are interesting: why is it the shopping cart's responsibility to
report the results of the discountService? If the metrics have to be surfaced,
someone, somewhere needs to surface them, and I've been finding that it keeps
things clean and doesn't break encapsulation when the service
(discountService) is responsible for its own hooks or monitoring adapter.
Like mlthoughts2018, I've been finding success using decorators for this
because it keeps the domain logic clean of metric/monitoring code:

(shameless plug: I recently wrote about how decorators can be used to keep the
domain free of metrics/plumbing)

[https://medium.com/dm03514-tech-blog/designpatterns-consider...](https://medium.com/dm03514-tech-blog/designpatterns-considering-decorators-b4968a8006e8)

(Also, just a disclaimer: teams I've been on have had lots of success using the
approach outlined in the article, injecting a metrics/instrumentation object.
It cleans up the domain logic, makes it easy to provide a stub implementation
during tests, and abstracts the domain from metric implementations.)

\------

A decorator version might compose the production code by wrapping the domain
logic in metric adapters:

    
      metrics = InstantiateMetricsClient()
    
      # Each Observable* wrapper adds metrics around the plain domain object.
      ObservableShoppingCart(
        ShoppingCart(
          ObservableDiscountService(
            DiscountService(),
            metrics=metrics,
          ),
        ),
        metrics=metrics,
      )
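
To make the composition above concrete, here is one possible runnable version.
The method names (`discount_for`, `total`), the metric names, and
`InMemoryMetrics` are assumptions for the sketch; only the class names and the
nesting come from the snippet above.

```python
from collections import Counter


class InMemoryMetrics:
    """Hypothetical stand-in for a real metrics client."""

    def __init__(self):
        self.counters = Counter()

    def increment(self, name):
        self.counters[name] += 1


class DiscountService:
    """Domain logic only."""

    def discount_for(self, code):
        return {"SAVE10": 0.10}.get(code, 0.0)


class ObservableDiscountService:
    """Same interface as DiscountService, plus metrics."""

    def __init__(self, inner, metrics):
        self._inner = inner
        self._metrics = metrics

    def discount_for(self, code):
        rate = self._inner.discount_for(code)
        self._metrics.increment(
            "discount.lookup.hit" if rate else "discount.lookup.miss")
        return rate


class ShoppingCart:
    """Domain logic only; it never sees the metrics object."""

    def __init__(self, discount_service):
        self._discounts = discount_service

    def total(self, subtotal, code):
        return subtotal * (1 - self._discounts.discount_for(code))


class ObservableShoppingCart:
    """Same interface as ShoppingCart, plus metrics."""

    def __init__(self, inner, metrics):
        self._inner = inner
        self._metrics = metrics

    def total(self, subtotal, code):
        self._metrics.increment("cart.total.calls")
        return self._inner.total(subtotal, code)


metrics = InMemoryMetrics()
cart = ObservableShoppingCart(
    ShoppingCart(
        ObservableDiscountService(DiscountService(), metrics=metrics),
    ),
    metrics=metrics,
)
total = cart.total(100.0, "SAVE10")
```

In tests, the plain `ShoppingCart(DiscountService())` can be used directly
with no metrics involved at all.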
    

\----

Another option that I've seen used successfully is to completely decouple
domain events from the surfacing of those events, by having the domain
generically emit events and having the logging/metrics code subscribe and
surface them however it decides.

\----

I feel like for normal logging statements and metrics it only borders on being
an issue, but when metrics include many timing wrappers, or when tracing
(which involves much more reporting) is involved, explicit adapters really
help keep things clean.

------
ivan_gammel
Fowler just reinvented the GoF Observer pattern. There was a reason we called
it a „pattern“, which is now being devalued by a new name for an old thing.

~~~
skybrian
Although it's on Fowler's website, it seems this article was written by
someone else.

~~~
Rexxar
Indicated in a panel at the top:

 _Pete Hodgson : Pete Hodgson is an independent software delivery consultant
based in the San Francisco Bay Area. [...]_

------
quelltext
Looking forward to where this article is going. It looks a bit AOP-y to me.

------
dfboyd
Zipkin. You've reinvented zipkin.

~~~
munchbunny
How is this reinventing Zipkin?

Zipkin looks like a library for collecting and inspecting distributed traces.
This post is about how to keep your instrumentation code clean. Unless I
missed something while I was reading it, it doesn't say anything about the
underlying infrastructure.

