
Monitoring Microservices with Synthetic Transactions in Go - martinsahlen
http://labs.unacast.com/2016/03/10/monitoring-microservices-synthetic-transactions-in-go/
======
djb_hackernews
And what watches the watchmen?

The microservices trend is great but this sort of monitoring starts to fall
apart when you horizontally scale for HA. In this case we separate out our
monitoring by doing synthetic requests at the service level, and then very
specific health checking at the "instance" level (really containers now).
Instance-level health checks verify connectivity to outside dependencies:
databases, filesystems, etc. The trick is knowing whether a failure is
localized or widespread. There's no sense in taking out all of the instances
if none of them can talk to the DB.
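A minimal Go sketch of the localized-versus-widespread decision described above. It assumes per-instance health results are already collected (instance ID mapped to healthy or not), and the `widespreadFrac` cutoff is a hypothetical tuning knob, not something from the thread.

```go
package main

import "sort"

// FailureScope describes how widely a health-check failure is spread
// across the instances of one service.
type FailureScope int

const (
	Healthy FailureScope = iota
	Localized
	Widespread
)

// ClassifyFailure takes per-instance health results (instance ID -> healthy)
// and returns the scope plus the sorted IDs of the failing instances.
// Assumption: if at least widespreadFrac of the instances fail, the cause is
// treated as a shared dependency (e.g. the DB) rather than bad instances.
func ClassifyFailure(healthy map[string]bool, widespreadFrac float64) (FailureScope, []string) {
	var failed []string
	for id, ok := range healthy {
		if !ok {
			failed = append(failed, id)
		}
	}
	sort.Strings(failed)
	switch {
	case len(failed) == 0:
		return Healthy, nil
	case float64(len(failed)) >= widespreadFrac*float64(len(healthy)):
		return Widespread, failed
	default:
		return Localized, failed
	}
}
```

Only a `Localized` result would justify replacing the listed instances; a `Widespread` result points at something they all share, so recycling every container would not help.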

Also, I'm not a Go developer but is this idiomatic?

~~~
jpittis
> Also, I'm not a Go developer but is this idiomatic?

The error handling is kinda strange.

I would do something like this:

    
    
        var ErrUnexpectedResponse = errors.New("unexpected response")
        ....
        func syntheticHttpRequest(url string, apiToken string) error {
            ...
            } else if resp.StatusCode != 202 {
                return ErrUnexpectedResponse
            }
            return nil
        }
    

(a bunch of other stuff was mentioned in the comments)

~~~
weberc2
What's strange about this?

------
driverdan
We're doing something similar but for our whole stack.

We have a large, complex pipeline for processing incoming data. Historically
it has been very opaque, which made debugging data errors hard.

We added event logging throughout the pipeline. We are building a tool that
feeds known data in one end and checks the final output. We can then use the
events logged in between to check the intermediate state for errors. The logs
also show how long each transition between events takes.

Ideally the tool will check every event for deviations from the expected
output and alert us if any events fail. It will also alert us if the time
between events rises above the average by a predefined threshold.

I'd love to hear about any open source tools designed to do something like
this.

------
Animats
Well, of course. There are other things you need if you do this. Large systems
need some way to direct internal transactions to specific servers. In systems
with load balancing, one server may be in trouble, but the others are carrying
the load and the overall system seems to be OK.
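One way to direct synthetic transactions at specific servers is to probe every instance address directly instead of once through the load balancer. This is a minimal sketch: the `ProbeFunc` type is hypothetical and stands in for whatever actually sends the transaction to one instance (a direct host:port, a routing header, and so on).

```go
package main

import (
	"sort"
	"sync"
)

// ProbeFunc (hypothetical) runs one synthetic transaction against a single
// instance address and reports whether it succeeded.
type ProbeFunc func(addr string) bool

// ProbeEach probes every instance in parallel, bypassing the load balancer,
// and returns the failing addresses sorted for stable output.
func ProbeEach(addrs []string, probe ProbeFunc) []string {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed []string
	)
	for _, addr := range addrs {
		wg.Add(1)
		go func(a string) {
			defer wg.Done()
			if !probe(a) {
				mu.Lock()
				failed = append(failed, a)
				mu.Unlock()
			}
		}(addr)
	}
	wg.Wait()
	sort.Strings(failed)
	return failed
}
```

A single request through the load balancer would most likely land on a healthy server and pass; probing each address is what exposes the one server in trouble.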

To monitor this, you need some kind of dashboard which displays the state of
all your servers, and shows the dependency relationships. If A calls B and B
calls C, and B fails, you'll see A and B as down. You need to be able to
establish that B is the problem. (Microservice architectures which are not
DAGs, i.e., they have loops, are a huge pain in this sense.)
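A minimal Go sketch of "establishing that B is the problem", assuming the dashboard already has a dependency map (service to the services it calls) and the set of services whose checks are failing. The heuristic is an assumption: a failing service is a root-cause candidate if none of its own dependencies are failing.

```go
package main

import "sort"

// RootCauses returns the failing services whose dependencies are all healthy.
// deps maps a service to the services it calls; down holds failing services.
func RootCauses(deps map[string][]string, down map[string]bool) []string {
	var roots []string
	for svc, isDown := range down {
		if !isDown {
			continue
		}
		isRoot := true
		for _, dep := range deps[svc] {
			if down[dep] {
				// A failing dependency may explain this failure.
				isRoot = false
				break
			}
		}
		if isRoot {
			roots = append(roots, svc)
		}
	}
	sort.Strings(roots)
	return roots
}
```

In the A calls B calls C example, with A and B down and C up, only B is reported. With a loop where B and C call each other and both fail, each sees a failing dependency and neither is reported, which is one concrete way non-DAG architectures make this painful.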

------
twic
Note that this approach isn't just useful for microservices - it's a great way
to monitor any complex system, even if it's a monolith.

------
travjones
I wasn't familiar with "synthetic transactions" until I read this post. Great
write-up! Synthetic transactions are kind of like TDD on a live system (a very
loose analogy). It's a cool way to keep microservices consistent and bug-free
by verifying that responses return the expected data.

------
valevk
Why does the author use double backticks? As far as I know, you only use one.

    
    
      ID string ``json:"id"``
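With single backticks the tag is an ordinary raw string literal, and `encoding/json` uses it to map the `id` key. A minimal sketch of a synthetic check that decodes a response and asserts on that key; the `Entity` type and `verifyEntity` helper are hypothetical, not the article's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entity is a hypothetical response type. The struct tag is a raw string
// literal delimited by single backticks.
type Entity struct {
	ID string `json:"id"`
}

// verifyEntity decodes a synthetic response body and checks that it carries
// the ID the synthetic transaction is expected to produce.
func verifyEntity(body []byte, wantID string) error {
	var e Entity
	if err := json.Unmarshal(body, &e); err != nil {
		return fmt.Errorf("decode response: %w", err)
	}
	if e.ID != wantID {
		return fmt.Errorf("id = %q, want %q", e.ID, wantID)
	}
	return nil
}
```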

~~~
gronnbeck
Well, that is a typo. I'll fix it =) Thank you!

~~~
unfunco
You should definitely run go fmt; at the moment the examples are rather
difficult to read.

~~~
gronnbeck
I had to anonymize the code a bit, and it seems I messed it up while doing so
in the gist. I am sorry! I will make sure to run it through gofmt next time.
Thank you for the feedback!

------
sboak
I'm going to risk a plug since our service is highly relevant. If you're using
AWS, Opsee will let you define service-level health checks and automatically
detect instance membership. You can set detailed assertions on response
bodies, including JSON keys, to verify that your services are responding as
expected. This all works inside your environment, running checks from an
instance we spin up. More info at opsee.com

