
Ask HN: Do you have problems finding stuff across logging/monitoring systems? - El_Mo
I constantly find myself struggling to answer questions that require me to correlate data across multiple logging and monitoring products (Splunk, Pingdom, Elasticsearch, Icinga, Prometheus, etc., etc.).<p>For example, I’m often asked to get NFR figures, do SLA reporting or trace events across stacks for root cause analysis. It is not easy in a big shop. More and more I’m noticing that no matter how well thought out these tools are, they can act as data silos pretty darn quickly.<p>Do you face this issue? Have you found an easy solution? If so, what tools do you use?<p>I’m asking ‘cause I have an idea for a new side project, but I don’t want to build something that only I will find useful (sadly, I’ve done THAT before).<p>I would be VERY grateful for any comments or PMs on this topic.
======
itamarst
I've designed a logging system that addresses some of these issues
([https://eliot.readthedocs.io/en/1.2.0/](https://eliot.readthedocs.io/en/1.2.0/))
and I know people have e.g. used the trace ID it provides to correlate Sentry
errors with logging traces. Not a full solution, obviously, but if you do go
this route do borrow some ideas if it's helpful.
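The core idea mentioned here — a shared trace ID that ties records from different systems back together — can be sketched with nothing but the standard library. This is not Eliot's actual API, just an illustration of the pattern: generate one ID at the top of a request and stamp it on every structured log line, so records from any system can later be joined on it.

```python
import json
import uuid


def new_trace_id():
    """Generate one ID at the start of a request; every system logs it."""
    return uuid.uuid4().hex


def log_event(stream, trace_id, system, message):
    """Emit a structured log line that carries the shared trace ID."""
    stream.append(json.dumps({"trace_id": trace_id, "system": system, "msg": message}))


def correlate(stream, trace_id):
    """Pull every record for one request back out, across systems."""
    return [json.loads(line) for line in stream
            if json.loads(line)["trace_id"] == trace_id]


events = []
tid = new_trace_id()
log_event(events, tid, "app", "request received")
log_event(events, tid, "db", "query ran in 12ms")
log_event(events, new_trace_id(), "app", "unrelated request")

print(len(correlate(events, tid)))  # 2
```

In a real deployment the "stream" would be whatever each product ingests; the only requirement is that the ID survives the hop between systems.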

As far as a business goes: it's a real problem. Whether people will pay for
it, that's something else. You may wish to spend some time reading up on
questions real people ask about these systems (e.g. on relevant forums) to see
what patterns you notice. [https://stackingthebricks.com/vintage-sales-safari-
in-action...](https://stackingthebricks.com/vintage-sales-safari-in-action/)
and other articles there have some examples of doing this sort of research.

~~~
El_Mo
Thanks a lot for the pointer; you're right, there are some very interesting
ideas in there.

Having already tried a few products in the logging and monitoring space, I
find it pretty challenging to operate here. I realise a paid solution can be
a barrier ... unless there are compelling benefits, which is what I'm trying
to define.

------
brudgers
The solution space for a cross logging/monitoring systems tool has a non-
polynomial number of permutations. The number of permutations of _(Splunk,
Pingdom, Elasticsearch, Icinga, Prometheus, etc., etc.)_ is at least 6! = 720.

Each is a snowflake. Each will change independently over time as each product
changes. Each user will have different configurations and use cases. There is
a combinatorial explosion.
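The growth this comment points at, framed as it frames it (orderings of n tools), concretely:

```python
import math

# The number of orderings of n tools grows factorially,
# faster than any polynomial in n.
for n in (6, 8, 10):
    print(n, math.factorial(n))
# 6 720
# 8 40320
# 10 3628800
```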

It is an interesting problem. But there isn't likely to be a general solution.

~~~
El_Mo
Thanks for the comment. It is true there is the risk of too many
permutations. I suppose the other way of approaching this is to let the tools
do what they do, but export or forward specific data that such a cross-
logging/monitoring system could handle for very specific use cases, like
those I mentioned. It is these use cases I'm looking to validate.
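A minimal sketch of that "export specific data" approach (all event shapes and field names here are hypothetical, invented for illustration): each tool keeps its own format, and a thin adapter maps only the fields a cross-system query needs into one common record, so a use case like incident review becomes a filter over a merged timeline.

```python
from datetime import datetime, timezone

# Hypothetical raw events, each in its own tool's shape.
splunk_event = {"_time": "2018-01-05T10:00:00Z", "severity": "error", "host": "web1"}
prometheus_sample = {"ts": 1515146460, "metric": "up", "value": 0, "instance": "web1"}


def from_splunk(e):
    """Adapter: keep only time, host, and a normalized 'kind'."""
    return {"time": datetime.fromisoformat(e["_time"].replace("Z", "+00:00")),
            "source": "splunk", "host": e["host"], "kind": e["severity"]}


def from_prometheus(s):
    """Adapter: an 'up' metric with value 0 becomes a 'down' event."""
    return {"time": datetime.fromtimestamp(s["ts"], tz=timezone.utc),
            "source": "prometheus", "host": s["instance"],
            "kind": "down" if s["metric"] == "up" and s["value"] == 0 else "ok"}


# Merge into one timeline; the use-case query is then just a filter.
timeline = sorted([from_splunk(splunk_event), from_prometheus(prometheus_sample)],
                  key=lambda r: r["time"])
incidents = [r for r in timeline if r["kind"] in ("error", "down")]
print([(r["source"], r["kind"]) for r in incidents])
```

The point of the sketch: the hard part stays in each tool; the cross-system layer only ever sees the narrow, normalized records each adapter chooses to forward.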

~~~
stympy
Check out [https://bigpanda.io](https://bigpanda.io) -- it might be in line
with what you are thinking.

------
twunde
Practical Monitoring, just released by O'Reilly, discusses this same problem.
If you have a Safari Books subscription, it's definitely worth a read.

------
SirLJ
I am building stock trading robots and regularly go through the logs to make
sure they are working as designed. In the beginning it was a lot of wasted
time, because I was logging everything I could think of... Being lazy, I soon
decided to log only the critical stuff, so now it takes me only a couple of
minutes per week. The moral of the story is simply KISS :-)
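That "log only the critical stuff" discipline has a direct stdlib expression: keep the verbose calls in the code if you like, but set the effective level so only lines worth a human's time survive. A small sketch (the logger name and handler are illustrative):

```python
import logging

logger = logging.getLogger("trading_bot")
logger.setLevel(logging.WARNING)  # only critical stuff gets through

records = []


class ListHandler(logging.Handler):
    """Collect emitted messages so we can see exactly what survived."""
    def emit(self, record):
        records.append(record.getMessage())


logger.addHandler(ListHandler())

logger.debug("tick received")               # dropped: noise
logger.info("order book refreshed")         # dropped: noise
logger.warning("order rejected by broker")  # kept: needs review

print(records)  # ['order rejected by broker']
```

The debug/info calls cost almost nothing when filtered out, so you can leave them in place and turn the level down to DEBUG only when hunting a specific bug.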

------
stympy
You might find it useful to take a look at
[http://opentracing.io](http://opentracing.io).

~~~
El_Mo
Hey, that's quite cool ... I'm checking it out. Thanks!

