Hacker News

I had the impression that logs and metrics are a pre-observability thing.



I've never heard the term "pre-observability", what does that mean?


The era when "debugging in production" wasn't standard.


Observability is about logs and metrics, and pre-observability (I guess you mean the high-level-only records simpler environments keep) is also about logs and metrics.

Anything you record to keep track of your environment takes the form of either logs or metrics. The difference is in the contents of those logs and metrics.


When I read Observability Engineering, I got the impression it was about wide events and tracing, and that metrics and logs were a thing of the past that people gave up on after the rise of microservices.


> Authors Charity Majors, Liz Fong-Jones, and George Miranda from Honeycomb explain what constitutes good observability, show you how to improve upon what you're doing today, and provide practical dos and don'ts for migrating from legacy tooling, such as metrics, monitoring, and log management. You'll also learn the impact observability has on organizational culture (and vice versa).

No wonder - it's either strong bias from people working at a tracing vendor, or an outright sales pitch.

It's totally false though. Each pillar - metrics, logs, and traces - has its place and serves a different purpose. You won't use traces to measure the number of requests hitting your load balancer, the number of objects in the async queue, CPU utilisation, network latency, or any number of other things. Logs can be richer than traces, and a nice pattern I've used with Grafana is linking the two: from a trace you can jump to the corresponding log lines, which describe the different actions performed during that span.


While I was at Google, circa 2015-2016, working on some Ads project, I happened to be on-call when our system started doing something wonky. Following our playbook, I called the SRE for the sub-system we were using (Spanner? something else - I don't remember) to check what was up.

They asked me to enable tracing for 30s (we had a Chrome extension that sends a common URL parameter, enabling full tracing (100%) in your web server for a short amount of time), and then I performed the operations our internal customers were complaining about.

This produced quite a hefty trace, but only for 30 seconds - enough for them to track down where the issue might be coming from. And it was basically end to end: from me doing something in the browser, down to our server/backend, down to their systems, etc.

That's how I learned how important it is - for cases like this. But you can't have it 100% on all the time - not even 1%, I think...


Oh yeah, tracing can be extremely useful, precisely because it should be end to end.

As for the numbers, that's why most tracing collectors and receivers support downsampling out of the box. Recording only 1% or 10% of all traces - or 10% of successful ones and 100% of failures - is a good way to make use of tracing without overburdening storage.
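A minimal sketch of that failure-biased sampling policy in plain Python - this is not any particular collector's configuration, and the trace/span shapes here are made up for illustration:

```python
import random

def keep_trace(trace, success_rate=0.10):
    """Sampling decision: keep every failed trace, and roughly
    10% of successful ones (success_rate is configurable)."""
    if any(span["status"] == "error" for span in trace):
        return True  # 100% of failures are kept
    return random.random() < success_rate  # ~10% of successes

# Hypothetical traces: lists of spans with a status field.
failed = [{"status": "ok"}, {"status": "error"}]
ok = [{"status": "ok"}, {"status": "ok"}]

assert keep_trace(failed)  # failures are always kept
```

Real collectors implement this as tail-based sampling, since the success/failure status is only known once the whole trace has been seen.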


You can sorta measure some of this with traces. For example, sampled traces that contain the sampling rate in their metadata let you re-weight counts, allowing you to accurately measure "number of requests to x". Similarly, network latency can absolutely be measured from a good sampling of trace data. Metrics will always have their place, though, for the reasons you mention - measuring CPU utilization, # of objects in something, etc. Logs vs. traces is more nuanced, I think. A trace is nothing more than a collection of structured logs. I would wager that nearly all use cases for structured logging could be wholesale replaced by tracing. Security logging and big-object logging are exceptions, though that also depends on your vendor or backend.
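The re-weighting trick can be sketched like this - the records and the `sample_rate` field are hypothetical stand-ins for whatever your tracing system stores in span metadata:

```python
# Each kept trace carries the rate at which it was sampled.
sampled_traces = [
    {"route": "/checkout", "sample_rate": 0.10},  # kept 1-in-10
    {"route": "/checkout", "sample_rate": 0.10},
    {"route": "/health",   "sample_rate": 0.01},  # kept 1-in-100
]

def estimated_request_count(traces):
    # Each kept trace stands in for 1/sample_rate real requests,
    # so summing those weights estimates the true request count.
    return sum(1.0 / t["sample_rate"] for t in traces)

print(estimated_request_count(sampled_traces))  # → 120.0 (2/0.10 + 1/0.01)
```

This is why heavily sampled traces can still back accurate-enough counters, as long as the sampling rate travels with the trace.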


> metrics and logs were a thing of the past people gave up on since the rise of Microservices

Definitely not the case, and, in fact, probably the opposite is true. In the era of microservices, metrics are absolutely critical to understand the health of your system. Distributed tracing is also only beneficial if you have the associated logs - so that you can understand what each piece of the system was doing for a single unit of work.


> Distributed tracing is also only beneficial if you have the associated logs - so that you can understand what each piece of the system was doing for a single unit of work.

Ehhh, that's only if you view tracing as "the thing that tells me that service A talks to service B". Spans in a trace are just structured logs. They are your application logging vehicle, especially if you don't have a legacy of good in-app instrumentation via logs.

But even then the worlds are blurring a bit. OTel logs burn in a span and trace ID, and depending on the backend that correlated log may well just be treated as if it's a part of the trace.
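The correlation mechanism can be sketched with stdlib logging alone - the context variable and the hard-coded IDs here are stand-ins for the propagation that OTel's SDK actually handles for you:

```python
import contextvars
import logging

# Every log record gets the current trace and span IDs "burned in",
# so a backend can treat the log line as part of the trace.
current_trace = contextvars.ContextVar("trace", default=("-", "-"))

class TraceContextFilter(logging.Filter):
    def filter(self, record):
        # Stamp the ambient trace context onto the record.
        record.trace_id, record.span_id = current_trace.get()
        return True

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(levelname)s trace=%(trace_id)s span=%(span_id)s %(message)s"))
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())
logger.setLevel(logging.INFO)

# Inside a traced unit of work, set the IDs, then log normally.
current_trace.set(("4bf92f35", "00f067aa"))
logger.info("charging card")
# emits: INFO trace=4bf92f35 span=00f067aa charging card
```

With OTel this stamping happens automatically for the active span, which is what lets a backend fold correlated logs into the trace view.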



