
Let’s talk about logging - jodooshi
http://dave.cheney.net/2015/11/05/lets-talk-about-logging
======
azinman2
Bah. Use cases are specific yet talked about as if they're universal.

Let's look at error. If you're running a web server, or something like a web
server, then when a request errors you don't want it to blow up the whole
server. You just want to log that error. Yet you may not care about the many
many other normal requests happening at the info level (and debug would be
even more info underneath). Thus now error is actually useful both in terms of
monitoring but also separating out from other requests that went through just
fine.

People say exceptions should blow up the world -- that's great when you're
debugging, but users don't typically want to keep reusing an app that crashes.
There are more graceful ways to degrade.

------
bb101
I disagree with doing away with the Warning log level. It's very useful for
errors that don't negatively impair the function of a program, yet should be
captured.

For example, converting a user input to an integer. You can provide safe
defaults if the value doesn't convert so it doesn't break the flow. As the
developer however, if you see too many warnings with conversion errors then
you may want to change the label of the form field, or provide some guidance
to make it more clear to the user.

~~~
qznc
"too many warnings" is a very vague condition.

For a similar example: The D compiler gives very few warnings and some people
want to remove those as well. The argument is that the compiler should not do
the job of static analysis tools, just because it is convenient or easy. A
compiler should either generate code or give an error.

If something is for developers only, make it debug-level.

However, in the server environment, there is also the IT/sysadmin role between
user and developer. It makes sense to provide a level specifically for them.
"warning" mostly fits that.

~~~
mattmanser
A good example is when you're dealing with third party APIs. They break, a
lot, without you really knowing why and often not really knowing _how_ they're
going to fall over (even google's, one of our reports using the Adwords API
fell over last week because it was returning 500s for an hour or so and then
just worked again).

So say you're doing an hourly sync, and it falls over, you want to log a
warning. It happens, no need to send out the fire engines.

If you're getting a warning every hour, that's "too many warnings".

------
perbu
I'm kind of dissapointed that the rest of the world didn't pick logging to a
circular memory buffer like varnish does. It's proven extremely useful. You'll
have a full firehose of information when you need it without sacrificing
performance. You can set up async processes that pick out the bits you need
and persist them to disk.

I suspect this isn't widely used because it relies on POSIX shared memory and
support for keeping data structures in SHM isn't widely available.

~~~
deweerdt
> I'm kind of dissapointed that the rest of the world didn't pick logging to a
> circular memory buffer like varnish does. It's proven extremely useful.
> You'll have a full firehose of information when you need it without
> sacrificing performance

Aren't you incurring a cost for logging for stuff that would end up being
thrown out?

~~~
perbu
Sort of. But logging to memory is more or less free in terms of resource
costs. You can easily log one million lines per second to memory without your
computer getting bogged down.

This also removes the rather heisenbergian issue of turning on and off debug
logging. In a lot of application turning on debug logging is a surefire way of
making sure any race conditions won't occur.

It also means that logging (to disk) will not occur in the performance
critical bits of the code. The logging will be fully asynchronous as
completely different process will be given the task of persisting the logs.

------
marcus_holmes
I think every starting Go coder snorts in disgust and writes their own logging
lib. Then as they get wiser and more experienced in the "Go way" they take
another look at the std lib and start to realise the inherent wisdom of the
log package. Finally they throw away their over-featured logging lib and
refactor everything back to the vanilla log. It's like a zen progression of
understanding or something

~~~
epoch1970
No, it's just an inability to make trade-offs, and to live with the
consequences of said trade-offs. A month after going back to the vanilla
logging they'll realize that they actually do need more flexibility, and back
to the custom logging they'll go.

------
zamalek
In terms of part of the solution, ETW[1] is pretty fantastic. It's blindingly
fast, strongly typed and rich tooling exists for it (e.g. you can correlate
events with performance counters, possibly saving yourself a profiling
session). It's not a turn-key logging solution, it's simply a great place to
queue your messages to.

We don't [yet] use it as part of our logging framework, but recently I used it
in a pretty interesting way. We had to make some changes to how we interact
with SQL to handle transient faults on the cloud (specifically SQLAzure). Big
risky change. In order to guarantee that we didn't mess anything up I sent
events to ETW from our data access layer. A separate process would then listen
for those ETW events and resolve the stack trace outside of the primary
process (something that ETW will do for you). Each stack frame was then sent
to a primary server. This allowed us to do _very_ fast code coverage analysis
on QA environments for additional guarantees beyond unit tests.

Without ETW this would have taken me a week instead of day - it's some really
exciting tech. I'd love to see something analogous to ETW in Linux; it has
made me realize that textual logging is, frankly, unacceptable.

[1]: [https://msdn.microsoft.com/en-
us/library/ms751538.aspx](https://msdn.microsoft.com/en-
us/library/ms751538.aspx)

~~~
windowsworkstoo
Absolutely. ETW is fantastic - the tooling isn't great but it is a fantastic
bit of tech.

As an aside, you're comment about textual logging points to a deficiency
generally about how developers do logging - with text. ETW forces you to
allocate Event IDs and structure to your log events so that logs can a) be
consistent and b) can be easily (i.e without regex magic) monitored and
alerted on, summarised and generally reasoned about.

If you are on a platform with ETW, you should use it. If not, you should still
use some of its principles - i.e Event IDs and structured data.

------
dllthomas
_" I believe that an overwhelming proportion of items logged at error level
are simple done that way because they are related to an error."_

I think that is a desirable property. When something broke, there's a good
chance that those comparatively few lines tell me what and why, and if I'm
really lucky give me enough info to reproduce the problem without diving into
the more verbose logs (which might not even be generated/captured, if
performance concerns preclude).

------
halayli
I have a feeling this person never ran any production environment, nor spent
time debugging.

------
arethuza
"Nobody reads warnings, because by definition nothing went wrong."

That's a rather bold assertion to make - I look at warnings in logs and
approve of their inclusion. Sometimes you want a request to work even when
some parts of it aren't quite as expected and in that case the right thing to
do is log warnings that something is generating requests that are a bit off
and it might be worth investigating _that_ application to find out what is
going on:

"be liberal in what you accept from others"

But log as warnings when you are being liberal... :-)

------
tjholowaychuk
IMO unstructured is more or less useless in production, most logs cannot be
easily parsed, and it influences you to omit useful information. Plus who
wants to write dozens of parsing formats :D

~~~
justinsaccount
Absolutely.. The worst part of logging is that the apis are broken and often
use things like 'sprintf' type formatting.

sprintf is lossy serialization.

My favorite example is ssh. Thousands of man hours have likely been wasted
writing parsers for ssh logs that are generated from code like this:

    
    
        authlog("%s %s%s%s for %s%.100s from %.200s port %d %s%s%s",
            authmsg,
            method,
            submethod != NULL ? "/" : "", submethod == NULL ? "" : submethod,
            authctxt->valid ? "" : "invalid user ",
            authctxt->user,
            get_remote_ipaddr(),
            get_remote_port(),
            compat20 ? "ssh2" : "ssh1",
            authctxt->info != NULL ? ": " : "",
            authctxt->info != NULL ? authctxt->info : "");
    
    

Which if the apis were better could have just looked like

    
    
        authlog({
            "authmsg": authmsg,
            "method": method,
            "submethod": submethod,
            "valid": authctxt->valid,
            "user": authctxt->user,
            "addr": get_remote_ipaddr(),
            "port": get_remote_port(),
            "ver": compat20 ? "ssh2" : "ssh1",
            "info": authctxt->info});
    

(obviously C would need something a bit different)

The updated(2009) syslog RFC supports structured data, but it's been 6 years
and nothing uses it:

[https://tools.ietf.org/html/rfc5424#section-6.3](https://tools.ietf.org/html/rfc5424#section-6.3)

------
AYBABTME
I think in this age, the problem of people is not that they log with too many
levels: it's that they still `printf` log. I'm always surprised how so few are
familiar with structured logging. I'm trying to spread the word:

Use Structured Logging!

[https://docs.google.com/presentation/d/1SnjZpcfJq9r6OpgsjsX9...](https://docs.google.com/presentation/d/1SnjZpcfJq9r6OpgsjsX9ExCFTQgpXsSjD
--Y3PPDiAQ/edit?usp=sharing)

------
mekazu
If your log level is at INFO it doesn't matter if you log errors, warnings or
fatal: They'll be printed the same way as info, and they give you somewhere to
start grepping.

------
bunnymancer
Our Service Monitor disagrees with the notion that log.error is useless..

------
SeriousM
I don't know if you're just ignorant or lack of experience...

~~~
bb101
I've seen this a lot with our junior developers who are just out of college.
My take on it is that the oversharing generation is encouraged to "put
something out there" and make it sound authoritative. Then it is discussed and
either supported or shouted down.

Unfortunately the blog posts stick around for a lot longer than they ought to,
making it very hard for Googling newbies to decide whether they can trust the
advice or not.

Reminds me of the old adage "the more you know, the less you know". Works in
reverse!

~~~
nothrabannosir
Dave Cheney is not exactly a junior developer. He's a prominent member in the
Go community, often considered a member of the core Go team.

I disagree with his post, too, but to brush it off as (or even relate it to)
inexperience and junior devs is slightly out of place.

~~~
SeriousM
To be honest, I judge someone on the things he/she does (write), not on the
position he/she holds. The blog post sounded like it's written by an
unexperienced guy, someone that lives in a perfect bugfree world or who didn't
dealt with long running projects yet.

