
The Five Levels of Logging
https://www.aib42.net/article/five-levels-of-logging
======
js8
I increasingly feel that there should be two separate logs, one for the user
with the Error/Warning/Info information (what the user wanted to be done and
how well it went) and the other for the developer with the Debug/Trace
information (capture key information about what the program was doing,
especially on the interface - what files it was looking for, what database
calls it did etc.).

I always get complaints from users that there is too much useless stuff in the
logs through which they need to go. I also think that the developer log should
be structured differently, possibly to be easily read by some automation.
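
This split can be sketched with Python's stdlib `logging` by routing one logger through two handlers; the file names, logger name, and messages here are made up for illustration:

```python
import logging

# User-facing log: Error/Warning/Info -- what was asked and how it went.
user_handler = logging.FileHandler("user.log", mode="w")
user_handler.setLevel(logging.INFO)

# Developer log: Debug/Trace detail -- files touched, database calls, etc.
dev_handler = logging.FileHandler("developer.log", mode="w")
dev_handler.setLevel(logging.DEBUG)
dev_handler.addFilter(lambda r: r.levelno < logging.INFO)  # debug-only records

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)
logger.addHandler(user_handler)
logger.addHandler(dev_handler)

logger.info("Converted report.pdf")                      # -> user.log
logger.debug("looked for /tmp/report.pdf; 3 DB calls")   # -> developer.log
```

The developer log could then use a JSON formatter instead of plain text to make it machine-readable, per the automation point above.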

~~~
exabrial
An "audit log" or "branch log", what business decisions were made, and a
"debug log", what params drove said decision

------
testplzignore
Info: What the program is up to. Occasional messages that indicate it is alive
and doing what is expected.

Warn: Problems that you know about and wish would not happen, but do anyway.
A service call has failed, unexpected input that violates the contract of the
method, etc. You may be handling these conditions gracefully and/or returning
an error to the caller, but still want to know when they happen. Usually you
should have a ticket open in your bug tracker for these.

Error: Problems that you do not know about. When an error is logged, a
developer should look at it. Some kind of action should be taken immediately.
Put the fire out if it is an emergency. If not an emergency, create a ticket
in your bug tracker, then change the level to warn.

I keep warn and info separate so that I can easily grep/filter on one. I often
want to just see info messages to know what the status of things are.

~~~
jandrese
IMHO the difference between Warn and Error is that Warn is recoverable while
Error is not.

~~~
hinkley
In an enterprise setting, treating one level of logs as _actionable_, as GP
mentions, is very important for clarifying responsibilities.

If it takes the right pair of eyes on the logs to see a problem for what it
is, the whole process slows down and doesn't scale as high. All errors and
very chatty warnings should induce me to ask someone what's going on, to make
sure it hasn't slipped through the cracks. And they should _thank me_ for
pointing it out. Otherwise you reinforce some bad patterns.

------
exabrial
Talking about logging got me a rejection at Google. The very kind 22-year-old
interviewing me asked what my thoughts on commenting code were, and I said
that most every time it's used as an excuse to show off a piece of overly
clever code, so normally I discourage it in favor of logging, as logs are code
comments you can use in production. I realize now that was my first graybeard
moment.

~~~
paulddraper
Without diving too far down the comment rabbit hole (complex but efficient
algorithms, external references, legal requirements, backwards compatibility),
that's not true though.

A comment is attached to code. A log is attached to an event.

Conflating those will either give you very bad logs or very bad comments.

------
ath0
I support these 5 levels of logging, but given the discussion below I also
would like to highlight a lesson from an SRE talk I once attended[citation
needed] that really there are only three levels of alert:

1\. DEBUG - Look at this when debugging, otherwise no one will ever see it.

2\. TICKET - Open a ticket and have someone on your team look at this when it
gets to the top of your queue.

3\. PAGER - Drop what you're doing and handle this, now.

Of course, you want to track INFO and ERROR type messages because a sufficient
number of them might cause someone to raise a ticket... but at scale, you
probably should've built that monitoring already, and just drop INFO/ERROR
down to DEBUG.

~~~
pfranz
It's nice to have articles like this to give structure to the conversation,
but there's no single solution. Python has default levels of debug, info,
warning, error, and critical, but doesn't really give any guidance on what
each means. At one company we tried to create some structure, but 3rd party
libraries would flood messages at the "wrong" logging level.

I like these 5 levels, and the 3 levels you talk about seem like a pared-down
version that reinforces what each level means.

To illustrate the "no single solution": we tried to use logging for command
line output, and neither of those systems really has a level for that. We
probably should have created a new level between INFO and WARNING because of
chatty libraries.
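
Both workarounds are expressible with Python's stdlib `logging`; `chatty_lib` is a hypothetical third-party logger name, and NOTICE is an invented level, not a standard one:

```python
import logging

# Option 1: raise the threshold for a specific noisy library's logger.
logging.getLogger("chatty_lib").setLevel(logging.WARNING)

# Option 2: register a level between INFO (20) and WARNING (30) for
# command-line output that should survive when INFO chatter is filtered out.
NOTICE = 25
logging.addLevelName(NOTICE, "NOTICE")

logger = logging.getLogger("cli")
logger.setLevel(NOTICE)  # drops INFO noise but keeps NOTICE and above
logger.log(NOTICE, "wrote 3 files to ./out")
```

Option 1 is usually enough in practice; the custom level only helps if your whole team agrees on what it means.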

In practice I've found that DEBUG and TRACE are usually best added to code when
troubleshooting. When you're writing the code you don't have a good idea what
info you'd need when troubleshooting, so it's overly verbose but still missing
what you'd want. When trying to stash all logging either locally or centrally
(because adding logging and re-running isn't always an option) it can affect
performance.

~~~
jschwartzi
I think a good log system should give you the ability to store everything but
filter quickly. So for example I'll set up my systems to store everything down
to LOG_DEBUG because I might want that information. But when I'm actually
looking at the logs I like being able to exclude the DEBUG messages and look
at the general behavior, then drill down when I think I understand what the
sequence of events is. Good log systems should support this without making me
filter as I'm storing the data.

Also in UNIX systems I wish there was a way to generically pipe the output of
a command to the system log while preserving the command's return code. In
BASH you can configure the pipe behavior to propagate errors to the calling
shell, but in normal shells the return code of the last command in the pipe
is the one that gets stored, which is not usually what you want. It would be
really cool if the logger utility captured the return code from upstream and
returned it, so that I could use it in the antiquated System V init scripts I
have to maintain.
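
A minimal bash sketch of the idea: `mydaemon` below is a stand-in function in place of a real service, and `cat` stands in for `logger -t mydaemon` so the sketch is self-contained. With `pipefail`, the pipeline's exit status reflects the failing command on the left rather than the pipe target's 0.

```bash
#!/bin/bash
set -o pipefail

mydaemon() { echo "starting up"; return 3; }  # stand-in for the real service

# In a real init script the pipe target would be `logger -t mydaemon`.
mydaemon 2>&1 | cat > /dev/null
status=$?        # 3 (mydaemon's code), not cat's 0
echo "exit status: $status"
```

This only works in bash (and other shells that support `pipefail`); plain POSIX sh scripts would need to capture the output first and pipe it to `logger` afterwards.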

------
tzs
The article is a bit confusing, because the title says 5, the article gives 6
(Error, Warning, Info, Debug, Trace, and Fatal), but the article casts doubt
on whether either of those is right when it says this:

> I would like to talk about the five (or six, or four) that I find are the
> most crucial, the fundamental set to base your own levels (and your
> thinking) on.

Which are the 2 from the 6 given whose status as levels of logging is in
doubt?

~~~
Jarwain
He orders it in terms of relevance too. Note that the last two levels he
states are Trace and Fatal. He talks about how Trace may not always be
enabled, or how Fatal doesn't necessarily make sense if such an error needs to
be handled manually anyways.

------
Dowwie
Debug: If you can read this, you are standing too close to the program.

------
xtracto
This article reminded me of a writing that an ex-colleague of mine did at a previous company: [https://nerds.kueski.com/better-logging/](https://nerds.kueski.com/better-logging/)

Every developer knows that they should include logging in their systems,
however _how_ and _what_ to log is not something so well documented.

When you start working on a system (e.g. at a startup) you put the odd log
here and there, without too much attention. However, as your system(s) start
growing, you have to start paying attention to your logs' structure and
semantics, so that they are more useful in ELK, Sumologic and similar log
searching engines.

~~~
pfranz
What's nice about something like Elasticsearch is that if you categorize your
logging levels you can more easily drop logs of a given level after different
retention periods. Debug/verbose/trace is a lot less useful after, say, 24hrs
and takes up a lot of space, while Fatal errors should be rare and probably
want to be tracked year-over-year or between releases.

------
blaisio
I really believe only two levels are necessary most of the time: error, and
info. Error is for anything that a dev should be notified about (AKA if you
send an error log message, assume someone will be notified about it
immediately), and info is for any other useful debugging information, like "we
got a request for /foo/bar at 10 pm". I also think structured logging is
absolutely essential. Log levels are really just a way to get some of the
benefits of structured logging.
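
As a rough sketch of what structured logging buys here, using only Python's stdlib (the logger name and JSON field names are arbitrary choices, not a standard):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "path": getattr(record, "path", None),  # structured extra field
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("web")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# `extra` attaches machine-queryable fields instead of burying them in text.
logger.info("request received", extra={"path": "/foo/bar"})
```

The level then becomes just one more queryable field, which is the point made above: levels are a degenerate form of structure.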

------
grkvlt
I was _really_ hoping this might continue with 'six, if you work with IoT
devices' or something like that, as in the Larry Miller[0] routine.[1] Never
mind, I'll leave the links here anyway...

0\.
[https://en.wikipedia.org/wiki/Larry_Miller_(comedian)](https://en.wikipedia.org/wiki/Larry_Miller_\(comedian\))

1\. [https://youtu.be/eceSHKqwUPY?t=18](https://youtu.be/eceSHKqwUPY?t=18)

------
kripke
> Connecting to server:4242 as "user":"hunter2"

as an example of DEBUG logging is very wrong. Passwords should not end up in
your logs, ever.

~~~
jschwartzi
What password? All I see is * * * * * * *.

~~~
sokoloff
Hah. Context for others: [http://bash.org/?244321](http://bash.org/?244321)

------
jdmichal
I have always used them as follows. Deciding between DEBUG and INFO is the
most difficult. INFO, WARN, and ERROR are simple decisions.

DEBUG: Information which is relevant for tracing activity and data through the
program.

INFO: Important workflow decisions made within the normal operation of the
program. Allows you to tell what path a particular operation flowed down, but
not with tracing every activity.

WARN: Non-fatal errors which do not prevent the operation from completing.
Example: Failing to close a file after having read from it.

ERROR: Fatal errors which prevent the operation from completing. Example:
failing to open the file or to read from it.
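
In Python, that convention might look like the following sketch; `read_config` is a made-up helper, not from the thread:

```python
import logging

log = logging.getLogger("files")

def read_config(path):
    """Illustrates the WARN/ERROR split: ERROR when the operation cannot
    complete, WARN for a non-fatal problem on the way out."""
    try:
        f = open(path)
    except OSError as exc:
        log.error("could not open %s: %s", path, exc)  # operation fails: ERROR
        raise
    try:
        return f.read()
    finally:
        try:
            f.close()
        except OSError as exc:
            log.warning("failed to close %s: %s", path, exc)  # data already read: WARN
```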

~~~
delinka
What about _FATAL_ errors which prevent the entire program from continuing?

~~~
jdmichal
Error and fatal are the same thing. If an operation fails and cannot be
recovered, that's an error. There is no meaningful difference between an
operation failing and the entire program failing. Or, from another
perspective, keeping the program running is just another operation that can
have fatal or non-fatal errors.

~~~
KayEss
That depends on the nature of the program. We use error and critical for
server software. If there is a protocol violation on a particular connection,
we log an error and drop the connection, but the server otherwise continues.
If the `accept` system call fails, that's critical and the program shuts down.
Depending on what we're doing there may be other critical operations, but some
software doesn't have any.

------
dionian
trace: used for visualizing flow of program, extremely noisy and never on prod

debug: low level flow of program, inputs/outputs

info: what I want to see at runtime in prod, high level flow of program,
details about operation so I know what's going on if I want to look

warn: things that may indicate errors or other unexpected situations to be
aware of that may be relevant to problems elsewhere

error: something serious or unexpected that probably indicates a defect, an
issue that needs to be looked into, an unhandled/unexpected state

~~~
ohazi
trace: printf debugging

------
sagichmal
Counterpoint: [https://dave.cheney.net/2015/11/05/lets-talk-about-logging](https://dave.cheney.net/2015/11/05/lets-talk-about-logging)

tl;dr: Only Info and Debug levels carry their weight.

~~~
philwelch
This works at a very, very small scale where you are reading your logs by
hand. At this point, you don't even need log levels.

In the real world, you're not going to read your logs unless you're trying to
investigate an issue. Instead, you're going to monitor your logs. And this
kind of monitoring is where WARN and ERROR become helpful.

WARNs are useful because if you run into a WARN-level issue every now and
again, it's no big deal, but if you run into a ton of them, that might
indicate a problem that's worth fixing. A single WARN log isn't that useful,
but a statistically significant number of WARN logs is a monitorable metric.
Being able to detect and address issues after they're serious enough to cause
elevated WARNs but before they're serious enough to cause customer impact is
worthwhile.

ERRORs are useful because then you have a one-size-fits-all way of monitoring
for error conditions if, for example, it's an offline job that doesn't result
in a 5xx response. You should still monitor for 5xx responses, pipeline job
failures, non-zero exit codes from processes, or whatever your application-
level "this didn't end well" signal is. But ERROR logging is also a good
metric to have if you want to have multiple log-based metrics all in one
place.

~~~
sagichmal
If you're parsing your logs with a machine for triage like you describe, you
should be doing structured logging, not string logging.

~~~
philwelch
And the structure will include a log level to enable this kind of
classification.

~~~
sagichmal
Sure, but then we're not talking logger.Info: it's a difference in kind.

------
hota_mazi
I've always preferred numbered levels of logging to these arbitrary names
(what's the difference between trace and debug, and which one is lower
level?).

Give me three levels of logging, 1, 2, and 3, and then warning and error.
Maybe allow mixing and matching them, if that makes sense (the typical
nomenclature makes them all mutually exclusive, which is dumb).

There are your five levels.

~~~
delinka
Depending on your Coding Culture, you order them based on "priority" (FATAL
would have priority over all others, and therefore numbered as 1), or
"importance" (FATAL would be most important, and therefore numbered 5.)

Alternatively, each is a flag in a number so that they can be combined, with
FATAL being 1 or 16 depending on priority vs. importance.
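
The flags-in-a-number idea can be sketched with Python's `enum.IntFlag`; the names and bit assignments here are illustrative, ordered by "priority" so FATAL gets the lowest bit:

```python
from enum import IntFlag

class Level(IntFlag):
    FATAL = 1
    ERROR = 2
    WARN = 4
    INFO = 8
    DEBUG = 16
    TRACE = 32

# Mix and match instead of a single mutually exclusive threshold:
enabled = Level.FATAL | Level.ERROR | Level.INFO

def should_log(level: Level) -> bool:
    return bool(level & enabled)
```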

I would argue that the names do indeed suffice to tell you their "level" -
TRACE is the noisiest, least important, lowest-priority level. DEBUG is next,
followed by INFO, WARN, ERROR, FATAL.

~~~
zoomablemind
Curiously, in the days of OpenVMS this was called a 'severity' level and was a
global system concept [0] related to the status code of execution.

Info, Warning, and Error are degrees of Success or Fatal/Severe.

The status would then be logged in a structured manner that shows a facility
code, severity, and message.

0\.
[https://h41379.www4.hpe.com/wizard/wiz_3582.html](https://h41379.www4.hpe.com/wizard/wiz_3582.html)

------
dstick
Thanks for this! Was just about to start implementing a logging subsystem
myself so the timing is perfect :)

------
cwingrav
I'm going to weigh in on my opinionated library for logging: [woveon-logger](https://www.npmjs.com/package/woveon-logger) (js). Sorry if that's a no-no, but nobody seems to use my much loved library anyway.

The multiple levels are nice, especially for logging errors in production and
tracking general information of your software. But in development, logging is
often about tracking a particular aspect in your code. You need something that
logs to a particular _aspect_ that you are working on, and then can turn off
that _aspect_ once you move on. In the future, you often find yourself back
debugging that code and need to flip the _aspect_ back on again.

Summary: Log levels of info, warn and error all have a place. Logging to
aspects is valuable during development, easy to turn off for production and
useful when returning to debug.

~~~
hinkley
Logging is one of the weirder cross-cutting concerns, for many reasons,
especially including the one you mention.

We had to roll back an upgrade last month because one nonfatal (but still
pretty bad) problem was generating a ridiculous amount of log entries. We had
no way to turn it off or throttle it, so we rolled back.

I have fantasized for years about a programming language with a small runtime
that supports instrumentation of the code, not just for debugging and
profiling, but for logging as well. Logging becomes a set of conditional
breakpoints stored with or at least near the code and you can tweak them
individually.

Something akin to applying the 80/20 rule to Aspect Oriented Programming.

We fake this by implementing logging frameworks that try to short-circuit out
cheaply. But that really only pays off if you can marshal the arguments to the
logger very cheaply, and you still have problems with third-party code, which
can't know that you've got workarounds for problems it considers fatal, and is
therefore so noisy at the WARN or INFO log levels that you can't see your own
log entries (Apache Foundation, I'm looking at you).

~~~
Rapzid
Check out log points:
[https://code.visualstudio.com/docs/editor/debugging#_logpoin...](https://code.visualstudio.com/docs/editor/debugging#_logpoints)

~~~
hinkley
It's buried but my IDE has a similar feature.

I meant support for running systems, not local development.

~~~
Rapzid
"Logpoints are especially useful for injecting logging while debugging
production servers that cannot be paused or stopped."

There is a lot more going on with them than you might realize. Not sure how
coupled it is to the nodejs/kubernetes/Azure stuff they have built into
VSCode, but it seems like a working example of the future. Sorta like time
travel debugging in node-chakracore.

Windows has a facility called ETW (Event Tracing for Windows) that lets you
set up all sorts of eventing/tracing where the calls are not actually present
in the code unless the ETW facility injects the code necessary to enable them
into the running process. This allows you to dynamically enable and disable
event sources based on things like whether or not something is currently
subscribing to them. Worth looking into just for the sheer amazement of it if
you are not a Windows developer already familiar with it.

------
stephengillie
Netscaler logging deserves its own level of hell. You're given 15 undocumented
logging levels, with the suggestion not to leave it running longer than
necessary, and that level 11 should be enough for anybody.

------
nwmcsween
The issue with logging levels is that they convey only basic information
through the level itself; the rest goes through the actual message, which
results in the mess that is log processing.

------
yacse
Logging is very important because it also allows you to debug/resolve issues
easily on production systems. But caring about the appropriate log levels and
the ability to configure them at runtime is also super important. It's just
insane and ridiculous to log everything on a trace/debug level and stream it
to the Expensive Innovative Big Data Machine Learning Smart Log Analytics
Product. Yes, I'm looking at you Application Insights.

