I always get complaints from users that there is too much useless stuff in the logs that they need to wade through. I also think the developer log should be structured differently, possibly so it can be read easily by some automation.
Warn: Problems that you know about and wish would not happen, but do anyways. A service call has failed, unexpected input that violates the contract of the method, etc. You may be handling these conditions gracefully and/or returning an error to the caller, but still want to know when they happen. Usually you should have a ticket open in your bug tracker for these.
Error: Problems that you do not know about. When an error is logged, a developer should look at it. Some kind of action should be taken immediately. Put the fire out if it is an emergency. If not an emergency, create a ticket in your bug tracker, then change the level to warn.
I keep warn and info separate so that I can easily grep/filter on one. I often want to just see info messages to know what the status of things are.
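The warn/error split described above can be sketched with Python's stdlib `logging`. This is a minimal sketch, not anyone's production code: the payment service, the timeout exception, and the ticket reference are all invented for illustration.

```python
import logging

logger = logging.getLogger("payments")

class PaymentTimeout(Exception):
    """Known failure mode of the (hypothetical) payment service."""

def call_payment_service(card_token):
    # Stand-in for a real remote call; always times out in this sketch.
    raise PaymentTimeout()

def charge(card_token):
    try:
        return call_payment_service(card_token)
    except PaymentTimeout:
        # A problem we know about, wish wouldn't happen, and handle
        # gracefully -- there should be a ticket open for it. Log at WARN.
        logger.warning("payment service timed out; request queued for retry")
        return None
    except Exception:
        # A problem we do NOT know about: log at ERROR so a developer looks.
        logger.error("unexpected failure charging card", exc_info=True)
        raise
```

Because WARN and ERROR stay separate, grepping or filtering on just one of them stays trivial.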
If it takes the right pair of eyes on the logs to see a problem for what it is, the whole process slows down and doesn't scale as high. All errors and very chatty warnings should induce me to ask someone what's going on, to make sure it hasn't slipped through the cracks. And they should thank me for pointing it out. Otherwise you reinforce some bad patterns.
If the server process handles it without crashing, is that a recovery?
A more general distinction IMO is the one presented in the article: an Error is something that should not have happened. A Warning is something that maybe should not have happened.
1. DEBUG - Look at this when debugging, otherwise no one will ever see it.
2. TICKET - Open a ticket and have someone on your team look at this when it gets to the top of your queue.
3. PAGER - Drop what you're doing and handle this, now.
Of course, you want to track INFO and ERROR type messages because a sufficient number of them might cause someone to raise a ticket... but at scale, you probably should've built that monitoring already, and just drop INFO/ERROR down to DEBUG.
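The DEBUG/TICKET/PAGER scheme above maps cleanly onto numeric log levels. A sketch with Python's `logging.addLevelName` (the numeric values 35 and 45 are arbitrary choices slotting between the stock levels):

```python
import logging

# Hypothetical three-level scheme mapped onto Python's numeric scale:
# DEBUG stays at 10; TICKET and PAGER slot in above WARNING and ERROR.
TICKET = 35
PAGER = 45
logging.addLevelName(TICKET, "TICKET")
logging.addLevelName(PAGER, "PAGER")

log = logging.getLogger("app")
log.setLevel(TICKET)  # production setting: drop DEBUG, keep TICKET/PAGER

log.log(TICKET, "nightly export failed; retry scheduled")
log.log(PAGER, "primary database unreachable")
log.debug("loop iteration 4, state=...")  # suppressed at this level
```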
I like these 5 levels, and the 3 levels you talk about seem like a pared-down version that reinforces what each level means.
To illustrate the "no single solution" point: we tried to use logging for command-line output, and neither of those systems really has a level for that. We probably should have created a new level between INFO and WARNING because of chatty libraries.
In practice I've found that DEBUG and TRACE are usually best added to code while troubleshooting. When you're writing the code you don't have a good idea what information you'll need when troubleshooting, so the logging ends up overly verbose yet still missing what you actually want. And when you try to stash all logging, either locally or centrally (because adding logging and re-running isn't always an option), it can affect performance.
Also, on UNIX systems I wish there were a generic way to pipe the output of a command to the system log while preserving the command's return code. In bash you can set `pipefail` so the pipeline propagates a failing command's status to the calling shell, but in plain POSIX shells the last command in the pipeline is the one whose return code gets stored, which is usually not what you want. It would be really cool if the logger utility captured the return code from upstream and returned it, so that I could use it in the antiquated SystemV init scripts I have to maintain.
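In the absence of such a `logger` feature, a small wrapper can approximate it: run the command, forward each output line to the log, and exit with the command's own return code. A sketch in Python (the script name and logger name are made up; a real version would point the handler at syslog):

```python
import logging
import subprocess
import sys

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("pipe-to-log")
log.setLevel(logging.INFO)

def run_and_log(argv):
    """Run argv, forward its stdout lines to the log, return its exit code."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        log.info(line.rstrip("\n"))
    return proc.wait()  # the command's own return code, preserved

if __name__ == "__main__":
    sys.exit(run_and_log(sys.argv[1:]))
```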
A comment is attached to code. A log is attached to an event.
Conflating those will either give you very bad logs or very bad comments.
> I would like to talk about the five (or six, or four) that I find are the most crucial, the fundamental set to base your own levels (and your thinking) on.
Which two of the six given are the ones whose status as logging levels is in doubt?
Every developer knows that they should include logging in their systems, but how and what to log is not so well documented.
When you start working on a system (e.g. at a startup) you put the odd log statement here and there, without too much attention. However, as your systems start growing, you have to start paying attention to your logs' structure and semantics, so that they are more useful in ELK, Sumologic, and similar log search engines.
Using a password as an example of DEBUG logging is very wrong. Passwords should not end up in your logs, ever.
DEBUG: Information which is relevant for tracing activity and data through the program.
INFO: Important workflow decisions made within the normal operation of the program. Allows you to tell what path a particular operation flowed down, but not with tracing every activity.
WARN: Non-fatal errors which do not prevent the operation from completing. Example: Failing to close a file after having read from it.
ERROR: Fatal errors which prevent the operation from completing. Example: failing to open the file, or failing to read from it.
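The file example in those definitions can be made concrete. A sketch (the function name and config-file framing are invented) where a failed open or read is fatal to the operation and logged at ERROR, while a failed close is non-fatal and logged at WARN:

```python
import logging

log = logging.getLogger("reader")

def read_config(path):
    try:
        f = open(path)      # failure here prevents the operation: ERROR
    except OSError:
        log.error("cannot open %s", path, exc_info=True)
        raise
    try:
        data = f.read()     # failure here is also fatal to the operation
    finally:
        try:
            f.close()
        except OSError:
            # Non-fatal: we already have the data, but we want to know.
            log.warning("failed to close %s", path)
    return data
```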
A request that causes a 500 error response to the client should also produce an Error log output on the backend. The request crashed. But because this is an actor-based system, that crash didn't affect anything other than the request. It was fatal to that interaction, but not to the system.
A request that causes the entire server process to spontaneously halt should generate a Fatal log message. That is never supposed to happen. The system is built, on a number of levels, to ensure that it doesn't. But it'd be really great, if it does happen, to have enough information to know why.
Ideally, a Fatal log message is accompanied by a crash dump.
debug: low level flow of program, inputs/outputs
info: what I want to see at runtime in prod, high level flow of program, details about operation so I know what's going on if I want to look
warn: things that may indicate errors or other unexpected situations to be aware of that may be relevant to problems elsewhere
error: something serious or unexpected that probably indicates a defect; an issue that needs to be looked into; an unhandled/unexpected state
tl;dr: Only Info and Debug levels carry their weight.
In the real world, you're not going to read your logs unless you're trying to investigate an issue. Instead, you're going to monitor your logs. And this kind of monitoring is where WARN and ERROR become helpful.
WARNs are useful because if you run into a WARN-level issue every now and again, it's no big deal, but if you run into a ton of them, that might indicate a problem that's worth fixing. A single WARN log isn't that useful, but a statistically significant number of WARN logs is a monitorable metric. Being able to detect and address issues after they're serious enough to cause elevated WARNs but before they're serious enough to cause customer impact is worthwhile.
ERRORs are useful because they give you a one-size-fits-all way of monitoring for error conditions when, for example, the failure is in an offline job that doesn't result in a 5xx response. You should still monitor for 5xx responses, pipeline job failures, non-zero exit codes from processes, or whatever your application-level "this didn't end well" signal is. But ERROR logging is also a good metric to have if you want multiple log-based metrics all in one place.
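One minimal way to turn log levels into monitorable metrics, as described above, is a handler that does nothing but count records per level. A sketch (the logger name and the alarm threshold are made up; a real system would export the counters to its monitoring stack):

```python
import logging
from collections import Counter

class MetricsHandler(logging.Handler):
    """Counts records by level so a monitor can alarm on the rates."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def emit(self, record):
        self.counts[record.levelname] += 1

log = logging.getLogger("svc")
metrics = MetricsHandler()
log.addHandler(metrics)
log.setLevel(logging.INFO)

for _ in range(12):
    log.warning("retrying upstream call")  # occasional WARNs are fine
log.error("offline job failed")

# A monitor might alarm when the WARN rate crosses a (made-up) budget,
# catching trouble before it causes customer impact.
warn_storm = metrics.counts["WARNING"] > 10
```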
I think he is overthinking it a bit. Yes, there should be very few places where an error is actually logged, but that does not mean the level should not exist.
In web applications you very often have one instance of:
logger.exception('Something bad happened')
The real problem is that people still associate logs with lines of text in a file, whereas most logging frameworks create full objects containing a ton of information. Once you stop considering logs as text, exception tracking and logging really become the same thing.
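This is easy to see in Python's stdlib `logging`: the `logger.exception(...)` call above produces a full `LogRecord` object, not a line of text. A small sketch that captures the record and inspects it:

```python
import logging

captured = []

class CaptureHandler(logging.Handler):
    def emit(self, record):
        captured.append(record)

logger = logging.getLogger("web")
logger.addHandler(CaptureHandler())

try:
    1 / 0
except ZeroDivisionError:
    logger.exception("Something bad happened")

rec = captured[0]
# The "line of text" is only one rendering of this object; the record
# itself carries the exception tuple, traceback, source location,
# function name, process/thread ids, and any extra fields you attach.
print(rec.levelname, rec.exc_info[0].__name__, rec.funcName, rec.lineno)
```

Once the record is treated as the unit of logging, shipping it to an exception tracker and writing it to a file are just two sinks for the same object.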
Good program architecture in 2018 demands some degree of compromise. In my experience it's best to separate console logging (as in this article) from structured logging, from metrics, from distributed tracing, from exception tracking.
Perhaps a name change is in order; Event Tracing for Windows anyone? Eventing.. Event Tracing. IDK.
- Debug: should not show up in normal logs but you can turn it on down to a specific log message in a running process without restarting. We use structured logging so we have the additional ability to turn on debug messages down to a specific key=value pair (or set of key=value pairs).
- Info: An interesting event occurred that someone might care about. This could be an error that we were able to handle or just some important event.
- Alert: An error case that should not happen has occurred and someone needs to look into it. In addition to logging the event, an external system should be notified (e.g. create a ticket, page someone, send an email, etc).
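Turning on DEBUG only for records carrying a given key=value pair, as described in the Debug bullet above, could be sketched with a `logging.Filter` over `extra` fields. Everything here (the logger name, the `customer_id` key) is invented for illustration; a real system would toggle the filter at runtime rather than hard-code it:

```python
import logging

class KeyValueDebugFilter(logging.Filter):
    """Pass DEBUG records only when they carry the wanted key=value pairs."""
    def __init__(self, **wanted):
        super().__init__()
        self.wanted = wanted

    def filter(self, record):
        if record.levelno > logging.DEBUG:
            return True  # non-debug records always pass
        return all(getattr(record, k, None) == v
                   for k, v in self.wanted.items())

log = logging.getLogger("orders")
log.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
handler.addFilter(KeyValueDebugFilter(customer_id="c-42"))  # invented key
log.addHandler(handler)

log.debug("cart recalculated", extra={"customer_id": "c-42"})  # shown
log.debug("cart recalculated", extra={"customer_id": "c-99"})  # suppressed
```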
I always turn on the highest log level unless there is a very strong performance reason against doing so. I have been able to resolve a lot of issues over the years because I had good logging when something went wrong.
I don't really see alerts as logging. They are an integral part of the application's behavior.
Give me three levels of logging, 1, 2, and 3, and then warning, and error. Maybe allow mixing and matching them, if that makes sense (the typical nomenclature makes them all mutually exclusive, which is dumb).
There are your five levels.
Alternatively, each is a flag in a number so that they can be combined, with FATAL being 1 or 16 depending on priority vs. importance.
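The flags-in-a-number idea above could be sketched with Python's `enum.IntFlag`. The bit assignments are one arbitrary choice (here FATAL takes the low bit, the "priority first" option mentioned above), and the enabled mask is made up:

```python
from enum import IntFlag

class Log(IntFlag):
    # Bits instead of an ordered scale, so categories can be combined.
    FATAL = 1      # low bit under the "priority first" numbering
    ERROR = 2
    WARNING = 4
    L1 = 8
    L2 = 16
    L3 = 32

# A made-up runtime setting: severities plus the least chatty detail level.
enabled = Log.FATAL | Log.ERROR | Log.WARNING | Log.L1

def should_log(flags):
    """A message passes if any of its category bits is enabled."""
    return bool(flags & enabled)

# A message can belong to several categories at once:
should_log(Log.WARNING | Log.L3)  # True: the WARNING bit is enabled
should_log(Log.L2)                # False at this setting
```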
I would argue that the names do indeed suffice to tell you their "level" - TRACE is the noisiest, least important, least priority level. DEBUG is next, followed by INFO, WARN, ERROR, FATAL.
Info, Warning, and Error are degrees of Success or Fatal/Severe.
The status would then be logged in a structured manner showing a facility code, severity, and message.
The multiple levels are nice, especially for logging errors in production and tracking general information about your software. But in development, logging is often about tracking a particular aspect of your code. You need something that logs to a particular aspect that you are working on, and then can turn that aspect off once you move on. Later, you often find yourself back debugging and needing to flip that aspect back on again.
Summary: Log levels of info, warn and error all have a place. Logging to aspects is valuable during development, easy to turn off for production and useful when returning to debug.
We had to roll back an upgrade last month because one nonfatal (but still pretty bad) problem was generating a ridiculous amount of log entries. We had no way to turn it off or throttle it, so we rolled back.
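A throttle like the one missing in that rollback story can be retrofitted as a logging filter: allow each distinct message a budget of records per time window and drop the rest. A sketch in Python (the budget and window numbers are made up, and keying on the raw message template is a simplification):

```python
import logging
import time
from collections import defaultdict

class ThrottleFilter(logging.Filter):
    """Allow at most `budget` records per message template per `window` seconds."""
    def __init__(self, budget=100, window=60.0):
        super().__init__()
        self.budget = budget
        self.window = window
        self.seen = defaultdict(list)  # message template -> timestamps

    def filter(self, record):
        now = time.monotonic()
        # Forget timestamps that fell out of the window, then check budget.
        stamps = [t for t in self.seen[record.msg] if now - t < self.window]
        self.seen[record.msg] = stamps
        if len(stamps) >= self.budget:
            return False  # over budget: drop the record
        stamps.append(now)
        return True

log = logging.getLogger("noisy")
log.addFilter(ThrottleFilter(budget=3, window=60.0))
```

With something like this in place, one nonfatal but noisy problem floods the log for three lines per minute instead of forcing a rollback.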
I have fantasized for years about a programming language with a small runtime that supports instrumentation of the code, not just for debugging and profiling, but for logging as well. Logging becomes a set of conditional breakpoints stored with or at least near the code and you can tweak them individually.
Something akin to applying the 80/20 rule to Aspect Oriented Programming.
We fake this by implementing logging frameworks that try to short circuit out cheaply. But it really only pays off if you can marshal the arguments to the logger very cheaply, and you still have problems with third party code, which can't know that you've got workarounds for problems they consider to be fatal, and are therefore so noisy at the WARN or INFO log levels that you can't see your own log entries (Apache Foundation, I'm looking at you).
I meant support for running systems, not local development.
There is a lot more going on with them than you might realize. Not sure how coupled it is to the nodejs/kubernetes/Azure stuff they have built into VSCode, but it seems like a working example of the future. Sorta like time travel debugging in node-chakracore.
Windows has this facility called ETW that lets you set up all sorts of eventing/tracing where the calls are not actually present in the code unless the ETW facility injects the code necessary to enable them into the running process. This allows it to dynamically enable and disable event sources based on things like whether or not something is currently subscribed to them. Worth looking into just for the sheer amazement of it if you are not a Windows developer familiar with it.