
Error Handling in Node.js - lsm
https://www.joyent.com/node-js/production/design/errors
======
latch
My non-node specific suggestions:

1 - Don't catch errors unless you can actually handle them (and chances are,
you can't handle them). Let them bubble up to a global handler, where you can
have centralized logging. There's a fairly old discussion with Anders
Hejlsberg that talks about this in the context of Java's miserable checked
exceptions, which I recommend [1]. This is also why, in my mind, Go gets it
wrong.

2 - In the context of error handling (and system quality), logging and
monitoring are the most important things you can do. Period. Only log
actionable items, else you'll start to ignore your logs. Make sure your errors
come accompanied by a date (this can be done in your central error handler, or
at ingestion time via logstash or what have you).

3 - Display generic/canned errors to users...errors can contain sensitive
information.

4 - Turn errors you run into and fix into test cases.

[1] -
[http://www.artima.com/intv/handcuffs.html](http://www.artima.com/intv/handcuffs.html)
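For Node specifically, points 1 and 2 could look roughly like the sketch below. It is only a sketch: `toLogRecord` and `installGlobalHandler` are made-up names, and the JSON record format is an assumption, not something from the article. The `uncaughtException` and `unhandledRejection` process events are the standard Node hooks.

```javascript
// Minimal sketch of a central error handler; all function names are hypothetical.

// Build a log record with a timestamp so every error is dated (point 2).
function toLogRecord(err, now = new Date()) {
  return {
    time: now.toISOString(),
    name: err && err.name ? err.name : "Error",
    message: err && err.message ? err.message : String(err),
    stack: err && err.stack ? err.stack : null,
  };
}

// Register one place where everything that bubbles up ends (point 1).
// `log` and `exit` are injectable so the handler can be exercised in tests.
function installGlobalHandler(log = console.error, exit = process.exit) {
  const handler = (err) => {
    log(JSON.stringify(toLogRecord(err)));
    exit(1); // state is unknown after an uncaught error; let a supervisor restart us
  };
  process.on("uncaughtException", handler);
  process.on("unhandledRejection", handler);
  return handler;
}
```

Calling `installGlobalHandler()` once at startup is enough; the rest of the code just lets errors it cannot handle propagate.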

~~~
btschaegg
When it comes to 1.), a better way to state things may be that you
shouldn't _ignore_ errors unless you were able to completely handle them.

Catching exceptions to throw exceptions with better messages is something I
would strongly suggest, since almost no exceptions are useful without
contextual information. I.e. _which_ file was not found? The config, not the
input or output. Things like this. This is especially useful in C++, where you
don't get stack traces, but also in other languages you'll want to present
non-technical (i.e. non-dev) users with meaningful messages. Stack traces will
just frighten them off.
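To make the "which file?" point concrete, here is a rough Node sketch. `loadConfig` is a hypothetical helper, and attaching the original error as a `cause` property is just one possible convention for keeping the technical details around for the logs.

```javascript
const fs = require("fs");

// Hypothetical helper: read a JSON config file, adding context on failure.
function loadConfig(path) {
  try {
    return JSON.parse(fs.readFileSync(path, "utf8"));
  } catch (err) {
    // Say *which* file failed, and keep the original error for the logs.
    const wrapped = new Error(`Could not load config file "${path}": ${err.message}`);
    wrapped.cause = err;
    throw wrapped;
  }
}
```

The caller still gets an exception, so nothing is swallowed; it just carries enough context to be useful.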

Your comment on sensitive information also plays into this.

I'll agree that just swallowing errors and going on is a recipe for disaster.
This is something that regularly bugged me in most of the C code I've
encountered so far.

~~~
narag
_Catching exceptions to throw exceptions with better messages is something I
would strongly suggest_

Also not every user in the world understands English.

I catch exceptions in two levels. One is the user initiated action. I seldom
see this specified, but it seems obvious to me: if the user decides to do X,
either X is done or a clear and meaningful message is shown, explaining what
could not be done and why.

There is another finer grained location to do what you say (collecting
details), logging and then re-raising up to the other level.

Swallowing errors is evil and no, not limited to C code.

~~~
btschaegg
Interesting point. I have to confess that I've never seen logging done usably
in an i18n sense. Of course, for UI applications, you're absolutely right.

When it comes to logging details, you have a point. But I still think a clear
final error message is something to aspire to - especially if you're writing
very busy and potentially multithreaded (or - even worse - async) services.

On this note, filtering logs according to thread IDs can be very helpful here.
I wonder if that is also easily possible with "fibers".

> Swallowing errors is evil and no, not limited to C code.

Of course, nothing ever is - it's just where I've seen this thing the most
(highly anecdotal evidence, I know :-) ). Unfortunately, the nature of many C
APIs also makes it very non-obvious if you're skipping through the code,
whereas an empty catch statement stands out somewhat.

------
hueving
Giant post about the nightmare that is making robust code in node.js. Summary:
don't use a language in large projects that makes it so easy to leak errors
and exceptions. There's something to be said about the compiler forcing you to
declare what exceptions your code can throw to force you to think about this
stuff up front.

~~~
kbart
What better alternatives do you know? I don't know any programming language
for real world projects that would make error handling trivial.

~~~
wyager
Trivial? No. But you can at least make it better than the horrible nightmare
described in the OP. Monadic error handling in a strongly typed language gives
you a huge leg up on safely and sanely managing the complexity of error
handling, because it provides simple value-level error semantics and clear
type-level indications of exactly what kinds of errors you have to deal with
and how you have to deal with them.
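JavaScript has no static types, so the sketch below can only show the value-level half of that idea: errors are returned as ordinary values and chained, rather than thrown. The `ok`, `fail`, and `andThen` helpers and the port example are made up for illustration; in a typed language the compiler would additionally force callers to handle the failure case.

```javascript
// Hypothetical Result helpers: errors are ordinary values, not thrown.
const ok = (value) => ({ ok: true, value });
const fail = (error) => ({ ok: false, error });

// Chain a step that may itself fail; the first failure short-circuits.
const andThen = (result, fn) => (result.ok ? fn(result.value) : result);

function parsePort(text) {
  const n = Number(text);
  return Number.isInteger(n) ? ok(n) : fail(`not an integer: ${text}`);
}

function checkRange(port) {
  return port > 0 && port < 65536 ? ok(port) : fail(`out of range: ${port}`);
}

// Each step only runs if the previous one succeeded.
const readPort = (text) => andThen(parsePort(text), checkRange);
```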

------
akssri
For me, error handling has a major flaw: stack unwinding - extremely annoying
thing to happen when the program state took many many hours to achieve. I
don't think there is any language other than CL that allows restarts etc. to
be defined; slime-repl too is invaluable when debugging.

[http://www.gigamonkeys.com/book/beyond-exception-handling-
co...](http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-
and-restarts.html)

~~~
chaitanya
Right! A long time back we came up with a really nice way of validating CSV
files using restarts. I wrote a bit about it:
[http://lisper.in/restarts](http://lisper.in/restarts)

~~~
lispm
Nice. One thing you might want to add. From reading you present two ways to
deal with restarts:

1) interactive debugger

2) programmatical

There is another one:

3) restart dialog

The program presents you a list of restarts, for example in a GUI dialog, and
the end user can select a restart - without interacting with a debugger.

The debugger is just one program, which may display the restarts.

That's how one used it in applications on a Lisp Machine. To call a debugger
could be an option in the list of restarts. For real real end users, even the
call to the debugger might not be available and all they can do is to choose
an option from the list of restarts. Symbolics offered something called
'Firewall', which did all it could to hide the underlying Lisp system from the
end user - here the end user should not interact with a debugger or Lisp
listener.

But even in a Lisp listener, if you used the 'Copy File' command you might get
a dialog shown with the typical options: abort, try again, use other file,
enter debugger, ...

~~~
chaitanya
Nice idea! I will add this suggestion.

------
Silhouette
Some of this looks like horrible advice, particularly the defeatist attitude
towards what the article calls "programmer errors". Statements to the effect
that you can never anticipate or handle a logic error sensibly so the only
thing you should ever do is crash immediately are hard to take seriously in
2016. What about auto-saving recovery data first? Logging diagnostic
information? Restarting essential services in embedded systems with limited
interactivity? This article basically dismisses decades of lessons learned in
defensive programming with an argument about as sophisticated as "It's too
hard, we should all just give up".

As others have already mentioned, much of the rest is quite specific to
Node/JS, and many of the issues raised there could alternatively be solved by
simply choosing a better programming language and tools. The degree to which
JS has overcomplicated some of these issues is mind-boggling.

~~~
spc476
> What about auto-saving recovery data?

It really depends upon the language and environment used. I work with C
(almost legacy code at this point), and if the program generates a segfault,
there is _no_ way to safely store _any_ data (for all I know, it could have
been trying to auto-save recovery data when it happened). About the best I can
hope for is that it shows itself during testing but hey, things slip into
production (last time that happened in an asynchronous, event driven C
program, the programmer maintaining the code violated an unstated assumption
by the initial developer (who was no longer with the company) and program go
boom in production). At that point, the program is automatically restarted,
and I get to pore through a core dump to figure out the problem.

I'm not a fan of defensive programming as it can hide an obvious bug for a
_long_ time (I consider it a Good Thing that the program crashed otherwise we
might have gone months, or even years, without noticing the actual bug).

Logging is an art. Too little, and it's hard to diagnose. Too much and it's
hard to slog through. There's also the possibility that you don't log the
_right_ information. I've had to go back and amend logging statements when
something didn't parse right (okay, what are our customers sending us _now?_
Oh nice! The logs don't show the data that didn't parse---the things you don't
think about when coding).

And then there are the monumental screw-ups that no one foresaw the
consequences of. Again, at work, we receive messages on service S, which
transforms and forwards the request to service T, which queries service E. T
also sends continuous queries (a fixed query we aren't charged for [1]) to E
to make sure it's up. Someone, somewhere, removed the fixed query from E. When
the fixed query to E returned "not found," the code in T was written in such a
way that failed to distinguish "not found" from "timed out" (because that fixed
query should never have been deleted, right?) and thus, T shut down (because
it had nothing to query), which in turn shut down S (because it had nothing to
send the data to), which in turn meant _many people_ were called ...

Then there was the routing error which caused our network traffic to be three
times higher than expected _and_ misrouted UDP replies ...

Error handling and reporting is hard. Maybe not cache invalidation and naming
things hard, but hard none-the-less.

[1] Enterprise system here.

~~~
MaulingMonkey
> I'm not a fan of defensive programming as it can hide an obvious bug for a
> long time (I consider it a Good Thing that the program crashed otherwise we
> might have gone months, or even years, without noticing the actual bug).

I've had segfaults "hidden" for a long time because my artist coworkers
weren't reporting crashes in their tools. They assumed a 5 minute fix was
something really complicated. Non-defensive programming is no panacea here.
Worse, non-defensive programming often meant crashes well after the initial
problem anyways, when all sane context was lost.

My takeaway here is that I need to automatically collect crashes - and other
failures - instead of relying on end users to report the problem. This is
entirely compatible with defensive programming - right now I'm looking at
sentry.io and its competitors (and what I might consider rolling myself) to
hook up as a reporting back end for yet another assertion library (since none
of them bother with C++ bindings.) On a previous codebase, we had an assert-
ish macro:

    
    
      ..._CHECKFAIL( precondition, description, onPreconditionFailed );
    

Which let code like this (to invent a very bad example) not fatally crash:

    
    
      ..._CHECKFAIL( texture, "Corrupt or missing texture - failed to load [" << texturePath << "]", return PlaceholderTexture() );
      return texture;
    

Instead of giving me a crash deep in my rendering pipeline minutes after
loading with no context as to what texture might be missing. Make it annoying
as a crash in your internal builds and it will be triaged as a crash. Or even
more severely, possibly, if simply hitting the assert automatically opens a
bug in your DB and assigns your leads/managers to triage it and CCs QA,
whoever committed last, and everyone who reviewed last commit ;)
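Since the thread is about Node, here is a rough JavaScript analogue of the same idea. This is not the original macro: `checkFail`, `loadTexture`, and the `report` hook are hypothetical names, and `report` merely stands in for whatever crash/assert collector is in use.

```javascript
// Hypothetical JavaScript analogue of the CHECKFAIL idea.
// If the precondition fails: report it, then run the fallback instead of crashing.
function checkFail(condition, description, onFailed, report = console.error) {
  if (condition) return { failed: false };
  report(`CHECKFAIL: ${description}`);
  return { failed: true, value: onFailed() };
}

const PLACEHOLDER_TEXTURE = { name: "placeholder" };

// `textures` is a plain object mapping paths to already-loaded textures.
function loadTexture(path, textures, report) {
  const texture = textures[path];
  const check = checkFail(
    texture,
    `Corrupt or missing texture - failed to load [${path}]`,
    () => PLACEHOLDER_TEXTURE,
    report
  );
  return check.failed ? check.value : texture;
}
```

The failure is reported at the point where the context (the path) is still known, and rendering continues with a placeholder.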

> Logging is an art.

You're right, and it's hard. However. It's very easy to do better than not
logging at all.

And I think something similar applies to defensive programming. You want null
to crash your program? Do so explicitly, maybe with an error message
describing what assumption was violated, preferably in release too instead of
adding a possible security vulnerability to your codebase:
[http://blog.llvm.org/2011/05/what-every-c-programmer-
should-...](http://blog.llvm.org/2011/05/what-every-c-programmer-should-
know_14.html) . Basically, always enabled fatal asserts.

This might even be a bit easier than logging - it's hard to pack too much
information into a fatal assert. After all, there's only going to be one of
them per run.

~~~
zeeg
Please, please, don't roll your own. It seems like an easy problem at a
glance, but it's far from it. The more fragmentation in these communities the
worse off we all are. Sentry's totally open source, and we have generous free
tiers on the hosted platform. Happy to talk more about this in detail, but if
there are things you don't feel are being solved let us know.

~~~
MaulingMonkey
> Please, please, don't roll your own. It seems like an easy problem at a
> glance, but it's far from it. The more fragmentation in these communities the
> worse off we all are.

I've rolled my own before, for enough of the pieces involved here, to confirm
you're entirely correct. There's a reason I'm looking at your tech ;)

> Happy to talk more about this in detail, but if there are things you don't feel
> are being solved let us know.

No mature/official C or C++ SDK. Built in support for native Windows and
Android callstacks would be great - I see you've already done some work for
handling OS X symbols inside the Cocoa bindings at least. Plus hooks to let me
integrate my own callstack collection for other platforms you haven't signed
the NDAs for (e.g. consoles) and whatever scripting languages we've embedded.

All the edge cases. I want to receive events:

* When my event reports a bug in my connection loss handling logic (requiring resending it later when the connection is restored.)

* When my event reports I've run out of file handles (requiring preopening files or thoroughly testing the error handling.)

* When I run out of memory (requiring preallocating - and probably reserving some memory to free in case writing a file or socket tries to allocate...)

* When I've detected memory corruption.

* When I've detected a deadlock.

Some of these will be project specific - because it's such an impossibly broad
topic that sentry's SDKs can't possibly handle them all.

No hard crash collection - this might be considered outside of sentry.io's
scope, though? It's also hideously platform specific to the point where some
of the tools will be covered by console NDAs again. Even on windows it's
fiddly as heck - I've seen the entire pipeline of configuring registry keys to
save .mdmp s, using scripts to use ngen to create symbols for the unique-per-
machine mscorlib.ni.dll and company - so you can resolve crashdumps with mixed
C++/C# callstacks - and then using cdb to resolve the same callstack in
multiple ways... it's a mess. I could still use the JSON API to report crash
summaries, though.

On a less negative note, I see breadcrumbs support landed in unstable for the
C# SDK.

EDIT: And then there's all the fiddly nice-to-haves, ease-of-use shortcuts,
local error reporting, etc. - some of which will also be project specific -
but rest assured, the last thing I want to do is retread the same ground that
sentry.io already covers. And where there are gaps, pull requests are one of
the easier options...

------
woodruffw
The main conclusion I drew from this is that node.js has three "standard" ways
to return/propagate an error, along with "traditional" methods (return code,
global errno, etc).

What's the deal? To someone who programs primarily in C and Ruby, this feels
like a tremendous complication of the normal programming process.

~~~
treve
Part of the problem in JS is that there's pretty much 2 classes of functions.
Asynchronous functions and synchronous functions. Both are extremely common.

Async/await solves this to some extent, because you can just go back to 1 way
of error handling, which is throwing and catching exceptions.

The third way (working with EventEmitter) is an odd pattern, but it's really
more for specialized use-cases. Wouldn't really call this standard. Imagine a
long-running operation that can occasionally broadcast that a non-fatal error
occurred.
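A rough sketch of that use case, with made-up names (`importRows`, the `rowError` and `done` events): a job that keeps going and reports each bad row as it finds it.

```javascript
const { EventEmitter } = require("events");

// Hypothetical long-running job that reports bad rows without stopping.
function importRows(rows) {
  const job = new EventEmitter();
  // Defer the work so the caller can attach listeners first.
  setImmediate(() => {
    let imported = 0;
    for (const row of rows) {
      if (typeof row.id !== "number") {
        // Custom event name on purpose: emitting "error" with no
        // listener attached makes the emitter throw.
        job.emit("rowError", new Error(`bad row: ${JSON.stringify(row)}`));
        continue;
      }
      imported += 1;
    }
    job.emit("done", imported);
  });
  return job;
}
```

A caller would subscribe with `job.on("rowError", ...)` and `job.on("done", ...)`.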

A global error number is a terrible idea, and return codes are just not
idiomatic.

So really there's just two: one for synchronous and one for asynchronous
operations.
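Side by side, the two styles (and how async/await folds the second back into the first) look roughly like this. The function names are made up for the example; `util.promisify` is the standard Node helper for wrapping error-first callback functions.

```javascript
const { promisify } = require("util");

// Synchronous: errors are thrown and caught.
function parseSync(text) {
  return JSON.parse(text); // throws SyntaxError on bad input
}

// Asynchronous, callback style: errors arrive as the first argument.
function parseLater(text, callback) {
  setImmediate(() => {
    let value;
    try {
      value = parseSync(text);
    } catch (err) {
      return callback(err);
    }
    callback(null, value);
  });
}

// With async/await the asynchronous case reads like the synchronous one again.
const parseAsync = promisify(parseLater);

async function summarize(text) {
  try {
    const value = await parseAsync(text);
    return `parsed ${typeof value}`;
  } catch (err) {
    return `failed: ${err.name}`;
  }
}
```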

You'd be in a very similar situation with C. I don't know C too well, but I
imagine that most asynchronous operations would be done with threads, and for
those operations you also can't just return an error code.

Does Ruby have concurrency or async primitives? I don't know it really well.
If it doesn't, it's also obvious why you wouldn't have this problem. If it
does, how do you handle exceptions in asynchronous operations? To me it seems
that Javascript, Ruby, C, PHP, Java are all pretty similar in these regards
and JS is not at all unique.

Go gets this right. The equivalent of this ES7 function call in javascript:

    await foo();

In go is a straight up regular function call:

    foo();

But not waiting for the result in javascript:

    foo();

Is actually handled with the go keyword:

    go foo();

This, to me, is the major difference in the asynchronous model between Go and
Javascript. In javascript (with ES7) blocking is opt-in, in Go it's opt-out.
Go is by far the saner model for a programming language that relies heavily on
'green threads' / reactor pattern.

~~~
prodigal_erik
The trouble with "go foo()" is that it's fire-and-forget; foo's return value
is literally discarded. When you need to know what happened (which should be
nearly always), foo and every caller all have to opt-in to passing any result
and/or error and/or panic value over a channel or something. It's one of many
places where Go gives you tiny pieces of the right thing and makes you
assemble them yourself.

~~~
deoxxa
Either that or you wrap it up in a function that makes a channel, calls the
function with it, then waits on that channel for the return value. Basically
you can go back/forth between async and sync(ish) in go much more easily than
in JavaScript.

In saying that though, if you have to do it a lot it probably means some of
those functions should have been synchronous in the first place.

------
Illniyar
Why the suggestion to use an error's name rather than instanceof and the
error's class?

~~~
prodigal_erik
Error.prototype.toString() reads e.name, not e.constructor.name, so
you can't rely on everyone to have subclassed Error.

[https://developer.mozilla.org/en-
US/docs/Web/JavaScript/Refe...](https://developer.mozilla.org/en-
US/docs/Web/JavaScript/Reference/Global_Objects/Error/name)

~~~
Illniyar
I'm not following, why can't I use:

    
    
       e instanceof Error

or:

    
    
       e instanceof MyError
    

why does toString() have anything to do with this?

~~~
prodigal_erik
It's very likely someone did

    
    
      const e = new Error('bad stuff happened')
      e.name = 'MyError'
    

without actually creating a MyError class to check with instanceof.
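A small sketch of the difference, with a hypothetical `MyError` class and `isMyError` helper: the renamed plain `Error` prints exactly like the subclass but fails the `instanceof` check, while a name check accepts both.

```javascript
// A properly subclassed error.
class MyError extends Error {
  constructor(message) {
    super(message);
    this.name = "MyError";
  }
}

const subclassed = new MyError("bad stuff happened");

// The pattern described above: a plain Error with only the name changed.
const renamed = new Error("bad stuff happened");
renamed.name = "MyError";

// Hypothetical check that works for both variants.
const isMyError = (e) => e instanceof Error && e.name === "MyError";

console.log(subclassed instanceof MyError); // true
console.log(renamed instanceof MyError);    // false
console.log(isMyError(subclassed), isMyError(renamed)); // true true
```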

~~~
Illniyar
Unless it's common in popular libraries/packages, I don't see why I need to
take it into account.

Which popular libraries do this?

If it's just in a few places, it should be handled specifically, and use sane
choices in other places.

------
cutler
It beats me why Node.js is anywhere near as popular as Elixir if real
concurrency and error handling are a priority. Is programming just a fashion
industry? What's popular certainly doesn't seem to have any connection with
engineering principles.

~~~
romanovcode
I blame non-technical managers who push "microservices" and "node js" because
they went to some conference and heard that it's "the best".

------
jeffmcmahan
If you're going to check every argument's type and throw on failure, either
use a statically typed language or adopt a concise way of type checking. Many
of the examples have big groups of assert() calls at the top. Gross.

------
morghus
Can anyone explain why this pattern doesn't work? Or point me to some
resource?

    
    
      function myApiFunc(callback)
      {
        /*
         * This pattern does NOT work!
         */
        try {
          doSomeAsynchronousOperation(function (err) {
            if (err)
              throw (err);
            /* continue as normal */
          });
        } catch (ex) {
          callback(ex);
        }
      }

~~~
mikevin
There's a footnote about this: [https://www.joyent.com/node-
js/production/design/errors#fn:1](https://www.joyent.com/node-
js/production/design/errors#fn:1)
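In short: by the time the inner callback runs, the `try` block has already exited, so a `throw` inside the callback has no enclosing `catch` on the stack and becomes an uncaught exception. A rough sketch of a variant that does work, passing the error along instead of throwing; `doSomeAsynchronousOperation` here is a made-up stand-in that always fails, and `trace` only exists to show the ordering.

```javascript
const trace = [];

// Hypothetical stand-in for an asynchronous operation that fails.
function doSomeAsynchronousOperation(callback) {
  setImmediate(() => {
    trace.push("callback runs");
    callback(new Error("operation failed"));
  });
}

function myApiFunc(callback) {
  try {
    doSomeAsynchronousOperation((err) => {
      if (err) return callback(err); // hand the error over; do not throw
      callback(null);
    });
    trace.push("try block exited"); // recorded before "callback runs"
  } catch (ex) {
    // Only reached if doSomeAsynchronousOperation throws synchronously.
    callback(ex);
  }
}
```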

~~~
morghus
much obliged

------
novaleaf
Forcing the party line of callback hell as a high quality "Production
Practice" is an incredible disservice by not introducing the user to the
concept of Promises. It already assumes a basic knowledge of exception
handling, so at least they should hint at what is a saner choice.

------
djanowski
I'm surprised such an in-depth article doesn't even mention promises. Upcoming
async/await (already available via transpilation) will make error handling in
Node sane again.

------
visionscaper
This article promotes the fail-fast approach, something I very much dislike
(against popular opinion it seems).

I'm very much in favor of the opposite approach, defensive coding. Often when
I read opinion pieces about how bad defensive coding is, they almost always
seem to forget that defensive coding without proper logging, error-handling
and monitoring is _NOT_ defensive coding. It is extremely dangerous to just
detect error conditions without any feedback: you have no idea what is going
on in your system!

IMHO properly applied defensive coding, works as follows:

* Detect inconsistent situations (e.g. in a method, expected an object as input argument, but got a null)

* Log this as an error and provide feedback to the caller of the method that the operation failed (e.g. through an error callback).

* The caller can then do anything to recover, (e.g. reset a state, or move to some sort of error state, close a file or connection, etc.).

* The caller should then also provide feedback to its caller, etc. etc.
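The steps above can be sketched roughly as follows. All names (`saveProfile`, `handleRequest`, `store`, `log`, `respond`) are made up for the example, and `log` stands in for a real logger.

```javascript
// Hypothetical example of the four steps; callbacks are error-first.
function saveProfile(profile, store, log, callback) {
  // 1. Detect the inconsistent situation.
  if (profile === null || typeof profile !== "object") {
    const err = new TypeError("saveProfile: expected a profile object");
    log(err.message);     // 2. Log it...
    return callback(err); //    ...and tell the caller the operation failed.
  }
  store.set(profile.id, profile);
  callback(null);
}

function handleRequest(profile, store, log, respond) {
  saveProfile(profile, store, log, (err) => {
    if (err) {
      // 3. Recover (nothing was written, so there is no state to reset)
      // 4. and pass feedback on to our own caller.
      return respond({ status: 400, body: "Profile could not be saved" });
    }
    respond({ status: 200, body: "Saved" });
  });
}
```

The process keeps serving other requests, and the log entry says what was wrong rather than leaving only a stack trace.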

This programming methodology gives the following advantages:

* You are made to think about the different problems that can occur and how you should recover them (or not)

* Highly semantic feedback about what is going wrong when an issue occurs; this makes it very easy to pinpoint issues and fix them

* Server application keeps on running to handle other requests, or can be gracefully shut down.

* Client side application UIs don’t break, user is kept in the loop about what is happening

Of course you will need to keep a safety net to catch uncaught exceptions,
properly logging and monitoring them (and restart your application if
relevant)

The fail-fast approach, as I have seen it applied, doesn’t do any checking or
mitigation, with the effect that:

* you are thrown out of your normal execution path, losing a lot of context to
do any mitigation (close a file, close a connection, tell a caller something
went wrong)

* you only get a stack trace from which it can be hard to figure out what
went wrong

* there can be a big impact on user experience: UIs can stop working, servers
can stop responding (for _all_ users).

I have very good experiences with using the defensive coding paradigm, but it
takes more work to do it right; for many, especially in the communities that
use dynamic typing, such as the JS community, this seems to be too big a
hurdle to take. This is unfortunate because IMO it could greatly improve
software quality.

Any feedback is welcome!

(Edit: formatting to improve readability) (Edit: clarified defensive coding as
an opposite approach to fail-fast)

~~~
Silhouette
FWIW, I wouldn't suggest the term "defensive coding" as the opposite to "fail
fast". It's very similar to the established term "defensive programming",
which IMHO is more about designing systems to make fewer assumptions. How you
then handle a situation where you do detect that some expectation has not been
met, including the fail-fast strategy, seems like a related but separate
issue.

Terminology aside, though, I agree with much of what you say. The idea that
it's generally acceptable for buggy code to just crash out seems to be making
an unwelcome return recently, often among the same kinds of developers who
don't like big design up front or formal software architecture because they
want everything to be done incrementally and organically, and in the case of
web apps specifically, often among developers who also consider code that runs
for a year or two to be long-lived anyway.

~~~
visionscaper
With defensive coding I indeed meant defensive programming. You always want to
fail fast (faster also means that you can fix more bugs), but this is often
interpreted as "fail _hard_": no prevention or mitigation whatsoever. In
this sense I meant it is the opposite of defensive programming.

What I notice is that developers who also have a background in statically
typed (system) languages, are much more disciplined when it comes to defensive
programming and logging/error handling. (I'm afraid this also correlates with
age).

BTW, I like your description, "designing systems to make fewer assumptions",
for defensive programming!

------
basicplus2
On error goto...

------
prodigal_erik
This isn't stuff _every programmer_ should know, it only concerns people who
are trying to write complex non-blocking Javascript without async/await (which
are already implemented in Babel and proposed for ES7). It also focuses on
Node-only idioms which IMHO should be deprecated in favor of ES6 Promises
(which Node's LTS release supports natively!)

~~~
Illniyar
The title of the post is just "error handling in node.js".

Probably should change the submission title.

------
shark0
This is a joke

------
wyager
I'm not really sure how to phrase this constructively, but this is horrible.
Not the article, just the fact that humans expose themselves to this sort of
stuff. Why would you choose to use a language that makes something as mundane
as error handling this ridiculous and unpleasant?

~~~
Illniyar
There is nothing mundane about error handling, in fact it's one of the hardest
things to get right in a programming language (see Rust's error handling saga
for instance).

There is no language I know of where error handling is both simple and not
overbearing.

~~~
alkonaut
The most annoying thing about all this is that the central argument of this
article "Separate recoverable errors from bugs" never made it to a widely used
imperative language. C# had the opportunity but blew it.

Java mixed the two kinds of exceptions up completely and checked exceptions
just added insult to that injury.

The best implementation I have seen for an imperative language is in Midori
(The language used in Microsoft's research OS with the same name).

[http://joeduffyblog.com/2016/02/07/the-error-model/#bugs-
are...](http://joeduffyblog.com/2016/02/07/the-error-model/#bugs-arent-
recoverable-errors)

It's basically "C# done right". The blog post is well worth reading.

~~~
Silhouette
That blog post is indeed very interesting reading.

