

Ask HN: How do large engineering teams manage application exceptions? - mickeyben

There are great services like Rollbar, Bugsnag, etc. that help find and sort our application exceptions. When our engineering team was small, exceptions were very easy to manage. Determining the priority of an exception was simple: Is it affecting customers? Which flow is it affecting? How many occurrences are there? Once you have answers to those three questions, you can quickly triage the exception.

Over time, the number of exceptions in our infrastructure grew, the number of 3rd-party dependencies also grew, and traffic to our website increased 10X. As a result, the sheer quantity of exceptions has made it difficult to separate the signal from the noise.

My question is: how do other engineering teams manage their exceptions? I don't want to ignore exceptions that could be real issues we should be fixing, and I also don't want to litter our code with handlers whose only purpose is to suppress harmless, non-user-generated exceptions, since that would add complexity to our codebase.

We tried a new approach recently, based on a spreadsheet and a few lines of Google Apps Script. Basically, every week we elect a “Bug Master” whose responsibility is to sort the bugs. This worked OK for a while, but engineers aren't all equally involved and aren't always available: they might be rushing to release something, busy with meetings, etc. In the meantime, exceptions stack up, potentially including important ones.

Any insights would be greatly appreciated, as we're trying to find a more sustainable and scalable way for our team to handle exceptions.
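
The three triage questions above can be turned into a sort order. This is a minimal sketch, assuming exceptions have already been grouped by fingerprint (as trackers like Rollbar or Bugsnag do); `ErrorGroup`, `FLOW_WEIGHTS`, and the flow names are hypothetical. Customer impact is ranked first, then how critical the flow is, then occurrence count.

```python
from dataclasses import dataclass

# Hypothetical weights: how critical each user flow is to the business.
# Flows not listed (e.g. internal batch jobs) get weight 0.
FLOW_WEIGHTS = {"checkout": 3, "signup": 2, "search": 1}

@dataclass
class ErrorGroup:
    fingerprint: str       # e.g. exception class + top stack frame
    customer_facing: bool  # Is it affecting customers?
    flow: str              # Which flow is it affecting?
    occurrences: int       # How many occurrences are there?

def priority_key(group: ErrorGroup) -> tuple[bool, int, int]:
    # Tuples compare element by element: customer impact first,
    # then flow criticality, then volume.
    return (group.customer_facing, FLOW_WEIGHTS.get(group.flow, 0), group.occurrences)

def triage(groups: list[ErrorGroup]) -> list[ErrorGroup]:
    """Return the groups sorted from most to least urgent."""
    return sorted(groups, key=priority_key, reverse=True)

groups = [
    ErrorGroup("TimeoutError@payments.py:88", True, "checkout", 40),
    ErrorGroup("KeyError@search.py:12", True, "search", 900),
    ErrorGroup("ConnectionReset@cron.py:5", False, "batch", 5000),
]
for g in triage(groups):
    print(g.fingerprint)
```

With this ordering, a low-volume checkout failure outranks a noisy internal job, which is the kind of judgment the rotating "Bug Master" would otherwise make by hand each week.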
======
codeonfire
Usually the problem is organizational rather than a technical or time problem
with handling or classifying exceptions or errors. First, people have to be
convinced not to log something unless it is going to lead to some action being
taken. Second, if management has given up on quality and is not devoting
enough resources to resolving issues, then you can try to influence them or
influence your team to improve things. In some cases an exception should in
fact start a business process. Defining those business processes would be the
first step to getting buy-in to make code improvements.
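
One way to enforce "only log what leads to action" is an explicit routing
table: an exception is logged only if someone has agreed on what to do about
it, and otherwise it is merely counted. This is a sketch under assumptions;
`PaymentDeclined`, `open_refund_case`, and `ROUTES` are hypothetical names.

```python
import logging
from collections import Counter
from typing import Callable, Dict, Optional, Type

logger = logging.getLogger("errors")

# Exceptions nobody has agreed to act on are counted, not logged, so they
# stay visible in aggregate without flooding the error feed.
unrouted: Counter = Counter()

class PaymentDeclined(Exception):
    """Hypothetical domain exception that should trigger a business process."""

def open_refund_case(exc: Exception) -> str:
    # Hypothetical hook into a business process (e.g. a support ticket).
    return f"refund case opened: {exc}"

# Only exception types with an agreed owner and action appear here.
ROUTES: Dict[Type[Exception], Callable[[Exception], str]] = {
    PaymentDeclined: open_refund_case,
}

def route(exc: Exception) -> Optional[str]:
    """Run the agreed action for `exc`, or just count it if there is none."""
    for exc_type, action in ROUTES.items():
        if isinstance(exc, exc_type):
            logger.error("actionable: %r", exc)
            return action(exc)
    unrouted[type(exc).__name__] += 1
    return None
```

Adding an entry to `ROUTES` then becomes the concrete output of "defining the
business process" for that exception.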

------
brudgers
What is the philosophy for throwing an exception? That is to say, an exception
is just another control structure, like "if". When you use an API whose
documentation says "throws foo.exception", that exception is just another part
of the API, and the tests for application code that calls the API should cover
it. Then it's a matter of deciding whether the application code handles it
locally, passes it up the call stack to a centralized handler, or ignores it.

In other words, code should be tested for the exceptions it lets through just
as it is for the ones it handles.
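
A small sketch of what that looks like in practice, assuming a hypothetical
API client whose documentation says it may raise `RateLimited`. The code
under test handles that exception locally; the tests pin down both the
handled case and the case where an unexpected exception propagates.

```python
class RateLimited(Exception):
    """Hypothetical exception that an API's documentation says it may throw."""

def fetch_price(client, sku):
    # Decision: handle the documented exception locally and return None,
    # so the caller can show "price unavailable" instead of failing.
    try:
        return client.get_price(sku)
    except RateLimited:
        return None

class FakeClient:
    """Test double for the API client; `fail=True` simulates rate limiting."""
    def __init__(self, fail=False):
        self.fail = fail

    def get_price(self, sku):
        if self.fail:
            raise RateLimited(sku)
        return 9.99

def test_happy_path():
    assert fetch_price(FakeClient(), "A1") == 9.99

def test_documented_exception_is_handled():
    assert fetch_price(FakeClient(fail=True), "A1") is None

def test_unexpected_exception_propagates():
    # Exceptions this code does NOT handle are part of its contract too.
    class BrokenClient(FakeClient):
        def get_price(self, sku):
            raise KeyError(sku)
    try:
        fetch_price(BrokenClient(), "A1")
    except KeyError:
        return
    raise AssertionError("KeyError should propagate to the caller")
```

The tests are plain `test_` functions, so a runner such as pytest would
discover them without extra setup.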

Good luck.

------
room271
For prioritisation: monitor metrics that relate to end-user experience (e.g.
latency, HTTP errors, or whatever makes sense for your service). There are
lots of tools for this, in particular for aggregating metrics at the service
level. If you see those metrics change significantly, you can then investigate
lower-level problems that might explain what you've observed (e.g. exceptions,
or other software, hardware, or network problems).
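
"Change significantly" can be made concrete with a simple threshold against a
recent baseline. This is only a sketch, assuming you already collect a
per-minute series such as the HTTP 5xx rate or p99 latency; the 3-sigma
threshold `k` is an arbitrary starting point, not a recommendation from any
particular tool.

```python
from statistics import mean, stdev

def significant_change(baseline: list[float], current: float, k: float = 3.0) -> bool:
    """Flag `current` if it is more than k standard deviations above the baseline mean.

    baseline: recent "normal" samples of a user-facing metric (needs at least
    two points, otherwise statistics.stdev raises StatisticsError).
    Only upward moves are flagged, since higher error rates and latency are bad.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return current > mu + k * sigma

# Example: per-minute 5xx error rates from a quiet period.
baseline = [0.010, 0.012, 0.011, 0.009, 0.010]
print(significant_change(baseline, 0.05))   # a fivefold jump is flagged
print(significant_change(baseline, 0.012))  # normal jitter is not
```

Only when this kind of check fires would you dig into the exception tracker
for a cause, rather than reading every exception as it arrives.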

But also, and I agree with other people here, your app really shouldn't be
throwing uncaught exceptions, full stop (unless it is not user-facing or not
that important), so it's worth fixing that too. (Unless you are writing Erlang
or something where the failure model is different.)
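
A common pattern for keeping exceptions from escaping a user-facing request
while still surfacing them as bugs is an error boundary around each handler.
This is a hypothetical sketch (`error_boundary` and `get_profile` are made-up
names, and responses are simplified to `(status, body)` tuples); anything
that reaches the boundary is logged with its traceback, where an exception
tracker would typically be hooked in, and the user gets a generic 500.

```python
import functools
import logging

log = logging.getLogger("boundary")

def error_boundary(handler):
    """Wrap a request handler so no exception escapes it uncaught.

    Anything caught here is treated as a bug: logged with its traceback,
    and answered with a generic 500 instead of a crashed request.
    """
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception:
            log.exception("unhandled error in %s", handler.__name__)
            return 500, "Internal Server Error"
    return wrapper

@error_boundary
def get_profile(user_id):
    profiles = {1: "alice"}
    return 200, profiles[user_id]  # KeyError for unknown users is a bug
```

The boundary is a safety net, not a fix: each entry it logs is still
something to handle properly closer to where it is raised.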

------
weddpros
We don't know much about your environment...

Yet if you feel the need to triage exceptions, you probably have too many of
them! You should strive to have zero.

If you treat every exception as a disaster, your team will have a better sense
of urgency. Have your system send an email to everyone on board.

That's how we do it. Exceptions are only acceptable if they are truly
exceptional (machine/network crash) and can't be recovered from, in which case
we let the process crash and restart it.
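
The "let it crash, then restart" approach amounts to a supervisor loop. In
production this is usually delegated to a process manager (for example
systemd's `Restart=on-failure`); the sketch below only illustrates the idea.
`supervise` and `notify_everyone` are hypothetical names, and the alert just
prints to stderr as a stand-in for emailing the team.

```python
import subprocess
import sys

def notify_everyone(message: str) -> None:
    # Hypothetical stand-in for "send an email to everyone on board".
    print(f"ALERT: {message}", file=sys.stderr)

def supervise(cmd: list[str], max_restarts: int = 3) -> int:
    """Run `cmd`, restarting it whenever it crashes (non-zero exit code).

    Returns the number of crashes observed. Gives up after `max_restarts`
    restarts so a permanently broken process does not loop forever.
    """
    crashes = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return crashes  # clean exit: nothing to restart
        crashes += 1
        notify_everyone(f"{cmd[0]} exited with {result.returncode} (crash #{crashes})")
        if crashes > max_restarts:
            return crashes

# Example: a worker that always crashes is run once and restarted twice.
supervise([sys.executable, "-c", "import sys; sys.exit(1)"], max_restarts=2)
```

Every crash both restarts the worker and alerts people, which matches the
idea of treating each exception as an event someone must look at.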

Non-harmful exceptions are a concept I don't get, though...

------
brown-dragon
Are you talking about Java exceptions?

