
Exceptions in Elixir - ck3g
http://whatdidilearn.info/2017/11/19/exceptions-in-elixir.html
======
nerdponx
How does "fail fast" look in practice? Is there a "master" thread catching and
handling exceptions thrown by other threads?

~~~
Ankhers
Not quite. Erlang/OTP, running on the BEAM VM, uses supervision trees to keep
track of the various processes and their crashes. Each supervisor and worker
in the tree is its own process. So, if you have 5 supervisors, and each of
those starts 2 worker processes, you will have 15 processes in your
supervision tree (the VM will spawn a bunch for itself to use, so do not think
you only have 15 processes running in your system).
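
To make the arithmetic concrete, here is a rough sketch that starts 5
supervisors with 2 workers each and counts them. The workers are plain `Agent`
processes and the child ids `:a` and `:b` are made up for illustration; a real
application would normally hang these supervisors under one top-level
supervisor.

```elixir
# Hypothetical worker spec: an Agent holding a dummy value.
worker = fn id ->
  %{id: id, start: {Agent, :start_link, [fn -> :ok end]}}
end

# Start 5 independent supervisors, each looking after 2 workers.
supervisors =
  for _ <- 1..5 do
    {:ok, sup} =
      Supervisor.start_link([worker.(:a), worker.(:b)], strategy: :one_for_one)

    sup
  end

# Ask each supervisor how many worker children it has, then add them up.
worker_count =
  supervisors
  |> Enum.map(fn sup -> Supervisor.count_children(sup).workers end)
  |> Enum.sum()

# 5 supervisors + 10 workers
total = length(supervisors) + worker_count
IO.puts("processes in our trees: #{total}")

# The VM runs plenty of processes of its own on top of that.
IO.puts("processes in the whole VM: #{length(Process.list())}")
```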

Basically, at the top level of your application, you will have a supervisor
that will look after all of the processes that are important to your
application. Each of these processes could have any kind of functionality
(e.g., database connection, HTTP server, etc.), or be another supervisor. When
you start these processes, they too may start a supervision tree of processes
that are important to them (e.g., the database connection may actually start
a pool of processes).

In "fail fast" or "let it crash", only the process that actually raised the
exception will die. The supervisor that is looking after that process will be
notified that it died and, depending on how you have the supervisor
configured, it may or may not start a new process to replace the one that
died.
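
A minimal sketch of what that looks like, assuming a made-up `Demo.Worker`
GenServer that raises when it receives a `:boom` call. With the default
`:permanent` restart setting, the supervisor should replace the dead worker
with a fresh process under the same name.

```elixir
# Hypothetical worker, only here to have something that can crash.
defmodule Demo.Worker do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, 0}

  # Raising inside the worker kills only this process.
  @impl true
  def handle_call(:boom, _from, _state), do: raise("boom")
end

{:ok, _sup} = Supervisor.start_link([Demo.Worker], strategy: :one_for_one)
old_pid = Process.whereis(Demo.Worker)

# The caller of a crashing GenServer.call exits too, so catch that here.
try do
  GenServer.call(Demo.Worker, :boom)
catch
  :exit, _reason -> :ok
end

# Give the supervisor a moment to start the replacement.
Process.sleep(100)
new_pid = Process.whereis(Demo.Worker)

IO.inspect({old_pid, new_pid}, label: "before/after crash")
```

The crash report that gets logged is expected; the point is that the name
`Demo.Worker` now refers to a different pid than before.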

Another thing to note: depending on how the supervisor is configured, it may
itself crash if the processes it is monitoring crash too many times within a
set time window. That crash should then bubble up to its own supervisor.
Unfortunately, it is possible to take down your entire application this way.
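
Roughly, that limit is set with the `:max_restarts` and `:max_seconds` options
of `Supervisor.start_link/2`. The sketch below uses a made-up `Flaky` agent and
kills it three times against a limit of 2 restarts in 5 seconds; the numbers
and sleeps are arbitrary choices for the demo.

```elixir
# Hypothetical child process that we will kill by hand.
defmodule Flaky do
  use Agent

  def start_link(_arg), do: Agent.start_link(fn -> :ok end, name: __MODULE__)
end

# Trap exits so the supervisor going down does not take this script with it.
Process.flag(:trap_exit, true)

{:ok, sup} =
  Supervisor.start_link([Flaky],
    strategy: :one_for_one,
    max_restarts: 2,
    max_seconds: 5
  )

ref = Process.monitor(sup)

# Kill the child 3 times; the third crash exceeds the restart limit.
for _ <- 1..3 do
  pid = Process.whereis(Flaky)
  if pid, do: Process.exit(pid, :kill)
  # Leave time for the supervisor to restart the child.
  Process.sleep(100)
end

# Wait for the supervisor itself to go down and capture why.
sup_exit_reason =
  receive do
    {:DOWN, ^ref, :process, _pid, reason} -> reason
  after
    1_000 -> :still_alive
  end

IO.inspect(sup_exit_reason, label: "supervisor exit reason")
```

If this supervisor were a child of another supervisor, that parent would see
the exit and apply its own restart rules, which is the bubbling up described
above.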

TLDR: There is no single master process that does all of this. Each
supervisor is sort of a master process for its own child supervisors and
workers, and the processes a supervisor watches may or may not be restarted
upon failure.

~~~
amigoingtodie
So, is there a particular strategy to organizing code in order to 'hot-swap'
it (failing code) out, while keeping a production system up and running?

~~~
amorphid
Pretty much! I haven't played with that feature myself, but Erlang's telecom
origins help explain this feature. If you're upgrading a telephone switch with
N live calls, it'd be optimal to not have to kill those calls just to upgrade
some software. There's more nuance to it than that, but "little-to-no
downtime", or hot swappable code, is a language feature. Pretty neat idea in
an era of "throw away the whole VM/container" to push a config update.

