
Erlang and code style - kungfooguru
https://medium.com/p/b5936dceb5e4
======
rdtsc
Great work, Louis! These essays should become a little book some day.

> This is called intentional programming.

I am stealing that phrase. Often when trying to explain the coding style in
Erlang, I end up describing what you don't do ("don't handle errors in band"),
but this turns it into a positive prescription -- do what the intent of the
code is.

And note that these things are very easy and straightforward in Erlang.
Erlang is one of the few language runtimes built with this in mind. Fault
tolerance was at the top of the todo list. That is what makes it stand out
from the crowd.

But if you want, you can still copy this pattern in your own system. For
example, in Python, use a green thread and a queue to emulate an actor. If an
exception is thrown (and you don't use linked custom C modules, which will
screw you over in Erlang as well!), signal a supervisor thread and let it
restart the original thread.
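
A minimal sketch of that actor-plus-supervisor idea, written in Go rather than Python so it matches the other snippets in this thread. A goroutine plays the green thread and a channel plays the queue. The names `message`, `counter`, `supervise` and `demo` are made up for illustration, and the "restart" here just starts a fresh goroutine with fresh state.

```go
package main

import "fmt"

// message is what the hypothetical actor receives; reply may be nil
// when the sender does not want an answer.
type message struct {
	n     int
	reply chan int
}

// counter is the actor body. It keeps a running sum and simply panics
// on input it was not written for, instead of handling it in band.
func counter(mailbox <-chan message) {
	sum := 0
	for m := range mailbox {
		if m.n < 0 {
			panic(fmt.Sprintf("negative input: %d", m.n))
		}
		sum += m.n
		if m.reply != nil {
			m.reply <- sum
		}
	}
}

// supervise runs the actor and restarts it with fresh state each time
// it panics. It returns the restart count once the mailbox is closed.
func supervise(mailbox <-chan message) int {
	restarts := 0
	for {
		crashed := make(chan bool)
		go func() {
			defer func() { crashed <- recover() != nil }()
			counter(mailbox)
		}()
		if !<-crashed {
			return restarts
		}
		restarts++
	}
}

// demo sends a good message, a bad one, then another good one.
func demo() (first, second, restarts int) {
	mailbox := make(chan message)
	done := make(chan int)
	go func() { done <- supervise(mailbox) }()

	reply := make(chan int)
	mailbox <- message{n: 5, reply: reply}
	first = <-reply
	mailbox <- message{n: -1} // crashes the actor; the supervisor restarts it
	mailbox <- message{n: 2, reply: reply}
	second = <-reply // the sum started again from zero
	close(mailbox)
	return first, second, <-done
}

func main() {
	fmt.Println(demo()) // prints "5 2 1"
}
```

Note that the actor code itself has no error handling at all. The only recovery logic lives in the supervisor.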

You can apply this to a large system. Use OS processes. This is the good ol'
Unix way. Build watchdogs that watch your processes and restart them on
failure. But now you'll also be building the messaging system, so there is
some work you must do. But you can if you want to.

You know how they say "learn functional programming because it will help you
program better". Well the same can be said about Actor and Fault Tolerant
programming. Learn it because it will help you program better. Even if you
don't end up using Erlang.

~~~
jlouis
I stole the "intentional" part from Joe Armstrong in turn :)

~~~
loxs
I have been writing Erlang professionally for 3 years now. Your writings have
helped me immensely. Thanks!

------
thinkpad20
Excellent writing; it really makes me want to pick up some Erlang at some
point. It's also giving me some interesting ideas in regards to the language
I'm currently working on.

I'll also echo a sentiment brought up by another poster which is that the Go
style of error handling seems really unattractive. I'm not sure I'm totally
on-board with the Erlang style of "just let it die" (although it seems to work
great for Erlang), but being forced to deal with exceptions _right away_ in
all cases seems like it would clutter your code and cause a lot of headache.
I'm also curious as to what happens if you neglect to handle the error? If you
wrote something like

    
    
        file, err := os.Open("file.go")
        return file.Read()
    

What would happen if an error occurred? Would `file` be nil? If so, wouldn't
that easily propagate up, even to functions which, on their surface, shouldn't
be expected to fail?

~~~
a_bonobo
First of all, that wouldn't compile in Go: it would complain that you don't
use the err variable. You could change it to:

    
    
        file, _ := os.Open("file.go")
    
    

The underscore tells the compiler to ignore the error. And yes, file is <nil>
now!
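
To make that concrete, here is a small sketch of what happens next. The file name is hypothetical and assumed not to exist, and `openIgnoringError` is a made-up helper. As far as I know, calling `Read` on a nil `*os.File` returns an error rather than panicking, so the failure shows up one step later than where it really happened.

```go
package main

import (
	"fmt"
	"os"
)

// openIgnoringError discards the error from os.Open, as in the snippet above.
func openIgnoringError(path string) *os.File {
	file, _ := os.Open(path)
	return file
}

func main() {
	// Hypothetical path, assumed not to exist.
	file := openIgnoringError("definitely-missing-file.txt")
	fmt.Println("file is nil:", file == nil)

	// The nil file only complains once we try to use it.
	buf := make([]byte, 16)
	_, err := file.Read(buf)
	fmt.Println("read error:", err)
}
```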

As a sidenote, it would be more idiomatic to do this:

    
    
        file, err := os.Open("file.go")
        if err != nil {
          //handle error
        }
        // Do stuff with file
    

This has bitten me before - if you use a non-existent key on a map of strings
(or dictionary, or hash, however you call it) you get "" and a false flag
back, not an error value.

    
    
        my_hash := make(map[string]string)
        value, ok := my_hash["no_key"]
    

value is "", ok is 'false'.

What you get back exactly from a bad request depends on the type of call, but
you always get a second value telling you whether it worked! Maps storing ints
return 0, maps storing strings return "", etc.
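
A short sketch of why the second return value matters for maps: a key that is present with an empty value and a key that is missing both yield "", and only `ok` tells them apart. The `lookup` helper is made up for illustration; it turns the boolean into a real error.

```go
package main

import "fmt"

// lookup returns the value stored under key, or an error when the key
// is absent. It relies on the two-value form of a map index expression.
func lookup(m map[string]string, key string) (string, error) {
	value, ok := m[key]
	if !ok {
		return "", fmt.Errorf("key %q not found", key)
	}
	return value, nil
}

func main() {
	names := map[string]string{"present": ""}
	counts := map[string]int{}

	v1, ok1 := names["present"] // empty string, but the key exists
	v2, ok2 := names["no_key"]  // empty string, key does not exist
	n, ok3 := counts["no_key"]  // zero value for int
	fmt.Printf("%q %v | %q %v | %d %v\n", v1, ok1, v2, ok2, n, ok3)

	_, err := lookup(names, "no_key")
	fmt.Println(err)
}
```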

Go always forces you to look at each error, since errors are always different.

~~~
jamwt

        file, err := os.Open("file1")
        // complains if you don't check err
        file2, err := os.Open("file2")
        // no problem, no need to check err.

------
rubiquity
Woah, this article is full of awesome bits.

Coding towards a single flow path and crashing whenever you go off that path
has been the hardest and most enjoyable adjustment I've made as I've been
learning the Erlang VM (through Elixir). It's definitely weird at first, but
very refreshing when you get to leave out all those if/elses and exception
handling. I'm still quite bad at Erlang/Elixir, but I'm enjoying it a lot
more.

~~~
dozzie
I find myself writing libraries (like xmerlrpc for XML-RPC). The rule of "just
let it die" doesn't apply too much in such code.

You typically don't want to _decide_ to die in library code. You want to
postpone the decision until the actual application/service is written.

It's easy to convert an error reported by a returned value into an error
reported by an exception: just cause a badmatch by matching on the expected
result. But converting an exception to a value is more troublesome. You need
to use try..catch.

To properly catch errors in a library, one could either use a global
try..catch hidden in the library code or be much more precise about errors and
carefully intercept all the errors that could happen. The latter approach
gives an opportunity to give very specific error messages and allows
hypothetical bugs in code used by the library to bubble up instead of being
disguised as "invalid argument".

As the article says, there are cases when a rule of thumb needs to be broken.
Writing a library is such a case.

~~~
jacquesm
The idiomatic Erlang way to deal with such problems is to generate an
exception so that the caller can deal with the problem at the next level up
that is set up to handle trouble. This could be the level immediately above
the library, but conceivably it is much higher up, or the (Erlang) process may
in fact terminate due to an uncaught exception.

The decision to bail or try to recover should be made at the lowest level
capable of making that decision and libraries are simply not empowered to make
that decision. You also don't want to force people to catch exceptions around
every call to every function in the library.

See section 3.5:

[http://www.erlang.org/download/armstrong_thesis_2003.pdf](http://www.erlang.org/download/armstrong_thesis_2003.pdf)

~~~
lostcolony
Is it? A lot of the library code I've seen in Erlang (especially in the
standard library) is written to avoid exceptions, and instead returns either
{ok, Val} or {error, Reason}.

Edit: To clarify tone, I actually would like to know, because I've run into
this exact dilemma before. Per the original poster, it seems bad form to throw
an exception that has to be handled somewhere in the calling process if I want
to be able to send that data to another process. That is, process A makes a
library call, cares nothing about the return, just wants to pass it to process
B. If it's an error tuple, it can just do that. If it's an exception, it has
to explicitly handle it, wrap it in an error tuple, and pass ~that~ along
instead, which seems inelegant from library code. It also prevents the library
from being able to declare what it returns in the event of a problem via a
Dialyzer spec; with an error tuple you can fully enumerate what sorts of
errors you can return.

~~~
rdtsc
Sometimes Erlang standard library code is not the prototypical Erlang
development code. For example, file:open/2 is a library function returning an
error tuple, since failure is one of the obvious results file:open/2 will
return.

Now in your code it depends on what you expect or what your guarantees are. If
this is a library that expects to open a config file that is always there,
well then a match is better, and if it doesn't work an exception is thrown, a
report is written and maybe your process gets restarted.

But now imagine you are writing a configuration file parser. Now your code
acts like a library, so the input file not being there is a common case, and
maybe the code above needs to decide whether it should blow up on it or not.

Anyway that is my amateurish understanding of it. Maybe someone who knows more
can chime in and correct me.

------
plainOldText
On a tangent, one thing I dislike about Go is the error handling. I find
it really annoying to have to check for errors at every single step. Erlang
has probably the most elegant error handling mechanism. Just let it fail, and
then restart the whole process/subsystem.

~~~
thezilch
Certainly you can do this with Go by letting it panic and having your
preferred supervisor (eg. supervisord) supervise the process?

~~~
jacquesm
Yes, but that kills all the other goroutines in that process as well. In
Erlang that would just affect the _one_ Erlang process.

~~~
lgas
You could build an erlang/OTP style supervision tree using recover at the top
level to restart panic'd goroutines.

~~~
jacquesm
You're going to have a hard time maintaining consistent state, but if you
manage your bookkeeping _very_ well that might work in a long-lived process.
If not, you might end up leaking resources. Goroutines and Erlang processes
do not map 1:1 to each other, and I (maybe naively) assume that resources such
as file descriptors and other subtle state modifications could easily survive
a naive implementation of such a scheme, causing eventual resource depletion.

Erlang is bulletproof in this respect.

~~~
pushrax
Defers are still called when the stack is unwound due to a panic. It's
standard practise to clean up resources with a defer, so maybe it wouldn't be
too unreasonable if you're mainly dealing with the standard libraries.
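
A tiny sketch of that behaviour, with a made-up `resource` type standing in for a file descriptor. The deferred `Close` runs while the panic unwinds, before the recover further up the stack sees it.

```go
package main

import "fmt"

// resource stands in for a file descriptor or a similar handle.
type resource struct{ open bool }

// Close releases the resource.
func (r *resource) Close() { r.open = false }

// work acquires the resource, defers its release, and then panics.
func work(r *resource) {
	r.open = true
	defer r.Close()
	panic("something went wrong")
}

// runRecovered runs work and reports whether it panicked.
func runRecovered(r *resource) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	work(r)
	return false
}

func main() {
	r := &resource{}
	fmt.Println(runRecovered(r), r.open) // prints "true false"
}
```

This only covers resources whose cleanup was actually deferred; anything released by hand after the panicking call is skipped.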

~~~
jacquesm
Would you bet your phone switchboard, nuclear plant or assembly line on that
strategy?

Go is very much a work in development. It is a vast improvement over C, but I
highly doubt that this strategy will get you out of every corner case. It
might get you from 'crash now' to 'crash a (little) while later', but I think
you will still end up having an unpredictable element in there. It all depends
on how ugly the crash is, and once things become unpredictable, leakage (even
between goroutines) is not to be ruled out categorically.

Think of Erlang processes as being about as well isolated as Unix processes,
and think of goroutines as being a bit more isolated than Unix threads but not
much more. The trick here is that Erlang is essentially an OS inside a process
and that goroutines are co-operative multi-tasking inside a process aided by
one or more CPU threads. That's a lot closer to the C multi-threading model
than Erlang, and that implies there are some risks.

~~~
troutwine
It's worth noting that Erlang is only a soft-realtime language. I wouldn't bet
my nuclear plant and only some assembly lines on it. (There's plenty I do bet
on Erlang, though.)

~~~
phamilton
I wonder whether Erlang in some form could provide hard realtime guarantees.
Something along the lines of LING.

~~~
troutwine
It's an interesting question. I think you'd need at least:

    
    
      * a way to define scheduling requirements on a process level (beyond current, coarse, priority settings),
      * bounded process message queues (this would be handy in soft-realtime Erlang),
      * deadline accurate receives, 
      * accurate control over resource allocation
    

I've been meaning to sit down with some theory and think through this more
analytically, but the above are my BART thinking-time derived set of
requirements. I'm sure there are cases--some which would be obvious, in
retrospect--that are not covered by the above.

~~~
jacquesm
You'd have to run erlang directly on the hardware without an intermediary OS
and you'd have to schedule the individual erlang processes using pre-emption
and by giving them priorities. You'd also have to add a ready-list per
priority.

After all, as long as BEAM is running as a child process of a host os that is
not hard realtime you can't guarantee much of anything.

~~~
peerst
Well Erlang is running on RTEMS in production in industrial control systems
[http://www.grisp.org](http://www.grisp.org)

Hard realtime processes to come

------
jcizzle
Great article, very much appreciate this guy and his blog as I continue to
learn Erlang.

A question: the pattern of let it crash makes sense to me. However, I struggle
with it when implementing RESTful web services. Letting the process crash will
typically yield a 500 - the monitor on the connection process in the web
server library ensures that. Clearly though, a status code and some additional
information is a more appropriate response to the client. An example is
returning a 400 for "missing arguments". In a contrived example of idiomatic
Erlang, I feel like I'd write this:

    
    
        handle_request(Request) ->
            % username and password are required
            {ok, Username} = request:get_argument(<<"username">>, Request),
            {ok, Password} = request:get_argument(<<"password">>, Request),
    
            UserRecord = db:find(my_database, users, #{<<"username">> => Username}),
     
            Password = maps:get(<<"password">>, UserRecord),
    
            request:respond(200, UserRecord, Request).
    

And if the key username did not exist in Request, request:get_argument/2 would
return undefined or {error, Reason}, giving me a bad match error. Or if the
password didn't match, I'd get the same. In order to intercept that and return
a reasonable status code, I would have to catch that in handle_request. My
question, then, is am I missing a best practice on how to handle this? Or is
this just the place where I do need to catch errors and process them? And if
that is so, isn't it at odds with the whole concept of writing intentional
code?

~~~
jlouis
A couple of hints:

1\. Use a finite-state-machine REST framework like webmachine or cowboy_rest.
This will help you in the long run once you grok how they work.

2\. Your intention here is that the user might have done something silly.
Write a helper which can load arguments from the request and fail if some of
them are missing. The best approach is to shuffle as much as possible into a
routing layer and then let the routing layer return the 4xx responses. This
leaves only optional arguments, where an undefined option _is_ what you
want to handle. Look at how, e.g., cowboy is doing this.

You can essentially avoid all of this boilerplate, if you construct your HTTP
RESTful API correctly.

db:find/3 should probably return either {ok, UserRec} or not_found. Something
along the lines of

    
    
      case db:find(my_db, users, #{<<"username">> => Username}) of
        {ok, #{<<"password">> := Password} = UserRec} -> respond(200, ...);
        {ok, UserRec} -> respond(401, ...);
        not_found -> respond(404) % or something more appropriate
      end.

~~~
jcizzle
Thanks for the followup. Item #2 is a point very well taken and I'll be
exploring that route, appreciate the response.

------
davidw
These are excellent points, and I have passed them along to my colleagues.
There's something that a lot of these essays and tutorials about Erlang kind
of gloss over that I'd like to see, though:

Ok, I let it crash. Now what? So many of them seem to stop here and think that
everything is great. It isn't: if I have some kind of long-running program,
like, say, a web site, I should probably do more than just happily let
everything crash as the error propagates its way up the call chain (no, it
doesn't do that immediately, but as each thing starts and fails again enough
times, it does propagate). For instance, displaying something human readable
on a UI, or sending email or something other than logging a difficult to read
Erlang error.

More examples of what real world programs do _after_ the crash, please!

~~~
exo762
Crashing in Erlang is a strategy that lets you handle unexpected errors. In
your web server example it corresponds to HTTP 500. Other problems (e.g. 404)
you need to handle yourself, not relying on crashing.
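
As a rough illustration of that split, here is a sketch in Go (not Erlang, and not how any Erlang web server is implemented). The made-up `recoverTo500` wrapper is the "something" that turns an unexpected crash into a 500, while the expected 404 case is handled explicitly by the handler. The paths and names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// recoverTo500 turns a panic in the wrapped handler into an HTTP 500.
func recoverTo500(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// handler deals with the expected cases itself and crashes on the rest.
func handler(w http.ResponseWriter, r *http.Request) {
	switch r.URL.Path {
	case "/ok":
		fmt.Fprintln(w, "ok")
	case "/bug":
		panic("unexpected state") // an error nobody planned for
	default:
		http.NotFound(w, r) // an expected problem, handled explicitly
	}
}

// statusOf runs one request through the wrapped handler and returns
// the status code that was recorded.
func statusOf(path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	recoverTo500(http.HandlerFunc(handler)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusOf("/ok"), statusOf("/missing"), statusOf("/bug")) // prints "200 404 500"
}
```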

~~~
davidw
Ok, right, but that means that _something_ is catching that crash and
returning a 500, and not just crashing the entire web server. I'd like to see
more people delve into that aspect of Erlang architecture.

It's covered some here:

[http://jlouisramblings.blogspot.it/2010/11/on-erlang-
state-a...](http://jlouisramblings.blogspot.it/2010/11/on-erlang-state-and-
crashes.html) (by the same author, jlouis!) and here:

[http://learnyousomeerlang.com/building-applications-with-
otp](http://learnyousomeerlang.com/building-applications-with-otp)

with the term "error kernel", but not a lot of space is dedicated to practical
examples, compared to the amount of writing dedicated to how great it is to
let things crash.

~~~
loxs
In fact, it is actually crashing the whole "web server". As the whole "web
server" in Erlang is an Erlang process. And you can have millions of them.
Every single request is handled by its own process "web server" spawned on
demand for that request. And it's still reasonably fast.

~~~
davidw
I get what you are trying to say, but: not really. Something like cowboy is an
Erlang "application", and cowboy most assuredly does not crash on one bad
request: it has a try/catch in order to deal with errors without tearing
everything down.

Somewhere in an Erlang system, there has to be a judicious use of try/catch in
order to keep things running when some kind of persistent error occurs.

~~~
loxs
You are technically correct and wrong at the same time. Cowboy is an
application, but cowboy is not the "web server". It is an application that
spawns and supervises multiple "web servers". Every one of them can crash
without bringing cowboy down, because that's the way its supervisor is
configured. It's called the simple_one_for_one strategy.

And no, it is not mandatory to have "try/catch". Well, if you mean in the
general sense, yes of course. But that is done by OTP or cowboy (I'm not sure
about cowboy). What is significant though is that the programmer does not have
to deal with that at all. There is a huge difference in the way you write
Erlang code and PHP code. And that's the whole point of jlouis' article.

We try not to have "persistent errors", like things that crash thousands of
times. We usually crash only in processes that don't live forever. In fact we
try to have most of our processes be disposable.

Also, what he refers to as "kernel" is a bit hard to explain to non-erlangers.
It is usually either a library or a process/server that is strictly
deterministic and testable. So we don't expect to have bugs/crashes there and
we almost never have. If we cannot make it this way, then we have several
communicating kernels, that are again deterministic per se.

We try to have the "crashing parts" in non-persistent processes. Like the http
handlers. That is usually the huge part of the code and there we can safely
"let it crash". But for example we don't want to have shaky code in the parts
dealing with the database. So we have something like:

1\. db communicators. Can't crash. Very little code without bugs. This is one
kernel

2\. data converters. Usually deterministic with unit tests. Does not crash but
is written in "intentional style" as jlouis calls it. If this crashes, then we
have a bigger problem. We should fix the bug

3\. Data dispatchers. For example, per user database lock (to a non ACID data
store in our case). This is a process that does not crash (because of problems
in its code). A second kernel

4\. http handlers. Can crash and _should_ crash, especially on unexpected
events, for example - corrupted user input. Written in very aggressive
intentional style

So what we usually do is in 4. we make sure to crash if something unexpected
happens. Then 4. sends a message to 3. like "execute function A (part of 2.)
with user's data, here are the arguments, we are pretty sure they are
correct". 3. retrieves user data from 1., applies it (with arguments) to
function A of 2., saves the result and returns a reply to 4.

Currently we don't shield 3. from errors in 2. If we have a bug in 2. we _want
to know about it_. And especially we don't want 3. to write corrupted data,
which is very possible if we catch errors from 2. So if something unexpected
happens in 2., data dispatcher (3.) will crash, we see the logs and fix the
bug.

And please note. All this is (except for 1.) one user only! Even if we have
some persistent problem with that user, others are usually not affected. We
fix the user's account and write some edge case code to fix the bug.

This has been a very successful strategy so far.

~~~
davidw
Let's see:

Cowboy uses try/catch because even with supervisors, if the child crashes
often enough, the supervisor will eventually crash too! LYSE talks about using
try/catch in the error kernel.
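
The "crashes often enough and the supervisor gives up too" part can be sketched as follows. This is a simplified model in Go, not OTP: real OTP supervisors count restarts within a time window, whereas this made-up `supervise` only counts them in total. All names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooManyRestarts is what the hypothetical supervisor reports upward
// when it gives up on its child.
var errTooManyRestarts = errors.New("restart limit reached")

// runChild runs child once and converts a panic into an error.
func runChild(child func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("child crashed: %v", r)
		}
	}()
	child()
	return nil
}

// supervise restarts child until it exits cleanly. After maxRestarts
// restarts it stops trying and fails itself, so the problem escalates
// to whoever supervises the supervisor.
func supervise(child func(), maxRestarts int) error {
	for restarts := 0; ; restarts++ {
		if err := runChild(child); err == nil {
			return nil
		}
		if restarts == maxRestarts {
			return errTooManyRestarts
		}
	}
}

func main() {
	attempts := 0
	flaky := func() {
		attempts++
		if attempts < 3 {
			panic("transient failure")
		}
	}
	fmt.Println(supervise(flaky, 3), attempts) // prints "<nil> 3"

	broken := func() { panic("persistent failure") }
	fmt.Println(supervise(broken, 3)) // prints "restart limit reached"
}
```

A transient fault is absorbed by a restart or two, while a persistent one has to be caught or fixed somewhere, which is the design question being asked here.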

> db communicators. Can't crash. Very little code without bugs. This is one
> kernel

DB connections most certainly can go down. I do know a thing or two about
that:

[http://journal.dedasys.com/2014/04/27/an-erlang-postgres-
dri...](http://journal.dedasys.com/2014/04/27/an-erlang-postgres-driver-
refurbishing-open-source/)

In any event, I think this is an under-documented part of Erlang: how to go
beyond just letting everything crash, and design the whole system.

------
mrcwinn
Great job on the write-up! I have precisely no experience with Erlang but I
always enjoy a good programming article. I won't speak to Erlang's
philosophies or paradigms, but I would like to respond to the bit about Go's
handling of errors.

I recall Rob Pike making an argument that errors are not, or should not be,
special things, and so Go doesn't treat them as such. If you don't care about
an error and don't want to deal with it, simply ignore it:

    
    
      image, _ = jpeg.Decode(r)
    

If you care about an error and need to do something to handle it (a log, a
panic, or something more complicated), you can signal that to the compiler.

One advantage of this is you still have access to the returned bit of data,
however mangled or incomplete it _might_ be, which can be useful in certain
instances.

I'm not sure I consider Go errors "silly" unless all you're doing is a panic
every few lines. In actual practice, though, I have found I can pick and
choose when I care about an error, and if I do care about it, there's an
action specific to that error I want to perform. Hardly silly - cumbersome
perhaps.

As for your statement, "On the Go side, it mostly sticks errors in your face
and makes damn sure you do something sensible with them ... " I reject the
notion that in Go you are forced to deal with errors, but I also reject the
notion you shouldn't do something sensible when you are dealing with them. :)

Great work all the same! I come in peace.

------
_mhr_
How does Erlang's error mechanism compare to Lisp's condition/restart system
(this question from someone who knows very little about both)?

~~~
ketralnis
> this question from someone who knows very little about both

Then I'm not sure anyone can answer this adequately without also explaining
both.

But the shortest answer is that they're totally unlike each other in almost
every way.

------
thrillscience
Another great Erlang essay by jlouis.

The Erlang practice of letting things fail is also SOP for experienced
implementors of any large SOA system. You just let things fail. If some
process getting messages from a queue can't talk to the Database, don't retry;
just exit and let the supervisor deal with it.

Erlang is great for writing robust software because this type of error
handling is a first-class feature of the language and runtime.

