
Storing state in Erlang with processes (2014) - brudgers
https://dantswain.herokuapp.com/blog/2014/09/27/storing-state-in-erlang-with-processes/
======
rdtsc
That's a good introductory article in that it starts from a very basic idea of
just an assignment and ends up with a gen_server.

This explicit state management and immutability is very nice in large
applications. You can see what is happening in a piece of code just by looking
at the local context (a function or a module) since you'd have the initial
state, the update operation, and the new state. If something goes wrong,
tracing just that function can often reveal the error. Contrast that with an
implicit "this" object behind multiple levels of class inheritance, where
things become a lot more complicated.
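
A minimal sketch of what this looks like in practice (module and message names are illustrative): the state is an explicit argument, the update is a local binding, and the new state is passed to the recursive call, so one clause shows the whole lifecycle.

```erlang
-module(counter).
-export([loop/1]).

%% The process state is just the argument of loop/1.
loop(Count) ->
    receive
        {increment, By} ->
            NewCount = Count + By,   %% the update operation, visible locally
            loop(NewCount);          %% the new state, threaded explicitly
        {get, From} ->
            From ! {count, Count},
            loop(Count)
    end.
```

Usage would be something like `Pid = spawn(counter, loop, [0]), Pid ! {increment, 3}.` — tracing `counter:loop/1` shows every state transition the process ever makes.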

Further, when it comes to state management, process heap isolation is
invaluable. If something goes wrong, you can safely restart just some parts of
the application without leaving others in an unknown state. Operating systems
figured this out decades ago, and going back to concurrency units
that share a single heap feels like we are back in the Windows 3.1 world,
where your calculator crashing also took down your editor.
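
This is what OTP supervisors buy you. A hedged sketch (module and child names are hypothetical): with a `one_for_one` strategy, only the child that crashed is restarted, and its siblings keep their own isolated heaps untouched.

```erlang
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: a crash in worker_a does not touch worker_b.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Children = [#{id => worker_a, start => {worker_a, start_link, []}},
                #{id => worker_b, start => {worker_b, start_link, []}}],
    {ok, {SupFlags, Children}}.
```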

~~~
yomly
>You can see what is happening in a piece of code just by looking at the local
context

^ This. A thousand times this.

When I read this article, all I could think was _this_ is what object oriented
code is meant to be about, aka message passing.

I have been working on a Rails codebase and it's an absolute nightmare -
unless you already know everything all at once, you can come across a tiny
class whose identifiers are set via inheritance or somehow resolved via
inflection. It's completely mad.

Not meaning to start a flame-war but it confuses me how OO went from a simple
idea of encapsulated "objects" having well defined interfaces to this crazy
spaghetti mess of inheritance etc.

That said, bad code is bad code and you can ship a ball of mud in anything...

------
jleang
Hmm, it's better to store state in ETS, so that when the process does crash it
can be restarted by the supervisor without losing the state.
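
A sketch of the idea (table and key names are illustrative): the table must be owned by a process that outlives the crashing worker — e.g. the supervisor, a dedicated table owner, or via the `heir` option — since an ETS table is destroyed when its owner dies.

```erlang
%% Created once by a stable owner process, not by the worker itself:
my_state = ets:new(my_state, [named_table, public, set]),
true = ets:insert(my_state, {counter, 42}),
%% After the worker crashes and is restarted, the state is still there:
[{counter, 42}] = ets:lookup(my_state, counter).
```

(`ets:new/2` returns the name atom when `named_table` is given, so the match on `my_state` succeeds.)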

~~~
filmor
This depends highly on what you are storing. Letting a process reuse "old"
state after a crash can easily cause a restart loop.

------
derefr
There’s a higher, systems-architecture-level equivalent to this, too (which
Erlang is also designed to take advantage of, re: the Erlang distribution
protocol, and the Mnesia DBMS), and that’s the idea that “durability _is_
replication.”

That is, there’s no semantic difference between “a disk” and “the in-memory
heap of another node on the network”, in terms of what fault-tolerance
guarantees you get by adding one or the other to your system. Machines can
crash? So can disks. Back up to a disk? Back up to another node. Restore from
on-disk state? Restore by streaming state from another node.

You can build a DBMS cluster, or even something “ultra-durable” like S3, with
clusters of disk servers, sure. _Or_ — providing memory is cheap-enough for
your use-case — you can build it with clusters of RAM nodes, with no disks at
all (and thus one tenth the maintenance costs.) Power outage wipes the whole
cluster? Not if you’ve got a UPS, a generator, and a whole lotta gas (like
telecom base-station switches are stocked with.) Or if you’ve got backup nodes
— which are also just RAM nodes! — to stream to in another region. (Your
system itself doesn’t need to be multi-region distributed; it can treat these
nodes the same way a regular persist-to-disk system treats tape backup.)

And, conveniently, it’s not just memory accesses that are faster with such an
architecture. If you build your cluster architecture as a set of services
where each service isn’t just a single node (or just a load-balanced set of
nodes), but rather a “distribution set” of nodes—i.e. one “transaction router”
node that _multicasts_ commands to a set of equivalent worker nodes, where the
worker nodes all deterministically do the same things in response; then,
rather than one master, you have N copies of your node, _all_ of which are
equivalent hot-standbys of your “master”! (You don’t have to use them as such;
you can route client requests to just one “master” in the distribution set,
making the others merely _warm_ standbys.) Compared to streaming replication,
where some of your nodes are always “behind” their master, this approach is
far less risky in terms of data loss. This isn’t streaming replication; it’s
RAID for memory writes!
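
The core of that router could be sketched in a few lines (module and message names are hypothetical): fan each command out to every replica, and rely on the replicas applying commands deterministically so that each one ends up an equivalent hot standby.

```erlang
-module(router).
-export([multicast/2]).

%% Send the same command to every replica in the distribution set;
%% deterministic replicas stay byte-for-byte equivalent.
multicast(Replicas, Command) ->
    [Pid ! {apply, Command} || Pid <- Replicas],
    ok.
```

(A real version would also need to serialize commands through a single router so every replica sees the same order.)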

(And yes, this is the precise use-case that Erlang’s distribution protocol is
designed for: nodes in the same “distribution set” working as equivalent,
deterministic warm-standby _alternatives_ to one another, presenting as a
single virtual “node” in a larger cluster architecture, almost always
connected by a non-partitionable network backplane, like a single top-of-rack
network switch. Of course, nobody _uses_ Erlang’s distribution this way other
than Ericsson... but that’s because nobody but Ericsson has needs that _force_
them to avoid disks—and avoiding disks entirely isn’t cheap, at least today.
Still, it’s important to keep in mind, because the Erldist protocol works best
when you hew to this design, and actively fights you when you don’t; and
because Mnesia was never designed for anything other than being a memory-to-
memory replicated DBMS—and it works great for that!—but it really kind of
sucks once you introduce any disk copies to its replication strategy.)

