

Erlang is a hoarder - skeltoac
http://andy.wordpress.com/2012/02/13/erlang-is-a-hoarder/

======
matthiasl
I think two things are interesting here. 1: "when does Erlang GC a process'
heap?" and 2: "where does Erlang keep a process' data?".

1: Erlang GCs a process' heap whenever that process' heap gets full, or when
you call erlang:garbage_collect() explicitly.

2: Erlang stores most data associated with a process on the process heap;
there's one such heap per Erlang process. Binaries are a special case. If
they're large (> 64 octets), the heap only contains a reference to the binary;
the binary itself is stored in an area specifically for binaries.

1+2 can result in a lot of binary garbage left lying around. Here's how it can
happen:

a) A process creates a relatively large amount of unused heap space. This
could happen by temporarily using a large amount of heap, e.g. by calling
binary_to_list() on a large-ish binary, doing something with the list then
dropping it. For argument's sake, let's say we made a heap (for one Erlang
process) with 30M free.

b) Now the process moves on to a new phase: creating large but short-lived
binaries. Let's say 1M each. Those binaries don't live on the heap, only a
reference to them does. So they only consume 8 (?) octets on the heap.

c) If the references are the only thing using the heap, then you can make
roughly 4M of them (30M / 8) before filling the process heap. But since they're
1M each, they'll eat 4T of the binary heap, i.e. you'll run out of memory.
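
The arithmetic in a) to c) can be sketched as a tiny function. This is only a back-of-the-envelope model: the module and function names are made up for illustration, and the 8 octets per reference is the guess from above, not a measured value.

```erlang
-module(binary_hoard_estimate).
-export([estimate/3]).

%% FreeHeapBytes: unused space on the process heap (30M in the example).
%% RefBytes:      ASSUMED heap cost of one reference to an off-heap binary.
%% BinBytes:      size of each short-lived binary (1M in the example).
%% Returns {NumberOfReferences, BytesHeldInBinaryArea} at the moment the
%% process heap finally fills up and a GC would be triggered.
estimate(FreeHeapBytes, RefBytes, BinBytes) ->
    Refs = FreeHeapBytes div RefBytes,
    {Refs, Refs * BinBytes}.
```

With the numbers above, `binary_hoard_estimate:estimate(30 * 1024 * 1024, 8, 1024 * 1024)` returns `{3932160, 4123168604160}`, i.e. about 4M references pinning about 4T of binaries.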

As you found out, setting 'fullsweep_after' to 0 doesn't help, since a GC is
never triggered. But explicitly calling erlang:garbage_collect() does.
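
A minimal sketch of the explicit-GC workaround, assuming the binaries are created inside the fun and dropped again before it returns. The module name, `foreach/3` and the "every N items" policy are hypothetical; only erlang:garbage_collect/0 is a real BIF here.

```erlang
-module(gc_every).
-export([foreach/3]).

%% Apply Fun to each item and force a GC of the calling process after
%% every N items, so dead references to off-heap binaries are dropped
%% even though the process heap itself never fills up.
foreach(Fun, Items, N) when is_function(Fun, 1), is_integer(N), N > 0 ->
    loop(Fun, Items, N, N).

loop(_Fun, [], _N, _Left) ->
    ok;
loop(Fun, [Item | Rest], N, 1) ->
    Fun(Item),
    true = erlang:garbage_collect(),   % collects the calling process only
    loop(Fun, Rest, N, N);
loop(Fun, [Item | Rest], N, Left) ->
    Fun(Item),
    loop(Fun, Rest, N, Left - 1).
```

What N should be is a tuning question; collecting after every item is the simplest choice but costs the most CPU.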

You can investigate a bit more using tracing. erlang:trace/3 can generate a
message whenever the target process is GCed (the garbage_collection entry in
flaglist). If my guess is right, then you should see that your processes
holding all the binaries are never (or rarely) GCed. Tracing the GC is cheap;
you won't notice any performance difference if you only do it on a few
processes.
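
Roughly what I mean by tracing, as a sketch. The module name and `watch/2` are made up; erlang:trace/3 and the garbage_collection flag are real. The calling process becomes the tracer, so Pid has to be some other process.

```erlang
-module(gc_watch).
-export([watch/2]).

%% Trace garbage collections of Pid for Millis milliseconds and return
%% the trace tags seen, oldest first. An empty list means the process
%% was never GCed during the window.
watch(Pid, Millis) when is_pid(Pid), is_integer(Millis), Millis >= 0 ->
    erlang:trace(Pid, true, [garbage_collection]),
    Ref = make_ref(),
    erlang:send_after(Millis, self(), {stop, Ref}),
    Tags = loop(Pid, Ref, []),
    erlang:trace(Pid, false, [garbage_collection]),
    lists:reverse(Tags).

loop(Pid, Ref, Acc) ->
    receive
        {stop, Ref} ->
            Acc;
        %% Only the garbage_collection flag is set, so every trace
        %% message for Pid is a GC start or end event.
        {trace, Pid, Tag, _Info} when is_atom(Tag) ->
            loop(Pid, Ref, [Tag | Acc])
    end.
```

The tags depend on the OTP release: older ones send gc_start and gc_end, newer ones gc_minor_start, gc_minor_end, gc_major_start and gc_major_end. A few trace messages may still be sitting in the caller's mailbox after tracing is switched off.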

The process_info BIF can also tell you quite a bit, e.g. the process heap
size.
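
For example, something along these lines (a sketch; the module and function names are hypothetical, and the `binary` item is documented as subject to change without notice):

```erlang
-module(heap_peek).
-export([summary/1]).

%% Report the heap size (in words, not bytes) of Pid together with the
%% off-heap binaries its heap still references.
summary(Pid) when is_pid(Pid) ->
    Items = [heap_size, total_heap_size, binary],
    case erlang:process_info(Pid, Items) of
        undefined ->
            undefined;                       % process is not alive
        [{heap_size, HeapWords},
         {total_heap_size, TotalWords},
         {binary, Bins}] ->
            %% Assumes the current BinInfo shape: {Id, Size, RefCount}.
            Bytes = lists:sum([Size || {_Id, Size, _RefCount} <- Bins]),
            [{heap_words, HeapWords},
             {total_heap_words, TotalWords},
             {binary_refs, length(Bins)},
             {binary_bytes, Bytes}]
    end.
```

If the guess above is right, binary_bytes keeps growing while heap_words stays flat. The same binary can show up more than once in the list, so treat binary_bytes as an upper bound.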

Disclaimer #1: my knowledge about the details above may be wrong or out of
date. But the mechanism is known. I've seen it in embedded systems and handled
it in similar ways to your approach.

Disclaimer #2: obviously I'm taking a guess. It's possible you've run into
something else entirely. That's why I suggested some things to look at to
confirm or reject my suspicion.

~~~
ams6110
Might this be considered a symptom of an architecture problem? I'm far from an
Erlang expert but my understanding is that processes generally should be short
lived, except for supervisors, which should only supervise. Instead of having
a single long-lived process handling a lot of large binaries, would a better
design be to have separate processes handling each binary?

~~~
mononcqc
It depends. Some state needs to live, some needs to go. Long-lived processes
should ideally do few state manipulations, or be easy to replace (so they
store less state). Risky or frequent operations should be done far down in the
supervision tree.

The processes with complex state, things that can't be lost, might tend to be
long-lived. In these cases, they should either only do very simple operations,
or be isolated from the operations on that state.

These processes will generally live higher up in the supervision tree
structure and delegate the risky work to processes lower in the hierarchy;
these short-lived workers will thus have their impact limited, but will also
have their state known before some unit of work, a bit like an invariant. If
the short-lived worker dies, then restarting it with its short-lived state is
a cheap operation.

Restarting the long-lived process is a difficult thing because the state might
be a) possible to re-compute, but complex to do so, or b) bound to events that
cannot be repeated, and can't be lost.

------
cpleppert
As described here:
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html

Binaries are either stored on a private heap or in a global area where they
are reference counted. Binary reference counting depends on ProcBin objects
stored on a process heap.

Reference counting is only effective when garbage collection occurs, thus
forcing an explicit collection on a process removes the ProcBin objects.

The system has no way of knowing if a collection should be triggered to free
ref counted memory. Collection is local to a process.

------
damienkatz
Uhh, I've seen this too. Part of the reason we are migrating some code from
Erlang to C.

~~~
dizzyd
It could be argued that this is a poor reason to switch from Erlang to C --
this is a problem with how binaries are handled in user code, not something
intrinsic to the Erlang VM.

~~~
Saavedro
Eh, every factor involved is implementation detail. The threshold at which
binaries are heap-alloc'ed, gc behavior, etc. I really fail to see how this is
user error.

~~~
frooby
_I really fail to see how this is user error._

That isn't what the parent comment said.

