

Erlang vs Java memory architecture - javacodegeeks
http://www.javacodegeeks.com/2011/04/erlang-vs-java-memory-architecture.html

======
mryall
Very interesting. I'm surprised that there's no way to limit the size of
Erlang queues and therefore memory usage. Surely simple limits like that are a
necessity for any production environment? Can anyone with deeper knowledge
about Erlang comment on that?

~~~
cmullaparthi
The idea is that there is something very wrong with your design if you allow
that to happen. In telco networks, once a message is accepted into a system,
it is expected to be processed. You should throw away any messages you can't
handle (because of overload) at the edge of the system. The same principle can
be applied for any system which is serving any sort of network traffic.

If this unbounded message generation is happening within the system, again
there is something wrong. You shouldn't be in a situation where a process is
producing messages which other parts of the system cannot consume.
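The shed-at-the-edge idea sketches naturally in Java with a bounded queue: `offer`
returns false instead of blocking when the buffer is full, so overload is rejected
at the boundary rather than queuing unboundedly inside the system. (The capacity
and message names here are illustrative, not from the article.)

```java
import java.util.concurrent.ArrayBlockingQueue;

public class EdgeAdmission {
    public static void main(String[] args) {
        // Hypothetical edge buffer with capacity 2, so a third message is rejected.
        ArrayBlockingQueue<String> edgeQueue = new ArrayBlockingQueue<>(2);
        int accepted = 0, dropped = 0;
        for (String msg : new String[]{"m1", "m2", "m3"}) {
            // offer() returns false rather than blocking when the queue is full,
            // so overload is shed here, at the edge, not inside the system.
            if (edgeQueue.offer(msg)) accepted++; else dropped++;
        }
        System.out.println(accepted + " accepted, " + dropped + " dropped");
    }
}
```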

~~~
anamax
> In telco networks, once a message is accepted into a system, it is expected
> to be processed. You should throw away any messages you can't handle
> (because of overload) at the edge of the system.

That's attractive in general, but is it necessarily possible?

Assume a planar grid with O(N^2) nodes, of which O(N) are edge nodes that can
accept a message for a given node somewhere near the center. In other words,
that "given node" is O(N) steps away from any edge node.

Suppose that said given node has limited capacity. How can it keep those edge
nodes aware of its capacity without wasting capacity or requiring O(N^2)
queuing at said given node? Remember - it takes O(N) time to get a message
from the given node to any edge node, and during that time there are O(N^2)
places for a message to be in the network at any moment.

Now repeat for the other O(N^2) interior nodes.

~~~
cmullaparthi
Yes, it is possible. You have to ensure that every edge node is configured in
such a way that it allows only a certain amount of traffic to an "inner" node.
And if it finds that the inner node isn't as responsive as it should be, it
has to scale back how much traffic it will accept.

For real time traffic, where queuing doesn't make sense, there isn't anything
else you can do really.

~~~
anamax
> You have to ensure that every edge node is configured in such a way that it
> allows only a certain amount of traffic to an "inner" node.

Let's assume that we have 5 edge nodes and that each is configured to send t
traffic. The total amount of traffic that they can send is 5t. If the inner
node can handle 5t, you won't need to drop traffic in the network. However,
you're likely to drop traffic unnecessarily at an edge node.

Suppose that you have 4t worth of traffic continuously. Clearly the inner node
can handle that.

However, consider what happens if all of the traffic during hour n tries to
enter the network at edge node n.

For example, during hour 0, node 1-5 see no traffic and node 0 sees 4t worth
of traffic. By configuration, node 0 drops 3t and sends t to the inner node.
Since none of the other nodes are sending any traffic, the inner node only
sees t worth of traffic even though it could handle 5t.

You can try to get around this problem by dynamically configuring the edge
nodes, but since they're as much as N apart and they're responding to
presented demand, that configuration is always stale.

You can guarantee that you don't overload the inner node, but only if you're
willing to waste some of its capability in some circumstances.
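The waste in the 5-edge scenario above is easy to make concrete with a small
simulation, assuming static per-edge caps and no coordination (all numbers are
the hypothetical ones from the example):

```java
public class StaticCaps {
    public static void main(String[] args) {
        int edges = 5;
        int t = 100;                   // per-edge cap: each edge forwards at most t
        int innerCapacity = edges * t; // inner node can handle 5t in total
        int[] offered = {4 * t, 0, 0, 0, 0}; // hour 0: all 4t arrives at edge 0

        int admitted = 0, droppedAtEdge = 0;
        for (int load : offered) {
            int pass = Math.min(load, t); // static cap, no coordination with peers
            admitted += pass;
            droppedAtEdge += load - pass;
        }
        // The inner node sees only t, though it could have handled all 4t.
        System.out.println("admitted=" + admitted + " dropped=" + droppedAtEdge
                + " innerUtilization=" + (100 * admitted / innerCapacity) + "%");
    }
}
```

The inner node is never overloaded, but 3t of handleable traffic is dropped and
the inner node runs at 20% utilization, which is exactly the trade-off described.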

~~~
cmullaparthi
Yes, I agree. There are a few ways around that:

- Some out-of-band signalling between the edge and the core to indicate
current load levels at the core.

- Implement a similar form of overload protection at the core. You might now
say, "Aha! But you're throwing away messages you've accepted". The answer is
no: the edge nodes are typically just relaying messages after some checks;
application layer processing is typically done in the core.

This thread is becoming slightly hand-wavy but I hope you get the gist of what
I'm saying :)

~~~
anamax
> Some out of band signalling between the edge and the core to indicate
> current load levels at the core.

In-band/out-of-band doesn't make any difference - the edge nodes can't
communicate fast enough. Remember, they're O(N) from each other. (Yes, two
neighbors are adjacent, but there are nodes on the other side of the grid.)

Long wires don't help either - distance always matters.

> You might now say, "Aha! But you're throwing away messages you've accepted".
> The answer is no

Actually the answer is "yes" - The original claim was "In telco networks, once
a message is accepted into a system, it is expected to be processed. You
should throw away any messages you can't handle (because of overload) at the
edge of the system."

> I hope you get the gist of what I'm saying

I get the gist - the problem is that you must either waste capacity, drop in
the interior, or both because of distance and the fact that there are more
interior nodes than there are edge nodes. (And, you can't have all edge nodes
because of distance and fan-in.)

One of the subtle ways to waste capacity is to run the control signals faster
than the data signals.

------
Egregore
So the memory in Erlang is isolated on a per-process basis, which makes Erlang
more scalable, while in Java we have only common/shared memory.

~~~
hassy
The GC in the Erlang VM is also per-process, which allows it to be soft
real-time. An application can be using gigabytes of memory, but if it's split
between thousands of processes, garbage collection can be very fast because
the GC only needs to work through a small heap at a time.

~~~
silentbicycle
You can also measure the amount of memory a process tends to use in typical
operation, then spawn it with an appropriately sized heap. It never needs to
do GC at all; it just dies and frees its heap when complete. Look at
min_heap_size in spawn_opt/*.

It's the same concept as giving a copying garbage collector really large
subspaces - GC only happens when one fills. Giving each process its own heap
means that the GC pauses will still be brief.

------
ww520
Java has thread-local storage, which is the same as Erlang's private heap in
terms of concurrency isolation.

~~~
BarkMore
The variables themselves have concurrency isolation, but not the objects
referenced from the variables. The thread local storage implementation in some
JVMs uses the shared global heap.

There's no sharing at the application level or the implementation level in
Erlang.
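The distinction between isolating the variable and isolating the referenced
object can be demonstrated directly (a contrived sketch: the shared list is
deliberately handed to both threads' ThreadLocal slots):

```java
import java.util.ArrayList;
import java.util.List;

public class TlsSharing {
    public static void main(String[] args) throws InterruptedException {
        // One shared mutable list, used as every thread's ThreadLocal value.
        List<String> shared = new ArrayList<>();
        ThreadLocal<List<String>> tls = ThreadLocal.withInitial(() -> shared);

        Thread t1 = new Thread(() -> tls.get().add("from-t1"));
        Thread t2 = new Thread(() -> tls.get().add("from-t2"));
        t1.start(); t1.join();
        t2.start(); t2.join();

        // The ThreadLocal *slot* is per-thread, but both threads mutated the
        // same object on the shared heap: the reference is isolated, the data
        // it points to is not.
        System.out.println(shared.size());
    }
}
```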

~~~
ww520
If you don't explicitly pass the objects referenced in TLS to the outside or
to other threads, I don't see how it breaks the concurrency isolation.

For some JVM implementations that use a shared global heap, that may just
impact performance due to locking. The concurrency isolation contract is not
broken. The app still has full confidence that its TLS values are not changed
in other threads. OT: if performance comes into the picture, one can claim
that excessive copying of immutable objects causes performance problems as
well.

~~~
angusgr
_If you don't explicitly pass the objects referenced in TLS to the outside or
to other threads, I don't see how it breaks the concurrency isolation._

Sure. That's the same as if you never share any resource in any shared global
heap between threads.

The point is that Erlang's concurrency model allows the programmer to never
make that mistake, because you're storing immutable resources not references
to mutable data. See my longer post on the subject, further down the page. :)

 _OT: if performance comes into the picture, one can claim that excessive
copying of immutable objects causes performance problems as well._

When you have a share-nothing immutable-data concurrency model like Erlang's,
heap type is an implementation detail (and you'd probably choose it based on
your performance requirements, as you've alluded to.) To wit, as TFA says, you
can run the Erlang VM with either a shared heap or a separate heap model.
Language semantics stay the same.
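The "heap type is an implementation detail" point can be sketched in Java too:
if messages are immutable values, handing one to another thread is safe
regardless of how the runtime lays out memory underneath. (The `Price` record
and mailbox here are illustrative, not from the article.)

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ImmutableMessages {
    // An immutable message: record components are final, so once constructed
    // no receiver can observe a change, whichever heap model the runtime uses.
    record Price(String symbol, long cents) {}

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Price> mailbox = new ArrayBlockingQueue<>(16);
        Thread producer = new Thread(() -> mailbox.offer(new Price("ERL", 4200)));
        producer.start();
        producer.join();
        Price p = mailbox.take(); // safe to hand over: there is nothing to mutate
        System.out.println(p.symbol() + "=" + p.cents());
    }
}
```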

------
kotedo
I am not sure this article has any real bearing for someone who wants to
learn about Erlang. It actually might lead to more confusion than it's worth.

~~~
rick_bc
It looks like an entry about what Java can learn from Erlang.

It's not about learning Erlang.

I am pretty sure the Java guys are quite well aware of the various memory
models. But early Java was kind of experimental, so backward compatibility
affects what they can change now.

