

Flow-based programming and Erlang-style message passing - samuell
http://bionics.it/posts/flowbased-vs-erlang-message-passing

======
jerf
"E.g. backpressure is a thing that is not easily supported implicitly in
Erlang/Elixir in the same way as in FBP languages where the bounded-buffer
channels provide implicit backpressure by blocking the sends on the outport
they are connected to when the buffer is full."

This is one thing that I've learned from using both Go and Erlang now... I
like asynchronous message passing as a better _default_ , but if I had to
choose between the core language supporting _either_ synchronous message
passing with Go-like channels _or_ async message passing like Erlang, I'd
take the Go case, because it's _way_ easier to implement async on top of what
Go provides than it is to implement sync on top of what Erlang provides. And
while sync may be the exception, when you want it, you _really_ want it, and
Erlang doesn't offer anything for it that I can see, at least at user level.

As the phrasing above may suggest, though, what I'd like is both. Of course
sync can only be used in the local OS process and async works across
networks... that's just something you'd have to work with as a fundamental
constraint. That's life. Even with a conventional manifest type system (i.e.,
C++/Java), if you treat the two types of communication as different types, you
won't mix the two up.

If I were going to try to do flow-based programming with Erlang, I'd look at
moving code to the data rather than the other way around. That has its own
negative tradeoffs too, but I think in the end they'd be less bad than trying
to send data through Erlang's system. As good as Erlang is at message passing,
it's very much not designed for pushing lots of data repeatedly through the
messaging system. If you use granular flow constructs that involve pushing
data through a couple dozen flow elements and you model those as processes,
you're going to get _hammered_ on all the copying that implies. (That holds
for anything other than binaries, but what kind of flow sends a binary through
such that none of the flow elements modify it?) Though I'd say in the end,
even with moving code around, the conclusion stands: Erlang isn't really good
at flow-based programming itself.

~~~
e_proxus
What do you mean by sync being hard to implement in Erlang? Simple synchronous
requests can easily be implemented:

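    call(Server, Request) ->
        Ref = make_ref(),  % tag the request so only its own reply matches
        Server ! {request, self(), Ref, Request},  % server needs our pid to reply
        receive
            {reply, Ref, Reply} -> {ok, Reply}
        end.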

This is also essentially what the built-in gen_server library does for you
with gen_server:call.

~~~
jerf
In Go, sending a message on a channel is actually a sync point between the
sender and the receiver; a channel blocks until it has both. (Buffered
channels may not necessarily block, but you have to treat them as if they do
anyhow in general.) Thus, the send itself is synchronous, not just the fact
that we're "waiting until we get an answer". If the sender proceeds past the
line that sends on a channel, we know a receiver has received it.

I don't know of an efficient way to emulate this in Erlang. Suppose you have a
pool of 10 undistinguished server processes in Erlang and you just want "a"
process to answer. If you use Go channels, you have all 10 listen on a
channel, and you automatically get the behavior where as long as one is
available, that one will pick up the request. You also automatically get
backpressure; if none are available, the attempt to send will hang. (There are
ways of turning that into a timeout if you need to.) In Erlang, you could try
to create an 11th coordinator process, but you'll create a lot of additional
traffic and take a latency hit on every request. (Plus, some of the naive ways
you might implement that will get you in further performance trouble with
mailbox scanning. This is going to be a non-trivial thing to build correctly.)
Or you could just randomly pick and send, but if you pick one that's currently
answering a slow request while others are available, you get spurious latency.

It hasn't been a killer in my system, but it's been an annoyance.

And to be clear, let me reiterate that having used both systems, I actually
still like Erlang's behavior as a better _default_. It's really hard to
deadlock Erlang... it's much easier to deadlock Go. The async message passing
is, in my experience, much easier to use correctly without much thinking. But
it would be nice to also have a "channel" in Erlang, despite the limitations
it would have to have with not being able to go across nodes. In fact,
technically speaking, subject to that limitation (and there are already a
couple of other functions limited to the current node), there's no reason I
could see why this couldn't be added to Erlang.

See for instance:
[https://github.com/inaka/worker_pool](https://github.com/inaka/worker_pool) ,
particularly the section discussing "strategy", the notes on the
"available_worker" strategy and its performance implications, and its
specific references to the other tricky edge cases to consider. Of course,
programming languages being programming languages, someone else can do the
work and you can just use it, but in Go, this behavior is quite trivial. (By
contrast, in Go, proper _asynchronous_ messaging in the Erlang style is a
challenge. Whacking together a "slice" and some "messages" isn't that hard,
but you've got _other_ edge cases to consider... to say nothing of network
transparency!)

~~~
josevalim
I was not the one who asked, but thanks for the reply, jerf; indeed, I thought
you had in mind more complex cases than the GenServer one! It is not that sync
is hard in Erlang/Elixir, but once you want to involve multiple processes, it
may be.

For example, in pipeline parallelism, we don't want to be completely sync
between the stages, because it would mean you would always have at most N
"events" in the pipeline at any moment (where N is the number of stages). If
you want to have some sort of bound, you need to devise a message protocol
between the stages. That's what the folks behind Reactive Streams are doing
with their back-pressure work.
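One shape such a message protocol can take is credit-based demand, loosely in the spirit of the `request(n)` signal in Reactive Streams. Note that this is not that specification's API. The sketch below is in Go and assumes two stages and a fixed batch size. The names `produce`, `consumeBatches`, and `runPipeline` are hypothetical. In Erlang/Elixir, the `demand` channel would instead be a message sent back upstream:

```go
package main

import "fmt"

// produce emits 0..total-1 on out, but only while it holds credit granted
// on demand, so it never runs more than one grant ahead of the consumer.
func produce(total int, demand <-chan int, out chan<- int) {
	credit := 0
	for next := 0; next < total; {
		if credit == 0 {
			credit += <-demand // wait for the consumer to ask for more
			continue
		}
		out <- next
		next++
		credit--
	}
	close(out)
}

// consumeBatches grants `batch` items of credit at a time and only asks
// for the next batch after the previous one has fully arrived.
func consumeBatches(batch int, demand chan<- int, in <-chan int) []int {
	var seen []int
	demand <- batch
	received := 0
	for v := range in {
		seen = append(seen, v)
		received++
		if received == batch {
			received = 0
			demand <- batch // grant another round of credit
		}
	}
	return seen
}

// runPipeline wires the two stages together.
func runPipeline(total, batch int) []int {
	demand := make(chan int, 1)    // at most one outstanding grant at a time
	items := make(chan int, batch) // room for one batch, never more in flight
	go produce(total, demand, items)
	return consumeBatches(batch, demand, items)
}

func main() {
	fmt.Println(runPipeline(10, 3)) // [0 1 2 3 4 5 6 7 8 9]
}
```

This decouples the stages by up to one batch instead of making them fully synchronous. The number of items in flight still stays bounded by the granted credit.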

This is exactly the kind of situation I want to make easier in future Elixir
versions. I talked about them from the perspective of collections (and not
FBP) in my Elixirconf keynote:
[https://youtu.be/EaP0y4pdKD0?t=1846](https://youtu.be/EaP0y4pdKD0?t=1846). We
are experimenting with the addition of a GenRouter that would specify how
processes communicate (with back-pressure), eventually supporting custom
dispatch rules. We are also looking into projects like Basho's sidejob
([https://github.com/basho/sidejob](https://github.com/basho/sidejob)) and
Ulf's jobs ([https://github.com/uwiger/jobs](https://github.com/uwiger/jobs))
to ensure we can also provide sampling-based routing.

On top of that, we need to discuss data-based parallelism (where we
split/spread the data once instead of passing it around) and how it would all
fit with supervision trees. It is a lot of work and we likely won't be able to
solve all cases but I am very excited about the possibilities if we get it
right.

~~~
jerf
I would encourage you to consider whether there may be more useful primitives
that the Erlang VM could support for this case, if you aren't doing that
already (I couldn't think of a quick way to check, so apologies if I'm late
to that party). Hopefully you'll have more pull on the matter than Some
Schmoe like me would. :) A lot of these things strike me as either inefficient
or even infeasible to emulate via strict user-level Erlang-style message
passing, but are really easy with some well-chosen primitives added in, even
if they are only exposed very carefully at the Elixir and Erlang levels.

Some of this is personal development bias, I have to admit, in that I don't
really "believe" in APIs that try to abstract away the difference between
local and network traffic. Making the difference _small_ may be advantageous,
but I actually don't like it when it's reduced to zero. Usually, of course, a
language goes in the direction of making local traffic easy and network
traffic unbelievably hard; Erlang makes network traffic easy, but then makes it
somewhat more difficult to know whether you're going across the network or
not. That's really useful for a lot of things, but sometimes if you're willing
to agree that you won't need the network stuff you can get some big wins; in
Erlang-land, for instance, consider the way larger binaries are shared instead
of copied, an optimization that works locally and is mostly meaningless across
a network.

~~~
josevalim
> I would encourage you to consider whether there may be more useful
> primitives that the Erlang VM could support for this case, if you aren't
> doing that already

It is a chicken-and-egg issue, though. Before proposing anything like this to
the OTP team, I need to prove it is useful and have folks build something
relevant with it. So I need to make the best of the abstractions we have today and
then make a case.

For this reason, a lot of the challenge will be in optimizing the "topologies"
to avoid copying as much as possible. Data-based algorithms for parallelism
(farm, pmap, etc.) will likely be more useful, and that would be the next
milestone.
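As an illustration of the data-based style, here is a sketch of a `pmap` that splits its input once into one chunk per worker, rather than streaming each element between processes. It is written in Go (1.18+ for generics), where the workers share memory. In Erlang, each chunk would instead be copied once to its worker process. The function and its chunking policy are assumptions made for illustration:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// pmap applies f to every element of xs in parallel. It splits the slice
// once into contiguous chunks, one per worker, and each worker writes its
// results in place, so every element is handed off exactly once.
func pmap[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, len(xs))
	workers := runtime.GOMAXPROCS(0)
	chunk := (len(xs) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < len(xs); start += chunk {
		end := start + chunk
		if end > len(xs) {
			end = len(xs)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = f(xs[i]) // disjoint index ranges: no locking needed
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(pmap([]int{1, 2, 3, 4}, func(x int) int { return x * x })) // [1 4 9 16]
}
```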

> Some of this is personal development bias, I have to admit, in that I don't
> really "believe" in APIs that try to abstract away the difference between
> local and network traffic.

That's a very good point. I was also warned by Konrad from the Akka team that,
if we rely on ack messages for back-pressure, they likely won't work in a
distributed setup, as messages have no delivery guarantees. This means we
wouldn't be able to abstract the network anyway.

------
tunesmith
Dealing with back pressure in general is a huge subject. How do you deal with
a producer that produces more than a consumer can handle? I had just finished
reading an excellent interview with the collaborators of the Reactive Streams
API that talked about wrestling with this:

[https://medium.com/@viktorklang/reactive-streams-1-0-0-inter...](https://medium.com/@viktorklang/reactive-streams-1-0-0-interview-faaca2c00bec)

~~~
josevalim
Indeed! I have been following the efforts behind Reactive Streams with a
special focus on how they are tackling back-pressure.

------
murbard2
An interesting approach is join calculus. This tutorial builds on the same
"chemical" metaphor used in the article.

[https://sites.google.com/site/winitzki/tutorial-on-join-calc...](https://sites.google.com/site/winitzki/tutorial-on-join-calculus-and-its-implementation-in-ocaml-jocaml)

~~~
samuell
Thanks, will check this!

