Hacker News new | past | comments | ask | show | jobs | submit login
Flow-based programming and Erlang-style message passing (bionics.it)
69 points by samuell on June 15, 2015 | hide | past | web | favorite | 22 comments

"E.g. backpressure is a thing that is not easily supported implicitly in Erlang/Elixir in the same way as in FBP languages where the bounded-buffer channels provide implicit backpressure by blocking the sends on the outport they are connected to when the buffer is full."

This is one thing that I've learned from using both Go and Erlang now... I like asynchronous message passing as a better default, but if I had to choose between the core language supporting either synchronous message passing with Go-like channels or async message passing like Erlang, I'll take the Go case, because it's way easier to implement async on top of what Go provides than it is to implement sync on top of what Erlang provides. And while sync may be the exception, when you want it, you really want it, and Erlang doesn't offer anything for it that I can see, at least at user level.

As the phrasing above may suggest, though, what I'd like is both. Of course sync can only be used in the local OS process and async works across networks... that's just something you'd have to work with as a fundamental constraint. That's life. Even with a conventional manifest type system (i.e., C++/Java), if you treat the two types of communication as different types, you won't mix the two up.

If were going to try to do flow-based with Erlang, I'd look at moving code to the data rather than the other way around. That has its own negative tradeoffs too, but I think in the end they'd be less bad than trying to send data through Erlang's system. As good as Erlang is at message passing, it's very not designed for pushing lots of data repeatedly through the messaging system. If you use granular flow constructs than involve pushing data through a couple dozen flow elements and you model those as processes, you're going to get hammered on all the copying that implies. (For anything other than binaries, but what kind of flows do you have that involve sending a binary through it such than none of the flow elements modify it?) Though I'd say in the end, even with moving code around, the conclusion stands and Erlang isn't really good at flow-based itself.

What do you mean by sync being hard to implement in Erlang? Simple synchronous requests can easily be implemented:

    Server ! {request, Request},
        {reply, Reply} -> {ok, Reply}
This is also the default way when using the built-in gen_server library.

In Go, sending a message on a channel is actually a sync point between the sender and the receiver; a channel blocks until it has both. (Buffered channels may not necessarily block, but you have to treat them as if they do anyhow in general.) Thus, the send itself is synchronous, not just the fact that we're "waiting until we get an answer". If the sender proceeds past the line that sends on a channel, we know a receiver has received it.

I don't know of an efficient way to emulate this in Erlang. Suppose you have a pool of 10 undistinguished server processes in Erlang and you just want "a" process to answer. If you use Go channels, you have all 10 listen on a channel, and you automatically get the behavior where as long as one is available, that one will pick up the request. You also automatically get backpressure; if none are available, the attempt to send will hang. (There are ways of turning that into a timeout if you need to.) In Erlang, you could try to create an 11th coordinator process, but you'll create a lot of additional traffic and take a latency hit on every request. (Plus, some of the naive ways you might implement that will get you in further performance trouble with mailbox scanning. This is going to be a non-trivial thing to build correctly.) Or you could just randomly pick and send, but if you pick one that's currently answering a slow request while others are available, you get spurious latency.

It hasn't been a killer in my system, but it's been an annoyance.

And to be clear, let me reiterate that having using both systems, I actually still like Erlang's behavior as a better default. It's really hard to deadlock Erlang... it's much easier to deadlock Go. The async message passing is, in my experience, much easier to use correctly without much thinking. But it would be nice to also have a "channel" in Erlang, despite the limitations it would have to have with not being able to go across nodes. In fact, technically speaking, subject to that limitation (and there are already a couple of other functions limited to the current node), there's no reason I could see why this couldn't be added to Erlang.

See for instance: https://github.com/inaka/worker_pool , particularly the section discussion "strategy", and the discussion about "available_worker" and its discussion on performance implications, and its specific references to the other tricky edge cases to consider. Of course, programming languages being programming languages, someone else can do the work and you can just use it, but in Go, this behavior is quite trivial. (By contrast, in Go, proper asynchronous messaging in the Erlang style is a challenge. Whacking together a "slice" and some "messages" isn't that hard, but you've got other edge cases to consider... to say nothing of network transparency!)

I was not the who asked but thanks for the reply jerf, indeed I thought you had in mind more complex cases than the GenServer one! It is not that sync is hard in Erlang/Elixir but once you want to involve multiple processes, it may be the case.

For example, in pipeline parallelism, we don't want to be completely sync between the stages, because it would mean you would always have at most N "events" in the pipeline at any moment (where N is the number of stages). If you want to have some sort of bound, you need to devise a message protocol between the stages. That's what the folks behind Reactive Streams are doing with their back-pressure work.

This is exactly the kind of situations I want to make easier in future Elixir versions. I talked about them from the perspective of collections (and not FBP) in my Elixirconf keynote: https://youtu.be/EaP0y4pdKD0?t=1846. We are experimenting with the addition of a GenRouter that would specify how processes communicate (with back-pressure) eventually supporting custom dispatch rules. We are also looking into projects like Basho's sidejob (https://github.com/basho/sidejob) and Ulf's jobs (https://github.com/uwiger/jobs) to ensure we can also provide sampling based routing.

On top of that, we need to discuss data-based parallelism (where we split/spread the data once instead of passing it around) and how it would all fit with supervision trees. It is a lot of work and we likely won't be able to solve all cases but I am very excited about the possibilities if we get it right.

I would encourage you to consider whether there may be more useful primitives that the Erlang VM could support for this case, if you aren't doing that already, which I can't think of how to quickly check for so apologies if I'm late to that party. Hopefully you'll have more pull on the matter than Some Schmoe like me would. :) A lot of these things strike me as either inefficient or even infeasible to emulate via strict user-level Erlang-style message passing, but are really easy with some well-chosen primitives added in, even if they are only exposed very carefully at the Elixir and Erlang levels.

Some of this is personal development bias, I have to admit, in that I don't really "believe" in APIs that try to abstract away the difference between local and network traffic. Making the difference small may be advantageous, but I actually don't like it when it's reduced to zero. Usually, of course, a language goes the direction of making local traffic easy and network traffic unbelievably hard, Erlang makes network traffic easy, but then makes it somewhat more difficult to know whether you're going across the network or not. That's really useful for a lot of things, but sometimes if you're willing to agree that you won't need the network stuff you can get some big wins; in Erlang-land, for instance, consider the way larger binaries are shared instead of copied, an optimization that works locally and is mostly meaningless across a network.

> I would encourage you to consider whether there may be more useful primitives that the Erlang VM could support for this case, if you aren't doing that already

It is a chicken-egg issue though. Before proposing anything like this to the OTP team, I need to prove it is usefulĀ and have folks build something relevant with it. So I need to make the best with the abstractions we have today and then make a case.

For this reason, a lot of the challenge will be in optimizing the "topologies" to avoid copying as much as possible. Data-based algorithms for parallelism (farm, pmap, etc) will likely be more useful and that would be the next milestone.

> Some of this is personal development bias, I have to admit, in that I don't really "believe" in APIs that try to abstract away the difference between local and network traffic.

That's a very good point. I was also warned by Konrad from the Akka team that, if we rely on ack messages for back-pressure, they likely won't work on a distributed setup as messages have no delivery guarantees. This means we wouldn't be able to abstract the network anyway.

I have no experience with either Go or Erlang but if you have blocking reads, it doesn't seem too complicated to get an efficient blocking write to a multiplexed channel.

I guess I should be a little more clear on my assumptions going into this, chiefly that "you can send enough information to get a worker to non-blocking-send an acknowledgement back to you telling you that they accepted the workload." If you can't do that then I'm not sure what would happen, and that seems like a legitimate language problem. Call this ACK-mechanism a "private channel", say.

A blocking write consists of opening a private channel, issuing a write to the multiplexed channel that contains some sort of "address" for your private channel, and then issuing a blocking read on the private channel. To do this, workers need to be designed so that the first thing they do when they pick up a work order involves sending the ACK containing the ID from the work order and the ID of the worker. Then, when there aren't enough workers to pick up your request immediately, you get to block on your private read until someone picks up your workload and messages you, "hey, I'm just getting around to this...".

Maybe a framework like this one: http://chrisb.host.cs.st-andrews.ac.uk/skel-test-master/tuto... could help with this?

What in particular about gen_server:call is not working for you?

All I can really do in reply to this is ask you to reread the post I wrote more carefully. If you have a more specific clarification, let me know.

I read it, I'm asking pragmatically; although in theory using worker pools can have problems at the edges, in practice erlang's message processing speed is quite high, and, e.g., worker_pool in production at many thousands of messages per second is working great for me. The latency of a single message pass is submillisecond. You're having problems, despite that?

I covered that. If you have high variance in how long message processing takes, which is very easy to do accidentally, then if you something simple like "random" you can easily block up the message latency. (Of course, "don't do that". Easier said than done.)

If you're at a scale at which you're seriously using Erlang, "fast" isn't really a concept that applies anymore. You really end up with "enough performance" or "not enough performance", with an at-times surprisingly sharp transition from one to the other, and while the base performance of the underlying language is certainly a major factor in whether you have "enough", it isn't the only factor, or even necessarily the most important. (Which is good for Erlang, because it's actually quite slow at most conventional things.) It doesn't matter if I've got "sub-millisecond message processing" if I'm trying to run more messages through a box than it can handle, or if I've got an external response handler that sometimes clogs up.

There's no way that Erlang can possibly be "fast" enough that somebody won't throw a task at it that's still a performance challenge.

Yes, you don't get sync for free when using the ! operator in Erlang, but that's why you use the OTP constructs like handle_call (sync) and handle_cast (async) to model that sort of behavior...

Throwing the ! operator around haphazardly can be useful, but generally it's a code smell, as things can usually be encapsulated more effectively using OTP constructs.

Edit: saw you addressed this in another comment.

That's an interesting way to put it! And yes, I agree one would like both, optimally.

Btw, Robert Aloi linked some kind of simpler back-pressure implementation available already today in Erlang, on G+: https://plus.google.com/u/0/+SamuelLampa/posts/Q1XLTpk4KQC

... though as he says, might not be sufficient for whats typically needed in flow-based, or CSP-style programs.

And yea, I very loosely treat Go as more or less a Flow-based programming system ... especially after I realized one can write more or less FBP network semantics in vanilla, framework-less Go [1].

[1] http://blog.gopheracademy.com/composable-pipelines-pattern


Curious. Have you used systems that provide fibres ? The idea is that blocks of code exchange control whenever a fibre is written to or read through. I have not used Erlang or Go so am far from being an expert in these things.

You might find this interesting http://felix-lang.org/share/src/web/tut/fibres_index.fdoc the site is undergoing reorganization so expect (and pardon) broken links. BTW I am not the one who develops Felix.

I wonder if fibres are not quite very similar to the CSP (communicating sequential processes) paradigm of Go..

Indeed they are. Compared to Felix, Go is quite new though.

Dealing with back pressure in general is a huge subject. How do you deal with a producer that produces more than a consumer can handle? I had just finished reading an excellent interview with the collaborators of the Reactive Streams API that talked about wrestling with this:


Indeed! I have been following the efforts behind Reactive Streams with a special focus on how they are tackling back-pressure.

An interesting approach is join calculus. This tutorial builds on the same "chemical" metaphor used in the article


Thanks, will check this!

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact