
Parallel Map in Elixir - scandox
http://www.selectedintelligence.com/post/116327140769/parallel-map-in-elixir
======
rdtsc
Elixir is great. It is built on an awesome industrial-grade, battle-tested VM
with really unique features like lightweight but memory-safe processes. It has
a friendly syntax (although I personally like Erlang's syntax).

Another great thing about Elixir is the community -- it is helpful, friendly,
and oriented toward learning and newcomers. Also, Jose Valim is really an
incredible guy.

~~~
davidw
I'm really surprised at how _fast_ the Elixir folks are moving. They're
building all kinds of stuff that took Erlang a long time to get. First and
foremost the Phoenix web framework.

~~~
yellowapple
It helps that Erlang already did the heavy lifting for us Elixir folks (or
what heavy lifting _wasn't_ already done was then done by Valim and friends
in Elixir, Plug, Ecto, etc.). A pretty hefty amount of Elixir code (including
most frameworks, like Phoenix, Sugar, Weber, etc.) leverages the same old
Erlang/OTP, just with some funny-looking (in some cases better, in some cases
perhaps worse, depending on perspective) syntax and Plug. Even the dominant
web server in the Elixir realm - Cowboy - is actually entirely written in
Erlang. Compatibility with existing Erlang code (and vice versa) makes for
fewer wheels being reinvented.

It also helps that Elixir came about right around the time when folks were
starting to realize that this old telecom language called "Erlang" might
actually be precisely what we need for modern web development, hence why web
frameworks ended up evolving pretty quickly relative to Erlang (which also saw
a growth in available web frameworks at around the same time; Chicago Boss is
a particularly excellent example).

------
RootDynasty
Here is another cool version of parallel map in Erlang that uses a fold
instead of two parallel maps.

    
    
      %% Sets up a chain: each worker maps its element, waits for the
      %% already-mapped suffix from the worker spawned after it, and sends
      %% the extended list to the worker spawned before it (or the caller).
      parallel_map(Fun, List) ->
          Last = lists:foldl(fun(Value, Parent) ->
              spawn(fun() ->
                  MappedValue = Fun(Value),
                  receive
                      Rest -> Parent ! [MappedValue|Rest]
                  end
              end)
          end, self(), List),
          Last ! [],  % kick off the chain from the tail
          receive
              Result -> Result
          end.
    

It essentially sets up a chain of processes which pass their results to their
parent when they finish their mapping operation.

~~~
mononcqc
This implementation relies on a lot of memory copying.
Rather than copying the original element to the new process, then the new
mapped element back to the original process (giving you the original list's
worth of memory copied twice), you'll be copying each list element once to
every worker you spawn, and then you'll copy the parts of the list worked on
to the next worker, on and on, until the full list is finally copied back to
the caller.

So instead of copying 2n elements over the processes, you'll be copying
roughly 2n log n (or is it n(n+1)/2?) elements around.

~~~
RootDynasty
One thing to note is that the parallel map consisting of two mapping
operations has a different hidden overhead. If a message is received by a
process that does not pattern match with any clauses in the receive block, the
message is stored in a queue. When a new receive block is entered, all
messages in the queue are pattern matched against the new receive block. In
the worst case scenario, the worker process mapping the elements will finish
in list order, so a great many pattern matches will be tried.

The solution to this problem is just to store messages as they are received
inside a map data structure, so that there is no overhead on receives. This
requires indexing the list, which makes the code a lot more inelegant.
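A minimal Elixir sketch of that map-based variant (module and helper names
are my own, not from the article): each message carries the element's index,
every receive accepts whatever arrives first, and the ordered list is rebuilt
from the map at the end.

```elixir
defmodule IndexedPmap do
  # Sketch: tag each result with its index so the collector can accept
  # messages in any arrival order (no selective receive) and still
  # reconstruct the original ordering from a map at the end.
  def pmap([], _fun), do: []

  def pmap(list, fun) do
    me = self()

    list
    |> Enum.with_index()
    |> Enum.each(fn {elem, i} ->
      spawn(fn -> send(me, {i, fun.(elem)}) end)
    end)

    # One receive per element; the first message in the mailbox is taken.
    results =
      Enum.reduce(list, %{}, fn _, acc ->
        receive do
          {i, value} -> Map.put(acc, i, value)
        end
      end)

    Enum.map(0..(length(list) - 1), &Map.fetch!(results, &1))
  end
end
```

As the comment says, carrying the index around makes this noticeably less
elegant than the two-map version.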

Edit: Given n processors and an input of size n, I believe the time
complexities are:

    Two map solution: O(n^2) + O(f)
    Fold solution: O(n^2) + O(f)
    Two map with map data structure: O(n) + O(f)

where O(f) is the asymptotic upper bound of the function being mapped.

Edit 2: This has me thinking about what will be the most efficient way to
assign an element from the list to worker processes. In Erlang, it's clear
that there isn't a way to avoid the O(n) overhead since the original process
must reconstruct the list in order using cons.

In an imperative language this isn't necessary. Perhaps the original process
can recursively assign indices by spawning two child worker processes, each
given a range of indices to work on (which then create their own assignment
processes). I believe the overhead is then only O(log n)...
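That tree-shaped assignment could look roughly like this in Elixir (a
speculative sketch; all names are mine): each spawner halves its chunk of
indexed elements and hands the halves to two child spawners, so the spawning
work is spread across processes with O(log n) depth.

```elixir
defmodule TreePmap do
  def pmap([], _fun), do: []

  def pmap(list, fun) do
    me = self()
    # Root of the assignment tree; it performs at most one split itself.
    spawn(fn -> assign(me, fun, Enum.with_index(list)) end)

    # Collect one tagged result per element, in whatever order they arrive.
    results =
      Enum.reduce(list, %{}, fn _, acc ->
        receive do
          {i, value} -> Map.put(acc, i, value)
        end
      end)

    Enum.map(0..(length(list) - 1), &Map.fetch!(results, &1))
  end

  # A chunk of one element does the actual mapping ...
  defp assign(collector, fun, [{elem, i}]), do: send(collector, {i, fun.(elem)})

  # ... while larger chunks split in two and delegate to child spawners.
  defp assign(collector, fun, pairs) do
    {left, right} = Enum.split(pairs, div(length(pairs), 2))
    spawn(fn -> assign(collector, fun, left) end)
    spawn(fn -> assign(collector, fun, right) end)
  end
end
```

Note the caller still pays O(n) to reconstruct the list at the end, so the
saving is only in how the spawning work is distributed.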

~~~
lostcolony
Just to comment, you're worrying about big O time complexities and
that...really isn't what is going to dominate anything that is sufficiently
complex to warrant a pmap rather than just a map. The constants on your
computations will almost assuredly dwarf it.

The only O(n) operations are, yes, a completely degenerate case when messages
get sent back (which is -incredibly- unlikely; with Erlang's task scheduling
you're likely only ever going to have a max message queue length of a few
items, so it's more likely a constant factor. To get the degenerate case the
workers would need to finish in -reverse- list order, that is, the last one
first, then the next to last, and so on), and reversing the list(s) built up
from the map at the end (as under the covers I'm pretty sure map is written
to be tail recursive), which, while technically O(n), is still incredibly
fast.

~~~
RootDynasty
I agree that worrying about the time complexity of the non-parallel portions
of pmap is unlikely to be an issue for most use cases. It's still interesting
to think about the tradeoffs though.

Hitting the degenerate case depends on the function in question. It's quite
possible that the tasks will complete in the given order.
I think you're giving too much credit to Erlang's task scheduler.

Also I'm not sure how one would even implement a tail recursive map function
on a singly linked list. The cons operation can only add elements to the front
of the list. I looked up how the map operation is implemented in Erlang. It
isn't tail recursive:

[https://github.com/erlang/otp/blob/172e812c491680fbb175f56f7...](https://github.com/erlang/otp/blob/172e812c491680fbb175f56f7604d4098cdc9de4/lib/stdlib/src/lists.erl#L1236)

I'm interested to know how you'd implement a tail-recursive version of map
(continuations aren't allowed).

~~~
lostcolony
Interesting that it's not; I would have expected it to be. Given that, though,
it's spawning tasks in list order, then receiving in list order; if the tasks
complete in the order you spawned them, the first thing in the queue/arriving
is always the item you're receiving on. That's the ideal case; more likely
they'd be nearly in the order you spawned them, in which case you'd only have
a few items to check through before you found the one you're receiving on.

I'm not saying the task scheduler is perfect, but I'd be really, really
weirded out if it gave priority to the final process spawned, and worked its
way backwards, which would be necessary for the degenerate case (that is, we
spawned off items 1,2,3,4 in that order, but they completed 4,3,2,1. I would
expect them to finish in close to 1,2,3,4 order, which would leave it at O(1)
on each receive).

I'd implement a tail recursive map as -

    
    
      map(F, L) -> map(F, L, []).
    
      map(_, [], Acc) -> lists:reverse(Acc);
      map(F, [H|T], Acc) -> map(F, T, [F(H) | Acc]).

------
wz1000
Here is a parallel map in Haskell

    
    
        pmap f xs = map f xs `using` parList rdeepseq
    

The cool thing about this is that `map` is just the regular map function.
However, as Haskell is lazy, `map f xs` returns a thunk and is not immediately
evaluated. `using` allows you to evaluate the resultant thunk using any
"evaluation strategy", and `parList rdeepseq` describes a strategy for
evaluating lists in parallel!

Also, like the Erlang VM, GHC has fast, lightweight "green threads".

Edit: Another way to write it would be

    
    
        pmap _ [] = []
        pmap f (x:xs) = runEval $ do
            x' <- rpar (f x)
            xs' <- rpar (pmap f xs)
            return (x' : xs')

------
sirclueless
Is this piece of code idiomatic Elixir/Erlang?

    
    
        receive do { ^pid, result } -> result end
    

Without looking up what the "receive" procedure does, it suggests that the
current ("main"?) process is going to block waiting for an incoming message
that matches the pattern. Done in a loop as I presume the Enum.map function
does, it looks like the main process might be blocked waiting for the matching
message while some internal mailbox fills up. In the worst case I would expect
this to have an O(n^2) running time unless there is something fancy going on
when pattern matching against a full mailbox.

Please correct me if I'm wrong. This looks elegant, but I would expect it to
be considerably less performant than, say, the equivalent Go program which
would issue a select on a channel of incoming messages, thus only blocking
until _any_ message is received, rather than a specific message in series.
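For reference, the pmap under discussion is presumably something close to
this sketch (reconstructed around the receive quoted above; not necessarily
the article's exact code):

```elixir
defmodule Parallel do
  def pmap(collection, fun) do
    me = self()

    collection
    # First map: spawn one worker per element; each sends {its_pid, result}.
    |> Enum.map(fn elem ->
      spawn(fn -> send(me, {self(), fun.(elem)}) end)
    end)
    # Second map: selectively receive each worker's result, in spawn order.
    |> Enum.map(fn pid ->
      receive do
        {^pid, result} -> result
      end
    end)
  end
end
```

The ^pid pin is exactly what forces the in-order selective receive being
debated here.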

~~~
johnzim
Not the case here, my friend.

From the horse's mouth:

"Each process has its own input queue for messages it receives. New messages
received are put at the end of the queue. When a process executes a receive,
the first message in the queue is matched against the first pattern in the
receive, if this matches, the message is removed from the queue and the
actions corresponding to the pattern are executed.

However, if the first pattern does not match, the second pattern is tested, if
this matches the message is removed from the queue and the actions
corresponding to the second pattern are executed. If the second pattern does
not match, the third is tried and so on until there are no more patterns to
test. If there are no more patterns to test, the first message is kept in the
queue and we try the second message instead. If this matches any pattern, the
appropriate actions are executed and the second message is removed from the
queue (keeping the first message and any other messages in the queue). If the
second message does not match we try the third message and so on until we
reach the end of the queue. If we reach the end of the queue, the process
blocks (stops execution) and waits until a new message is received and this
procedure is repeated."

The Erlang VM accomplishes this by providing computation credits (reductions)
to each process, which are spent to perform actions, ensuring the complete,
non-blocking, parallel system that guzzles information at a startling rate
all over the globe as we type this :)

~~~
thyrsus
Please correct me, but as best I can tell, Enum.map will, sequentially, start
the first receive, and that receive will be matched against every incoming
message until ^pid matches, adding its result as the first member of the
collection while the unmatched messages accumulate in the mailbox. Then it
will do the second receive, which will scan the mailbox for its ^pid and
either find it or wait until the matching message arrives, adding its result
as the second member of the collection; then it will do the third receive,
and so on. Assuming a random distribution of message arrivals, each receive
will on average have to scan a mailbox on the order of n/2 messages long, and
there are n receives, i.e., O(n^2). There might be some way to reduce the
mailbox scanning time to O(1), but my shallow dive into the source code
didn't show it.

~~~
SEMW
Since the reason for the 'waiting for the message with the right order to
come in' behaviour is the use of ^pid, presumably you could get rid of the
mailbox scanning time by getting rid of the caret (matching any message), so
the mailbox never holds more than one message.

This means the results fill the list in the order they arrive rather than in
the original order, but if order is important, you can always pass around an
index and sort by it after the fact.
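A sketch of that caret-free approach (names are mine): every receive takes
the first message in the mailbox, and the carried index restores the order
afterwards.

```elixir
defmodule UnorderedPmap do
  def pmap(list, fun) do
    me = self()

    # Spawn one worker per element, tagging each result with its index.
    list
    |> Enum.with_index()
    |> Enum.each(fn {elem, i} ->
      spawn(fn -> send(me, {i, fun.(elem)}) end)
    end)

    # No caret: each receive matches whatever message arrives first, so
    # the mailbox never queues up behind an unmatched message.
    list
    |> Enum.map(fn _ ->
      receive do
        {i, result} -> {i, result}
      end
    end)
    |> Enum.sort_by(fn {i, _result} -> i end)
    |> Enum.map(fn {_i, result} -> result end)
  end
end
```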

------
davidw
Erlang actually has this in the rpc module, in the standard library:

[http://erlang.org/doc/man/rpc.html#pmap-3](http://erlang.org/doc/man/rpc.html#pmap-3)

So you could call that from Elixir if you wanted.

~~~
phamilton
One difference is that the Enum library works on anything that implements
Enumerable. rpc.pmap only works on lists.

------
rvirding
They don't handle errors very well.

------
Aarvay
When choosing to apply parallelism, the deciding factor should never be the
length of the collection but the computational intensity of the function
being applied. There's considerable overhead in spawning processes and
collecting the results from them.

~~~
ismaelga
There's a considerable overhead unless they are _lightweight_ processes.

------
psylence519
Seems silly to use the pipe to feed the collection into the first map call.
Otherwise pretty nifty, I’ve been enjoying learning Elixir the last few
months.

------
xfalcox
Any recommendations on Elixir learning resources?

~~~
scandox
Dave Thomas' book is an easy read: [https://pragprog.com/book/elixir/programming-elixir](https://pragprog.com/book/elixir/programming-elixir)

------
Ono-Sendai
Looks somewhat ugly and also slow. (each element is processed in a separate
process)

~~~
bismark
These are Erlang VM processes which are actually quite fast.

Re ugly: well, eye of the beholder I suppose.

~~~
Ono-Sendai
Here's a map in my functional language (winter). It's not parallel yet, but
that shouldn't affect the syntax. (map is already manifestly parallel)

    def square(float x) float : x*x
    def main(array<float, 256> a) array<float, 256> : map(square, a)

~~~
nine_k
Now make it efficiently parallel and show us the implementation. Then we could
compare it with the Elixir snippet, which does just that.

~~~
DoggettCK
Not just efficiently parallel, but easily distributable to multiple nodes.

~~~
scandox
Exactly. With a tiny change you can be spawning those processes to any number
of other nodes.

