
MapReduce in Erlang - 13ren
http://weblambdazero.blogspot.com/2008/08/mapreduce-in-erlang.html
======
KirinDave
Erlang is not a new language, but its ability to do share-nothing distributed
computation has only become really useful and topical in the last few years,
as computing power and network presence have become increasingly commoditized.

If you're a software engineer who works below the UI layer and you don't know
Erlang yet, odds are you're cheating yourself and doing a lot more work than
you have to for the same distributed effects.

What's also very refreshing about learning Erlang is that it is already _very_
mature and reliability is a feature built into the runtime over years of
practical experience. Unlike languages chosen for convenience or readability
(e.g., Ruby and Python, respectively), Erlang has an implementation so solid
that you can give it the kind of trust you give, say, your C++ compiler (not
absolute, but definitely higher than you'd assign to the Ruby or Python
interpreters).

If we as software engineers are lucky, over the next 10 years Erlang will
become as common as C++ is now.

~~~
ajross
_Erlang [...] is already very mature and reliability is a feature built into
the runtime over years of practical experience._

According to Wikipedia, it didn't support scaling to multiple physical
CPUs/cores until 2006. I'm sure it's reliable. I just think you might be
overstating the case here.

~~~
a-priori
Yes, Erlang only got SMP support in 2006. Before then, you had to start one
node per CPU to get a similar effect.

But regardless, I fail to see how this implies it's not a reliable platform.

------
jmtulloss
Some of these facts are misleading. He refers to a Java implementation of
MapReduce developed by Google. I believe Google's implementation is in C/C++
and has never been released to the public. The Java implementation comes from
the Hadoop project, which is mostly sponsored by Yahoo.

While that doesn't really take away from the article that much, it adds to the
rushed feeling and my impression that this guy is not much of an authority on
the topic.

~~~
globalrev
You are correct.

Google has released a paper (PDF) about it here:
<http://labs.google.com/papers/mapreduce.html>

------
jlouis
Erlang may be fine for some things, but notice that _parallel_ computation
isn't one of them yet. It is nice for concurrency, but the interpreter is
pretty slow. Running the pmap example on my machine takes 80 seconds; a
naively implemented OCaml version takes 7.

This tells you that Erlang currently has a performance gap of roughly 10x to
close before it becomes viable. The next problem is that not everything is
MapReducible. Note that the current problem is trivially parallelizable, so it
gives you about the _best_ speedup you might expect from a program. In general,
you won't be this lucky.

Erlang is mostly interesting because of its concurrent abilities, not for its
abilities with parallel computation.
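The article's pmap code isn't reproduced in this thread. As a rough illustration of why a job like this is "trivially parallelizable", here is a minimal parallel map sketch in Erlang: one process per list element, with results collected in the original order. The module name `pmap_sketch` is hypothetical, and there is no error handling, so if `F` crashes inside a worker the caller waits forever.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

%% Apply F to every element of List, each in its own process.
pmap(F, List) ->
    Parent = self(),
    Refs = [spawn_worker(Parent, F, X) || X <- List],
    %% Receive by Ref rather than by arrival order, so the results
    %% line up with the elements of List.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Start one worker that sends {Ref, F(X)} back to Parent.
spawn_worker(Parent, F, X) ->
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, F(X)} end),
    Ref.
```

For example, `pmap_sketch:pmap(fun(X) -> X * X end, [1, 2, 3])` returns `[1, 4, 9]`. The speedup only shows up when each `F(X)` does enough work to outweigh the cost of spawning a process and passing messages; for cheap functions a plain `lists:map/2` is likely faster.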

~~~
KirinDave
It's mostly interesting because its concurrency, parallelization, and
distribution primitives are _the same thing_ semantically.

Speed is something we can work on; eventually everything can be compiled
natively and efficiently if there is enough interest. What's more important is
its ability to deploy flexibly to a variety of configurations. Your 8-thread
pipelined dataflow in Erlang could easily be moved to an 8-machine EC2
deployment with minimal code modifications. That's far more interesting than an
order of magnitude in performance, which could easily be addressed.
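To illustrate the claim that the local and distributed cases use the same primitives, here is a hedged variant of a parallel map that takes a list of node names and hands elements to them round-robin. It is a sketch, not production code: it assumes the nodes are already connected (e.g. via `net_adm:ping/1`) and that the same module is loaded on each of them, since a fun sent to another node refers to the code of its defining module. The module name `dist_pmap_sketch` is hypothetical.

```erlang
-module(dist_pmap_sketch).
-export([pmap/3]).

%% Like a local parallel map, but element I is sent to node
%% number (I rem length(Nodes)) + 1. spawn/2 takes a node name;
%% when that node is node() itself the process starts locally.
pmap(F, List, Nodes) when Nodes =/= [] ->
    Parent = self(),
    N = length(Nodes),
    Indexed = lists:zip(lists:seq(0, length(List) - 1), List),
    Refs = [start(Parent, F, X, lists:nth(I rem N + 1, Nodes))
            || {I, X} <- Indexed],
    %% Collect results in the original order, whichever node ran them.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

start(Parent, F, X, Node) ->
    Ref = make_ref(),
    spawn(Node, fun() -> Parent ! {Ref, F(X)} end),
    Ref.
```

With `[node()]` as the node list it runs entirely on the local node, so `dist_pmap_sketch:pmap(fun(X) -> X * X end, [1, 2, 3], [node()])` returns `[1, 4, 9]`; passing several connected nodes spreads the same work across machines without changing `F`.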

------
babo
It's quite a naive implementation: nice to blog about, but from a practical
point of view it's just noise.

~~~
KirinDave
Sure, but the point is to say, "This small body of code gives you a naive but
fully functional mapreduce." You have plenty of time to add all the trimmings
while other possible implementations are still getting off the ground and
being tested.
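For a sense of how small a naive but working MapReduce can be, here is a sketch in Erlang (not the article's code): mappers run in parallel, their `{Key, Value}` pairs are grouped by key, and a reduce function runs once per key. The module and function names are hypothetical, and `maps:update_with/4` assumes OTP 19 or later.

```erlang
-module(mapreduce_sketch).
-export([mapreduce/3, word_count/1]).

%% Map(Input) must return a list of {Key, Value} pairs;
%% Reduce(Key, Values) is called once per distinct key.
mapreduce(Map, Reduce, Inputs) ->
    Parent = self(),
    %% Map phase: one process per input.
    Refs = [spawn_mapper(Parent, Map, In) || In <- Inputs],
    Pairs = lists:append([receive {Ref, KVs} -> KVs end || Ref <- Refs]),
    %% Shuffle: collect all values that share a key.
    Grouped = lists:foldl(
                fun({K, V}, Acc) ->
                        maps:update_with(K, fun(Vs) -> [V | Vs] end, [V], Acc)
                end, #{}, Pairs),
    %% Reduce phase (sequential here), sorted by key for stable output.
    lists:sort([{K, Reduce(K, Vs)} || {K, Vs} <- maps:to_list(Grouped)]).

spawn_mapper(Parent, Map, Input) ->
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, Map(Input)} end),
    Ref.

%% Example job: word counts over a list of strings.
word_count(Docs) ->
    Map = fun(Doc) -> [{Word, 1} || Word <- string:tokens(Doc, " ")] end,
    Reduce = fun(_Word, Counts) -> lists:sum(Counts) end,
    mapreduce(Map, Reduce, Docs).
```

For instance, `mapreduce_sketch:word_count(["a b", "a"])` returns `[{"a",2},{"b",1}]`. A real MapReduce would also distribute the reduce phase and recover from failed workers; this sketch does neither, which is exactly the "trimmings" gap discussed above.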

~~~
babo
On my laptop, the Erlang code below calculates the first 40 Fibonacci numbers
in under 5 milliseconds, i.e. more than 2000 times faster than his measured
10.xxx seconds on an 8-core server. Erlang has an edge for sure, but to prove
it you need to implement those few lines properly and efficiently, and a
meaningful test is a must.

    
    
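      -module(fib).
      -export([ffib/1]).
    
      %% ffib(N) returns [F(0), ..., F(N)], so ffib(39) gives the first 40.
      ffib(0) -> [0];
      ffib(1) -> [0, 1];
      ffib(N) when N > 1 -> ffib([1, 1, 0], 2, N).
    
      %% Prev holds the sequence so far in reverse; C is the index of its head.
      ffib(R, N, N) -> lists:reverse(R);
      ffib([H1, H2 | _] = Prev, C, N) ->
          ffib([H1 + H2 | Prev], C + 1, N).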

