
Google's Introduction to Parallel Programming and MapReduce - ahalan
http://code.google.com/edu/parallel/mapreduce-tutorial.html
======
gruseom
So the map worker saves map results to its local disk, and eventually a reduce
worker does an RPC call to copy that data over the network so it can perform
the reduce. What is the advantage of doing this over having the map worker do
the reduce part itself?

~~~
moultano
The mapper only has a small fraction of the data for each reduce key.
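
Concretely, each mapper buckets its output by key, so any one key's values
are spread across every map worker until the shuffle gathers them. A rough
Python sketch of the standard hash partitioning (illustrative only, not
Google's actual code):

    import zlib

    def partition(key, num_reducers):
        # Decide which reduce worker owns this key. A real system needs
        # a hash that is stable across machines; Python's built-in
        # hash() is salted per process, so use crc32 here.
        return zlib.crc32(key.encode("utf-8")) % num_reducers

    # One mapper's output holds only a fraction of each key's values;
    # the shuffle collects bucket r from *every* mapper for reducer r.
    R = 4
    mapper_output = [("the", 1), ("quick", 1), ("the", 1)]
    buckets = {r: [] for r in range(R)}
    for key, value in mapper_output:
        buckets[partition(key, R)].append((key, value))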

~~~
gruseom
Couldn't it at least perform the reduce on the portion it does have? That
is, transfer partial reductions from M to R instead of the raw map results?
Presumably there are many problems where this would be a win.

~~~
moultano
Yes, that's sometimes an option, but very often the reduce isn't associative.
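
When the reduce is associative (word count being the textbook case), that
map-side partial reduce is exactly what Hadoop calls a combiner. A rough
sketch of both cases in Python:

    from collections import Counter

    def combine(mapper_output):
        # Map-side partial reduce for an associative op (here, sum):
        # ship one count per key instead of every raw pair.
        partial = Counter()
        for key, value in mapper_output:
            partial[key] += value
        return sorted(partial.items())

    combine([("the", 1), ("quick", 1), ("the", 1)])
    # -> [("quick", 1), ("the", 2)]  -- less data to shuffle.

    # A non-associative reduce, say the median of each key's values,
    # can't be split up this way: the median of two partial medians is
    # not the median of the combined data, so the reducer must see
    # every raw value for the key.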

~~~
gruseom
Ah, so that's the reason. Do you ever have the problem that the reduce
computation, for a given key, is too big for one reduce worker and itself
needs to be parallelized? Or does this boil down to the same thing as my
previous question - i.e. if the reduce computation isn't associative, you
couldn't parallelize it anyway, and if it is then you can just do partial
reductions on M.

~~~
moultano
Yes, that is often a problem, and it's probably my biggest criticism of
MapReduce as a framework. Extreme reduce keys often have to be handled
differently, in a way that depends on the nature of the problem. MapReduce
doesn't provide great tools for handling it, though I'm not even sure what
abstraction you could build that would be comprehensive enough.

------
oniTony
Concurrency (single CPU context switching) is "easy". Parallel programming
(multiple tasks on multiple CPUs) is _hard_. I'm currently studying the
internals of parallel programming, and I'm amazed by how much magic MapReduce
abstracts away.

~~~
barrkel
I don't agree with your terminology. Concurrency is having more than one task
in flight; it's inherently non-deterministic when the tasks can interact via
shared resources. Parallelism is having more than one operation happening
simultaneously; it can be a way of implementing concurrency, but it is not
necessarily non-deterministic, depending on how it is exposed in the model.
And indeed, map-reduce with pure functions is a way of using deterministic
parallelism.

Concurrency is a high-level concept; it comes in at the architectural layer.
Servers open to multiple clients, where those clients ask the server to
operate on shared mutable memory, are non-deterministic; small differences
in timing make all the difference.

Parallelism is an implementation-level concept. Depending on how it is put to
use, it can be merely a way of speeding up deterministic computations; or it
can be directly harnessed to implement concurrency.
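
For what it's worth, a minimal Python illustration of that last point:
the worker scheduling varies run to run, but because the function is pure
and Pool.map returns results in input order, the output never does.

    from multiprocessing import Pool

    def square(x):
        # Pure function: no shared mutable state, so no races.
        return x * x

    if __name__ == "__main__":
        with Pool(4) as pool:
            # The four workers interleave unpredictably, yet the result
            # is identical on every run: deterministic parallelism.
            print(pool.map(square, range(10)))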

~~~
oniTony
Right. Those are stricter definitions.

Although wouldn't parallelism's multiple operations each be "in-flight"? That
is, it's pretty clear that one can have concurrency without parallelism, but
you seem to suggest that one can have parallelism without concurrency
("deterministic parallelism"). Which doesn't _sound_ right... Even with
MapReduce, the order in which tasks complete is not deterministic (different
hardware, network latency, etc.), so I don't see how you could determine the
order in which mapper output is passed on to reducers.

~~~
0x12
SIMD is an example of parallelism without concurrency.
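
From Python, a NumPy element-wise op is a reasonable stand-in for the idea:
one thread, no tasks to interleave, yet typical NumPy builds lower the loop
to SIMD instructions that process several lanes at once. (Whether SIMD is
actually emitted depends on the build and CPU, so treat this as illustrative.)

    import numpy as np

    a = np.arange(1_000_000, dtype=np.float32)
    b = np.ones(1_000_000, dtype=np.float32)

    # Single thread, single logical operation, nothing concurrent --
    # but the underlying C loop is typically vectorized, adding
    # several float32 lanes with each SIMD instruction.
    c = a + b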

~~~
oniTony
My RISC-architected world has been shattered. Neat example.

------
mark_l_watson
I have found the best resource for MapReduce algorithms to be "Data-Intensive
Text Processing with MapReduce" by Jimmy Lin and Chris Dyer - a short and very
useful book that characterizes MapReduce problems and their solutions.

------
gruseom
The article says that Fibonacci can't be parallelized. But I seem to recall
that there is a true data-parallel way of doing Fibonacci - one of those
virtuoso tricks where something that seems intrinsically sequential gets
transformed into a parallel computation. Does anyone know what I'm talking
about?

Edit: to be clear, I don't mean the obvious but useless trick where you can
compute F(n-1) and F(n-2) recursively in parallel, which is just redoing most
of the work. I mean a way to model the problem as operations on data-parallel
vectors.

~~~
defen
[1 1; 1 0]^n = [F(n+1) F(n); F(n) F(n-1)]

So you could rewrite n as a sum of m powers of 2 (just write it in binary),
then for each "1" there, send it to a separate processor to compute [1 1; 1
0]^k using exponentiation by squaring (where k is the corresponding power of
2). Your reduce step multiplies the resulting matrices together.

Note: this is just the first thing that comes to mind, I have no idea if this
is what you were thinking of, or if it's even a good idea.
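
For concreteness, a plain-Python sketch of that scheme (sequential here;
the list comprehension below is the part you'd farm out, one matrix power
per set bit):

    def mat_mul(a, b):
        # 2x2 integer matrix product.
        return [[a[0][0]*b[0][0] + a[0][1]*b[1][0],
                 a[0][0]*b[0][1] + a[0][1]*b[1][1]],
                [a[1][0]*b[0][0] + a[1][1]*b[1][0],
                 a[1][0]*b[0][1] + a[1][1]*b[1][1]]]

    def mat_pow_2k(m, k):
        # m^(2^k) via k repeated squarings.
        for _ in range(k):
            m = mat_mul(m, m)
        return m

    def fib(n):
        Q = [[1, 1], [1, 0]]
        # "Map": one task per set bit of n computes Q^(2^k).
        partials = [mat_pow_2k(Q, k)
                    for k in range(n.bit_length()) if (n >> k) & 1]
        # "Reduce": multiply the partials together. Powers of the same
        # matrix commute, so the combining order doesn't matter.
        result = [[1, 0], [0, 1]]
        for p in partials:
            result = mat_mul(result, p)
        return result[0][1]   # Q^n = [F(n+1) F(n); F(n) F(n-1)]

    assert [fib(i) for i in range(1, 8)] == [1, 1, 2, 3, 5, 8, 13]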

~~~
SamReidHughes
Since computing [1 1; 1 0]^(2k) where k is a power of 2 would involve
computing [1 1; 1 0]^k first, you'll end up reproducing the exact same work on
all your machines, and you might as well just use the one.

------
NickBomb
Cool, I actually attended a talk about MapReduce hosted by Google a few weeks
ago. Sadly, I have to say that this page explains it better than their
engineer did.

------
thy
doodlin' <http://wonderfl.net/c/zzQD>

