

Improving MapReduce with HashFold - sgk284
http://stevekrenzel.com/improving-mapreduce-with-hashfold

======
sgk284
Hey guys, I've been playing with this concept for a little while now and thought
this might be a good forum for discussing the idea.

I've got a small prototype that I used to solve a few problems (most notably:
<http://www.facebook.com/careers/puzzles.php?puzzle_id=8>).

Any criticisms are welcome. If you have any recommendations for presenting the
concepts more clearly, I'm open to those as well, since I'm not sure I did the
explanation justice.

~~~
rw
> "There is a lot more to this, but I'll stop there."

Go on with your explanation. I'm not grokking it yet.

~~~
sgk284
Does my response to jganetsk help at all?

~~~
sqs
For the record, I found the explanation clear enough to understand and
interesting, but I'm not qualified to assess HashFold w.r.t. MapReduce. Keep
writing and keep us (on HN) updated.

------
jganetsk
I'm very excited to see work like this. There's no reason to accept MapReduce
as the best type signature for structuring distributed, parallelizable
algorithms. Most of the justification is that it "feels right", and "just
works".

~~~
sgk284
Thanks jganetsk, words of encouragement are always appreciated.

------
anamax
> So I claimed that HashFold can be more memory efficient than MapReduce. I
> make this claim because MapReduce needs to store all of the key-value pairs
> generated by the mapper, whereas HashFold only needs to store one key-value
> pair at any given time (in addition to the hash-table).

Not so fast. Yes, MapReduce stores all "unapplied" key-value pairs at the
reducer. However, HashFold does as well, the big difference being that
HashFold will start applying pairs as it sees them. While that's a win on
associative functions, it's at best a tie on non-associative functions.

~~~
sgk284
Completely agree. I should have elaborated more on that, thanks for bringing
it up.

In many cases you can have a significantly lower memory profile. In the worst
case it's the same as MapReduce (as you said). I think having the additional
flexibility with memory, in addition to the simpler architecture and
performance attributes, makes HashFold an attractive alternative.
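
To make the memory comparison concrete, here is a minimal single-process
Python sketch. It is not the prototype mentioned above: the names hashfold,
mapreduce, and map_words are made up for illustration, and a real system
would distribute the work across machines. It only shows the difference
between folding pairs into a hash table as they arrive and buffering them
until the reduce step.

```python
from collections import defaultdict

def map_words(line):
    """Hypothetical mapper: emit a (word, 1) pair for every word in a line."""
    for word in line.split():
        yield word, 1

def hashfold(records, mapper, fold):
    """Fold each emitted pair into a hash table as soon as it is produced."""
    table = {}
    for record in records:
        for key, value in mapper(record):
            if key in table:
                # Key already seen: combine the new value with the stored one.
                table[key] = fold(table[key], value)
            else:
                table[key] = value
    return table

def mapreduce(records, mapper, reducer):
    """Buffer every emitted value per key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Number of values held in memory before any reducer runs.
    buffered = sum(len(values) for values in groups.values())
    return {key: reducer(values) for key, values in groups.items()}, buffered

lines = ["a b a", "b a c"]
folded = hashfold(lines, map_words, lambda x, y: x + y)
reduced, buffered = mapreduce(lines, map_words, sum)
print(folded, reduced, buffered)
```

Here the hash table ends up with one entry per distinct word (3), while the
buffering version holds all 6 emitted values before reducing. If the fold
has to keep every value around (say, to compute a median), the hash table
ends up storing just as much, which is the "at best a tie" case.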

